udapi.block.corefud.markpairs module

class udapi.block.corefud.markpairs.MarkPairs(same_entity=0, both_continuous=0, both_discontinuous=0, nested=0, crossing=0, interleaved=0, same_head=0, same_span=0, same_subspan=0, print_form=False, print_total=True, log=True, mark=True, **kwargs)

Bases: Block

Find pairs of coreference mentions within the same sentence with given properties. Mark these pairs of mentions (using misc["Mark"]), so they can be further processed or printed.

Usage:
    # Find pairs of mentions of the same entity within the same sentence:
    cat my.conllu | udapy -TM corefud.MarkPairs same_entity=1 | less -R

Properties:
    same_entity - both mentions belong to the same entity (cluster)
    both_continuous - both mentions have continuous spans
    both_discontinuous - both mentions have discontinuous spans
    nested - the span of one mention is nested in (a subset of) the span of the other mention
    crossing - spans are crossing (i.e. intersecting, but neither is a subset of the other)
    interleaved - spans are interleaved (i.e. not intersecting, but neither span precedes the other)
    same_head - the same node is the head of both mentions
    same_span - both mentions have the same span (which is invalid according to UD’s validate.py)
    same_subspan - at least one of the mentions is discontinuous and one of its subspans is also a subspan (or span) of the other mention
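The nested, crossing and interleaved properties above can be read as set relations between the word positions covered by each mention. The following is a minimal sketch of that reading, assuming each span is a non-empty set of integer word positions within one sentence. The function names are hypothetical and are not part of the udapi API; the actual block may compute these properties differently.

```python
from typing import Set


def is_nested(a: Set[int], b: Set[int]) -> bool:
    """One span is a subset of the other (equal spans count as nested here)."""
    return a <= b or b <= a


def is_crossing(a: Set[int], b: Set[int]) -> bool:
    """Spans intersect, but neither is a subset of the other."""
    return bool(a & b) and not is_nested(a, b)


def is_interleaved(a: Set[int], b: Set[int]) -> bool:
    """Spans do not intersect, yet neither span wholly precedes the other."""
    if a & b:
        return False
    one_precedes_other = max(a) < min(b) or max(b) < min(a)
    return not one_precedes_other
```

For example, the spans {1, 3} and {2, 4} share no word, but neither ends before the other starts, so under these assumptions they count as interleaved. An interleaved pair always involves at least one discontinuous span.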

You can combine any number of properties. Each property can have one of three values:
    include - the default: pairs are kept whether or not they have this property, i.e. the property is ignored
    exclude - pairs of mentions with this property are excluded from the marking
    only - pairs of mentions without this property are excluded from the marking

As a shortcut, you can use -1 and 1 instead of exclude and only (the default 0 corresponds to include), so e.g.

nested=only same_head=exclude

can be written as

nested=1 same_head=-1
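The three-valued filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the block's actual code: `convert` and `passes` are hypothetical helper names, and each pair is represented here as a plain dict mapping property names to booleans.

```python
def convert(value):
    """Map a property setting to -1 (exclude), 0 (include) or 1 (only)."""
    names = {"exclude": -1, "include": 0, "only": 1}
    if value in (-1, 0, 1):
        return value
    if value in names:
        return names[value]
    raise ValueError(f"unknown value {value!r}")


def passes(has_property: bool, setting: int) -> bool:
    """Decide whether a pair survives the filter for one property."""
    if setting == 0:        # include: the property is ignored
        return True
    if setting == 1:        # only: the pair must have the property
        return has_property
    return not has_property  # exclude: the pair must not have the property


# Equivalent of "nested=only same_head=exclude" (or "nested=1 same_head=-1").
settings = {"nested": convert("only"), "same_head": convert("exclude")}

# A hypothetical pair that is nested and whose mentions have different heads.
pair = {"nested": True, "same_head": False}
keep = all(passes(pair[name], setting) for name, setting in settings.items())
```

A pair is kept only if it passes every property filter, which is why combining several properties narrows the selection.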

after_process_document(doc)

This method is called after each process_document.

process_tree(tree)

Process a UD tree.