udapi.block.corefud.link2cluster module

class udapi.block.corefud.link2cluster.Link2Cluster(id_attr='proiel-id', ante_attr='antecedent-proiel-id', delete_orig_attrs=True, infstat_attr='information-status', coreftype_attr='coreftype', bridge_attr=None, bridge_relation_attr=None, eid_counter=1, **kwargs)

Bases: Block

Block corefud.Link2Cluster converts link-based coreference annotation to the (cluster-based) CorefUD format.

Params:

id_attr: name of the attribute in MISC that stores the original-format IDs of nodes.

ante_attr: name of the attribute in MISC that stores the ID of the antecedent of the current node (in the same format as id_attr).

delete_orig_attrs: whether to delete the MISC attributes that were used for the conversion (i.e. id_attr and ante_attr, plus infstat_attr, coreftype_attr, bridge_attr and bridge_relation_attr if these are used). Default=True.

infstat_attr: name of the attribute in MISC that stores the information status of a given mention. The value will be stored in mention.other['infstat']. Use None to ignore this attribute.

coreftype_attr: name of the attribute in MISC that stores the coreference type of a given mention. The value will be stored in mention.other['coreftype']. Use None to ignore this attribute.

bridge_attr: name of the attribute in MISC that stores the ID of the bridging antecedent of the current node/mention (in the same format as id_attr). Default=None, i.e. ignore this parameter.

bridge_relation_attr: name of the attribute in MISC that stores the bridging relation type (e.g. “part” or “subset”). Default=None, i.e. ignore this parameter.

eid_counter: use a global counter for entity.eid, starting from the given number. Default=1. The main goal of this parameter is to make eid unique across multiple documents. With eid_counter=0 this feature is turned off: entities are created using root.document.create_coref_entity() with no eid parameter, so the eid starts from “e1” in each document processed by this block.
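The difference between the two eid_counter modes can be illustrated with a small stand-alone sketch. It does not use Udapi: EidAllocator and new_eids are hypothetical names invented for this illustration, and the "e<number>" form of counter-assigned IDs is an assumption extrapolated from the “e1” example above.

```python
class EidAllocator:
    """Hypothetical stand-in that mimics the two eid_counter modes."""

    def __init__(self, eid_counter=1):
        # 0 means "no global counter"; any other value is the first number to use.
        self.eid_counter = eid_counter

    def new_eids(self, n_entities):
        """Return entity IDs for one document that contains n_entities entities."""
        if self.eid_counter == 0:
            # Counter turned off: numbering restarts from e1 in every document.
            return [f"e{i}" for i in range(1, n_entities + 1)]
        # Counter turned on: continue where the previous document stopped.
        eids = [f"e{self.eid_counter + i}" for i in range(n_entities)]
        self.eid_counter += n_entities
        return eids


global_ids = EidAllocator(eid_counter=1)
first_doc = global_ids.new_eids(2)    # IDs for the first document
second_doc = global_ids.new_eids(2)   # IDs for the second document, no overlap

per_doc_ids = EidAllocator(eid_counter=0)
restarted_a = per_doc_ids.new_eids(2)
restarted_b = per_doc_ids.new_eids(2)  # same IDs again, numbering restarted
```

With a non-zero counter, the IDs of the second document continue after those of the first, which is what keeps them unique when several documents are later merged.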

process_document(doc)

Process a UD document.
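The conversion from links to clusters can be pictured as grouping all nodes that are connected by antecedent links. The following is a minimal sketch of that idea only, not the block's actual code: it works on plain dicts standing in for MISC attributes, links_to_clusters is a hypothetical helper that is not part of Udapi, and union-find is just one possible way to do the grouping. Treating unlinked nodes as singleton clusters is also an assumption of the sketch.

```python
def links_to_clusters(nodes, id_attr="proiel-id", ante_attr="antecedent-proiel-id"):
    """Group node IDs into clusters by following antecedent links.

    `nodes` is a list of dicts, each a hypothetical stand-in for the MISC
    attributes of one node. Returns a sorted list of sorted ID lists.
    """
    parent = {}  # union-find forest: ID -> parent ID

    def find(x):
        # Return the representative of x's cluster, registering x if needed.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for misc in nodes:
        node_id = misc.get(id_attr)
        if node_id is None:
            continue  # node carries no original ID, nothing to convert
        find(node_id)
        ante_id = misc.get(ante_attr)
        if ante_id is not None:
            # Merge the node's cluster with the cluster of its antecedent.
            parent[find(node_id)] = find(ante_id)

    clusters = {}
    for node_id in list(parent):
        clusters.setdefault(find(node_id), []).append(node_id)
    return sorted(sorted(ids) for ids in clusters.values())


# Three nodes chained by antecedent links, plus a separate pair.
example_nodes = [
    {"proiel-id": "1"},
    {"proiel-id": "2", "antecedent-proiel-id": "1"},
    {"proiel-id": "3", "antecedent-proiel-id": "2"},
    {"proiel-id": "4"},
    {"proiel-id": "5", "antecedent-proiel-id": "4"},
]
example_clusters = links_to_clusters(example_nodes)
```

Here the chain 3 → 2 → 1 ends up in one cluster and the pair 5 → 4 in another, which mirrors how pairwise links collapse into entities in a cluster-based format.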