udapi.block.corefud.link2cluster module
- class udapi.block.corefud.link2cluster.Link2Cluster(id_attr='proiel-id', ante_attr='antecedent-proiel-id', delete_orig_attrs=True, infstat_attr='information-status', coreftype_attr='coreftype', bridge_attr=None, bridge_relation_attr=None, eid_counter=1, **kwargs)
Bases: Block
Block corefud.Link2Cluster converts link-based coreference annotation to the (cluster-based) CorefUD format.
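To illustrate what converting link-based annotation to clusters means, here is a minimal sketch in plain Python. It is not the udapi implementation: the function name `links_to_clusters`, the mention IDs, and the use of a union-find structure are all assumptions made for this example. It only shows the idea that mentions connected by antecedent links end up in one cluster.

```python
def links_to_clusters(antecedent_of):
    """Group mention IDs into clusters, given mention -> antecedent links.

    Hypothetical helper for illustration only; not part of udapi.
    """
    parent = {}

    def find(x):
        # Return the representative of x's cluster (with path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each link merges the cluster of the mention with that of its antecedent.
    for mention, antecedent in antecedent_of.items():
        root_m, root_a = find(mention), find(antecedent)
        if root_m != root_a:
            parent[root_m] = root_a

    # Collect mentions by the representative of their cluster.
    clusters = {}
    for mention in list(parent):
        clusters.setdefault(find(mention), []).append(mention)
    return sorted(sorted(members) for members in clusters.values())


# Made-up links: m3 -> m2 -> m1 form one chain, m5 -> m4 another.
links = {"m2": "m1", "m3": "m2", "m5": "m4"}
clusters = links_to_clusters(links)
print(clusters)
```

In this sketch the chain m3, m2, m1 becomes one cluster and m5, m4 another; a mention without any link would simply not appear in the input.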
Params:
- id_attr: name of the attribute in MISC that stores the original-format IDs of nodes.
- ante_attr: name of the attribute in MISC that stores the ID of the antecedent
of the current node (in the same format as id_attr).
- delete_orig_attrs: whether to delete the MISC attributes that were used for the conversion
(i.e. id_attr and ante_attr, plus infstat_attr, coreftype_attr, bridge_attr and bridge_relation_attr if these are used). Default=True.
- infstat_attr: name of the attribute in MISC that stores the information status of a given mention.
Will be stored in mention.other['infstat']. Use None to ignore this.
- coreftype_attr: name of the attribute in MISC that stores the coreference type of a given mention.
Will be stored in mention.other['coreftype']. Use None to ignore this.
- bridge_attr: name of the attribute in MISC that stores the ID of the bridging antecedent
of the current node/mention (in the same format as id_attr). Default=None, i.e. ignore this parameter.
- bridge_relation_attr: name of the attribute in MISC that stores the bridging relation type
(e.g. “part” or “subset”). Default=None, i.e. ignore this parameter.
- eid_counter: use a global counter of entity.eid and start with the given number. Default=1.
The main goal of this parameter is to make eid unique across multiple documents. If eid_counter=0, this feature is turned off: entities are created using root.document.create_coref_entity() with no eid parameter, so the eid starts from “e1” in each document processed by this block.
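The effect of eid_counter can be sketched in plain Python, without udapi. The helper `assign_eids` and the sample data below are hypothetical, and the "e" + number format for the global counter is an assumption made by analogy with the per-document “e1” mentioned above.

```python
def assign_eids(docs, eid_counter=1):
    """Assign entity IDs to clusters in a sequence of documents.

    Hypothetical helper for illustration only; not part of udapi.
    docs is a list of documents, each a list of clusters.
    """
    result = []
    counter = eid_counter
    for clusters in docs:
        if eid_counter == 0:
            # Global counting is off: numbering restarts in each document.
            counter = 1
        doc_entities = {}
        for cluster in clusters:
            doc_entities[f"e{counter}"] = cluster
            counter += 1
        result.append(doc_entities)
    return result


# Two made-up documents with two clusters and one cluster, respectively.
docs = [[["a", "b"], ["c"]], [["d", "e"]]]
global_ids = assign_eids(docs, eid_counter=1)
per_doc_ids = assign_eids(docs, eid_counter=0)
print([list(d) for d in global_ids])
print([list(d) for d in per_doc_ids])
```

With a non-zero start value the IDs keep increasing from one document to the next, so they stay unique across the whole run; with 0 each document starts again from e1.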