udapi.block.corefud.removenocorefentities module

class udapi.block.corefud.removenocorefentities.RemoveNoCorefEntities(zones='all', if_empty_tree='process', **kwargs)

Bases: Block

Some corpora (e.g., AnCora) include named entities that are not annotated for coreference. To distinguish them, their cluster ID starts with ‘NOCOREF’ (optionally followed by the entity type, so that each cluster still has just one type). We may want to remove such entities from datasets that are used to train coreference resolvers, to prevent the resolvers from learning that all members of a NOCOREF cluster are coreferential. That is what this block does.
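The filtering rule described above can be illustrated with a minimal, self-contained sketch. It does not use the Udapi API: the function `remove_nocoref_entities`, the constant `NOCOREF_PREFIX`, and the toy dictionary of cluster IDs are hypothetical names introduced only for illustration, and the real block operates on the coreference entities of a Udapi document rather than on a plain dictionary.

```python
from typing import Dict, List

# Prefix named in the description above; the entity type may follow it.
NOCOREF_PREFIX = "NOCOREF"


def remove_nocoref_entities(entities: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of `entities` without clusters whose ID starts with NOCOREF."""
    return {
        eid: mentions
        for eid, mentions in entities.items()
        if not eid.startswith(NOCOREF_PREFIX)
    }


# Toy data: cluster ID -> mention strings (hypothetical, not real AnCora content).
clusters = {
    "e1": ["Maria", "she"],
    "NOCOREFperson": ["Juan", "Pedro"],
    "NOCOREForganization": ["UNESCO", "FIFA"],
    "e2": ["the report", "it"],
}

kept = remove_nocoref_entities(clusters)
print(sorted(kept))
```

In this toy input, "Juan" and "Pedro" share a cluster only because both are persons, not because they corefer, which is exactly the signal a resolver should not be trained on. The sketch returns a new dictionary and leaves the input unchanged; whether the actual block modifies the document in place is not stated here.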

process_document(doc)

Process a UD document: remove the entities whose cluster ID starts with ‘NOCOREF’.