udapi.block.ud.addmwt module
Abstract base class ud.AddMwt for heuristic detection of multi-word tokens.
- class udapi.block.ud.addmwt.AddMwt(zones='all', if_empty_tree='process', **kwargs)
Bases: Block
Detect and mark MWTs (split them into words and add the words to the tree).
- multiword_analysis(node)
Return a dict with MWT info or None if node does not represent a multiword token.
An example return value is:
{
    'form': 'aby bych',
    'lemma': 'aby být',
    'upos': 'SCONJ AUX',
    'xpos': 'J,------------- Vc-S---1-------',
    'feats': '_ Mood=Cnd|Number=Sing|Person=1|VerbForm=Fin',  # _ means empty FEATS
    'deprel': '* aux',  # * means keep the original deprel
    'main': 0,  # which of the two words will inherit the original children (if any)
    'shape': 'siblings',  # the newly created nodes will be siblings, or alternatively
    # 'shape': 'subtree',  # the main-indexed node will be the head
}
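As a rough illustration of the return value described above, the sketch below shows one way a lookup-based multiword_analysis could be written. It does not import udapi: FakeNode is a hypothetical stand-in for a real node and only carries a form attribute, and the MWTS table, the surface form 'abych', and the split_fields helper are assumptions made for this example, not part of the documented API.

```python
from dataclasses import dataclass

# Hypothetical lookup table: lowercased surface form -> MWT analysis.
# Each per-word field holds one space-separated value per resulting word.
MWTS = {
    'abych': {
        'form': 'aby bych',
        'lemma': 'aby být',
        'upos': 'SCONJ AUX',
        'xpos': 'J,------------- Vc-S---1-------',
        'feats': '_ Mood=Cnd|Number=Sing|Person=1|VerbForm=Fin',  # _ means empty FEATS
        'deprel': '* aux',  # * means keep the original deprel
        'main': 0,  # index of the word that inherits the original children
        'shape': 'siblings',  # or 'subtree'
    },
}


@dataclass
class FakeNode:
    """Stand-in for a udapi node (assumption); only `form` is used here."""
    form: str


def multiword_analysis(node):
    """Return a dict with MWT info, or None if node is not a multi-word token."""
    return MWTS.get(node.form.lower())


def split_fields(analysis):
    """Split the space-separated per-word fields into one dict per word."""
    keys = ('form', 'lemma', 'upos', 'xpos', 'feats', 'deprel')
    columns = [analysis[key].split(' ') for key in keys]
    return [dict(zip(keys, values)) for values in zip(*columns)]


if __name__ == '__main__':
    analysis = multiword_analysis(FakeNode('Abych'))
    for word in split_fields(analysis):
        print(word['form'], word['upos'], word['feats'])
```

In a real subclass of AddMwt, the same lookup would presumably live in the overridden multiword_analysis method, with the base class taking care of creating the new word nodes from the returned dict.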