udapi.block.ud.el.addmwt module

Block ud.el.AddMwt for heuristic detection of multi-word (σε+DET) tokens.

Note that this block should be used only for converting existing CoNLL-U files. Ideally, a tokenizer should have already split the MWTs. Also note that this block does not handle the relatively rare MWTs of the form PRON(Person=2) + optional apostrophe + PRON(Person=3), i.e. "σ'το" and "στο".

class udapi.block.ud.el.addmwt.AddMwt(zones='all', if_empty_tree='process', **kwargs)

Bases: udapi.block.ud.addmwt.AddMwt

Detect and mark MWTs (split them into words and add the words to the tree).

multiword_analysis(node)

Return a dict with MWT info or None if node does not represent a multiword token.
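
The contract described above (a dict for a recognized MWT, None otherwise) can be illustrated with a minimal standalone sketch. It does not import udapi and is not the block's actual implementation: the table SKETCH_MWTS, the function multiword_analysis_sketch, the keys of the returned dict, and the UPOS guard are all hypothetical names and assumptions chosen for illustration. The node is modeled as any object with form and upos attributes.

```python
from types import SimpleNamespace

# Hypothetical lookup table: contracted surface form -> space-separated words.
# Illustrative entries only; the real block may cover other forms.
SKETCH_MWTS = {
    'στο': 'σε το',
    'στον': 'σε τον',
    'στη': 'σε τη',
    'στην': 'σε την',
    'στους': 'σε τους',
    'στις': 'σε τις',
    'στα': 'σε τα',
}


def multiword_analysis_sketch(node):
    """Return a dict describing the split, or None if node is not a σε+DET MWT."""
    # Assumption: only tokens tagged ADP or DET are candidates, so that
    # pronominal readings of forms such as "στο" are left untouched.
    if node.upos not in {'ADP', 'DET'}:
        return None
    forms = SKETCH_MWTS.get(node.form.lower())
    if forms is None:
        return None
    # Assumed dict layout: space-separated values, one per resulting word.
    return {'form': forms, 'lemma': 'σε ο', 'upos': 'ADP DET'}


# A capitalized, sentence-initial token is matched case-insensitively.
analysis = multiword_analysis_sketch(SimpleNamespace(form='Στο', upos='ADP'))
```

In this sketch, a token tagged PRON or a form missing from the table yields None, which mirrors the documented behavior for nodes that do not represent a multiword token.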