udapi.block.ud.el package
Submodules
udapi.block.ud.el.addmwt module
Block ud.el.AddMwt for heuristic detection of multi-word (σε+DET) tokens.
Note that this block should be used only for converting existing CoNLL-U files.
Ideally, a tokenizer should have already split the MWTs.
Also note that this block does not handle the relatively rare
PRON(Person=2)+'*+PRON(Person=3) MWTs, i.e. "σ'το" and "στο".
class udapi.block.ud.el.addmwt.AddMwt(zones='all')
Bases: udapi.block.ud.addmwt.AddMwt
Detect and mark MWTs (split them into words and add the words to the tree).
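
The σε+DET splitting idea can be sketched as a simple lookup. The following is a minimal standalone illustration that does not use udapi. The table `CONTRACTIONS` and the helpers `split_mwt` and `expand` are hypothetical names introduced here, and the listed forms are an assumed, possibly incomplete set; the actual block may cover different forms and also sets attributes such as lemmas, tags, and dependency relations.

```python
# Hypothetical lookup table: contracted surface form -> forms of the two
# syntactic words (preposition σε, shortened to "σ", plus the definite article).
# The set of forms is an assumption for illustration only.
CONTRACTIONS = {
    "στο": ("σ", "το"),
    "στον": ("σ", "τον"),
    "στη": ("σ", "τη"),
    "στην": ("σ", "την"),
    "στα": ("σ", "τα"),
    "στους": ("σ", "τους"),
    "στις": ("σ", "τις"),
}


def split_mwt(form):
    """Return the two word forms for a contracted σε+DET token, or None."""
    return CONTRACTIONS.get(form.lower())


def expand(tokens):
    """Pair each surface token with the list of words it expands to."""
    result = []
    for token in tokens:
        words = split_mwt(token)
        # Tokens that are not in the table stay as a single word.
        result.append((token, list(words) if words else [token]))
    return result


if __name__ == "__main__":
    for surface, words in expand(["Πάω", "στο", "σπίτι"]):
        print(surface, words)
```

Because the lookup is purely form-based, it treats every "στο" as σε+DET. This is what makes the approach heuristic: the rarer PRON+PRON reading mentioned above would be split incorrectly.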