udapi.block.ud.fr.addmwt module
Block ud.fr.AddMwt for heuristic detection of French contractions.
According to the UD guidelines, contractions such as “des” = “de les” should be annotated using multi-word tokens.
Note that this block should be used only for converting legacy CoNLL-U files. Ideally, a tokenizer should have already split the MWTs.
- class udapi.block.ud.fr.addmwt.AddMwt(zones='all', if_empty_tree='process', **kwargs)
Bases: AddMwt
Detect and mark MWTs (split them into words and add the words to the tree).
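To make the idea of splitting a contraction into words concrete, here is a minimal standalone sketch. It does not use udapi and is not the block's actual implementation: the `CONTRACTIONS` table, the `split_contractions` helper, and the rule of splitting only tokens tagged `ADP` are all assumptions chosen for illustration. It emits CoNLL-U-style rows in which a range ID such as `3-4` carries the surface form and the following rows carry the individual words.

```python
# Hypothetical illustration only; names and the ADP rule are assumptions,
# not udapi's API or its real heuristic.
CONTRACTIONS = {
    "du": ("de", "le"),
    "des": ("de", "les"),
    "au": ("à", "le"),
    "aux": ("à", "les"),
}


def split_contractions(tokens):
    """Turn (form, upos) pairs into CoNLL-U-style (ID, FORM) rows.

    A contraction tagged ADP becomes one range row (e.g. "3-4")
    followed by one row per word; every other token gets a single row.
    """
    rows = []
    next_id = 1
    for form, upos in tokens:
        parts = CONTRACTIONS.get(form.lower())
        if parts and upos == "ADP":
            last_id = next_id + len(parts) - 1
            rows.append((f"{next_id}-{last_id}", form))  # multi-word token row
            for part in parts:
                rows.append((str(next_id), part))  # syntactic word row
                next_id += 1
        else:
            rows.append((str(next_id), form))
            next_id += 1
    return rows


sentence = [("Il", "PRON"), ("parle", "VERB"), ("des", "ADP"), ("enfants", "NOUN")]
for row_id, form in split_contractions(sentence):
    print(row_id, form, sep="\t")
```

The tag check reflects why detection has to be heuristic: “des” can also be a plain determiner, in which case the sketch leaves it as a single word.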