udapi.block.ud.ca.addmwt module
Block ud.ca.AddMwt for heuristic detection of Catalan contractions.
According to the UD guidelines, contractions such as “del” = “de el”
should be annotated using multi-word tokens.
Note that this block should be used only for converting legacy CoNLL-U files.
Ideally, a tokenizer should already have split the MWTs.
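The core of such a heuristic can be pictured as a lookup from a contracted form to its syntactic words. The sketch below is a minimal, hypothetical illustration, not the block's actual implementation. The table `CONTRACTIONS` and the function `split_contraction` are assumptions introduced here. The table covers only the common preposition + definite article contractions, and the function ignores sentence context.

```python
# Hypothetical, non-exhaustive table of Catalan preposition + article
# contractions; it is not taken from the udapi source code.
CONTRACTIONS = {
    "al": ("a", "el"),
    "als": ("a", "els"),
    "del": ("de", "el"),
    "dels": ("de", "els"),
    "pel": ("per", "el"),
    "pels": ("per", "els"),
}


def split_contraction(form):
    """Return the syntactic words for a contracted form, or None if it is not one."""
    words = CONTRACTIONS.get(form.lower())  # case-insensitive lookup
    if words is None:
        return None
    # If the token starts with a capital letter (e.g. sentence-initially),
    # capitalise only the first of the resulting words.
    if form[0].isupper():
        words = (words[0].capitalize(),) + words[1:]
    return words
```

For example, `split_contraction("Del")` gives `("De", "el")`, while a non-contracted form such as `"casa"` gives `None`.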
-
class udapi.block.ud.ca.addmwt.AddMwt(verbpron=False, **kwargs)
Bases: udapi.block.ud.addmwt.AddMwt
Detect and mark MWTs (split them into words and add the words to the tree).
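In CoNLL-U, a split token is kept as a range line (e.g. `2-3`) followed by one line per syntactic word. The sketch below shows only that ID bookkeeping on a flat list of token forms. It is a hypothetical illustration and not the block's code: the real block edits a udapi tree, whereas `to_conllu_ids` and its two-entry `CONTRACTIONS` table are assumptions made here.

```python
# Hypothetical two-entry contraction table, for illustration only.
CONTRACTIONS = {"del": ("de", "el"), "al": ("a", "el")}


def to_conllu_ids(tokens):
    """Return (ID, FORM) pairs, emitting a range line before each split MWT.

    The lookup is case-insensitive; this sketch does not preserve the
    capitalisation of the token in the resulting words.
    """
    rows = []
    word_id = 1
    for token in tokens:
        words = CONTRACTIONS.get(token.lower())
        if words is None:
            # Ordinary token: exactly one syntactic word.
            rows.append((str(word_id), token))
            word_id += 1
            continue
        # The multiword token line spans the IDs of all its words.
        last = word_id + len(words) - 1
        rows.append((f"{word_id}-{last}", token))
        for word in words:
            rows.append((str(word_id), word))
            word_id += 1
    return rows


if __name__ == "__main__":
    for row_id, form in to_conllu_ids(["Vinc", "del", "mercat"]):
        print(f"{row_id}\t{form}")
```

Run on `["Vinc", "del", "mercat"]`, the sketch yields the rows `1 Vinc`, `2-3 del`, `2 de`, `3 el`, `4 mercat`. The word IDs keep counting across the split token, and only the range line carries the original surface form.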
-
fix_personal_pronoun(node)
-
multiword_analysis(node)
Return a dict with MWT info or None if node does not represent a multiword token.
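The sketch below illustrates what such a dict might look like for a preposition + article contraction. It rests on several assumptions. The key names (`form`, `lemma`, `upos`, `deprel`, `main`, `shape`) and the conventions that values are space-separated per word, that `*` keeps the original deprel, and that `shape="siblings"` attaches the new words as siblings follow my recollection of the base class `udapi.block.ud.addmwt.AddMwt`, so check them against that class. The function `multiword_analysis_sketch` is hypothetical and takes a plain form string rather than a udapi node so that it runs on its own.

```python
# Hypothetical contraction table; not taken from the udapi source code.
CONTRACTIONS = {
    "al": ("a", "el"),
    "del": ("de", "el"),
    "pel": ("per", "el"),
}


def multiword_analysis_sketch(form):
    """Return a dict describing the MWT, or None if `form` is not a contraction."""
    words = CONTRACTIONS.get(form.lower())
    if words is None:
        return None
    return {
        "form": " ".join(words),   # space-separated forms of the new words
        "lemma": " ".join(words),  # for these words the lemma equals the form
        "upos": "ADP DET",         # preposition followed by definite article
        "deprel": "* det",         # assumed: '*' keeps the original deprel
        "main": 0,                 # index of the word inheriting the original children
        "shape": "siblings",       # assumed: both words attach to the same head
    }
```

Returning `None` for non-contractions lets a caller use the result directly as the "is this an MWT?" test, which matches the contract described in the docstring above.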
-
report_suspicious_lemmas(node)