udapi.block.ud.gl.addmwt module
Block ud.gl.AddMwt for heuristic detection of Galician contractions.
According to the UD guidelines, contractions such as “do” = “de o”
should be annotated using multi-word tokens.
Note that this block should be used only for converting legacy CoNLL-U files.
Ideally a tokenizer should have already split the MWTs.
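To make the target annotation concrete, the sketch below renders a multi-word token in CoNLL-U: a range line (for example `1-2`) carrying the surface form, followed by one line per syntactic word. The helper `mwt_lines` is hypothetical and not part of Udapi. It fills only the ID and FORM columns and leaves the remaining eight as underscores.

```python
def mwt_lines(surface, words, start_id=1):
    """Render a multi-word token as CoNLL-U lines (hypothetical helper, not Udapi API).

    Only ID and FORM are filled; the other eight columns are left as "_".
    """
    blank = ["_"] * 8
    end_id = start_id + len(words) - 1
    # Range line: covers the syntactic words start_id..end_id.
    lines = ["\t".join([f"{start_id}-{end_id}", surface] + blank)]
    # One line per syntactic word.
    for offset, word in enumerate(words):
        lines.append("\t".join([str(start_id + offset), word] + blank))
    return lines


# The contraction from the text: "do" = "de o".
for line in mwt_lines("do", ["de", "o"]):
    print(line)
```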
-
class udapi.block.ud.gl.addmwt.AddMwt(verbpron=False, **kwargs)
Bases: udapi.block.ud.addmwt.AddMwt
Detect and mark MWTs (split them into words and add the words to the tree).
-
multiword_analysis(node)
Return a dict with MWT info or None if node does not represent a multiword token.
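The contract stated above (a dict for contractions, None otherwise) can be illustrated with a table lookup. This is a sketch under assumptions, not the block's actual implementation. The table `CONTRACTIONS`, the function `analyze`, and the dict key `form` are hypothetical here. The table lists only a few common Galician preposition + article contractions, and a bare `SimpleNamespace` stands in for a Udapi node.

```python
from types import SimpleNamespace

# Hypothetical lookup table: surface form -> space-separated syntactic words.
CONTRACTIONS = {
    "do": "de o",
    "da": "de a",
    "dos": "de os",
    "das": "de as",
    "no": "en o",
    "na": "en a",
}


def analyze(node):
    """Return a dict describing the split, or None if node is not a known contraction."""
    parts = CONTRACTIONS.get(node.form.lower())
    if parts is None:
        return None
    # Hypothetical result shape: only the split forms are reported.
    return {"form": parts}


print(analyze(SimpleNamespace(form="Do")))    # a dict: the form is in the table
print(analyze(SimpleNamespace(form="casa")))  # None: not in the table
```

A form-only lookup like this cannot tell a contraction from a homograph, so it remains a heuristic.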
-
postprocess_mwt(mwt)
Optional postprocessing of newly created MWTs.