udapi.block.ud.ro package
Submodules
udapi.block.ud.ro.fixneg module
Block ud.ro.FixNeg ad-hoc fixes
Author: Martin Popel
class udapi.block.ud.ro.fixneg.FixNeg(zones='all')
Bases: udapi.core.block.Block
Block for fixing the remaining cases (after ud.Convert1to2) of deprel=neg in UD_Romanian.
udapi.block.ud.ro.setspaceafter module
Block ud.ro.SetSpaceAfter for heuristic setting of SpaceAfter=No in Romanian.
Usage:
udapy -s ud.ro.SetSpaceAfter < in.conllu > fixed.conllu
Author: Martin Popel
class udapi.block.ud.ro.setspaceafter.SetSpaceAfter(not_after='¡¿([{„', not_before='., ;:!?}])', fix_text=True, **kwargs)
Bases: udapi.block.ud.setspaceafter.SetSpaceAfter
Block for heuristic setting of the SpaceAfter=No MISC attribute in Romanian.
Romanian uses many contractions, e.g.
raw      meaning   tokenized   lemmatized
n-ar     nu ar     n- ar       nu avea
să-i     să îi     să -i       să el
într-o   în o      într- o     întru un
nu-i     nu îi     nu -i       nu el
nu-i     nu e      nu -i       nu fi

Detokenization is quite simple: no space after a word-final hyphen and no space before a word-initial hyphen. I have found just two exceptions:
* “-” the hyphen itself (most probably it means a dash separating phrases/clauses)
* negative numbers, e.g. “-3,1”
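
The hyphen rule and its two exceptions can be illustrated with a small standalone sketch. It is not the udapi implementation: the names NEGATIVE_NUMBER, no_space_after and detokenize are hypothetical, the input is a plain list of token strings rather than udapi nodes, and the pattern for negative numbers is an assumption based only on the “-3,1” example.

```python
import re

# Assumption: a negative number is a hyphen followed by digits,
# optionally with "," or "." separated digit groups (e.g. "-3,1").
NEGATIVE_NUMBER = re.compile(r"-\d+(?:[.,]\d+)*")


def no_space_after(tokens):
    """Return one boolean per token: True if no space should follow it."""
    flags = [False] * len(tokens)
    for i in range(len(tokens) - 1):
        current, following = tokens[i], tokens[i + 1]
        # Word-final hyphen (n-, într-), but not a lone hyphen used as a dash.
        if current.endswith("-") and current != "-":
            flags[i] = True
        # Word-initial hyphen on the next token (-i),
        # excluding a lone hyphen and negative numbers.
        if (following.startswith("-") and following != "-"
                and not NEGATIVE_NUMBER.fullmatch(following)):
            flags[i] = True
    return flags


def detokenize(tokens):
    """Join tokens into raw text using the flags from no_space_after()."""
    flags = no_space_after(tokens)
    parts = [tok + ("" if flag else " ") for tok, flag in zip(tokens, flags)]
    return "".join(parts).rstrip()


print(detokenize(["n-", "ar"]))
print(detokenize(["nu", "-i"]))
print(detokenize(["la", "-3,1", "grade"]))
```

In the real block the flag would be stored as SpaceAfter=No in the MISC column of the preceding token; the sketch only computes where that attribute would apply.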