udapi.block.ud.ro package

Submodules

udapi.block.ud.ro.fixneg module

Block ud.ro.FixNeg for ad-hoc fixes.

Author: Martin Popel

class udapi.block.ud.ro.fixneg.FixNeg(zones='all')

Bases: udapi.core.block.Block

Block for fixing the remaining cases (after ud.Convert1to2) of deprel=neg in UD_Romanian.

process_node(node): Process a UD node.

udapi.block.ud.ro.setspaceafter module

Block ud.ro.SetSpaceAfter for heuristic setting of SpaceAfter=No in Romanian.

Usage:

udapy -s ud.ro.SetSpaceAfter < in.conllu > fixed.conllu

Author: Martin Popel

class udapi.block.ud.ro.setspaceafter.SetSpaceAfter(not_after='¡¿([{„', not_before='., ;:!?}])', fix_text=True, **kwargs)

Bases: udapi.block.ud.setspaceafter.SetSpaceAfter

Block for heuristic setting of the SpaceAfter=No MISC attribute in Romanian.

Romanian uses many contractions, e.g.

raw	meaning	tokenized	lemmatized
n-ar	nu ar	n- ar	nu avea
să-i	să îi	să -i	să el
într-o	în o	într- o	întru un
nu-i	nu îi	nu -i	nu el
nu-i	nu e	nu -i	nu fi

Detokenization is quite simple: no space after a word-final hyphen and no space before a word-initial hyphen. I have found just two exceptions:

* “-”, the hyphen itself (most probably a dash separating phrases or clauses)
* negative numbers, e.g. “-3,1”
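The hyphen rule and its two exceptions can be sketched in plain Python, independently of Udapi. This is an illustration only, not the block's actual implementation: the names `NEGATIVE_NUMBER`, `is_hyphen_exception`, `no_space_after` and `detokenize` are hypothetical, the regular expression for negative numbers is an assumption, and the punctuation handling controlled by `not_after` and `not_before` is deliberately left out.

```python
import re

# Assumed shape of a negative number such as "-3,1" or "-42" (hypothetical pattern).
NEGATIVE_NUMBER = re.compile(r"-\d+(?:[.,]\d+)*$")


def is_hyphen_exception(token):
    """Return True for the two exceptions: a bare hyphen or a negative number."""
    return token == "-" or NEGATIVE_NUMBER.match(token) is not None


def no_space_after(tokens):
    """For each token, return True if no space should follow it."""
    flags = [False] * len(tokens)
    for i in range(len(tokens) - 1):
        current, following = tokens[i], tokens[i + 1]
        # No space after a word-final hyphen, e.g. "n-" + "ar".
        if current.endswith("-") and not is_hyphen_exception(current):
            flags[i] = True
        # No space before a word-initial hyphen, e.g. "să" + "-i".
        if following.startswith("-") and not is_hyphen_exception(following):
            flags[i] = True
    return flags


def detokenize(tokens):
    """Join tokens into raw text according to the flags computed above."""
    pieces = []
    for token, glue in zip(tokens, no_space_after(tokens)):
        pieces.append(token)
        if not glue:
            pieces.append(" ")
    return "".join(pieces).rstrip()


print(detokenize(["n-", "ar"]))
print(detokenize(["nu", "-i"]))
```

In the real block the result is recorded as the SpaceAfter=No attribute in MISC rather than returned as a list of booleans; the list here only keeps the sketch self-contained.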

process_tree(root): Process a UD tree.
