udapi.block.ud.ro package

Submodules

udapi.block.ud.ro.fixneg module

Block ud.ro.FixNeg: ad-hoc fixes.

Author: Martin Popel

class udapi.block.ud.ro.fixneg.FixNeg(zones='all')

Bases: udapi.core.block.Block

Block for fixing the remaining cases (after ud.Convert1to2) of deprel=neg in UD_Romanian.

process_node(node)

Process a UD node
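The following is a minimal sketch of a per-node rule of this kind, written in plain Python. It assumes the UD v2 convention that a negation particle such as Romanian "nu" attaches as advmod and carries the feature Polarity=Neg. The `Node` dataclass and the `fix_neg` function are hypothetical stand-ins, not udapi's classes, and the rules FixNeg actually applies may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a UD node (not udapi's Node class)."""
    form: str
    upos: str
    deprel: str
    feats: dict = field(default_factory=dict)


def fix_neg(node: Node) -> None:
    """Relabel a leftover UD v1 'neg' relation (assumed UD v2 target, not FixNeg's exact rules)."""
    if node.deprel != "neg":
        return  # only touch nodes that still carry the obsolete relation
    # UD v2 dropped 'neg'; here we assume negation particles attach as advmod
    node.deprel = "advmod"
    # mark negation lexically with the Polarity feature
    node.feats["Polarity"] = "Neg"


nu = Node(form="nu", upos="PART", deprel="neg")
fix_neg(nu)
print(nu.deprel, nu.feats)  # advmod {'Polarity': 'Neg'}
```

Nodes with any other deprel are left unchanged, which mirrors the idea of fixing only the *remaining* neg cases.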

udapi.block.ud.ro.setspaceafter module

Block ud.ro.SetSpaceAfter for heuristic setting of SpaceAfter=No in Romanian.

Usage:

udapy -s ud.ro.SetSpaceAfter < in.conllu > fixed.conllu

Author: Martin Popel

class udapi.block.ud.ro.setspaceafter.SetSpaceAfter(not_after='¡¿([{„', not_before='., ;:!?}])', fix_text=True, **kwargs)

Bases: udapi.block.ud.setspaceafter.SetSpaceAfter

Block for heuristic setting of the SpaceAfter=No MISC attribute in Romanian.
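In the CoNLL-U format, MISC is the tenth column. It holds '|'-separated attributes, or '_' when it is empty. The sketch below shows what "setting SpaceAfter=No" amounts to at the level of that column string. The function name `set_space_after_no` is hypothetical, and udapi itself manipulates node attributes rather than raw strings.

```python
def set_space_after_no(misc: str) -> str:
    """Return a CoNLL-U MISC value with SpaceAfter=No added (hypothetical helper)."""
    # '_' marks an empty MISC column
    attrs = [] if misc == "_" else misc.split("|")
    # drop any existing SpaceAfter value so the attribute appears only once
    attrs = [a for a in attrs if not a.startswith("SpaceAfter=")]
    attrs.append("SpaceAfter=No")
    return "|".join(attrs)


print(set_space_after_no("_"))           # SpaceAfter=No
print(set_space_after_no("Translit=n"))  # Translit=n|SpaceAfter=No
```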

Romanian uses many contractions, e.g.

raw      meaning   tokenized   lemmatized
------   -------   ---------   ----------
n-ar     nu ar     n- ar       nu avea
să-i     să îi     să -i       să el
într-o   în o      într- o     întru un
nu-i     nu îi     nu -i       nu el
nu-i     nu e      nu -i       nu fi

Detokenization is quite simple: there is no space after a word-final hyphen and no space before a word-initial hyphen. I have found just two exceptions:

* “-”, the hyphen itself (most probably a dash separating phrases or clauses)
* negative numbers, e.g. “-3,1”
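The rule and its two exceptions can be sketched in plain Python over a list of token forms rather than udapi nodes. The function names `no_space_after` and `detokenize` are hypothetical. The negative-number test is an assumption: a hyphen immediately followed by a digit. The actual block may handle further cases, such as the not_after and not_before punctuation lists in its signature.

```python
import re

# assumed negative-number test: a hyphen immediately followed by a digit, e.g. "-3,1"
NEGATIVE_NUMBER = re.compile(r"-\d")


def no_space_after(form: str, next_form: str) -> bool:
    """Heuristic: should SpaceAfter=No be set between form and next_form?"""
    # word-final hyphen, e.g. "n-" in "n-ar"; a bare "-" is a dash, not a clitic
    if form.endswith("-") and form != "-":
        return True
    # word-initial hyphen, e.g. "-i" in "să-i"; skip a bare "-" and negative numbers
    if (next_form.startswith("-") and next_form != "-"
            and not NEGATIVE_NUMBER.match(next_form)):
        return True
    return False


def detokenize(forms: list[str]) -> str:
    """Rebuild raw text from token forms using the heuristic above."""
    out = []
    for i, form in enumerate(forms):
        out.append(form)
        # insert a space unless the heuristic glues this token to the next one
        if i + 1 < len(forms) and not no_space_after(form, forms[i + 1]):
            out.append(" ")
    return "".join(out)


print(detokenize(["n-", "ar"]))                 # n-ar
print(detokenize(["să", "-i", "dau"]))          # să-i dau
print(detokenize(["scade", "la", "-3,1"]))      # scade la -3,1
print(detokenize(["bine", "-", "zise", "el"]))  # bine - zise el
```

Keeping the decision in a pairwise predicate makes it easy to map onto a tree. The predicate's result for a token and its successor would become that token's SpaceAfter=No flag.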

process_tree(root)

Process a UD tree

Module contents