udapi.block.ud.convert1to2 module

Block Convert1to2 for converting UD v1 to UD v2.

See http://universaldependencies.org/v2/summary.html for a description of all UD v2 changes. IMPORTANT: this code performs only SOME of the changes, so the output should be checked.

Note that this block is not idempotent, i.e. you should not apply it twice on the same data. It should be idempotent when skipping the coordination transformations (skip=coord).

Author: Martin Popel, based on https://github.com/UniversalDependencies/tools/tree/master/v2-conversion by Sebastian Schuster.

class udapi.block.ud.convert1to2.Convert1to2(skip='', save_stats=True, **kwargs)

Bases: Block

Block for converting UD v1 to UD v2.

HEAD_PROMOTION = {'advcl': 1, 'advmod': 5, 'ccomp': 2, 'csubj': 4, 'iobj': 7, 'nsubj': 9, 'obj': 8, 'obl': 6, 'xcomp': 3}
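The HEAD_PROMOTION values seem to encode the UD v2 ellipsis promotion order (nsubj > obj > iobj > obl > advmod > csubj > xcomp > ccomp > advcl), where a larger number marks a better candidate for promotion. The sketch below assumes that reading and uses the table to choose among candidate deprels; `pick_promoted` is a hypothetical helper, not part of Udapi.

```python
# Copy of the class attribute shown above.
HEAD_PROMOTION = {'advcl': 1, 'advmod': 5, 'ccomp': 2, 'csubj': 4, 'iobj': 7,
                  'nsubj': 9, 'obj': 8, 'obl': 6, 'xcomp': 3}


def pick_promoted(deprels):
    """Return the deprel with the highest promotion priority (hypothetical helper).

    Deprels missing from the table get priority 0; on ties, max() keeps
    the first candidate.
    """
    return max(deprels, key=lambda deprel: HEAD_PROMOTION.get(deprel, 0))


# Sorting the table by value reproduces the promotion order.
order = sorted(HEAD_PROMOTION, key=HEAD_PROMOTION.get, reverse=True)
print(order)  # ['nsubj', 'obj', 'iobj', 'obl', 'advmod', 'csubj', 'xcomp', 'ccomp', 'advcl']
print(pick_promoted(['obl', 'obj', 'advmod']))  # obj
```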
after_process_document(document)

Print overall statistics of ToDo counts.
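A minimal sketch of how ToDo statistics could be collected and printed with `collections.Counter`. It assumes the ToDo labels have already been gathered from the MISC column into plain dicts; the function names, the example labels and the output layout are hypothetical.

```python
from collections import Counter


def todo_stats(miscs):
    """Count ToDo labels in a sequence of MISC dicts (hypothetical input format)."""
    return Counter(misc['ToDo'] for misc in miscs if 'ToDo' in misc)


def print_stats(counts):
    """Print one line per ToDo label, most frequent first."""
    for label, count in counts.most_common():
        print(f'{count:6d} {label}')


stats = todo_stats([{'ToDo': 'neg'}, {}, {'ToDo': 'neg'}, {'ToDo': 'cop'}])
print_stats(stats)
```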

change_deprel_simple(node)

mwe→fixed, dobj→obj, pass→:pass, name→flat, foreign→flat+Foreign=Yes.
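The renames listed above are simple enough to sketch as a pure function over strings. This is an illustration, not the block's code: `convert_deprel` is hypothetical, features are a plain dict, and only the listed v1 deprels are handled. The `pass` rule assumes the v1 passive labels nsubjpass, csubjpass and auxpass.

```python
# Plain renames listed in the docstring.
SIMPLE_RENAMES = {'mwe': 'fixed', 'dobj': 'obj', 'name': 'flat'}


def convert_deprel(deprel, feats):
    """Return (new_deprel, new_feats) for one UD v1 deprel (sketch only)."""
    feats = dict(feats)  # copy, so the caller's dict is left untouched
    if deprel in SIMPLE_RENAMES:
        return SIMPLE_RENAMES[deprel], feats
    if deprel == 'foreign':
        feats['Foreign'] = 'Yes'
        return 'flat', feats
    if deprel.endswith('pass') and not deprel.endswith(':pass'):
        # nsubjpass -> nsubj:pass, csubjpass -> csubj:pass, auxpass -> aux:pass
        return deprel[:-len('pass')] + ':pass', feats
    return deprel, feats


print(convert_deprel('nsubjpass', {}))  # ('nsubj:pass', {})
print(convert_deprel('foreign', {}))    # ('flat', {'Foreign': 'Yes'})
```

The `:pass` guard keeps the function from turning an already converted `nsubj:pass` into `nsubj::pass`.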

change_feats(node)

Negative→Polarity, Aspect=Pro→Prosp, VerbForm=Trans→Conv, Definite=Red→Cons,…

Also Foreign=Foreign→Yes and log if Tense=Nar or NumType=Gen is used.
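A sketch of the listed feature changes, operating on a plain dict of FEATS. Only the renames named above are covered (the docstring's "…" marks further ones that are not reproduced here), and `convert_feats` is a hypothetical name. The standard `logging` module stands in for the block's own log method.

```python
import logging

# (feature, old value) -> new value, as listed in the docstring.
VALUE_RENAMES = {
    ('Aspect', 'Pro'): 'Prosp',
    ('VerbForm', 'Trans'): 'Conv',
    ('Definite', 'Red'): 'Cons',
    ('Foreign', 'Foreign'): 'Yes',
}


def convert_feats(feats):
    """Return a new FEATS dict with the UD v2 names and values (sketch only)."""
    new = {}
    for name, value in feats.items():
        if name == 'Negative':
            name = 'Polarity'  # the v1 feature Negative is called Polarity in v2
        new[name] = VALUE_RENAMES.get((name, value), value)
    if new.get('Tense') == 'Nar' or new.get('NumType') == 'Gen':
        logging.warning('check this node: %s', new)
    return new


print(convert_feats({'Negative': 'Neg', 'VerbForm': 'Trans'}))
# {'Polarity': 'Neg', 'VerbForm': 'Conv'}
```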

static change_headfinal(node, deprel)

deprel=goeswith|flat|fixed|appos must be a head-initial flat structure.
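One way to turn a head-final pair into the head-initial v2 shape is to swap the two nodes' roles: the first word inherits the old head's attachment, and the old head becomes its dependent. The sketch below assumes a minimal hypothetical `Node` class with `ord`, `parent` and `deprel` (not Udapi's `Node`), and assumes that all other dependents of the old head should move to the new head; the real method may differ in such details.

```python
class Node:
    """Minimal stand-in for a tree node (hypothetical, not Udapi's Node)."""

    def __init__(self, ord, form, parent=None, deprel='root'):
        self.ord, self.form, self.parent, self.deprel = ord, form, parent, deprel


def make_head_initial(node, deprel, nodes):
    """If `node` is a `deprel` dependent that precedes its head, make it the head."""
    if node.deprel != deprel or node.parent is None or node.ord > node.parent.ord:
        return  # not this deprel, or already head-initial
    old_head = node.parent
    # The first word takes over the old head's attachment...
    node.parent, node.deprel = old_head.parent, old_head.deprel
    # ...and the old head becomes its `deprel` dependent.
    old_head.parent, old_head.deprel = node, deprel
    # Any remaining dependents of the old head move to the new head.
    for other in nodes:
        if other.parent is old_head:
            other.parent = node


# Head-final input: flat(Clinton, Hillary), nsubj(won, Clinton)
won = Node(3, 'won')
clinton = Node(2, 'Clinton', won, 'nsubj')
hillary = Node(1, 'Hillary', clinton, 'flat')
make_head_initial(hillary, 'flat', [hillary, clinton, won])
print(hillary.parent.form, hillary.deprel, clinton.parent.form, clinton.deprel)
# won nsubj Hillary flat
```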

change_neg(node)

neg→advmod/det/ToDo + Polarity=Neg.

In addition, if there is a node with deprel=neg and upos=INTJ, the block checks whether it is possibly a real interjection or rather a negation particle, which should have upos=PART (as documented in http://universaldependencies.org/u/pos/PART.html). This kind of error (INTJ instead of PART for “не”) is common e.g. in Bulgarian v1.4, but I hope the rule is language-independent enough to be included here.
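In UD v2, `neg` is dropped in favour of `advmod` for negative particles and adverbs and `det` for negative determiners, with the negation itself marked by Polarity=Neg. The sketch below assumes the choice is made from the negator's own UPOS, and it assumes a simplified INTJ check: an INTJ attached as `neg` under a verbal parent is treated as a mis-tagged particle. `convert_neg` and both rules are hypothetical simplifications, not the block's actual conditions.

```python
def convert_neg(upos, parent_is_verbal, feats):
    """Return (upos, deprel, feats, todo) for a UD v1 `neg` node (hypothetical rules)."""
    feats = dict(feats, Polarity='Neg')
    if upos == 'INTJ' and parent_is_verbal:
        upos = 'PART'  # e.g. a negation particle mis-tagged as INTJ
    if upos in ('PART', 'ADV'):
        return upos, 'advmod', feats, None
    if upos == 'DET':
        return upos, 'det', feats, None
    # Unclear category: keep a guess, but flag the node for manual checking.
    return upos, 'advmod', feats, 'neg'


print(convert_neg('INTJ', True, {}))  # ('PART', 'advmod', {'Polarity': 'Neg'}, None)
```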

change_nmod(node)

nmod→obl if the parent is a predicate rather than a nominal.
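Since nominal subtypes carry over to `obl` in UD v2 (for example nmod:tmod becomes obl:tmod), a string-level sketch can keep the subtype while swapping the base relation. It takes the parent's status as the 'yes'/'no'/'maybe' value that is_nominal returns; leaving 'maybe' parents unchanged is an assumption, as is the hypothetical name `convert_nmod`.

```python
def convert_nmod(deprel, parent_nominal):
    """Relabel nmod (with any subtype) as obl when the parent is a predicate.

    parent_nominal is 'yes', 'no' or 'maybe'; 'maybe' is left alone here.
    """
    base, _, subtype = deprel.partition(':')
    if base != 'nmod' or parent_nominal != 'no':
        return deprel
    return 'obl:' + subtype if subtype else 'obl'


print(convert_nmod('nmod:tmod', 'no'))  # obl:tmod
print(convert_nmod('nmod', 'yes'))      # nmod
```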

static change_upos(node)

CONJ→CCONJ.

static change_upos_copula(node)

deprel=cop needs upos=AUX (or PRON).
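Both UPOS fixes fit in a few lines. The sketch assumes that a copula with any other tag is simply retagged AUX (the real block might instead log it); `convert_upos` is a hypothetical name combining change_upos and change_upos_copula.

```python
def convert_upos(upos, deprel):
    """Apply the CONJ rename and the copula UPOS constraint (sketch only)."""
    if upos == 'CONJ':
        upos = 'CCONJ'  # UD v2 renamed CONJ to CCONJ
    if deprel == 'cop' and upos not in ('AUX', 'PRON'):
        upos = 'AUX'  # copulas must be AUX; pronominal copulas may stay PRON
    return upos


print(convert_upos('CONJ', 'cc'), convert_upos('VERB', 'cop'), convert_upos('PRON', 'cop'))
# CCONJ AUX PRON
```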

fix_remnants_in_tree(root)

Change ellipsis with remnant deprels to UDv2 ellipsis with orphans.

A remnant’s parent is always the correlate (same-role) node. Usually, the correlate’s parent is the head of the whole ellipsis subtree, i.e. the first conjunct. However, sometimes remnants are deeper, e.g. in ‘Over 300 Iraqis are reported dead and 500 wounded.’ with the edges:

nsubjpass(reported, Iraqis)
nummod(Iraqis, 300)
remnant(300, 500)

We assume that all remnants in one tree are part of the same ellipsis structure.

TODO: theoretically, one tree may contain several ellipsis structures with remnants, but I have no idea how to distinguish them from the deeper-remnant cases.
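For the common case, where each correlate hangs directly from the first conjunct, the conversion can be sketched as follows: promote the remnant whose correlate has the highest HEAD_PROMOTION value, attach it to the first conjunct as `conj`, and attach the other remnants to it as `orphan`. The `Node` class and `remnants_to_orphans` are hypothetical, the deeper-remnant case described above is deliberately not handled, and other adjustments (such as moving `cc`) are left out.

```python
HEAD_PROMOTION = {'advcl': 1, 'advmod': 5, 'ccomp': 2, 'csubj': 4, 'iobj': 7,
                  'nsubj': 9, 'obj': 8, 'obl': 6, 'xcomp': 3}


class Node:
    """Minimal stand-in for a tree node (hypothetical, not Udapi's Node)."""

    def __init__(self, form, parent=None, deprel='root'):
        self.form, self.parent, self.deprel = form, parent, deprel


def remnants_to_orphans(nodes):
    """Turn v1 remnant edges into v2 conj + orphan edges (common case only)."""
    remnants = [n for n in nodes if n.deprel == 'remnant']
    if not remnants:
        return
    # A remnant's parent is its correlate; assume the correlate's parent
    # is the head of the whole ellipsis structure (the first conjunct).
    first_conjunct = remnants[0].parent.parent
    # Promote the remnant whose correlate has the highest-priority deprel.
    new_head = max(remnants, key=lambda n: HEAD_PROMOTION.get(n.parent.deprel, 0))
    for remnant in remnants:
        if remnant is not new_head:
            remnant.parent, remnant.deprel = new_head, 'orphan'
    new_head.parent, new_head.deprel = first_conjunct, 'conj'


# "Mary won gold and Peter bronze": remnant(Mary, Peter), remnant(gold, bronze)
won = Node('won')
mary, gold = Node('Mary', won, 'nsubj'), Node('gold', won, 'obj')
peter, bronze = Node('Peter', mary, 'remnant'), Node('bronze', gold, 'remnant')
remnants_to_orphans([won, mary, gold, peter, bronze])
print(peter.parent.form, peter.deprel, bronze.parent.form, bronze.deprel)
# won conj Peter orphan
```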

fix_text(root)

Make sure root.text is filled in and matches the forms, taking SpaceAfter=No into account.
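Reconstructing the sentence text from the tokens amounts to joining the forms and adding a space after each one unless its MISC column says SpaceAfter=No. A minimal sketch, assuming each token is given as a (form, misc) pair with MISC parsed into a dict; `text_from_tokens` is hypothetical, and multiword tokens (whose SpaceAfter is stored on the token range line in CoNLL-U) are ignored here.

```python
def text_from_tokens(tokens):
    """Join forms, adding a space after each unless its MISC has SpaceAfter=No."""
    parts = []
    for form, misc in tokens:
        parts.append(form)
        if misc.get('SpaceAfter') != 'No':
            parts.append(' ')
    if parts and parts[-1] == ' ':
        parts.pop()  # no space after the last token
    return ''.join(parts)


tokens = [('Hello', {'SpaceAfter': 'No'}), (',', {}), ('world', {'SpaceAfter': 'No'}), ('!', {})]
print(text_from_tokens(tokens))  # Hello, world!
```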

static is_nominal(node)

Returns ‘no’ (for predicates), ‘yes’ (sure nominals) or ‘maybe’.

Used in change_nmod.

static is_verbal(node)

Returns True for verbs and nodes with copula child.

Used in change_neg.
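Both helpers can be approximated from the node's UPOS and its children's deprels. The exact criteria are not given here, so the sketch assumes: a VERB tag or a copula child means a predicate, NOUN/PROPN/PRON/NUM are sure nominals, and everything else is 'maybe'; is_verbal is read literally as "UPOS is VERB or the node has a cop child". The simplified signatures are hypothetical.

```python
NOMINAL_UPOS = {'NOUN', 'PROPN', 'PRON', 'NUM'}


def is_nominal(upos, child_deprels):
    """Return 'no' for predicates, 'yes' for sure nominals, otherwise 'maybe'."""
    if upos == 'VERB' or 'cop' in child_deprels:
        return 'no'  # verbs and copular (nonverbal) predicates
    if upos in NOMINAL_UPOS:
        return 'yes'
    return 'maybe'  # e.g. ADJ or X without a copula


def is_verbal(upos, child_deprels):
    """Return True for verbs and for nodes with a copula child."""
    return upos == 'VERB' or 'cop' in child_deprels


print(is_nominal('NOUN', ['det']), is_nominal('NOUN', ['cop', 'nsubj']), is_nominal('ADJ', []))
# yes no maybe
```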

log(node, short_msg, long_msg)

Log node.address() + long_msg and add ToDo=short_msg to node.misc.
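A minimal sketch of this logging pattern with the standard `logging` module, assuming MISC is available as a dict and the node address as a string (the address 's1#3' below is made up). Joining several ToDo labels with commas is an assumption, and the function name `log_todo` is hypothetical.

```python
import logging


def log_todo(address, misc, short_msg, long_msg):
    """Log address + long_msg and record ToDo=short_msg in the MISC dict."""
    logging.warning('%s %s', address, long_msg)
    previous = misc.get('ToDo')
    # Assumption: a second ToDo on the same node is appended, comma-separated.
    misc['ToDo'] = f'{previous},{short_msg}' if previous else short_msg


misc = {}
log_todo('s1#3', misc, 'neg', 'cannot decide between advmod and det')
log_todo('s1#3', misc, 'cop', 'copula is not AUX')
print(misc)  # {'ToDo': 'neg,cop'}
```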

process_tree(tree)

Apply all the changes on the current tree.

This method is automatically called on each tree by Udapi. After applying the tree-scope changes (remnants), it calls process_node on each node. By overriding this method in subclasses, you can reuse just some of the implemented changes.
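The override pattern can be shown without Udapi installed. The sketch below uses a hypothetical stand-in class whose process_tree would first run tree-scope changes and then calls process_node on each node; nodes are plain dicts. With the real library one would instead subclass `udapi.block.ud.convert1to2.Convert1to2` and call its methods on Udapi nodes.

```python
class ConverterSketch:
    """Hypothetical stand-in with the same layout as Convert1to2 (nodes are dicts)."""

    def process_tree(self, nodes):
        # Tree-scope changes (remnants) would run here first.
        for node in nodes:
            self.process_node(node)

    def process_node(self, node):
        self.change_upos(node)
        self.change_deprel_simple(node)

    @staticmethod
    def change_upos(node):
        if node['upos'] == 'CONJ':
            node['upos'] = 'CCONJ'

    @staticmethod
    def change_deprel_simple(node):
        if node['deprel'] == 'dobj':
            node['deprel'] = 'obj'


class UposOnly(ConverterSketch):
    """Reuse a single change by overriding process_tree."""

    def process_tree(self, nodes):
        for node in nodes:
            self.change_upos(node)


nodes = [{'upos': 'CONJ', 'deprel': 'cc'}, {'upos': 'NOUN', 'deprel': 'dobj'}]
UposOnly().process_tree(nodes)
print(nodes)  # [{'upos': 'CCONJ', 'deprel': 'cc'}, {'upos': 'NOUN', 'deprel': 'dobj'}]
```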

reattach_coordinations(node)

cc and punct in coordinations should depend on the immediately following conjunct.
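A sketch of this reattachment for the common v1 pattern, in which all later conjuncts, commas and the coordinating conjunction hang from the first conjunct. It assumes a minimal hypothetical `Node` class and handles only this pattern; a cc or punct that precedes the first conjunct or follows the last one is left where it is.

```python
class Node:
    """Minimal stand-in for a tree node (hypothetical, not Udapi's Node)."""

    def __init__(self, ord, form, parent=None, deprel='root'):
        self.ord, self.form, self.parent, self.deprel = ord, form, parent, deprel


def reattach_to_following_conjunct(node, nodes):
    """Move a cc/punct dependent of a coordination head to the next conjunct."""
    if node.deprel not in ('cc', 'punct') or node.parent is None:
        return
    head = node.parent
    conjuncts = [head] + [n for n in nodes if n.parent is head and n.deprel == 'conj']
    following = [c for c in conjuncts if c.ord > node.ord]
    if following:
        node.parent = min(following, key=lambda c: c.ord)


# "apples , pears and plums" with every dependent attached to "apples"
apples = Node(1, 'apples')
comma = Node(2, ',', apples, 'punct')
pears = Node(3, 'pears', apples, 'conj')
and_ = Node(4, 'and', apples, 'cc')
plums = Node(5, 'plums', apples, 'conj')
nodes = [apples, comma, pears, and_, plums]
for n in nodes:
    reattach_to_following_conjunct(n, nodes)
print(comma.parent.form, and_.parent.form)  # pears plums
```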