udapi.block.ud.fixpunct module

Block ud.FixPunct for making sure punctuation is attached projectively.

Punctuation in Universal Dependencies has the tag PUNCT and the dependency relation punct, and it is always attached projectively, usually to the head of a neighboring subtree to its left or right (see https://universaldependencies.org/u/dep/punct.html). Punctuation normally does not have children. If it does, this block fixes that first.

This block tries to re-attach punctuation projectively and according to the guidelines. It should help when punctuation is attached randomly, always to the root, or always to the neighboring word. It has limits, however: for example, it cannot always tell whether a comma was introduced to delimit the segment to its left or the one to its right. Hence, if the punctuation is already mostly correct before this block runs, the block may do more harm than good.

Because punctuation should not have children, re-attaching it should not create a non-projectivity as long as we check the root edges going to the right.

class udapi.block.ud.fixpunct.FixPunct(check_paired_punct_upos=False, copy_to_enhanced=False, **kwargs)

Bases: Block

Make sure punctuation nodes are attached projectively.

process_tree(root)

Process a UD tree.
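A possible way to apply the block from Python is sketched below. The class name, its constructor parameters and process_tree(root) come from the signatures above; the helper name fix_punctuation and the sample sentence are hypothetical, and the Document methods from_conllu_string and to_conllu_string as well as the bundles and trees attributes are assumed from Udapi's core API and should be checked against the installed version.

```python
def fix_punctuation(conllu_text, **block_args):
    """Run FixPunct over every tree in a CoNLL-U string (hypothetical helper)."""
    # Imported here so the sketch can be loaded where udapi is not installed.
    from udapi.core.document import Document
    from udapi.block.ud.fixpunct import FixPunct

    doc = Document()
    doc.from_conllu_string(conllu_text)  # assumed Document helper
    block = FixPunct(**block_args)       # e.g. copy_to_enhanced=True
    for bundle in doc.bundles:
        for tree in bundle.trees:
            block.process_tree(tree)
    return doc.to_conllu_string()        # assumed Document helper


# A made-up sentence; the columns are joined with tabs as CoNLL-U requires.
rows = [
    "1 Hello hello INTJ _ _ 0 root _ _",
    "2 , , PUNCT _ _ 3 punct _ _",
    "3 world world NOUN _ _ 1 vocative _ _",
    "4 ! ! PUNCT _ _ 1 punct _ _",
]
SAMPLE = ("# sent_id = 1\n# text = Hello, world!\n"
          + "\n".join("\t".join(r.split()) for r in rows) + "\n\n")

if __name__ == "__main__":
    print(fix_punctuation(SAMPLE))
```

Calling process_tree directly keeps the sketch tied to the one method documented here; in a real pipeline the block would more typically run as part of a Udapi scenario.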