udapi.block.util.normalize module

util.Normalize normalizes the ordering of various attributes in CoNLL-U.

class udapi.block.util.normalize.Normalize(feats=True, misc=True, sent_id=False, empty_node_ord=False, start_sent_id=1, sent_id_prefix='', **kwargs)

Bases: Block

Normalize the ordering of attributes in the FEATS and MISC columns.

According to the guidelines (https://universaldependencies.org/format.html#morphological-annotation), the attribute-value pairs in the FEATS column of CoNLL-U files must be sorted alphabetically (case-insensitively). The same is highly recommended for the MISC column. Sorted attributes are useful, e.g., when comparing two CoNLL-U files with diff.
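A minimal, standalone sketch of the required ordering, in plain Python without Udapi. The function name normalize_feats is hypothetical; it sorts only by attribute name (the text before the first "="), case-insensitively, and leaves the values untouched.

```python
def normalize_feats(column: str) -> str:
    """Return a FEATS (or MISC) column with its pairs sorted by attribute
    name, ignoring case. Hypothetical helper, not part of Udapi."""
    if column == "_":
        return column  # an empty column needs no sorting
    pairs = column.split("|")
    # Key = attribute name lower-cased; split("=", 1) also copes with
    # MISC entries that contain no "=" at all.
    pairs.sort(key=lambda pair: pair.split("=", 1)[0].lower())
    return "|".join(pairs)


print(normalize_feats("Number=Sing|Case=Nom|Gender=Masc"))
# Case=Nom|Gender=Masc|Number=Sing
print(normalize_feats("SpaceAfter=No|gloss=dog|Translit=pes"))
# gloss=dog|SpaceAfter=No|Translit=pes
```

The second call shows why case-insensitivity matters: a plain case-sensitive sort would place gloss after Translit, because uppercase letters sort before lowercase ones.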

Udapi sorts the attributes automatically but, for speed reasons, only for attributes that have been deserialized, e.g. because something wrote into them. This block therefore just forces deserialization of node.feats and node.misc, so that Udapi later sorts the attributes during serialization. This is a bit more efficient than workarounds such as util.Eval node='node.feats["Number"] = node.feats["Number"]' or util.Eval node='node.misc["NonExistentAttribute"] = None'.
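To illustrate why forcing deserialization is enough, here is a toy model of lazy FEATS handling. The class LazyFeats and its methods are hypothetical and do not reproduce Udapi's actual internals; they only mimic the behavior described above: the raw string is written back unchanged unless it was parsed, in which case it is re-serialized in sorted order.

```python
class LazyFeats:
    """Hypothetical stand-in for a lazily parsed FEATS attribute."""

    def __init__(self, raw: str):
        self._string = raw  # raw column text as read from the file
        self._dict = None   # parsed pairs; None until first deserialization

    def deserialize(self) -> None:
        """Parse the raw string once; afterwards it is considered stale."""
        if self._dict is None:
            self._dict = {}
            if self._string != "_":
                for pair in self._string.split("|"):
                    name, _, value = pair.partition("=")
                    self._dict[name] = value
            self._string = None

    def __getitem__(self, name: str) -> str:
        self.deserialize()
        return self._dict.get(name, "")

    def __setitem__(self, name: str, value) -> None:
        self.deserialize()
        if value is None or value == "":
            self._dict.pop(name, None)  # None/"" removes the attribute
        else:
            self._dict[name] = value

    def __str__(self) -> str:
        if self._string is not None:
            # Fast path: never parsed, so write the original text back as-is.
            return self._string
        if not self._dict:
            return "_"
        # Slow path: re-serialize, sorted case-insensitively by name.
        items = sorted(self._dict.items(), key=lambda kv: kv[0].lower())
        return "|".join(f"{name}={value}" for name, value in items)


untouched = LazyFeats("Number=Sing|Case=Nom")
print(untouched)  # Number=Sing|Case=Nom  (original order kept)

forced = LazyFeats("Number=Sing|Case=Nom")
forced.deserialize()  # what a Normalize-like block forces for each node
print(forced)  # Case=Nom|Number=Sing
```

In this toy model, calling deserialize() alone has the same effect on the output as the util.Eval trick of reading an attribute and writing it back, without the extra lookup and assignment.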

process_bundle(bundle)

Process a UD bundle

process_node(node)

Process a UD node

process_tree(tree)

Process a UD tree