udapi.block.ud.splitunderscoretokens module

Block ud.SplitUnderscoreTokens splits tokens containing underscores and attaches the resulting parts using flat (by default).

Usage: udapy -s ud.SplitUnderscoreTokens < in.conllu > fixed.conllu

Author: Martin Popel

class udapi.block.ud.splitunderscoretokens.SplitUnderscoreTokens(deprel=None, default_deprel='flat', **kwargs)

Bases: Block

Block for splitting tokens with underscores and attaching the new nodes using deprel=flat.

E.g.:

    1 Hillary_Rodham_Clinton Hillary_Rodham_Clinton PROPN xpos 0 dep

is transformed into:

    1 Hillary Hillary PROPN xpos 0 dep
    2 Rodham Rodham PROPN xpos 1 flat
    3 Clinton Clinton PROPN xpos 1 flat
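The transformation above can be sketched on a simplified tuple representation of a single CoNLL-U token. This is a minimal illustration, not udapi's implementation: the `Token` tuple and `split_underscore_token` are hypothetical names, renumbering of the other tokens in the sentence is ignored, and falling back to the word forms when the lemma does not split into the same number of parts is an assumption.

```python
from typing import List, Tuple

# Simplified stand-in for a CoNLL-U token (hypothetical, not udapi's node class):
# (id, form, lemma, upos, xpos, head, deprel)
Token = Tuple[int, str, str, str, str, int, str]


def split_underscore_token(tok: Token, deprel: str = "flat") -> List[Token]:
    """Split one token on underscores.

    The first part keeps the original head and deprel; every further part
    gets the next id and is attached to the first part with `deprel`.
    """
    tid, form, lemma, upos, xpos, head, rel = tok
    forms = form.split("_")
    # Leave the token alone if there is nothing to split or a part would be
    # empty (e.g. a bare "_" form).
    if len(forms) < 2 or not all(forms):
        return [tok]
    lemmas = lemma.split("_")
    # Assumption: if the lemma does not split into the same number of parts,
    # reuse the word forms as lemmas.
    if len(lemmas) != len(forms):
        lemmas = forms
    rows = [(tid, forms[0], lemmas[0], upos, xpos, head, rel)]
    for offset, (f, lem) in enumerate(zip(forms[1:], lemmas[1:]), start=1):
        rows.append((tid + offset, f, lem, upos, xpos, tid, deprel))
    return rows


if __name__ == "__main__":
    tok = (1, "Hillary_Rodham_Clinton", "Hillary_Rodham_Clinton", "PROPN", "xpos", 0, "dep")
    for row in split_underscore_token(tok):
        print(*row)
```

Running it prints the three rows of the transformed example, one token per line. The split-off parts all attach to the first part, which matches the head-initial convention of flat and fixed.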

Real-world use cases: UD_Irish (default_deprel=fixed) and UD_Czech-CLTT v1.4.

deprel_for(node)

Return the deprel for the newly created nodes: flat, fixed, compound, or one of their subtypes.

See http://universaldependencies.org/u/dep/flat.html, http://universaldependencies.org/u/dep/fixed.html and http://universaldependencies.org/u/dep/compound.html.

Note that, unlike flat and fixed, deprel=compound does not need to be head-initial.

This method implements coarse heuristic rules to decide between fixed and flat.
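The heuristic rules themselves are not documented here, so the following is only a hypothetical sketch of how the constructor parameters could interact: an explicit `deprel` overrides everything, otherwise a rule is tried, and `default_deprel` is the fallback. The `choose_deprel` function and its PROPN-to-flat rule are illustrative assumptions, not the block's actual logic.

```python
from typing import Optional


def choose_deprel(upos: str, deprel: Optional[str] = None,
                  default_deprel: str = "flat") -> str:
    """Hypothetical decision order for the deprel of split-off parts."""
    # 1. An explicitly requested deprel (possibly a subtype) always wins.
    if deprel is not None:
        return deprel
    # 2. Hypothetical rule: multiword proper names are typically flat.
    if upos == "PROPN":
        return "flat"
    # 3. Otherwise fall back to the configurable default
    #    (e.g. default_deprel="fixed" for UD_Irish).
    return default_deprel


if __name__ == "__main__":
    print(choose_deprel("ADP", default_deprel="fixed"))
```

Keeping the explicit override first means a user can always force a single relation for a whole treebank, while the default only matters when no rule fires.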

process_node(node)

Process a UD node
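When a token that is not the last in its sentence is split, the ids of all following tokens shift, and so do heads that point to them. A sketch of that renumbering on a simplified sentence representation follows. It is a hypothetical illustration (the `Token` tuple, `parts_of` and `split_sentence` are not part of udapi, which works on its own node objects), and it assumes tokens are sorted by id, every head refers to an existing token or 0, and multiword tokens and enhanced dependencies can be ignored.

```python
from typing import Dict, List, Tuple

# Simplified stand-in for a token: (id, form, head, deprel). Hypothetical, not udapi's API.
Token = Tuple[int, str, int, str]


def parts_of(form: str) -> List[str]:
    """Underscore-separated parts, or the form itself if it should not be split."""
    parts = form.split("_")
    return parts if len(parts) > 1 and all(parts) else [form]


def split_sentence(tokens: List[Token], deprel: str = "flat") -> List[Token]:
    """Split every underscore token and renumber ids and heads accordingly."""
    # Pass 1: map each old id to the new id of its first part (root 0 stays 0).
    new_id: Dict[int, int] = {0: 0}
    next_id = 1
    for tid, form, _, _ in tokens:
        new_id[tid] = next_id
        next_id += len(parts_of(form))
    # Pass 2: emit rows with remapped heads; extra parts attach to the first part.
    out: List[Token] = []
    for tid, form, head, rel in tokens:
        first = new_id[tid]
        parts = parts_of(form)
        out.append((first, parts[0], new_id[head], rel))
        for offset, part in enumerate(parts[1:], start=1):
            out.append((first + offset, part, first, deprel))
    return out


if __name__ == "__main__":
    sent = [(1, "Hillary_Rodham_Clinton", 2, "nsubj"), (2, "spoke", 0, "root"), (3, "today", 2, "obl:tmod")]
    for row in split_sentence(sent):
        print(*row)
```

Heads that pointed to the split token keep pointing to its first part, so the rest of the tree stays attached to the same word.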