udapi.block.ud.splitunderscoretokens module
Block ud.SplitUnderscoreTokens splits tokens containing underscores and attaches the new nodes using flat.
Usage: udapy -s ud.SplitUnderscoreTokens < in.conllu > fixed.conllu
Author: Martin Popel
- class udapi.block.ud.splitunderscoretokens.SplitUnderscoreTokens(deprel=None, default_deprel='flat', **kwargs)
Bases: Block
Block for splitting tokens with underscores and attaching the new nodes using deprel=flat.
E.g.:
    1 Hillary_Rodham_Clinton Hillary_Rodham_Clinton PROPN xpos 0 dep
is transformed into:
    1 Hillary Hillary PROPN xpos 0 dep
    2 Rodham Rodham PROPN xpos 1 flat
    3 Clinton Clinton PROPN xpos 1 flat
Real-world use cases: UD_Irish (default_deprel=fixed) and UD_Czech-CLTT v1.4.
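The transformation shown above can be illustrated with a minimal, standalone sketch. It does not use udapi and is not the block's actual implementation: the `Token` dataclass, `parts_of` and `split_underscore_tokens` are hypothetical names introduced here, and the fallback to the form when the lemma has a different number of parts is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical minimal token record (not a udapi Node)."""
    ord: int     # 1-based position in the sentence
    form: str
    lemma: str
    upos: str
    head: int    # ord of the parent, 0 for the root
    deprel: str


def parts_of(form):
    """Split on underscores; leave the form intact if any part would be empty."""
    parts = form.split("_")
    return parts if len(parts) > 1 and all(parts) else [form]


def split_underscore_tokens(tokens, deprel="flat"):
    """Return a new token list in which underscore tokens are split.

    The first part keeps the original head and deprel; every further part
    is attached to the first part with the given deprel.
    """
    # First pass: map each old ord to the new ord of its first part.
    new_ord = {0: 0}
    position = 1
    for tok in tokens:
        new_ord[tok.ord] = position
        position += len(parts_of(tok.form))

    # Second pass: emit the new tokens with renumbered heads.
    result = []
    for tok in tokens:
        forms = parts_of(tok.form)
        if len(forms) == 1:
            lemmas = [tok.lemma]
        else:
            lemmas = tok.lemma.split("_")
            if len(lemmas) != len(forms):
                lemmas = forms  # assumption: fall back to the forms
        first = new_ord[tok.ord]
        result.append(Token(first, forms[0], lemmas[0], tok.upos,
                            new_ord[tok.head], tok.deprel))
        for offset in range(1, len(forms)):
            result.append(Token(first + offset, forms[offset], lemmas[offset],
                                tok.upos, first, deprel))
    return result


sentence = [
    Token(1, "Hillary_Rodham_Clinton", "Hillary_Rodham_Clinton", "PROPN", 2, "nsubj"),
    Token(2, "spoke", "speak", "VERB", 0, "root"),
]
for tok in split_underscore_tokens(sentence):
    print(tok.ord, tok.form, tok.lemma, tok.upos, tok.head, tok.deprel)
```

Computing the ord mapping in a separate first pass keeps head references correct for tokens that follow a split token, since their positions shift to the right.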
- deprel_for(node)
Return the deprel of the newly created nodes: flat, fixed, compound, or one of their subtypes.
See
http://universaldependencies.org/u/dep/flat.html
http://universaldependencies.org/u/dep/fixed.html
http://universaldependencies.org/u/dep/compound.html
Note that unlike the first two, deprel=compound does not need to be head-initial.
This method implements coarse heuristic rules to decide between fixed and flat.
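The page does not spell out the heuristic rules, so the following is only a rough sketch of how such a choice could look, not the method's actual logic. The function name `choose_deprel`, the set `FUNCTION_WORD_UPOS`, and the rule that an explicit `deprel` overrides everything are all assumptions; only the parameter names `deprel` and `default_deprel` come from the class signature.

```python
# Assumption: multiword function words tend to be fixed expressions,
# so these UPOS tags are mapped to "fixed". The set is illustrative only.
FUNCTION_WORD_UPOS = {"ADP", "ADV", "CCONJ", "SCONJ"}


def choose_deprel(upos, deprel=None, default_deprel="flat"):
    """Pick a deprel for the new nodes of a split token (hypothetical rules)."""
    if deprel is not None:
        # Assumption: an explicitly given deprel is used unconditionally.
        return deprel
    if upos in FUNCTION_WORD_UPOS:
        return "fixed"
    return default_deprel


print(choose_deprel("PROPN"))                          # name-like token
print(choose_deprel("ADP"))                            # function-word token
print(choose_deprel("NOUN", default_deprel="fixed"))   # e.g. a UD_Irish-style setup
```

Keeping the fallback in `default_deprel` lets a treebank where most underscore tokens are fixed expressions change the default without touching the rules.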