udapi.block.ud.splittoken module
Block ud.SplitToken will split a given token into multiple tokens.
- class udapi.block.ud.splittoken.SplitToken(misc_name='SplitToken', **kwargs)
Bases: Block
Split a token into two or more tokens. A MISC attribute marks the tokens that should be split. (The attribute may have been set by an annotator or by a previous block that tests the specific conditions under which splitting is desired.) Multiword tokens (MWTs) are currently not supported: the node to be split cannot belong to an MWT. Note that the result will not be an MWT either (use the block ud.AddMwt if that is desired). Instead, there will simply be a new attribute SpaceAfter=No, possibly accompanied by CorrectSpaceAfter=Yes (indicating that this was an error in the source text).
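The effect described above can be illustrated with a minimal, standalone sketch. It does not use udapi itself: the `Token` class and the `split_token` function are hypothetical stand-ins. It also assumes that the attribute value is the original form with spaces inserted at the split points, and that any other MISC attributes stay on the last piece. Neither assumption is stated on this page. CorrectSpaceAfter=Yes is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical stand-in for a node; this is not the udapi Node class."""
    form: str
    misc: dict = field(default_factory=dict)


def split_token(token, misc_name="SplitToken"):
    """Return the list of tokens that replaces `token`.

    Assumption: the attribute value is the form with spaces at split points.
    """
    value = token.misc.get(misc_name)
    if not value:
        # No trigger attribute: the token is returned unchanged.
        return [token]
    parts = value.split()
    # Assumption: the remaining MISC attributes stay on the last piece.
    rest = {k: v for k, v in token.misc.items() if k != misc_name}
    # Every piece except the last is directly followed by the next piece.
    pieces = [Token(part, {"SpaceAfter": "No"}) for part in parts[:-1]]
    pieces.append(Token(parts[-1], rest))
    return pieces


tokens = split_token(Token("Hello,world", {"SplitToken": "Hello , world"}))
print([(t.form, t.misc) for t in tokens])
```

In this sketch the trigger attribute is gone from the result, and the pieces are ordinary tokens rather than members of a multiword token.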
- process_node(node)
The SplitToken attribute in MISC (or the attribute named by misc_name) triggers the action. Either the current node is split into multiple nodes and the attribute is removed from MISC, or a warning is issued that the splitting cannot be done and the attribute stays in MISC. Note that multiword token lines and empty nodes are not even scanned for the attribute, so if it is there, it stays there and no warning is printed.
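The decision described above can be sketched as follows. This is a simplified illustration and not the block's actual code: `Node`, its `in_mwt` flag, and `check_node` are hypothetical names, and the attribute values are placeholders. Only regular nodes are passed to the function, which mirrors the fact that multiword token lines and empty nodes are never scanned.

```python
import logging
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a regular node; not the udapi Node class."""
    form: str
    misc: dict = field(default_factory=dict)
    in_mwt: bool = False  # True if the node belongs to a multiword token


def check_node(node, misc_name="SplitToken"):
    """Return 'skip', 'warn' or 'split' for one regular node."""
    if not node.misc.get(misc_name):
        # No trigger attribute: nothing to do.
        return "skip"
    if node.in_mwt:
        # Splitting cannot be done: warn and leave the attribute in MISC.
        logging.warning("Cannot split '%s': it belongs to a multiword token",
                        node.form)
        return "warn"
    # Splitting can be done: the attribute is removed from MISC.
    del node.misc[misc_name]
    return "split"


nodes = [
    Node("Hello,world", {"SplitToken": "Hello , world"}),
    Node("del", {"SplitToken": "de l"}, in_mwt=True),
    Node("plain"),
]
outcomes = [check_node(n) for n in nodes]
print(outcomes)
```

After the loop, the first node no longer carries the attribute, while the second one keeps it so that the unresolved request remains visible in the data.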