udapi.block.ud.es.fixexclamation module

Block to fix tokenization of exclamation marks in UD Spanish-AnCora.

class udapi.block.ud.es.fixexclamation.FixExclamation(zones='all', if_empty_tree='process', **kwargs)

Bases: Block

process_node(node)

In Spanish AnCora, strings such as ‘¡Hola!’ are annotated as a single token. The exclamation marks should be separated into tokens of their own. One may question whether this should also apply to names of companies (Yahoo!) or products (la revista Hello!), but it should: company and product names often consist of multiple tokens (even multiple full words, not just punctuation), and these are also separated in UD.
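
The intended splitting of word forms can be illustrated with a minimal, standalone sketch. It is not the block's actual implementation and does not use the udapi API: the helper name split_exclamation is hypothetical, and treating each exclamation mark as its own token is an assumption made for illustration. It only shows how a form such as ‘¡Hola!’ or ‘Yahoo!’ might be divided.

```python
import re

# Hypothetical helper, not part of udapi: it works on plain strings only.
# Group 1: leading inverted marks, group 2: the word itself, group 3: trailing marks.
_EXCLAMATION = re.compile(r'^(¡*)(.*?)(!*)$')


def split_exclamation(form):
    """Return the token forms obtained by detaching exclamation marks from a form."""
    lead, core, trail = _EXCLAMATION.match(form).groups()
    if not core:
        # The form consists only of punctuation, so there is nothing to detach.
        return [form]
    # Assumption: every exclamation mark becomes a separate token.
    return list(lead) + [core] + list(trail)


if __name__ == '__main__':
    for example in ('¡Hola!', 'Yahoo!', 'Hello!', 'Hola', '!'):
        print(example, split_exclamation(example))
```

In a real tree, the new punctuation tokens would also need their own annotation (for example, a part-of-speech tag and an attachment in the dependency structure), which this string-level sketch leaves out.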