udapi.block.ud.markbugs module

Block MarkBugs for checking suspicious/wrong constructions in UD v2.

See http://universaldependencies.org/release_checklist.html#syntax and http://universaldependencies.org/svalidation.html. IMPORTANT: the svalidation.html overview is generated not by this code but by SETS-search-interface rules, which may give results different from this code's.

Usage: udapy -s ud.MarkBugs < in.conllu > marked.conllu 2> log.txt

Some tests may be customized for individual languages if the language code is available as the zone id. The zone id can be provided in the sentence id after the slash (e.g., “sent_id = s125/en” for English), or as a parameter of the reader:

udapy -s read.Conllu zone=en ud.MarkBugs < in.conllu > marked.conllu 2> log.txt
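The “sent_id = s125/en” convention described above can be illustrated with a few lines of plain Python. This is only a sketch of the naming convention, not the code udapi uses to read CoNLL-U; the helper name split_sent_id is hypothetical.

```python
def split_sent_id(sent_id):
    """Split a sentence id such as 's125/en' into (base id, zone id).

    Hypothetical helper for illustration only; it is not part of udapi.
    The zone is returned as an empty string when no slash is present.
    """
    base, _, zone = sent_id.partition("/")
    return base, zone


print(split_sent_id("s125/en"))  # ('s125', 'en')
print(split_sent_id("s125"))     # ('s125', '')
```

When the sentence ids carry no zone, the zone=en reader parameter shown above is the way to supply the language code.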

Errors are both logged to stderr and marked within the nodes’ MISC field, e.g. node.misc['Bug'] = 'aux-chain', so the output CoNLL-U file can be searched for “Bug=” occurrences.
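As a rough illustration of searching the output for “Bug=” marks, the following standard-library sketch tallies the labels found in the MISC column (the tenth tab-separated field) of CoNLL-U text. The function name count_bugs and the sample sentence are made up for this example, and treating a value as a comma-separated list of labels is an assumption rather than something stated above.

```python
from collections import Counter


def count_bugs(conllu_text):
    """Tally Bug labels in the MISC column of CoNLL-U text.

    Hypothetical helper, not part of udapi. Assumes that several labels
    on one node, if present, are separated by commas.
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 10:
            continue
        # MISC holds attributes of the form Name=Value joined by "|".
        for attr in fields[9].split("|"):
            name, _, value = attr.partition("=")
            if name == "Bug":
                counts.update(value.split(","))
    return counts


# Made-up sample resembling marked.conllu output.
sample = "\n".join([
    "# sent_id = s1/en",
    "1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\thas\thave\tAUX\t_\t_\t0\troot\t_\tBug=aux-chain",
    "3\tbeen\tbe\tAUX\t_\t_\t2\taux\t_\tSpaceAfter=No|Bug=aux-chain",
])
print(count_bugs(sample))  # Counter({'aux-chain': 2})
```

For a quick look without any scripting, a plain text search for “Bug=” in marked.conllu gives the same information line by line.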

Author: Martin Popel based on descriptions at http://universaldependencies.org/svalidation.html

class udapi.block.ud.markbugs.MarkBugs(save_stats=True, tests=None, skip=None, max_cop_lemmas=2, **kwargs)[source]

Bases: Block

Block for checking suspicious/wrong constructions in UD v2.

after_process_document(document)[source]

This method is called after each process_document.

log(node, short_msg, long_msg)[source]

Log node.address() + long_msg and add Bug=short_msg to node.misc.

process_node(node)[source]

Process a UD node.