udapi.block.ud.cs.markfeatsbugs module

Block to identify missing or ill-valued features in Czech. Any bugs that it finds are saved in the MISC column as a Bug attribute, which can later be used in filters and highlighted in text output.
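To illustrate how the Bug attribute could be used in a filter after the block has run, here is a minimal sketch that scans CoNLL-U text without udapi. It assumes the attribute appears in the tenth (MISC) column as a `Bug=...` item separated by `|`, as is usual for MISC attributes. The function name `bug_values` and the value `HypotheticalLabel` are made up for illustration; the labels the block actually writes are not listed in this documentation.

```python
def bug_values(conllu_text):
    """Yield (form, bug_value) pairs for tokens whose MISC column has a Bug item."""
    for line in conllu_text.splitlines():
        if not line or line.startswith('#'):
            continue  # skip blank lines and sentence-level comments
        cols = line.split('\t')
        if len(cols) != 10:
            continue  # not a well-formed token line
        form, misc = cols[1], cols[9]
        if misc == '_':
            continue  # empty MISC column
        for item in misc.split('|'):
            key, sep, value = item.partition('=')
            if key == 'Bug' and sep:
                yield form, value


# Hypothetical two-token sentence; the Bug value is invented for this example.
SAMPLE = "\n".join([
    "# text = Pes spí",
    "1\tPes\tpes\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\tBug=HypotheticalLabel",
    "2\tspí\tspát\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
])

flagged = list(bug_values(SAMPLE))
print(flagged)
```

Only the first token is reported, because the second token's MISC column carries no Bug item.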

Usage:

    cat *.conllu | udapy -HAMX layout=compact ud.cs.MarkFeatsBugs > bugs.html

Windows:

    python udapy read.Conllu files="a.conllu,b.conllu" merge=1 ud.cs.MarkFeatsBugs write.TextModeTreesHtml files="bugs.html" marked_only=1 layout=compact attributes=form,lemma,upos,xpos,feats,deprel,misc

class udapi.block.ud.cs.markfeatsbugs.MarkFeatsBugs(zones='all', if_empty_tree='process', **kwargs)

Bases: MarkFeatsBugs

check_adjective_like(node, r0, a0)

Long forms of adjectives, pronouns and determiners mostly share declension paradigms and thus the sets of features that are expected. Whether the actual feature sets are the same depends on the tagging convention (PDT vs. ČNK): in PDT, adjectives are fully disambiguated while pronouns are not; in ČNK, both adjectives and pronouns (including determiners) are fully disambiguated. This method defines the core inflectional features, while any extras (such as PronType for pronouns) have to be provided by the caller in the parameters r0 (list) and a0 (dict).
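The division of labour between the core features and the caller-supplied extras can be sketched as follows. This is not the library's implementation: the helper name `adjective_like_expectations`, the exact core set, and the treatment of Animacy are assumptions made for illustration. It also assumes that r0 holds extra required feature names and a0 holds extra allowed values per feature, and it works on a plain dict of features rather than a udapi node.

```python
def adjective_like_expectations(feats, r0, a0):
    """Combine an assumed core feature set with caller-provided extras.

    feats: dict of feature name -> value for one word
    r0:    list of extra required feature names (e.g. ['PronType'])
    a0:    dict of extra allowed values (e.g. {'PronType': ['Dem']})
    """
    # Assumed core inflectional features; the real method may differ.
    required = ['Gender', 'Number', 'Case']
    allowed = {
        'Gender': ['Masc', 'Fem', 'Neut'],
        'Number': ['Sing', 'Dual', 'Plur'],
        'Case': ['Nom', 'Gen', 'Dat', 'Acc', 'Voc', 'Loc', 'Ins'],
    }
    # Assumption: animacy is only expected with masculine gender.
    if feats.get('Gender') == 'Masc':
        required.append('Animacy')
        allowed['Animacy'] = ['Anim', 'Inan']
    # Extras from the caller are added on top of the core set.
    return required + list(r0), {**allowed, **a0}


required, allowed = adjective_like_expectations(
    {'Gender': 'Fem', 'Number': 'Sing', 'Case': 'Nom', 'PronType': 'Dem'},
    r0=['PronType'],
    a0={'PronType': ['Dem']},
)
print(required)  # ['Gender', 'Number', 'Case', 'PronType']
```

Keeping the extras in the caller means that one shared routine can serve adjectives, pronouns and determiners, with each part of speech contributing only what distinguishes it.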

pdt20 = False
process_node(node)

This is a generic block, do nothing here. In a language-specific block based on this one, rules similar to the examples below can be specified:

    # NOUNS ################################################################
    if node.upos == 'NOUN':
        self.check_required_features(node, ['Gender', 'Number', 'Case', 'Polarity'])
        if node.feats['Gender'] == 'Masc':
            self.check_required_features(node, ['Animacy'])
            self.check_allowed_features(node, {
                'Gender': ['Masc', 'Fem', 'Neut'],
                'Animacy': ['Anim', 'Inan'],
                'Number': ['Sing', 'Dual', 'Plur'],
                'Case': ['Nom', 'Gen', 'Dat', 'Acc', 'Voc', 'Loc', 'Ins'],
                'Polarity': ['Pos', 'Neg'],
                'Foreign': ['Yes']})
        else:
            self.check_allowed_features(node, {
                'Gender': ['Masc', 'Fem', 'Neut'],
                'Number': ['Sing', 'Dual', 'Plur'],
                'Case': ['Nom', 'Gen', 'Dat', 'Acc', 'Voc', 'Loc', 'Ins'],
                'Polarity': ['Pos', 'Neg'],
                'Foreign': ['Yes']})
    #…
    # THE REST: NO FEATURES ################################################
    else:
        self.check_allowed_features(node, {})
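The noun rule above can be tried out without udapi using the following standalone sketch. The helpers `missing_required`, `disallowed` and `noun_bugs` are hypothetical stand-ins that operate on plain dicts and return lists of feature names. The block's own check_required_features and check_allowed_features record bugs in MISC instead, and their exact behaviour is not documented here, so treat this only as an approximation of the logic.

```python
CASES = ['Nom', 'Gen', 'Dat', 'Acc', 'Voc', 'Loc', 'Ins']


def missing_required(feats, required):
    """Return the required feature names that are absent from feats."""
    return [name for name in required if name not in feats]


def disallowed(feats, allowed):
    """Return names of features whose name or value is not listed in allowed."""
    return [name for name, value in feats.items()
            if value not in allowed.get(name, [])]


def noun_bugs(feats):
    """Apply the noun rule from the example to a plain dict of features."""
    required = ['Gender', 'Number', 'Case', 'Polarity']
    allowed = {
        'Gender': ['Masc', 'Fem', 'Neut'],
        'Number': ['Sing', 'Dual', 'Plur'],
        'Case': CASES,
        'Polarity': ['Pos', 'Neg'],
        'Foreign': ['Yes'],
    }
    # Animacy is required, and allowed, only for masculine nouns.
    if feats.get('Gender') == 'Masc':
        required.append('Animacy')
        allowed['Animacy'] = ['Anim', 'Inan']
    return missing_required(feats, required), disallowed(feats, allowed)


# A masculine noun without Animacy: one required feature is missing.
missing, bad = noun_bugs(
    {'Gender': 'Masc', 'Number': 'Sing', 'Case': 'Nom', 'Polarity': 'Pos'})
print(missing, bad)  # ['Animacy'] []

# A feminine noun with Animacy: the feature is present but not allowed.
missing_f, bad_f = noun_bugs(
    {'Gender': 'Fem', 'Animacy': 'Anim', 'Number': 'Sing',
     'Case': 'Nom', 'Polarity': 'Pos'})
print(missing_f, bad_f)  # [] ['Animacy']
```

Passing an empty dict as the allowed set, as in the final branch of the example, makes every feature on the word count as disallowed.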