udapi.block.util package¶

Submodules¶

udapi.block.util.eval module¶

Eval is a special block for evaluating code given by parameters.

class udapi.block.util.eval.Eval(doc=None, bundle=None, tree=None, node=None, start=None, end=None, before_doc=None, after_doc=None, before_bundle=None, after_bundle=None, expand_code=True, **kwargs)[source]¶

Bases: udapi.core.block.Block

Special block for evaluating code given by parameters.

Tricks: pp is a shortcut for pprint.pprint. $. is a shortcut for this., which in turn is a shortcut for node., tree., etc., depending on context. count_X is a shortcut for self.count[X], where X is any string without whitespace (\S+) and self.count is a collections.Counter() instance. Thus you can use code like

    util.Eval node='count_$.upos +=1; count_"TOTAL" +=1' end="pp(self.count)"
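The shortcut expansion described above can be illustrated in plain Python. This is only a sketch under stated assumptions: the two substitutions are inferred from the description of the tricks, and `MiniEval` is a hypothetical stand-in that supports nothing but a `node` parameter. It does not import Udapi and is not the library's implementation.

```python
import re
from collections import Counter
from types import SimpleNamespace


def expand_eval_code(to_eval):
    """Expand count_X to self.count[X], then '$.' to 'this.' (assumed order)."""
    expanded = re.sub(r'count_(\S+)', r'self.count[\1]', to_eval)
    return expanded.replace('$.', 'this.')


class MiniEval:
    """Hypothetical stand-in for util.Eval, limited to the node= parameter."""

    def __init__(self, node):
        self.count = Counter()
        self.node_code = expand_eval_code(node)

    def process_node(self, node):
        # 'this' is bound to the current node, so 'this.upos' reads node.upos.
        exec(self.node_code, {'self': self, 'this': node, 'node': node})


block = MiniEval(node='count_$.upos +=1; count_"TOTAL" +=1')
for upos in ['NOUN', 'VERB', 'NOUN']:
    # SimpleNamespace objects play the role of UD nodes here.
    block.process_node(SimpleNamespace(upos=upos))
print(dict(block.count))
```

Because the counter lives on the block instance, it accumulates across all processed nodes, which is why the command-line example prints it only once, in the end hook.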

after_process_document(document)[source]¶: This method is called after each process_document.

before_process_document(document)[source]¶: This method is called before each process_document.

expand_eval_code(to_eval)[source]¶: Expand ‘$.’ to ‘this.’, useful for oneliners.

process_bundle(bundle)[source]¶: Process a UD bundle

process_document(document)[source]¶: Process a UD document

process_end()[source]¶: A hook method that is executed after processing all UD data

process_start()[source]¶: A hook method that is executed before processing UD data

process_tree(tree)[source]¶: Process a UD tree

udapi.block.util.filter module¶

Filter is a special block for keeping/deleting subtrees specified by parameters.

class udapi.block.util.filter.Filter(delete_tree=None, delete_tree_if_node=None, delete_subtree=None, keep_tree=None, keep_tree_if_node=None, keep_subtree=None, mark=None, **kwargs)[source]¶

Bases: udapi.core.block.Block

Special block for keeping/deleting subtrees specified by parameters.

Example usage from command line:

    # extract subtrees governed by nouns (noun phrases)
    udapy -s util.Filter keep_subtree='node.upos == "NOUN"' < in.conllu > filtered.conllu

    # keep only trees which contain ToDo|Bug nodes
    udapy -s util.Filter keep_tree_if_node='re.match("ToDo|Bug", str(node.misc))' < in > filtered

    # keep only non-projective trees, annotate non-projective edges with Mark=nonproj and show.
    udapy -T util.Filter keep_tree_if_node='node.is_nonprojective()' mark=nonproj < in | less -R

    # delete trees which contain deprel=remnant
    udapy -s util.Filter delete_tree_if_node='node.deprel == "remnant"' < in > filtered

    # delete subtrees headed by a node with deprel=remnant
    udapy -s util.Filter delete_subtree='node.deprel == "remnant"' < in > filtered
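The tree-level parameters can be pictured as evaluating a condition string on every node of a tree and keeping or dropping the whole tree depending on whether any node matches. The following is a minimal sketch of that reading, not Udapi code: trees are plain lists of `SimpleNamespace` nodes, and `tree_matches` and `filter_trees` are hypothetical helper names.

```python
import re
from types import SimpleNamespace


def tree_matches(tree, condition):
    """Return True if the condition holds for at least one node of the tree."""
    return any(eval(condition, {'node': node, 're': re}) for node in tree)


def filter_trees(trees, keep_tree_if_node=None, delete_tree_if_node=None):
    """Keep or drop whole trees based on per-node conditions."""
    kept = []
    for tree in trees:
        if keep_tree_if_node is not None and not tree_matches(tree, keep_tree_if_node):
            continue  # no node satisfies the keep condition
        if delete_tree_if_node is not None and tree_matches(tree, delete_tree_if_node):
            continue  # some node satisfies the delete condition
        kept.append(tree)
    return kept


def make_tree(*pairs):
    """Build a toy tree from (form, deprel) pairs."""
    return [SimpleNamespace(form=form, deprel=deprel) for form, deprel in pairs]


trees = [
    make_tree(('Dogs', 'nsubj'), ('bark', 'root')),
    make_tree(('Cats', 'nsubj'), ('too', 'remnant')),
]
filtered = filter_trees(trees, delete_tree_if_node='node.deprel == "remnant"')
print([[node.form for node in tree] for tree in filtered])
```

The subtree-level parameters (keep_subtree, delete_subtree) are not modelled here, since they require real parent and child links between nodes.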

process_tree(tree)[source]¶: Process a UD tree

udapi.block.util.findbug module¶

Block util.FindBug for debugging.

Usage: If block xy.Z fails with a Python exception, insert "util.FindBug block=" into the scenario, e.g. to debug second.Block, use

    udapy first.Block util.FindBug block=second.Block > bug.conllu

This will create the file bug.conllu containing the bundle that caused the bug.
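The underlying idea can be sketched without Udapi: run the suspect block on one bundle at a time and record the bundles on which it raises. Everything below is illustrative. `find_bug` and `fragile_block` are hypothetical names, bundles are plain lists of word forms, and the handling of `first_error_only` is an assumption based on the parameter name.

```python
def find_bug(block, bundles, first_error_only=True):
    """Return the bundles on which block raises an exception."""
    failing = []
    for bundle in bundles:
        try:
            block(bundle)
        except Exception:
            failing.append(bundle)
            if first_error_only:
                break  # stop at the first failing bundle
    return failing


def fragile_block(bundle):
    """Toy block that crashes (ZeroDivisionError) on an empty word form."""
    return [len(bundle) / len(word) for word in bundle]


bundles = [['a', 'bc'], ['x', ''], ['y'], ['', 'z']]
print(find_bug(fragile_block, bundles))
print(find_bug(fragile_block, bundles, first_error_only=False))
```

Isolating the failing bundle gives a small test case that can be fed to the failing block directly.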

class udapi.block.util.findbug.FindBug(block, first_error_only=True, **kwargs)[source]¶

Bases: udapi.core.basewriter.BaseWriter

Debug another block by finding a minimal testcase conllu file.

process_document(document)[source]¶: Process a UD document

udapi.block.util.mark module¶

util.Mark is a special block for marking nodes specified by parameters.

class udapi.block.util.mark.Mark(node, mark=1, add=True, **kwargs)[source]¶

Bases: udapi.core.block.Block

Mark nodes specified by parameters.

Example usage from command line:

    # see non-projective trees with non-projective edges highlighted
    udapy -TM util.Mark node='node.is_nonprojective()' < in | less -R

process_node(node)[source]¶: Process a UD node

udapi.block.util.markdiff module¶

util.MarkDiff is a special block for marking differences between parallel trees.

class udapi.block.util.markdiff.MarkDiff(gold_zone, attributes='form, lemma, upos, xpos, deprel, feats, misc', mark=1, add=False, **kwargs)[source]¶

Bases: udapi.core.block.Block

Mark differences between parallel trees.

process_tree(tree)[source]¶: Process a UD tree
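One way to picture marking differences between parallel trees is to compare the selected attributes of the two node sequences and flag every node that falls outside an equal run. The sketch below uses `difflib.SequenceMatcher` for the alignment. That choice, the function name `mark_diff`, the shortened attribute list, and storing the flag under `misc['Mark']` are all assumptions made for illustration, not a description of the block's internals.

```python
import difflib
from types import SimpleNamespace

ATTRS = ['form', 'lemma', 'upos']  # shortened attribute list for the example


def mark_diff(pred_nodes, gold_nodes, attrs=ATTRS, mark=1):
    """Set misc['Mark'] on nodes that differ between the two sequences."""
    pred_keys = [tuple(getattr(n, a) for a in attrs) for n in pred_nodes]
    gold_keys = [tuple(getattr(n, a) for a in attrs) for n in gold_nodes]
    matcher = difflib.SequenceMatcher(None, pred_keys, gold_keys, autojunk=False)
    for tag, p_lo, p_hi, g_lo, g_hi in matcher.get_opcodes():
        if tag == 'equal':
            continue
        # 'replace', 'delete' and 'insert' spans are all treated as differences.
        for node in pred_nodes[p_lo:p_hi] + gold_nodes[g_lo:g_hi]:
            node.misc['Mark'] = mark


def make_nodes(rows):
    """Build toy nodes from (form, lemma, upos) triples."""
    return [SimpleNamespace(form=f, lemma=l, upos=u, misc={}) for f, l, u in rows]


pred = make_nodes([('The', 'the', 'DET'), ('dog', 'dog', 'NOUN'), ('barks', 'bark', 'VERB')])
gold = make_nodes([('The', 'the', 'DET'), ('dog', 'dog', 'PROPN'), ('barks', 'bark', 'VERB')])
mark_diff(pred, gold)
print([(n.form, n.misc) for n in pred])
```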

udapi.block.util.resegmentgold module¶

util.ResegmentGold is a block for sentence alignment and re-segmentation of two zones.

class udapi.block.util.resegmentgold.ResegmentGold(gold_zone='gold', **kwargs)[source]¶

Bases: udapi.core.block.Block

Sentence-align two zones (gold and pred) and resegment the pred zone.

The two zones must contain the same sequence of characters.

static choose_root(p_tree, g_tree)[source]¶: Prevent multiple roots, which are forbidden in the evaluation script.

extract_pred_trees(document)[source]¶: Delete all trees with zone!=gold_zone from the document and return them.

process_document(document)[source]¶: Process a UD document

udapi.block.util.see module¶

Block util.See prints statistics about the nodes matching a given condition.

Example usage from the command line:

    udapy util.See node='node.is_nonprojective()' n=3 stats=dir,children,c_upos,p_lemma,deprel,feats_split < in.conllu

Example output:

    node.is_nonprojective() matches 245 out of 35766 nodes (0.7%) in 174 out of 1478 trees (11.8%)
    === dir (2 values) ===
    right        193 78% delta=+37%
    left          52 21% delta=-33%
    === children (9 values) ===
    0             64 26% delta=-38%
    2             58 23% delta=+14%
    3             38 15% delta= +7%
    === c_upos (15 values) ===
    NOUN         118 23% delta= +4%
    DET           61 12% delta= -3%
    PROPN         47  9% delta= +1%
    === p_lemma (187 values) ===
    il             5  2% delta= +1%
    fonction       4  1% delta= +1%
    écrire         4  1% delta= +1%
    === deprel (22 values) ===
    appos         41 16% delta=+15%
    conj          41 16% delta=+13%
    punct         36 14% delta= +4%
    === feats_split (20 values) ===
    Number=Sing  114 21% delta= +2%
    Gender=Masc   81 15% delta= +3%
    _             76 14% delta= -6%

In addition to the absolute count for each value, the output shows the percentage within matching nodes and a delta relative to the percentage within all nodes. This helps to highlight what is special about the matching nodes.
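The three numbers printed per value can be reproduced with a small calculation. The sketch below assumes the delta is a plain difference in percentage points between the share among matching nodes and the share among all nodes. `value_stats` is a hypothetical helper and the input lists are made-up toy data.

```python
from collections import Counter


def value_stats(matching_values, all_values):
    """Return (value, count, pct_within_matching, delta) rows, most common first."""
    match_counts = Counter(matching_values)
    all_counts = Counter(all_values)
    rows = []
    for value, count in match_counts.most_common():
        pct_match = 100 * count / len(matching_values)
        pct_all = 100 * all_counts[value] / len(all_values)
        rows.append((value, count, pct_match, pct_match - pct_all))
    return rows


# Toy data: the 'dir' statistic for 4 matching nodes out of 10 nodes in total.
matching_dirs = ['right', 'right', 'right', 'left']
all_dirs = ['right'] * 4 + ['left'] * 6

rows = value_stats(matching_dirs, all_dirs)
for value, count, pct, delta in rows:
    print(f"{value:<6}{count:>4}{pct:>4.0f}% delta={delta:+4.0f}%")
```

A large positive delta means the value is over-represented among matching nodes compared with the treebank as a whole.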

class udapi.block.util.see.See(node, n=5, stats='dir, edge, depth, children, siblings, p_upos, p_lemma, c_upos, form, lemma, upos, deprel, feats_split', **kwargs)[source]¶

Bases: udapi.core.block.Block

Print statistics about the nodes specified by the parameter node.

process_end()[source]¶: A hook method that is executed after processing all UD data

process_node(node)[source]¶: Process a UD node

process_tree(root)[source]¶: Process a UD tree

udapi.block.util.split module¶

util.Split is a special block for splitting documents.

class udapi.block.util.split.Split(parts=None, bundles_per_doc=None, **kwargs)[source]¶

Bases: udapi.core.basereader.BaseReader

Split Udapi document (with sentence-aligned trees in bundles) into several parts.

static is_multizone_reader()[source]¶

Can this reader read bundles which contain more than one zone?

This implementation always returns True. If a subclass supports just one zone per file (e.g. read.Sentences), this method should be overridden to return False, so that process_document can take advantage of this knowledge and optimize the reading (no buffer is needed even if bundles_per_doc is specified).

process_document(document)[source]¶: Process a UD document
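The two parameters suggest two ways of cutting a document: into documents of a fixed size, or into a fixed number of parts. The source does not say how bundles are distributed when parts is given, so the ceiling division below is an assumption. `split_bundles` is a hypothetical helper operating on a plain list instead of a Udapi document.

```python
import math


def split_bundles(bundles, parts=None, bundles_per_doc=None):
    """Split a list of bundles into consecutive chunks."""
    if parts is not None:
        # Assumption: derive the chunk size so that at most `parts` chunks result.
        bundles_per_doc = math.ceil(len(bundles) / parts)
    if not bundles_per_doc:
        return [list(bundles)]  # nothing to split
    return [bundles[i:i + bundles_per_doc]
            for i in range(0, len(bundles), bundles_per_doc)]


bundles = list(range(7))  # integers stand in for bundles
print(split_bundles(bundles, bundles_per_doc=3))
print(split_bundles(bundles, parts=2))
```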

udapi.block.util.wc module¶

Wc is a special block for printing statistics (word count etc).

class udapi.block.util.wc.Wc(**kwargs)[source]¶

Bases: udapi.core.block.Block

Special block for printing statistics (word count etc).

process_end()[source]¶: A hook method that is executed after processing all UD data

process_tree(tree)[source]¶: Process a UD tree
