udapi.block.util package
Submodules
udapi.block.util.eval module
Eval is a special block for evaluating code given by parameters.
class udapi.block.util.eval.Eval(doc=None, bundle=None, tree=None, node=None, start=None, end=None, before_doc=None, after_doc=None, before_bundle=None, after_bundle=None, expand_code=True, **kwargs)
Bases: udapi.core.block.Block
Special block for evaluating code given by parameters.
Tricks:
pp is a shortcut for pprint.pprint.
$. is a shortcut for this., which in turn is a shortcut for node., tree. etc., depending on context.
count_X is a shortcut for self.count[X], where X is any string (\S+) and self.count is a collections.Counter() instance.
Thus you can use code like
util.Eval node='count_$.upos +=1; count_"TOTAL" +=1' end="pp(self.count)"
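The shortcut expansion described above can be illustrated with a minimal sketch. This is not udapi's implementation: the helper name expand_code and the stand-in objects built with SimpleNamespace are assumptions made for illustration, and the intermediate this. step is skipped by substituting the context name directly.

```python
import re
from collections import Counter
from types import SimpleNamespace


def expand_code(code, context):
    """Expand the `$.` and `count_X` shortcuts (hypothetical helper).

    `context` is the name the code runs against, e.g. "node" or "tree".
    """
    # `$.` stands for the current context object, e.g. `node.`
    code = code.replace("$.", context + ".")
    # `count_X` stands for `self.count[X]`, where X is any run of non-spaces
    return re.sub(r"count_(\S+)", r"self.count[\1]", code)


# Stand-in for the block instance: it only needs a Counter called `count`.
block = SimpleNamespace(count=Counter())

node_code = expand_code('count_$.upos +=1; count_"TOTAL" +=1', "node")

# Run the expanded code once per (fake) node, as a block would per real node.
for upos in ["NOUN", "VERB", "NOUN"]:
    exec(node_code, {"self": block, "node": SimpleNamespace(upos=upos)})
```

After the loop, block.count holds one entry per UPOS tag plus the "TOTAL" entry, which is what end="pp(self.count)" would print in the command-line example.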
udapi.block.util.filter module
Filter is a special block for keeping/deleting subtrees specified by parameters.
class udapi.block.util.filter.Filter(delete_tree=None, delete_tree_if_node=None, delete_subtree=None, keep_tree=None, keep_tree_if_node=None, keep_subtree=None, mark=None, **kwargs)
Bases: udapi.core.block.Block
Special block for keeping/deleting subtrees specified by parameters.
Example usage from command line:
# extract subtrees governed by nouns (noun phrases)
udapy -s util.Filter keep_subtree='node.upos == "NOUN"' < in.conllu > filtered.conllu
# keep only trees which contain ToDo|Bug nodes
udapy -s util.Filter keep_tree_if_node='re.match("ToDo|Bug", str(node.misc))' < in > filtered
# keep only non-projective trees, annotate non-projective edges with Mark=nonproj and show
udapy -T util.Filter keep_tree_if_node='node.is_nonprojective()' mark=nonproj < in | less -R
# delete trees which contain deprel=remnant
udapy -s util.Filter delete_tree_if_node='node.deprel == "remnant"' < in > filtered
# delete subtrees headed by a node with deprel=remnant
udapy -s util.Filter delete_subtree='node.deprel == "remnant"' < in > filtered
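To make the tree-level parameters more concrete, here is a rough sketch of how a condition string such as 'node.deprel == "remnant"' could decide whether a whole tree is kept or deleted. It does not use udapi: trees are plain lists of stand-in nodes, and the helpers tree_matches and filter_trees are hypothetical names. Subtree-level parameters (keep_subtree, delete_subtree) and mark are not covered.

```python
import re
from types import SimpleNamespace


def tree_matches(tree, condition):
    """Return True if at least one node of the tree satisfies the condition."""
    # The condition string is evaluated with `node` bound to each node in turn;
    # `re` is made available because conditions may call e.g. re.match.
    return any(eval(condition, {"re": re}, {"node": node}) for node in tree)


def filter_trees(trees, keep_tree_if_node=None, delete_tree_if_node=None):
    """Keep or drop whole trees depending on node-level conditions."""
    kept = []
    for tree in trees:
        if keep_tree_if_node is not None and not tree_matches(tree, keep_tree_if_node):
            continue  # no node matched the "keep" condition
        if delete_tree_if_node is not None and tree_matches(tree, delete_tree_if_node):
            continue  # some node matched the "delete" condition
        kept.append(tree)
    return kept


# Two toy "trees", each just a flat list of nodes with a few attributes.
trees = [
    [SimpleNamespace(form="dogs", upos="NOUN", deprel="nsubj"),
     SimpleNamespace(form="bark", upos="VERB", deprel="root")],
    [SimpleNamespace(form="and", upos="CCONJ", deprel="remnant")],
]

without_remnants = filter_trees(trees, delete_tree_if_node='node.deprel == "remnant"')
with_nouns = filter_trees(trees, keep_tree_if_node='node.upos == "NOUN"')
```

Evaluating arbitrary strings with eval is only reasonable here because the conditions come from the person running the command, as they do on the udapy command line.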
udapi.block.util.findbug module
Block util.FindBug for debugging.
Usage:
If block xy.Z fails with a Python exception, insert "util.FindBug block=" into the scenario, e.g. to debug second.Block, use
udapy first.Block util.FindBug block=second.Block > bug.conllu
This will create the file bug.conllu with the bundle that caused the bug.
class udapi.block.util.findbug.FindBug(block, first_error_only=True, **kwargs)
Bases: udapi.core.basewriter.BaseWriter
Debug another block by finding a minimal testcase conllu file.
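The general idea of isolating the failing input can be sketched as follows. This is an illustration under assumptions, not FindBug's actual code: bundles are represented by plain strings, process_bundle stands in for the block being debugged, and find_failing_bundles is a hypothetical helper whose first_error_only argument mirrors the parameter of the same name.

```python
def find_failing_bundles(bundles, process_bundle, first_error_only=True):
    """Return the bundles on which process_bundle raises an exception."""
    failing = []
    for bundle in bundles:
        try:
            process_bundle(bundle)
        except Exception:
            # This bundle alone reproduces the error, so it is a small test case.
            failing.append(bundle)
            if first_error_only:
                break
    return failing


def buggy_block(bundle):
    """Stand-in for the block under test: fails on bundles starting with 'bad'."""
    if bundle.startswith("bad"):
        raise ValueError("cannot process " + bundle)


bundles = ["ok1", "bad1", "ok2", "bad2"]
first = find_failing_bundles(bundles, buggy_block)
every = find_failing_bundles(bundles, buggy_block, first_error_only=False)
```

In the command-line usage shown earlier, the failing bundle would be written out as CoNLL-U instead of being collected in a list.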
udapi.block.util.mark module
util.Mark is a special block for marking nodes specified by parameters.
class udapi.block.util.mark.Mark(node, mark=1, add=True, **kwargs)
Bases: udapi.core.block.Block
Mark nodes specified by parameters.
Example usage from command line:
# see non-projective trees with non-projective edges highlighted
udapy -TM util.Mark node='node.is_nonprojective()' < in | less -R
udapi.block.util.markdiff module
util.MarkDiff is a special block for marking differences between parallel trees.
class udapi.block.util.markdiff.MarkDiff(gold_zone, attributes='form, lemma, upos, xpos, deprel, feats, misc', mark=1, add=False, **kwargs)
Bases: udapi.core.block.Block
Mark differences between parallel trees.
udapi.block.util.resegmentgold module
util.ResegmentGold is a block for sentence alignment and re-segmentation of two zones.
class udapi.block.util.resegmentgold.ResegmentGold(gold_zone='gold', **kwargs)
Bases: udapi.core.block.Block
Sentence-align two zones (gold and pred) and resegment the pred zone.
The two zones must contain the same sequence of characters.
static choose_root(p_tree, g_tree)
Prevent multiple roots, which are forbidden in the evaluation script.
udapi.block.util.see module
Block util.See prints statistics about the nodes matching a given condition.
Example usage from the command line:
udapy util.See node='node.is_nonprojective()' n=3 stats=dir,children,c_upos,p_lemma,deprel,feats_split < in.conllu
Example output:
node.is_nonprojective()
matches 245 out of 35766 nodes (0.7%) in 174 out of 1478 trees (11.8%)
=== dir (2 values) ===
      right   193 78% delta=+37%
       left    52 21% delta=-33%
=== children (9 values) ===
          0    64 26% delta=-38%
          2    58 23% delta=+14%
          3    38 15% delta= +7%
=== c_upos (15 values) ===
       NOUN   118 23% delta= +4%
        DET    61 12% delta= -3%
      PROPN    47  9% delta= +1%
=== p_lemma (187 values) ===
         il     5  2% delta= +1%
   fonction     4  1% delta= +1%
     écrire     4  1% delta= +1%
=== deprel (22 values) ===
      appos    41 16% delta=+15%
       conj    41 16% delta=+13%
      punct    36 14% delta= +4%
=== feats_split (20 values) ===
Number=Sing   114 21% delta= +2%
Gender=Masc    81 15% delta= +3%
          _    76 14% delta= -6%
In addition to absolute counts for each value, the percentage within matching nodes is printed and a delta relative to percentage within all nodes. This helps to highlight what is special about the matching nodes.
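A small worked example may help with reading the delta column. The sketch below assumes that delta is simply the percentage among matching nodes minus the percentage among all nodes, as the paragraph above describes; the helper value_stats and the toy data are made up for illustration, and udapi's exact rounding and formatting may differ.

```python
from collections import Counter


def value_stats(matching_values, all_values):
    """Return (value, count, percent among matching, delta) rows.

    delta = percent among matching nodes - percent among all nodes.
    """
    matching = Counter(matching_values)
    overall = Counter(all_values)
    rows = []
    for value, count in matching.most_common():
        pct_matching = 100 * count / len(matching_values)
        pct_all = 100 * overall[value] / len(all_values)
        rows.append((value, count, pct_matching, pct_matching - pct_all))
    return rows


# Toy data: the attachment direction of 10 nodes, 4 of which match a condition.
all_dirs = ["right"] * 5 + ["left"] * 5
matching_dirs = ["right"] * 3 + ["left"]

rows = value_stats(matching_dirs, all_dirs)
for value, count, pct, delta in rows:
    print(f"{value:>6} {count:5d} {pct:3.0f}% delta={delta:+4.0f}%")
```

Here "right" covers 75% of the matching nodes but only 50% of all nodes, so its delta is +25 points: the matching nodes attach to the right noticeably more often than nodes in general.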
class udapi.block.util.see.See(node, n=5, stats='dir, edge, depth, children, siblings, p_upos, p_lemma, c_upos, form, lemma, upos, deprel, feats_split', **kwargs)
Bases: udapi.core.block.Block
Print statistics about the nodes specified by the parameter node.
udapi.block.util.split module
util.Split is a special block for splitting documents.
class udapi.block.util.split.Split(parts=None, bundles_per_doc=None, **kwargs)
Bases: udapi.core.basereader.BaseReader
Split Udapi document (with sentence-aligned trees in bundles) into several parts.
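As a rough illustration of what splitting by bundles_per_doc amounts to, the sketch below chunks a list of bundles into consecutive groups of a fixed size. It is not the block's implementation: bundles are represented by integers, split_by_size is a hypothetical helper, and the parts parameter is not covered.

```python
def split_by_size(bundles, bundles_per_doc):
    """Split a list of bundles into consecutive chunks of at most bundles_per_doc."""
    if bundles_per_doc < 1:
        raise ValueError("bundles_per_doc must be a positive integer")
    return [
        bundles[start:start + bundles_per_doc]
        for start in range(0, len(bundles), bundles_per_doc)
    ]


# Seven toy bundles split into documents of at most three bundles each.
docs = split_by_size(list(range(7)), 3)
```

The last chunk may be shorter than the others when the number of bundles is not a multiple of bundles_per_doc.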
static is_multizone_reader()
Can this reader read bundles which contain more zones?
This implementation always returns True. If a subclass supports just one zone per file (e.g. read.Sentences), this method should be overridden to return False, so that process_document can take advantage of this knowledge and optimize the reading (no buffer is needed even if bundles_per_doc is specified).
udapi.block.util.wc module
Wc is a special block for printing statistics (word count etc).