udapi.core.node module¶

Node class and related classes and functions.

In addition to the Node class, this module also contains the helper classes CycleError, EmptyNode, OrdTuple and ListOfNodes and the function find_minimal_common_treelet.

exception udapi.core.node.CycleError(message, node1, node2=None)[source]¶

Bases: Exception

A cycle was detected in the dependency tree (or would be created).

class udapi.core.node.EmptyNode(root, form=None, lemma=None, upos=None, xpos=None, feats=None, deprel=None, misc=None)[source]¶

Bases: Node

Class for representing empty nodes (for ellipsis in enhanced UD).

deprel¶

form¶

is_empty()[source]¶: Return True for all EmptyNode instances.

lemma¶

property ord¶

property parent¶: Return dependency parent (head) node.

remove()[source]¶: Delete this empty node.

shift(reference_node, after=0, move_subtree=0, reference_subtree=0)[source]¶: Attempts to change the word order of an EmptyNode result in a NotImplemented exception.

upos¶

xpos¶

class udapi.core.node.ListOfNodes(iterable, origin)[source]¶

Bases: list

Helper class for results of node.children and node.descendants.

Python distinguishes properties, e.g. node.form (no brackets), and methods, e.g. node.remove() (brackets necessary). It is useful (and expected by Udapi users) to use properties, so one can do e.g. node.form += "suffix". It is questionable whether node.parent, node.root, node.children etc. should be properties or methods. The problem with methods is that if users forget the brackets, the error may remain unnoticed because the result is interpreted as a method reference. The problem with properties is that they cannot have any parameters. However, we would like to allow e.g. node.children(add_self=True).

This class solves the problem: node.children and node.descendants are properties which return instances of this class, ListOfNodes. This class implements the method __call__, so one can use e.g.

nodes = node.children
nodes = node.children()
nodes = node.children(add_self=True, following_only=True)
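The callable-list idea can be illustrated with a minimal standalone sketch. The class name CallableList and the use of plain integers in place of nodes are assumptions made for illustration; this is not Udapi's actual ListOfNodes implementation.

```python
class CallableList(list):
    """Hypothetical stand-in for ListOfNodes: a list that can also be called."""

    def __init__(self, iterable, origin):
        super().__init__(iterable)
        self.origin = origin  # the item whose children/descendants these are

    def __call__(self, add_self=False, following_only=False, preceding_only=False):
        # With no filters requested, calling the list returns it unchanged.
        if not (add_self or following_only or preceding_only):
            return self
        result = list(self)
        if add_self:
            result.append(self.origin)
        if preceding_only:
            result = [x for x in result if x <= self.origin]
        if following_only:
            result = [x for x in result if x >= self.origin]
        return sorted(result)


# Integers play the role of nodes ordered by ord; 4 is the "origin" node.
kids = CallableList([1, 2, 5, 7], origin=4)
print(kids)                                       # used like a property value
print(kids())                                     # calling without arguments
print(kids(add_self=True))                        # origin merged in, sorted
print(kids(preceding_only=True, add_self=True))   # items up to the origin
```

Because the object is a real list, forgetting the brackets still yields a usable result, which is the point of the design described above.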

origin¶

class udapi.core.node.Node(root, form=None, lemma=None, upos=None, xpos=None, feats=None, deprel=None, misc=None)[source]¶

Bases: object

Class for representing nodes in Universal Dependency trees.

Attributes form, lemma, upos, xpos and deprel are public attributes of type str, so you can use e.g. node.lemma = node.form.

node.ord is an int property storing the node’s word-order index. Assign to it with care, so that the non-root nodes keep the ords 1, 2, 3… It is recommended to use one of the node.shift_* methods for reordering nodes. Note that EmptyNodes (a subclass of Node) have decimal ords (and no shift_* methods).

For changing the dependency structure (topology) of the tree, there is the parent property, e.g. node.parent = node.parent.parent, and the node.create_child() method. Properties node.children and node.descendants return objects of type ListOfNodes, so it is possible to do e.g.

>>> all_children = node.children
>>> left_children = node.children(preceding_only=True)
>>> right_descendants = node.descendants(following_only=True, add_self=True)

Properties node.feats and node.misc return objects of type DualDict, so one can do e.g.:

>>> node = Node()
>>> str(node.feats)
'_'
>>> node.feats = {'Case': 'Nom', 'Person': '1'}
>>> node.feats = 'Case=Nom|Person=1'  # equivalent to the above
>>> node.feats['Case']
'Nom'
>>> node.feats['NonExistent']
''
>>> node.feats['Case'] = 'Gen'
>>> str(node.feats)
'Case=Gen|Person=1'
>>> dict(node.feats)
{'Case': 'Gen', 'Person': '1'}

Handling of enhanced dependencies, multi-word tokens and other node’s methods are described below.

address()[source]¶

Return full (document-wide) id of the node.

For non-root nodes, the general address format is: node.bundle.bundle_id + ‘/’ + node.root.zone + ‘#’ + node.ord, e.g. s123/en_udpipe#4. If zone is empty, the slash is excluded as well, e.g. s123#4.
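The address format for non-root nodes can be sketched as a plain string-building function. The function name format_address and its parameters are hypothetical; the sketch only reproduces the format described above and does not call Udapi.

```python
def format_address(bundle_id, zone, ord_):
    """Build 'bundle_id/zone#ord', dropping the slash when the zone is empty."""
    prefix = f"{bundle_id}/{zone}" if zone else bundle_id
    return f"{prefix}#{ord_}"


print(format_address("s123", "en_udpipe", 4))
print(format_address("s123", "", 4))
```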

property children¶

Return a list of dependency children (direct dependants) nodes.

The returned nodes are sorted by their ord. Note that node.children is a property, not a method, so if you want all the children of a node (excluding the node itself), you should not use node.children(), but just

node.children

However, the returned result is a callable list, so you can use:

nodes1 = node.children(add_self=True)
nodes2 = node.children(following_only=True)
nodes3 = node.children(preceding_only=True)
nodes4 = node.children(preceding_only=True, add_self=True)

as a shortcut for:

nodes1 = sorted([node] + node.children, key=lambda n: n.ord)
nodes2 = [n for n in node.children if n.ord > node.ord]
nodes3 = [n for n in node.children if n.ord < node.ord]
nodes4 = [n for n in node.children if n.ord < node.ord] + [node]

See the documentation of ListOfNodes for details.

compute_text(use_mwt=True)[source]¶

Return a string representing this subtree’s text (detokenized).

Compute the string by concatenating forms of nodes (words and multi-word tokens) and joining them with a single space, unless the node has SpaceAfter=No in its misc. If called on root this method returns a string suitable for storing in root.text (but it is not stored there automatically).

Technical details: If called on root, the root’s form (<ROOT>) is not included in the string. If called on non-root nodeA, nodeA’s form is included in the string, i.e. internally descendants(add_self=True) is used. Note that if the subtree is non-projective, the resulting string may be misleading.

Args: use_mwt: consider multi-word tokens? (default=True)
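The detokenization rule can be shown with a small standalone sketch. It assumes tokens are given as (form, misc) pairs in word order, ignores multi-word tokens, and strips the trailing space; the function name join_forms is hypothetical and this is not the library's code.

```python
def join_forms(tokens):
    """tokens: list of (form, misc_dict) pairs in word order."""
    text = ""
    for form, misc in tokens:
        text += form
        # A single space follows every token unless SpaceAfter=No is set.
        if misc.get("SpaceAfter") != "No":
            text += " "
    return text.rstrip()


tokens = [
    ("Hello", {"SpaceAfter": "No"}),
    (",", {}),
    ("world", {"SpaceAfter": "No"}),
    ("!", {}),
]
print(join_forms(tokens))
```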

property coref_entities¶

property coref_mentions¶

create_child(**kwargs)[source]¶: Create and return a new child of the current node.

create_coref_entity(eid=None, etype=None, **kwargs)[source]¶

create_empty_child(deprel, after=True, **kwargs)[source]¶

Create and return a new empty node child of the current node.

Args:
    deprel: the enhanced dependency relation (required to be stored in DEPS)
    form, lemma, upos, xpos, feats, misc: as in Node, the default is '_'
    after: position the newly created empty node after this node? If True (default), the new_node.ord will be node.ord + 0.1, unless there is already an empty node with such ord, in which case it will be node.ord + 0.2 etc. If False, the new node will be placed immediately before node.
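The search for the first free decimal ord (for after=True) can be sketched as follows. To sidestep floating-point arithmetic, this sketch represents an empty-node ord as a (major, minor) tuple; that representation and the name next_empty_ord are assumptions for illustration, not how Udapi stores ords.

```python
def next_empty_ord(node_ord, taken):
    """Return the first free (node_ord, k), k = 1, 2, ..., not present in `taken`."""
    minor = 1
    while (node_ord, minor) in taken:
        minor += 1
    return (node_ord, minor)


# No empty node after word 3 yet: the new one corresponds to ord 3.1.
print(next_empty_ord(3, set()))
# 3.1 and 3.2 are already used: the new one corresponds to ord 3.3.
print(next_empty_ord(3, {(3, 1), (3, 2)}))
```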

deprel¶

property deps¶

Return enhanced dependencies as a Python list of dicts.

On the first access, the raw enhanced dependencies are deserialized and the result is stored as a list.

property descendants¶

Return a list of all descendants of the current node.

The returned nodes are sorted by their ord. Note that node.descendants is a property, not a method, so if you want all the descendants of a node (excluding the node itself), you should not use node.descendants(), but just

node.descendants

However, the returned result is a callable list, so you can use:

nodes1 = node.descendants(add_self=True)
nodes2 = node.descendants(following_only=True)
nodes3 = node.descendants(preceding_only=True)
nodes4 = node.descendants(preceding_only=True, add_self=True)

as a shortcut for:

nodes1 = sorted([node] + node.descendants, key=lambda n: n.ord)
nodes2 = [n for n in node.descendants if n.ord > node.ord]
nodes3 = [n for n in node.descendants if n.ord < node.ord]
nodes4 = [n for n in node.descendants if n.ord < node.ord] + [node]

See the documentation of ListOfNodes for details.

draw(**kwargs)[source]¶

Print ASCII visualization of the dependency structure of this subtree.

This method is useful for debugging. Internally udapi.block.write.textmodetrees.TextModeTrees is used for the printing. All keyword arguments of this method are passed to its constructor, so you can use e.g.:

files: to redirect sys.stdout to a file
indent: to have wider trees
attributes: to override the default list 'form,upos,deprel'

See TextModeTrees for details and other parameters.

property feats¶

Property for morphological features stored as a Feats object.

Reading: You can access node.feats as a dict, e.g. if node.feats['Case'] == 'Nom'. Features which are not set return an empty string (not None, not KeyError), so you can safely use e.g. if node.feats['MyExtra'].find('substring') != -1. You can also obtain the string representation of the whole FEATS (suitable for CoNLL-U), e.g. if node.feats == 'Case=Nom|Person=1'.

Writing: All the following assignment types are supported:

node.feats['Case'] = 'Nom'
node.feats = {'Case': 'Nom', 'Person': '1'}
node.feats = 'Case=Nom|Person=1'
node.feats = '_'

The last line has the same result as assigning None or empty string to node.feats.

For details about the implementation and other methods (e.g. node.feats.is_plural()), see udapi.core.feats.Feats which is a subclass of DualDict.
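The reading behaviour described above (empty string for unset keys, CoNLL-U string form) can be imitated with a small dict subclass. The class FeatsSketch and its from_string helper are hypothetical and only approximate the documented behaviour; they are not Udapi's Feats or DualDict.

```python
class FeatsSketch(dict):
    """Hypothetical dict mimicking the documented reading behaviour of node.feats."""

    def __missing__(self, key):
        # Unset features read as an empty string instead of raising KeyError.
        return ""

    def __str__(self):
        # CoNLL-U style serialization; '_' stands for "no features".
        if not self:
            return "_"
        pairs = sorted(self.items(), key=lambda kv: kv[0].lower())
        return "|".join(f"{key}={value}" for key, value in pairs)

    @classmethod
    def from_string(cls, string):
        # None, '' and '_' all mean "no features".
        if string in (None, "", "_"):
            return cls()
        return cls(pair.split("=", 1) for pair in string.split("|"))


feats = FeatsSketch.from_string("Case=Nom|Person=1")
print(feats["Case"])
print(repr(feats["NonExistent"]))  # empty string, no KeyError
feats["Case"] = "Gen"
print(str(feats))
```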

form¶

get_attrs(attrs, undefs=None, stringify=True)[source]¶

Return multiple attributes or pseudo-attributes, possibly substituting empty ones.

Pseudo-attributes:

p_xy is the (pseudo) attribute xy of the parent node.
c_xy is a list of the (pseudo) attributes xy of the children nodes.
l_xy is the (pseudo) attribute xy of the previous (left in LTR langs) node.
r_xy is the (pseudo) attribute xy of the following (right in LTR langs) node.
dir: 'left' = the node is a left child of its parent, 'right' = the node is a right child of its parent, 'root' = the node’s parent is the technical root.
edge: length of the edge to parent (node.ord - node.parent.ord) or 0 if parent is root.
children: number of children nodes.
siblings: number of siblings nodes.
depth: depth in the dependency tree (technical root has depth=0, highest word has depth=1).
feats_split: list of name=value formatted strings of the FEATS.

Args:
    attrs: A list of attribute names, e.g. ['form', 'lemma', 'p_upos'].
    undefs: A value to be used instead of None for empty (undefined) values.
    stringify: Apply str() on each value (except for None).

property gloss¶: String property as a shortcut for node.misc["Gloss"].

is_descendant_of(node)[source]¶: Is the current node a descendant of the node given as argument?

static is_empty()[source]¶

Is the current node an empty node?

Returns False for all Node instances. True is returned only by instances of the EmptyNode subclass.

is_leaf()[source]¶: Is this node a leaf, i.e. a node without any children?

is_nonprojective()[source]¶

Is the node attached to its parent non-projectively?

Is there at least one node between (word-order-wise) this node and its parent that is not dominated by the parent? For higher speed, the actual implementation does not find the node(s) which cause(s) the gap. It only checks the number of parent’s descendants in the span and the total number of nodes in the span.
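The counting check described above can be sketched on a bare parent map instead of Node objects. The representation (a dict from each word's ord to its parent's ord, with 0 for the technical root) and the helper names are assumptions for illustration; treating children of the technical root as projective is also an assumption of this sketch.

```python
def dominated_by(ord_, ancestor, parents):
    """True if `ancestor` lies on the path from `ord_` up to the technical root (0)."""
    current = parents[ord_]
    while current != 0:
        if current == ancestor:
            return True
        current = parents[current]
    return False


def is_nonprojective(ord_, parents):
    parent = parents[ord_]
    if parent == 0:
        return False  # assumption: children of the technical root are projective
    low, high = sorted((ord_, parent))
    in_span = range(low + 1, high)  # nodes strictly between the node and its parent
    dominated = sum(1 for o in in_span if dominated_by(o, parent, parents))
    # The edge is projective only if every node in the span is dominated by the parent.
    return dominated != len(in_span)


# Word 2 depends on word 4, but word 3 (the head of the sentence) lies between them.
parents = {1: 3, 2: 4, 3: 0, 4: 3}
print([o for o in parents if is_nonprojective(o, parents)])
```

Only the two counts are compared; the sketch, like the description above, does not report which node causes the gap.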

is_nonprojective_gap()[source]¶

Is the node causing a non-projective gap within another node’s subtree?

Is there at least one node X such that

- this node is not a descendant of X, but
- this node is within the span of X, i.e. it is between (word-order-wise) X’s leftmost descendant (or X itself) and X’s rightmost descendant (or X itself).
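The definition can be checked directly, if not efficiently, on a parent map. The dict representation (ord to parent ord, 0 for the technical root) and the function names are hypothetical; this brute-force sketch follows the definition literally and makes no claim about how the library computes the result.

```python
def ancestors(ord_, parents):
    """Set of all proper ancestors of `ord_`, excluding the technical root (0)."""
    result = set()
    current = parents[ord_]
    while current != 0:
        result.add(current)
        current = parents[current]
    return result


def is_nonprojective_gap(ord_, parents):
    own_ancestors = ancestors(ord_, parents)
    for x in parents:
        if x == ord_ or x in own_ancestors:
            continue  # this node is X itself or a descendant of X
        # Span of X: X plus all of its descendants.
        span = [o for o in parents if o == x or x in ancestors(o, parents)]
        if min(span) < ord_ < max(span):
            return True
    return False


# Word 3 sits inside the span of word 4's subtree (words 2 and 4) without belonging to it.
parents = {1: 3, 2: 4, 3: 0, 4: 3}
print([o for o in parents if is_nonprojective_gap(o, parents)])
```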

static is_root()[source]¶

Is the current node a (technical) root?

Returns False for all Node instances, irrespective of whether the node has a parent or not. True is returned only by instances of udapi.core.root.Root.

lemma¶

property misc¶

Property for MISC attributes stored as a DualDict object.

Reading: You can access node.misc as a dict, e.g. if node.misc['SpaceAfter'] == 'No'. Attributes which are not set return an empty string (not None, not KeyError), so you can safely use e.g. if node.misc['MyExtra'].find('substring') != -1. You can also obtain the string representation of the whole MISC (suitable for CoNLL-U), e.g. if node.misc == 'SpaceAfter=No|X=Y'.

Writing: All the following assignment types are supported:

node.misc['SpaceAfter'] = 'No'
node.misc = {'SpaceAfter': 'No', 'X': 'Y'}
node.misc = 'SpaceAfter=No|X=Y'
node.misc = '_'

The last line has the same result as assigning None or empty string to node.misc.

For details about the implementation, see udapi.core.dualdict.DualDict.

property multiword_token¶

Return the multi-word token which includes this node, or None.

If this node represents a (syntactic) word which is part of a multi-word token, this property returns the instance of udapi.core.mwt.MWT. If this node is not part of any multi-word token, this property returns None.

property next_node¶: Return the following node according to word order.

property no_space_after¶: Boolean property as a shortcut for node.misc["SpaceAfter"] == "No".

property ord¶

property parent¶: Return dependency parent (head) node.

precedes(node)[source]¶

Does this node precede another node in word order?

This method handles correctly also nodes from different trees (but the same zone). If you have nodes from the same tree, it is faster and more elegant to use just nodeA < nodeB, which is equivalent to calling nodeA.ord < nodeB.ord. For sorting nodes from the same tree, you can use nodes.sort() or sorted(nodes).

property prev_node¶: Return the previous node according to word order.

print_subtree(**kwargs)[source]¶: deprecated name for draw()

property raw_deps¶

String serialization of enhanced dependencies as stored in CoNLL-U files.

On access, the enhanced dependencies are serialized back to a string if they have already been deserialized.

remove(children=None)[source]¶

Delete this node (and all its descendants unless specified otherwise).

Args: children: a string specifying what to do if the node has any children.

The default (None) is to delete them (and all their descendants). rehang means to re-attach those children to the parent of the removed node. warn means to issue a warning if any children are present and delete them. rehang_warn means to rehang and warn:-).
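The difference between the default behaviour and rehang can be sketched on a parent map keyed by word form. The function remove and the dict representation are hypothetical illustrations of the two strategies, not the library's method; warnings are left out.

```python
def remove(name, parents, children=None):
    """Return a new parent map without `name`; `children` is None or 'rehang'."""
    if children == "rehang":
        # Re-attach the children of the removed node to its parent.
        new_parent = parents[name]
        return {n: (new_parent if p == name else p)
                for n, p in parents.items() if n != name}
    # Default: collect the whole subtree of `name` and drop it.
    doomed = {name}
    changed = True
    while changed:
        changed = False
        for n, p in parents.items():
            if p in doomed and n not in doomed:
                doomed.add(n)
                changed = True
    return {n: p for n, p in parents.items() if n not in doomed}


parents = {"saw": None, "dog": "saw", "the": "dog", "big": "dog"}
print(remove("dog", parents))                     # subtree deleted
print(remove("dog", parents, children="rehang"))  # children moved up to "saw"
```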

property root¶

property sdeprel¶

Return the language-specific part of dependency relation.

E.g. if deprel = acl:relcl then sdeprel = relcl. If deprel = acl then sdeprel is the empty string. If deprel is None then node.sdeprel will return None as well.

shift_after_node(reference_node, without_children=False, skip_if_descendant=False)[source]¶: Shift this node after the reference_node.

shift_after_subtree(reference_node, without_children=False, skip_if_descendant=False)[source]¶

Shift this node (and its subtree) after the subtree rooted by reference_node.

Args: without_children: shift just this node without its subtree?

shift_before_node(reference_node, without_children=False, skip_if_descendant=False)[source]¶: Shift this node before the reference_node.

shift_before_subtree(reference_node, without_children=0, skip_if_descendant=False)[source]¶

Shift this node (and its subtree) before the subtree rooted by reference_node.

Args: without_children: shift just this node without its subtree?

property siblings¶

Return a list of dependency sibling nodes.

When used as a property, node.siblings is just a shortcut for:

[n for n in node.parent.children if n != node]

However, it is especially helpful when used as a method, so e.g. node.siblings(preceding_only=True) stands for

[n for n in node.parent.children if n.ord < node.ord]

which is different from node.parent.children(preceding_only=True).

See the documentation of ListOfNodes for details.

property udeprel¶

Return the universal part of dependency relation, e.g. acl instead of acl:relcl.

So you can write node.udeprel instead of node.deprel.split(':')[0].
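The split between the universal and the language-specific part of a relation can be shown with two plain functions operating on strings. The free functions udeprel and sdeprel below are illustrative stand-ins for the properties and follow only the behaviour documented for them here.

```python
def udeprel(deprel):
    """Universal part of the relation, e.g. 'acl' for 'acl:relcl'."""
    return deprel.split(":")[0]


def sdeprel(deprel):
    """Language-specific part: 'relcl' for 'acl:relcl', '' for 'acl', None for None."""
    if deprel is None:
        return None
    parts = deprel.split(":", 1)
    return parts[1] if len(parts) == 2 else ""


for relation in ("acl:relcl", "acl"):
    print(relation, udeprel(relation), repr(sdeprel(relation)))
```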

unordered_descendants()[source]¶: Return a list of all descendants in any order.

upos¶

property words¶

Return one-item list with this node.

This property is there for compatibility with udapi.core.mwt.MWT.words, so that it is possible to use code such as:

for token in root.token_descendants:
    words = token.words …

xpos¶

class udapi.core.node.OrdTuple(string)[source]¶

Bases: object

Class for the rare case of 9+ consecutive empty nodes, i.e. ords x.10, x.11 etc.

Ord 1.10 cannot be stored as float, which would result in 1.1. We thus store it as a tuple (1,10) wrapped in OrdTuple, so that comparisons work, e.g.: 1.9 < OrdTuple('1.10') < 2
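The float problem and the tuple-based remedy can be demonstrated in a few lines. The helper ord_key is a hypothetical illustration of comparing ords as (major, minor) pairs; it is not the OrdTuple class itself.

```python
# As floats, 1.10 and 1.1 are the same number, so the tenth empty node
# after word 1 would collide with the first one.
print(float("1.10") == float("1.1"))  # True


def ord_key(ord_):
    """Turn an ord such as 2, 1.9 or '1.10' into a comparable (major, minor) tuple."""
    major, _, minor = str(ord_).partition(".")
    return (int(major), int(minor or 0))


# Compared as tuples, 1.9 < 1.10 < 2 holds as intended.
print(ord_key(1.9) < ord_key("1.10") < ord_key(2))  # True
```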

increase()[source]¶: Increment the decimal part of this ord.

udapi.core.node.find_minimal_common_treelet(*args)[source]¶

Find the smallest tree subgraph containing all nodes provided in args.

>>> from udapi.core.node import find_minimal_common_treelet
>>> (nearest_common_ancestor, _) = find_minimal_common_treelet(nodeA, nodeB)
>>> nodes = [nodeA, nodeB, nodeC]
>>> (nca, added_nodes) = find_minimal_common_treelet(*nodes)

There always exists exactly one such tree subgraph (aka treelet). This function returns a tuple (root, added_nodes), where root is the root of the minimal treelet and added_nodes is an iterator of nodes that had to be added to nodes to form the treelet. The nodes must not contain the same node twice.
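The idea behind the minimal treelet can be sketched on a parent map keyed by node name. The function minimal_common_treelet and the dict representation are hypothetical; the sketch returns the added nodes as a list rather than an iterator and is not the library's algorithm.

```python
def path_to_root(node, parents):
    """List of nodes from `node` up to the tree root (whose parent is None)."""
    path = [node]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


def minimal_common_treelet(nodes, parents):
    paths = [path_to_root(n, parents) for n in nodes]
    # The nearest common ancestor is the lowest node lying on every path.
    common = set(paths[0]).intersection(*paths[1:])
    nca = next(n for n in paths[0] if n in common)
    given = set(nodes)
    added = []
    for path in paths:
        for n in path:
            # Nodes on the way up to the nca that were not given must be added.
            if n not in given and n not in added:
                added.append(n)
            if n == nca:
                break
    return nca, added


parents = {"root": None, "a": "root", "b": "a", "c": "b", "d": "a", "e": "d"}
nca, added = minimal_common_treelet(["c", "e"], parents)
print(nca, sorted(added))
```

When one of the given nodes dominates all the others, it is itself the root of the treelet and nothing needs to be added.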