udapi.core.root module

Root class represents the technical root node in each tree.

class udapi.core.root.Root(zone=None, comment='', text=None, newpar=None, newdoc=None)

Bases: Node

Class for representing root nodes (technical roots) in UD trees.

add_comment(string)

Add a given string to root.comment separated by a newline and space.

address()

Full (document-wide) id of the root.

The general format of a root's address is root.bundle.bundle_id + '/' + root.zone, e.g. s123/en_udpipe. If the zone is empty, the slash is omitted as well, e.g. s123. If the bundle is missing (which can happen during loading), '?' is used instead of the bundle id. The root's address is stored in CoNLL-U files as sent_id (in a special comment).
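The address format described above can be sketched as a standalone function. This is a minimal illustration, not udapi's implementation: root_address is a hypothetical helper that takes the bundle id (or None for a missing bundle) and the zone as plain strings instead of reading them from a Root object.

```python
from typing import Optional


def root_address(bundle_id: Optional[str], zone: str) -> str:
    """Hypothetical helper mirroring the documented address() format.

    bundle_id is None when the bundle is missing (e.g. during loading).
    """
    # A missing bundle is represented by '?'.
    prefix = bundle_id if bundle_id is not None else "?"
    # The slash is included only when the zone is non-empty.
    return f"{prefix}/{zone}" if zone else prefix


print(root_address("s123", "en_udpipe"))  # s123/en_udpipe
print(root_address("s123", ""))           # s123
print(root_address(None, "en_udpipe"))    # ?/en_udpipe
```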

property bundle

Return the bundle which this tree belongs to.

comment
create_empty_child(**kwargs)

Create and return a new empty node within this tree.

This root-specific implementation overrides Node.create_empty_child(). It is faster because it does not set deps and ord of the newly created node; it is up to the user to set these attributes correctly. It is used in udapi.block.read.conllu, where speed is important, so only raw_deps is set up instead of deps.

create_multiword_token(words=None, form=None, misc=None)

Create and return a new multi-word token (MWT) in this tree.

The new MWT can optionally be initialized using the following args:

  • words: a list of nodes which are part of the new MWT

  • form: string representing the surface form of the new MWT

  • misc: misc attribute of the new MWT

property descendants

Return a list of all descendants of the current node.

The nodes are sorted by their ord. This root-specific implementation returns all the nodes in the tree except the root itself.
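Conceptually, the root's descendants are all nodes reachable through children, ordered by ord, with the root itself excluded. The sketch below uses a hypothetical ToyNode class rather than udapi's Node; the actual root-specific implementation may avoid a traversal altogether (every node in the tree is a descendant of the root), so treat this only as an illustration of the result.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node: only ord, form and children."""
    ord: int
    form: str
    children: List["ToyNode"] = field(default_factory=list)


def descendants(root: ToyNode) -> List[ToyNode]:
    """Collect all nodes below root (root excluded), sorted by ord."""
    result = []
    stack = list(root.children)
    while stack:
        node = stack.pop()
        result.append(node)
        stack.extend(node.children)
    return sorted(result, key=lambda n: n.ord)


# The technical root has ord 0; "saw" is the head of "I" and "her".
root = ToyNode(0, "<ROOT>")
saw = ToyNode(2, "saw", [ToyNode(1, "I"), ToyNode(3, "her")])
root.children.append(saw)
print([n.form for n in descendants(root)])  # ['I', 'saw', 'her']
```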

property descendants_and_empty
property document
empty_nodes
flatten(deprel='root')

Flatten the tree (i.e. attach all nodes to the root) and reset all deprels.

For the default deprel='root', this is equivalent to:

    for node in root.descendants:
        node.parent = root
        node.deprel = 'root'

but it is faster.
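In CoNLL-U terms, flattening sets the HEAD column of every word to 0 (the technical root) and overwrites DEPREL with the deprel argument. The sketch below shows that effect on plain parallel lists; flatten_columns is a hypothetical helper, not part of udapi, and operates on column values rather than on node objects.

```python
from typing import List, Tuple


def flatten_columns(heads: List[int], deprels: List[str],
                    deprel: str = "root") -> Tuple[List[int], List[str]]:
    """Attach every word to the technical root (HEAD 0) and reset its DEPREL."""
    # One entry per word (ord 1..n), so the output keeps the input length.
    return [0] * len(heads), [deprel] * len(deprels)


heads = [2, 0, 2]                  # I <- saw -> her
deprels = ["nsubj", "root", "obj"]
print(flatten_columns(heads, deprels))  # ([0, 0, 0], ['root', 'root', 'root'])
```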

get_sentence(if_missing='detokenize')

Return either the stored root.text or (if None) root.compute_text().

Args: if_missing: What to do if root.text is None? (default=detokenize)

  • detokenize: use root.compute_text() to compute the sentence.

  • empty: return an empty string

  • warn_detokenize, warn_empty: in addition emit a warning via logging.warning()

  • fatal: raise an exception
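The if_missing policy listed above can be sketched as a small dispatch function. This is a hypothetical re-creation, not udapi's code: it receives the stored text and a compute_text callable as arguments instead of reading them from a Root, and the exception type raised for fatal (RuntimeError here) is an assumption, since the docs only say "an exception".

```python
import logging
from typing import Callable, Optional


def get_sentence(text: Optional[str], compute_text: Callable[[], str],
                 if_missing: str = "detokenize") -> str:
    """Hypothetical re-creation of the documented if_missing policy."""
    if text is not None:
        return text  # the stored text is returned whenever it exists
    if if_missing == "fatal":
        # Assumption: the concrete exception type is not documented.
        raise RuntimeError("root.text is None")
    if if_missing.startswith("warn_"):
        logging.warning("root.text is None")
        if_missing = if_missing[len("warn_"):]  # fall back to the base policy
    if if_missing == "detokenize":
        return compute_text()
    if if_missing == "empty":
        return ""
    raise ValueError(f"unknown if_missing value: {if_missing!r}")


print(get_sentence("Stored.", lambda: "ignored"))    # Stored.
print(get_sentence(None, lambda: "Hello world"))     # Hello world
print(repr(get_sentence(None, lambda: "x", "empty")))  # ''
```

Splitting off the warn_ prefix keeps the two warning variants in sync with their non-warning counterparts.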

is_descendant_of(node)

Is the current node a descendant of the node given as argument?

This root-specific implementation always returns False.

is_root()

Return True for all Root instances.

json
property multiword_tokens

Return a list of all multi-word tokens in this tree.

newdoc
newpar
property parent

Return dependency parent (head) node.

This root-specific implementation always returns None.

remove(children=None)

Remove the whole tree from its bundle.

Args: children: a string specifying what to do if the root has any children.

The default (None) is to delete them (and all their descendants); 'warn' means to issue a warning.

property sent_id

ID of this tree, stored in the sent_id comment in CoNLL-U.

shift(reference_node, after=0, move_subtree=0, reference_subtree=0)

Any attempt to change the word order of the root raises an Exception.

steal_nodes(nodes)

Move nodes from another tree to this tree (append).

text
property token_descendants

Return all tokens (one-word or multi-word) in the tree.

That is, return a list of core.Node and core.MWT instances whose forms make up the raw sentence, skipping nodes that are part of multi-word tokens.

For example, with:

    1-2  vámonos  _
    1    vamos    ir
    2    nos      nosotros
    3-4  al       _
    3    a        a
    4    el       el
    5    mar      mar

[n.form for n in root.token_descendants] will return ['vámonos', 'al', 'mar'].
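The token-selection rule can be reproduced on the (ID, FORM) columns of the example above. This is a minimal sketch, not udapi's implementation: token_forms is a hypothetical helper, and it assumes the input holds only word lines and multi-word-token range lines (no empty nodes with decimal IDs).

```python
from typing import List, Tuple


def token_forms(rows: List[Tuple[str, str]]) -> List[str]:
    """Return surface token forms from (ID, FORM) pairs of a CoNLL-U sentence.

    Multi-word-token lines (ID like "1-2") are kept; the words they cover
    are skipped. Assumption: rows contain no empty nodes (IDs like "5.1").
    """
    forms = []
    covered_until = 0  # highest word ord covered by the latest MWT range
    for tok_id, form in rows:
        if "-" in tok_id:
            covered_until = int(tok_id.split("-")[1])
            forms.append(form)
        elif int(tok_id) > covered_until:
            forms.append(form)  # a one-word token outside any MWT
    return forms


rows = [("1-2", "vámonos"), ("1", "vamos"), ("2", "nos"),
        ("3-4", "al"), ("3", "a"), ("4", "el"), ("5", "mar")]
print(token_forms(rows))  # ['vámonos', 'al', 'mar']
```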

property zone

Return zone (string label) of this tree.