udapi.core.mwt module

The MWT class represents a multi-word token.

class udapi.core.mwt.MWT(words=None, form=None, feats=None, misc=None, root=None)[source]

Bases: object

Class for representing multi-word tokens in UD trees.

address()[source]

Full (document-wide) id of the multi-word token.

property feats

The feats property of an MWT should be used only for Typo=Yes.

See https://universaldependencies.org/changes.html#typos-in-multiword-tokens for details. However, Udapi does not enforce this restriction, and mwt.feats works exactly the same as node.feats.

form
get_attrs(attrs, undefs=None, stringify=True)[source]

Return multiple attributes or pseudo-attributes, possibly substituting empty ones.

MWTs have no children, parents, or prev/next nodes, so the pseudo-attributes p_xy, c_xy, l_xy and r_xy are irrelevant (and return nothing). Other pseudo-attributes (e.g. dir) always return the string "<mwt>". The only relevant pseudo-attributes are feats_split and misc_split, each of which yields a list of name=value formatted strings. The ord attribute actually returns mwt.ord_range.

Args:
    attrs: A list of attribute names, e.g. ['form', 'ord', 'feats_split'].
    undefs: A value to be used instead of None for empty (undefined) values.
    stringify: Apply str() on each value (except for None).
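The handling of empty values and of the *_split pseudo-attributes can be illustrated with a small stand-alone sketch. It does not import Udapi: get_attrs_sketch, split_pairs and the values dictionary are hypothetical names used only for illustration, and flattening the split items directly into the result list is an assumption rather than confirmed Udapi behaviour.

```python
def split_pairs(serialized):
    """Split a CoNLL-U FEATS/MISC string into name=value strings."""
    if serialized in (None, "", "_"):
        return []
    return serialized.split("|")


def get_attrs_sketch(values, attrs, undefs=None, stringify=True):
    """Hypothetical helper mimicking the documented undefs/stringify rules."""
    result = []
    for name in attrs:
        if name.endswith("_split"):
            # feats_split / misc_split expand to name=value strings
            # (assumption: they are flattened into the result list).
            base = name[: -len("_split")]
            result.extend(split_pairs(values.get(base)))
            continue
        value = values.get(name)
        if value is None:
            value = undefs          # substitute the placeholder for empty values
        elif stringify:
            value = str(value)      # str() is applied to everything except None
        result.append(value)
    return result


# Spanish "del" is a multi-word token covering the words "de" and "el".
mwt_values = {"form": "del", "ord": "2-3", "feats": "Typo=Yes", "misc": None}
print(get_attrs_sketch(mwt_values, ["form", "ord", "feats_split", "misc"], undefs="_"))
```

With undefs="_" the empty misc value is printed as an underscore, which matches how empty columns are written in CoNLL-U.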

static is_empty()[source]

Is this an Empty node?

Always returns False because multi-word tokens cannot be empty nodes.

static is_leaf()[source]

Is this a node/mwt without any children?

Always returns True because multi-word tokens cannot have children.

static is_mwt()[source]

Is this a multi-word token?

Always returns True. False is returned only by instances of the Node class.

property misc

Property for MISC attributes stored as a DualDict object.

See udapi.core.node.Node for details.

property no_space_after

Boolean property as a shortcut for mwt.misc["SpaceAfter"] == "No".
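To show what the SpaceAfter=No flag is typically used for, here is a minimal stand-alone sketch that rebuilds surface text from (form, MISC) pairs. It does not use Udapi; parse_misc, no_space_after_sketch and detokenize are hypothetical helpers, and the real property reads the value from mwt.misc rather than from a raw string.

```python
def parse_misc(misc):
    """Parse a CoNLL-U MISC string such as 'SpaceAfter=No|Lang=es' into a dict."""
    if misc in (None, "", "_"):
        return {}
    pairs = (item.partition("=") for item in misc.split("|"))
    return {name: value for name, _, value in pairs}


def no_space_after_sketch(misc):
    """True if the MISC string contains SpaceAfter=No (hypothetical helper)."""
    return parse_misc(misc).get("SpaceAfter") == "No"


def detokenize(tokens):
    """Join (form, misc) pairs, omitting the space after SpaceAfter=No tokens."""
    parts = []
    for form, misc in tokens:
        parts.append(form)
        if not no_space_after_sketch(misc):
            parts.append(" ")
    return "".join(parts).rstrip()


# The MWT "del" is followed directly by the full stop, so it carries SpaceAfter=No.
print(detokenize([("Vengo", "_"), ("del", "SpaceAfter=No"), (".", "_")]))
```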

property ord_range

Return a string suitable for the first column of CoNLL-U.
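In CoNLL-U, the first column of a multi-word token line is a range of the word ids it covers, written as first-last. The following stand-alone sketch formats such a range; ord_range_sketch is a hypothetical function, and taking the smallest and largest id of the covered words is an assumption about how the range is derived.

```python
def ord_range_sketch(word_ords):
    """Format the CoNLL-U ID range for an MWT spanning the given word ids."""
    if not word_ords:
        raise ValueError("a multi-word token must span at least one word")
    return f"{min(word_ords)}-{max(word_ords)}"


# "del" covers the words with ids 2 ("de") and 3 ("el").
print(ord_range_sketch([2, 3]))  # 2-3
```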

remove()[source]

Delete this multi-word token (but keep its words).

root
words