udapi.block.read package

Submodules

udapi.block.read.addsentences module

AddSentences class is a reader for adding plain-text sentences.

class udapi.block.read.addsentences.AddSentences(zone='', into='text', **kwargs)[source]

Bases: udapi.core.basereader.BaseReader

A reader for adding plain-text sentences from files with one sentence per line.

The sentences are added to existing trees. This is useful, e.g., if the original raw texts are in a separate file:

cat in.conllu | udapy -s read.Conllu read.AddSentences files=in.txt > merged.conllu
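The merge performed by the command above can be pictured as pairing the n-th line of the text file with the n-th existing tree. The following is only a minimal sketch of that idea, not Udapi's implementation: `Tree` and `add_sentences` are hypothetical stand-ins, and the sketch assumes that the default `into='text'` means the line is stored as the tree's sentence text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Tree:
    """Hypothetical stand-in for a tree root; not a Udapi class."""
    sent_id: str
    text: Optional[str] = None


def add_sentences(trees: List[Tree], lines: Iterable[str]) -> List[Tree]:
    """Attach the n-th line to the n-th tree as its raw sentence text.

    Assumption: trees and lines are aligned one-to-one, in order.
    """
    for tree, line in zip(trees, lines):
        tree.text = line.rstrip("\n")
    return trees


# Two trees that so far carry no raw text, plus two lines of plain text.
trees = [Tree("1"), Tree("2")]
merged = add_sentences(trees, ["Hello world.\n", "How are you?\n"])
```

Note that `zip` stops at the shorter input, so a mismatch in length is silently ignored in this sketch; a real reader may handle that case differently.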

static is_multizone_reader()[source]

Can this reader read bundles which contain more zones?

This implementation always returns False.

process_document(document)[source]

Process a UD document.

udapi.block.read.conllu module

Conllu is a reader block for CoNLL-U files.

class udapi.block.read.conllu.Conllu(strict=False, separator='tab', empty_parent='warn', attributes='ord, form, lemma, upos, xpos, feats, head, deprel, deps, misc', **kwargs)[source]

Bases: udapi.core.basereader.BaseReader

A reader for CoNLL-U files.
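To illustrate how the `attributes` and `separator` parameters in the signature above might relate to a token line, here is a minimal sketch. It is not the reader's actual code: `split_token_line` is a hypothetical helper, and it assumes that `attributes` names the columns in order and that `separator='tab'` stands for the tab character.

```python
DEFAULT_ATTRIBUTES = "ord, form, lemma, upos, xpos, feats, head, deprel, deps, misc"


def split_token_line(line, attributes=DEFAULT_ATTRIBUTES, separator="\t"):
    """Map the columns of one token line to attribute names (hypothetical helper)."""
    names = [name.strip() for name in attributes.split(",")]
    values = line.rstrip("\n").split(separator)
    if len(values) != len(names):
        # Assumption: a column-count mismatch is treated as an error here.
        raise ValueError(f"expected {len(names)} columns, got {len(values)}")
    return dict(zip(names, values))


row = split_token_line("1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_")
```

All values stay strings in this sketch; converting `ord` and `head` to integers is left out on purpose.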

static parse_comment_line(line, root)[source]

Parse one comment line of a CoNLL-U file and fill sent_id, text, newpar, newdoc in root.
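The four fields named above correspond to the standard CoNLL-U comment lines (`# sent_id = ...`, `# text = ...`, `# newpar`, `# newdoc`). The sketch below shows one way such lines could be recognized. It is an illustration only: `parse_comment` is a hypothetical function that fills a plain dictionary instead of a root node, and it makes no claim about how `parse_comment_line` is implemented.

```python
import re


def parse_comment(line, meta):
    """Update `meta` from one '#' comment line; return True if it was recognized."""
    body = line.lstrip("#").strip()
    # Key-value comments: "sent_id = ..." and "text = ...".
    match = re.match(r"(sent_id|text)\s*=\s*(.*)$", body)
    if match:
        meta[match.group(1)] = match.group(2)
        return True
    # Boundary markers, optionally with an id: "newpar", "newdoc id = ...".
    match = re.match(r"(newpar|newdoc)(?:\s+id\s*=\s*(.*))?$", body)
    if match:
        # Assumption: a marker without an id is stored as True.
        meta[match.group(1)] = match.group(2) or True
        return True
    return False


meta = {}
for comment in ["# newdoc id = doc1", "# newpar", "# sent_id = s1", "# text = Dogs bark."]:
    parse_comment(comment, meta)
```

Comments that the sketch does not recognize are simply reported with `False`, so the caller can decide whether to keep them verbatim.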

read_tree()[source]

Load one (more) tree from self.files and return its root.

This method must be overridden in all readers. Usually it is the only method that needs to be implemented. The implementation in the base class raises NotImplementedError.
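The override contract described here follows a common pattern: the base class raises NotImplementedError and each concrete reader supplies its own `read_tree`. A minimal sketch of that pattern, using hypothetical classes (`BaseReaderSketch`, `OneLineReader`) rather than Udapi's own, and assuming that the end of input is signalled by returning None:

```python
class BaseReaderSketch:
    """Hypothetical base class; it mirrors only the contract described above."""

    def read_tree(self):
        raise NotImplementedError("read_tree must be overridden in a subclass")


class OneLineReader(BaseReaderSketch):
    """Hypothetical reader that treats each input line as one 'tree'."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def read_tree(self):
        # Assumption: None signals that the input is exhausted.
        return next(self._lines, None)


reader = OneLineReader(["first", "second"])
first = reader.read_tree()
second = reader.read_tree()
end = reader.read_tree()
```

Keeping all format-specific logic in one method means the surrounding machinery (file handling, document assembly) can stay in the base class.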

udapi.block.read.sentences module

Sentences class is a reader for plain-text sentences.

class udapi.block.read.sentences.Sentences(files='-', filehandle=None, zone='keep', bundles_per_doc=0, encoding='utf-8', sent_id_filter=None, split_docs=False, ignore_sent_id=False, **kwargs)[source]

Bases: udapi.core.basereader.BaseReader

A reader for files with plain-text sentences (one sentence per line).
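The one-sentence-per-line input format can be illustrated with a short sketch. `read_sentences` is a hypothetical generator, not part of Udapi; it assumes UTF-8 text that is already opened as a file-like object and keeps empty lines as empty sentences.

```python
import io


def read_sentences(handle):
    """Yield one sentence per input line, without the trailing newline."""
    for line in handle:
        yield line.rstrip("\n")


# io.StringIO stands in for an opened text file.
sentences = list(read_sentences(io.StringIO("Dogs bark.\nCats meow.\n")))
```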

static is_multizone_reader()[source]

Can this reader read bundles which contain more zones?

This implementation always returns False.

read_tree(document=None)[source]

Load one (more) tree from self.files and return its root.

This method must be overridden in all readers. Usually it is the only method that needs to be implemented. The implementation in the base class raises NotImplementedError.

udapi.block.read.vislcg module

Vislcg is a reader block for the VISL-cg format.

class udapi.block.read.vislcg.Vislcg(files='-', filehandle=None, zone='keep', bundles_per_doc=0, encoding='utf-8', sent_id_filter=None, split_docs=False, ignore_sent_id=False, **kwargs)[source]

Bases: udapi.core.basereader.BaseReader

A reader of the VISL-cg format, suitable for the VISL Constraint Grammar Parser.

read_tree()[source]

Load one (more) tree from self.files and return its root.

This method must be overridden in all readers. Usually it is the only method that needs to be implemented. The implementation in the base class raises NotImplementedError.

Module contents