udapi.block.read.sentences module

Sentences class is a reader for plain-text sentences.

class udapi.block.read.sentences.Sentences(ignore_empty_lines=False, newdoc_if_empty_line=False, rstrip='\r\n ', **kwargs)[source]

Bases: BaseReader

A reader for plain-text files with one sentence per line.

Args:

ignore_empty_lines: if True, delete empty lines from the input. Default=False.

newdoc_if_empty_line: if True, empty lines mark document boundaries, which are marked with root.newdoc. Default=False.

rstrip: a set of characters to be stripped from the end of each line. Default='\r\n '. You can use rstrip='\n' if you want to preserve any space or '\r' (Carriage Return) at end of line, so that udpipe.Base keeps these characters in SpacesAfter. As most blocks do not expect whitespace other than a space to appear in the processed text, using this feature is at your own risk.
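The interaction of the three arguments can be illustrated with a small stand-alone sketch. It does not import udapi and is not the library's implementation: the function name read_sentences and the (text, newdoc) tuples are hypothetical stand-ins for the reader and for the root.text and root.newdoc attributes. It also assumes that an "empty line" means a line consisting only of a line terminator.

```python
def read_sentences(lines, ignore_empty_lines=False,
                   newdoc_if_empty_line=False, rstrip='\r\n '):
    """Hypothetical sketch: return a list of (text, newdoc) pairs.

    `lines` is any iterable of raw lines including their terminators.
    """
    results = []
    after_empty = False  # True if at least one empty line was just skipped
    for line in lines:
        # Assumption: only bare line terminators count as empty lines.
        if line in ('\n', '\r\n') and (ignore_empty_lines or newdoc_if_empty_line):
            after_empty = True
            continue
        # str.rstrip treats its argument as a set of characters to remove.
        text = line.rstrip(rstrip)
        # A sentence is flagged only if newdoc_if_empty_line is enabled
        # and it directly follows one or more empty lines.
        results.append((text, newdoc_if_empty_line and after_empty))
        after_empty = False
    return results


raw = ["Hello world. \n", "\n", "Second doc.\r\n"]

# Defaults: the empty line is kept as an empty sentence.
default_result = read_sentences(raw)

# Empty lines become document boundaries instead of sentences.
newdoc_result = read_sentences(raw, newdoc_if_empty_line=True)

# rstrip='\n' keeps the trailing space and the carriage return.
keep_ws_result = read_sentences(raw, newdoc_if_empty_line=True, rstrip='\n')
```

With rstrip='\n', the first sentence keeps its trailing space and the last one keeps its '\r', which is the situation the warning above refers to.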

after_process_document(document)[source]

This method is called after each process_document.

static is_multizone_reader()[source]

Can this reader read bundles which contain more zones?

This implementation always returns False.

read_tree(document=None)[source]

Load one (more) tree from self.filehandle and return its root.

This method must be overridden in all readers. Usually it is the only method that needs to be implemented. The implementation in the base class raises NotImplementedError.
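The override pattern described here can be sketched without udapi installed. All class names below (SketchBaseReader, SketchRoot, SketchSentences) are hypothetical stand-ins, not the library's classes, and returning None at the end of input is an assumption made for this sketch.

```python
import io


class SketchRoot:
    """Stand-in for a tree root; only carries the sentence text."""

    def __init__(self, text):
        self.text = text


class SketchBaseReader:
    """Stand-in base reader: subclasses are expected to override read_tree."""

    def __init__(self, filehandle):
        self.filehandle = filehandle

    def read_tree(self, document=None):
        raise NotImplementedError(
            "Class %s does not implement read_tree" % type(self).__name__)


class SketchSentences(SketchBaseReader):
    """Reads one line per call and returns it as a root."""

    def __init__(self, filehandle, rstrip='\r\n '):
        super().__init__(filehandle)
        self.rstrip = rstrip

    def read_tree(self, document=None):
        line = self.filehandle.readline()
        # readline() returns '' only at end of file; a blank line is '\n'.
        if line == '':
            return None  # assumption: None signals that no tree is left
        return SketchRoot(line.rstrip(self.rstrip))


reader = SketchSentences(io.StringIO("First sentence.\nSecond sentence.\n"))
texts = []
while True:
    root = reader.read_tree()
    if root is None:
        break
    texts.append(root.text)
```

Calling read_tree on the stand-in base class directly raises NotImplementedError, mirroring the behavior described for the base class above.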