udapi.core.basereader module

BaseReader is the base class for all reader blocks.

class udapi.core.basereader.BaseReader(files='-', filehandle=None, zone='keep', bundles_per_doc=0, encoding='utf-8-sig', sent_id_filter=None, split_docs=False, ignore_sent_id=False, merge=False, max_docs=0, **kwargs)

Bases: Block

Base class for all reader blocks.

property file_number

Property with the current file number (1-based).

property filehandle

Property with the current file handle.

property filename

Property with the current filename.

filtered_read_tree()

Load and return one more tree matching the sent_id_filter.

This method uses read_tree() internally. This is the method called by process_document.
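The filtering loop can be sketched as follows. This is a minimal illustration, not udapi's code: it assumes that sent_id_filter is a regular expression matched (with re.match, i.e. anchored at the start) against each tree's sent_id, and that read_tree() returns None at the end of input. The Tree class and passing read_tree as a callable are hypothetical simplifications.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tree:
    """Hypothetical minimal tree: only the field the filter looks at."""
    sent_id: str


def filtered_read_tree(read_tree: Callable[[], Optional[Tree]],
                       sent_id_filter: Optional[re.Pattern]) -> Optional[Tree]:
    """Return the next tree whose sent_id matches the filter, or None at end of input."""
    while True:
        tree = read_tree()
        # No filter, or no more trees: pass the result straight through.
        if sent_id_filter is None or tree is None:
            return tree
        # re.match anchors at the start of sent_id (it is not a full match).
        if sent_id_filter.match(tree.sent_id):
            return tree
        # Non-matching tree: skip it and read the next one.
```

Usage: with trees `s1, train-s2, s3, train-s4` and the pattern `re.compile("train-")`, successive calls return `train-s2`, then `train-s4`, then None.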

static is_multizone_reader()

Can this reader read bundles that contain more than one zone?

This implementation always returns True. If a subclass supports just one zone per file (e.g. read.Sentences), it should override this method to return False, so that process_document can take advantage of this knowledge and optimize the reading (no buffer is needed even if bundles_per_doc is specified).
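Why a single-zone reader needs no buffer can be illustrated with a hypothetical sketch that groups trees into documents of at most bundles_per_doc bundles. Trees are modelled as (bundle_id, zone) tuples, and bundles_per_doc <= 0 is treated as "no limit"; both are assumptions of this sketch, not udapi's data model or its actual buffering code. With several zones, a document is only known to be complete once the first tree of the next bundle has been read, so that tree has to be held back; with one zone per file, every tree is a whole bundle and the document can be closed at once.

```python
from typing import Iterable, Iterator, List, Tuple

Tree = Tuple[str, str]   # hypothetical (bundle_id, zone) pair standing in for a tree
Bundle = List[Tree]
Doc = List[Bundle]


def split_docs(trees: Iterable[Tree], bundles_per_doc: int,
               multizone: bool = True) -> Iterator[Doc]:
    """Group consecutive trees into bundles, and bundles into documents."""
    doc: Doc = []
    for tree in trees:
        bundle_id = tree[0]
        if multizone and doc and doc[-1][0][0] == bundle_id:
            doc[-1].append(tree)  # another zone of the current bundle
            continue
        if multizone and bundles_per_doc > 0 and len(doc) == bundles_per_doc:
            # Only now, holding the first tree of a new bundle, is the doc known to be complete.
            yield doc
            doc = []
        doc.append([tree])
        if not multizone and bundles_per_doc > 0 and len(doc) == bundles_per_doc:
            # One zone per file: each tree is a full bundle, so close the doc right away.
            yield doc
            doc = []
    if doc:
        yield doc
```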

next_filehandle()

Go to the next file and return its filehandle.
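The bookkeeping behind next_filehandle() and the file_number, filename and filehandle properties can be sketched with a hypothetical FileCursor class (not part of udapi). The sketch assumes the input files are given as a Python list, that "-" means standard input (suggested by the files='-' default), and that files are opened with the reader's default encoding 'utf-8-sig'. file_number is 1-based, so it is 0 before the first file is opened.

```python
import sys
from typing import List, Optional, TextIO


class FileCursor:
    """Hypothetical stand-in for a reader's file bookkeeping (not udapi's API)."""

    def __init__(self, filenames: List[str], encoding: str = "utf-8-sig"):
        self._filenames = filenames
        self._encoding = encoding
        self._index = -1                  # index of the current file; -1 = none opened yet
        self._fh: Optional[TextIO] = None

    @property
    def file_number(self) -> int:
        """Current file number, 1-based (0 before the first file is opened)."""
        return self._index + 1

    @property
    def filename(self) -> Optional[str]:
        """Name of the current file, or None if no file is open."""
        if 0 <= self._index < len(self._filenames):
            return self._filenames[self._index]
        return None

    @property
    def filehandle(self) -> Optional[TextIO]:
        return self._fh

    def next_filehandle(self) -> Optional[TextIO]:
        """Close the current file, open the next one and return it (None when exhausted)."""
        if self._fh is not None and self._fh is not sys.stdin:
            self._fh.close()
        self._index += 1
        if self._index >= len(self._filenames):
            self._fh = None
        else:
            name = self._filenames[self._index]
            # "-" is treated as standard input in this sketch.
            self._fh = sys.stdin if name == "-" else open(name, encoding=self._encoding)
        return self._fh
```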

process_document(document)

Process a UD document, i.e. load trees into it (using filtered_read_tree()).

read_documents()

Load all documents of this reader and return them as a list.

read_tree()

Load one (more) tree from self.filehandle and return its root.

This method must be overridden in all readers. Usually it is the only method that needs to be implemented. The implementation in this base class raises NotImplementedError.
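The read_tree() contract can be illustrated with a self-contained sketch. MiniBaseReader, Root and OneSentencePerLine are hypothetical stand-ins (a real reader would subclass udapi's BaseReader and build real tree roots), and the sketch assumes the convention that read_tree() returns None once the input is exhausted. The base method raises NotImplementedError, and the subclass only has to supply read_tree().

```python
import io
from typing import List, Optional, TextIO


class Root:
    """Hypothetical minimal tree root: only the sentence text."""

    def __init__(self, text: str):
        self.text = text


class MiniBaseReader:
    """Hypothetical stand-in for BaseReader, reduced to the read_tree() contract."""

    def __init__(self, filehandle: TextIO):
        self.filehandle = filehandle

    def read_tree(self) -> Optional[Root]:
        # The base class cannot know the file format, so subclasses must override this.
        raise NotImplementedError("read_tree() must be overridden")

    def read_all(self) -> List[Root]:
        # Call read_tree() repeatedly until it signals end of input by returning None.
        roots = []
        while (root := self.read_tree()) is not None:
            roots.append(root)
        return roots


class OneSentencePerLine(MiniBaseReader):
    """Plain-text reader: each non-empty line becomes one tree."""

    def read_tree(self) -> Optional[Root]:
        for line in self.filehandle:
            line = line.rstrip("\r\n")
            if line:  # skip blank lines
                return Root(line)
        return None  # end of file


reader = OneSentencePerLine(io.StringIO("Hello world.\n\nSecond one.\n"))
print([root.text for root in reader.read_all()])  # ['Hello world.', 'Second one.']
```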

read_trees()

Load all trees from self.filehandle and return a list of their roots.

This method may be overridden in a reader if a faster alternative to read_tree() is needed. The implementation in this base class raises NotImplementedError.

try_fast_load(document)

Try to use self.read_trees() if possible and return True, otherwise False.
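The fast-path-with-fallback idea can be sketched as a try/except around read_trees(): when a reader does not override it, the base-class NotImplementedError tells the caller to fall back to read_tree(). Reader, FastReader and the list-of-strings "document" are hypothetical stand-ins; this sketch only covers the NotImplementedError fallback and does not reproduce any other checks the real method may perform.

```python
from typing import List, Optional


class Reader:
    """Hypothetical reader: read_tree() is implemented, read_trees() is not."""

    def __init__(self, lines: List[str]):
        self._lines = list(lines)

    def read_tree(self) -> Optional[str]:
        # Slow path: one "tree" (here just a string) per call, None at end of input.
        return self._lines.pop(0) if self._lines else None

    def read_trees(self) -> List[str]:
        # Mirrors the documented base-class behaviour: no bulk loader by default.
        raise NotImplementedError


class FastReader(Reader):
    def read_trees(self) -> List[str]:
        # Bulk path: hand over all remaining trees at once.
        trees, self._lines = self._lines, []
        return trees


def try_fast_load(reader: Reader, document: List[str]) -> bool:
    """Fill document via read_trees() if the reader supports it; report success."""
    try:
        trees = reader.read_trees()
    except NotImplementedError:
        return False  # the caller should fall back to calling read_tree() repeatedly
    document.extend(trees)
    return True
```

A caller would first try the fast path and, if it returns False, loop over read_tree() until it returns None.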