udapi.core.basereader module
BaseReader is the base class for all reader blocks.
- class udapi.core.basereader.BaseReader(files='-', filehandle=None, zone='keep', bundles_per_doc=0, encoding='utf-8-sig', sent_id_filter=None, split_docs=False, ignore_sent_id=False, merge=False, max_docs=0, **kwargs)
Bases: Block
Base class for all reader blocks.
- property file_number
Property with the current file number (1-based).
- property filehandle
Property with the current file handle.
- property filename
Property with the current filename.
- filtered_read_tree()
Load and return one more tree matching the sent_id_filter.
This method uses read_tree() internally. This is the method called by process_document.
- static is_multizone_reader()
Can this reader read bundles which contain more than one zone?
This implementation always returns True. If a subclass supports just one zone per file (e.g. read.Sentences), this method should be overridden to return False, so that process_document can take advantage of this knowledge and optimize the reading (no buffer is needed even if bundles_per_doc is specified).
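The override pattern described above can be sketched without importing udapi. The classes ToyReader and ToySingleZoneReader and the helper needs_buffer are hypothetical stand-ins invented for this illustration; the buffering rule in needs_buffer is only one reading of the sentence above, not udapi's actual process_document logic.

```python
class ToyReader:
    """Hypothetical stand-in for a reader base class (not udapi's BaseReader)."""

    @staticmethod
    def is_multizone_reader():
        # Default: assume a bundle may contain several zones.
        return True


class ToySingleZoneReader(ToyReader):
    """Hypothetical reader whose input holds exactly one zone per file."""

    @staticmethod
    def is_multizone_reader():
        # Overridden so that the caller can skip buffering.
        return False


def needs_buffer(reader, bundles_per_doc):
    """Hypothetical helper: buffer only when documents are split by bundle
    count AND the reader may produce several zones per bundle."""
    return bool(bundles_per_doc) and reader.is_multizone_reader()


print(needs_buffer(ToyReader, 10))            # multizone reader, split requested
print(needs_buffer(ToySingleZoneReader, 10))  # single-zone reader, split requested
print(needs_buffer(ToyReader, 0))             # no split requested
```

Because the method is static, the caller can query it on the class or on an instance without any reader state being set up.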
- read_tree()
Load one (more) tree from self.filehandle and return its root.
This method must be overridden in all readers. Usually it is the only method that needs to be implemented. The implementation in this base class raises NotImplementedError.
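The contract between read_tree() and filtered_read_tree() can be illustrated with a self-contained toy that does not import udapi. Everything here (ToyRoot, ToyBaseReader, ToyLineReader, the tab-separated input format) is hypothetical. The sketch also assumes that sent_id_filter is treated as a regular expression and that read_tree() returns None at the end of input; neither detail is stated on this page.

```python
import io
import re


class ToyRoot:
    """Hypothetical stand-in for a tree root; carries only a sent_id and text."""

    def __init__(self, sent_id, text):
        self.sent_id = sent_id
        self.text = text


class ToyBaseReader:
    """Toy stand-in mirroring the documented contract, not udapi's code."""

    def __init__(self, filehandle, sent_id_filter=None):
        self.filehandle = filehandle
        # Assumption: the filter is a regular expression over sentence ids.
        self.sent_id_filter = re.compile(sent_id_filter) if sent_id_filter else None

    def read_tree(self):
        raise NotImplementedError("subclasses must implement read_tree()")

    def filtered_read_tree(self):
        # Keep calling read_tree() until a tree matches or the input ends.
        while True:
            root = self.read_tree()
            if root is None or self.sent_id_filter is None:
                return root
            if self.sent_id_filter.fullmatch(root.sent_id):
                return root


class ToyLineReader(ToyBaseReader):
    """One 'tree' per non-empty line, formatted as '<sent_id><TAB><text>'."""

    def read_tree(self):
        for line in self.filehandle:
            line = line.rstrip("\n")
            if line:
                sent_id, _, text = line.partition("\t")
                return ToyRoot(sent_id, text)
        return None  # assumed end-of-input signal


data = io.StringIO("a1\tHello\nb1\tSkip me\na2\tWorld\n")
reader = ToyLineReader(data, sent_id_filter=r"a\d+")
texts = []
while (root := reader.filtered_read_tree()) is not None:
    texts.append(root.text)
print(texts)  # ['Hello', 'World']
```

The subclass implements only read_tree(); the filtering loop lives in the base class, which matches the statement that read_tree() is usually the only method a reader needs to implement.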