udapi.block.read.ccv module

The Ccv class is a reader for Corpus of Czech Verse JSON files.

class udapi.block.read.ccv.Ccv(tokenize=True, **kwargs)

Bases: BaseReader

A reader for Corpus of Czech Verse JSON files.

See https://github.com/versotym/corpusCzechVerse

Each verse (line) is stored as one tree, although a verse is quite often not a whole sentence. The start of each stanza is marked with newpar. The start of each poem is marked with newdoc = [poem_id].

Args: tokenize: create nodes (default: True)
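
The marking scheme described above (one tree per verse, newpar at each stanza start, newdoc at each poem start) can be illustrated with a minimal sketch in plain Python. It does not use Udapi and is not the reader's actual implementation. The input layout (a list of poems, each with "poem_id" and a "body" of stanzas holding verse strings) and the function name iter_verse_records are assumptions made for illustration only; the real corpus schema may differ.

```python
import json

# Hypothetical input layout (an assumption, not the real corpus schema):
# a list of poems, each with a "poem_id" and a "body" holding stanzas,
# where every stanza is a list of verse strings.
SAMPLE = json.dumps([
    {"poem_id": "p1", "body": [["verse a", "verse b"], ["verse c"]]},
    {"poem_id": "p2", "body": [["verse d"]]},
])


def iter_verse_records(json_text):
    """Yield one record per verse, flagging stanza and poem starts."""
    for poem in json.loads(json_text):
        first_in_poem = True
        for stanza in poem["body"]:
            first_in_stanza = True
            for verse in stanza:
                yield {
                    "text": verse,
                    # newdoc carries the poem id only on the poem's first verse
                    "newdoc": poem["poem_id"] if first_in_poem else None,
                    # newpar is set only on the first verse of each stanza
                    "newpar": first_in_stanza,
                }
                first_in_poem = False
                first_in_stanza = False


records = list(iter_verse_records(SAMPLE))
for rec in records:
    print(rec)
```

Each yielded record stands in for one tree root: the first verse of a poem carries both the newdoc value and the newpar flag, and later stanza-initial verses carry newpar only.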

static is_multizone_reader()

Can this reader read bundles that contain more than one zone?

This implementation always returns False.

read_tree()

Load one (more) tree from self.filehandle and return its root.

This method must be overridden in all readers. Usually it is the only method that needs to be implemented. The implementation in the base class raises NotImplementedError.

read_trees()

Load all trees from self.filehandle and return a list of their roots.

This method may be overridden in a reader if a faster alternative to read_tree() is needed. The implementation in the base class raises NotImplementedError.
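
The division of labour between read_tree() and read_trees() can be sketched with a toy class hierarchy. This is an illustration only: SketchBaseReader and LineReader are hypothetical names, not Udapi classes, "trees" are plain dicts, and returning None at the end of input is an assumed convention for this sketch.

```python
import io


class SketchBaseReader:
    """Hypothetical stand-in for a base reader; not udapi's BaseReader."""

    def __init__(self, filehandle):
        self.filehandle = filehandle

    def read_tree(self):
        # Every concrete reader is expected to override this method.
        raise NotImplementedError("subclasses must implement read_tree()")

    def read_trees(self):
        # Optional bulk alternative; overridden only when it is faster.
        raise NotImplementedError("optional bulk alternative to read_tree()")


class LineReader(SketchBaseReader):
    """Toy reader: every non-empty line becomes one 'tree' (a plain dict)."""

    def read_tree(self):
        # Return the next tree, or None once the input is exhausted
        # (the None convention is an assumption of this sketch).
        while True:
            line = self.filehandle.readline()
            if not line:
                return None
            text = line.strip()
            if text:
                return {"text": text}

    def read_trees(self):
        # Bulk variant: consume the whole remaining input in one pass.
        lines = self.filehandle.read().splitlines()
        return [{"text": t.strip()} for t in lines if t.strip()]


one_by_one = LineReader(io.StringIO("first verse\n\nsecond verse\n"))
print(one_by_one.read_tree())
print(one_by_one.read_tree())

bulk = LineReader(io.StringIO("first verse\n\nsecond verse\n"))
print(bulk.read_trees())
```

A reader that only overrides read_tree() is complete on its own; the bulk method is worth adding when parsing the whole input at once, as a JSON file typically requires, is cheaper than producing trees one at a time.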