udapi.block.write package
Submodules
udapi.block.write.conllu module
Conllu class is a writer of files in the CoNLL-U format.
class udapi.block.write.conllu.Conllu(print_sent_id=True, print_text=True, print_empty_trees=True, **kwargs)
Bases: udapi.core.basewriter.BaseWriter
A writer of files in the CoNLL-U format.
udapi.block.write.html module
Html class is a writer for HTML+JavaScript+SVG visualization of dependency trees.
class udapi.block.write.html.Html(path_to_js='web', **kwargs)
Bases: udapi.core.basewriter.BaseWriter
A writer for HTML+JavaScript+SVG visualization of dependency trees.
# from the command line
udapy write.Html < file.conllu > file.html
firefox file.html
For offline use, first download the three JavaScript libraries:
wget https://code.jquery.com/jquery-2.1.4.min.js
wget https://cdn.rawgit.com/eligrey/FileSaver.js/master/FileSaver.min.js
wget https://cdn.rawgit.com/ufal/js-treex-view/gh-pages/js-treex-view.js
udapy write.Html path_to_js=. < file.conllu > file.html
firefox file.html
This writer produces an html file with drawings of the dependency trees in the document (there are buttons for selecting which bundle will be shown). Under each node its form, upos and deprel are shown. In the tooltip its lemma and (morphological) features are shown. After clicking the node, all other attributes are shown. When hovering over a node, the respective word in the (plain text) sentence is highlighted. There is a button for downloading trees as SVG files.
Three JavaScript libraries are required (jquery, FileSaver and js-treex-view). By default they are linked online (so Internet access is needed when viewing), but they can also be downloaded locally (so offline browsing is possible and loading is faster): see the Usage example above.
This block is based on Treex::View but takes a different approach. Treex::View depends on an older version of Valence (a Perl interface to Electron) and comes with a script view-treex, which takes a treex file, converts it to json behind the scenes (which is quite slow) and displays the json in a Valence window.
This block writes the json code directly into the html file, so it can be viewed with any browser or even published online. (Most of the html file is actually the json.)
When viewing the html file, the JavaScript library js-treex-view generates an svg on the fly from the json.
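The idea of embedding the data directly in the page can be sketched as follows. This is only an illustration, not the code of write.Html: the function name render_html, the JSON layout of a tree and the variable name data are hypothetical, and the real file contains more than this.

```python
import html
import json


def render_html(trees, path_to_js="web"):
    """Return an HTML page with the trees embedded as a JSON literal (sketch)."""
    # Escape "</" so the JSON cannot close the <script> element early.
    data = json.dumps(trees, ensure_ascii=False).replace("</", "<\\/")
    return "\n".join([
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        # The library is loaded from path_to_js (hypothetical layout).
        f'<script src="{html.escape(path_to_js)}/js-treex-view.js"></script>',
        "</head><body>",
        # The data travels inside the page itself, so no server is needed.
        f"<script>var data = {data};</script>",
        "</body></html>",
    ])


# Hypothetical, heavily simplified tree representation.
trees = [{"form": "saw", "upos": "VERB", "deprel": "root"}]
page = render_html(trees, path_to_js=".")
print(page)
```

Because the page carries its own data, opening it requires only a browser and the JavaScript libraries, which is the property the paragraphs above describe.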
udapi.block.write.sdparse module
Sdparse class is a writer for Stanford dependencies format.
class udapi.block.write.sdparse.Sdparse(print_upos=True, print_feats=False, always_ord=False, **kwargs)
Bases: udapi.core.basewriter.BaseWriter
A writer of files in the Stanford dependencies format, suitable for Brat visualization.
Usage:
udapy write.Sdparse print_upos=0 < in.conllu
Example output:
~~~ sdparse
Corriere Sport da pagina 23 a pagina 26
name(Corriere, Sport)
case(pagina-4, da)
nmod(Corriere, pagina-4)
nummod(pagina-4, 23)
case(pagina-7, a)
nmod(Corriere, pagina-7)
nummod(pagina-7, 26)
~~~
To visualize it, use embedded Brat, e.g. go to http://universaldependencies.org/visualization.html#editing. Click the edit button and paste the output of this writer excluding the ~~~ marks.
Notes: The original Stanford dependencies format allows explicit specification of the root dependency, e.g. root(ROOT-0, makes-8). However, this is not allowed by Brat, so this writer does not print it.
UD v2.0 allows tokens with spaces, but I am not aware of any Brat support.
Alternatives:
- write.Conllu: Brat now also supports CoNLL-U input
- write.TextModeTrees: may be more readable/useful in some use cases
- write.Html: ditto; press the “Save as SVG” button and convert to pdf
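The example output above can be reproduced with a short sketch. It is not the code of write.Sdparse: the function to_sdparse and the tuple layout are hypothetical, and the rule that "-ord" is appended only to forms occurring more than once (unless always_ord is set) is inferred from the example, not confirmed by the source. The print_upos and print_feats options are left out.

```python
from collections import Counter


def to_sdparse(tokens, always_ord=False):
    """tokens: list of (ord, form, head_ord, deprel) tuples in word order."""
    form_counts = Counter(form for _, form, _, _ in tokens)
    form_of = {ord_: form for ord_, form, _, _ in tokens}

    def label(ord_):
        form = form_of[ord_]
        # Assumption: the ord is needed only to disambiguate repeated forms.
        if always_ord or form_counts[form] > 1:
            return f"{form}-{ord_}"
        return form

    lines = ["~~~ sdparse", " ".join(form for _, form, _, _ in tokens)]
    for ord_, _, head, deprel in tokens:
        if head == 0:
            continue  # the root dependency is not printed (Brat rejects it)
        lines.append(f"{deprel}({label(head)}, {label(ord_)})")
    lines.append("~~~")
    return "\n".join(lines)


tokens = [
    (1, "Corriere", 0, "root"),
    (2, "Sport", 1, "name"),
    (3, "da", 4, "case"),
    (4, "pagina", 1, "nmod"),
    (5, "23", 4, "nummod"),
    (6, "a", 7, "case"),
    (7, "pagina", 1, "nmod"),
    (8, "26", 7, "nummod"),
]
print(to_sdparse(tokens))
```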
udapi.block.write.sentences module
Sentences class is a writer for plain-text sentences.
class udapi.block.write.sentences.Sentences(if_missing='detokenize', **kwargs)
Bases: udapi.core.basewriter.BaseWriter
A writer of plain-text sentences (one per line).
Usage: udapy write.Sentences if_missing=empty < my.conllu > my.txt
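A rough sketch of what the if_missing parameter might control, assuming it decides what is printed when a tree has no stored sentence text. The function sentence_text and its input layout are hypothetical; only the values 'detokenize' and 'empty' appear on this page, and detokenization via SpaceAfter=No is an assumption based on the CoNLL-U convention.

```python
def sentence_text(text, tokens, if_missing="detokenize"):
    """text: stored sentence text or None; tokens: list of (form, misc) pairs."""
    if text is not None:
        return text
    if if_missing == "empty":
        return ""
    if if_missing == "detokenize":
        parts = []
        for form, misc in tokens:
            parts.append(form)
            # CoNLL-U marks a missing space with SpaceAfter=No in MISC.
            if "SpaceAfter=No" not in misc.split("|"):
                parts.append(" ")
        return "".join(parts).rstrip()
    raise ValueError(f"value not covered by this sketch: {if_missing!r}")


tokens = [("I", "_"), ("saw", "_"), ("a", "_"), ("dog", "SpaceAfter=No"), (".", "_")]
print(sentence_text(None, tokens))
print(repr(sentence_text(None, tokens, if_missing="empty")))
```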
udapi.block.write.textmodetrees module
An ASCII pretty printer of dependency trees.
class udapi.block.write.textmodetrees.TextModeTrees(print_sent_id=True, print_text=True, add_empty_line=True, indent=1, minimize_cross=True, color='auto', attributes='form, upos, deprel', print_undef_as='_', print_doc_meta=True, print_comments=False, mark='ToDo|ToDoOrigText|Bug|Mark', marked_only=False, hints=True, **kwargs)
Bases: udapi.core.basewriter.BaseWriter
An ASCII pretty printer of dependency trees.
# from the command line (visualize CoNLL-U files)
udapy write.TextModeTrees color=1 < file.conllu | less -R
In scenario (examples of other parameters):
write.TextModeTrees indent=1 print_sent_id=1 print_text=1
write.TextModeTrees zones=en,cs attributes=form,lemma,upos minimize_cross=0
This block prints dependency trees in plain-text format. For example the following CoNLL-U file (with tabs instead of spaces):
1 I I PRON PRP Number=Sing|Person=1 2 nsubj _ _
2 saw see VERB VBD Tense=Past 0 root _ _
3 a a DET DT Definite=Ind 4 det _ _
4 dog dog NOUN NN Number=Sing 2 dobj _ _
5 today today NOUN NN Number=Sing 2 nmod:tmod _ SpaceAfter=No
6 , , PUNCT , _ 2 punct _ _
7 which which DET WDT PronType=Rel 10 nsubj _ _
8 was be VERB VBD Person=3|Tense=Past 10 cop _ _
9 a a DET DT Definite=Ind 10 det _ _
10 boxer boxer NOUN NN Number=Sing 4 acl:relcl _ SpaceAfter=No
11 . . PUNCT . _ 2 punct _ _
will be printed (with the default parameters) as:
─┮
 │ ╭─╼ I PRON nsubj
 ╰─┾ saw VERB root
   │                        ╭─╼ a DET det
   ├────────────────────────┾ dog NOUN dobj
   ├─╼ today NOUN nmod:tmod │
   ├─╼ , PUNCT punct        │
   │                        │ ╭─╼ which DET nsubj
   │                        │ ├─╼ was VERB cop
   │                        │ ├─╼ a DET det
   │                        ╰─┶ boxer NOUN acl:relcl
   ╰─╼ . PUNCT punct
Some non-projective trees cannot be printed without crossing edges. TextModeTrees uses a special “bridge” symbol ─╪─ to mark this:
─┮
 │ ╭─╼ 1
 ├─╪───┮ 2
 ╰─┶ 3 │
       ╰─╼ 4
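To make "non-projective" concrete, here is a small sketch that lists the edges which span a word not dominated by the edge's head. It is not part of TextModeTrees; the function name and the heads mapping are hypothetical, and the head assignments below are my reading of the small drawing above (1 attached to 3, 4 attached to 2, and 2 and 3 attached to the root).

```python
def nonprojective_edges(heads):
    """heads maps each node's ord to its parent's ord (0 = artificial root)."""

    def is_descendant(node, ancestor):
        # Walk up from node; stop at the artificial root.
        while node != 0:
            node = heads[node]
            if node == ancestor:
                return True
        return False

    edges = []
    for dep in sorted(heads):
        head = heads[dep]
        if head == 0:
            continue  # edges from the artificial root are not checked here
        lo, hi = sorted((head, dep))
        # An edge is non-projective if it spans a word its head does not dominate.
        if any(not is_descendant(mid, head) for mid in range(lo + 1, hi)):
            edges.append((head, dep))
    return edges


heads = {1: 3, 2: 0, 3: 0, 4: 2}
print(nonprojective_edges(heads))  # [(3, 1), (2, 4)]
```

Both listed edges cross each other in any left-to-right layout, which is the situation the bridge symbol marks.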
By default, color=auto, so if the output is printed to the console (not to a file or pipe), each node attribute is printed in a different color. If a given node’s MISC contains any of the ToDo, Bug or Mark attributes (or any other specified in the parameter mark), the node will be highlighted (by reversing the background and foreground colors).
This block’s method process_tree can be called on any node (not only the root), which is useful for printing subtrees using node.print_subtree(), which is internally implemented using this block.
SEE ALSO TextModeTreesHtml
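The highlighting rule can be sketched as a check of MISC attribute names against the mark pattern. This is an illustration under assumptions: the function is_marked is hypothetical, MISC is taken as the raw CoNLL-U string (Key=Value pairs separated by "|"), and whether udapi requires a full match of the attribute name or only a partial one is not stated on this page.

```python
import re


def is_marked(misc, mark="ToDo|ToDoOrigText|Bug|Mark"):
    """Return True if any MISC attribute name fully matches the mark pattern."""
    if misc in ("", "_"):
        return False  # "_" is the CoNLL-U placeholder for an empty MISC
    pattern = re.compile(mark)
    keys = (item.split("=", 1)[0] for item in misc.split("|"))
    # Assumption: the whole attribute name must match, not just a prefix.
    return any(pattern.fullmatch(key) for key in keys)


print(is_marked("SpaceAfter=No|Bug=wrong-head"))  # True
print(is_marked("SpaceAfter=No"))                 # False
```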
before_process_document(document)
Initialize ANSI colors if color is True or 'auto'.
If color == 'auto', detect whether sys.stdout is interactive (a terminal, not redirected to a file).
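The 'auto' detection described here is commonly done with isatty(). The following is a minimal sketch of that idea, not the method's actual body; the helper name resolve_color and its stream argument are hypothetical.

```python
import io
import sys


def resolve_color(color, stream=None):
    """Turn a color setting (True, False or 'auto') into a boolean."""
    if color == "auto":
        stream = sys.stdout if stream is None else stream
        # isatty() is True only when the stream is attached to a terminal.
        return hasattr(stream, "isatty") and stream.isatty()
    return bool(color)


print(resolve_color("auto", io.StringIO()))  # False: an in-memory buffer is not a terminal
print(resolve_color(True))                   # True
```

This is why piping the output through less needs an explicit color=1, as in the command-line example for this block.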
udapi.block.write.textmodetreeshtml module
An ASCII pretty printer of colored dependency trees in HTML.
class udapi.block.write.textmodetreeshtml.TextModeTreesHtml(color=True, title='Udapi visualization', **kwargs)
Bases: udapi.block.write.textmodetrees.TextModeTrees
An ASCII pretty printer of colored dependency trees in HTML.
SYNOPSIS
# from the command line (visualize CoNLL-U files)
udapy write.TextModeTreesHtml < file.conllu > file.html
This block is a subclass of TextModeTrees, see its documentation for more info.
before_process_document(document)
Initialize ANSI colors if color is True or 'auto'.
If color == 'auto', detect whether sys.stdout is interactive (a terminal, not redirected to a file).
udapi.block.write.tikz module
Tikz class is a writer for LaTeX with tikz-dependency.
class udapi.block.write.tikz.Tikz(print_sent_id=True, print_text=True, print_preambule=True, attributes='form, upos', **kwargs)
Bases: udapi.core.basewriter.BaseWriter
A writer of LaTeX files that use the tikz-dependency package.
Usage:
udapy write.Tikz < my.conllu > my.tex
pdflatex my.tex
xdg-open my.pdf
Long sentences may result in pictures that are too large. You can tune the width (in addition to changing the font size or using minipage and rescaling) with
\begin{deptext}[column sep=0.2cm]
or individually for each word:
My \&[.5cm] dog \& etc.
By default, the height of the horizontal segment of a dependency edge is proportional to the distance between the linked words. You can tune the height with:
\depedge[edge unit distance=1.5ex]{9}{1}{deprel}
See tikz-dependency documentation for details.
Alternatives:
- use write.TextModeTrees and include it in a verbatim environment in LaTeX.
- use write.Html, press the “Save as SVG” button, convert to pdf and include in LaTeX.
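For orientation, the shape of a tikz-dependency picture can be sketched with a few lines of Python. This is not the output of write.Tikz: the function to_tikz and its token layout are hypothetical, the use of \deproot for the root is an assumption, and the preamble, the extra attribute rows and the escaping of LaTeX special characters are omitted.

```python
def to_tikz(tokens, column_sep=None):
    """tokens: list of (form, head_ord, deprel); ords are 1-based positions."""
    option = f"[column sep={column_sep}]" if column_sep else ""
    lines = [
        r"\begin{dependency}",
        rf"\begin{{deptext}}{option}",
        # Words of one row are separated by \& and the row ends with \\.
        r" \& ".join(form for form, _, _ in tokens) + r" \\",
        r"\end{deptext}",
    ]
    for ord_, (_, head, deprel) in enumerate(tokens, start=1):
        if head == 0:
            lines.append(rf"\deproot{{{ord_}}}{{{deprel}}}")
        else:
            # \depedge takes the head first, then the dependent, then the label.
            lines.append(rf"\depedge{{{head}}}{{{ord_}}}{{{deprel}}}")
    lines.append(r"\end{dependency}")
    return "\n".join(lines)


tokens = [("My", 2, "nmod:poss"), ("dog", 3, "nsubj"), ("barks", 0, "root")]
print(to_tikz(tokens, column_sep="0.2cm"))
```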
udapi.block.write.treex module
write.Treex is a writer block for Treex XML (e.g. for TrEd editing).
class udapi.block.write.treex.Treex(files='-', filehandle=None, docname_as_file=False, encoding='utf-8', newline='\n', **kwargs)
Bases: udapi.core.basewriter.BaseWriter
A writer of files in the Treex format.
udapi.block.write.vislcg module
Vislcg class is a writer for the VISL-cg format.
class udapi.block.write.vislcg.Vislcg(files='-', filehandle=None, docname_as_file=False, encoding='utf-8', newline='\n', **kwargs)
Bases: udapi.core.basewriter.BaseWriter
A writer of files in the VISL-cg format, suitable for the VISL Constraint Grammar parser.
See https://visl.sdu.dk/visl/vislcg-doc.html
Usage:
udapy write.Vislcg < in.conllu > out.vislcg
Example output:
"<Қыз>"
    "қыз" n nom @nsubj #1->3
"<оның>"
    "ол" prn pers p3 sg gen @nmod:poss #2->3
"<қарындасы>"
    "қарындас" n px3sp nom @parataxis #3->8
        "е" cop aor p3 sg @cop #4->3
"<,>"
    "," cm @punct #5->8
"<ол>"
    "ол" prn pers p3 sg nom @nsubj #6->8
"<бес>"
    "бес" num @nummod #7->8
"<жаста>"
    "жас" n loc @root #8->0
        "е" cop aor p3 sg @cop #9->8
"<.>"
    "." sent @punct #10->8
Example input:
# text = Қыз оның қарындасы, ол бес жаста.
1 Қыз қыз _ n nom 3 nsubj _ _
2 оның ол _ prn pers|p3|sg|gen 3 nmod:poss _ _
3-4 қарындасы _ _ _ _ _ _ _ _
3 қарындасы қарындас _ n px3sp|nom 8 parataxis _ _
4 _ е _ cop aor|p3|sg 3 cop _ _
5 , , _ cm _ 8 punct _ _
6 ол ол _ prn pers|p3|sg|nom 8 nsubj _ _
7 бес бес _ num _ 8 nummod _ _
8-9 жаста _ _ _ _ _ _ _ _
8 жаста жас _ n loc 0 root _ _
9 _ е _ cop aor|p3|sg 8 cop _ _
10 . . _ sent _ 8 punct _ _
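The mapping between the two examples can be sketched for a single simple token. This is an illustration inferred from the example input and output, not the code of write.Vislcg: the function conllu_to_vislcg is hypothetical, it assumes tab-separated CoNLL-U columns and a tab before the reading, and it does not handle multiword tokens (such as 3-4 above), whose parts are printed as nested readings.

```python
def conllu_to_vislcg(line):
    """Convert one CoNLL-U token line (tab-separated, 10 columns) to a cohort."""
    ord_, form, lemma, _upos, xpos, feats, head, deprel, _deps, _misc = line.split("\t")
    # The tag list is XPOS followed by the individual features, if any.
    tags = [xpos] if xpos != "_" else []
    if feats != "_":
        tags.extend(feats.split("|"))
    reading = " ".join([f'"{lemma}"', *tags, f"@{deprel}", f"#{ord_}->{head}"])
    return f'"<{form}>"\n\t{reading}'


line = "\t".join(["2", "оның", "ол", "_", "prn", "pers|p3|sg|gen", "3", "nmod:poss", "_", "_"])
print(conllu_to_vislcg(line))
```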