udapi.block.write package

Submodules

udapi.block.write.conllu module

Conllu class is a writer of files in the CoNLL-U format.

class udapi.block.write.conllu.Conllu(print_sent_id=True, print_text=True, print_empty_trees=True, **kwargs)[source]

Bases: udapi.core.basewriter.BaseWriter

A writer of files in the CoNLL-U format.

before_process_document(document)[source]

Print doc_json_* headers.

process_tree(tree)[source]

Process a UD tree
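As a rough illustration of what this writer outputs, the sketch below serializes one sentence in the CoNLL-U layout: optional `# sent_id` and `# text` comment lines, one line of ten tab-separated fields per token, and a closing blank line. It is not udapi's implementation. The helper names and the dictionary keys are hypothetical, and it is only an assumption that print_sent_id and print_text toggle the two comment lines.

```python
# Illustrative sketch only; not udapi's implementation.
# The ten CoNLL-U columns, in order (key names are hypothetical).
FIELDS = ("ord", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def token_to_conllu(token):
    """Join the ten fields of one token with tabs; missing values become '_'."""
    return "\t".join(str(token.get(name, "_")) for name in FIELDS)


def tree_to_conllu(sent_id, text, tokens, print_sent_id=True, print_text=True):
    """Return one sentence block: comment headers, token lines, blank line."""
    lines = []
    if print_sent_id:
        lines.append(f"# sent_id = {sent_id}")
    if print_text:
        lines.append(f"# text = {text}")
    lines.extend(token_to_conllu(tok) for tok in tokens)
    return "\n".join(lines) + "\n\n"


tokens = [
    {"ord": 1, "form": "I", "lemma": "I", "upos": "PRON",
     "head": 2, "deprel": "nsubj"},
    {"ord": 2, "form": "saw", "lemma": "see", "upos": "VERB",
     "head": 0, "deprel": "root"},
]
print(tree_to_conllu("s1", "I saw", tokens), end="")
```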

udapi.block.write.html module

Html class is a writer for HTML+JavaScript+SVG visualization of dependency trees.

class udapi.block.write.html.Html(path_to_js='web', **kwargs)[source]

Bases: udapi.core.basewriter.BaseWriter

A writer for HTML+JavaScript+SVG visualization of dependency trees.

# from the command line
udapy write.Html < file.conllu > file.html
firefox file.html

For offline use, first download the three required JavaScript libraries:

wget https://code.jquery.com/jquery-2.1.4.min.js
wget https://cdn.rawgit.com/eligrey/FileSaver.js/master/FileSaver.min.js
wget https://cdn.rawgit.com/ufal/js-treex-view/gh-pages/js-treex-view.js
udapy write.Html path_to_js=. < file.conllu > file.html
firefox file.html

This writer produces an html file with drawings of the dependency trees in the document (there are buttons for selecting which bundle will be shown). Under each node its form, upos and deprel are shown. In the tooltip its lemma and (morphological) features are shown. After clicking the node, all other attributes are shown. When hovering over a node, the respective word in the (plain text) sentence is highlighted. There is a button for downloading trees as SVG files.

Three JavaScript libraries are required (jquery, FileSaver and js-treex-view). By default they are linked online (so Internet access is needed when viewing), but they can also be downloaded locally (so offline browsing is possible and loading is faster): see the usage example above.

This block is based on Treex::View but takes a different approach. Treex::View depends on (an older version of) Valence (a Perl interface to Electron) and comes with a script view-treex, which takes a treex file, converts it to json behind the scenes (which is quite slow) and displays the json in a Valence window.

This block generates the json code directly to the html file, so it can be viewed with any browser or even published online. (Most of the html file is actually the json.)

When viewing the html file, the JavaScript library js-treex-view generates an svg on the fly from the json.

static print_node(node)[source]

JSON representation of a given node.

process_document(doc)[source]

Process a UD document

udapi.block.write.sdparse module

Sdparse class is a writer for the Stanford dependencies format.

class udapi.block.write.sdparse.Sdparse(print_upos=True, print_feats=False, always_ord=False, **kwargs)[source]

Bases: udapi.core.basewriter.BaseWriter

A writer of files in the Stanford dependencies format, suitable for Brat visualization.

Usage: udapy write.Sdparse print_upos=0 < in.conllu

Example output:

~~~ sdparse
Corriere Sport da pagina 23 a pagina 26
name(Corriere, Sport)
case(pagina-4, da)
nmod(Corriere, pagina-4)
nummod(pagina-4, 23)
case(pagina-7, a)
nmod(Corriere, pagina-7)
nummod(pagina-7, 26)
~~~

To visualize it, use embedded Brat, e.g. go to http://universaldependencies.org/visualization.html#editing. Click the edit button and paste the output of this writer excluding the ~~~ marks.

Notes: The original Stanford dependencies format allows explicit specification of the root dependency, e.g. root(ROOT-0, makes-8). However, this is not allowed by Brat, so this writer does not print it.

UD v2.0 allows tokens with spaces, but I am not aware of any Brat support.
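The example output above can be reproduced with a short sketch. The rules are inferred from that output and are not taken from the udapi source: the word index is appended only when a form occurs more than once (or always, under the assumed meaning of always_ord), and the root relation is skipped. The function name, the tuple layout and the choice of Corriere as root are assumptions made for illustration.

```python
from collections import Counter


def to_sdparse(tokens, always_ord=False):
    """tokens: list of (ord, form, head_ord, deprel); head_ord 0 marks the root."""
    counts = Counter(form for _, form, _, _ in tokens)
    by_ord = {ord_: form for ord_, form, _, _ in tokens}

    def name(ord_):
        form = by_ord[ord_]
        # Append the word index only when it is needed to tell repeated forms apart.
        if always_ord or counts[form] > 1:
            return f"{form}-{ord_}"
        return form

    # First line: the tokenized sentence; then one relation per non-root word.
    lines = [" ".join(form for _, form, _, _ in tokens)]
    for ord_, _, head, deprel in tokens:
        if head == 0:
            continue  # the root relation is not printed (Brat does not allow it)
        lines.append(f"{deprel}({name(head)}, {name(ord_)})")
    return "\n".join(lines)


tokens = [
    (1, "Corriere", 0, "root"),
    (2, "Sport", 1, "name"),
    (3, "da", 4, "case"),
    (4, "pagina", 1, "nmod"),
    (5, "23", 4, "nummod"),
    (6, "a", 7, "case"),
    (7, "pagina", 1, "nmod"),
    (8, "26", 7, "nummod"),
]
print(to_sdparse(tokens))
```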

Alternatives:

  • write.Conllu: Brat has recently added support for CoNLL-U input as well
  • write.TextModeTrees: may be more readable/useful in some use cases
  • write.Html: ditto; press the “Save as SVG” button and convert to PDF

process_tree(tree)[source]

Process a UD tree

udapi.block.write.sentences module

Sentences class is a writer for plain-text sentences.

class udapi.block.write.sentences.Sentences(if_missing='detokenize', **kwargs)[source]

Bases: udapi.core.basewriter.BaseWriter

A writer of plain-text sentences (one per line).

Usage: udapy write.Sentences if_missing=empty < my.conllu > my.txt

process_tree(tree)[source]

Process a UD tree
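The default if_missing='detokenize' presumably rebuilds the sentence text when a tree has none stored. A minimal sketch of such detokenization, assuming it relies on the CoNLL-U SpaceAfter=No convention in MISC, is shown below. The function name and the (form, misc) input layout are hypothetical, and the tokens are those of the example sentence in the TextModeTrees section.

```python
def detokenize(tokens):
    """tokens: list of (form, misc) pairs; misc is the raw CoNLL-U MISC string."""
    parts = []
    for form, misc in tokens:
        parts.append(form)
        # SpaceAfter=No means the next token follows without a space.
        if "SpaceAfter=No" not in misc.split("|"):
            parts.append(" ")
    return "".join(parts).rstrip()


tokens = [
    ("I", "_"), ("saw", "_"), ("a", "_"), ("dog", "_"),
    ("today", "SpaceAfter=No"), (",", "_"), ("which", "_"), ("was", "_"),
    ("a", "_"), ("boxer", "SpaceAfter=No"), (".", "_"),
]
print(detokenize(tokens))
```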

udapi.block.write.textmodetrees module

An ASCII pretty printer of dependency trees.

class udapi.block.write.textmodetrees.TextModeTrees(print_sent_id=True, print_text=True, add_empty_line=True, indent=1, minimize_cross=True, color='auto', attributes='form, upos, deprel', print_undef_as='_', print_doc_meta=True, print_comments=False, mark='ToDo|ToDoOrigText|Bug|Mark', marked_only=False, hints=True, **kwargs)[source]

Bases: udapi.core.basewriter.BaseWriter

An ASCII pretty printer of dependency trees.

# from the command line (visualize CoNLL-U files)
udapy write.TextModeTrees color=1 < file.conllu | less -R

In scenario (examples of other parameters):

write.TextModeTrees indent=1 print_sent_id=1 print_text=1
write.TextModeTrees zones=en,cs attributes=form,lemma,upos minimize_cross=0

This block prints dependency trees in plain-text format. For example the following CoNLL-U file (with tabs instead of spaces):

1  I     I     PRON  PRP Number=Sing|Person=1 2  nsubj     _ _
2  saw   see   VERB  VBD Tense=Past           0  root      _ _
3  a     a     DET   DT  Definite=Ind         4  det       _ _
4  dog   dog   NOUN  NN  Number=Sing          2  dobj      _ _
5  today today NOUN  NN  Number=Sing          2  nmod:tmod _ SpaceAfter=No
6  ,     ,     PUNCT ,   _                    2  punct     _ _
7  which which DET   WDT PronType=Rel         10 nsubj     _ _
8  was   be    VERB  VBD Person=3|Tense=Past  10 cop       _ _
9  a     a     DET   DT  Definite=Ind         10 det       _ _
10 boxer boxer NOUN  NN  Number=Sing          4  acl:relcl _ SpaceAfter=No
11 .     .     PUNCT .   _                    2  punct     _ _

will be printed (with the default parameters) as:

─┮
 │ ╭─╼ I PRON nsubj
 ╰─┾ saw VERB root
   │                        ╭─╼ a DET det
   ├────────────────────────┾ dog NOUN dobj
   ├─╼ today NOUN nmod:tmod │
   ├─╼ , PUNCT punct        │
   │                        │ ╭─╼ which DET nsubj
   │                        │ ├─╼ was VERB cop
   │                        │ ├─╼ a DET det
   │                        ╰─┶ boxer NOUN acl:relcl
   ╰─╼ . PUNCT punct

Some non-projective trees cannot be printed without crossing edges. TextModeTrees uses a special “bridge” symbol ─╪─ to mark this:

─┮
 │ ╭─╼ 1
 ├─╪───┮ 2
 ╰─┶ 3 │
       ╰─╼ 4
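Whether a tree contains such edges can be checked with the standard projectivity test: an edge is non-projective if some word strictly between its two endpoints is not a descendant of the head. The sketch below applies that test; it is not how TextModeTrees decides where to draw bridges. The function name is hypothetical, and the head indices are my reading of the drawing above (1 under 3, 4 under 2, with 2 and 3 attached to the technical root).

```python
def nonprojective_edges(heads):
    """heads maps each word index (1-based) to its head index; 0 is the root."""

    def is_descendant(node, ancestor):
        # Walk up from node towards the root, looking for ancestor.
        while node != 0:
            node = heads[node]
            if node == ancestor:
                return True
        return False

    result = []
    for child, head in sorted(heads.items()):
        if head == 0:
            continue  # always projective: every word descends from the root
        low, high = sorted((child, head))
        # Projective only if every word strictly between the endpoints
        # is a descendant of the head.
        if any(not is_descendant(mid, head) for mid in range(low + 1, high)):
            result.append((head, child))
    return result


heads = {1: 3, 2: 0, 3: 0, 4: 2}
print(nonprojective_edges(heads))
```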

By default, color=auto, so if the output is printed to the console (not to a file or pipe), each node attribute is printed in a different color. If a given node’s MISC contains any of the ToDo, Bug or Mark attributes (or any other specified in the parameter mark), the node will be highlighted (by reversing the background and foreground colors).
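The highlighting rule can be sketched as follows, assuming that the mark parameter is treated as a regular expression over MISC attribute names. How udapi actually matches it is not stated here, so the exact matching below is an assumption, and the helper highlight is a hypothetical name. The escape code used is the standard ANSI reverse-video sequence.

```python
import re


def is_marked(misc, mark="ToDo|ToDoOrigText|Bug|Mark"):
    """Return True if any attribute name in a raw MISC string matches mark."""
    if misc in ("", "_"):
        return False
    # MISC looks like "Key1=value|Key2=value"; keep only the keys.
    keys = (item.split("=", 1)[0] for item in misc.split("|"))
    pattern = re.compile(mark)
    return any(pattern.fullmatch(key) for key in keys)


def highlight(text):
    """Wrap text in the ANSI reverse-video escape sequence."""
    return f"\033[7m{text}\033[0m"


for form, misc in [("dog", "Bug=wrong-head|SpaceAfter=No"), ("today", "SpaceAfter=No")]:
    print(highlight(form) if is_marked(misc) else form)
```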

This block’s method process_tree can be called on any node (not only root), which is useful for printing subtrees using node.print_subtree(), which is internally implemented using this block.

SEE ALSO TextModeTreesHtml

add_node(idx, node)[source]

Render a node with its attributes.

before_process_document(document)[source]

Initialize ANSI colors if color is True or ‘auto’.

If color=='auto', detect if sys.stdout is interactive (terminal, not redirected to a file).
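A common way to implement this kind of detection is the isatty() method of the output stream. The sketch below shows the idea; resolve_color is a hypothetical helper, and it is an assumption that udapi performs the check this way.

```python
import io
import sys


def resolve_color(color, stream=None):
    """Turn a color setting (True, False or 'auto') into a plain boolean."""
    if stream is None:
        stream = sys.stdout
    if color == "auto":
        # True only when the stream is attached to a terminal.
        return stream.isatty()
    return bool(color)


# An in-memory buffer is never a terminal, so 'auto' resolves to False here.
print(resolve_color("auto", io.StringIO()))
```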

static colorize_attr(attr, value, marked)[source]

Return a string with color markup for a given attr and its value.

colorize_comment(comment)[source]

Return a string with color markup for a given comment.

is_marked(node)[source]

Should a given node be highlighted?

print_headers(root)[source]

Print sent_id, text and other comments related to the tree.

process_tree(root)[source]

Print the tree to (possibly redirected) sys.stdout.

should_print_tree(root)[source]

Should this tree be printed?

udapi.block.write.textmodetreeshtml module

An ASCII pretty printer of colored dependency trees in HTML.

class udapi.block.write.textmodetreeshtml.TextModeTreesHtml(color=True, title='Udapi visualization', **kwargs)[source]

Bases: udapi.block.write.textmodetrees.TextModeTrees

An ASCII pretty printer of colored dependency trees in HTML.

Usage:

# from the command line (visualize CoNLL-U files)
udapy write.TextModeTreesHtml < file.conllu > file.html

This block is a subclass of TextModeTrees, see its documentation for more info.

add_node(idx, node)[source]

Render a node with its attributes.

after_process_document(document)[source]

This method is called after each process_document.

before_process_document(document)[source]

Initialize ANSI colors if color is True or ‘auto’.

If color=='auto', detect if sys.stdout is interactive (terminal, not redirected to a file).

static colorize_attr(attr, value, marked)[source]

Return a string with color markup for a given attr and its value.

colorize_comment(comment)[source]

Return a string with color markup for a given comment.

print_headers(root)[source]

Print sent_id, text and other comments related to the tree.

udapi.block.write.tikz module

Tikz class is a writer for LaTeX with tikz-dependency.

class udapi.block.write.tikz.Tikz(print_sent_id=True, print_text=True, print_preambule=True, attributes='form, upos', **kwargs)[source]

Bases: udapi.core.basewriter.BaseWriter

A writer of LaTeX files that use the tikz-dependency package.

Usage:

udapy write.Tikz < my.conllu > my.tex
pdflatex my.tex
xdg-open my.pdf

Long sentences may produce pictures that are too large. Besides changing the font size or using a minipage and rescaling, you can tune the width with \begin{deptext}[column sep=0.2cm] or individually for each word: My \&[.5cm] dog \& etc. By default, the height of the horizontal segment of a dependency edge is proportional to the distance between the linked words. You can tune the height with: \depedge[edge unit distance=1.5ex]{9}{1}{deprel}

See tikz-dependency documentation for details.
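For orientation, the sketch below generates a minimal tikz-dependency picture of the kind discussed above: a deptext row with words separated by \& and one \depedge per dependency, running from head to dependent. It is not the code of this writer. The function name and its parameters are hypothetical, and LaTeX special characters in the word forms are not escaped.

```python
def to_tikz_dependency(forms, edges, column_sep=None):
    """forms: words in order; edges: (head, dependent, deprel), 1-based indices."""
    option = f"[column sep={column_sep}]" if column_sep else ""
    lines = [
        r"\begin{dependency}",
        rf"\begin{{deptext}}{option}",
        # Words are separated by \& and the row is closed by \\.
        " \\& ".join(forms) + r" \\",
        r"\end{deptext}",
    ]
    for head, dep, deprel in edges:
        lines.append(rf"\depedge{{{head}}}{{{dep}}}{{{deprel}}}")
    lines.append(r"\end{dependency}")
    return "\n".join(lines)


print(to_tikz_dependency(
    ["My", "dog", "barks"],
    [(2, 1, "nmod:poss"), (3, 2, "nsubj")],
    column_sep="0.2cm",
))
```

The printed fragment is meant to go inside a document that loads the tikz-dependency package.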

Alternatives:

  • use write.TextModeTrees and include it in a verbatim environment in LaTeX.
  • use write.Html, press the “Save as SVG” button, convert to PDF and include it in LaTeX.

after_process_document(doc)[source]

This method is called after each process_document.

before_process_document(doc)[source]

This method is called before each process_document.

process_tree(tree)[source]

Process a UD tree

udapi.block.write.treex module

write.Treex is a writer block for Treex XML (e.g. for TrEd editing).

class udapi.block.write.treex.Treex(files='-', filehandle=None, docname_as_file=False, encoding='utf-8', newline='\n', **kwargs)[source]

Bases: udapi.core.basewriter.BaseWriter

A writer of files in the Treex format.

after_process_document(doc)[source]

This method is called after each process_document.

before_process_document(doc)[source]

This method is called before each process_document.

print_subtree(node, tree_id, indent)[source]

Recursively print trees in Treex format.

process_bundle(bundle)[source]

Process a UD bundle

process_tree(tree)[source]

Process a UD tree

udapi.block.write.vislcg module

Vislcg class is a writer for the VISL-cg format.

class udapi.block.write.vislcg.Vislcg(files='-', filehandle=None, docname_as_file=False, encoding='utf-8', newline='\n', **kwargs)[source]

Bases: udapi.core.basewriter.BaseWriter

A writer of files in the VISL-cg format, suitable for the VISL Constraint Grammar parser.

See https://visl.sdu.dk/visl/vislcg-doc.html

Usage: udapy write.Vislcg < in.conllu > out.vislcg

Example output:

"<Қыз>"
        "қыз" n nom @nsubj #1->3
"<оның>"
        "ол" prn pers p3 sg gen @nmod:poss #2->3
"<қарындасы>"
        "қарындас" n px3sp nom @parataxis #3->8
            "е" cop aor p3 sg @cop #4->3
"<,>"
        "," cm @punct #5->8
"<ол>"
        "ол" prn pers p3 sg nom @nsubj #6->8
"<бес>"
        "бес" num @nummod #7->8
"<жаста>"
        "жас" n loc @root #8->0
            "е" cop aor p3 sg @cop #9->8
"<.>"
        "." sent @punct #10->8

Example input:

# text = Қыз оның қарындасы, ол бес жаста.
1    Қыз        қыз       _  n     nom             3  nsubj      _  _
2    оның       ол        _  prn   pers|p3|sg|gen  3  nmod:poss  _  _
3-4  қарындасы  _         _  _     _               _  _          _  _
3    қарындасы  қарындас  _  n     px3sp|nom       8  parataxis  _  _
4    _          е         _  cop   aor|p3|sg       3  cop        _  _
5    ,          ,         _  cm    _               8  punct      _  _
6    ол         ол        _  prn   pers|p3|sg|nom  8  nsubj      _  _
7    бес        бес       _  num   _               8  nummod     _  _
8-9  жаста      _         _  _     _               _  _          _  _
8    жаста      жас       _  n     loc             0  root       _  _
9    _          е         _  cop   aor|p3|sg       8  cop        _  _
10   .          .         _  sent  _               8  punct      _  _

process_tree(tree)[source]

Process a UD tree
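The mapping between the example input and the example output above can be sketched as follows. The rules are inferred from that example only and are not taken from the udapi source: the surface form goes in "<...>", followed by the lemma, the XPOS tag, the features split on |, the relation prefixed with @, and #ord->head. Words of a multiword token share one form line, with deeper indentation after the first. The function name and the space-based indentation widths are assumptions.

```python
def to_vislcg(rows):
    """rows: CoNLL-U token lines, each already split into its ten columns."""
    lines = []
    remaining = 0  # words still belonging to the current multiword token
    for ord_, form, lemma, _upos, xpos, feats, head, deprel, _deps, _misc in rows:
        if "-" in ord_:
            # Multiword token range such as "3-4": print its form once.
            first, last = map(int, ord_.split("-"))
            lines.append(f'"<{form}>"')
            remaining = last - first + 1
            indent = 8
            continue
        if remaining == 0:
            # Ordinary word: it gets its own form line.
            lines.append(f'"<{form}>"')
            indent = 8
        tags = " ".join([xpos] + ([] if feats == "_" else feats.split("|")))
        lines.append(f'{" " * indent}"{lemma}" {tags} @{deprel} #{ord_}->{head}')
        if remaining:
            remaining -= 1
            indent = 12  # later words of a multiword token are indented deeper
    return "\n".join(lines)


# Columns are split on whitespace here for readability; real files use tabs.
rows = [
    "2 оның ол _ prn pers|p3|sg|gen 3 nmod:poss _ _".split(),
    "3-4 қарындасы _ _ _ _ _ _ _ _".split(),
    "3 қарындасы қарындас _ n px3sp|nom 8 parataxis _ _".split(),
    "4 _ е _ cop aor|p3|sg 3 cop _ _".split(),
    "5 , , _ cm _ 8 punct _ _".split(),
]
print(to_vislcg(rows))
```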

Module contents