This is a script for a workshop on using Voyant Tools for the
Digital Humanities 2012 Conference in Hamburg.
1.0 Introduction
- Workshop leaders: Stéfan Sinclair, McGill University, and Geoffrey Rockwell
- Overview
Voyant Tools is a web-based environment for reading and analyzing
digital texts, created by Stéfan Sinclair and Geoffrey Rockwell. It was
previously called "Voyeur", so don't be confused if that name is used.
Voyant is the next generation in a series of text analysis tools that
include HyperPo and TAPoRware. It provides tables and graphs related to
word use across a single document or a collection. Voyant adds, among
other things, the ability to handle much larger files than the previous
tools could. Voyant is actually a suite of modular tools that can be
combined in pre-defined or user-defined combinations called skins. This
workshop's primary objective is to help you better understand how and why
you might use Voyant Tools in the study of digital texts.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with a
small multilingual corpus.
- Then learn how to use the normal "skin" (multi-tool interface) of
Voyant with a single text.
- Show how to load your own text(s) into Voyant.
- Look at some of the more exploratory and advanced tools available
in Voyant, such as Bubbles and Correspondence Analysis.
- Discuss the use of Voyant Tools in a larger research process
(embedding tools in remote content, etc.)
- Help
Here are some useful links:
N.B. Voyant Tools is in beta and has warts and blemishes. Always view
what you're looking at with some circumspection and if something doesn't
work as expected, assume it's a bug, not something that you're
misunderstanding (and please tell us about it).
2.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools
that can be composed into skins or used individually. We will start with
just one tool called Cirrus that can
then spawn other tools. We will try it with the English version of the Universal Declaration of Human Rights.
http://work.voyant-tools.org/tool/Cirrus/?corpus=unhr&docIndex=0&stopList=stop.en.taporware.txt&toolFlow=simple
The Cirrus tool shows you a word cloud of high frequency words. (A word cloud
is a visual presentation of keywords drawn from a text, visually differentiated
based on their position and frequency of use in that text.) Some questions to ask yourself:
- What words did you expect? What words are missing? What words are
interesting?
- How does the tool arrange words and choose colours? Is there any
correspondence between size and frequency?
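The Cirrus link above is an ordinary web address whose query string tells Voyant which corpus and options to use. As a rough sketch, the snippet below pulls that link apart with Python's standard library so you can see the tool name and each parameter. What each parameter does is only inferred from its name (for example, that `stopList` names a stopword file), so treat those readings as assumptions rather than documentation.

```python
from urllib.parse import urlsplit, parse_qs

def describe_tool_url(url):
    """Split a Voyant tool link into the tool name and its query parameters."""
    parts = urlsplit(url)
    # The path looks like /tool/Cirrus/ ; take the last non-empty segment as the tool name.
    segments = [s for s in parts.path.split("/") if s]
    tool = segments[-1] if segments else None
    # parse_qs returns a list per key; each key appears once here, so keep the first value.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return tool, params

url = ("http://work.voyant-tools.org/tool/Cirrus/"
       "?corpus=unhr&docIndex=0&stopList=stop.en.taporware.txt&toolFlow=simple")
tool, params = describe_tool_url(url)
print(tool)
for key, value in params.items():
    print(key, "=", value)
```

Reading a link this way can help when you want to compare two Voyant links and see which option differs.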
Here are some more Cirrus visualizations to consider:
These types of word clouds are prevalent from academia to advertising –
they quickly provide an intriguing representation of a text, as
demonstrated by this example of studying
gendered languages in toy advertising. But their ability to
rapidly convey a picture with words comes at the cost of information
reduction, and some
are highly critical of word clouds as hermeneutical tools. What do
you think?
These Cirrus visualizations don't show all top frequency words: so-called stopwords are missing. Stopwords are function words (like determiners and prepositions)
that typically carry less meaning. What to include in a stopword list is a matter of interpretation and purpose. Are numbers (like "one") important? What about words like "against"?
A new feature in Voyant Tools is the ability to set and edit stopword lists. To do so, click on the options (gear) icon and then click on "Edit Stop Words".
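To see why the stopword list matters so much, here is a minimal sketch of the counting that sits behind any word cloud: split the text into words, drop the stopwords, and count what is left. This is not Voyant's own code. The tiny stopword set and the `top_words` helper are made up for illustration, and the sample text is two sentences from the English Universal Declaration of Human Rights.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "of", "and", "to", "a", "as", "has"}

def top_words(text, stopwords=STOPWORDS, n=3):
    """Return the n most frequent words in text, ignoring stopwords."""
    # Lowercase the text and keep runs of letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

sample = ("Everyone has the right to life, liberty and security of person. "
          "Everyone has the right to recognition everywhere as a person before the law.")

print(top_words(sample))                     # with the stopword list
print(top_words(sample, stopwords=set()))    # with no stopwords at all
```

Try adding or removing a word from `STOPWORDS` and rerunning: the ranking changes, which is exactly the interpretive choice discussed above.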
Try It: Try clicking on a word. It will launch a second
tab or window with a list of the texts in the corpus with the frequency of
the word you clicked on.
Try It: Now try double-clicking on one of the texts.
This should launch another tab or window with a Key Word In Context (KWIC) display
of the word in that text. Note that you may have to allow pop-ups.
In text analysis, context refers to the text surrounding a string of characters, which may be as short as a word or as long as a paragraph.
Context is particularly important when generating a concordance for a string.
A concordance or keyword in context (KWIC) is usually represented as a list of occurrences of a word with some limited context shown (words to the left and words to the right).
Here is an example that shows the occurrences of the word "dream" in A Midsummer Night's Dream in TACTweb:
I.1/577.1 | Four nights will quickly dream away the time; | And
I.1/578.2 Swift as a shadow, short as any dream; | Brief as the
II.2/585.1 | Ay me, for pity! what a dream was here! | Lysander,
III.2/591.1 this derision | Shall seem a dream and fruitless vision, |
IV.1/593.1 as the fierce vexation of a dream. | But first I will
IV.1/594.2 to me | That yet we sleep, we dream. Do not you think | The
IV.1/594.2 rare | vision. I have had a dream, past the wit of man to
IV.1/594.2 the wit of man to | say what dream it was: man is but an
IV.1/594.2 he go | about to expound this dream. Methought I was--there
IV.1/594.2 his heart to report, what my dream | was. I will get Peter
IV.1/594.2 to write a ballad of | this dream: it shall be called
IV.1/594.2 it shall be called Bottom's dream, | because it hath no
V.1/599.1 | Following darkness like a dream, | Now are frolic: not a
V.1/599.2 theme, | No more yielding but a dream, | Gentles, do not
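If you are curious how a keyword-in-context listing is put together, the following is a small sketch rather than Voyant's implementation. It assumes a very simple notion of a word (runs of letters or digits, with an optional apostrophe part), and the `kwic` function and its `width` parameter are names invented for this example.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, match, right) tuples for each occurrence of keyword."""
    # Tokenize into simple words, keeping forms like "Bottom's" together.
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])      # up to `width` words before
            right = " ".join(tokens[i + 1:i + 1 + width])     # up to `width` words after
            rows.append((left, token, right))
    return rows

sample = ("Four nights will quickly dream away the time. "
          "Swift as a shadow, short as any dream.")
for left, word, right in kwic(sample, "dream"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>25} | {word} | {right}")
```

Widening `width` shows more context per line, which is the same trade-off you adjust in the KWIC tool's context setting.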
Try It: Try some of the other individual tools at docs.voyant-tools.org/tools
3.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as
panels so that they can be used interactively. Here is the same
corpus in a simple skin.
In this skin clicking in one panel will often (but not always) update
other panels. (Web frameworks like the TAPoR Portal organize information into panels, sometimes called portlets or coplets. These can be minimized, maximized and closed using the three buttons in the upper left-hand corner of the panel. With Voyant you can export panels of results and place them into other web sites.)
Try the following:
- Triggering: Click on words in the Cirrus word cloud.
Then click on a text in the Word Trends and play with the KWIC.
- Changing Settings: Try changing the settings for the
Cirrus by clicking on the small gear icon. Try playing with the Word
Trends.
- Showing and Hiding Panels: Try showing and hiding
panels using the small up and down arrows in the upper-right of the
panels.
When in doubt just restart the session by hitting refresh.
4.0 Using Voyant on Your Own Text
Voyant Tools can be used on your own text or corpus. To do that you go to
the simple URL for the tool (a URL, or Uniform Resource Locator, sometimes
called a web address, is used to locate and identify web content):
Voyant: work.voyant-tools.org
Just the Cirrus tool in Voyant: work.voyant-tools.org/tool/Cirrus
You will get a panel that asks you for a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
- Paste in a text
Plain text refers to a text without any additional formatting affecting its human readability, often found in .txt files.
Plain text files do not require a specialized program, such as a word processor, to read them.
HTML, or Hypertext Markup Language, is a language used in web development to make a text readable by web browsers.
HTML is primarily formed of paired elements, such as <body></body> or <p></p>, that apply some characteristic to the text within them. One pair of elements may be nested inside another like this:
<body><p></p></body>
In this case, <body></body> marks the beginning and end of the body of the document, while <p></p> marks the beginning and end of a paragraph within the body.
Elements may also be modified by attributes and attribute values:
<p class="hangingindent">
In this case, the paragraph element has the attribute 'class' and the attribute value 'hangingindent'. Attribute/attribute value pairs are frequently used in combination with CSS to apply formatting to the text within the element.
XML, or Extensible Markup Language, is a language used in web development to make a text readable by web browsers and/or store data.
Like HTML, XML is primarily formed of paired elements. Unlike HTML, the elements are defined by the user, rather than predefined. For example, both <book></book> and <murfle></murfle> are valid element pairs. These elements apply characteristics and metadata to the text within them. One pair of elements may be nested inside another:
<book><title></title></book>
Elements may also be modified by attributes and attribute values:
<book format="hardcover">
In this case, the book element has the attribute 'format' and the attribute value 'hardcover'. In addition to storing metadata about the text, attribute/attribute value pairs are frequently used in combination with CSS to apply formatting to the text within the element.
Voyant is forgiving, but there are nonetheless issues (and bugs).
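If your corpus lives in a folder of separate files, uploading a zipped collection is usually the least fiddly option. The sketch below shows one way to bundle a folder of plain-text files into a single archive using Python's standard library. The `zip_texts` helper, the folder layout and the file names are all hypothetical, and the demonstration writes two throwaway files to a temporary folder so that it runs anywhere.

```python
import tempfile
import zipfile
from pathlib import Path

def zip_texts(folder, archive_path, pattern="*.txt"):
    """Bundle every matching file in folder into one zip archive; return the names added."""
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(folder).glob(pattern)):
            # Store only the file name so the archive has no nested folders.
            archive.write(path, arcname=path.name)
            added.append(path.name)
    return added

# Demonstration with two placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp, "corpus")
    folder.mkdir()
    (folder / "article1.txt").write_text("first placeholder text", encoding="utf-8")
    (folder / "article2.txt").write_text("second placeholder text", encoding="utf-8")
    archive = Path(tmp, "corpus.zip")
    added = zip_texts(folder, archive)
    with zipfile.ZipFile(archive) as check:
        stored = sorted(check.namelist())

print(added)
```

In practice you would point `zip_texts` at your own folder and then upload the resulting archive through the Voyant panel.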
Note that you can create a persistent URL for your corpus – that way your
link can be shared or bookmarked and you won't need to reload the texts
into Voyant. Click the save icon (disk icon) in the blue bar at the top and the first
URL will be the link for your Voyant corpus.
5.0 Exploratory and Advanced Tools
Voyant Tools is built on the notion that text analysis in the
humanities is a practice of re-presenting the text, not
of producing incontrovertible evidence. Some Voyant Tools are more
about aesthetic or ludic aspects of experiencing digital texts, which can
directly or indirectly inspire observations that may not be otherwise
possible. Here are some examples:
Specialized Skins
At the same time, some tools are more advanced. For instance, one can use a correspondence analysis skin that shows how terms map across
multiple documents, such as this view of the Humanist
Discussion Group listserv.
Voyant Tools also enables some quick-and-dirty social network analysis. This
is possible thanks to a process called named entity recognition (NER), also
called named entity extraction, that attempts
to automatically identify people, places and organizations in a text (at the
moment Voyant Tools uses the Stanford Natural Language Processing package to
perform this automated process). It's worth emphasizing that automated
processes like these are subject to several issues and problems. For
instance, how should uses of first and/or last names be combined or
differentiated? How can one tell whether the same name refers to one or two different people?
What should be done when an organization looks like a person's name (e.g. Johns
Hopkins)? Still, you can't beat the simplicity of Voyant Tools' RezoViz,
especially when working with a mid-size corpus of shorter texts (5-50
articles, for instance). Here
is a specialized interface showing connections between people mentioned
in emails to the Humanist listserv (RezoViz is in alpha and best experienced
in Chrome).
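To make the idea of "connections between people" concrete, here is a minimal sketch of one common way to turn extracted names into a network: count how often two names appear in the same document. This script does not describe how RezoViz actually computes its links, so co-occurrence within a document is an assumption here. The names are placeholders, and the lists stand in for the output of an entity extraction step that is not shown.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs_entities):
    """Count, for each pair of names, how many documents mention both."""
    edges = Counter()
    for names in docs_entities:
        # sorted(set(...)) drops repeats within a document and fixes the pair order.
        for a, b in combinations(sorted(set(names)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical extraction results: one list of names per email message.
messages = [
    ["A. Smith", "B. Jones", "A. Smith"],
    ["A. Smith", "B. Jones", "C. Lee"],
    ["C. Lee"],
]

for (a, b), weight in cooccurrence_edges(messages).most_common():
    print(f"{a} -- {b}: {weight}")
```

Note that the ambiguities raised above (the same person under two names, or two people under one name) would have to be resolved before counting, since this sketch treats every distinct string as a distinct person.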
As always, the real strength of Voyant Tools is the ability to create your
own corpus – you can start at work.voyant-tools.org/tool/RezoViz.
6.0 Voyant as a Scholarly Tool
One of the essential design principles of Voyant Tools is that it tries to
be useful not just at the moment of analysis, but through more phases of
research. Here are some examples:
- as we've already seen, you can export a link to a corpus that can be
bookmarked, shared by email or Twitter, or otherwise preserved (as a
general rule, a corpus in Voyant will remain accessible as long as it
has been consulted at least once in the past month)
- there's built-in Zotero awareness – you can click on the
folder/article icon in the Firefox address bar to create a new entry
(though you may wish to complete some of the metadata)
- you can export data for other applications – for instance, produce
a tab-separated values view of a table that can be copy-and-pasted into
a spreadsheet application (where you can edit the data and produce even
more graphs, charts, etc.)
- you can embed a live tool in remote content (a blog post, a journal
article, a term paper, etc.), much as you would embed a YouTube clip –
the interactive affordances of Voyant allow you to go beyond static
screenshots and images and allow your users/readers to engage with the
content and data themselves, as with the DH2012 abstracts
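A spreadsheet is not the only place exported data can go. As a rough illustration of the tab-separated export mentioned above, the sketch below reads tab-separated text into Python with the standard `csv` module. The column names (`term`, `count`) and the values are invented placeholders, not a real Voyant export, so adjust them to whatever header row your export actually contains.

```python
import csv
import io

def read_tsv(text):
    """Parse tab-separated text (first row is the header) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

# Placeholder data standing in for a copied table; names and numbers are made up.
exported = "term\tcount\nalpha\t3\nbeta\t2\n"

rows = read_tsv(exported)
total = sum(int(row["count"]) for row in rows)
for row in rows:
    share = int(row["count"]) / total
    print(f'{row["term"]}: {row["count"]} ({share:.0%} of listed terms)')
```

Once the rows are in this form you can sort, filter or chart them with whichever tools you prefer.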
7.0 Other Stuff
Here are some other useful resources.
Other Tools:
- TAPoR 2.0 - Discover and comment on tools. For example, here are the Voyant Tools listed in TAPoR 2.0. Leave a comment on your favorite Voyant tool. Link to a project where you use it.
- TAPoRWare - Simple tools for processing plain text, HTML, and XML
- CWRC List of Visualization Tools
- DIRTROD
Other Corpora:
Other Voyant Skins: