This is a script for a workshop on using Voyant for the TCD community. It is available at http://hermeneuti.ca/node/222
1.0 Introduction
- Workshop leader and participants introduce themselves:
- Overview
Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur", so do not be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with different texts.
- Then learn how to use the normal skin of Voyant with a single text and then a corpus.
- Finally, show how to load your own text into Voyant.
- Help
Remember that Voyant is a research tool and will often fail. Several versions are running, so if one server is down you can try another. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
2.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open.
Cirrus (Frankenstein): http://voyant-tools.org/tool/Cirrus/?corpus=1332317356275.4528&query=&stopList=stop.en.taporware.txt&toolFlow=simple
For a backup go here: http://voyant-tools.org/tool/Cirrus and enter text http://www.gutenberg.org/cache/epub/84/pg84.txt
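If you are curious about what the long Cirrus link above contains, the following is a minimal Python sketch that splits it into its parts. It only uses the link given in this script; the function name describe_voyant_url is made up for illustration, and reading the parameter names (corpus, stopList and so on) as a stored corpus and a stop-word list is an interpretation, not documented behaviour.

```python
from urllib.parse import urlsplit, parse_qs

# The Cirrus (Frankenstein) link from this workshop script.
CIRRUS_URL = (
    "http://voyant-tools.org/tool/Cirrus/"
    "?corpus=1332317356275.4528&query=&stopList=stop.en.taporware.txt&toolFlow=simple"
)


def describe_voyant_url(url):
    """Return the tool path and the query parameters of a Voyant link."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps empty parameters such as "query=".
    parsed = parse_qs(parts.query, keep_blank_values=True)
    # Each parameter appears once in these links, so keep the first value.
    params = {name: values[0] for name, values in parsed.items()}
    return parts.path, params


path, params = describe_voyant_url(CIRRUS_URL)
print(path)
for name, value in params.items():
    print(f"{name} = {value!r}")
```

Seeing the parameters listed separately makes it easier to spot what differs between the links in this script, for example the skin parameter used in the reading skin links.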
To learn more about the Cirrus tool go to http://hermeneuti.ca/voyeur/tools and scroll down.
The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
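To think about the link between size and frequency, it can help to see how high frequency words might be counted. This is a rough Python sketch, not Voyant's actual method: the stop list is a tiny made-up one (Voyant's own lists are much longer), the sample sentence is invented, and the function name top_words is hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stop list (an assumption, not Voyant's list).
STOP_WORDS = {"the", "and", "of", "to", "a", "i", "in", "was", "my", "that"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring stop words."""
    # Lower-case the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(n)


# An invented sentence, used only to show the counting.
sample = "I saw the monster. The monster saw me, and I fled from the monster."
print(top_words(sample, n=2))  # [('monster', 3), ('saw', 2)]
```

A word cloud would then draw the words with the highest counts in the largest type. Changing the stop list changes which words surface, which is worth keeping in mind when asking what words are missing.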
Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.
Try It: Now try other tools in Voyant. Go to http://hermeneuti.ca/voyeur/tools and experiment with the tools. Warning: some of them are prototypes that may not work well. You will need to give them a URL for a text. You can use the Frankenstein text from http://www.gutenberg.org/cache/epub/84/pg84.txt
3.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:
Frankenstein: http://voyant-tools.org/?corpus=1332317356275.4528&skin=simple&event=corpusTypeSelected
Austen (5 novels): http://voyeurtools.org/?corpus=JaneAusten&stopList=stop.en.taporware.txt (backup)
To learn about using the full Reading skin you can go to
In this skin clicking in one window will often (but not always) update other windows. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC (keyword in context).
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt just restart the session by hitting refresh.
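A KWIC (keyword in context) display lists each occurrence of a word with a little context on either side. As a minimal sketch of the idea, assuming a fixed number of characters of context and a hypothetical function name kwic, it could be written in Python like this. It is not how Voyant builds its panel.

```python
import re


def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword, with up to `width`
    characters of context on each side."""
    # Collapse line breaks and repeated spaces so the context reads cleanly.
    flat = " ".join(text.split())
    pattern = r"\b%s\b" % re.escape(keyword)
    lines = []
    for match in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Two lines from A Midsummer Night's Dream as a small test text.
sample = (
    "Four nights will quickly dream away the time. "
    "Swift as a shadow, short as any dream."
)
for line in kwic(sample, "dream"):
    print(line)
```

Lining the keyword up in a column is what makes patterns in the surrounding words easy to scan.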
4.0 Using Voyant on Your Own Text
Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:
Voyant: http://voyant-tools.org
Just the Cirrus tool in Voyant: http://voyant-tools.org/tool/Cirrus/
Backup older version: http://voyeur.hermeneuti.ca
You will get a panel that asks you for a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
Voyant is forgiving, but there are nonetheless bugs.
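If your corpus is a folder of plain text files, one way to prepare the zipped collection mentioned above is a short Python script. This is only a sketch: the function name zip_texts and the folder and archive names in the example are hypothetical, and it assumes your files end in .txt.

```python
import zipfile
from pathlib import Path


def zip_texts(folder, archive_path):
    """Bundle every .txt file in `folder` into one zip archive.

    Returns the list of file names that were added.
    """
    folder = Path(folder)
    added = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.glob("*.txt")):
            # Store only the file name, not the full path on your machine.
            archive.write(path, arcname=path.name)
            added.append(path.name)
    return added


# Example with hypothetical names:
# zip_texts("my_corpus", "my_corpus.zip")
```

The resulting zip file can then be uploaded in place of a single text. Most operating systems can also create a zip archive directly from the file manager.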
5.0 Other Stuff
Here are some different corpora and skins for specialized tools:
6.0 For Next Week
To understand the power and limitations of text analysis it is useful to use Voyant on your own text. Please try it over the week:
- Find or assemble a text of your own.
- Try studying it with Voyant.
- What works and what doesn't? Document your problems and questions.
- What would you like to ask of your text but can't? What sort of tool would you like?
Next week we will discuss what works and what doesn't, look at advanced features, and discuss other tools.
Finding Texts:
Aggregating and Cleaning Texts:
7.0 Second Workshop
This second workshop will be less structured. We will do the following:
- Participants who had a chance to experiment with Voyant can report back.
- Discussion of problems and desires for Voyant.
- Exporting Voyant results. Demonstration of how you might put Voyant results into Excel.
- Getting an image result. Demonstration of getting a PNG for a trend graph.
- Placing a Voyant Panel. Demonstration of placing in TADA here http://tada.mcmaster.ca/Main/VoyantTest
- Content Analysis - what can you do? Demonstration of Globe Work.
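For the Excel step, a comma-separated (CSV) file is a format Excel opens directly. The sketch below does not use Voyant's own export; it assumes you already have word counts as pairs, and the function name frequencies_to_csv and the counts shown are made up for illustration.

```python
import csv
import io


def frequencies_to_csv(rows):
    """Turn (word, count) pairs into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "count"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up counts, for illustration only.
rows = [("word_a", 12), ("word_b", 7)]
print(frequencies_to_csv(rows))
```

Saving the returned text to a file ending in .csv gives you something you can open in Excel and sort or chart.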
Other Tools
What other tools are out there? See TAPoR 2.0 beta for a growing list of tools.
Other Stuff