Workshops
1.0 Basic Introduction to Text Analysis
This is a script for a workshop on using Voyant for learning about text analysis. It is available at http://hermeneuti.ca/node/244
0.0 Before the Workshop
Here are some of the things you might want to do before the workshop on Voyant:
- Read an introduction to text analysis like The Measured Words (distributed before the camp).
- Review this workshop outline, follow links and review some of the help materials.
- Prepare a text of your own to try with Voyant. To start, you might find a novel-length text that you are familiar with and that interests you. Save it as a plain text file somewhere you can get at it during the workshop (on your laptop) or online.
- Bring a laptop with wireless to use in the workshop.
1.0 Introduction
- Workshop leader and participants introduce themselves:
- Overview
Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur," so do not be confused if you see that name. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous online tools could.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with different texts.
- Then learn how to use the normal skin of Voyant with a single text and then a corpus.
- Finally, show how to load your own text into Voyant.
- Help
Remember that Voyant is a research tool and will often fail, especially when a whole group of people uses it at once. There are multiple versions running in case one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
- Voyant Tools
- Individual Voyant tool descriptions and links - docs.voyant-tools.org/tools
- The main URL for Voyant Tools is http://voyant-tools.org/, though there are other URLs that can be used.
Links for this workshop will look like this: http://bit.ly/VoyantShakespeare [main, beta]
The first link resolves automatically (use it whenever possible) and the subsequent links provide backup. Please note that corpora for this workshop are located on all three servers, but if you load a corpus it will only be available on the server where it was uploaded.
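If you script against the workshop servers, the same "main first, then backup" logic can be automated. The following is a minimal sketch, not part of Voyant: `first_available` and `default_fetch` are hypothetical helper names, a URL counts as available when it answers with a 2xx status, and the fetch function can be swapped out so the logic can be tried offline.

```python
import urllib.request
from typing import Callable, Iterable, Optional


def default_fetch(url: str, timeout: float = 10.0) -> int:
    """Request url and return its HTTP status code.

    urllib raises URLError/HTTPError (both OSError subclasses) for
    unreachable hosts and for 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def first_available(urls: Iterable[str],
                    fetch: Callable[[str], int] = default_fetch) -> Optional[str]:
    """Return the first URL that answers with a 2xx status, or None if none do."""
    for url in urls:
        try:
            status = fetch(url)
        except (OSError, ValueError):  # most network failures, HTTP errors, malformed URLs
            continue
        if 200 <= status < 300:
            return url
    return None
```

For example, `first_available(["http://voyant-tools.org/", "http://beta.voyant-tools.org/"])` tries the main server and then the beta server.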
2.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open it.
Cirrus (Frankenstein): http://bit.ly/VoyantCirrusFrankenstein [main, beta]
Alternatively you can go to the tool: http://voyant-tools.org/tool/Cirrus and load the text http://www.gutenberg.org/cache/epub/84/pg84.txt
To learn more about the Cirrus tool go to http://docs.voyant-tools.org/tools and scroll down to read about Cirrus. Or go to TAPoR 2.0 and read a review.
The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
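Cirrus is driven by word frequencies. As a rough illustration of what "high-frequency words" means once common function words are filtered out, here is a minimal Python sketch. The tokenizer and the short stop list are assumptions made for the example, not what Voyant uses internally.

```python
import re
from collections import Counter

# A deliberately short stop list for illustration only; Voyant uses its own,
# much longer lists (e.g. the stop.en.taporware.txt list named in some workshop links).
STOP_WORDS = {
    "a", "and", "as", "at", "be", "but", "by", "for", "from", "had", "he",
    "her", "his", "i", "in", "is", "it", "me", "my", "not", "of", "on",
    "that", "the", "to", "was", "which", "with", "you",
}


def top_words(text: str, n: int = 10, stop_words=STOP_WORDS):
    """Return the n most frequent words as (word, count) pairs.

    Tokenizing is crude: lower-case the text and keep runs of a-z and
    apostrophes, so it only suits English-language texts.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)
```

Running `top_words` on a local copy of Frankenstein (for instance a saved `pg84.txt`, a hypothetical file name) and comparing the list with the Cirrus cloud is one way to think about the questions above.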
Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.
Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that won't work that well. You will need to give them the URL of a text. You can use Frankenstein from http://www.gutenberg.org/cache/epub/84/pg84.txt
3.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:
Frankenstein: http://bit.ly/VoyantFrankensteinStop [main, beta]
For a corpus see Austen (5 novels): http://bit.ly/VoyantAustenStop [main, beta]
To learn about using the full Reading skin you can go to
In this skin clicking in one window will often (but not always) update other windows. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC (keyword in context).
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try playing with Word Trends as well.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt just restart the session by hitting refresh.
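The KWIC (keyword in context) view lists each occurrence of a word with a few words on either side. The sketch below is a minimal, hypothetical concordancer, not Voyant's implementation: it splits the text on whitespace and measures context in words.

```python
import re


def kwic(text: str, keyword: str, width: int = 4):
    """Return (left, match, right) tuples for each occurrence of keyword.

    width is the number of words of context kept on each side. Matching is
    case-insensitive and ignores punctuation attached to a word, so "dream;"
    counts as "dream".
    """
    words = text.split()
    target = keyword.lower()
    lines = []
    for i, word in enumerate(words):
        if re.sub(r"\W", "", word.lower()) == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines


def print_kwic(lines, pad: int = 30) -> None:
    """Print a concordance with the keyword column lined up."""
    for left, word, right in lines:
        print(f"{left:>{pad}}  {word}  {right}")
```

`print_kwic(kwic(text, "dream", 3))` prints one line per occurrence, roughly like the KWIC panel.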
4.0 Using Voyant on Your Own Text
Voyant Tools can be used on your own text or corpus. To do that, go to the simple URL for the tool:
http://voyant-tools.org/ [main, beta]
Just the Cirrus tool in Voyant: http://voyant-tools.org/tool/Cirrus/
You will get a panel that asks you for a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
Voyant is forgiving, but there are nonetheless bugs.
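Voyant accepts a zipped collection of texts. If your corpus is a folder of plain-text files, Python's standard `zipfile` module can bundle it. This is a minimal sketch that assumes all the documents are `.txt` files sitting in one folder; `zip_texts` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def zip_texts(folder: str, archive: str) -> list:
    """Bundle every .txt file in folder into a zip archive for upload.

    Returns the sorted list of file names that were added.
    """
    paths = sorted(Path(folder).glob("*.txt"))
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # arcname keeps only the file name, so the archive has no subfolders
            zf.write(path, arcname=path.name)
    return [p.name for p in paths]
```

Calling `zip_texts("my_corpus", "my_corpus.zip")` (hypothetical folder and file names) produces a single file you can upload through the panel.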
5.0 Other Stuff
Here are some links to other tools, different corpora and skins for specialized tools:
6.0 For After the Workshop
To understand the power and limitations of text analysis, it is useful to use Voyant on your own text.
- Find or assemble a text of your own.
- Try studying it with Voyant.
- What would you like to ask of your text but can't? What sort of tool would you like?
Finding Texts:
Aggregating and Cleaning Texts:
7.0 Other Tools
What other tools are there out there? See TAPoR 2.0 for a growing list of tools.
2.0 Text Analysis Methods Workshop
This is a script for a workshop on using Voyant and TAPoR for a graduate class on research methods.
The script can be found at http://hermeneuti.ca/node/254/
1.0 Introduction
- Overview
This workshop will quickly introduce you to computer-assisted text analysis using Voyant and TAPoR. Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell and is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. TAPoR is a web site for the discovery and review of text analysis tools, including those in Voyant.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with different texts.
- Then learn how to use the normal skin of Voyant with a single text and then a corpus.
- Then learn how to load your own text into Voyant.
- Finally, we will look at TAPoR where you can find other tools.
- Help
Remember that the tools listed in TAPoR, like Voyant, are research tools and will often fail, especially when a whole group of people uses them at once. There are multiple versions running in case one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
- Voyant Tools
2.0 Preparing a text for a question
The first step in text analysis is to assemble a text to fit your question(s). What do you want to ask about? What sort of text would help you ask questions about an issue? How can you use the internet to build a text?
For this workshop, let's assemble a text from the internet.
- Decide on some aspect of popular culture or computing culture well documented on the internet.
- Google keywords associated with the subject you want to study.
- Skim the results and then develop selection criteria for what you want to scrape.
- Scrape a set of texts using Google.
- Copy and paste the texts into a text file. Clean out the navigation information and irrelevant parts.
- Export a text file for text analysis.
For more see Appendix 1: Finding and Preparing an Electronic Text
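Pages saved from the web carry navigation menus, scripts and footers that will distort word counts. Python's standard `html.parser` can strip much of that before you export the text file. This is a minimal sketch: which tags count as "navigation" (here `nav`, `header`, `footer`, `script`, `style`) is an assumption you will want to adjust for the sites you scrape, and many sites mark menus with other tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping elements that rarely hold body prose."""

    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.depth = 0      # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the page's remaining text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Write the result out with `open(path, "w", encoding="utf-8")` and load that plain text file into Voyant.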
3.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools.
Go to the Cirrus tool at http://voyant-tools.org/tool/Cirrus and load your text.
There are a number of ways to load a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
To learn more about the Cirrus tool go to http://docs.voyant-tools.org/tools and scroll down to read about Cirrus. Or go to TAPoR 2.0 and read a review.
You can see Cirrus with a text like "Frankenstein" here: http://bit.ly/VoyantCirrusFrankenstein [temp, main, beta]
The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
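One common way for a word cloud to tie size to frequency is to interpolate linearly between a minimum and a maximum font size. Whether Cirrus does exactly this is not stated here; the sketch below is a hypothetical mapping (`font_sizes` and the 10 to 60 point range are assumptions) that lets you reason about the question.

```python
def font_sizes(counts: dict, min_size: float = 10.0, max_size: float = 60.0) -> dict:
    """Map each word's frequency linearly onto the range [min_size, max_size].

    The least frequent word gets min_size and the most frequent gets max_size.
    """
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for word, count in counts.items():
        # If all words are equally frequent, put them at the midpoint.
        fraction = (count - low) / span if span else 0.5
        sizes[word] = min_size + fraction * (max_size - min_size)
    return sizes
```

With a linear scale, one very frequent word squeezes everything else toward the minimum size; a logarithmic or square-root scale is a common alternative for that reason.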
Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.
Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that won't work that well. Try your text in different tools.
4.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:
Go to Voyant and load your text into the Reading Skin: http://voyant-tools.org
If you want to see a text in the Reading Skin you can look at Frankenstein: http://bit.ly/VoyantFrankensteinStop [temp, main, beta]
For a corpus see Austen (5 novels): http://bit.ly/VoyantAustenStop [temp, main, beta]
To learn about using the full Reading skin you can go to
In this skin clicking in one window will often (but not always) update other windows. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC (keyword in context).
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try playing with Word Trends as well.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt just restart the session by hitting refresh.
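Word Trends roughly shows how a word's frequency changes across a document (or across the documents of a corpus). One way to reproduce the single-document idea is to cut the text into equal segments and compute the word's relative frequency in each. This is a minimal sketch assuming equal-length token segments; `word_trend` is a hypothetical name, and Voyant's own segmenting may differ.

```python
import re


def word_trend(text: str, word: str, segments: int = 10) -> list:
    """Relative frequency of word in each of `segments` equal slices of the text.

    Each value is occurrences divided by the number of tokens in that slice.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    target = word.lower()
    n = len(tokens)
    trend = []
    for i in range(segments):
        chunk = tokens[n * i // segments: n * (i + 1) // segments]
        # Empty slices (text shorter than the segment count) score 0.0.
        trend.append(chunk.count(target) / len(chunk) if chunk else 0.0)
    return trend
```

Plotting the returned list gives a line similar in spirit to the Word Trends graph.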
5.0 Other Stuff
Here are some links to other tools, different corpora and skins for specialized tools:
6.0 More Information
Finding Texts:
Aggregating and Cleaning Texts:
7.0 Other Tools
What other tools are there out there? See TAPoR 2.0 for a growing list of tools.
CWRCshop2 (Ryerson): Using Voyant for Analyzing Texts
This is a script for a workshop on using Voyant for the CWRC community. It is available at http://hermeneuti.ca/workshops/cwrc2
1.0 Introduction
- The workshop leaders will introduce themselves:
- Overview
Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur," so do not be confused if you see that name. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with a small corpus of Austen texts.
- Then learn how to use the normal "skin" (multi-tool interface) of Voyant with a single text.
- Finally, show how to load your own text into Voyant.
- Now make sure you can connect to the wireless.
- Help
If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
2.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Jane Austen's Persuasion.
Cirrus (Austen's Persuasion): http://voyeurtools.org/tool/Cirrus/?corpus=JaneAusten&docIndex=5&stopList=stop.en.taporware.txt&toolFlow=simple (backup)
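The links in this workshop follow a pattern: a server address, an optional `tool/<Name>/` path, and query parameters such as `corpus`, `docIndex`, `stopList` and `toolFlow`. Assuming that pattern (it is inferred from the example links here, not from Voyant documentation), a small hypothetical helper can assemble variants of a link.

```python
from urllib.parse import urlencode


def voyant_url(base: str, tool: str = "", **params) -> str:
    """Assemble a Voyant-style link: base server, optional tool path, query string.

    The parameter names (corpus, docIndex, stopList, toolFlow) are copied from
    the example links in this workshop; other names are not guaranteed to work.
    """
    url = base.rstrip("/") + "/"
    if tool:
        url += f"tool/{tool}/"
    if params:
        url += "?" + urlencode(params)
    return url
```

For instance, changing `docIndex=5` to another number should open a different Austen novel in Cirrus, if `docIndex` indexes the documents of the corpus as the example link suggests.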
The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
Here are some more Cirrus visualizations to consider:
These types of word clouds are prevalent everywhere from academia to advertising: they quickly provide an intriguing representation of a text, as demonstrated by this example of studying gendered language in toy advertising. But their ability to rapidly convey a picture with words comes at the cost of information reduction, and some are highly critical of word clouds as hermeneutical tools. What do you think?
Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.
Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) view of the word in that text.
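Clicking a word in Cirrus lists the texts in the corpus with that word's frequency in each. The same kind of table can be sketched in a few lines; here the corpus is assumed to be a Python dict mapping a title to its full text, and `word_by_document` is a hypothetical name.

```python
import re


def word_by_document(corpus: dict, word: str) -> dict:
    """Count raw occurrences of word in each document of a corpus.

    corpus maps a document title to its full text; matching is case-insensitive.
    """
    target = word.lower()
    return {
        title: re.findall(r"[a-z']+", text.lower()).count(target)
        for title, text in corpus.items()
    }
```

Raw counts favour longer novels; dividing each count by the document's token count gives a relative frequency that is fairer across Austen's books of different lengths.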
3.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Austen corpus in a simple skin:
http://voyeurtools.org/?corpus=JaneAusten&stopList=stop.en.taporware.txt (backup)
In this skin clicking in one window will often (but not always) update other windows. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC (keyword in context).
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try playing with Word Trends as well.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt just restart the session by hitting refresh.
4.0 Using Voyant on Your Own Text
Voyant Tools can be used on your own text or corpus. To do that, go to the simple URL for the tool:
Voyant: http://voyeurtools.org
Just the Cirrus tool in Voyant: http://voyeurtools.org/tool/Cirrus/
Backup version: http://beta.voyant-tools.org/
You will get a panel that asks you for a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
Voyant is forgiving, but there are nonetheless bugs.
Note that you can create a persistent URL for your corpus; that way your link can be shared or bookmarked and you won't need to reload the texts into Voyant. Click the save icon in the blue bar at the top, and the first URL will be the link for your Voyant corpus.
5.0 Other Stuff
- Other Voyant Tools:
- Other Voyant Skins:
- Other Tools
CWRCshop: Using Voyant for Analyzing Texts
This is a script for a workshop on using Voyant for the CWRC community. It is available at http://hermeneuti.ca/node/211
1.0 Introduction
- The workshop leaders will introduce themselves:
- Geoffrey Rockwell, University of Alberta, geoffrey (dot) rockwell (at) ualberta (dot) ca, http://www.geoffreyrockwell.com
- Susan Brown, University of Alberta, University of Guelph, sbrown (at) uoguelph (dot) ca
- Overview
Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur," so do not be confused if you see that name. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with a small corpus of Austen texts.
- Then learn how to use the normal skin of Voyant with a single text.
- Finally, show how to load your own text into Voyant.
- Now make sure you can connect to the wireless.
- Help
If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
2.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open it.
Cirrus (Frankenstein): http://dev.voyeurtools.org:8080/tool/Cirrus/?corpus=1317355585427.2492&stopList=stop.en.taporware.txt
For a backup, go to http://voyeur.hermeneuti.ca/tool/Cirrus/ and enter the text URL http://www.gutenberg.org/cache/epub/84/pg84.txt
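Project Gutenberg plain-text files, such as the Frankenstein file above, wrap the book in a licence header and footer, and those words will show up in Cirrus unless they are removed. Here is a minimal sketch for trimming them, assuming the file uses the usual "*** START OF ... ***" and "*** END OF ... ***" marker lines; `strip_gutenberg` is a hypothetical name, and you should check the result by eye, since marker wording varies between files.

```python
import re

# Marker lines look like "*** START OF THIS PROJECT GUTENBERG EBOOK FRANKENSTEIN ***";
# newer files say "THE" instead of "THIS", so both are accepted.
START = re.compile(r"^\*{3}\s*START OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
                   re.IGNORECASE | re.MULTILINE)
END = re.compile(r"^\*{3}\s*END OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
                 re.IGNORECASE | re.MULTILINE)


def strip_gutenberg(text: str) -> str:
    """Return only the text between the Gutenberg start and end markers.

    If a marker is missing, that end of the text is left as-is.
    """
    start = START.search(text)
    end = END.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()
```

Save the trimmed text as a plain text file and load it in place of the Gutenberg URL.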
The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.
Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) view of the word in that text.
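A KWIC view is simple to approximate. The following Python sketch is not Voyant's implementation; it assumes whitespace tokenization, ignores surrounding punctuation and case when matching, and returns each hit with a few words of left and right context.

```python
def kwic(text, keyword, width=4):
    """Return (left, match, right) tuples for each occurrence of keyword,
    with up to `width` words of context on either side."""
    tokens = text.split()
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        # Ignore surrounding punctuation and case when matching.
        if tok.strip(".,;:!?\"'()").lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def format_kwic(hits, pad=30):
    """Right-align the left context so the matched words line up in one column."""
    return [f"{left:>{pad}} {match} {right}" for left, match, right in hits]


passage = ("Four nights will quickly dream away the time; "
           "Swift as a shadow, short as any dream; Brief as the lightning")
for line in format_kwic(kwic(passage, "dream", width=2)):
    print(line)
```

Right-aligning the left context is what makes the keywords form a single column, as in a printed concordance.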
3.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text in a simple skin:
Frankenstein: http://dev.voyeurtools.org:8080/?corpus=1317355585427.2492&skin=simple&event=corpusTypeSelected
In this skin clicking in one window will often (but not always) update other windows. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends panel and play with the KWIC.
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try playing with the Word Trends settings as well.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt just restart the session by hitting refresh.
4.0 Using Voyant on Your Own Text
Voyant Tools can be used on your own text or corpus. To do that, go to the simple URL for the tool:
Voyant: http://voyeurtools.org
Just the Cirrus tool in Voyant: http://voyeurtools.org/tool/Cirrus/
Backup older version: http://voyeur.hermeneuti.ca
You will get a panel that asks you for a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
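If your corpus is a folder of plain text files, one way to prepare the zipped collection mentioned above is Python's standard `zipfile` module. This sketch assumes one document per `.txt` file and stores the files flat at the top of the archive; how Voyant treats nested folders or other file types inside a zip is not covered here.

```python
import zipfile
from pathlib import Path


def zip_corpus(folder, zip_path):
    """Bundle every .txt file in `folder` into one zip archive for upload.
    Returns the archived file names in sorted order."""
    folder = Path(folder)
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.glob("*.txt")):
            zf.write(path, arcname=path.name)  # store files flat, without folder paths
            names.append(path.name)
    return names
```

For example, `zip_corpus("my_texts", "corpus.zip")` would produce a single archive to upload in place of many separate files.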
Voyant is forgiving, but there are nonetheless bugs.
5.0 Other Stuff
Here are some corpora and skins:
DH 2011 Visualization for Literary History (TAPoRware and Voyeur)
This is an outline for a workshop on visualization with Voyeur. It is based on a workshop given at DH 2010 in London, England.
1.0 Introduction
- The workshop leaders will introduce themselves:
- Susan Brown, University of Alberta, University of Guelph, sbrown (at) uoguelph (dot) ca
- Geoffrey Rockwell, University of Alberta, geoffrey (dot) rockwell (at) ualberta (dot) ca, http://www.geoffreyrockwell.com
- Stan Ruecker, University of Alberta, sruecker (at) ualberta (dot) ca, http://www.ualberta.ca/~sruecker/
- Stéfan Sinclair, McMaster University, sgs (at) mcmaster (dot) ca, http://stefansinclair.name/
- Overview
Voyeur is currently a beta release by Stéfan Sinclair and Geoffrey
Rockwell. Voyeur is the next generation in a series of text analysis
tools that include HyperPo and TAPoRware. It provides tables and graphs
related to word use across a single document or a collection. Voyeur
adds, among other things, the ability to handle much larger files than
the previous tools could.
- First, we will look at how to use Voyeur with a single text, and examine some of the visualizations that are possible.
- Then we will learn how to use Voyeur with a corpus.
- You will also have the opportunity to try Voyeur on your own corpus, if you have one.
- Finally, we will examine some of the more advanced features provided by Voyeur.
- Now make sure you can connect to the wireless.
- Connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
Here is a list of links for the Visualization for Literary History:
Here are the tools to try for the full Voyeur interface:
Now let's try the full text again in the full Voyeur interface:
For a list of tools see: http://entry.tapor.ca
DH 2011 Visualization for Literary History (Visualization with Voyeur)
This is an outline for a workshop on visualization with Voyeur. It is based on a workshop given at DH 2010 in London, England.
1.0 Introduction
- The workshop leaders will introduce themselves:
- Geoffrey Rockwell, University of Alberta, geoffrey (dot) rockwell (at) ualberta (dot) ca, http://www.geoffreyrockwell.com
- Stan Ruecker, University of Alberta, sruecker (at) ualberta (dot) ca, http://www.ualberta.ca/~sruecker/
- Susan Brown, University of Alberta, University of Guelph, sbrown (at) uoguelph (dot) ca
- Stéfan Sinclair, McMaster University, sgs (at) mcmaster (dot) ca, http://stefansinclair.name/
- Overview
Voyeur is currently a beta release by Stéfan Sinclair and Geoffrey
Rockwell. Voyeur is the next generation in a series of text analysis
tools that include HyperPo and TAPoRware. It provides tables and graphs
related to word use across a single document or a collection. Voyeur
adds, among other things, the ability to handle much larger files than
the previous tools could.
- First, we will look at how to use Voyeur with a single text, and examine some of the visualizations that are possible.
- Then we will learn how to use Voyeur with a corpus.
- You will also have the opportunity to try Voyeur on your own corpus, if you have one.
- Finally, we will examine some of the more advanced features provided by Voyeur.
- Now make sure you can connect to the wireless.
- Connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
2.0 Visualizing a Single Text
In the first part of the Workshop we will show you how to use Voyeur to visualize a single text as a way of learning the interface. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:
http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters
http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text
In order to focus on each tool independently, we will open each Voyeur tool separately.
- First we will look at Cirrus: http://voyeurtools.org/tool/Cirrus/
- Cirrus is a visualization tool that displays a word cloud relating to the frequency of words appearing in one or more documents. One can click on any word appearing in the cloud to obtain detailed information about its frequency. The larger the word, the more frequent the term.
- Show how to load a text by copying one of the Frankenstein URLs into the "Add Texts" box
- Show how hovering over the words reveals a number showing the word count of the current word in the corpus.
- Show how clicking on a word produces a textual set of results as a list on a new page. These results include a count, a relative count, and a trend graph.
- Next we will look at Links: http://voyeurtools.org/tool/Links/
- Links finds collocates for words and displays the links between them using a force-directed graph, showing as a web of terms which words occur frequently in proximity to a keyword. Once you arrive at Links, add or upload your content and let the tool perform its analysis. You will be presented with a web-like visualization. You may hover over words to find data pertaining to that word within your corpus, and double-click on any word for a more detailed analysis. Clicking and dragging lets you rearrange the visualization. If there are multiple documents within the corpus, they will be coloured differently.
- Load a text by copying one of the Frankenstein URLs into the "Add Texts" box
- If you hover over a term, Voyeur will tell you its linkage within the
corpus documents.
- Try dragging and dropping terms to organize them.
- If you would like to manipulate the visualization, right-click on any of
the terms and choose 'Stick/unstick' or 'Remove'. 'Stick/unstick' puts
the term in place, and is not moved when other terms are moved. 'Remove'
simply removes the term from the visualization.
- Clicking on the options button (the button that looks like a gear) will launch a dialog box with various options for the Links tool. The stop word list lets you exclude words from the visualization (usually words such as 'a', 'the', and 'and'). 'Node size determined by type frequency' is the default, and sizes terms by how often they appear in the documents. Choosing 'Node links' will instead make terms appear larger if they are heavily linked with other terms. 'Autofit graph on screen' sizes the graph to fit your browser window. 'Remove orphans' will remove terms which are not linked to any other term in the visualization.
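Collocates are words that occur near a keyword. As a hedged sketch of the kind of counting that could sit behind a Links-style graph (the window size, tokenizer and edge selection here are assumptions, not Voyeur's documented behaviour), each keyword occurrence contributes the words within a fixed window on either side:

```python
import re
from collections import Counter


def collocates(text, keyword, window=5, stop_words=frozenset()):
    """Count the words found within `window` tokens on either side of keyword."""
    tokens = re.findall(r"[a-z']+", text.lower())
    keyword = keyword.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] not in stop_words:
                counts[tokens[j]] += 1
    return counts


def link_edges(text, keywords, window=5, stop_words=frozenset(), top=5):
    """Return {(keyword, collocate): count} using each keyword's `top` collocates.
    A keyword with no collocates gets no edges, i.e. it would be an 'orphan'."""
    edges = {}
    for kw in keywords:
        for word, count in collocates(text, kw, window, stop_words).most_common(top):
            edges[(kw.lower(), word)] = count
    return edges
```

Each edge could be drawn as a line between two nodes, with the count used for thickness or node size; the stop list plays the same role as the stop word option described above.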
- Now we will look at Word Trends: http://voyeurtools.org/tool/TypeFrequenciesChart/
- Term Frequencies Chart shows how terms are distributed across the document(s) in a corpus (documents are shown in the order in which they were added). Each charted line represents one of the words that are common throughout the entire corpus. If you hover over specific points, it will give you information about that word in a specific document.
- When you analyze a corpus with the Term Frequencies Chart, you will initially see the most common words at the top of the chart with colour codes. The lines within the graph are coloured according to those words. If you click on one of the terms at the top, it will omit that term from the graph.
- When you hover over the segment points, you can see the frequency of that term in that segment. If you click on the point, Voyeur will open a new window with detailed information about that segment and term in the Document KWICs tool.
If you click and drag on a section of the chart it will zoom in to
that section. To reset the chart to its original state, click on “reset
zoom”.
If you would like to see fewer or more segments on the chart, click on “Segments” at the bottom left of the chart to choose the desired number of segments.
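The segments idea can be sketched as follows. This assumes the token stream is cut into near-equal slices and that each term's frequency is reported relative to the slice length; whether Word Trends plots raw or relative counts in a given setting is not stated here, so treat the normalization as an assumption.

```python
import re


def segment_frequencies(text, terms, segments=10):
    """Cut the token stream into `segments` near-equal slices and return, for each
    (lowercased) term, its relative frequency in every slice."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    terms = [t.lower() for t in terms]
    result = {t: [] for t in terms}
    for s in range(segments):
        # Integer arithmetic spreads any remainder tokens across the slices.
        chunk = tokens[s * n // segments:(s + 1) * n // segments]
        for t in terms:
            result[t].append(chunk.count(t) / len(chunk) if chunk else 0.0)
    return result
```

Each list is one line on the chart; raising `segments` gives a finer but noisier line, which is the trade-off the “Segments” control exposes.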
Other Things
- We will look at how to get help (Mention Quick Guide)
- Some things to try:
- Experiment with the Options (like the Stop Word list)
- Create a Favorites list for a theme and explore that list
- Search for phrases
3.0 Analyzing a Corpus
In the second part of the Workshop we will look at working with a corpus or collection of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents). The Voyeur index is at:
http://voyeurtools.org/?corpus=humanist
- Bubblelines is a visualization tool that helps to understand patterns of word repetition in one or more documents. Each document is represented as a horizontal line and each search term is represented as a bubble: the bubble represents the frequency of the term in the corresponding segment of text (the text is divided into segments of equal length). The larger the bubble, the more frequent the term.
- Load a text by copying one of the Frankenstein URLs into the "Add Texts" box
Hovering over a bubble, or set of bubbles, will cause a box to appear that displays the frequency counts for that segment of text. Similarly, hovering over the number at the end of the line will cause a box to appear that summarizes the frequency for the entire document.
When Bubblelines first loads a corpus, you may see terms that have been pre-selected and included in the URL or embedded page. If no terms are specified, Bubblelines automatically fetches the five most frequent terms and displays bubbles based on those.
- You can remove the default terms by clicking on the "Clear Terms" button. You can add additional terms to be displayed using the "Find Term" box. Note that available terms will appear as you type and you can pick an item from the list to have it added.
In addition to adding and removing terms, you can toggle the display
of the terms that have been loaded. To do so simply click on the term
(active terms are underlined).
- ScatterPlot creates a scatter plot graph of terms, spaced by their variation from one another. Once you arrive at ScatterPlot, add or upload your content and let the tool perform its analysis. You may hover over the plotted terms and click on them for more information.
When you first load ScatterPlot, you will see a variety of terms plotted on a graph. If you hover over the terms, you will see their variation explained by each component on the x and y axes. If you click on any of these terms, it will bring you to the Document KWICs tool for further analysis.
ScatterPlot offers options for changing the plot. The terms button allows you to choose how many terms should be displayed. The dimensions button lets you switch between a two- and a three-dimensional graph. Toggle labels simply removes or adds labels for the terms on the graph.
- Some other things to try:
- Set stoplists. You may want to exclude common words. To do this, click on the "Options" button, represented by a gear icon in the upper-right.
- Manage multiple documents.
- Show how to group results
- Show how to compare documents
- Try looking for trends yourself using the different tools
4.0 Using your own text
- Now you can try your own text. There are different ways of providing Voyeur a text:
- Typing a text or pasting it in
- Typing in one or more URLs (as we have done above)
- Uploading a text, using the "upload" button
- For uploading, there are a number of formats of texts that will work:
- file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
- Finally, we will discuss caching and so on.
5.0 Exporting Data and Quoting Analytics
We will now show how to export data and quote analytical results:
- How to export tab-separated values and copy and paste them into Excel
- How to export XML results from KWICs (for instance)
- How to quote an analytical result in TADA.
- Go to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to try it yourself.
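Once values have been exported as tab-separated text, they can also be handled outside Excel. A minimal Python sketch using the standard `csv` module, assuming the first row holds column headers; the column names used in the usage note (`count`) are hypothetical, so substitute whatever headers your export actually contains.

```python
import csv
import io


def read_tsv(tsv_text):
    """Parse tab-separated text whose first row holds column headers."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def top_rows(rows, column, n=10):
    """Return the n rows with the largest numeric value in `column`."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=True)[:n]
```

For example, `top_rows(read_tsv(exported_text), "count", 20)` would return the 20 rows with the largest value in a column headed `count`, where `exported_text` is the pasted export.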
6.0 Advanced and Other
- There are other beta tools in Voyeur that can be accessed:
7.0 To Prepare
- Make sure we have Voyeur running with a backup
- Sort out how participants can get on wireless
- Powerbars for laptops
- Preindex texts
DH 2011 Voyeur Tools
This outline is for a workshop offered at the Digital Humanities 2011 conference at Stanford.
Please note that the main server for Voyeur Tools (voyeurtools.org) may be inaccessible, so we have created a backup installation (dev.voyeurtools.org:8080). They should function very similarly, but the corpora loaded into the development server may not be accessible after the workshop.
1.0 Introduction
- The workshop leaders will introduce themselves:
- What will happen?
- What is text analysis? An introduction using individual Voyeur tools.
- Distant Reading: Analyzing a single text
- Distant Reading: Analyzing a corpus
- Connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
2.0 Introduction to text analysis using individual Voyeur tools
After introductions we will show you how to use individual tools in Voyeur to analyze texts as a way of thinking about techniques in text analysis. We will work with Mary Shelley's Frankenstein, the Humanist discussion list corpus and a collection of Austen novels. The plain text of Frankenstein is here: http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text
Here are the tools we will try:
- Cirrus (with Austen): http://voyeur.hermeneuti.ca/tool/Cirrus/?corpus=1308408654248.9846&stopList=stop.en.taporware.txt
- Word Trends (with Humanist): http://dev.voyeurtools.org:8080/tool/TypeFrequenciesChart/?corpus=humanist
- Links (Collocates): http://dev.voyeurtools.org:8080/tool/Links/?corpus=1308459917755.5623&mode=document&stopList=stop.en.taporware.txt
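The tool URLs above share a pattern: a server, `/tool/<ToolName>/`, and query parameters such as `corpus`, `stopList` and `mode`. A small Python sketch that rebuilds them with `urllib.parse.urlencode`; the parameter names are only those visible in this outline (no others are implied), and the servers listed here may no longer be online.

```python
from urllib.parse import urlencode


def voyeur_tool_url(base, tool, **params):
    """Build <base>/tool/<tool>/?name=value&... from keyword arguments,
    keeping the order in which the parameters are given."""
    return f"{base.rstrip('/')}/tool/{tool}/?{urlencode(params)}"


print(voyeur_tool_url("http://voyeur.hermeneuti.ca", "Cirrus",
                      corpus="1308408654248.9846", stopList="stop.en.taporware.txt"))
# http://voyeur.hermeneuti.ca/tool/Cirrus/?corpus=1308408654248.9846&stopList=stop.en.taporware.txt
```

Passing `mode="document"` between `corpus` and `stopList` reproduces the Links URL in the same way.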
We will discuss the standard controls for a panel and how you can cite and embed panels (with their texts).
3.0 Distant Reading: Analyzing a Single Text
In the third part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface.
- We will open Voyeur:
- Show how to load a text (including XML options)
- Show the different panels that appear initially
- Discuss the order in which they open and the Summary panel
- Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
- Discuss the full set of panels
- Show how to manage panels
- Discuss trigger order of panels (flow within Voyeur)
- Show how to get help (Mention Quick Guide)
- Show how to make a list of favorite words by searching for words and saving them in Favorites
- Now you should try Voyeur with your text or the Frankenstein text above. To open Frankenstein, click here:
http://dev.voyeurtools.org:8080/tool/Cirrus/?corpus=1308459917755.5623&stopList=stop.en.taporware.txt
- Some things to try:
- Experiment with the Options (like the Stop Word list)
- Create a Favorites list for a theme and explore that list
- Search for phrases
4.0 Distant Reading: Analyzing a Corpus with Correspondence Analysis
In the fourth part of the Workshop we will look at working with a corpus using a different skin and the Correspondence Analysis tool. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents).
We will use the Humanist Corpus with a different skin or arrangement of panels:
http://dev.voyeurtools.org:8080/?corpus=humanist&skin=scatter&stopList=stop.en.taporware.txt
We will discuss how to use the Correspondence Analysis tool to explore themes in a diachronic corpus. For more on CA see http://stefansinclair.name/correspondence-analysis
Some of the features to look at:
- Controlling the visualization (labels, words, etc.)
- Using the list of words (selecting multiple words)
- Controlling panels
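For those curious about what correspondence analysis computes, here is a minimal sketch of the textbook formulation: a singular value decomposition of the standardized residuals of a documents-by-terms count table. It assumes NumPy is installed, it is not Voyant's own code, and the counts are made up for illustration. Documents that land close together in the first dimensions use similar vocabulary.

```python
import numpy as np


def correspondence_analysis(counts, n_dims=2):
    """Textbook correspondence analysis of a documents-by-terms count table.

    Returns row (document) and column (term) principal coordinates for the
    first n_dims dimensions, plus the principal inertia of every dimension.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                     # correspondence matrix
    r = P.sum(axis=1)                   # row masses (documents)
    c = P.sum(axis=0)                   # column masses (terms)
    expected = np.outer(r, c)           # proportions under independence
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :n_dims] * sv[:n_dims] / np.sqrt(r)[:, None]
    cols = Vt.T[:, :n_dims] * sv[:n_dims] / np.sqrt(c)[:, None]
    return rows, cols, sv ** 2


# Made-up counts: 4 documents (rows) x 3 terms (columns).
counts = [[10, 2, 0],
          [8, 3, 1],
          [1, 5, 9],
          [0, 4, 12]]
rows, cols, inertia = correspondence_analysis(counts)
print(np.round(rows, 3))
print("share of inertia:", np.round(inertia / inertia.sum(), 3))
```

The share of inertia tells you how much of the variation a two-dimensional plot like the one in the scatter skin can actually show.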
5.0 Using your own text
- Now you can try your own text. We will show the different ways of providing Voyeur a text:
- Typing a text or pasting it in
- Typing in one or more URLs
- Uploading a text
- We will then discuss the formats of texts that will work, and what will happen to them:
- file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
- Finally, we will discuss caching and other practical issues
Try your own text now.
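Roughly speaking, markup formats have to be reduced to their text before words can be counted. The sketch below shows the idea for HTML and simple XML using Python's standard html.parser module. It is an illustration under that assumption, not a description of how Voyant actually converts files, and it skips the contents of script and style elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the character data of an HTML or simple XML document."""

    SKIP = {"script", "style"}  # elements whose content is not text

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &amp; etc. become characters
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def markup_to_text(markup):
    """Strip tags and attributes, keeping only the words."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Join chunks with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


print(markup_to_text('<body><p class="hangingindent">It was on a dreary night</p>'
                     '<script>var x = 1;</script><p>of November.</p></body>'))
```

Joining chunks with spaces keeps paragraphs from running together, at the cost of splitting any word that is broken up by inline tags.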
6.0 Exporting Data and Quoting Analytics
There are different ways to export data and quote analytical results:
- You can export tab-separated values that can be copied and pasted into Excel
- You can export XML results (from KWICs, for instance)
- You can embed live tool snippets (in a blog post, TADA, etc.)
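The tab-separated format is simple enough to reproduce yourself. This sketch writes word counts as TSV text that can be pasted into Excel; the column names "term" and "count" are our own choice and may not match what Voyant exports.

```python
import csv
import io
from collections import Counter


def frequencies_to_tsv(counts):
    """Render (term, count) pairs as tab-separated text, most frequent first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["term", "count"])  # hypothetical header row
    for term, count in counts.most_common():
        writer.writerow([term, count])
    return buffer.getvalue()


counts = Counter("the dream the vision a dream the".split())
tsv = frequencies_to_tsv(counts)
print(tsv, end="")
```

Using the csv module rather than joining strings by hand means any tabs or quotes inside a term are escaped correctly.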
7.0 Wrap-Up
- other aspects: skins, tool browser
- how to give feedback
- future: Voyeur Notebooks, new TAPoR
- thanks!
DH2010 Introduction to Voyeur
This is an outline for a workshop on Voyeur. It was developed for a workshop before DH 2010 in London, England.
1.0 Introduction
- The workshop leaders will introduce themselves:
- What will happen?
- how to use Voyeur with a single text
- how to use Voyeur with a corpus
- try Voyeur on your corpus
- concluding remarks on advanced features
- Now make sure you can connect to the wireless
- Connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
2.0 Analyzing a Single Text
In the first part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:
http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters
http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text
- We will open Voyeur:
- Show how to load a text
- Show the different panels that appear initially
- Discuss the order they open and the Summary panel
- Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
- Discuss the full set of panels
- Show how to manage panels
- Discuss trigger order of panels (flow within Voyeur)
- Show how to get help (Mention Quick Guide)
- Show how to search for words and save them in a Favorites list for further exploration
- Now you should try Voyeur with your own text or the Frankenstein text above. To open the Frankenstein text, follow this link:
http://voyeurtools.org/?corpus=1278409278561.646
- Some things to try:
- Experiment with the Options (like the Stop Word list)
- Create a Favorites list for a theme and explore that list
- Search for phrases
3.0 Analyzing a Corpus
In the second part of the Workshop we will look at working with a corpus or collection of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents). The Voyeur index is at:
http://voyeurtools.org/?corpus=humanist
- We will show you how to:
- set various options, like stoplists
- hide and show columns
- manage multiple documents
- group results
- compare documents
- Try looking for trends yourself
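When looking for trends across the 21 documents, raw counts can mislead because the documents differ in length. A common remedy is to compare relative frequencies, sketched here with a crude tokenizer and an arbitrary scale of occurrences per 10,000 words; Voyeur's own trend graphs may normalize differently, and the three documents below are stand-ins.

```python
import re
from collections import Counter


def relative_frequencies(documents, term, per=10_000):
    """Occurrences of a lowercase term per `per` words in each document."""
    results = []
    for text in documents:
        tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenizer
        if tokens:
            results.append(per * Counter(tokens)[term] / len(tokens))
        else:
            results.append(0.0)
    return results


# Three tiny stand-in documents of different lengths.
docs = ["the web and the text", "web web text", "nothing here"]
print(relative_frequencies(docs, "web"))
```

The first two documents contain "web" once and twice, but the second is shorter, so its relative frequency is more than three times higher.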
4.0 Using your own text
- Now you can try your own text. We will show the different ways of providing Voyeur a text:
- Typing a text or pasting it in
- Typing in one or more URLs
- Uploading a text
- We will then discuss the formats of texts that will work, and what will happen to them:
- file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
- Finally, we will discuss caching and other practical issues
- Now try your own text.
5.0 Exporting Data and Quoting Analytics
We will now show how to export data and quote analytical results:
- How to export tab-separated values that can be copied and pasted into Excel
- How to export XML results (from KWICs, for instance)
- How to quote an analytical result in TADA.
- Go to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to try it yourself.
6.0 Advanced and Other
- There are other beta tools in Voyeur that can be accessed:
7.0 To Prepare
- Make sure we have Voyeur running with a backup
- Sort out how participants can get on wireless
- Power bars for laptops
- What texts will we use?
- Preindex texts and create a Workshop web page on Hermeneuti.ca
DH2012 Workshop in Hamburg
This is a script for a workshop on using Voyant Tools for the Digital Humanities 2012 Conference in Hamburg.
1.0 Introduction
- Workshop leaders: Stéfan Sinclair, McGill University, and Geoffrey Rockwell
- Overview
Voyant Tools is a web-based environment for reading and analyzing digital texts, created by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur", so don't be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could. Voyant is actually a suite of modular tools that can be combined in pre-defined or user-defined combinations called skins. This workshop's primary objective is to better understand how and why one might use Voyant Tools to help in the study of digital texts.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with a small multilingual corpus.
- Then learn how to use the normal "skin" (multi-tool interface) of Voyant with a single text.
- Show how to load your own text(s) into Voyant.
- Look at some of the more exploratory and advanced tools available in Voyant, such as Bubbles and Correspondence Analysis.
- Discuss the use of Voyant Tools in a larger research process (embedding tools in remote content, etc.)
- Help
Here are some useful links:
N.B. Voyant Tools is in beta; it has warts and blemishes. Always view what you're looking at with some circumspection, and if something doesn't work as expected, assume it's a bug, not something that you're misunderstanding (and please tell us about it).
2.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool, called Cirrus, that can then spawn other tools. We will try it with the English version of the Universal Declaration of Human Rights.
http://work.voyant-tools.org/tool/Cirrus/?corpus=unhr&docIndex=0&stopList=stop.en.taporware.txt&toolFlow=simple
The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
Here are some more Cirrus visualizations to consider:
These types of word clouds are prevalent everywhere from academia to advertising; they quickly provide an intriguing representation of a text, as demonstrated by this example of studying gendered language in toy advertising. But their ability to rapidly convey a picture with words comes at the cost of information reduction, and some are highly critical of word clouds as hermeneutical tools. What do you think?
These Cirrus visualizations don't show all of the most frequent words: so-called stopwords are missing. Stopwords are function words (like determiners and prepositions) that typically carry less meaning. What to include in a stopword list is a matter of interpretation and purpose. Are numbers (like "one") important? What about words like "against"?
A new feature in Voyant Tools is the ability to set and edit stopword lists. To do so, click on the options (gear) icon and then click on "Edit Stop Words".
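A word cloud like Cirrus starts from a frequency count with stopwords removed. The sketch below uses a tiny hypothetical stoplist (not the contents of stop.en.taporware.txt) and a made-up sentence to show how much the list changes what rises to the top.

```python
import re
from collections import Counter

# A tiny hypothetical stoplist; real lists are much longer.
STOPWORDS = {"the", "to", "and", "of", "a", "in"}


def top_terms(text, stopwords, n=5):
    """Most frequent words in text after removing stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


sample = "the right to work and the right to rest and the right to education"
print(top_terms(sample, set(), 3))      # no stoplist: function words dominate
print(top_terms(sample, STOPWORDS, 3))  # with stoplist
```

Without a stoplist "the" and "to" tie with "right" at the top; with one, "right" stands alone, which is the kind of shift you will see when editing stop words in Cirrus.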
Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus and the frequency of the word you clicked on.
Try It: Now try double-clicking on one of the texts.
This should launch another tab or window with a Key Word in Context (KWIC) listing
of the word in that text. Note that you may have to allow pop-ups.
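A KWIC listing is straightforward to build: find each occurrence of the keyword and keep a few words on either side. Here is a minimal sketch with a simple regular-expression tokenizer and a hypothetical width parameter (words of context per side), applied to a line from A Midsummer Night's Dream.

```python
import re


def kwic(text, keyword, width=4):
    """List (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"\w+(?:'\w+)?", text)  # words, keeping "Bottom's"
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


line = "Swift as a shadow, short as any dream; Brief as the"
for left, word, right in kwic(line, "dream", width=3):
    print(f"{left:>20} | {word} | {right}")
```

Right-aligning the left context lines the keyword up in a column, which is what makes a KWIC easy to scan.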
Try It: Try some of the other individual tools at docs.voyant-tools.org/tools
3.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same corpus in a simple skin.
In this skin, clicking in one panel will often (but not always) update other panels. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try the same with the Word Trends.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt just restart the session by hitting refresh.
4.0 Using Voyant on Your Own Text
Voyant Tools can be used on your own text or corpus. To do that, go to the simple URL for the tool:
Voyant: work.voyant-tools.org
Just the Cirrus tool in Voyant: work.voyant-tools.org/tool/Cirrus
You will get a panel that asks you for a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
- Paste in a text
Voyant is forgiving, but there are nonetheless issues (and bugs).
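If you have many files, it can be easier to upload them as one zip archive. Here is a minimal Python sketch that bundles every .txt file in a folder; the flat layout inside the archive is an assumption, since this outline does not say what structure Voyant expects.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_texts(folder, archive):
    """Bundle every .txt file directly inside folder into one zip archive."""
    folder = Path(folder)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.glob("*.txt")):
            zf.write(path, arcname=path.name)  # flat layout, no folder prefix
    return archive


# Demo with a throwaway folder: two texts and one file that is skipped.
demo = Path(tempfile.mkdtemp())
for name, body in [("a.txt", "first text"), ("b.txt", "second text"),
                   ("notes.md", "not a .txt file")]:
    (demo / name).write_text(body, encoding="utf-8")

archive = zip_texts(demo, demo / "corpus.zip")
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```

Sorting the file names keeps the documents in a predictable order, which matters if your corpus is meant to be read chronologically.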
Note that you can create a persistent URL for your corpus, so that your link can be shared or bookmarked and you won't need to reload the texts into Voyant. Click the save icon (disk icon) in the blue bar at the top; the first URL will be the link for your Voyant corpus.
5.0 Exploratory and Advanced Tools
Voyant Tools is conceived on the notion that text analysis in the humanities is about re-presenting the text, not about producing incontrovertible evidence. Some Voyant Tools are more about aesthetic or ludic aspects of experiencing digital texts, which can directly or indirectly inspire observations that might not otherwise be possible. Here are some examples:
Specialized Skins
At the same time, some tools are more advanced. For instance, one can use a correspondence analysis skin that shows how terms map across multiple documents, such as this view of the Humanist Discussion Group listserv.
Voyant Tools also enables some quick-and-dirty social network analysis. This is possible thanks to a process called named entity recognition (NER), or named entity extraction, that attempts to automatically identify people, organizations and locations in a text (at the moment Voyant Tools uses the Stanford Natural Language Processing package to perform this automated process). It's worth emphasizing that automated processes like these are subject to several issues and problems. For instance, how do you combine or differentiate between uses of first and/or last names? How do you tell whether the same name refers to one person or two different people? What do you do when an organization looks like a person's name (e.g. Johns Hopkins)? Still, you can't beat the simplicity of Voyant Tools RezoViz, especially when working with a mid-size corpus of shorter texts (5-50 articles, for example). Here, for instance, is a specialized interface showing connections between people mentioned in emails to the Humanist listserv (RezoViz is in alpha and best experienced in Chrome).
As always, the real strength of Voyant Tools is the ability to create your
own corpus – you can start at work.voyant-tools.org/tool/RezoViz.
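One simple way to turn extracted names into a network is to link two names whenever they appear in the same document. The sketch below assumes entity extraction has already been done (the lists of names are hypothetical) and counts document co-occurrences; it is not necessarily how RezoViz builds its graph, and it inherits the name-matching problems described above, since "Johns Hopkins" and "Hopkins" would become separate nodes.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(entities_per_document):
    """Count the documents in which each pair of entities appears together."""
    edges = Counter()
    for entities in entities_per_document:
        # set(): a name repeated in one document counts once;
        # sorted(): ("Alice", "Bob") and ("Bob", "Alice") share one key.
        for pair in combinations(sorted(set(entities)), 2):
            edges[pair] += 1
    return edges


# Hypothetical output of an entity-extraction step, one list per email.
emails = [["Alice", "Bob"],
          ["Alice", "Bob", "Carol"],
          ["Carol", "Carol", "Alice"]]
for (a, b), weight in cooccurrence_edges(emails).most_common():
    print(a, "--", b, weight)
```

The edge weights can be used to draw thicker lines between people who are mentioned together more often.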
6.0 Voyant as a Scholarly Tool
One of the essential design principles of Voyant Tools is that it tries to be useful not just at the moment of analysis, but through more phases of research. Here are some examples:
- as we've already seen, you can export a link to a corpus that can be bookmarked, shared by email or Twitter, or otherwise preserved (as a general rule, a corpus in Voyant will remain accessible as long as it has been consulted at least once in the past month)
- there's built-in Zotero awareness: you can click on the folder/article icon in the Firefox address bar to create a new entry (though you may wish to complete some of the metadata)
- you can export data for other applications; for instance, you can produce a tab-separated values view of a table that can be copied and pasted into a spreadsheet application (where you can edit the data and produce even more graphs, charts, etc.)
- you can embed a live tool in remote content (a blog post, a journal article, a term paper, etc.), much as you would embed a YouTube clip; the interactive affordances of Voyant allow you to go beyond static screenshots and images and let your users/readers engage with the content and data themselves, as with the DH2012 abstracts
7.0 Other Stuff
Here are some other useful resources.
Other Tools:
- TAPoR 2.0 - Discover and comment on tools. For example, here are the Voyant Tools listed in TAPoR 2.0. Leave a comment on your favorite Voyant tool. Link to a project where you use it.
- TAPoRWare - Simple tools for processing plain text, HTML, and XML
- CWRC
List of Visualization Tools
- DIRTROD
Other Corpora:
Other Voyant Skins:
Dublin 2011: From Metadata to Linked Data
This workshop outline is for a Summer School at Trinity College Dublin. See http://dho.ie/summerschool2011 for the full description. This outline is for Day 3 on Generating Textual Data:
Day 3: Generating Textual Data, Tobias Blanke and Geoffrey Rockwell
Based on the results of Day II, participants will dig deeper into the details of generating textual data using text and data mining techniques. Participants will learn methods to algorithmically create textual data while critically evaluating existing tools, methods, and solutions as well as their future potential. They will gain insights into how generic services need to be modified to serve the needs of humanities research. Finally, we will investigate how generated output can be reused in the emerging web of data.
This is an outline for a workshop on Voyeur. It was developed for a workshop before DH 2010 in London, England.
1.0 Introduction
- The workshop leaders will introduce themselves:
- What will happen?
- What is text analysis? A very short introduction.
- Voyeur tools: Using simple tools in Voyeur
- Distant Reading: Analyzing a single text
- Distant Reading: Analyzing a corpus
Other things to try:
- Trying Voyeur on your corpus
- Trying different Voyeur tools like OpenCalais and exporting results
- Conclusion: Stepping back and looking again at what text analysis is through a brief historical review.
- Connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
2.0 TAPoRware: A Simple Recipe for Studying Themes in a Text
In the second part of the Workshop we will show you how to use TAPoRware to analyze a single text as a way of thinking about techniques in text analysis. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:
http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters
http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text
We will also be using some TAPoRware tools and Recipes for TAPoRware. The Tools and Recipes are here:
List Words: http://taporware.ualberta.ca/~taporware/textTools/listword.shtml - Use short Frankenstein
Concordance Tool: http://taporware.ualberta.ca/~taporware/textTools/findtext.shtml - Use short Frankenstein
Weighted Centroid: http://taporware.ualberta.ca/~taporware/otherTools/wcentroid.shtml - Use short Frankenstein
Principal Component Analysis: http://taporware.ualberta.ca/~taporware/betaTools/pca.shtml - Use short Frankenstein
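The List Words and Concordance tools above both start from the same basic operation: splitting a text into words and counting them. The Python below is a minimal sketch of a frequency list with a stop list, assuming a simple lower-case tokenizer. The function names and the deliberately tiny stop list are hypothetical; this is not how TAPoRware itself is implemented.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list. The workshop URLs refer to a
# fuller list (stop.en.taporware.txt) whose contents are not reproduced here.
STOP_WORDS = {"the", "and", "of", "to", "a", "i", "in", "my", "was", "that"}


def tokenize(text):
    """Lower-case the text and split it into word tokens (ASCII letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def list_words(text, stop_words=STOP_WORDS, top=10):
    """Return the `top` most frequent words that are not in the stop list, as (word, count) pairs."""
    counts = Counter(w for w in tokenize(text) if w not in stop_words)
    return counts.most_common(top)


if __name__ == "__main__":
    sample = "The night was dark and the sea was dark and cold."
    for word, freq in list_words(sample, top=5):
        print(f"{word}\t{freq}")
```

Changing the stop list changes which words rise to the top, which is the same effect you will see when you change the stop word option in Voyeur.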
2.2 Using Voyeur Simple Tools
Cirrus Word Cloud (Frankenstein): http://dev.voyeurtools.org:8080/tool/Cirrus/?corpus=1309937516546.6692&query=&stopList=stop.en.taporware.txt
Cirrus Word Cloud (Austen): http://voyeur.hermeneuti.ca/tool/Cirrus/?corpus=1308408654248.9846&stopList=stop.en.taporware.txt
Other tools from Voyeur can be found here: http://hermeneuti.ca/voyeur/tools
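The tool links above carry their settings in the URL query string: `corpus`, `query` and `stopList` here, and `skin` in the HUMANIST link later in this outline. A minimal sketch of assembling such a link, assuming only those parameter names as they appear in the workshop URLs; the helper `voyeur_url` is hypothetical, and the old Voyeur servers may no longer respond.

```python
from urllib.parse import urlencode


def voyeur_url(base, tool=None, **params):
    """Assemble a Voyeur-style URL from a server base, an optional tool name and query parameters.

    Hypothetical helper: it only reproduces the URL shapes used in this workshop
    (base/tool/<Tool>/?... for a single tool, base/?... for a full skin).
    """
    path = f"/tool/{tool}/" if tool else "/"
    # Parameters passed as None are omitted; an empty string is kept (e.g. "query=").
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return base.rstrip("/") + path + ("?" + query if query else "")


if __name__ == "__main__":
    print(voyeur_url("http://voyeur.hermeneuti.ca", "Cirrus",
                     corpus="1308408654248.9846", stopList="stop.en.taporware.txt"))
```

Editing these parameters by hand (for example, swapping the stop list) is a quick way to share a particular view of a corpus.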
3.0 Distant Reading: Analyzing a Single Text
In the third part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface.
- We will open Voyeur:
- Show how to load a text (Frankenstein: http://www.gutenberg.org/cache/epub/84/pg84.txt). Discuss different types of texts that can be loaded.
- Show the different panels that appear initially
- Discuss the order in which they open and the Summary panel
- Discuss common features to panels
- Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
- Show how to manage panels
- Discuss trigger order of panels (flow within Voyeur)
- Show how to get help (Mention Quick Guide)
- Show how to search for words and save them to a list of favorite words for further exploration
- Now you should try Voyeur with your text or the Frankenstein text above. To open the Frankenstein text, click here:
http://voyeur.hermeneuti.ca/?corpus=1309937028026.8131
- Some things to try:
- Experiment with the Options (like the Stop Word list)
- Create a Favorites list for a theme and explore that list
- Search for phrases
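Searching for a phrase amounts to building a keyword-in-context (KWIC) concordance for a multi-word sequence: every match is shown with a few words on either side. A minimal sketch, assuming tokens are runs of letters, digits, underscores and apostrophes (so punctuation drops out of the context) and that matching ignores case; the function `kwic` is hypothetical and is not Voyeur's own code.

```python
import re


def kwic(text, phrase, width=4):
    """Return keyword-in-context lines for every occurrence of `phrase`.

    Each line shows up to `width` words of left context (right-aligned),
    the matched phrase in brackets, and up to `width` words of right context.
    """
    words = re.findall(r"[\w']+", text)
    lower = [w.lower() for w in words]
    target = phrase.lower().split()
    if not target:
        return []  # an empty phrase would otherwise match at every position
    n = len(target)
    lines = []
    for i in range(len(words) - n + 1):
        if lower[i:i + n] == target:
            left = " ".join(words[max(0, i - width):i])
            match = " ".join(words[i:i + n])
            right = " ".join(words[i + n:i + n + width])
            lines.append(f"{left:>30} [{match}] {right}")
    return lines


if __name__ == "__main__":
    text = "It was a dark and stormy night; the dark and stormy sea."
    for line in kwic(text, "dark and stormy", width=2):
        print(line)
```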
4.0 Distant Reading: Analyzing a Corpus
In the fourth part of the Workshop we will look at working with a corpus or collection of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents). The Voyeur index is at:
http://voyeurtools.org/?corpus=humanist&skin=scatter&stopList=stop.en.taporware.txt
- We will discuss:
- Different skins with different panels
- Correspondence analysis and the exploration of a large corpus
- Try looking for trends yourself
5.0 Using your own text
- Now you can try your own text. We will show the different ways of providing Voyeur a text:
- Typing a text or pasting it in
- Typing in one or more URLs
- Uploading a text
- We will then discuss the formats of texts that will work, and what will happen to them:
- file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
- Finally, we will discuss caching and related issues
- Now try your own text.
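When an HTML page is given to a text analysis tool, the markup has to be set aside so that only the words are counted. The sketch below illustrates that idea with Python's standard `html.parser`, and can also be used to prepare your own plain text before the workshop. It is an assumption-laden illustration, not Voyeur's actual processing: the class and function names are hypothetical, `<script>` and `<style>` contents are dropped, and a space is inserted at a small, hand-picked set of block-level tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML document, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    # Hypothetical selection of block-level tags that should separate words.
    BLOCK = {"p", "div", "br", "li", "title", "h1", "h2", "h3", "h4", "h5", "h6", "tr", "td"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; become "&"
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of `html` with runs of whitespace collapsed to single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```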
6.0 Exporting Data and Quoting Analytics
We will now show how to export data and quote analytical results:
- How to export tab-separated values and copy and paste them into Excel
- How to export XML results (from KWICs, for instance)
- How to quote an analytical result in TADA.
- Show how to go to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to insert a panel.
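Tab-separated values exported from Voyeur can also be read with a short script instead of a spreadsheet. A minimal sketch using Python's `csv` module, assuming the export has a header row; the column names `word` and `count` used in the test and example are hypothetical, so check the header of your own export.

```python
import csv
import io


def read_tsv(tsv_text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def top_by(rows, column, n=5):
    """Sort rows by a numeric column, largest first, and return the first n."""
    return sorted(rows, key=lambda r: float(r[column]), reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical export pasted as a string; in practice read it from a saved file.
    exported = "word\tcount\nmonster\t30\nfriend\t45\nnight\t12\n"
    for row in top_by(read_tsv(exported), "count", 2):
        print(row["word"], row["count"])
```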
7.0 Skinning Voyeur
We will now look at how you can develop a different skin.
- Open a corpus like http://voyeurtools.org/?corpus=1309931394540.8106
- Click on the Export button (the disk button in upper right) and export to layout builder
- Drag panels into the blank area to create a custom skin (Warning: many combinations won't work)
MLA 2013: TAPoR and Voyant Workshop for DHCommons
This is a script for a workshop on using Voyant and TAPoR for the MLA 2013: DHCommons Get Started in Digital Humanities.
The script can be found at http://hermeneuti.ca/node/250/
0.0 Before the Workshop
Here are some of the things you might want to do before the workshop on Voyant:
- Review this workshop outline, follow links and review some of the help materials.
- Prepare a text of your own to try with Voyant. To start you might find a novel-length text that you are familiar with and which interests you. Save it as a plain text file somewhere where you can get at it during the workshop (on your laptop) or online.
- Bring a laptop with wireless to use in the workshop.
1.0 Introduction
- Workshop leader and participants introduce themselves:
- Geoffrey Rockwell, University of Alberta, geoffrey (dot) rockwell (at) ualberta (dot) ca
- Overview
This workshop will quickly introduce you to computer-assisted text analysis using Voyant and TAPoR. Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell and is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. TAPoR is a web site for the discovery and review of text analysis tools, including those in Voyant.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with different texts.
- Then learn how to use the normal skin of Voyant with a single text and then a corpus.
- Then learn how to load your own text into Voyant.
- Finally, we will look at TAPoR where you can find other tools.
- Help
Remember that the research tools listed in TAPoR, like Voyant, will often fail, especially when a whole group of people use them at once. Multiple versions are running in case one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
- Voyant Tools
- Individual Voyant tool descriptions and links can be found at docs.voyant-tools.org/tools
- The main URL for Voyant Tools is http://voyant-tools.org/, though other URLs can also be used. Links for this workshop will look like this: http://bit.ly/VoyantShakespeare [temp, main, beta], where the first link resolves automatically (use it whenever possible) and the subsequent links provide backup. Please note that the corpora for this workshop are located on all three servers, but if you load a corpus it will only be available on the server where it was uploaded.
2.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open.
Cirrus (Frankenstein): http://bit.ly/VoyantCirrusFrankenstein [temp, main, beta]
Alternatively you can go to the tool: http://voyant-tools.org/tool/Cirrus and load the text http://www.gutenberg.org/cache/epub/84/pg84.txt
To learn more about the Cirrus tool go to http://docs.voyant-tools.org/tools and scroll down to read about Cirrus. Or go to TAPoR 2.0 and read a review.
The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
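One way to think about the size question is to try a scaling rule yourself. The sketch below maps frequencies linearly onto font sizes; this linear rule, the function name `cloud_sizes` and the 10 to 48 point range are all assumptions for illustration, and Cirrus may scale words differently, which is exactly what the question above invites you to check.

```python
def cloud_sizes(freqs, min_size=10, max_size=48):
    """Map each word's frequency linearly onto a font size between min_size and max_size.

    Hypothetical scaling rule for illustration; real word-cloud tools may use
    logarithmic or other mappings.
    """
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    span = hi - lo
    sizes = {}
    for word, f in freqs.items():
        # If every word has the same frequency, give them all the middle size.
        scale = (f - lo) / span if span else 0.5
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes
```

With a linear rule, a word twice as frequent is not necessarily drawn twice as large, because the smallest frequency is pinned to the minimum size.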
Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.
Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that won't work that well. You will need to give them the URL of a text. You can use the Frankenstein text from http://www.gutenberg.org/cache/epub/84/pg84.txt
3.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:
Frankenstein: http://bit.ly/VoyantFrankensteinStop [temp, main, beta]
For a corpus see Austen (5 novels): http://bit.ly/VoyantAustenStop [temp, main, beta]
To learn about using the full Reading skin you can go to
In this skin clicking in one window will often (but not always) update other windows. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try playing with the Word Trends panel.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt just restart the session by hitting refresh.
4.0 Using Voyant on Your Own Text
Voyant Tools can be used on your own text or corpus. To do that, go to the simple URL for the tool:
http://temp.voyant-tools.org/ (remember that this is a temporary server; for more persistent URLs use the main server) [main, beta]
Just the Cirrus tool in Voyant: http://temp.voyant-tools.org/tool/Cirrus/
You will get a panel that asks you for a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
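To upload a whole collection at once, the texts can be bundled into a single zip file first. A minimal sketch using Python's standard `zipfile` module, assuming your texts are `.txt` files in one folder; the function name `zip_texts` and the file names in the demo are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_texts(folder, zip_path):
    """Bundle every .txt file in `folder` into one zip archive and return the names added."""
    folder = Path(folder)
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.glob("*.txt")):
            # Store each file under its bare name so the archive has no nested folders.
            zf.write(path, arcname=path.name)
            added.append(path.name)
    return added


if __name__ == "__main__":
    # Demonstration with throwaway files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "texts"
        src.mkdir()
        (src / "chapter1.txt").write_text("First chapter.", encoding="utf-8")
        (src / "chapter2.txt").write_text("Second chapter.", encoding="utf-8")
        print(zip_texts(src, Path(tmp) / "corpus.zip"))
```

Keeping one text per file inside the archive means each file can be treated as a separate document in the corpus.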
Voyant is forgiving, but there are nonetheless bugs.
5.0 Other Stuff
Here are some links to other tools, different corpora and skins for specialized tools:
6.0 For After the Workshop
To understand the power and limitations of text analysis, it is useful to use Voyant on your own text.
- Find or assemble a text of your own.
- Try studying it with Voyant.
- What would you like to ask of your text but can't? What sort of tool would you like?
Finding Texts:
Aggregating and Cleaning Texts:
7.0 Other Tools
What other tools are there out there? See TAPoR 2.0 for a growing list of tools.
NITLE: Digital Reading Practices for the Liberal Arts Classroom (April 18, 2013)
(Please note that this page will be updated prior to the NITLE workshop.)
Introduction to the Session
- who we are
- about this session (show, try, discuss - questions please!)
First Encounters with Voyant Tools
Can you guess what text this is?
- first encounters with a single tool: example word cloud (above), Brontë Sisters, George Eliot, Frankenstein, Shakespeare, Humanist Discussion Group
- exploring the full interface: example word cloud (above), Brontë Sisters, George Eliot, Frankenstein, Shakespeare
- bring your own texts (formats, sources, tips): (e.g. Project Gutenberg)
- exporting a URL of a custom corpus (bookmarks, assignments, sharing)
Introducing Voyant Tools in the Classroom
- why use tools to read and analyze?
- digital text is everywhere
- digital texts allow for a proliferation of representations
- reading digital texts: addressing the gap between algorithmic thinking and interpretation
- helpful teaching resources
- limitations & strengths of Voyant
Where to Go Next?
- advanced functionality in Voyant Tools: other tools, skins, skin builder
- embedding Voyant Tools in remote content
- development plans for Voyant: better linguistic analysis and Voyant Notebooks
- other tools and methodologies: TAPoR.ca, Bamboo DiRT, Many Eyes, etc.
THATCamp Kansas (2012)
This is a script for a workshop on using Voyant for the Kansas THATCamp. It is available at http://hermeneuti.ca/workshops/kansas12
0.0 Before the THATCamp
Here are some of the things you might want to do before the workshop on Voyant:
- Read an introduction to text analysis like The Measured Words (distributed before the camp.)
- Review this workshop outline, follow links and review some of the help materials.
- Prepare a text of your own to try with Voyant. To start you might find a novel-length text that you are familiar with and which interests you. Save it as a plain text file somewhere where you can get at it during the workshop (on your laptop) or online.
- Bring a laptop with wireless to use in the workshop.
1.0 Introduction
- Workshop leader and participants introduce themselves:
- Geoffrey Rockwell, University of Alberta, geoffrey (dot) rockwell (at) ualberta (dot) ca
- Overview
Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur" so do not be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous online tools could.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with different texts.
- Then learn how to use the normal skin of Voyant with a single text and then a corpus.
- Finally, show how to load your own text into Voyant.
- Help
Remember that Voyant is a research tool and will often fail, especially when a whole group of people use it at once. Multiple versions are running in case one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
- Voyant Tools
- Individual Voyant tool descriptions and links - docs.voyant-tools.org/tools
- The main URL for Voyant Tools is http://voyant-tools.org/, though other URLs can also be used. Links for this workshop will look like this: http://bit.ly/VoyantShakespeare [temp, main, beta], where the first link resolves automatically (use it whenever possible) and the subsequent links provide backup. Please note that the corpora for this workshop are located on all three servers, but if you load a corpus it will only be available on the server where it was uploaded.
2.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open.
Cirrus (Frankenstein): http://bit.ly/VoyantCirrusFrankenstein [temp, main, beta]
Alternatively you can go to the tool: http://voyant-tools.org/tool/Cirrus and load the text http://www.gutenberg.org/cache/epub/84/pg84.txt
To learn more about the Cirrus tool go to http://docs.voyant-tools.org/tools and scroll down to read about Cirrus. Or go to TAPoR 2.0 and read a review.
The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.
Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that won't work that well. You will need to give them the URL of a text. You can use the Frankenstein text from http://www.gutenberg.org/cache/epub/84/pg84.txt
3.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:
Frankenstein: http://bit.ly/VoyantFrankensteinStop [temp, main, beta]
For a corpus see Austen (5 novels): http://bit.ly/VoyantAustenStop [temp, main, beta]
To learn about using the full Reading skin you can go to
In this skin clicking in one window will often (but not always) update other windows. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try playing with the Word Trends panel.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt just restart the session by hitting refresh.
4.0 Using Voyant on Your Own Text
Voyant Tools can be used on your own text or corpus. To do that, go to the simple URL for the tool:
http://temp.voyant-tools.org/ (remember that this is a temporary server; for more persistent URLs use the main server) [main, beta]
Just the Cirrus tool in Voyant: http://temp.voyant-tools.org/tool/Cirrus/
You will get a panel that asks you for a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
Voyant is forgiving, but there are nonetheless bugs.
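If you are curious what "extracting the text" from a marked-up file involves, here is a minimal Python sketch using only the standard library's `html.parser`. It is a simplified illustration, not how Voyant processes uploads: it keeps the character data, skips `<script>` and `<style>` contents, and joins the remaining chunks with single spaces. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible character data, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0      # depth inside <script>/<style>
        self.chunks = []    # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML document as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For example, `html_to_text('<body><p>First paragraph</p><p>Second</p></body>')` returns `'First paragraph Second'`. Joining stripped chunks with spaces can put a space before punctuation that follows an inline tag; a real converter would treat whitespace more carefully.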
5.0 Other Stuff
Here are some links to other tools, different corpora and skins for specialized tools:
6.0 For After the Workshop
To understand the power and limitations of text analysis, it is useful to use Voyant on your own text.
- Find or assemble a text of your own.
- Try studying it with Voyant.
- What would you like to ask of your text but can't? What sort of tool would you like?
Finding Texts:
Aggregating and Cleaning Texts:
7.0 Other Tools
What other tools are there out there? See TAPoR 2.0 for a growing list of tools.
Trinity Long Room Hub 2012: Workshop on Voyant
This is a script for a workshop on using Voyant for the TCD community. It is available at http://hermeneuti.ca/node/222
1.0 Introduction
- Workshop leader and participants introduce themselves:
- Overview
Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur", so do not be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with different texts.
- Then learn how to use the normal skin of Voyant with a single text and then a corpus.
- Finally, show how to load your own text into Voyant.
- Help
Remember that Voyant is a research tool and will often fail. Multiple versions are running in case one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
2.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open it.
Cirrus (Frankenstein): http://voyant-tools.org/tool/Cirrus/?corpus=1332317356275.4528&query=&stopList=stop.en.taporware.txt&toolFlow=simple
For a backup, go to http://voyant-tools.org/tool/Cirrus and enter the URL http://www.gutenberg.org/cache/epub/84/pg84.txt
To learn more about the Cirrus tool go to http://hermeneuti.ca/voyeur/tools and scroll down.
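The Cirrus links in this script carry their settings in the URL's query string (`corpus`, `query`, `stopList`, `toolFlow`, and `docIndex` in the Persuasion link). Here is a small Python sketch, using the standard `urllib.parse` module, that takes such a link apart and builds a new one. The parameter names are copied from the links in these workshops; what each one does beyond what the links show, and whether the current Voyant server still accepts them, are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# A Cirrus link copied from this workshop script.
FRANKENSTEIN_CIRRUS = ("http://voyant-tools.org/tool/Cirrus/?corpus=1332317356275.4528"
                       "&query=&stopList=stop.en.taporware.txt&toolFlow=simple")


def tool_params(url):
    """Return the query-string parameters of a tool URL as a plain dict."""
    query = urlsplit(url).query
    # keep_blank_values=True preserves empty parameters such as "query="
    return {key: values[0]
            for key, values in parse_qs(query, keep_blank_values=True).items()}


def cirrus_url(corpus, doc_index=None, stop_list="stop.en.taporware.txt",
               base="http://voyant-tools.org/tool/Cirrus/"):
    """Build a Cirrus URL with the same parameters the workshop links use."""
    params = {"corpus": corpus}
    if doc_index is not None:
        # assumed to select one document in the corpus (inferred from the Persuasion link)
        params["docIndex"] = doc_index
    params["stopList"] = stop_list
    params["toolFlow"] = "simple"
    return base + "?" + urlencode(params)
```

For example, `cirrus_url("JaneAusten", doc_index=5)` reproduces the Persuasion link used in the ePorte workshop.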
The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
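To think about the second question, it can help to compute frequencies yourself. The Python sketch below counts words after removing a stop list and scales font sizes linearly between a minimum and a maximum. The tiny stop list, the tokenizing regular expression, and the linear scaling are all assumptions for illustration; this script does not say how Cirrus actually sizes, colours, or places words.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop list; real stop lists are much longer.
STOP_WORDS = {"the", "and", "of", "to", "a", "i", "in", "was", "my", "that", "it"}


def top_words(text, n=25, stop_words=STOP_WORDS):
    """Return the n most frequent words that are not in the stop list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words).most_common(n)


def font_sizes(freqs, min_px=10, max_px=40):
    """Map (word, count) pairs to font sizes, scaling linearly with count."""
    if not freqs:
        return {}
    counts = [count for _, count in freqs]
    low, high = min(counts), max(counts)
    spread = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {word: round(min_px + (count - low) * (max_px - min_px) / spread)
            for word, count in freqs}
```

For instance, after saving the Gutenberg Frankenstein text locally (the file name `pg84.txt` is hypothetical), `font_sizes(top_words(open("pg84.txt", encoding="utf-8").read()))` gives sizes you can compare against what Cirrus shows.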
Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.
Try It: Now try other tools in Voyant. Go to http://hermeneuti.ca/voyeur/tools and experiment with the tools. Warning: some of them are prototypes that won't work that well. You will need to give them the URL of a text. You can use Frankenstein from http://www.gutenberg.org/cache/epub/84/pg84.txt
3.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:
Frankenstein: http://voyant-tools.org/?corpus=1332317356275.4528&skin=simple&event=corpusTypeSelected
Austen (5 novels): http://voyeurtools.org/?corpus=JaneAusten&stopList=stop.en.taporware.txt (backup)
To learn about using the full Reading skin you can go to
In this skin, clicking in one window will often (but not always) update other windows. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try playing with Word Trends as well.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt, just restart the session by hitting refresh.
4.0 Using Voyant on Your Own Text
Voyant Tools can be used on your own text or corpus. To do that, go to the simple URL for the tool:
Voyant: http://voyant-tools.org
Just the Cirrus tool in Voyant: http://voyant-tools.org/tool/Cirrus/
Backup older version: http://voyeur.hermeneuti.ca
You will get a panel that asks you for a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
Voyant is forgiving, but there are nonetheless bugs.
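If your own corpus is a folder of separate texts, you can bundle them into a zipped collection before uploading. A minimal Python sketch with the standard `zipfile` module follows; the file names are hypothetical, and storing each text as UTF-8 plain text is an assumption about what works well, not a documented Voyant requirement.

```python
import io
import zipfile


def zip_texts(texts):
    """Bundle {filename: text} pairs into an in-memory ZIP archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in texts.items():
            # store each text as a UTF-8 encoded plain-text file
            archive.writestr(name, text.encode("utf-8"))
    return buffer.getvalue()
```

To save the archive, something like `Path("my-corpus.zip").write_bytes(zip_texts({"ch1.txt": text1, "ch2.txt": text2}))` (with `from pathlib import Path`) writes it to disk, ready to upload.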
5.0 Other Stuff
Here are some different corpora and skins for specialized tools:
6.0 For Next Week
To understand the power and limitations of text analysis it is useful to use Voyant on your own text. Please try it over the week:
- Find or assemble a text of your own.
- Try studying it with Voyant.
- What works and what doesn't? Document your problems and questions.
- What would you like to ask of your text but can't? What sort of tool would you like?
Next week we will discuss what works and what doesn't, look at advanced features, and discuss other tools.
Finding Texts:
Aggregating and Cleaning Texts:
7.0 Second Workshop
This second workshop will be less structured. We will do the following:
- Participants who had a chance to experiment with Voyant can report back.
- Discussion of problems and desires for Voyant.
- Exporting Voyant results. Demonstration of how you might put Voyant results into Excel.
- Getting an image result. Demonstration of getting a PNG for a trend graph.
- Placing a Voyant panel. Demonstration of placing a panel in TADA here: http://tada.mcmaster.ca/Main/VoyantTest
- Content Analysis - what can you do? Demonstration of Globe Work.
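Placing a live panel in another page, such as the TADA test page, typically comes down to an HTML `<iframe>` that points at a tool URL. The Python sketch below generates such a snippet. Treat it as an assumption-laden illustration: the `<iframe>` approach, the default width and height, and the assumption that the Voyant server allows its pages to be framed are not taken from this script.

```python
from html import escape


def embed_snippet(tool_url, width=600, height=400):
    """Return an HTML <iframe> snippet that shows a tool URL inside another page.

    The default width/height are hypothetical; adjust them to fit your page.
    """
    # escape() turns "&" in the query string into "&amp;" and quotes into entities,
    # so the URL is safe inside the src="..." attribute
    src = escape(tool_url, quote=True)
    return (f'<iframe src="{src}" width="{int(width)}" height="{int(height)}" '
            'style="border:0"></iframe>')
```

Pasting the returned string into a page's HTML (a wiki page, a blog post) is the step that places the panel.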
Other Tools
What other tools are there out there? See TAPoR 2.0 beta for a growing list of tools.
Other Stuff
ePorte Roots and Routes Voyant Workshop, May 2012
This is a script for a workshop on using Voyant Tools for the ePorte Roots and Routes Summer Institute. It is available at http://hermeneuti.ca/workshops/roots12
1.0 Introduction
- Workshop leader: Stéfan Sinclair, McGill University
- Overview
Voyant Tools is a web-based environment for reading and analyzing digital texts, created by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur", so don't be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could. Voyant is actually a suite of modular tools that can be combined in pre-defined or user-defined combinations called skins. The primary objective of this workshop is to better understand how and why one might use Voyant Tools to help in the study of digital texts.
- Outline
In this workshop we will:
- First, look at how to use a single Voyant tool, Cirrus, with a small corpus of Austen texts.
- Then learn how to use the normal "skin" (multi-tool interface) of Voyant with a single text.
- Show how to load your own text(s) into Voyant.
- Explore some of the more exploratory and advanced tools available in Voyant, such as Bubbles and Correspondence Analysis.
- Discuss the use of Voyant Tools in a larger research process (managing links in Zotero, embedding tools in remote content, etc.)
- Help
If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
N.B. Voyant Tools is in beta; it has warts and blemishes. Always view what you're looking at with some circumspection, and if something doesn't work as expected, assume it's a bug rather than something that you're misunderstanding.
2.0 Using a single Voyant Tool: Cirrus
Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Jane Austen's Persuasion.
Cirrus (Austen's Persuasion): http://voyant-tools.org/tool/Cirrus/?corpus=JaneAusten&docIndex=5&stopList=stop.en.taporware.txt&toolFlow=simple
(backup)
The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:
- What words did you expect? What words are missing? What words are interesting?
- How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
Here are some more Cirrus visualizations to consider:
These types of word clouds are prevalent from academia to advertising: they quickly provide an intriguing representation of a text, as demonstrated by this example of studying gendered language in toy advertising. But their ability to rapidly convey a picture with words comes at the cost of information reduction, and some are highly critical of word clouds as hermeneutical tools. What do you think?
Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.
Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) view of the word in that text.
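The KWIC view lists each occurrence of a word with a few words of context on either side. Here is a minimal Python sketch of that idea; the whitespace tokenizing, the case- and punctuation-insensitive matching, and the three-word default window are simplifying assumptions, not Voyant's actual settings.

```python
import string


def kwic(text, keyword, width=3):
    """Return (left context, match, right context) for each occurrence of keyword.

    Matching ignores case and surrounding punctuation; the context is `width`
    words on each side.
    """
    tokens = text.split()
    target = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token.strip(string.punctuation).lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


def show(rows, width=30):
    """Print concordance lines with the keyword aligned in a central column."""
    for left, match, right in rows:
        print(f"{left:>{width}}  {match}  {right}")
```

For example, `show(kwic(text, "dream"))` on a few lines of A Midsummer Night's Dream prints each "dream" aligned in a central column, much like a printed concordance.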
3.0 Using a Reading Skin
Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Austen corpus in a simple skin:
http://voyant-tools.org/?corpus=JaneAusten&stopList=stop.en.taporware.txt
(backup)
In this skin, clicking in one window will often (but not always) update other windows. Try the following:
- Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
- Changing Settings: Try changing the settings for Cirrus by clicking on the small gear icon. Try playing with Word Trends as well.
- Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
When in doubt, just restart the session by hitting refresh.
4.0 Using Voyant on Your Own Text
Voyant Tools can be used on your own text or corpus. To do that, go to the simple URL for the tool:
Voyant: http://voyant-tools.org
Just the Cirrus tool in Voyant: http://voyant-tools.org/tool/Cirrus/
Backup version: http://beta.voyant-tools.org/
You will get a panel that asks you for a text. You can provide:
- One or more URLs to texts on the web
- Upload a text or a zipped collection of texts
- Upload plain text, HTML, or XML texts
- Upload a PDF (and Voyant will try to extract the text)
Voyant is forgiving, but there are nonetheless bugs.
Note that you can create a persistent URL for your corpus; that way your link can be shared or bookmarked and you won't need to reload the texts into Voyant. Click the save icon in the blue bar at the top, and the first URL will be the link for your Voyant corpus.
5.0 Exploratory and Advanced Tools
Voyant Tools is conceived on the notion that text analysis in the humanities is as much about proliferating representations of texts as about producing incontrovertible evidence. Some Voyant Tools are more about aesthetic or ludic aspects of experiencing digital texts, which can directly or indirectly inspire observations that may not otherwise be possible. Here are some examples:
At the same time, some tools are more advanced. For instance, one can produce correspondence analysis views that try to show how terms map across multiple documents, such as this view of the Humanist Discussion Group listserv.
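For readers who want to see what sits behind such a view, here is a textbook correspondence analysis of a term-by-document count matrix in Python with NumPy: it takes the singular value decomposition of the matrix of standardized residuals and returns principal coordinates for terms and documents. This is the standard formulation rather than a description of Voyant's implementation; it assumes NumPy is installed and that every row and column has at least one non-zero count.

```python
import numpy as np


def correspondence_analysis(counts):
    """Textbook correspondence analysis of a term-by-document count matrix.

    Returns (rows, cols, inertias): principal coordinates for the rows (terms),
    principal coordinates for the columns (documents), and the inertia
    (squared singular value) of each axis.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                 # correspondence matrix (proportions)
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    expected = np.outer(r, c)       # proportions expected under independence
    # standardized residuals: D_r^(-1/2) (P - r c^T) D_c^(-1/2)
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]      # D_r^(-1/2) U Sigma
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]   # D_c^(-1/2) V Sigma
    return rows, cols, sv ** 2
```

Plotting the first two columns of `rows` and `cols` together gives the familiar two-dimensional map: terms with similar distributions across documents land close together, and terms whose counts are exactly proportional land on the same point. The inertias sum to the chi-squared statistic of the table divided by its total count.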
Voyant Tools also enables some quick-and-dirty social network analysis. This is possible thanks to a process called named entity extraction that attempts to automatically identify people, places and organizations in a text (at the moment Voyant Tools uses the Stanford Natural Language Processing package to perform this automated process). It's worth emphasizing that automated processes like these are subject to several issues and problems. For instance, how do you combine or differentiate between uses of first and/or last names? How do you tell whether the same name refers to one person or two different people? What do you do when an organization looks like a person's name (e.g. Johns Hopkins)? Still, you can't beat the simplicity of Voyant Tools' RezoViz, especially when working with a mid-size corpus of shorter texts (5-50 articles, say). Here, for example, is a specialized interface showing connections between people mentioned in emails to the Humanist listserv (RezoViz is in alpha and best experienced in Chrome).
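The network part of this process can be sketched independently of the entity extraction. Assuming you already have a list of entity names for each document (from Stanford NER or anywhere else), the Python sketch below counts how many documents mention each pair of entities, which gives weighted edges for a network graph. The optional `aliases` map is a hypothetical, hand-made way of merging name variants, one crude answer to the first/last name problem; it is not a RezoViz feature.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs_entities, aliases=None):
    """Count, for each pair of entities, how many documents mention both.

    docs_entities: one list of entity names per document (e.g. NER output).
    aliases: optional, hypothetical {variant: canonical} map for merging names.
    """
    aliases = aliases or {}
    edges = Counter()
    for entities in docs_entities:
        # normalize names, then drop repeats within the same document
        names = sorted({aliases.get(name, name) for name in entities})
        for a, b in combinations(names, 2):
            edges[(a, b)] += 1
    return edges
```

Each key is an (entity, entity) pair and each value is an edge weight; `edges.most_common(10)` lists the strongest connections.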
As always, the real strength of Voyant Tools is the ability to create your own corpus; you can start at http://voyant-tools.org/?skin=rezoviz.
6.0 Voyant as a Scholarly Tool
One of the essential design principles of Voyant Tools is that it tries to be useful not just at the moment of analysis, but through more phases of research. Here are some examples:
- as we've already seen, you can export a link to a corpus that can be bookmarked, shared by email or Twitter, or otherwise preserved (as a general rule, a corpus in Voyant will remain accessible as long as it has been consulted at least once in the past month)
- there's built-in Zotero awareness: you can click on the folder/article icon in the Firefox address bar to create a new entry (though you may wish to complete some of the metadata)
- you can export data to other applications: for instance, produce a tab-separated values view of a table that can be copied and pasted into a spreadsheet application (where you can edit the data and produce even more graphs, charts, etc.)
- you can embed a live tool in remote content (a blog post, a journal article, a term paper, etc.), much as you would embed a YouTube clip; the interactive affordances of Voyant let you go beyond static screenshots and images and allow your users/readers to engage with the content and data themselves
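As a small example of that kind of export, here is a Python sketch using the standard `csv` module: it reads a tab-separated table, sorts it, and writes comma-separated text that spreadsheet applications open directly. The sample table, its column names, and the sorting step are hypothetical placeholders, not real Voyant output.

```python
import csv
import io

# Hypothetical tab-separated table standing in for one copied from a Voyant panel;
# the column names and values are placeholders, not real Voyant output.
SAMPLE_TSV = "term\tcount\nbeta\t1\nalpha\t3\n"


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def sort_by_count(rows, column="count"):
    """Sort rows by a numeric column, largest first."""
    return sorted(rows, key=lambda row: int(row[column]), reverse=True)


def to_csv(rows):
    """Serialize a list of dicts to comma-separated text for a spreadsheet."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Writing `to_csv(sort_by_count(read_tsv(SAMPLE_TSV)))` to a `.csv` file gives a sheet that Excel or any other spreadsheet application can open.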
7.0 Other Stuff
- Other Voyant Skins:
- Other Tools