Voyeur Tools: See Through Your Texts

Voyeur


Introducing Voyeur

Voyeur is a web-based text analysis environment. It is designed to be user-friendly, flexible and powerful. Voyeur is part of Hermeneuti.ca, a collaborative project to develop and theorize text analysis tools and text analysis rhetoric. This section of the Hermeneuti.ca web site provides information and documentation for users and developers of Voyeur.

What you can do with Voyeur:

Voyeur is a work in progress; it is currently in beta. Some things don't work properly, and some planned features aren't available yet. In particular, here are some weaknesses that we recognize:

To get started, try viewing one of the screencasts to the right or continue to Workshops -> Voyeur Tools for Users

How to Use this Manual

This manual is for novice users, experienced users, writers and developers. It works closely with The Rhetoric of Text Analysis, where you will find example essays and discussion of text analysis in general.

Novice Users who want to get started with text analysis should:

Writers, web authors and researchers who want to embed Voyeur panels (we call them hermeneuticons) into their essays, online journals, blogs and so on should:

CC-GNU GPL

Developers who want to adapt Voyeur tools or develop their own should:

 

How Voyeur connects to Hermeneuti.ca, the book

Diagram

Voyeur is the toolset that made possible the analysis reported in Hermeneuti.ca, the book and web site you are now looking at. The book reflects on text analysis, gives examples, and discusses the decisions behind Voyeur. The web site hermeneuti.ca (note how we use the lower case when referring to the web site) includes the sections of the book and the manual for Voyeur (which you are reading now.) The two connect like this:

Principles of Voyeur

Introducing Voyeur

Voyeur is the suite of tools used by Hermeneuti.ca to interpret texts and to think about tools. You too can use Voyeur to analyze your own texts and to write essays with embedded hermeneutical panels generated by Voyeur, and you can adapt the code to create your own versions of the tools. This section of the Hermeneuti.ca web site is both a tutorial and a reference.

What you can do with Voyeur

Voyeur is a new type of text analysis tool that you can use across the research cycle. You can:

Design Principles

Although text analysis tool developers might choose to highlight different aspects for their purposes (such as stand-alone software as opposed to web-based software), here are some of the primary design principles for Voyeur, as gleaned from other tools:

Though they have existed before to varying degrees in different tools, Voyeur is an attempt to pull together these design principles into a single package. In some cases the principles may in fact be contradictory in practice (for instance, supporting large-scale immediate analysis) and compromises must be found. Working through those challenges is one of the aspects that make Voyeur a worthy intellectual challenge.

HyperPo and TAPoRware are the tools with the strongest affinities to Voyeur (see note 1), but we have devoted considerable thought and attention to improving existing web-based tools in ways further described below.

Scalability. Whereas HyperPo and TAPoRware can readily handle book-length texts for micro-analysis, both reach their practical limits when corpora grow beyond a couple of megabytes. In contrast, Voyeur is designed to handle much larger corpora (dozens of megabytes and beyond). There is still a practical (though undefined) limit to the size of corpora for Voyeur given that it seeks to enable immediate micro-analysis, but the Voyeur architecture is designed with scale in mind. There will always be a tension between indexing speed and retrieval speed: the more time is available for indexing, the faster retrieval tends to be. As such, text analysis tools that require pre-indexing (Philologic, Monk, etc.) will almost always operate faster because pre-processing can be done over the course of hours or even days (building very large relational databases, for instance). In contrast, Voyeur seeks to strike a balance between indexing and retrieval speed: ideally both should happen in a timeframe that seems reasonable in a web-based context. The ever-evolving pace of computing power and the promise of high performance computers obviously make the actual capabilities a moving target.
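The trade-off between indexing time and retrieval time can be illustrated with a small sketch. This is not Voyeur's architecture; it is a minimal inverted index in Python, and every name in it (tokenize, build_index, scan_count, indexed_count) is our own invention for illustration. Building the index costs one pass over the corpus, after which each lookup avoids re-reading the texts.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(documents):
    """Map each word to the documents and token positions where it occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def scan_count(documents, word):
    """Count a word with no index by re-reading every document."""
    return sum(tokenize(text).count(word) for text in documents.values())


def indexed_count(index, word):
    """Count a word by looking it up in the prebuilt index."""
    return sum(len(positions) for positions in index.get(word, {}).values())


# Two tiny stand-in documents; a real corpus would be far larger.
documents = {
    "a": "The monster spoke. The monster wept.",
    "b": "No monster here, only the sea.",
}
index = build_index(documents)
```

Both counting functions return the same answer; the difference is that scan_count does all of its work at query time, while indexed_count relies on work done once in build_index.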

Ubiquity. As useful as text analysis tools like HyperPo and TAPoRware may be, we recognize a need to allow content providers and producers (like bloggers) to quickly and easily integrate functionality into their own space. The previous model was limited to users bringing their own texts to our tools; we now wish to allow users to bring our tools to their texts as well. In some cases users will wish to have static results, in which case we can provide a mechanism for easily copying and pasting results that can be directly embedded in other content. However, much of the most compelling functionality of Voyeur is interactive and requires considerable client-side scripting: our current approach is to provide a tiny snippet of HTML that is essentially an IFRAME that contains the necessary HTML elements. This approach allows Voyeur code to remain separate from its host while satisfying the security limitations on cross-site scripting. There are of course other challenges inherent to code embedded elsewhere, including version management (supporting legacy syntax) and caching of data (both the corpus and results; see note 2).
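To make the IFRAME approach concrete, here is a hedged sketch of how such a snippet might be generated. The function name embed_snippet, the example.org host, the parameter names and the default dimensions are all assumptions for illustration; the snippet Voyeur actually produces may look different.

```python
from html import escape
from urllib.parse import urlencode


def embed_snippet(tool_url, params, width=400, height=300):
    """Return an IFRAME element that points at a hosted tool panel."""
    src = tool_url + "?" + urlencode(params)
    # Escape the URL so that "&" and quotes are safe inside the attribute.
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(src, quote=True), width, height
    )


snippet = embed_snippet(
    "http://example.org/tool/Cirrus/",  # placeholder host, not a real endpoint
    {"corpus": "my-corpus", "stopList": "stop.en.txt"},  # hypothetical parameters
)
print(snippet)
```

Because the panel lives inside its own frame, the host page only needs to paste one line of HTML; the scripts that drive the panel are loaded from the tool's server rather than from the blog or journal that embeds it.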

Referenceability. The status of text analysis tools as academic resources has been a point of debate over the years. Scholars feel compelled to cite ideas and texts that come from other authors, but they are much less likely to recognize tools that have contributed to their work (and we would probably not want every scholar to cite search engines such as Google that have been used during research). We feel strongly that text analysis tools can represent a significant contributor to digital research, whether they were used to help confirm hunches or to lead the researcher into completely unanticipated realms. In any case, we have designed Voyeur to be conducive to citation in various ways, including a general citation to Voyeur and citations for static or dynamic results. An important component of academic knowledge is reproducibility, and providing scholars with more information on the processes followed during research, including the use of text analysis tools, is sure to be useful.

Ultimately, Voyeur is an attempt to learn from the strengths and weaknesses of past tools, to recognize current user needs (e.g., working with much larger corpora), and to anticipate future practices (e.g., referencing text analysis tools and results). We believe that the potential for tools in the interpretive process merits continual rethinking of tool design and functionality, and as such, Voyeur is of course a work in progress.

  1. The affinity to Voyeur is not surprising given that Sinclair developed HyperPo and Rockwell developed TAPoRware.
  2. For instance, we wouldn't want to re-run a computationally expensive process each time someone visits a popular blog, but we don't wish to cache everyone's analytic results either.

Some Background of Voyeur

Text analysis tools go back to the first ad-hoc tools that Roberto Busa created for his concordance of the works of Thomas Aquinas and Andrew Booth’s Mechanical Resolution of Linguistic Problems in the 1950s.

Voyeur is a suite of analysis and exploration tools for digital texts. Very few contributions to knowledge and technology are unrecognizable from what preceded (see note 1), and Voyeur is no exception: it is largely built on the foundations of text analysis tool design and methodology from over 50 years of humanities computing research (see note 2). The following are some of the tools that have most influenced text analysis tool development and Voyeur in particular:

  1. Limiting himself to the history of science, Thomas Kuhn provides examples of revolutionary advances in thinking, such as Copernican cosmology or Einstein's Theory of Relativity; Voyeur has much more modest ambitions.
  2. For a brief overview of the history of humanities computing, see Hockey "History", 2002.
  3. Unix is used here as shorthand for both Unix and unix-like operating systems like Linux.

Quick Guide of Voyeur for Users

Voyeur is a web-based tool. To use it go to http://voyeurtools.org. This is what you will see,

To use Voyeur you need to specify a text. You can do this in different ways:

Voyeur, once it retrieves your text, will index it for analysis and display this simple arrangement of two panels,

Once your text or collection is indexed, Voyeur will present you with a display with two panels. The right-hand panel summarizes the text. The left panel shows you the high frequency words. You can show and hide different panels using the double arrow button. You can also see more panels by selecting a word or words to follow. Try the Words in the Entire Corpus panel.
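If you are curious what a list of high frequency words involves, the following is a minimal sketch in Python. It is not the code behind the panel; the function name word_frequencies and the simple tokenizing rule (lowercase letters and apostrophes) are our own assumptions.

```python
from collections import Counter
import re


def word_frequencies(text):
    """Count how often each lowercased word occurs in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


sample = "It was the best of times, it was the worst of times."
frequencies = word_frequencies(sample)

# Show the most frequent words first, as a frequency panel would.
for word, count in frequencies.most_common(5):
    print(word, count)
```

Even on one sentence the result is dominated by function words such as "it" and "the", which is why frequency displays are usually paired with a stop word list.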

There are a number of features to the Words in the Entire Corpus panel:

salt, pepper, sardines, oil
"digital humanities"

Once you select one or more words you will get an arrangement of panels like this,

The panels are connected so that clicking in one will trigger updates in the others. Typically the order is the following:

 

 

 

Workshops

Others:

1.0 Basic Introduction to Text Analysis

This is a script for a workshop on using Voyant for learning about text analysis. It is available at http://hermeneuti.ca/node/244

0.0 Before the Workshop

Here are some of the things you might want to do before the workshop on Voyant:

  • Read an introduction to text analysis like The Measured Words (distributed before the camp.)
  • Review this workshop outline, follow links and review some of the help materials.
  • Prepare a text of your own to try with Voyant. To start you might find a novel-length text that you are familiar with and which interests you. Save it as a plain text file somewhere where you can get at it during the workshop (on your laptop) or online.
  • Bring a laptop with wireless to use in the workshop.

 

1.0 Introduction

  • Workshop leader and participants introduce themselves:
  • Overview
    Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur" so do not be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous online tools could.
  • Outline
    In this workshop we will:
    • First, look at how to use a single Voyant tool, Cirrus, with different texts. 
    • Then learn how to use the normal skin of Voyant with a single text and then a corpus.
    • Finally, show how to load your own text into Voyant.
  • Help
    Remember that Voyant is a research tool and will often fail, especially when a whole group of people use it at once. There are multiple versions up if one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
  • Voyant Tools
    • Individual Voyant tool descriptions and links - docs.voyant-tools.org/tools
    • The main URL for Voyant Tools is http://voyant-tools.org/, though there are other URLs that can be used:
      http://voyant-tools.org/ - the main server, content here usually persists for a month or longer (especially when accessed regularly)
      http://beta.voyant-tools.org/ - a development server, less stable, so avoid linking to it
      http://voyeur.hermeneuti.ca/ - a much older version of Voyant, more of a tourist attraction and last ditch solution
      Links for this workshop will look like this: http://bit.ly/VoyantShakespeare [main, beta] where the first link resolves automatically (use it whenever possible) and the subsequent links provide backup. Please note that corpora for this workshop are located on all three servers, but if you load a corpus it will only be available on the server where it was uploaded.



2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open.

Cirrus (Frankenstein): http://bit.ly/VoyantCirrusFrankenstein [main, beta]

Alternatively you can go to the tool: http://voyant-tools.org/tool/Cirrus and load the text http://www.gutenberg.org/cache/epub/84/pg84.txt

To learn more about the Cirrus tool go to http://docs.voyant-tools.org/tools and scroll down to read about Cirrus. Or go to TAPoR 2.0 and read a review.

The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
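One way to think about the question of size and frequency is to try a simple scheme yourself. The sketch below is an assumption-laden illustration, not a description of how Cirrus works: it drops a tiny made-up stop word list and scales font sizes linearly between a minimum and maximum point size. The names cloud_sizes and STOP_WORDS, and the 12 to 48 point range, are ours.

```python
from collections import Counter
import re

# A deliberately tiny, illustrative stop word list (not the list Voyant uses).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was", "i"}


def cloud_sizes(text, top_n=5, min_pt=12, max_pt=48):
    """Map the most frequent non-stop words to font sizes, scaled linearly."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    spread = max(highest - lowest, 1)  # avoid dividing by zero when all counts match
    return {
        word: min_pt + (count - lowest) * (max_pt - min_pt) / spread
        for word, count in counts
    }


sizes = cloud_sizes("monster monster monster life life the the the the death")
print(sizes)
```

Comparing a scheme like this with what you see on screen is a useful exercise: if the largest words in the cloud are not the most frequent ones in the frequency list, then something other than raw counts is shaping the display.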

Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.

Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that won't work that well. You will need to give them a URL for a text. You can use the Frankenstein text from http://www.gutenberg.org/cache/epub/84/pg84.txt



3.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:

Frankenstein: http://bit.ly/VoyantFrankensteinStop [main, beta]

For a corpus see Austen (5 novels): http://bit.ly/VoyantAustenStop [main, beta]

To learn about using the full Reading skin you can go to

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.
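The KWIC (keyword in context) display used in the Triggering step lists each occurrence of a word with a few words of context on either side. As a rough sketch of the idea, and not of Voyant's own implementation, the Python below builds such a list; the function name kwic and the default of three context words are assumptions.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"[\w']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


sample = "Swift as a shadow, short as any dream; I have had a dream, past the wit of man."
results = kwic(sample, "dream")

# Line the keyword up in a centre column, as a concordance display does.
for left, word, right in results:
    print("{:>20} | {} | {}".format(left, word, right))
```

Widening the context (for example width=7) makes each line easier to read but harder to scan, which is the same trade-off you adjust when you change the context setting in a concordance tool.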


 

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:

http://voyant-tools.org/ [main, beta]

Just the Cirrus tool in Voyant: http://voyant-tools.org/tool/Cirrus/

You will get a panel that asks you for a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)
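If you want to upload a zipped collection, any zip utility will do. For those who prefer a script, here is a small sketch using Python's standard library. The function name zip_texts, the file names and the choice to store files at the top level of the archive are our assumptions; we have not verified what layout Voyant expects inside the archive.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_texts(folder, archive_path):
    """Bundle every .txt file in a folder into one zip archive."""
    folder = Path(folder)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.glob("*.txt")):
            # Store each file under its bare name, without the folder path.
            archive.write(path, arcname=path.name)
            names.append(path.name)
    return names


# Demonstration with two throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "chapter1.txt").write_text("It was a dark night.", encoding="utf-8")
    Path(tmp, "chapter2.txt").write_text("The storm had passed.", encoding="utf-8")
    archive_path = Path(tmp, "corpus.zip")
    archived = zip_texts(tmp, archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        listed = archive.namelist()
```

Keeping one document per file lets the tool treat each file as a separate document in the corpus, which is what makes comparisons across documents possible.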

Voyant is forgiving, but there are nonetheless bugs.



5.0 Other Stuff

Here are some links to other tools, different corpora and skins for specialized tools:


 

6.0 For After the Workshop

To understand the power and limitations of text analysis it is useful to use Voyant on your own text.

  1. Find or assemble a text of your own. 
  2. Try studying it with Voyant.
  3. What would you like to ask of your text but can't? What sort of tool would you like?

Finding Texts:

Aggregating and Cleaning Texts:


 

7.0 Other Tools

What other tools are there out there? See TAPoR 2.0 for a growing list of tools.

2.0 Text Analysis Methods Workshop

 

This is a script for a workshop on using Voyant and TAPoR for a graduate class on research methods.

The script can be found at http://hermeneuti.ca/node/254/

1.0 Introduction

  • Overview
    This workshop will quickly introduce you to computer assisted text analysis using Voyant and TAPoR. Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell and is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. TAPoR is a web site for the discovery and review of text analysis tools including those in Voyant.
  • Outline
    In this workshop we will:
    • First, look at how to use a single Voyant tool, Cirrus, with different texts. 
    • Then learn how to use the normal skin of Voyant with a single text and then a corpus.
    • Then learn how to load your own text into Voyant.
    • Finally, we will look at TAPoR where you can find other tools.
  • Help
    Remember that the tools entered in TAPoR, like Voyant, are research tools and will often fail, especially when a whole group of people use them at once. There are multiple versions up if one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
  • Voyant Tools


2.0 Preparing a text for a question

The first step in text analysis is to assemble a text to fit your question(s). What do you want to ask about? What sort of text would help you ask questions about an issue? How can you use the internet to build a text?

For this workshop let's assemble a text off the internet.

  • Decide on some aspect of popular culture or computing culture well documented on the internet. 
  • Google keywords associated with the subject you want to study.
  • Skim the results and then develop selection criteria for what you want to scrape.
  • Scrape a set of texts using Google.
  • Copy and paste the texts into a text file. Clean out the navigation information and irrelevant parts.
  • Export a text file for text analysis.
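Cleaning out navigation by hand works for a handful of pages. For larger sets, a short script can do a first pass. The following is a minimal sketch using Python's built-in HTML parser; the class name TextExtractor, the function page_to_text and the list of tags to skip are assumptions, and real pages will usually need further manual cleaning.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text while skipping navigation and script blocks."""

    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than zero while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_to_text(html):
    """Return the text of a page, one chunk per line, ready to save as plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = ("<html><nav>Home | About</nav>"
        "<p>Cyberpunk began as a literary genre.</p>"
        "<script>track()</script></html>")
print(page_to_text(page))
```

The output can be written to a .txt file and loaded into Voyant. Note that a script like this only removes what is marked up as navigation; menus built from plain paragraphs or tables will still need to be deleted by hand.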

For more see Appendix 1: Finding and Preparing an Electronic Text


3.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. 

Go to the Cirrus tool at http://voyant-tools.org/tool/Cirrus and load your text.

There are a number of ways to load a text. You can provide:

 

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)

 

To learn more about the Cirrus tool go to http://docs.voyant-tools.org/tools and scroll down to read about Cirrus. Or go to TAPoR 2.0 and read a review.

You can see Cirrus with a text like "Frankenstein" here: http://bit.ly/VoyantCirrusFrankenstein [temp, main, beta]

The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?

Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.

Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that won't work that well. Try your text in different tools.


4.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:

Go to Voyant and load your text into the Reading Skin: http://voyant-tools.org

If you want to see a text in the Reading Skin you can look at Frankenstein: http://bit.ly/VoyantFrankensteinStop [temp, main, beta]

For a corpus see Austen (5 novels): http://bit.ly/VoyantAustenStop [temp, main, beta]

To learn about using the full Reading skin you can go to

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.


 

5.0 Other Stuff

Here are some links to other tools, different corpora and skins for specialized tools:


6.0 More Information

Finding Texts:

Aggregating and Cleaning Texts:


7.0 Other Tools

What other tools are there out there? See TAPoR 2.0 for a growing list of tools.

 

CWRCshop2 (Ryerson): Using Voyant for Analyzing Texts

This is a script for a workshop on using Voyant for the CWRC community. It is available at http://hermeneuti.ca/workshops/cwrc2

1.0 Introduction

2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Jane Austen's Persuasion.

Cirrus (Austen's Persuasion): http://voyeurtools.org/tool/Cirrus/?corpus=JaneAusten&docIndex=5&stopList=stop.en.taporware.txt&toolFlow=simple (backup)
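
The Cirrus link above carries its settings in the query string. As a rough illustration (this is not part of Voyant itself), the following Python sketch splits that URL into its path and parameters using only the standard library. What each parameter means is inferred from its name and is an assumption, not documented behaviour.

```python
from urllib.parse import urlsplit, parse_qs

# The workshop URL from above, broken across lines for readability.
CIRRUS_URL = (
    "http://voyeurtools.org/tool/Cirrus/"
    "?corpus=JaneAusten&docIndex=5"
    "&stopList=stop.en.taporware.txt&toolFlow=simple"
)

def describe_tool_url(url):
    """Return the path and a flat dict of query parameters for a tool URL."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.path, params

path, params = describe_tool_url(CIRRUS_URL)
print(path)
for name, value in params.items():
    # Assumed meanings: corpus = stored corpus id, docIndex = which document,
    # stopList = stop word list to apply, toolFlow = how follow-up tools open.
    print(f"{name} = {value}")
```

Reading the URL this way makes it easier to see which part to change when you want the same tool on a different corpus.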

The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
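
To think about the link between size and frequency, it can help to see the counting step on its own. The sketch below is a minimal illustration in Python, not Voyant's implementation: it counts words after dropping a stop list, which is the kind of number a word cloud could scale its words by. The sample sentence and the tiny stop list are invented for the example.

```python
import re
from collections import Counter

# A tiny illustrative stop list; real stop lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "it", "that"}

def top_words(text, n=5, stop_words=STOP_WORDS):
    """Count lowercase word tokens, skipping stop words, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)

# Invented sample text, used only to exercise the function.
sample = (
    "Anne hoped that the letter was from the Captain. "
    "The Captain wrote the letter, and Anne read the letter twice."
)
for word, count in top_words(sample, n=3):
    print(word, count)
```

Try removing a word from the stop list and notice how quickly function words such as "the" crowd out everything else.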

Here are some more Cirrus visualizations to consider:

These types of word clouds are prevalent from academia to advertising: they quickly provide an intriguing representation of a text, as demonstrated by this example of studying gendered languages in toy advertising. But their ability to rapidly convey a picture with words comes at the cost of information reduction, and some are highly critical of word clouds as hermeneutical tools. What do you think?

Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.

Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) display of the word in that text.
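
A KWIC display lists each occurrence of a word with a few words of context on either side. The following Python sketch shows one simple way to build such a list; it is an illustration under simplifying assumptions (whitespace tokenization, a fixed number of context words), not the algorithm Voyant uses. The sample is two lines from A Midsummer Night's Dream.

```python
import re

def kwic(text, keyword, width=3):
    """List each occurrence of keyword with `width` words of context per side."""
    words = re.findall(r"\S+", text)
    rows = []
    for i, word in enumerate(words):
        # Compare on letters only so "dream;" and "Dream" both match "dream".
        if re.sub(r"[^a-z]", "", word.lower()) == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, word, right))
    return rows

lines = (
    "Four nights will quickly dream away the time; "
    "Swift as a shadow, short as any dream;"
)
for left, hit, right in kwic(lines, "dream"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>25} | {hit} | {right}")
```

Changing `width` shows the trade-off a concordance makes between a compact list and enough context to interpret each hit.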

3.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Austen corpus in a simple skin:

http://voyeurtools.org/?corpus=JaneAusten&stopList=stop.en.taporware.txt (backup)

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that, go to the simple URL for the tool:

Voyant: http://voyeurtools.org

Just the Cirrus tool in Voyant: http://voyeurtools.org/tool/Cirrus/

Backup version: http://beta.voyant-tools.org/

You will get a panel that asks you for a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)
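
If your corpus lives in several files, one of the options above is to upload them as a single zipped collection. As a hedged sketch, here is one way to bundle plain texts into a ZIP archive with Python's standard library; the file names and contents are placeholders, and nothing here is specific to Voyant.

```python
import io
import zipfile

def zip_texts(texts):
    """Bundle a {filename: text} mapping into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in texts.items():
            archive.writestr(name, text.encode("utf-8"))
    return buffer.getvalue()

# Placeholder documents standing in for your own texts.
corpus = {
    "chapter1.txt": "First sample document.",
    "chapter2.txt": "Second sample document.",
}
data = zip_texts(corpus)

# Read the archive back to confirm what it contains.
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())
```

Writing `data` to a file with a `.zip` extension gives you something you can then select in the upload dialog.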

Voyant is forgiving, but there are nonetheless bugs.

Note that you can create a persistent URL for your corpus. That way your link can be shared or bookmarked and you won't need to reload the texts into Voyant. Click the save icon in the blue bar at the top and the first URL will be the link for your Voyant corpus.

5.0 Other Stuff

CWRCshop: Using Voyant for Analyzing Texts

This is a script for a workshop on using Voyant for the CWRC community. It is available at http://hermeneuti.ca/node/211

1.0 Introduction

  • The workshop leaders will introduce themselves:
    • Geoffrey Rockwell, University of Alberta, geoffrey (dot) rockwell (at) ualberta (dot) ca, http://www.geoffreyrockwell.com
    • Susan Brown, University of Alberta, University of Guelph, sbrown (at) uoguelph (dot) ca
  • Overview
    Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur" so do not be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could.
  • Outline
    In this workshop we will:
    • First, look at how to use a single Voyant tool, Cirrus, with a small corpus of Austen texts. 
    • Then learn how to use the normal skin of Voyant with a single text.
    • Finally, show how to load your own text into Voyant.
  • Now make sure you can connect to the wireless.
  • Help
    If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:

2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open.

Cirrus (Frankenstein): http://dev.voyeurtools.org:8080/tool/Cirrus/?corpus=1317355585427.2492&stopList=stop.en.taporware.txt

For a backup go here: http://voyeur.hermeneuti.ca/tool/Cirrus/ and enter text http://www.gutenberg.org/cache/epub/84/pg84.txt

The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?

Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.

Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) display of the word in that text.

3.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text in a simple skin:

Frankenstein: http://dev.voyeurtools.org:8080/?corpus=1317355585427.2492&skin=simple&event=corpusTypeSelected

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that, go to the simple URL for the tool:

Voyant: http://voyeurtools.org

Just the Cirrus tool in Voyant: http://voyeurtools.org/tool/Cirrus/

Backup older version: http://voyeur.hermeneuti.ca

You will get a panel that asks you for a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)

Voyant is forgiving, but there are nonetheless bugs.

5.0 Other Stuff

Here are some corpora and skins:

DH 2011 Visualization for Literary History (TAPoRware and Voyeur)

This is an outline for a workshop on visualization with Voyeur. It is based on a workshop given at DH 2010 in London, England.

1.0 Introduction

Here is a list of links for the Visualization for Literary History:

Here are the tools to try for the full Voyeur interface:

Now let's try the full text again in the full Voyeur:

For a list of tools see: http://entry.tapor.ca

DH 2011 Visualization for Literary History (Visualization with Voyeur)

This is an outline for a workshop on visualization with Voyeur. It is based on a workshop given at DH 2010 in London, England.

1.0 Introduction

2.0 Visualizing a Single Text

In the first part of the Workshop we will show you how to use Voyeur to visualize a single text as a way of learning the interface. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:

http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters

http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text

In order to focus on each tool independently, we will open each Voyeur tool separately.

  • First we will look at Cirrus: http://voyeurtools.org/tool/Cirrus/  
  • Cirrus is a visualization tool that displays a word cloud relating to the frequency of words appearing in one or more documents. One can click on any word appearing in the cloud to obtain detailed information about its relative frequency. The larger the word, the more frequent the term.
    • Show how to load a text by copying one of the Frankenstein URLs into the "Add Texts" box
    • Show how hovering over the words reveals a number showing the word count of the current word in the corpus. 
    • Show how clicking on a word produces a textual set of results as a list on a new page.  These results include a count, a relative count, and a trend graph.
  • Next we will look at Links: http://voyeurtools.org/tool/Links/
  • Links finds collocates for words and displays links between them using a force-directed graph. It is a visualization that shows a web of terms, with term frequencies in proximity to a keyword. Once you arrive at Links, insert or upload your content and let the tool perform its analysis. You will be presented with a web-type visualization. You may hover over words to find data pertaining to that word within your corpus. You may also double-click on any word to find a more detailed analysis. Clicking and dragging allows you to organize your corpus. If there are multiple documents within the corpus, they will be coloured differently.
  • Load a text by copying one of the Frankenstein URLs into the "Add Texts" box
  • If you hover over a term, Voyeur will tell you its linkage within the corpus documents.
  • Try dragging and dropping terms to organize them.
  • If you would like to manipulate the visualization, right-click on any of the terms and choose 'Stick/unstick' or 'Remove'. 'Stick/unstick' fixes the term in place so that it is not moved when other terms are moved. 'Remove' simply removes the term from the visualization.
  • Clicking on the options button (the button that looks like a gear) will launch a dialog box with various options pertaining to the Links tool. The stop words list lets you exclude words from the visualization (usually words such as 'a', 'the', and 'and'). 'Node size determined by type frequency' is the default, and sizes each term by how often it appears in the documents. Sizing by 'Node links' will result in terms appearing larger if they are heavily linked with other terms. 'Autofit graph on screen' sizes the graph depending on the size of your browser window. 'Remove orphans' will remove terms which are not linked to any other term in the visualization.
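
Collocates are words that occur near a keyword. To make the idea behind Links concrete, here is a small Python sketch that counts the words found within a fixed window of each occurrence of a keyword. The window size, the tokenizer, and the sample sentence are all assumptions made for the illustration; how Links itself chooses and weights collocates may differ.

```python
import re
from collections import Counter

def collocates(text, keyword, window=2, stop_words=frozenset()):
    """Count words appearing within `window` tokens of each keyword occurrence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        start = max(0, i - window)
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(
            n for n in neighbours if n not in stop_words and n != keyword
        )
    return counts

# Invented sample text, used only to exercise the function.
sample = "the monster fled. the creature saw the monster weep. the monster spoke."
links = collocates(sample, "monster", window=2, stop_words={"the"})
for word, count in links.most_common():
    # Each pair is one edge a graph could draw between keyword and collocate.
    print(f"monster -- {word} ({count})")
```

Each keyword and collocate pair can be read as one link in a graph, with the count suggesting how strong that link is.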
Now we will look at Word Trends: http://voyeurtools.org/tool/TypeFrequenciesChart/
  • Term Frequencies Chart shows how terms are distributed across document(s) in a corpus (documents are shown in the order in which they were added). Every charted line represents one word common throughout the entire corpus. If you hover over a point, it will give you information about that word in a specific document.
  • When you analyze a corpus with Term Frequencies Grid, you will initially have common words at the top of the chart with colour codes. You will see lines within the graph which are coloured according to those words. If you click on one of the terms at the top, it will omit that term from the graph.
  • When we hover over the segment points, we can see the frequency of that term in that segment. If you click on the point, Voyeur will open a new window with detailed information of that segment and term within its Document KWICs tool.
  • If you click and drag on a section of the chart it will zoom in to that section. To reset the chart to its original state, click on “reset zoom”.

  • If you would like to see fewer or more segments on the chart, simply click on “Segments” at the bottom left of the chart to choose the desired number of segments.
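
The segment idea can be sketched in a few lines of Python: divide the text into a chosen number of roughly equal pieces and count a term in each piece. This is a simplified illustration with an invented sample, not Voyeur's own code, and the way it rounds segment sizes is an assumption.

```python
import re

def segment_counts(text, term, segments=5):
    """Split the tokens into roughly equal segments and count `term` in each."""
    tokens = re.findall(r"[a-z']+", text.lower())
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    counts = []
    for start in range(0, size * segments, size):
        counts.append(tokens[start:start + size].count(term.lower()))
    return counts

# Invented sample text with twelve tokens.
sample = "dream one two dream three four five six dream dream seven dream"
trend = segment_counts(sample, "dream", segments=4)
for number, count in enumerate(trend, start=1):
    # A crude text chart: one asterisk per occurrence in the segment.
    print(f"segment {number}: {'*' * count}")
```

Rerunning with a different `segments` value shows why the shape of a trend line can change when you choose more or fewer segments.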

Other Things
  • We will look at how to get help (mention the Quick Guide)
  • Some things to try:
    • Experiment with the Options (like the Stop Word list)
    • Create a Favorites list for a theme and explore that list
    • Search for phrases

3.0 Analyzing a Corpus

In the second part of the Workshop we will look at working with a corpus or collection of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents.) The Voyeur index is at:

http://voyeurtools.org/?corpus=humanist

 

  • Bubblelines is a visualization tool that helps to understand patterns of word repetition in one or more documents. Each document is represented as a horizontal line and each search term is represented as a bubble. The bubble represents the frequency of the term in the corresponding segment of text (the text is divided into segments of equal length). The larger the bubble, the more frequent the term.
  • Load a text by copying one of the Frankenstein URLs into the "Add Texts" box
  • Hovering over a bubble, or set of bubbles, will cause a box to appear that displays the frequency counts for that segment of text.

  • Similarly, hovering over the number at the end of the line will cause a box to appear that summarizes the frequency for the entire document.

  • When Bubblelines first loads a corpus, you may see terms that have been pre-selected and included in the URL or embedded page. If no terms are specified, Bubblelines automatically fetches the five most frequent terms and displays bubbles based on those.

  • You can remove the default terms by clicking on the "Clear Terms" button.

  • You can add additional terms to be displayed using the "Find Term" box. Note that available terms will appear as you type and you can pick an item from the list to have it added.

  • In addition to adding and removing terms, you can toggle the display of the terms that have been loaded. To do so simply click on the term (active terms are underlined).

  • ScatterPlot creates a scatter plot graph of terms, spaced by their variation from one another. Once you arrive at ScatterPlot, insert or upload your content and let the tool perform its analysis. You may hover over the dots and click on them for more information.
  • When you first load ScatterPlot, you will see a variety of terms plotted on a graph. If you hover over the terms, you will see their variation explained by each component on the x and y axis. If you click on any of these terms, it will bring you to the Document KWICs tool for further analysis.

  • ScatterPlot offers options for changing the plot. The terms button allows you to choose how many terms should be displayed. The dimensions button lets you switch between a two- or three-dimensional graph. Toggle labels simply removes or adds labels for the terms on the graph.

  • Some other things to try:
    • Set stoplists.  You may want to exclude common words.  To do this, click on the "Options" button, represented by a gear icon in the upper-right. 
    • Manage multiple documents.  
    • Show how to group results
    • Show comparing documents
  • Try looking for trends yourself using the different tools

4.0 Using your own text

  • Now you can try your own text. There are different ways of providing Voyeur a text:
    • Typing a text or pasting it in
    • Typing in one or more URLs (as we have done above)
    • Uploading a text, using the "upload" button
  • For uploading, there are a number of formats of texts that will work:
    • file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
  • Finally, we will discuss caching and so on.
  • Now try your own text.

5.0 Exporting Data and Quoting Analytics

We will now show how to export data and quote analytical results:

  • How to export tab-separated values, which can be copied and pasted into Excel
  • How to export XML results from KWICs (for instance)
  • How to quote an analytical result in TADA.
  • Go to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to try it yourself.

6.0 Advanced and Other

7.0 To Prepare

  • Make sure we have Voyeur running with a backup
  • Sort out how participants can get on wireless
  • Powerbars for laptops
  • Preindex texts

DH 2011 Voyeur Tools

This outline is for a workshop offered at the Digital Humanities 2011 conference at Stanford.

Please note that the main server for Voyeur Tools (voyeurtools.org) may be inaccessible so we have created a backup installation (dev.voyeurtools.org:8080). They should function very similarly, but the corpora loaded into the development server may not be accessible after the workshop.

1.0 Introduction

2.0 Introduction to text analysis using individual Voyeur tools

After introductions we will show you how to use individual tools in Voyeur to analyze a single text as a way of thinking about techniques in text analysis. We will work with Mary Shelley's Frankenstein, the Humanist discussion list corpus and a collection of Austen novels. The plain text of Frankenstein is here: http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text

Here are the tools we will try:

  1. Cirrus (with Austen): http://voyeur.hermeneuti.ca/tool/Cirrus/?corpus=1308408654248.9846&stopList=stop.en.taporware.txt
  2. Word Trends (with Humanist): http://dev.voyeurtools.org:8080/tool/TypeFrequenciesChart/?corpus=humanist
  3. Links (Collocates): http://dev.voyeurtools.org:8080/tool/Links/?corpus=1308459917755.5623&mode=document&stopList=stop.en.taporware.txt
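
The three links above follow one pattern: a server address, a /tool/<ToolName>/ path, and query parameters such as corpus, mode and stopList. As a minimal sketch (the helper name tool_url is hypothetical and not part of Voyeur, and the parameter names are simply those visible in the links), such an address could be assembled like this:

```python
from urllib.parse import urlencode


def tool_url(base, tool, **params):
    """Build a link of the form <base>/tool/<tool>/?key=value&...

    Hypothetical helper for illustration; it only mirrors the shape
    of the links listed in this outline.
    """
    return f"{base}/tool/{tool}/?{urlencode(params)}"


# Rebuild the Links (Collocates) address from its parts.
links_url = tool_url(
    "http://dev.voyeurtools.org:8080",
    "Links",
    corpus="1308459917755.5623",
    mode="document",
    stopList="stop.en.taporware.txt",
)
print(links_url)
```

Changing the tool name or the corpus identifier should give a link to another tool or text, provided the server recognizes them.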

We will discuss the standard controls for a panel and how you can cite and embed panels (with their texts).

3.0 Distant Reading: Analyzing a Single Text

In the third part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface.

  • We will open Voyeur:
    • Show how to load a text (including XML options)
    • Show the different panels that appear initially
      • Discuss the order they open and the Summary panel
      • Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
    • Discuss the full set of panels
    • Show how to manage panels
    • Discuss trigger order of panels (flow within Voyeur)
    • Show how to get help (Mention Quick Guide)
    • Show how to make a list of favorite words by searching for words and saving them in Favorites
  • Now you should try Voyeur with your text or the Frankenstein text above. To open the Frankenstein text, click here:

http://dev.voyeurtools.org:8080/tool/Cirrus/?corpus=1308459917755.5623&stopList=stop.en.taporware.txt

  • Some things to try:
    • Experiment with the Options (like the Stop Word list)
    • Create a Favorites list for a theme and explore that list
    • Search for phrases

4.0 Distant Reading: Analyzing a Corpus with Correspondence Analysis

In the fourth part of the Workshop we will look at working with a corpus using a different skin and the Correspondence Analysis tool. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents.)

We will use the Humanist Corpus with a different skin or arrangement of panels:

http://dev.voyeurtools.org:8080/?corpus=humanist&skin=scatter&stopList=stop.en.taporware.txt

We will discuss how to use the Correspondence Analysis tool to explore themes in a diachronic corpus. For more on CA see http://stefansinclair.name/correspondence-analysis

Some of the features to look at:

  • Controlling the visualization (labels, words, etc.)
  • Using the list of words (selecting multiple words)
  • Controlling panels
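
Correspondence Analysis works from a table of word frequencies per document. The sketch below only builds such a table of relative frequencies for a made-up two-document corpus; it does not perform the analysis itself, and it is not a description of how Voyeur is implemented. The document names, texts and term list are all invented for illustration.

```python
from collections import Counter

# Made-up miniature corpus: one short string per year, for illustration only.
docs = {
    "1987": "text computer text archive",
    "2008": "web text web digital",
}
terms = ["text", "computer", "web"]

# For each document, store the relative frequency of each chosen term.
table = {}
for name, text in docs.items():
    counts = Counter(text.split())
    total = sum(counts.values())
    table[name] = {term: counts[term] / total for term in terms}

for name, row in table.items():
    print(name, row)
```

Reading across the rows shows which terms are relatively more common in which document, which is the kind of contrast the scatter plot makes visible.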

5.0 Using your own text

  • Now you can try your own text. We will show the different ways of providing Voyeur a text:
    • Typing a text or pasting it in
    • Typing in one or more URLs
    • Uploading a text
  • We will then discuss the formats of texts that will work, and what will happen to them:
    • file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
    • Finally, we will discuss caching and so on

Try your own text now.

6.0 Exporting Data and Quoting Analytics

There are different ways to export data and quote analytical results:

  • You can export tab-separated values, which can be copied and pasted into Excel
  • You can export XML results from KWICs (for instance)
  • You can embed live tool snippets (in a blog post, TADA, etc.)
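
Tab-separated values can also be read by a short script instead of a spreadsheet. The following is a minimal sketch; the column names (term, count) and the numbers are invented for illustration and are not Voyeur's actual export headers or real results.

```python
import csv
import io

# Hypothetical export: column names and values are assumptions for this example.
exported = "term\tcount\nalpha\t3\nbeta\t5\n"

# csv.DictReader splits each line on tabs and uses the first line as headers.
rows = list(csv.DictReader(io.StringIO(exported), delimiter="\t"))
counts = {row["term"]: int(row["count"]) for row in rows}
print(counts)
```

With a real export you would read from the saved file rather than a string, and use whatever column headers the file actually contains.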

7.0 Wrap-Up

  • other aspects: skins, tool browser
  • how to give feedback
  • future: Voyeur Notebooks, new TAPoR
  • thanks!

DH2010 Introduction to Voyeur

This is an outline for a workshop on Voyeur. It was developed for a workshop before DH 2010 in London, England.

1.0 Introduction

2.0 Analyzing a Single Text

In the first part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:

http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters

http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text

  • We will open Voyeur:
    • Show how to load a text
    • Show the different panels that appear initially
      • Discuss the order they open and the Summary panel
      • Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
    • Discuss the full set of panels
    • Show how to manage panels
    • Discuss trigger order of panels (flow within Voyeur)
    • Show how to get help (Mention Quick Guide)
    • Show how to make a list of favorite words by searching for words and saving them in Favorites
  • Now you should try Voyeur with your text or the Frankenstein text above. To open the Frankenstein text, click here:

http://voyeurtools.org/?corpus=1278409278561.646

  • Some things to try:
    • Experiment with the Options (like the Stop Word list)
    • Create a Favorites list for a theme and explore that list
    • Search for phrases

3.0 Analyzing a Corpus

In the second part of the Workshop we will look at working with a corpus or collection of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents.) The Voyeur index is at:

http://voyeurtools.org/?corpus=humanist

  • We will show you how to:
    • Set various options, like stoplists
    • Hide and show columns
    • Manage multiple documents
    • Group results
    • Compare documents
  • Try looking for trends yourself

4.0 Using your own text

  • Now you can try your own text. We will show the different ways of providing Voyeur a text:
    • Typing a text or pasting it in
    • Typing in one or more URLs
    • Uploading a text
  • We will then discuss the formats of texts that will work, and what will happen to them:
    • file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
    • Finally, we will discuss caching and so on
  • Now try your own text.

5.0 Exporting Data and Quoting Analytics

We will now show how to export data and quote analytical results:

  • How to export tab-separated values, which can be copied and pasted into Excel
  • How to export XML results from KWICs (for instance)
  • How to quote an analytical result in TADA.
  • Go to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to try it yourself.

6.0 Advanced and Other

7.0 To Prepare

  • Make sure we have Voyeur running with a backup
  • Sort out how participants can get on wireless
  • Powerbars for laptops
  • What texts will we use?
  • Preindex texts and create a Workshop web page on Hermeneuti.ca

DH2012 Workshop in Hamburg

This is a script for a workshop on using Voyant Tools for the Digital Humanities 2012 Conference in Hamburg.

 

1.0 Introduction

 

  • Workshop leaders: Stéfan Sinclair, McGill University and Geoffrey Rockwell
  • Overview
    Voyant Tools is a web-based environment for reading and analyzing digital texts, created by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur", so don't be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could. Voyant is actually a suite of modular tools that can be combined in pre-defined or user-defined combinations called skins. This workshop's primary objective is to better understand how and why one might use Voyant Tools to help in the study of digital texts.
  • Outline
    In this workshop we will:
    • First, look at how to use a single Voyant tool, Cirrus, with a small multilingual corpus.
    • Then learn how to use the normal "skin" (multi-tool interface) of Voyant with a single text.
    • Show how to load your own text(s) into Voyant.
    • Look at some of the more exploratory and advanced tools available in Voyant, such as Bubbles and Correspondence Analysis.
    • Discuss the use of Voyant Tools in a larger research process (embedding tools in remote content, etc.)
  • Help
    Here are some useful links:

N.B. Voyant Tools is in beta; it has warts and blemishes. Always view what you're looking at with some circumspection, and if something doesn't work as expected, assume it's a bug, not something that you're misunderstanding (and please tell us about it).


2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with the English version of the Universal Declaration of Human Rights.

http://work.voyant-tools.org/tool/Cirrus/?corpus=unhr&docIndex=0&stopList=stop.en.taporware.txt&toolFlow=simple

The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?

Here are some more Cirrus visualizations to consider:

These types of word clouds are prevalent from academia to advertising – they quickly provide an intriguing representation of a text, as demonstrated by this example of studying gendered languages in toy advertising. But their ability to rapidly convey a picture with words comes at the cost of information reduction, and some are highly critical of word clouds as hermeneutical tools. What do you think?

These Cirrus visualizations don't show all of the highest-frequency words: the so-called stopwords are missing. Stopwords are function words (like determiners and prepositions) that typically carry less meaning. What to include in a stopword list is a matter of interpretation and purpose. Are numbers (like "one") important? What about words like "against"? A new feature in Voyant Tools is the ability to set and edit stopword lists. To do so, click on the options (gear) icon and then click on "Edit Stop Words".
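
To see what a stopword list does to a frequency count, here is a minimal sketch. The tiny stopword set is an assumption made up for this example (it is not the contents of stop.en.taporware.txt), the function name top_words is hypothetical, and the sample sentences are from the Universal Declaration of Human Rights.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; NOT the contents of stop.en.taporware.txt.
STOPWORDS = {"the", "of", "and", "to", "in", "a", "as", "has"}


def top_words(text, stopwords, n=3):
    """Return the n most frequent words in text, skipping stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(n)


sample = (
    "Everyone has the right to life, liberty and security of person. "
    "Everyone has the right to recognition everywhere as a person before the law."
)

print(top_words(sample, STOPWORDS))  # content words rise to the top
print(top_words(sample, set()))      # without a stoplist, "the" comes first
```

Adding or removing a word from the set (try "everyone") changes the ranking, which is the interpretive choice described above.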

Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.

Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) of the word in that text. Note that you may have to allow pop-ups.
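
A KWIC display lists each occurrence of a word with a few words of context on either side. The following is a minimal sketch of that idea, not Voyant's own code; the function name kwic and the width parameter are hypothetical.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, keyword, right) tuples with `width` words of context per side."""
    words = re.findall(r"[\w']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits


sample = "Everyone has the right to life, liberty and security of person."
for left, word, right in kwic(sample, "right", width=2):
    print(f"{left:>20} | {word} | {right}")
```

Widening the context (a larger width) trades a compact list for more surrounding text per occurrence.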

Try It: Try some of the other individual tools at docs.voyant-tools.org/tools


3.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same corpus in a simple skin.

In this skin, clicking in one panel will often (but not always) update other panels. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.


4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:

Voyant: work.voyant-tools.org

Just the Cirrus tool in Voyant: work.voyant-tools.org/tool/Cirrus

You will get a panel that asks you for a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)
  • Paste in a text

Voyant is forgiving, but there are nonetheless issues (and bugs).

Note that you can create a persistent URL for your corpus – that way your link can be shared or bookmarked and you won't need to reload the texts into Voyant. Click the save icon (disk icon) in the blue bar at the top and the first URL will be the link for your Voyant corpus.


5.0 Exploratory and Advanced Tools

Voyant Tools is conceived on the notion that text analysis in the humanities is a practice of re-presenting the text, not about producing incontrovertible evidence. Some Voyant Tools are more about aesthetic or ludic aspects of experiencing digital texts, which can directly or indirectly inspire observations that may not be otherwise possible. Here are some examples:

Specialized Skins

At the same time, some tools are more advanced. For instance, one can use a correspondence analysis skin that shows how terms map across multiple documents, such as this view of the Humanist Discussion Group listserv.

Voyant Tools also enables some quick-and-dirty social network analysis. This is possible thanks to a process called named entity recognition (NER) that attempts to automatically identify people, places and organizations in a text (at the moment Voyant Tools uses the Stanford Natural Language Processing package to perform this automated process). It's worth emphasizing that automated processes like these are subject to several issues and problems – for instance, how to combine or differentiate between uses of first and/or last names? How to tell if the same name refers to one or two different people? What to do when an organization looks like a person's name (e.g. Johns Hopkins)? Still, you can't beat the simplicity of Voyant Tools RezoViz, especially when working with a mid-size corpus of shorter texts (5-50 articles, for instance). For instance, here is a specialized interface showing connections between people mentioned in emails to the Humanist listserv (RezoViz is in alpha and best experienced in Chrome).

As always, the real strength of Voyant Tools is the ability to create your own corpus – you can start at work.voyant-tools.org/tool/RezoViz.
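
The idea behind a people network can be sketched in a few lines of Python. This is only an illustration, not how RezoViz is implemented: it assumes a named entity step has already produced a set of person names for each document, and the names below are made up. Two people are linked each time they are mentioned in the same document.

```python
from collections import Counter
from itertools import combinations

# Made-up placeholder data: one set of person names per document,
# as a named entity step might produce.
docs = [
    {"Alice", "Bob"},
    {"Alice", "Bob", "Carol"},
    {"Carol", "Alice"},
]

def cooccurrence_edges(docs):
    """Count how many documents mention each pair of names together."""
    edges = Counter()
    for names in docs:
        # Sort so that (Alice, Bob) and (Bob, Alice) count as the same pair.
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)] += 1
    return edges

edges = cooccurrence_edges(docs)
for (a, b), weight in edges.most_common():
    print(a, "-", b, weight)
```

The counts can serve as edge weights in a network graph. Note that this sketch does nothing about the naming problems mentioned above, such as merging first and last names.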


6.0 Voyant as a Scholarly Tool

One of the essential design principles of Voyant Tools is that it tries to be useful not just at the moment of analysis, but through more phases of research. Here are some examples:

  • as we've already seen, you can export a link to a corpus that can be bookmarked, shared by email or Twitter, or otherwise preserved (as a general rule, a corpus in Voyant will remain accessible as long as it has been consulted at least once in the past month)
  • there's built-in Zotero awareness – you can click on the folder/article icon in the Firefox address bar to create a new entry (though you may wish to complete some of the metadata)
  • you can export data for other applications – for instance, produce a tab-separated values view of a table that can be copy-and-pasted into a spreadsheet application (where you can edit the data and produce even more graphs, charts, etc.)
  • you can embed a live tool in remote content (a blog post, a journal article, a term paper, etc.), much as you would embed a YouTube clip – the interactive affordances of Voyant allow you to go beyond static screenshots and images and allow your users/readers to engage with the content and data themselves, as with the DH2012 abstracts

7.0 Other Stuff

Here are some other useful resources.

Other Tools:

  • TAPoR 2.0 - Discover and comment on tools. For example, here are the Voyant Tools listed in TAPoR 2.0. Leave a comment on your favorite Voyant tool. Link to a project where you use it.
  • TAPoRWare - Simple tools for processing plain text, HTML, and XML
  • CWRC List of Visualization Tools
  • DIRTROD

Other Corpora:

Other Voyant Skins:

 


 

 

Dublin 2011: From Metadata to Linked Data

This workshop outline is for a Summer School at Trinity College Dublin. See http://dho.ie/summerschool2011 for the full description. This outline is for Day 3 on Generating Textual Data:

Day 3: Generating Textual Data, Tobias Blanke and Geoffrey Rockwell

Based on the results of Day II, participants will dig deeper into the details of generating textual data using text and data mining techniques. Participants will learn methods to algorithmically create textual data while critically evaluating existing tools, methods, and solutions as well as their future potential. They will gain insights on how generic services need to be modified to serve the needs of humanities research. Finally, we will investigate how the generated output can be reused in the emerging web of data.


This is an outline for a workshop on Voyeur. It was developed for a workshop before DH 2010 in London, England.

1.0 Introduction

2.0 TAPoRware: A Simple Recipe for Studying Themes in a Text

In the second part of the Workshop we will show you how to use TAPoRware to analyze a single text as a way of thinking about techniques in text analysis. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:

http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters

http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text

We will also be using some TAPoRware tools and Recipes for TAPoRware. The Tools and Recipes are here:

List Words: http://taporware.ualberta.ca/~taporware/textTools/listword.shtml - Use short Frankenstein

Concordance Tool: http://taporware.ualberta.ca/~taporware/textTools/findtext.shtml - Use short Frankenstein

Weighted Centroid: http://taporware.ualberta.ca/~taporware/otherTools/wcentroid.shtml - Use short Frankenstein

Principal Component Analysis: http://taporware.ualberta.ca/~taporware/betaTools/pca.shtml - Use short Frankenstein
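
To make the idea of a concordance concrete, here is a minimal keyword in context (KWIC) sketch in Python. It is not the TAPoRware implementation; the function name `kwic` and the `width` parameter are assumptions made for this illustration. It lines up each occurrence of a word with a fixed number of characters of context on either side, using three lines from A Midsummer Night's Dream as sample text.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword with context on each side."""
    # Collapse line breaks and runs of spaces so each context reads as one line.
    flat = re.sub(r"\s+", " ", text)
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keyword forms a column.
        lines.append(f"{left:>{width}} {m.group(0)} {right}")
    return lines

sample = (
    "Four nights will quickly dream away the time; "
    "Swift as a shadow, short as any dream; "
    "Ay me, for pity! what a dream was here!"
)
hits = kwic(sample, "dream", width=25)
for line in hits:
    print(line)
```

Each printed line has the sought word in the same column, which is what makes a KWIC display easy to scan.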

2.2 Using Voyeur Simple Tools

Cirrus Word Cloud (Frankenstein): http://dev.voyeurtools.org:8080/tool/Cirrus/?corpus=1309937516546.6692&query=&stopList=stop.en.taporware.txt

Cirrus Word Cloud (Austen): http://voyeur.hermeneuti.ca/tool/Cirrus/?corpus=1308408654248.9846&stopList=stop.en.taporware.txt

Other tools from Voyeur can be found here: http://hermeneuti.ca/voyeur/tools

3.0 Distant Reading: Analyzing a Single Text

In the third part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface.

  • We will open Voyeur:
    • Show how to load a text (Frankenstein: http://www.gutenberg.org/cache/epub/84/pg84.txt). Discuss different types of texts that can be loaded.
    • Show the different panels that appear initially
      • Discuss the order they open and the Summary panel
      • Discuss common features to panels
      • Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
    • Show how to manage panels
    • Discuss trigger order of panels (flow within Voyeur)
    • Show how to get help (Mention Quick Guide)
    • Show how to make a list of favorite words to explore searching for words and saving in favorites
  • Now you should try Voyeur with your text or the Frankenstein text above. To open the Frankenstein click here:

http://voyeur.hermeneuti.ca/?corpus=1309937028026.8131

  • Some things to try:
    • Experiment with the Options (like the Stop Word list)
    • Create a Favorites list for a theme and explore that list
    • Search for phrases

4.0 Distant Reading: Analyzing a Corpus

In the fourth part of the Workshop we will look at working with a corpus or collection of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents.) The Voyeur index is at:

http://voyeurtools.org/?corpus=humanist&skin=scatter&stopList=stop.en.taporware.txt

  • We will discuss:
    • Different skins with different panels
    • Correspondence analysis and the exploration of a large corpus
  • Try looking for trends yourself
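
The link above packs several settings into its query string. As a small sketch using Python's standard `urllib.parse` module, the URL can be taken apart to see which corpus, skin and stop word list it asks for, and put back together again. The helper name `build_url` is made up for this illustration, and only the parameter names that appear in the link are used.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

url = "http://voyeurtools.org/?corpus=humanist&skin=scatter&stopList=stop.en.taporware.txt"

# Split the URL and read each query parameter as a single value.
parts = urlsplit(url)
params = {key: values[0] for key, values in parse_qs(parts.query).items()}
for key, value in params.items():
    print(key, "=", value)

def build_url(base, params):
    """Attach a dictionary of parameters to a base URL (hypothetical helper)."""
    base_parts = urlsplit(base)
    return urlunsplit(base_parts._replace(query=urlencode(params)))

# Rebuilding from the parsed parameters gives back the original link.
rebuilt = build_url("http://voyeurtools.org/", params)
print(rebuilt)
```

Editing the dictionary before rebuilding is one way to produce a variant of a link, for example with a different stop word list, though which values a given server accepts is not something this sketch can tell you.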

5.0 Using your own text

  • Now you can try your own text. We will show the different ways of providing Voyeur a text:
    • Typing a text or pasting it in
    • Typing in one or more URLs
    • Uploading a text
  • We will then discuss the formats of texts that will work, and what will happen to them:
    • file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
    • Finally, we will discuss caching and so on
  • Now try your own text.

6.0 Exporting Data and Quoting Analytics

We will now show how to export data and quote analytical results:

  • How to export tab-separated values to copy and paste into Excel
  • How to export XML results from KWICs (for instance)
  • How to quote an analytical result in TADA.
  • Show going to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to insert a panel.
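
Tab-separated values can also be read by a short script instead of a spreadsheet. The sketch below uses Python's standard `csv` module; the column names `term` and `count` and the rows themselves are made-up placeholders, not real output, so adjust them to whatever the exported table actually contains.

```python
import csv
import io

# Made-up placeholder rows standing in for an exported table.
exported = "term\tcount\nalpha\t3\nbeta\t5\ngamma\t1\n"

# Read the tab-separated text into dictionaries keyed by the header row.
reader = csv.DictReader(io.StringIO(exported), delimiter="\t")
rows = [{"term": row["term"], "count": int(row["count"])} for row in reader]

# Sort from most to least frequent.
rows.sort(key=lambda row: row["count"], reverse=True)
for row in rows:
    print(row["term"], row["count"])
```

To read from a saved file rather than pasted text, replace `io.StringIO(exported)` with an opened file object.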

7.0 Skinning Voyeur

We will now look at how you can develop a different skin.

  • Open a corpus like http://voyeurtools.org/?corpus=1309931394540.8106
  • Click on the Export button (the disk button in upper right) and export to layout builder
  • Drag panels into the blank area to create a custom skin (Warning: many combinations won't work)

 

MLA 2013: TAPoR and Voyant Workshop for DHCommons

This is a script for a workshop on using Voyant and TAPoR for the MLA 2013: DHCommons Get Started in Digital Humanities.

The script can be found at http://hermeneuti.ca/node/250/

0.0 Before the Workshop

Here are some of the things you might want to do before the workshop on Voyant:

  • Review this workshop outline, follow links and review some of the help materials.
  • Prepare a text of your own to try with Voyant. To start you might find a novel-length text that you are familiar with and which interests you. Save it as a plain text file somewhere where you can get at it during the workshop (on your laptop) or online.
  • Bring a laptop with wireless to use in the workshop.

 

1.0 Introduction

  • Workshop leader and participants introduce themselves:
    • Geoffrey Rockwell, University of Alberta, geoffrey (dot) rockwell (at) ualberta (dot) ca
  • Overview
    This workshop will quickly introduce you to computer assisted text analysis using Voyant and TAPoR. Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell and is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. TAPoR is a web site for the discovery and review of text analysis tools including those in Voyant.
  • Outline
    In this workshop we will:
    • First, look at how to use a single Voyant tool, Cirrus, with different texts. 
    • Then learn how to use the normal skin of Voyant with a single text and then a corpus.
    • Then learn how to load your own text into Voyant.
    • Finally, we will look at TAPoR where you can find other tools.
  • Help
    Remember that the tools listed in TAPoR, like Voyant, are research tools and will often fail, especially when a whole group of people uses them at once. Multiple versions are running in case one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
  • Voyant Tools
    • Individual Voyant tool descriptions and links can be found at docs.voyant-tools.org/tools
    • The main URL for Voyant Tools is http://voyant-tools.org/, though there are other URLs that can be used:
      http://temp.voyant-tools.org/ - the primary server for this workshop, but it's a temporary server, so please avoid linking to it
      http://voyant-tools.org/ - the main server, content here usually persists for a month or longer (especially when accessed regularly)
      http://beta.voyant-tools.org/ - a development server, less stable, so avoid linking to it
      http://voyeur.hermeneuti.ca/ - a much older version of Voyant, more of a tourist attraction and last ditch solution
      Links for this workshop will look like this: http://bit.ly/VoyantShakespeare [temp, main, beta] where the first link resolves automatically (use it whenever possible) and the subsequent links provide backup. Please note that corpora for this workshop are located on all three servers, but if you load a corpus it will only be available on the server where it was uploaded.



2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open.

Cirrus (Frankenstein): http://bit.ly/VoyantCirrusFrankenstein [temp, main, beta]

Alternatively you can go to the tool: http://voyant-tools.org/tool/Cirrus and load the text http://www.gutenberg.org/cache/epub/84/pg84.txt

To learn more about the Cirrus tool go to http://docs.voyant-tools.org/tools and scroll down to read about Cirrus. Or go to TAPoR 2.0 and read a review.

The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
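
One way to think about the questions above is to count words yourself. The following Python sketch is not how Cirrus works internally; it simply counts words after dropping a small stop word list, which is the kind of frequency data a word cloud is usually sized by. The stop word list and the sample sentence are made up for this illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "i", "my", "was", "that"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lower-cased words, skipping anything in the stop word list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)

sample = "I saw the monster. The monster saw me, and I fled from the monster."
freqs = word_frequencies(sample)
for word, count in freqs.most_common(3):
    print(word, count)
```

Changing the stop word list changes which words rise to the top, which is worth keeping in mind when asking what words are missing from a cloud.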

Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.

Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that won't work that well. You will need to give them a URL for a text. You can use the Frankenstein text from http://www.gutenberg.org/cache/epub/84/pg84.txt



3.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:

Frankenstein: http://bit.ly/VoyantFrankensteinStop [temp, main, beta]

For a corpus see Austen (5 novels): http://bit.ly/VoyantAustenStop [temp, main, beta]

To learn about using the full Reading skin you can go to

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.


 

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:

http://temp.voyant-tools.org/ (remember this is a temporary server, for more persistent URLs use the main server) [main, beta]

Just the Cirrus tool in Voyant: http://temp.voyant-tools.org/tool/Cirrus/

You will get a panel that asks you for a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)

Voyant is forgiving, but there are nonetheless bugs.



5.0 Other Stuff

Here are some links to other tools, different corpora and skins for specialized tools:


 

6.0 For After the Workshop

To understand the power and limitations of text analysis it is useful to use Voyant on your own text.

  1. Find or assemble a text of your own. 
  2. Try studying it with Voyant.
  3. What would you like to ask of your text but can't? What sort of tool would you like?

Finding Texts:

Aggregating and Cleaning Texts:


 

7.0 Other Tools

What other tools are there out there? See TAPoR 2.0 for a growing list of tools.

NITLE: Digital Reading Practices for the Liberal Arts Classroom (April 18, 2013)

(Please note that this page will be updated prior to the NITLE workshop.)

Introduction to the Session

  • who we are
  • about this session (show, try, discuss - questions please!)

First Encounters with Voyant Tools

Can you guess what text this is?

Introducing Voyant Tools in the Classroom

  • why use tools to read and analyze?
    • digital text is everywhere
    • digital texts allow for a proliferation of representations
  • reading digital texts: addressing the gap between algorithmic thinking and interpretation
  • helpful teaching resources
  • limitations & strengths of Voyant

Where to Go Next?

  • advanced functionality in Voyant Tools: other tools, skins, skin builder
  • embedding Voyant Tools in remote content
  • development plans for Voyant: better linguistic analysis and Voyant Notebooks
  • other tools and methodologies: TAPoR.ca, Bamboo DiRT, Many Eyes, etc.

 

THATCamp Kansas (2012)

This is a script for a workshop on using Voyant for the Kansas THATCamp. It is available at http://hermeneuti.ca/workshops/kansas12

0.0 Before the THATCamp

Here are some of the things you might want to do before the workshop on Voyant:

  • Read an introduction to text analysis like The Measured Words (distributed before the camp.)
  • Review this workshop outline, follow links and review some of the help materials.
  • Prepare a text of your own to try with Voyant. To start you might find a novel-length text that you are familiar with and which interests you. Save it as a plain text file (a .txt file with no additional formatting) somewhere where you can get at it during the workshop, either on your laptop or online.
  • Bring a laptop with wireless to use in the workshop.

 


 

1.0 Introduction

  • Workshop leader and participants introduce themselves:
    • Geoffrey Rockwell, University of Alberta, geoffrey (dot) rockwell (at) ualberta (dot) ca
  • Overview
    Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur" so do not be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous online tools could.
  • Outline
    In this workshop we will:
    • First, look at how to use a single Voyant tool, Cirrus, with different texts. 
    • Then learn how to use the normal skin of Voyant with a single text and then a corpus.
    • Finally, show how to load your own text into Voyant.
  • Help
    Remember that Voyant is a research tool and will often fail, especially when a whole group of people use it at once. There are multiple versions up if one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:
  • Voyant Tools



2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open.

Cirrus (Frankenstein): http://bit.ly/VoyantCirrusFrankenstein [temp, main, beta]

Alternatively you can go to the tool: http://voyant-tools.org/tool/Cirrus and load the text http://www.gutenberg.org/cache/epub/84/pg84.txt

To learn more about the Cirrus tool go to http://docs.voyant-tools.org/tools and scroll down to read about Cirrus. Or go to TAPoR 2.0 and read a review.

The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
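
To think about the question of size and frequency, it can help to see how the frequency list behind a word cloud might be computed. The following is a minimal Python sketch under stated assumptions, not the way Cirrus itself is implemented: the tiny stop list and the `top_words` function are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop list (an assumption); real stop lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "my", "was", "that", "it"}

def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

sample = "The monster and the creator: the monster fled, and the creator followed the monster."
print(top_words(sample, n=2))
```

A word cloud could then scale each word by its count, which is one reason the choice of stop list changes what you see.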

Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.

Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that won't work that well. You will need to give them a URL for a text. You can use the Frankenstein from http://www.gutenberg.org/cache/epub/84/pg84.txt



3.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:

Frankenstein: http://bit.ly/VoyantFrankensteinStop [temp, main, beta]

For a corpus see Austen (5 novels): http://bit.ly/VoyantAustenStop [temp, main, beta]

To learn about using the full Reading skin you can go to

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC (keyword in context).
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.


 

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:

http://temp.voyant-tools.org/ (remember this is a temporary server, for more persistent URLs use the main server) [main, beta]

Just the Cirrus tool in Voyant: http://temp.voyant-tools.org/tool/Cirrus/

You will get a panel that asks you for a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)

Voyant is forgiving, but there are nonetheless bugs.
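
To give a sense of what pulling the readable text out of an HTML file involves, here is a minimal sketch that uses only Python's standard library. It is an illustration under stated assumptions, not Voyant's actual extraction code; the `TextExtractor` class and `html_to_text` function are hypothetical names.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content while skipping script and style elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    """Return the text of an HTML string with the markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

page = '<body><p class="hangingindent">It was a dreary night.</p><p>I beheld the wretch.</p></body>'
print(html_to_text(page))
```

If an uploaded file gives odd results, cleaning it to plain text first in this way is one thing to try.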



5.0 Other Stuff

Here are some links to other tools, different corpora and skins for specialized tools:


 

6.0 For After the Workshop

To understand the power and limitations of text analysis it is useful to use Voyant on your own text.

  1. Find or assemble a text of your own. 
  2. Try studying it with Voyant.
  3. What would you like to ask of your text but can't? What sort of tool would you like?

Finding Texts:

Aggregating and Cleaning Texts:


 

7.0 Other Tools

What other tools are there out there? See TAPoR 2.0 for a growing list of tools.

Trinity Long Room Hub 2012: Workshop on Voyant

This is a script for a workshop on using Voyant for the TCD community. It is available at http://hermeneuti.ca/node/222

1.0 Introduction

  • Workshop leader and participants introduce themselves:
  • Overview
    Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur" so do not be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could.
  • Outline
    In this workshop we will:
    • First, look at how to use a single Voyant tool, Cirrus, with different texts. 
    • Then learn how to use the normal skin of Voyant with a single text and then a corpus.
    • Finally, show how to load your own text into Voyant.
  • Help
    Remember that Voyant is a research tool and will often fail. There are multiple versions up if one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:



2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open.

Cirrus (Frankenstein): http://voyant-tools.org/tool/Cirrus/?corpus=1332317356275.4528&query=&stopList=stop.en.taporware.txt&toolFlow=simple

For a backup go here: http://voyant-tools.org/tool/Cirrus and enter text http://www.gutenberg.org/cache/epub/84/pg84.txt

To learn more about the Cirrus tool go to http://hermeneuti.ca/voyeur/tools and scroll down.

The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?

Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That's what we will look at next.

Try It: Now try other tools in Voyant. Go to http://hermeneuti.ca/voyeur/tools and experiment with the tools. Warning: some of them are prototypes that won't work that well. You will need to give them a URL for a text. You can use the Frankenstein from http://www.gutenberg.org/cache/epub/84/pg84.txt



3.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:

Frankenstein: http://voyant-tools.org/?corpus=1332317356275.4528&skin=simple&event=corpusTypeSelected

Austen (5 novels): http://voyeurtools.org/?corpus=JaneAusten&stopList=stop.en.taporware.txt (backup)
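
The links in this script carry their settings in the query string (for example corpus, stopList and skin). As a rough sketch, Python's standard library can take one of these URLs apart so you can see which parameters it sets. What each parameter does is inferred here from its name and is an assumption, not documented behaviour.

```python
from urllib.parse import urlsplit, parse_qs

# The Cirrus link for Frankenstein used earlier in this script.
url = ("http://voyant-tools.org/tool/Cirrus/"
       "?corpus=1332317356275.4528&query=&stopList=stop.en.taporware.txt&toolFlow=simple")

parts = urlsplit(url)
# keep_blank_values=True keeps the empty "query" parameter in the result.
params = parse_qs(parts.query, keep_blank_values=True)

print(parts.path)
for name, values in params.items():
    print(name, "=", values[0])
```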

To learn about using the full Reading skin you can go to

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC (keyword in context).
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
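
To make the idea of a trend graph concrete, here is a minimal sketch that splits a text into equal-sized segments and counts a word in each one. It reports raw counts and is only an assumption about how such a graph could be built, not a description of how the Word Trends panel computes its values; `word_trend` is a hypothetical name.

```python
import re

def word_trend(text, word, segments=5):
    """Count occurrences of word in each of `segments` equal-sized slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    counts = []
    for start in range(0, size * segments, size):
        counts.append(tokens[start:start + size].count(word.lower()))
    return counts

sample = "dream one two dream three four five six dream dream"
print(word_trend(sample, "dream", segments=5))
```

Plotting the returned list as a line gives a simple picture of where in the text a word clusters.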

When in doubt just restart the session by hitting refresh.

 


 

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:

Voyant: http://voyant-tools.org

Just the Cirrus tool in Voyant: http://voyant-tools.org/tool/Cirrus/

Backup older version: http://voyeur.hermeneuti.ca

You will get a panel that asks you for a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)

Voyant is forgiving, but there are nonetheless bugs.
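
If your corpus is spread over several files, one way to prepare a zipped collection for upload is sketched here with Python's standard library. The folder name, the archive name and the `zip_texts` function are assumptions for illustration; any zip utility would serve equally well.

```python
import zipfile
from pathlib import Path

def zip_texts(folder, archive_name="corpus.zip"):
    """Bundle every .txt file in folder into a zip archive and return the names added."""
    folder = Path(folder)
    added = []
    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.glob("*.txt")):
            # arcname stores just the file name, not the full folder path.
            archive.write(path, arcname=path.name)
            added.append(path.name)
    return added
```

Calling `zip_texts("my_texts")` (with a hypothetical folder named my_texts) would write corpus.zip in the current directory, ready to upload.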



5.0 Other Stuff

Here are some different corpora and skins for specialized tools:

 


 

6.0 For Next Week

To understand the power and limitations of text analysis it is useful to use Voyant on your own text. Please try it over the week:

  1. Find or assemble a text of your own. 
  2. Try studying it with Voyant.
  3. What works and what doesn't? Document your problems and questions. 
  4. What would you like to ask of your text but can't? What sort of tool would you like?

Next week we will discuss what works and doesn't; we will look at advanced features; and discuss other tools.

Finding Texts:

Aggregating and Cleaning Texts:

 


 

7.0 Second Workshop

This second workshop will be less structured. We will do the following:

  • Participants who had a chance to experiment with Voyant can report back.
  • Discussion of problems and desires for Voyant.
  • Exporting Voyant results. Demonstration of how you might put Voyant results into Excel.
  • Getting an image result. Demonstration of getting a PNG for a trend graph.
  • Placing a Voyant Panel. Demonstration of placing in TADA here http://tada.mcmaster.ca/Main/VoyantTest
  • Content Analysis - what can you do? Demonstration of Globe Work.

Other Tools

What other tools are there out there? See TAPoR 2.0 beta for a growing list of tools.

Other Stuff

ePorte Roots and Routes Voyant Workshop, May 2012

This is a script for a workshop on using Voyant Tools for the ePorte Roots and Routes Summer Institute. It is available at http://hermeneuti.ca/workshops/roots12

1.0 Introduction

  • Workshop leader: Stéfan Sinclair, McGill University
  • Overview
    Voyant Tools is a web-based environment for reading and analyzing digital texts, created by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur" so don't be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could. Voyant is actually a suite of modular tools that can be combined in pre-defined or user-defined combinations called skins. This workshop's primary objective is to better understand how and why one might use Voyant Tools to help in the study of digital texts.
  • Outline
    In this workshop we will:
    • First, look at how to use a single Voyant tool, Cirrus, with a small corpus of Austen texts.
    • Then learn how to use the normal "skin" (multi-tool interface) of Voyant with a single text.
    • Show how to load your own text(s) into Voyant.
    • Explore some of the more exploratory and advanced tools available in Voyant, such as Bubbles and Correspondence Analysis.
    • Discuss the use of Voyant Tools in a larger research process (managing links in Zotero, embedding tools in remote content, etc.)
  • Help
    If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:

N.B. Voyant Tools is in beta; it has warts and blemishes. Always view what you're looking at with some circumspection and if something doesn't work as expected, assume it's a bug, not something that you're misunderstanding.

2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Jane Austen's Persuasion.

Cirrus (Austen's Persuasion): http://voyant-tools.org/tool/Cirrus/?corpus=JaneAusten&docIndex=5&stopList=stop.en.taporware.txt&toolFlow=simple (backup)

The Cirrus tool shows you a word cloud of high-frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?

Here are some more Cirrus visualizations to consider:

These types of word clouds are prevalent from academia to advertising: they quickly provide an intriguing representation of a text, as demonstrated by this example of studying gendered language in toy advertising. But their ability to rapidly convey a picture with words comes at the cost of information reduction, and some are highly critical of word clouds as hermeneutical tools. What do you think?

Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.

Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) listing of the word in that text, that is, each occurrence of the word shown with a few words of context to its left and right.
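
To show how a keyword-in-context listing can be produced, here is a minimal Python sketch. It is an assumption-laden illustration, not the code Voyant uses; the `kwic` function and its `width` parameter are hypothetical, and the sample is a line from A Midsummer Night's Dream.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    tokens = re.findall(r"[\w']+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

sample = "Swift as a shadow, short as any dream; Brief as the lightning in the collied night."
for left, word, right in kwic(sample, "dream"):
    print(f"{left:>20} | {word} | {right}")
```

Aligning the keyword in a column, as the print statement does, is what makes patterns in the surrounding words easy to scan.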

3.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Austen corpus in a simple skin:

http://voyant-tools.org/?corpus=JaneAusten&stopList=stop.en.taporware.txt (backup)

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:

Voyant: http://voyant-tools.org

Just the Cirrus tool in Voyant: http://voyant-tools.org/tool/Cirrus/

Backup version: http://beta.voyant-tools.org/

You will get a panel that asks you for a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)

Voyant is forgiving, but there are nonetheless bugs.

Note that you can create a persistent URL for your corpus, so that your link can be shared or bookmarked and you won't need to reload the texts into Voyant. Click the save icon in the blue bar at the top and the first URL will be the link for your Voyant corpus.

5.0 Exploratory and Advanced Tools

Voyant Tools is built on the notion that text analysis in the humanities is as much about proliferating representations of texts as about producing incontrovertible evidence. Some Voyant Tools are more about aesthetic or ludic aspects of experiencing digital texts, which can directly or indirectly inspire observations that may not be otherwise possible. Here are some examples:

At the same time, some tools are more advanced. For instance, one can produce correspondence analysis views that try to show how terms map across multiple documents, such as this view of the Humanist Discussion Group listserv.

Voyant Tools also enables some quick-and-dirty social network analysis. This is possible thanks to a process called named entity extraction that attempts to automatically identify people, places and organizations in a text (at the moment Voyant Tools uses the Stanford Natural Language Processing package to perform this automated process). It's worth emphasizing that automated processes like these are subject to several problems: how to combine or differentiate between uses of first and/or last names? How to tell whether the same name refers to one or two different people? What to do when an organization looks like a person's name (e.g. Johns Hopkins)? Still, you can't beat the simplicity of Voyant Tools RezoViz, especially when working with a mid-size corpus of shorter texts (5-50 articles, for instance). For instance, here is a specialized interface showing connections between people mentioned in emails to the Humanist listserv (RezoViz is in alpha and best experienced in Chrome).

As always, the real strength of Voyant Tools is the ability to create your own corpus – you can start at http://voyant-tools.org/?skin=rezoviz.

6.0 Voyant as a Scholarly Tool

One of the essential design principles of Voyant Tools is that it tries to be useful not just at the moment of analysis, but through more phases of research. Here are some examples:

  • as we've already seen, you can export a link to a corpus that can be bookmarked, shared by email or Twitter, or otherwise preserved (as a general rule, a corpus in Voyant will remain accessible as long as it has been consulted at least once in the past month)
  • there's built-in Zotero awareness – you can click on the folder/article icon in the Firefox address bar to create a new entry (though you may wish to complete some of the metadata)
  • you can export data toward other applications – for instance, produce a tab-separated values view of a table that can be copy-and-pasted into a spreadsheet application (where you can edit the data and produce even more graphs, charts, etc.)
  • you can embed a live tool in remote content (a blog post, a journal article, a term paper, etc.), much as you would embed a YouTube clip – the interactive affordances of Voyant allow you to go beyond static screenshots and images and allow your users/readers to engage with the content and data themselves
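
The tab-separated export mentioned in the list above can also be read by a script instead of being pasted into a spreadsheet. The following is a minimal sketch in Python using only the standard library. The column names `term` and `count` and the sample rows are made up for illustration; the columns Voyant actually exports depend on the table you are viewing.

```python
import csv
import io

def read_tsv(tsv_text: str) -> list[dict[str, str]]:
    """Parse tab-separated text (first row = column headers) into a list of rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)

# Hypothetical sample standing in for text copied from a Voyant table.
sample = "term\tcount\nlove\t150\ndream\t14\n"
rows = read_tsv(sample)

# Values arrive as strings, so convert before doing arithmetic.
total = sum(int(row["count"]) for row in rows)
print(rows[0]["term"], total)
```

Each row becomes a dictionary keyed by the header row, which makes it straightforward to filter or re-chart the data in another application.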

 

7.0 Other Stuff

Appendices

This section is mostly a parking lot for miscellaneous subsections – content will eventually be moved, integrated elsewhere, or deleted.

Functionality

Some of the discussed functionality:

Input

  • from Zotero
  • from Firefox plug in
  • from portal
  • from interactive essays
  • from web site
  • from results buttons that allow recapitulation (like in portal or taporware)
  • from links
  • from eclipse
  • panels
  • command line
  • application (eclipse)
  • swing based interface

Output:

  • out to blogs
  • output to portal research log
  • output to gathering (a TiddlyWiki web page that you can save to your computer)

Tools:

  • tzeeker is a panel builder

Priority Tools:

  • cleaner
  • list words with distribution
  • comparative list words
  • search concordance
  • KWIC
  • repeated phrases
  • distribution
  • visual collocator

Logging:

  • ability to log what happens

Interface to frameworks

  • api that provides info like progress
  • good error response
  • Help and stuff through the framework

How to Contribute

Here is where we will put information about how users can contribute to code, essays, and so on.

Temporary Workaround for Additional Tools

At the moment the default interface of Voyeur doesn't expose the range of tools that are available. As an awkward workaround, you can try this:

Click on the export icon to generate a URL of your corpus (notice that here I'm working on voyeurtools.org instead of voyeur.hermeneuti.ca):

Export Image

This will generate a URL that looks something like this: http://voyeurtools.org/?corpus=1278412802513.7776

What you can do is insert tool/toolName in this address to see other tools (your mileage may vary):
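
If you want to build several such addresses, the insertion can be scripted. The sketch below is written in Python and rests on one assumption that this manual does not confirm: that the tool/toolName segment belongs in the path of the address, ahead of the ?corpus=... part. The helper name `tool_url` is hypothetical, and the exact spelling that tool names take in an address is also an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

def tool_url(corpus_url: str, tool_name: str) -> str:
    """Insert tool/<tool_name> into the path of an exported corpus URL.

    Assumption: the segment goes in the path, before the ?corpus=... query.
    """
    parts = urlsplit(corpus_url)
    # Make sure the existing path ends with a slash before appending.
    base_path = parts.path if parts.path.endswith("/") else parts.path + "/"
    new_path = f"{base_path}tool/{tool_name}/"
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))

print(tool_url("http://voyeurtools.org/?corpus=1278412802513.7776", "Cirrus"))
```

The corpus identifier is left untouched, so the same corpus can be opened in each tool by changing only the tool name.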

Tools

  • Bubblelines (word distribution visualization)
  • Bubbles (text reading visualization)
  • Cirrus (word cloud visualization)
  • Corpus Grid (table of texts in a corpus)
  • Corpus Summary (corpus overview)
  • Corpus Term Frequencies (table of term frequencies by corpus)
  • Collocate Term Frequencies (table of term frequencies in proximity to keyword)
  • Document Term Frequencies (table of term frequencies by document)
  • Document KWICs (concordance or table of keywords in context)
  • Entities Browser (named entities visualization)
  • Knots (term occurrence visualization)
  • Lava (keyword in context visualization)
  • Links (term frequencies in proximity to keyword visualization)
  • Mandala (term browsing visualization)
  • Reader (large-scale document reader)
  • ScatterPlot (term distribution visualization)
  • Term Frequencies Chart (term distribution visualization)
  • Term Fountain (term frequencies visualization)
Bubblelines

Bubblelines is a visualization tool that helps to understand patterns of word repetition in one or more documents. Each document is represented as a horizontal line and each search term is represented as a bubble – the bubble represents the frequency of the term in the corresponding segment of text (the text is divided into segments of equal length). The larger the bubble, the more frequent the term.
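
The idea behind the bubbles can be sketched in a few lines of Python. This is an illustration of the general technique, not Voyeur's own code: the function name `segment_frequencies`, the simple word-matching rule, and the choice to measure segment length in words are all assumptions.

```python
import re

def segment_frequencies(text: str, term: str, segments: int = 10) -> list[int]:
    """Count occurrences of `term` in each of `segments` equal-length slices of the text."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    tokens = re.findall(r"\w+", text.lower())
    counts = [0] * segments
    for position, token in enumerate(tokens):
        if token == term.lower():
            # Map the word position onto one of the equal-length segments.
            counts[position * segments // len(tokens)] += 1
    return counts

text = "love hate love fear love love hate fear"
print(segment_frequencies(text, "love", segments=4))  # [1, 1, 2, 0]
```

Each number in the result would correspond to one bubble along the document's line, with larger counts drawn as larger bubbles.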

Bubbles

Bubbles reads the words in a document (or corpus) and displays the highest frequency words within proportionately large bubbles.

Cirrus

Cirrus is a visualization tool that displays a word cloud relating to the frequency of words appearing in one or more documents. One can click on any word appearing in the cloud to obtain detailed information about its relative frequency. The larger the word, the more frequent the term.

Corpus Grid

Corpus Grid shows an overview of the corpus, including each document's title, number of word tokens (total words), number of word types (unique words), and lexical density (the ratio of tokens to types).
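
To make these three measures concrete, here is a small Python sketch that computes them for one document. It follows the wording of the description above (tokens divided by types); whether Voyeur computes its figure in exactly this way is not confirmed here, and the function name `corpus_grid_row` and the regular-expression tokenizer are assumptions.

```python
import re

def corpus_grid_row(title: str, text: str) -> dict:
    """Return the title, token count, type count and density for one document."""
    tokens = re.findall(r"\w+", text.lower())
    types = set(tokens)
    return {
        "title": title,
        "tokens": len(tokens),   # total words
        "types": len(types),     # unique words
        # Ratio as worded in the description (tokens to types); 0.0 for empty text.
        "density": len(tokens) / len(types) if types else 0.0,
    }

print(corpus_grid_row("Sample", "the cat and the hat"))
```

For the sample sentence there are five tokens and four types, because "the" occurs twice.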

Corpus Summary

Corpus Summary is a tool that provides a simple, textual overview of the current corpus. Features of this tool include number of words, number of unique words, longest documents, highest vocabulary density, most frequent words, notable peaks in frequency, and distinctive words. Users can click within these features for more detailed information about the analysis.

Corpus Term Frequencies

Corpus Term Frequencies shows overall word frequencies for the entire corpus as well as information about how word frequencies are spread out over documents within the corpus. Hover over column headers and buttons for more information.


 
Document Term Frequencies

Document Term Frequencies shows word frequencies for each document in the corpus. You can see the selected word at the top of the window highlighted in yellow. Its relevance to the documents is shown in the table below. Hover over the column headers or toolbar buttons for more information.

Document KWICs

Document KWICs shows a table of keywords in their context. In other words, it provides a list of certain keywords and their occurrence within a corpus or document.
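
A keyword-in-context table is simple to approximate. The Python sketch below shows the general technique only: the function name `kwic`, the word-based context width, and the tokenizer are assumptions and do not describe how Voyeur builds its table.

```python
import re

def kwic(text: str, keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each occurrence of `keyword`."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

line = "I have had a dream, past the wit of man to say what dream it was"
for left, word, right in kwic(line, "dream", width=2):
    print(f"{left:>10} | {word} | {right}")
```

Each row places the sought word in the centre with a few words of context on either side, which is the layout a concordance table uses.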

Entities Browser (named entities visualization)
Knots

Knots is a visualization tool that helps to understand patterns of word relevance in one or more documents. Each term is represented as a twisted line – when the lines overlap it means a relevance or linkage between the terms.

Lava

Lava allows you to view multiple levels of a corpus in a three-dimensional environment. Clicking on certain documents within the corpus expands the Lava visualization in a ring to explore further. By clicking on certain parts of the visualization, you are able to explore terms within their context.

Links

Links finds collocates for words and displays links between them using a force-directed graph. It visualizes the frequencies of terms that occur in proximity to a keyword as a web of terms.
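
Collocates are words that occur near a keyword. The following Python sketch counts them within a fixed window of words; it illustrates the general idea rather than Voyeur's implementation, and the function name `collocates`, the window size, and the tokenizer are assumptions.

```python
import re
from collections import Counter

def collocates(text: str, keyword: str, window: int = 2) -> Counter:
    """Count words appearing within `window` words of each occurrence of `keyword`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            # Words to the left and right of this occurrence, excluding the keyword itself.
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

found = collocates("love is blind and love is kind", "love", window=1)
print(found.most_common())
```

Each (keyword, collocate, count) triple could then serve as a weighted edge in a graph of terms.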

Collocate Term Frequencies (table of term frequencies in proximity to keyword)
Mandala

Mandala is a visualization tool that imports “textual” files to perform analysis on the frequency and linkage of words. For example, you may import a play and find the linkage and frequency between a word and its speaker.

Reader

Reader acts as a method of reading all documents within a specified corpus. It does not provide text analysis but rather a method of viewing the contents of a corpus.

ScatterPlot

ScatterPlot creates a scatter plot graph of terms, spaced by their variation from one another.

Term Frequencies Chart

Term Frequencies Chart shows how terms are distributed across document(s) in a corpus (documents are shown in the order in which they were added).

  
Term Fountain

Term Fountain visualizes word frequencies as a fountain.

Cirrus

Cirrus is a visualization tool that displays a word cloud relating to the frequency of words appearing in one or more documents. One can click on any word appearing in the cloud to obtain detailed information about its relative frequency. The larger the word, the more frequent the term.

Getting Started

When you first arrive at the Cirrus tool you will see one of two possible screens:

Cirrus without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Cirrus with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Cirrus includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

Hovering over a word will cause a box to appear that displays the frequency count for that term.

Hovering over a word.

If you click a word, you are taken to an analysis of the word within the document(s). On the right side, Voyeur displays the relative frequency of the word in each of the document(s). (This is the Document Term Frequencies tool.)

Word relativity

Exporting

Like all Voyeur tools, Cirrus can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Bubblelines

Bubblelines is a visualization tool that helps to understand patterns of word repetition in one or more documents. Each document is represented as a horizontal line and each search term is represented as a bubble – the bubble represents the frequency of the term in the corresponding segment of text (the text is divided into segments of equal length). The larger the bubble, the more frequent the term.

Getting Started

When you first arrive at the Bubblelines tool you will see one of two possible screens:

Bubblelines without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Bubblelines with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Bubblelines includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

Hovering over a bubble, or set of bubbles, will cause a box to appear that displays the frequency counts for that segment of text.

Hovering over a Bubble

Similarly, hovering over the number at the end of the line will cause a box to appear that summarizes the frequency for the entire document.

Hovering over a Line

When Bubblelines first loads a corpus, you may see terms that have been pre-selected and included in the URL or embedded page. If no terms are specified, Bubblelines automatically fetches the five most frequent terms and displays bubbles based on those.
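
Picking the most frequent terms as a default can be sketched as follows in Python. This only illustrates the idea: the function name `default_terms` is hypothetical, and whether Voyeur first filters out common function words before choosing its five terms is not stated here, so no such filtering is applied.

```python
import re
from collections import Counter

def default_terms(text: str, how_many: int = 5) -> list[str]:
    """Return the `how_many` most frequent words in the text, most frequent first."""
    tokens = re.findall(r"\w+", text.lower())
    return [term for term, _ in Counter(tokens).most_common(how_many)]

print(default_terms("a a a b b c", how_many=2))  # ['a', 'b']
```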

You can remove the default terms by clicking on the "Clear Terms" button.

Clear Terms

You can add additional terms to be displayed using the "Find Term" box. Note that available terms will appear as you type and you can pick an item from the list to have it added.

Clear Terms

In addition to adding and removing terms, you can toggle the display of the terms that have been loaded. To do so simply click on the term (active terms are underlined).

Clear Terms

Exporting

Like all Voyeur tools, Bubblelines can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Corpus Summary

Corpus Summary is a tool that provides a simple, textual overview of the current corpus. Features of this tool include number of words, number of unique words, longest documents, highest vocabulary density, most frequent words, notable peaks in frequency, and distinctive words. Users can click within these features for more detailed information about the analysis.

Getting Started

When you first arrive at the Corpus Summary tool you will see one of two possible screens:

Corpus Summary without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Corpus Summary with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Corpus Summary includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

Once the analysis is complete, links to words and individual documents appear. Words appear highlighted in yellow. You may click on any of these links, and it will provide detailed information pertaining to the term or document.

If we click on a specific word, we see a chart showing its occurrences and relative frequency in the different documents within the corpus. (This is the Document Term Frequencies tool.)

Exporting

Like all Voyeur tools, Corpus Summary can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Corpus Term Frequencies

 Corpus Term Frequencies shows overall word frequencies for the entire corpus as well as information about how word frequencies are spread out over documents within the corpus. Hover over column headers and buttons for more information.

Getting Started

When you first arrive at the Corpus Term Frequencies tool you will see one of two possible screens:

Corpus Term Frequencies without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Corpus Term Frequencies with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Corpus Term Frequencies includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

On the left you will see the most frequent terms highlighted in yellow. In the columns on the right you will see how many times each word appears in the document(s), and a miniature graph showing its trend across the document(s).

Overview of Corpus Term Frequencies

At the bottom, the traditional Voyeur favourite buttons and search can be found. Click on the boxes next to any terms and click on the Favourite button to add them to your favourite terms.

Voyeur favourite buttons

Double-click on any word within the corpus to access the Document Term Frequencies tool.

Exporting

Like all Voyeur tools, Corpus Term Frequencies can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Bubbles

 Bubbles reads the words in a document (or corpus) and displays the highest frequency words within proportionately large bubbles.

Getting Started

When you first arrive at the Bubbles tool you will see one of two possible screens:

Bubbles without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Bubbles with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Bubbles includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

As more Bubbles load, you will see a list of terms. The higher the term is on the list, the higher frequency it has within the corpus. The same applies for the bubbles – the larger the bubble, the higher frequency it has within the corpus.

Different sizes of bubbles

At the bottom left of the screen, the tool shows how many bubbles are displayed out of the possible terms in the corpus. If the number is increasing, Bubbles is still loading more terms.

Number of terms in the corpus

Exporting

Like all Voyeur tools, Bubbles can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Mandala

Mandala is a visualization tool that imports “textual” files to perform analysis on the frequency and linkage of words. For example, you may import a play and find the linkage and frequency between a word and its speaker.

Getting Started 

Once you’ve launched Mandala, the interface should be blank. In the top left corner of Mandala, you have a few options for how to proceed with loading a file. You may click Open File to open a file of your choosing (preferably .xml, .txt, or .zip), or you may choose any file listed in the dropdown menu. Dots Represent allows you to choose which tags in your file you would like to visualize. More often than not, Mandala will choose the option that will produce the most useful visualization. Finally, once you’ve made your selection, click Load to load the file into view. Additionally, you may load a new file into the current visualization with Merge.

File palette

If you’ve loaded RomeoJuliet.xml, Mandala should look similar to this:

Example visualization

For this example, Mandala automatically visualizes the most frequent element in the represented tag (in this case, the most frequent speaker of a speech is Romeo). It will then visualize one of the most frequent words within all of the speeches, and also visualize the matches between the speaker and the word. In this case, love is visualized and we see the matches between love and Romeo visualized with a blue/green magnet. Magnets are Mandala’s method of grouping differing elements on screen. If we click on any of the dots surrounding a magnet, we will see a speech displayed in the right-hand display called the reader panel. We can also double-click on a magnet for all of its speeches, or even click-and-hold to lasso dots and display their speeches.

To add a new search term, open the Search palette and click Add New Magnet. Type your search criteria in the text box, then click Go to add the new search term to the visualization:

Search palette

If we click on the dance magnet, we cannot see who is actually speaking. To see who is speaking, open the Display palette and, in the Fields always displayed dropdown, click speech—speaker. If we return to the dance magnet, we can now see who is speaking.

Display palette

We can also add new magnets representing speakers. Click Add New Magnet and under Field, select speech—speaker. Type in Juliet’s name, or that of whichever character you would like to visualize, and click Go. We can now see the subsets of speeches in which Juliet says the words love and dance.

New speaker magnet
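
If you want to reproduce this kind of grouping outside Mandala, the following Python sketch treats each speech as a dot and each magnet as the set of speeches it matches. The sample markup, the function names and the matching rules (whole-word, case-insensitive) are assumptions made for illustration; the real structure of RomeoJuliet.xml and Mandala's own matching logic may differ.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample; the real RomeoJuliet.xml markup may differ.
SAMPLE = """<play>
  <speech><speaker>Romeo</speaker><line>Is love a tender thing?</line></speech>
  <speech><speaker>Juliet</speaker><line>My only love sprung from my only hate!</line></speech>
  <speech><speaker>Juliet</speaker><line>You kiss by the book.</line></speech>
  <speech><speaker>Capulet</speaker><line>Come, musicians, play. Girls, dance!</line></speech>
</play>"""


def load_speeches(xml_text):
    """Return one dict per <speech> element, i.e. one 'dot' per speech."""
    root = ET.fromstring(xml_text)
    speeches = []
    for node in root.iter("speech"):
        speaker = node.findtext("speaker", default="").strip()
        text = " ".join(line.text or "" for line in node.iter("line"))
        speeches.append({"speaker": speaker, "text": text})
    return speeches


def word_magnet(speeches, word):
    """Indices of speeches that contain `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return {i for i, s in enumerate(speeches) if pattern.search(s["text"])}


def speaker_magnet(speeches, name):
    """Indices of speeches spoken by `name`."""
    return {i for i, s in enumerate(speeches) if s["speaker"].lower() == name.lower()}


speeches = load_speeches(SAMPLE)
love = word_magnet(speeches, "love")
dance = word_magnet(speeches, "dance")
juliet = speaker_magnet(speeches, "Juliet")

# Dots shared by two magnets: speeches in which Juliet says "love".
juliet_and_love = juliet & love
```

The intersection of two sets plays the role of the dots drawn between two magnets: in this sample, only one of Juliet's speeches contains love, and none contains dance.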

If you would like to see the finer details of your Mandala visualization, you may zoom in / out and adjust the screen with the Display palette. Click on the Display palette and you may zoom in / out with the ‘+’ and ‘-’ signs. Another way to zoom is with a mousewheel / two-finger scroll. You may also navigate the screen with the ←, →, ↓, ↑ arrows. If you’d like more control, you can click on the hand tool to drag the screen around. We may also export our Mandala visualization by clicking Export on the bottom-right side of the window. If you choose Text, you can save a copy of the text displayed in the reader panel. If you choose Screenshot, you can save a visual copy of the Mandala visualization.

Export panel

 

Advanced Features

Mandala allows you to perform specific queries pertaining to certain dots. You may, as mentioned earlier, click-and-drag to select many dots, or you can use the custom lasso tools to help you highlight certain dots:

Tools palette

To reset any selected dots, just choose Reset selection state from the Tools palette. Mandala also includes something called the Microtext Panel, which allows you to walk sequentially through the magnets. If you click on any of the gray bars, you will see the specified dots displayed in the right-hand panel. The Microtext Panel divides the number of dots evenly. You may also navigate this panel by clicking Prev Page or Next Page on the bottom-right of the screen.

Microtext panel
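
To illustrate what dividing the dots evenly can mean, here is a small Python sketch that splits an ordered list of dots into a given number of contiguous groups whose sizes differ by at most one. The function name and the idea of a fixed number of groups are assumptions for illustration; this manual does not describe how Mandala itself performs the division.

```python
def split_evenly(items, parts):
    """Split `items` into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


dots = list(range(10))          # ten dots, numbered 0..9
bars = split_evenly(dots, 3)    # three groups of sizes 4, 3 and 3
```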

If you would like to set up your Mandala window with every speaker in RomeoJuliet.xml, under Search set Field to speech-speaker, then click on the text field and choose [All terms] from the top of the dropdown. Mandala will warn you about creating a multitude of magnets; click Yes to continue. You will then see a number of new magnets representing speakers in the play. You may double-click on them to preview their speeches.

Search palette (again)

Clicking Randomize in the Display palette generates random magnets, which may reveal interesting trends within the text.

In addition to the sample files, you may load your own XML / text files as well. Again, Mandala will choose the most suitable tags to use as fields, but you may specify your own when you are loading the file. (You can do this either by clicking on the Dots represent field or by choosing Custom and defining your own XPath.) If you choose to load a text file, sections will be divided by paragraph. Also, if you have multiple files with similar structures that you’d like to analyse, you may import them all as a .zip file.
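
If you are unsure what a custom XPath would select, you can try it on your file before loading it. The Python sketch below uses the standard library's ElementTree, which supports only a subset of XPath, to list the elements an expression matches, and shows one plausible way of dividing a plain-text file into paragraphs at blank lines. The sample document, the function names and the blank-line rule are assumptions for illustration; Mandala's own XPath support and paragraph detection may behave differently.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document; substitute the contents of your own XML file.
XML_TEXT = """<book>
  <chapter n="1"><p>First paragraph.</p><p>Second paragraph.</p></chapter>
  <chapter n="2"><p>Third paragraph.</p></chapter>
</book>"""


def select_dots(xml_text, xpath):
    """Return the text of every element matched by `xpath` (one dot each)."""
    root = ET.fromstring(xml_text)
    return [
        " ".join(t.strip() for t in node.itertext() if t.strip())
        for node in root.findall(xpath)
    ]


def split_paragraphs(plain_text):
    """Divide plain text into sections wherever a blank line occurs."""
    return [p.strip() for p in re.split(r"\n\s*\n", plain_text) if p.strip()]


chapters = select_dots(XML_TEXT, ".//chapter")
first_chapter_paragraphs = select_dots(XML_TEXT, ".//chapter[@n='1']/p")
sections = split_paragraphs("One.\n\nTwo,\nstill two.\n\n\nThree.")
```

Choosing a broader expression (every chapter) gives fewer, longer dots; a narrower one (every paragraph) gives more, shorter dots.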

When you import your own files to be analyzed, Mandala may not automatically display magnets. In that case, open the Search palette, click Add New Magnet, and choose the Field and Match type that the magnet should represent.

 

Document KWICs

Document KWICs shows a table of keywords in their context. In other words, it lists the occurrences of a keyword within a corpus or document, together with the text surrounding each one.

Getting Started

When you first arrive at the Document KWICs tool you will see one of two possible screens:

Document KWICs without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Document KWICs with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Document KWICs includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

Document KWICs requires not only an input corpus but also a specified term to perform the analysis. For example, in the image below, we see the analysis looking at all of the documents in relation to the word 'lord'. The document earliest in the corpus is shown at the top, and so forth. To the left and right of the term (highlighted in yellow) we can see the context in which it occurs.

Document KWICs standard screen
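
The idea behind a keyword-in-context listing can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple word tokenizer and a context measured in words on each side; the function name and the tokenization rule are hypothetical, and Voyeur's own implementation may differ.

```python
import re


def kwic(text, keyword, context=5):
    """Return (left, keyword, right) for each occurrence of `keyword` in `text`."""
    # Words, optionally with internal apostrophes (e.g. "what's").
    tokens = re.findall(r"\w+(?:'\w+)*", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            rows.append((left, token, right))
    return rows


sample = "Good my lord, be patient. The lord protector waits."
rows = kwic(sample, "lord", context=2)
for left, word, right in rows:
    print(f"{left:>15} | {word} | {right}")
```

Increasing `context` widens the window of words shown on either side of the keyword.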

To expand the context for a preview of where it resides in the document, press the '+' button beside the listing.

Expanding the KWIC to see a preview

At the bottom, the traditional Voyeur favourite buttons and search can be found, along with options specific to Document KWICs. To cycle through the pages of KWICs, press the forward and back buttons by the 'page' display. To change the number of words on either side of the keyword, choose an option from the 'Context' menu. To change the number of words in the expanded preview, choose an option from the 'Preview' menu. Creating a new search will change the keyword within Document KWICs. Click on any term listing and click on the Favourite button to add it to your favourite terms.

Document KWICs bottom toolbar

Exporting

Like all Voyeur tools, Document KWICs can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Document Term Frequencies

 Document Term Frequencies shows word frequencies for each document in the corpus. You can see the selected word at the top of the window highlighted in yellow. Its relevance to the documents is shown in the table below. Hover over the column headers or toolbar buttons for more information.

Getting Started

When you first arrive at the Document Term Frequencies tool you will see one of two possible screens:

Document Term Frequencies without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Document Term Frequencies with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Document Term Frequencies includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

Document Term Frequencies requires not only an input corpus but also a specified term to perform the analysis. For example, in the image below, we see the analysis looking at all of the documents in relation to the word 'thou'. The document with the highest count is shown at the top, and so forth. The columns on the right show the term's relative frequency in each document.

Document Term Frequencies overview
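
As a rough illustration of the table, the Python sketch below counts a term in each document, divides by the document's length to get a relative frequency, and lists the documents from highest to lowest count. The function name, the tokenizer and the exact definition of relative frequency (raw count divided by total words) are assumptions; the columns Voyeur reports may be calculated differently.

```python
import re
from collections import Counter


def document_term_frequencies(documents, term):
    """Return (title, raw count, relative frequency) rows, highest count first."""
    rows = []
    for title, text in documents.items():
        tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
        raw = Counter(tokens)[term.lower()]
        relative = raw / len(tokens) if tokens else 0.0
        rows.append((title, raw, relative))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


docs = {
    "Document B": "where art thou now",
    "Document A": "thou art thou",
}
table = document_term_frequencies(docs, "thou")
```

Comparing the two columns shows why relative frequency matters: a long document can have a high raw count while the term is still proportionally rare.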

Document Term Frequencies is commonly accessed from other tools by clicking on individual terms. For example, if using the tool Cirrus, clicking on one of the words will bring you to a Document Term Frequencies analysis.

At the bottom, the traditional Voyeur favourite buttons and search can be found. Click on the boxes next to any terms and click on the Favourite button to add them to your favourite terms.

Favourite buttons

Exporting

Like all Voyeur tools, Document Term Frequencies can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Corpus Grid

Corpus Grid shows an overview of the corpus, including each document's title, number of word tokens (total words), number of word types (unique words), and lexical density (the ratio of tokens to types).
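
To make these three measures concrete, here is a minimal Python sketch that computes them for each document. It follows the wording above literally and divides tokens by types; some tools report the inverse (types divided by tokens), and the exact formula and tokenization Voyeur uses are not confirmed here. The function name and sample text are hypothetical.

```python
import re


def corpus_grid(documents):
    """Return one row per document with token count, type count and density."""
    rows = []
    for title, text in documents.items():
        tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
        types = set(tokens)  # unique words
        density = len(tokens) / len(types) if types else 0.0
        rows.append({
            "title": title,
            "tokens": len(tokens),
            "types": len(types),
            "density": density,
        })
    return rows


grid = corpus_grid({"Sample": "The cat and the hat"})
```

In the sample, "the" occurs twice, so there are five tokens but only four types.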

Getting Started

When you first arrive at the Corpus Grid tool you will see one of two possible screens:

Corpus Grid without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Corpus Grid with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Corpus Grid includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

Once the analysis is complete, a grid will appear with information pertaining to all of the documents within the corpus. Clicking on a column header reveals its sorting options.

Choosing 'Group By This Field' from a column header changes the layout of the grid, grouping the documents by that field.

Exporting

Like all Voyeur tools, Corpus Grid can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Knots

Knots is a visualization tool that helps you understand patterns of word relevance in one or more documents. Each term is represented as a twisted line; where lines overlap, there is a relevance or linkage between the terms.

Getting Started

When you first arrive at the Knots tool you will see one of two possible screens:

Knots without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Knots with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Knots includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

As you hover over segments of lines, you will see them become highlighted. If you click on them, you will be taken to the Document KWICs tool for an analysis of the word in context. You also have the option to drag the Knots visualization by clicking and dragging the centre point.

When Knots first loads a corpus, you may see terms that have been pre-selected and included in the URL or embedded page. If no terms are specified, Knots automatically fetches the five most frequent terms and displays lines based on those.

Initial loading of Knots
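
Picking the five most frequent terms of a text is straightforward to sketch in Python. The example below is illustrative only: the function name and tokenizer are assumptions, and it does not filter out any words, whereas Knots may apply rules of its own when choosing its default terms.

```python
import re
from collections import Counter


def most_frequent_terms(text, n=5):
    """Return the `n` most frequent words in `text` with their counts."""
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    return Counter(tokens).most_common(n)


# Build a small sample with known counts: love x6, night x5, ... dance x1.
sample = " ".join(
    ["love"] * 6 + ["night"] * 5 + ["death"] * 4
    + ["lord"] * 3 + ["thou"] * 2 + ["dance"]
)
top_five = most_frequent_terms(sample)
```

With six distinct words in the sample, the least frequent one (dance) is left out of the default selection.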

You can remove the default terms by clicking on the "Clear Terms" button.

Clear Terms

You can add additional terms to be displayed using the "Find Term" box. Note that available terms will appear as you type and you can pick an item from the list to have it added.

Find term box

In addition to adding and removing terms, you can toggle the display of the terms that have been loaded. To do so simply click on the term (active terms are underlined).

Toggle terms

Some options included in Knots are 'Build Speed', 'Starting Angle', and 'Tangles'. Build speed affects how quickly the visualization is performed. Starting angle adjusts the angle at which the lines expand and develop. Tangles affects how many twists there are within the visualization.

Knots options

Exporting

Like all Voyeur tools, Knots can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Links

Links finds collocates for words and displays the links between them as a force-directed graph. It shows the frequencies of terms that occur in proximity to a keyword, producing a web of terms.
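
Collocates are simply the words that occur near a keyword. The following Python sketch counts them within a fixed window of words on either side of each occurrence. The function name, the tokenizer and the window size are assumptions for illustration; the manual does not state how wide a window Links uses or how it weights the results.

```python
import re
from collections import Counter


def collocates(text, keyword, window=2):
    """Count the words found within `window` words of each `keyword` occurrence."""
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    keyword = keyword.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            counts.update(before + after)
    return counts


near_love = collocates("Sweet love is blind and love is sweet", "love", window=1)
```

Each keyword and collocate pair can then be treated as a link in the graph, with the count indicating how strong the link is.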

Getting Started

When you first arrive at the Links tool you will see one of two possible screens:

Links without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Links with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Links includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

When the visualization appears, you will see a list of the included documents within the corpus. They are each associated with a certain colour, and this colour is shown as term links within the visualization. A term's colour is determined by which document it appears in most frequently. You may add a word to the visualization by typing it into the 'Add word' box and hitting return.

Documents list

If you hover over a term, Voyeur will tell you its linkage within the corpus documents. You may also drag and drop terms to organize them as you please.

Hovering over a term

Clicking on the options button (the button that looks like a gear) will launch a dialog box with various options pertaining to the Links tool. The stop words list lets you exclude words from the visualization (usually words such as 'a', 'the', and 'and'). 'Node size determined by type frequency' is the default, and sizes each term by how often it appears in the documents. Choosing 'Node links' instead makes terms appear larger if they are heavily linked with other terms. 'Autofit graph on screen' sizes the graph depending on the size of your browser window. 'Remove orphans' will remove terms which are not linked to any other term in the visualization.

Links options
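
The effect of these options can be sketched in Python on a small, made-up graph. Everything here is hypothetical (the function names, the data structures and the sample counts); it only illustrates the ideas of excluding stop words, sizing a node by frequency or by number of links, and removing orphans, not how Links is implemented.

```python
from collections import Counter

STOP_WORDS = frozenset({"a", "the", "and"})  # the examples given in this manual


def build_graph(term_counts, links, stop_words=STOP_WORDS, remove_orphans=True):
    """Return {term: {"frequency": n, "links": degree}} after applying the options."""
    # Stop words list: drop excluded terms and any link that touches them.
    kept = {t: n for t, n in term_counts.items() if t not in stop_words}
    edges = [(a, b) for a, b in links if a in kept and b in kept]

    # Count how many links each remaining term takes part in.
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    nodes = {t: {"frequency": n, "links": degree[t]} for t, n in kept.items()}
    if remove_orphans:
        # Remove orphans: keep only terms linked to at least one other term.
        nodes = {t: info for t, info in nodes.items() if info["links"] > 0}
    return nodes


def node_size(info, mode="frequency"):
    """Size a node by 'frequency' (the default) or by 'links'."""
    return info[mode]


counts = {"love": 10, "night": 4, "the": 50, "moon": 2}
links = [("love", "night"), ("the", "love")]
graph = build_graph(counts, links)
with_orphans = build_graph(counts, links, remove_orphans=False)
```

In this sample, "the" is excluded as a stop word and "moon" disappears once orphans are removed, because nothing links to it.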

If you would like to manipulate the visualization, right-click on any of the terms and choose 'Stick/unstick' or 'Remove'. 'Stick/unstick' pins the term in place so that it does not move when other terms are moved. 'Remove' simply removes the term from the visualization.

Sticking a term

Exporting

Like all Voyeur tools, Links can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Lava

Lava allows you to view multiple levels of a corpus in a three-dimensional environment. Clicking on certain documents within the corpus expands the Lava visualization in a ring to explore further. By clicking on certain parts of the visualization, you are able to explore terms within their context.

Getting Started

When you first arrive at the Lava tool you will see one of two possible screens:

Lava without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Lava with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Lava includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

When you first launch Lava (with a predefined corpus and multiple documents), you will see what looks like a black pole. (If you launch with one document, Lava will look like a black disc.) If you hover on sections of this pole you will see it illuminate with the names of different documents within the corpus.

Hovering over a document

If you click on an illuminated part of the pole, Lava will expand as a colourful ring – each colour representing a high frequency term. If you hover over any of the coloured sections, it will illuminate and show the term that it represents.

Hovering over the ring

If you click on the term, it will expand as a coloured arm out from the centre. The arm is itself divided into sections; hovering over a section shows the term in the context of the document. Clicking this term will take you to a more in-depth analysis.

Hovering over the arm

Clicking other sections of the pole will expand other documents within the corpus so you may visualize and interact with them as well.

Expanding other documents

Exporting

Like all Voyeur tools, Lava can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Reader

 Reader acts as a method of reading all documents within a specified corpus. It does not provide text analysis but rather a method of viewing the contents of a corpus.

Getting Started

When you first arrive at the Reader tool you will see one of two possible screens:

Reader without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Reader with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Reader includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

When you load a collection of documents, you will see a panel of coloured bars on the left of the screen. These bars represent the different documents within the corpus, and if you hover over them they will display the document title. Click on any of these documents to view it in Reader.

Documents list within Reader

At the bottom left of the screen you will see a search panel. Type in any term you would like to find within the corpus and Reader will highlight it within the documents.

Reader search box

As you type, Reader will suggest matching terms. As soon as you search for a term, the coloured bars will change to show which documents match the search.

Finding specific words in Reader

Exporting

Like all Voyeur tools, Reader can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Term Fountain

 Term Fountain visualizes word frequencies as a fountain.

Getting Started

When you first arrive at the Term Fountain tool you will see one of two possible screens:

Term Fountain without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Term Fountain with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Term Fountain includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

Once you load a corpus into Term Fountain, it will begin to generate a fountain. Hover over certain streams to see which word the stream represents. The higher the stream, the higher the term frequency.

Hovering over a Term

Exporting

Like all Voyeur tools, Term Fountain can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Term Frequencies Chart

 Term Frequencies Chart shows how terms are distributed across document(s) in a corpus (documents are shown in the order in which they were added). The example below uses the play Macbeth and not the standard “entire plays of Shakespeare” corpus.

Getting Started

When you first arrive at the Term Frequencies Chart tool you will see one of two possible screens:

Term Frequencies Chart without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Term Frequencies Chart with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Term Frequencies Chart includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

When you analyze a corpus with Term Frequencies Chart, the most common words initially appear at the top of the chart, each with a colour code. The lines in the graph are coloured to match those words. If you click on one of the terms at the top, that term is removed from the graph.

Terms at the top of the chart

When you hover over a segment point, you can see the frequency of that term in that segment. In the example below, the word “Macbeth” appears in the last tenth of the play Macbeth. If you click on the point, Voyeur will open a new window with detailed information about that segment and term in its Document KWICs tool.

Clicking on a segment point

If you click and drag on a section of the chart it will zoom in to that section. To reset the chart to its original state, click on “reset zoom”.

Zooming in on the chart

If you would like to see fewer or more segments on the chart, click on “Segments” at the bottom left of the chart and choose the desired number of segments.

Choosing segment amount
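The idea of per-segment frequencies can be sketched in a few lines. This is a minimal illustration, assuming a document is split into equal parts by word position; the function name and the tokenization are hypothetical, and Voyeur may segment texts differently.

```python
def segment_frequencies(text, term, segments=10):
    """Count occurrences of `term` in each of `segments` roughly equal parts."""
    words = [w.strip(".,;:!?'\"").lower() for w in text.split()]
    words = [w for w in words if w]
    counts = [0] * segments
    for position, word in enumerate(words):
        if word == term.lower():
            # Map the word's position to a segment index in [0, segments).
            counts[position * segments // len(words)] += 1
    return counts
```

Calling the function with a different `segments` value mirrors choosing fewer or more segments in the chart: the same occurrences are simply grouped into coarser or finer bins.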

Exporting

Like all Voyeur tools, Term Frequencies Chart can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Scatter Plot

 ScatterPlot creates a scatter plot graph of terms, spaced by their variation from one another.

Getting Started

When you first arrive at the ScatterPlot tool you will see one of two possible screens:

ScatterPlot without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

ScatterPlot with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

ScatterPlot includes the standard set of interface elements (see image to the right). For more help with these see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

When you first load ScatterPlot, you will see a variety of terms plotted on a graph. If you hover over the terms, you will see their variation explained by each component on the x and y axis. If you click on any of these terms, it will bring you to the Document KWICs tool for further analysis.

Clicking on a term
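A keyword-in-context (KWIC) listing of the kind the Document KWICs tool presents shows each occurrence of a word with a few words on either side. The sketch below is only an illustration of that idea; the function name and the `width` parameter are hypothetical and not part of Voyeur.

```python
def kwic(text, keyword, width=3):
    """Return (left, keyword, right) tuples for each occurrence of `keyword`."""
    words = text.split()
    rows = []
    for i, word in enumerate(words):
        # Compare without surrounding punctuation and without regard to case.
        if word.strip(".,;:!?'\"").lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, word, right))
    return rows
```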

ScatterPlot offers options for changing the plot. The terms button allows you to choose how many terms should be displayed. The dimensions button lets you switch between a two- and a three-dimensional graph. Toggle labels removes or adds labels for the terms on the graph.

Scatterplot's options

Exporting

Like all Voyeur tools, ScatterPlot can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information see exporting and reusing Voyeur Tools.

Viral Analytics: Embedding Voyeur in Other Sites

This page is for a manual on how to embed analytics. Tentative outline:

Voyeur Plugins

Voyeur is designed to function as a standalone environment (voyeurtools.org) or as a set of more independent modules that can be embedded into remote sites (much like a YouTube clip). Technically, this is done using an iframe tag that creates a sandbox within your page where Voyeur can do its thing (JavaScript security limits the interaction that's possible between your page and Voyeur).
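As a rough illustration of the iframe approach, the sketch below wraps a tool link (for example, one produced by a tool's export option) in an iframe tag. The function name, the default dimensions, and the example URL are assumptions for illustration; they are not taken from the Voyeur plugins.

```python
from html import escape


def iframe_for(exported_url, width=600, height=400):
    """Build an <iframe> tag that embeds the page at `exported_url`."""
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(exported_url, quote=True),  # keep & and quotes valid in HTML
        int(width),
        int(height),
    )


# Hypothetical link; substitute the link exported from the tool itself.
print(iframe_for("http://example.org/tool?a=1&b=2"))
```

The resulting string can be pasted into any page that accepts raw HTML.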

The Voyeur plugins essentially allow you, as a site administrator, to define options that let the user view tool results based on the current content (the current page) or on more specific content drawn from your content management system (all the WordPress posts with a given tag, for instance).

Currently the content is fetched once, based on the URL, indexed, and stored in Voyeur so that subsequent requests are faster. We don't currently have an easy way to update or purge a corpus, but that will come.
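The fetch-once behaviour described above amounts to a cache keyed by URL. The following is a minimal sketch of that pattern, not Voyeur's code: the class name and the `fetch` and `index` callables are hypothetical stand-ins supplied by the caller.

```python
class CorpusCache:
    """Fetch and index content once per URL, then reuse the stored result."""

    def __init__(self, fetch, index):
        self._fetch = fetch    # callable: url -> raw text (assumed)
        self._index = index    # callable: raw text -> indexed corpus (assumed)
        self._store = {}       # url -> indexed corpus

    def get(self, url):
        if url not in self._store:
            # First request for this URL: fetch, index, and store.
            self._store[url] = self._index(self._fetch(url))
        # Later requests skip the fetch and return the stored corpus.
        return self._store[url]
```

Because nothing ever removes an entry from the store, a page that changes after its first fetch keeps returning the old corpus, which matches the limitation noted above about updating or purging.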

The Voyeur plugins below should be considered alpha code – use at your own risk and expect changes (but please do use! :).

Drupal Plugin

Please note:

  • we intend to release the Voyeur Drupal Plugin through the Drupal modules site, but while we do some initial testing, it will only be available via GitHub
  • this plugin was designed specifically for Drupal 6.x. Other versions have not been used or tested

To install the Voyeur Drupal Plugin from GitHub:

  • Navigate to http://github.com/mcds/Voyeur-Drupal-Plugin and at the top-right of the page, click on 'Downloads'. Download the plugin source files as whatever format you like.
  • Extract the source files and copy the contents of the folder 'mcds-Voyeur-Drupal-Plugin-xxxxxxx' to a new folder entitled 'voyeur' in Drupal > sites > all > modules. (If this folder structure does not yet exist, create it.)
  • Log into your Drupal installation and navigate to Administer > Site building > Modules. Check the 'Enabled' box next to the module and then click the 'Save Configuration' button at the bottom.
  • To change Voyeur specific settings, navigate to Administer > Site configuration > Voyeur module settings.
  • For more information, consult the readme included with the Voyeur package. There is also a good article on installing Drupal modules here: http://drupal.org/getting-started/install-contrib/modules.
  • For bug reports and feature requests, please leave a comment at http://hermeneuti.ca/voyeur/plugins/drupal
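The copy step in the list above can also be scripted. The sketch below assumes the archive has already been extracted; the function name is hypothetical, the paths you pass in are your own, and it requires Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def install_plugin(extracted_dir, drupal_root, module_name="voyeur"):
    """Copy the contents of `extracted_dir` into sites/all/modules/<module_name>."""
    target = Path(drupal_root) / "sites" / "all" / "modules" / module_name
    # Create sites/all/modules if the folder structure does not yet exist.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(extracted_dir, target, dirs_exist_ok=True)
    return target
```

Enabling the module and adjusting its settings still happens in the Drupal administration pages as described in the list.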

OJS Plugin

To install the Voyeur OJS Plugin from GitHub:

  • Navigate to http://github.com/mcds/Voyeur-OJS-Plugin and at the top-right of the page, click on 'Downloads'. Download the plugin source files as whatever format you like.
  • Extract the source files and copy the contents of the folder 'mcds-Voyeur-OJS-Plugin-xxxxxxx' to a new folder entitled 'voyeur' in OJS > plugins > generic.
  • Log into the administrative panel within your OJS installation and click on 'Journal Manager' for the journal you would like to install Voyeur within.
  • Under 'Management Pages' find 'System Plugins' and then find the Voyeur plugin. Click 'enable'. Then click 'settings' to adjust Voyeur specific settings.
  • For more information, consult the readme included with the Voyeur package.
  • For bug reports and feature requests, please leave a comment at http://hermeneuti.ca/voyeur/plugins/ojs

 

WordPress Plugin

Please note: we intend to release the Voyeur WordPress Plugin through the WordPress widgets site (which will facilitate future upgrades of the plugin), but at the moment, as we do some preliminary testing, we will release the plugin through GitHub.

To install the Voyeur WordPress Plugin from GitHub:

  • Navigate to http://github.com/mcds/Voyeur-WordPress-Plugin and at the top-right of the page, click on 'Downloads'. Download the plugin source files as whatever format you like.
  • Extract the source files and copy the contents of the folder 'mcds-Voyeur-WordPress-Plugin-xxxxxxx' to a new folder entitled 'voyeurWP' in WordPress > wp-content > plugins.
  • Log into the administrative panel within your WordPress installation and click on 'Plugins'. Find the Voyeur plugin and enable it.
  • Add the Voyeur widget within the widgets menu in the administrative panel (Appearance > Widgets) by clicking and dragging the Voyeur widget across to the panel. For more information, please see http://support.wordpress.com/widgets/
  • Adjust Voyeur specific settings within the widget panel.
  • For more information, consult the readme included with the Voyeur package.
  • For bug reports and feature requests, please leave a comment at http://hermeneuti.ca/voyeur/plugins/wordpress