DH2010 Introduction to Voyeur

This is an outline for a workshop on Voyeur. It was developed for a workshop before DH 2010 in London, England.

1.0 Introduction

2.0 Analyzing a Single Text

In the first part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:

http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters

http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text

  • We will open Voyeur:
    • Show how to load a text
    • Show the different panels that appear initially
      • Discuss the order they open and the Summary panel
      • Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
    • Discuss the full set of panels
    • Show how to manage panels
    • Discuss trigger order of panels (flow within Voyeur)
    • Show how to get help (Mention Quick Guide)
    • Show how to build a list of favorite words by searching for words and saving them to Favorites
  • Now you should try Voyeur with your text or the Frankenstein text above. To open the Frankenstein text, click here:

http://voyeurtools.org/?corpus=1278409278561.646

  • Some things to try:
    • Experiment with the Options (like the Stop Word list)
    • Create a Favorites list for a theme and explore that list
    • Search for phrases
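
To make the effect of a stop word list concrete, here is a minimal Python sketch of a filtered word-frequency count, the kind of result the Words in the Entire Corpus panel displays. It is an illustration only and not Voyeur's implementation: the names `tokenize` and `word_frequencies`, the tiny `STOP_WORDS` set, the tokenizing rule, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter

# A tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "my", "was", "that"}

def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count words, skipping any that appear in the stop word list."""
    return Counter(w for w in tokenize(text) if w not in stop_words)

# Made-up sample sentence, used only to exercise the functions.
sample = "I was the monster. The monster was my creation, and the creation was mine."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Passing an empty set as `stop_words` shows how function words such as "the" and "was" dominate the list when nothing is filtered.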

3.0 Analyzing a Corpus

In the second part of the Workshop we will look at working with a corpus, or collection, of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents). The Voyeur index is at:

http://voyeurtools.org/?corpus=humanist

  • We will show you how to:
    • Set various options, like stoplists
    • Hide and show columns
    • Manage multiple documents
    • Group results
    • Compare documents
  • Try looking for trends yourself
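
Looking for a trend across a corpus usually means comparing how often a word occurs in each document relative to that document's length. The sketch below shows one way to compute this in Python. It is a hedged illustration, not Voyeur's method: the function name `relative_frequency`, the per-1,000-words scale, and the three short stand-in documents (which are not real HUMANIST data) are assumptions.

```python
import re

def relative_frequency(text, word):
    """Return occurrences of `word` per 1,000 words of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * tokens.count(word.lower()) / len(tokens)

# Hypothetical stand-ins for yearly archive files (not real HUMANIST data).
corpus = {
    "1987": "computing in the humanities and computing centres",
    "1997": "the web changes humanities computing",
    "2007": "digital humanities and the digital archive",
}

# One relative frequency per document, keyed by year.
trend = {year: relative_frequency(text, "computing") for year, text in corpus.items()}
for year in sorted(trend):
    print(year, round(trend[year], 1))
```

Normalizing by document length matters because the documents in a corpus can differ greatly in size, so raw counts alone can suggest trends that are not there.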

4.0 Using your own text

  • Now you can try your own text. We will show the different ways of providing Voyeur with a text:
    • Typing a text or pasting it in
    • Typing in one or more URLs
    • Uploading a text
  • We will then discuss the formats of texts that will work, and what will happen to them:
    • File formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
    • Finally, we will discuss caching and so on
  • Now try your own text.
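
For marked-up formats such as HTML, a typical first step before counting words is to strip the tags and keep only the text. The following Python sketch shows that general idea using the standard library's `html.parser`. It is an assumption about what such preprocessing can look like and not a description of what Voyeur does internally; the class name `TextExtractor`, the function `html_to_text`, and the sample markup are invented for the example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the character data found between tags, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(markup):
    """Return the text content of `markup` with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Joining with spaces keeps adjacent paragraphs apart; the trade-off is
    # that a word split by inline tags would also gain a space.
    return " ".join(" ".join(parser.parts).split())

# Made-up sample markup, used only to exercise the function.
print(html_to_text("<body><p class='hangingindent'>Hello <b>world</b></p></body>"))
```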

5.0 Exporting Data and Quoting Analytics

We will now show how to export data and quote analytical results:

  • How to export tab-separated values that can be copied and pasted into Excel
  • How to export XML results from KWICs (keyword-in-context concordances), for instance
  • How to quote an analytical result in TADA.
  • Go to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to try it yourself.
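
To show what a keyword-in-context result and a tab-separated export contain, here is a minimal Python sketch. It is illustrative only and does not reproduce Voyeur's export format: the function names `kwic` and `to_tsv`, the column headers, and the three-word context window are assumptions. The sample line is from A Midsummer Night's Dream.

```python
import csv
import io
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

def to_tsv(rows):
    """Serialize KWIC rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["left", "keyword", "right"])
    writer.writerows(rows)
    return buf.getvalue()

text = "Swift as a shadow, short as any dream; Brief as the lightning"
rows = kwic(text, "dream")
print(to_tsv(rows))
```

Because the columns are separated by tabs, the printed output can be pasted straight into a spreadsheet, with each context and the keyword landing in its own column.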

6.0 Advanced and Other

7.0 To Prepare

  • Make sure we have Voyeur running with a backup
  • Sort out how participants can get on wireless
  • Powerbars for laptops
  • What texts will we use?
  • Preindex texts and create a Workshop web page on Hermeneuti.c