Voyeur Tools: See Through Your Texts

Voyeur


Introducing Voyeur

Voyeur is a web-based text analysis environment. It is designed to be user-friendly, flexible and powerful. Voyeur is part of Hermeneuti.ca, a collaborative project to develop and theorize text analysis tools and text analysis rhetoric. This section of the Hermeneuti.ca web site provides information and documentation for users and developers of Voyeur.

What you can do with Voyeur:

Voyeur is a work in progress: it is currently in beta. Some things don't work properly, and some planned features aren't available yet. In particular, here are some weaknesses that we recognize:

To get started, try viewing one of the screencasts to the right or continue to Workshops -> Voyeur Tools for Users.

How to Use this Manual

This manual is for novice users, experienced users, writers and developers. It works closely with The Rhetoric of Text Analysis, where you will find example essays and discussion of text analysis in general.

Novice Users who want to get started with text analysis should:

Writers, web authors and researchers who want to embed Voyeur panels (we call them hermeneuticons) into their essays, online journals, blogs and so on should:

CC-GNU GPL

Developers who want to adapt Voyeur tools or develop their own should:


How Voyeur connects to Hermeneuti.ca, the book


Voyeur is the toolset that made possible the analysis reported in Hermeneuti.ca, the book and web site you are now looking at. The book reflects on text analysis, gives examples, and discusses the decisions behind Voyeur. The web site hermeneuti.ca (note how we use lower case when referring to the web site) includes the sections of the book and the manual for Voyeur (which you are reading now). The two connect like this:

Principles of Voyeur

Introducing Voyeur

Voyeur is the suite of tools used by Hermeneuti.ca to interpret texts and to think about tools. You too can use Voyeur to analyze your own texts, to write essays with embedded hermeneutical panels generated by Voyeur, and to adapt the code to create your own versions of the tools. This section of the Hermeneuti.ca web site is both a tutorial and a reference.

What you can do with Voyeur

Voyeur is a new type of text analysis tool that you can use across the research cycle. You can:

Design Principles

Although text analysis tool developers might choose to highlight different aspects for their purposes (such as stand-alone software as opposed to web-based software), here are some of the primary design principles for Voyeur, as gleaned from other tools:

Although these principles have existed to varying degrees in earlier tools, Voyeur is an attempt to pull them together into a single package. In some cases the principles may in fact be contradictory in practice (for instance, supporting large-scale immediate analysis), and compromises must be found. Working through those challenges is part of what makes Voyeur a worthwhile intellectual project.

HyperPo and TAPoRware are the tools with the strongest affinities to Voyeur[1], but we have devoted considerable thought and attention to improving on existing web-based tools in the ways described below.

Scalability. Whereas HyperPo and TAPoRware can readily handle book-length texts for micro-analysis, both reach their practical limits when corpora grow beyond a couple of megabytes. In contrast, Voyeur is designed to handle much larger corpora (dozens of megabytes and beyond). There is still a practical (though undefined) limit to the size of corpora for Voyeur, given that it seeks to enable immediate micro-analysis, but the Voyeur architecture is designed with scale in mind. There will always be a tension between indexing speed and retrieval speed: the more time is available for indexing, the faster retrieval tends to be. As such, text analysis tools that require pre-indexing (PhiloLogic, MONK, etc.) will almost always operate faster, because pre-processing can be done over the course of hours or even days (building very large relational databases, for instance). Voyeur, by contrast, seeks to strike a balance between indexing and retrieval speed: ideally both should happen in a timeframe that seems reasonable in a web-based context. The steady growth of computing power and the promise of high-performance computing make the actual capabilities a moving target.
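To make the indexing/retrieval trade-off concrete, here is a minimal inverted index in Python. It is a generic sketch of the technique, not a description of how Voyeur itself indexes texts: all the scanning work happens once in build_index, after which each lookup is a single dictionary access. The tokenizer and function names are illustrative assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Indexing step: one pass over every token, recording (doc_id, position)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(documents):
        for pos, token in enumerate(tokenize(text)):
            index[token].append((doc_id, pos))
    return index


def lookup(index, word):
    """Retrieval step: a dictionary lookup; the cost of scanning the texts
    was paid once, at indexing time."""
    return index.get(word.lower(), [])


docs = ["It was a dark and stormy night.", "The night was dark."]
index = build_index(docs)
print(lookup(index, "dark"))  # [(0, 3), (1, 3)]
```

Spending more time in build_index (for example, also storing sentence boundaries or precomputed counts) would make more kinds of query cheap, which is exactly the tension described above.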

Ubiquity. As useful as text analysis tools like HyperPo and TAPoRware may be, we recognize a need to allow content providers and producers (like bloggers) to quickly and easily integrate text analysis functionality into their own spaces. The previous model was limited to users bringing their own texts to our tools; we now also wish to allow users to bring our tools to their texts. In some cases users will want static results, in which case we can provide a mechanism for easily copying and pasting results that can be embedded directly in other content. However, much of the most compelling functionality of Voyeur is interactive and requires considerable client-side scripting: our current approach is to provide a tiny snippet of HTML that is essentially an IFRAME containing the necessary HTML elements. This approach keeps Voyeur code separate from its host page while respecting the security restrictions that browsers place on cross-domain scripting. There are of course other challenges inherent to code embedded elsewhere, including version management (supporting legacy syntax) and caching of data (both the corpus and results[2]).
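The following Python sketch shows what generating such an IFRAME snippet could look like. The /tool/<name>/ path and the corpus query parameter are assumptions modelled on the URLs shown elsewhere in this manual, and "ToolName" is a hypothetical placeholder rather than a real Voyeur tool; the snippet Voyeur actually produces may differ.

```python
from html import escape
from urllib.parse import urlencode


def embed_snippet(corpus_id, tool="ToolName", base="http://voyeurtools.org",
                  width=400, height=300):
    """Return an IFRAME snippet that could be pasted into a blog post.

    Assumption: panels live at <base>/tool/<tool>/?corpus=<id>.
    "ToolName" is a hypothetical placeholder, not a real tool name.
    """
    src = f"{base}/tool/{tool}/?{urlencode({'corpus': corpus_id})}"
    # escape() makes the URL safe inside a double-quoted HTML attribute.
    return (f'<iframe src="{escape(src)}" width="{width}" height="{height}" '
            f'style="border: none"></iframe>')


print(embed_snippet("1278409278561.646"))
```

Because the panel runs inside the IFRAME, its scripts execute in Voyeur's own origin rather than the host page's, which is what keeps the two separate.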

Referenceability. The status of text analysis tools as academic resources has been a point of debate over the years. Scholars feel compelled to cite ideas and texts that come from other authors, but they are much less likely to recognize tools that have contributed to their work (and we would probably not want every scholar to cite search engines such as Google that were used during research). We feel strongly that text analysis tools can make a significant contribution to digital research, whether they are used to help confirm hunches or to lead the researcher into completely unanticipated realms. We have therefore designed Voyeur to be conducive to citation in various ways, including a general citation to Voyeur and citations for static or dynamic results. Reproducibility is an important component of academic knowledge, and providing scholars with more information on the processes followed during research, including the use of text analysis tools, is sure to be useful.

Ultimately, Voyeur is an attempt to learn from the strengths and weaknesses of past tools, to recognize current user needs (e.g., working with much larger corpora), and to anticipate future practices (e.g., referencing text analysis tools and results). We believe that the potential of tools in the interpretive process merits continual rethinking of tool design and functionality, and as such, Voyeur is of course a work in progress.

  1. The affinity to Voyeur is not surprising, given that Sinclair developed HyperPo and Rockwell developed TAPoRware.
  2. For instance, we wouldn't want to re-run a computationally expensive process each time someone visits a popular blog, but we don't wish to cache everyone's analytic results either.
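One common compromise for the trade-off in footnote 2 is a bounded least-recently-used (LRU) cache: results for popular requests stay cached, while rarely requested ones are eventually evicted, so not everyone's results are kept. The Python sketch below uses the standard library's functools.lru_cache; analyze is a hypothetical stand-in, and Voyeur's own caching strategy is not described here.

```python
from functools import lru_cache

expensive_runs = 0  # how many times the expensive step actually ran


@lru_cache(maxsize=128)  # bounded: least recently used entries are evicted
def analyze(corpus_id, tool):
    """Hypothetical stand-in for a computationally expensive analysis."""
    global expensive_runs
    expensive_runs += 1
    return f"{tool} results for corpus {corpus_id}"


# Many visitors to the same blog post request the same result.
for _ in range(1000):
    analyze("1278409278561.646", "ToolName")
print(expensive_runs)  # 1
```

The maxsize argument is the knob: larger values avoid more recomputation at the cost of keeping more results around.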

Some Background of Voyeur

Text analysis tools go back to the first ad hoc tools that Roberto Busa created for his concordance of the works of Thomas Aquinas and to Andrew Booth’s Mechanical Resolution of Linguistic Problems in the 1950s. (A concordance, or keyword in context (KWIC), is usually represented as a list of occurrences of a word with some limited context shown: words to the left and words to the right. See also the definitions from TADA and Wikipedia.)
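A KWIC concordance is simple to compute. The Python sketch below is a generic illustration of the technique (not Busa's procedure or Voyeur's implementation): for every occurrence of a keyword it shows a fixed number of words on each side, right-aligning the left context so the keywords line up in a column. The width parameter and the sample sentence are illustrative choices.

```python
import re


def kwic(text, keyword, width=3):
    """Return one line per occurrence of keyword, with `width` words of
    context on each side; the keyword is bracketed and the left context
    right-aligned so the keywords line up in a column."""
    words = re.findall(r"\w+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left:>25} [{word}] {right}")
    return lines


sample = "It was on a dreary night of November that I beheld the accomplishment of my toils."
for line in kwic(sample, "of", width=2):
    print(line)
```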

Voyeur is a suite of analysis and exploration tools for digital texts. Very few contributions to knowledge and technology are entirely unlike what preceded them[1], and Voyeur is no exception: it is largely built on the foundations of text analysis tool design and methodology from over 50 years of humanities computing research[2]. The following are some of the tools that have most influenced text analysis tool development, and Voyeur in particular:

  1. Limiting himself to the history of science, Thomas Kuhn provides examples of revolutionary advances in thinking, such as Copernican cosmology or Einstein's Theory of Relativity; Voyeur has much more modest ambitions.
  2. For a brief overview of the history of humanities computing, see Hockey, "History", 2002.
  3. Unix is used here as shorthand for both Unix and Unix-like operating systems such as Linux.

Quick Guide to Voyeur for Users

Voyeur is a web-based tool. To use it, go to http://voyeurtools.org. This is what you will see:

To use Voyeur you need to specify a text. You can do this in different ways:

Once Voyeur retrieves your text, it will index it for analysis and display this simple arrangement of two panels:

Once your text or collection is indexed, Voyeur will present you with a display of two panels. The right-hand panel summarizes the text. The left panel shows you the high-frequency words. You can show and hide different panels using the double arrow button. You can also see more panels by selecting a word or words to follow. Try the Words in the Entire Corpus panel.

There are a number of features to the Words in the Entire Corpus panel:

salt, pepper, sardines, oil
"digital humanities"
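The two examples above suggest two ways of entering terms: a comma-separated list of words, and a phrase kept together by double quotes. Assuming that reading of the syntax (an assumption, not a documented specification), a query could be split into terms like this Python sketch; parse_query is a hypothetical name.

```python
import re


def parse_query(query):
    """Split a search entry into terms.

    Assumption: commas separate words, and double quotes keep a
    multi-word phrase together (hypothetical syntax).
    """
    terms = []
    # Each match is either a quoted phrase or a run of non-comma characters.
    for phrase, word in re.findall(r'"([^"]*)"|([^,"]+)', query):
        term = (phrase or word).strip()
        if term:
            terms.append(term)
    return terms


print(parse_query("salt, pepper, sardines, oil"))  # ['salt', 'pepper', 'sardines', 'oil']
print(parse_query('"digital humanities"'))         # ['digital humanities']
```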

Once you select one or more words you will get an arrangement of panels like this:

The panels are connected so that clicking in one will trigger updates in the others. Typically the order is the following:


Workshops

DH2010 Introduction to Voyeur

This is an outline for a workshop on Voyeur. It was developed for a workshop before DH 2010 in London, England.

1.0 Introduction

2.0 Analyzing a Single Text

In the first part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:

http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters

http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text

  • We will open Voyeur:
    • Show how to load a text
    • Show the different panels that appear initially
      • Discuss the order they open and the Summary panel
      • Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
    • Discuss the full set of panels
    • Show how to manage panels
    • Discuss trigger order of panels (flow within Voyeur)
    • Show how to get help (Mention Quick Guide)
    • Show how to search for words and save them in a Favorites list to explore
  • Now you should try Voyeur with your text or the Frankenstein text above. To open the Frankenstein text, click here:

http://voyeurtools.org/?corpus=1278409278561.646

  • Some things to try:
    • Experiment with the Options (like the Stop Word list)
    • Create a Favorites list for a theme and explore that list
    • Search for phrases
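To see why the Stop Word option matters, here is a Python sketch of a frequency list with a stop word filter. The tiny stop word list is an illustrative assumption, not Voyeur's list: without it, function words such as "the" dominate the top of any frequency list.

```python
import re
from collections import Counter

# Illustrative stop word list (an assumption, not Voyeur's list).
STOP_WORDS = {"the", "and", "of", "a", "to", "i", "was", "it", "in", "my"}


def top_words(text, n=5, stop_words=STOP_WORDS):
    """Count word frequencies, skipping stop words, and return the n most common."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


text = "The creature and the creator: the creature fled."
print(top_words(text, 1))                  # [('creature', 2)]
print(top_words(text, 1, stop_words=set()))  # [('the', 3)]
```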

3.0 Analyzing a Corpus

In the second part of the Workshop we will look at working with a corpus or collection of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents). The Voyeur index is at:

http://voyeurtools.org/?corpus=humanist

  • We will show you how to:
    • Set various options, like stoplists
    • Hide and show columns
    • Manage multiple documents
    • Group results
    • Compare documents
  • Try looking for trends yourself
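When looking for trends across the 21 documents, raw counts can mislead because the documents differ in length. A common remedy is to normalise each count by the document's total number of words. The Python sketch below illustrates that calculation; it is not necessarily how Voyeur computes its own relative frequencies, and the per parameter (occurrences per 10,000 words by default) is an arbitrary choice.

```python
import re
from collections import Counter


def relative_frequencies(documents, word, per=10_000):
    """Occurrences of `word` per `per` words in each document.

    Each raw count is divided by that document's total number of words,
    so long and short documents can be compared.
    """
    result = []
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        count = Counter(tokens)[word.lower()]
        result.append(per * count / len(tokens) if tokens else 0.0)
    return result


docs = ["computer computer text text",
        "computer text text text text text text text"]
print(relative_frequencies(docs, "computer", per=1))  # [0.5, 0.125]
```

Both documents contain "computer", but relative to their length it is four times as frequent in the first.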

4.0 Using your own text

  • Now you can try your own text. We will show the different ways of providing Voyeur a text:
    • Typing a text or pasting it in
    • Typing in one or more URLs
    • Uploading a text
  • We will then discuss the formats of texts that will work, and what will happen to them:
    • file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
    • Finally, we will discuss caching and so on
  • Now try your own text.

5.0 Exporting Data and Quoting Analytics

We will now show how to export data and quote analytical results:

  • How to export tab-separated values and copy and paste them into Excel
  • How to export XML results from KWICs (for instance)
  • How to quote an analytical result in TADA.
  • Go to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to try it yourself.
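Tab-separated values can also be processed outside Excel. This Python sketch reads TSV text with the standard csv module. The column names ("term", "count") and the values are made up for illustration; the columns in an actual Voyeur export may differ.

```python
import csv
import io

# Made-up sample export; real Voyeur column names may differ.
SAMPLE_TSV = "term\tcount\nsalt\t3\npepper\t2\n"


def read_tsv(text):
    """Parse tab-separated values into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = read_tsv(SAMPLE_TSV)
total = sum(int(row["count"]) for row in rows)
print(rows[0], total)  # {'term': 'salt', 'count': '3'} 5
```

Note that csv returns every value as a string, so counts must be converted with int() before doing arithmetic.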

6.0 Advanced and Other

7.0 To Prepare

  • Make sure we have Voyeur running with a backup
  • Sort out how participants can get on wireless
  • Power bars for laptops
  • What texts will we use?
  • Preindex texts and create a Workshop web page on Hermeneuti.ca

Appendices

This section is mostly a parking lot for miscellaneous subsections: content will eventually be moved, integrated elsewhere, or deleted.

Functionality

Some of the discussed functionality:

Input

  • from Zotero
  • from a Firefox plug-in
  • from portal
  • from interactive essays
  • from web site
  • from results buttons that allow recapitulation (like in portal or TAPoRware)
  • from links
  • from Eclipse
  • panels
  • command line
  • application (Eclipse)
  • Swing-based interface

Output:

  • output to blogs
  • output to portal research log
  • output to gathering (a TiddlyWiki web page that you can save to your computer)

Tools:

  • tzeeker is a panel builder

Priority Tools:

  • cleaner
  • list words with distribution
  • comparative list words
  • search concordance
  • KWIC
  • repeated phrases
  • distribution
  • visual collocator
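Of the priority tools listed, "repeated phrases" is easy to illustrate: count every sequence of n consecutive words (an n-gram) and keep those that occur more than once. The Python sketch below is a generic n-gram counter, offered as an illustration rather than as the planned Voyeur tool; n and min_count are illustrative parameters.

```python
import re
from collections import Counter


def repeated_phrases(text, n=2, min_count=2):
    """Return n-word sequences that occur at least min_count times."""
    tokens = re.findall(r"[a-z]+", text.lower())
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(g): c for g, c in ngrams.items() if c >= min_count}


text = "the digital humanities and the digital humanities community"
print(repeated_phrases(text))       # {'the digital': 2, 'digital humanities': 2}
print(repeated_phrases(text, n=3))  # {'the digital humanities': 2}
```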

Logging:

  • ability to log what happens

Interface to frameworks

  • API that provides information such as progress
  • good error response
  • Help and stuff through the framework

Temporary Workaround for Additional Tools

At the moment the default interface of Voyeur doesn't expose the range of tools that are available. As an awkward workaround, you can try this:

Click on the export icon to generate a URL of your corpus (notice that here I'm working on voyeurtools.org instead of voyeur.hermeneuti.ca):


This will generate a URL that looks something like this: http://voyeurtools.org/?corpus=1278412802513.7776

What you can do is insert tool/toolName (where toolName is the name of the tool you want) into this address to see other tools (your mileage may vary):
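If you need to build such addresses for several corpora, the insertion can be scripted. This Python sketch assumes the tool segment goes directly after the host, before the ?corpus= query (our reading of the instruction above, not a documented rule), and "ToolName" is a placeholder for whichever tool you want to try.

```python
from urllib.parse import urlsplit, urlunsplit


def tool_url(corpus_url, tool_name):
    """Insert tool/<tool_name>/ into a corpus URL, keeping its query string.

    Assumes the form http://host/tool/<ToolName>/?corpus=<id>;
    tool_name is not checked against any list of real tools.
    """
    parts = urlsplit(corpus_url)
    path = parts.path.rstrip("/") + f"/tool/{tool_name}/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(tool_url("http://voyeurtools.org/?corpus=1278412802513.7776", "ToolName"))
# http://voyeurtools.org/tool/ToolName/?corpus=1278412802513.7776
```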