Voyeur Tools: See Through Your Texts

Introduction

Introducing Voyeur

Voyeur is the suite of tools used by Hermeneuti.ca to interpret texts and to think about tools. You too can use Voyeur to analyze your own texts and to write essays with embedded hermeneutical panels generated by Voyeur, and you can adapt the code to create your own versions of the tools. This section of the Hermeneuti.ca web site is both a tutorial and a reference.

What you can do with Voyeur

Voyeur is a new type of text analysis tool that you can use across the research cycle. You can:

How to use this manual

This manual is for novice users, experienced users, writers and developers. It works closely with The Rhetoric of Text Analysis, where you will find example essays and discussion of text analysis in general.

Novice Users who want to get started with text analysis should:

Writers, web authors and researchers who want to embed Voyeur panels (we call them hermeneuticons) into their essays, online journals, blogs and so on should:

Developers who want to adapt Voyeur tools or develop their own should:

How Voyeur connects to Hermeneuti.ca, the book

Diagram

Voyeur is the toolset that made possible the analysis reported in Hermeneuti.ca, the book and web site you are now looking at. The book reflects on text analysis, gives examples, and discusses the decisions behind Voyeur. The web site hermeneuti.ca (note that we use lower case when referring to the web site) includes the sections of the book and the manual for Voyeur (which you are reading now). The two connect like this:

Graphic here

Some Background of Voyeur

Text analysis tools go back to the first ad-hoc tools that Roberto Busa created for his concordance of the works of Thomas Aquinas and Andrew Booth’s Mechanical Resolution of Linguistic Problems in the 1950s. (A concordance, or keyword in context (KWIC), is usually represented as a list of occurrences of a word with some limited context shown: words to the left and words to the right. See also the definitions from TADA and Wikipedia.)
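To make the idea of a keyword-in-context listing concrete, here is a minimal sketch in Python. It is not Voyeur's code: the function name `kwic`, the `width` parameter and the simple regular-expression tokenizer are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, match, right) tuples for each occurrence of keyword.

    Illustrative only: tokenization is a naive split on word characters,
    and `width` is the number of context words kept on each side.
    """
    tokens = re.findall(r"\w+", text)
    target = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

sample = "The whale surfaced. Ahab watched the whale dive, and the whale was gone."
for left, match, right in kwic(sample, "whale"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20} | {match} | {right}")
```

Lining up the keyword in a column is what lets a reader scan many occurrences at a glance; a real tool would also need to handle punctuation, word forms and much longer texts.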

Voyeur is a suite of analysis and exploration tools for digital texts. Very few contributions to knowledge and technology are unrecognizable from what preceded [1], and Voyeur is no exception: it is largely built on the foundations of text analysis tool design and methodology from over 50 years of humanities computing research [2]. The following are some of the tools that have most influenced text analysis tool development and Voyeur in particular:

Design Principles

Although text analysis tool developers might choose to highlight different aspects for their purposes (such as stand-alone software as opposed to web-based software), here are some of the primary design principles for Voyeur, as gleaned from other tools:

Though they have existed before to varying degrees in different tools, Voyeur is an attempt to pull together these design principles into a single package. In some cases the principles may in fact be contradictory in practice (for instance, supporting large-scale immediate analysis) and compromises must be found. Working through those challenges is one of the aspects that make Voyeur a worthy intellectual challenge.

HyperPo and TAPoRware are the tools with the strongest affinities to Voyeur [4], but we have devoted considerable thought and attention to improving existing web-based tools in ways further described below.

Scalability. Whereas HyperPo and TAPoRware can readily handle book-length texts for micro-analysis, both reach their practical limits when corpora grow beyond a couple of megabytes. In contrast, Voyeur is designed to handle much larger corpora (dozens of megabytes and beyond). There is still a practical (though undefined) limit to the size of corpora for Voyeur given that it seeks to enable immediate micro-analysis, but the Voyeur architecture is designed with scale in mind. There will always be a tension between indexing speed and retrieval speed: the more time is available for indexing, the faster retrieval tends to be. As such, text analysis tools that require pre-indexing (Philologic, Monk, etc.) will almost always operate faster because pre-processing can be done over the course of hours or even days (building very large relational databases, for instance). In contrast, Voyeur seeks to strike a balance between indexing and retrieval speed: ideally both should happen in a timeframe that seems reasonable in a web-based context. The ever-evolving pace of computing power and the promise of high performance computers obviously make the actual capabilities a moving target.
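The tension between indexing and retrieval can be illustrated with a small Python sketch. It does not describe how Voyeur, Philologic or Monk are actually implemented; the function names and the in-memory dictionary index are assumptions chosen to show the trade-off. Without an index, every query re-reads the whole text; with one, the cost of reading the text is paid once and each later query becomes a lookup.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Naive tokenizer (an assumption): lower-cased runs of word characters."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def scan_positions(text, word):
    """No index: tokenize and scan the entire text on every query."""
    return [i for i, t in enumerate(tokenize(text)) if t == word.lower()]

def build_index(text):
    """Pay the cost once up front: map each word to its token positions."""
    index = defaultdict(list)
    for i, t in enumerate(tokenize(text)):
        index[t].append(i)
    return dict(index)

def indexed_positions(index, word):
    """With an index, a query is a single dictionary lookup."""
    return index.get(word.lower(), [])

text = "to be or not to be"
index = build_index(text)
print(scan_positions(text, "be"))      # scans all six tokens
print(indexed_positions(index, "be"))  # looks up a precomputed list
```

Both calls return the same positions; the difference lies in when the work is done. A tool aiming at immediate analysis has to keep `build_index` fast enough that a web user is willing to wait for it.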

Ubiquity. As useful as text analysis tools like HyperPo and TAPoRware may be, we recognize a need to allow content providers and producers (like bloggers) to quickly and easily integrate functionality into their own space. The previous model was limited to users bringing their own texts to our tools; we now also wish to allow users to bring our tools to their texts. In some cases users will wish to have static results, in which case we can provide a mechanism for easily copying and pasting results that can be directly embedded in other content. However, much of the most compelling functionality of Voyeur is interactive and requires considerable client-side scripting: our current approach is to provide a tiny snippet of HTML that is essentially an IFRAME that contains the necessary HTML elements. This approach allows Voyeur code to remain separate from its host while satisfying the security limitations on cross-site scripting. There are of course other challenges inherent to code embedded elsewhere, including version management (supporting legacy syntax) and caching of data (both the corpus and results) [5].
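As a rough illustration of the IFRAME approach, the following Python sketch builds such an embed snippet. The panel URL and the `corpus` and `tool` parameter names are hypothetical placeholders, not Voyeur's actual embedding syntax; only the general shape (an iframe whose source points back at the tool's server) is taken from the description above.

```python
from html import escape
from urllib.parse import urlencode

def embed_snippet(base_url, params, width=400, height=300):
    """Build an <iframe> tag pointing at a hosted panel.

    base_url and the keys in params are hypothetical; substitute whatever
    the tool's server actually expects.
    """
    src = f"{base_url}?{urlencode(params)}"
    # Escape the URL so that '&' and quotes are safe inside the attribute.
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="{int(width)}" height="{int(height)}"></iframe>'
    )

snippet = embed_snippet(
    "https://example.org/voyeur-panel",            # placeholder host
    {"corpus": "my-corpus-id", "tool": "wordlist"},  # placeholder parameters
)
print(snippet)
```

Because the panel runs inside the iframe, its scripts stay on the tool's own origin and the host page only has to carry this one line of HTML.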

Referenceability. The status of text analysis tools as academic resources has been a point of debate over the years. Scholars feel compelled to cite ideas and texts that come from other authors, but they are much less likely to recognize tools that have contributed to their work (and we would probably not want every scholar to cite search engines such as Google that have been used during research). We feel strongly that text analysis tools can be a significant contributor to digital research, whether they were used to help confirm hunches or to lead the researcher into completely unanticipated realms. In any case, we have designed Voyeur to be conducive to citation in various ways, including a general citation to Voyeur and citations for static or dynamic results. An important component of academic knowledge is reproducibility, and providing scholars with more information on the processes followed during research, including the use of text analysis tools, is sure to be useful.

Ultimately, Voyeur is an attempt to learn from the strengths and weaknesses of past tools, to recognize current user needs (e.g., working with much larger corpora), and to anticipate future practices (e.g., referencing text analysis tools and results). We believe that the potential for tools in the interpretive process merits continual rethinking of tool design and functionality, and as such, Voyeur is of course a work in progress.

  1. Limiting himself to the history of science, Thomas Kuhn provides examples of revolutionary advances in thinking, such as Copernican cosmology or Einstein's Theory of Relativity; Voyeur has much more modest ambitions.
  2. For a brief overview of the history of humanities computing, see Hockey "History", 2002.
  3. Unix is used here as shorthand for both Unix and Unix-like operating systems like Linux.
  4. The affinity to Voyeur is not surprising given that Sinclair developed HyperPo and Rockwell developed TAPoRware.
  5. For instance, we wouldn't want to re-run a computationally expensive process each time someone visits a popular blog, but we don't wish to cache everyone's analytic results either.

Text Analysis Recipes

Much like the O'Reilly series of Cookbooks, the text analysis recipes below are meant to provide succinct instructions for completing specific tasks, while also discussing the important concepts that would allow the instructions to be generalized to similar tasks. Each recipe follows the same basic structure:

Voyeur Tools for Tool Developers

Voyeur Tools for Users

Voyeur Tools for Web Authors

Appendices

This section is mostly a parking lot for miscellaneous subsections – content will eventually be moved, integrated elsewhere, or deleted.

Functionality

Some of the discussed functionality:

Input:

  • from Zotero
  • from Firefox plug in
  • from portal
  • from interactive essays
  • from web site
  • from results buttons that allow recapitulation (like in portal or TAPoRware)
  • from links
  • from eclipse
  • panels
  • command line
  • application (eclipse)
  • swing based interface

Output:

  • out to blogs
  • output to portal research log
  • output to gathering (a tiddlywiki web page that you can save to computer)

Tools:

  • tzeeker is a panel builder

Priority Tools:

  • cleaner
  • list words with distribution
  • comparative list words
  • search concordance
  • KWIC
  • repeated phrases
  • distribution
  • visual collocator
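
As one possible reading of "list words with distribution", the Python sketch below counts the most frequent words and shows how a given word is spread across equal-sized segments of a text. The function names, the segment count and the tokenizer are assumptions for illustration; they do not describe the planned tool.

```python
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer (an assumption): lower-cased runs of word characters."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def top_words(text, n=3):
    """Return the n most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(n)

def word_distribution(text, word, segments=4):
    """Count occurrences of word in each of `segments` equal slices of the text."""
    tokens = tokenize(text)
    counts = [0] * segments
    for i, token in enumerate(tokens):
        if token == word.lower():
            # Map the token position to its segment number.
            counts[i * segments // len(tokens)] += 1
    return counts

sample = "rose is a rose is a rose and then nothing more at all"
print(top_words(sample, n=1))
print(word_distribution(sample, "rose", segments=4))
```

A distribution like this is what a graph of a word's frequency over the course of a document would plot, one bar or point per segment.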

Logging:

  • ability to log what happens

Interface to frameworks

  • api that provides info like progress
  • good error response
  • Help and stuff through the framework

How to Contribute

Here is where we will put information about how users can contribute to code, essays, and so on.