Corpus Summary

Corpus Summary is a tool that provides a simple, textual overview of the current corpus. It reports the number of words, the number of unique words, the longest documents, the documents with the highest vocabulary density, the most frequent words, notable peaks in frequency, and distinctive words. Users can click items within these features for more detailed information about the analysis.
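To make these measures concrete, here is a minimal Python sketch of how a few of them could be computed for a small corpus. It is not Voyeur's own code: the function names (tokenize, summarize), the simple regular-expression tokenizer, and the definition of vocabulary density as unique words divided by total words are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase words, optionally with one internal apostrophe.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def summarize(documents, top_n=5):
    """Compute simple corpus statistics.

    documents: dict mapping a document title to its raw text (hypothetical input shape).
    """
    per_doc = {title: tokenize(text) for title, text in documents.items()}
    all_tokens = [tok for toks in per_doc.values() for tok in toks]
    counts = Counter(all_tokens)

    # Assumed definition: vocabulary density = unique words / total words per document.
    density = {
        title: (len(set(toks)) / len(toks) if toks else 0.0)
        for title, toks in per_doc.items()
    }

    return {
        "total_words": len(all_tokens),
        "unique_words": len(counts),
        # Titles ordered from longest to shortest document, by word count.
        "longest_documents": sorted(per_doc, key=lambda t: len(per_doc[t]), reverse=True),
        "vocabulary_density": density,
        "most_frequent": counts.most_common(top_n),
    }


if __name__ == "__main__":
    corpus = {"a": "the cat sat on the mat", "b": "the dog"}
    print(summarize(corpus))
```

Peaks in frequency and distinctive words are left out of this sketch because they depend on comparison methods that this page does not describe.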

Getting Started

When you first arrive at the Corpus Summary tool, you will see one of two possible screens:

Corpus Summary without a pre-loaded corpus. See loading texts into Voyeur for help on how to proceed.

Corpus Summary with a pre-loaded corpus. You were probably given a URL that included the corpus, or you're viewing a page that has an embedded Voyeur tool in it. If you prefer, you can also start without a corpus.

Interface Elements

Corpus Summary includes the standard set of interface elements (see image to the right). For more help with these, see the Voyeur Tools Standard Interface Elements page.

Standard UI Elements

Once the analysis is complete, links to words and individual documents appear. Words appear highlighted in yellow. Click any of these links for detailed information about that term or document.

If we click on a specific word, we see a chart showing how often it appears, and its relative frequency, in the different documents within the corpus. (This is the Document Term Frequencies tool.)
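The per-document figures behind such a chart can be approximated with a short Python sketch. This is an illustration only, not the Document Term Frequencies tool's implementation: the function name term_frequencies, the tokenizer, and the choice to express relative frequency as occurrences divided by the document's total word count are assumptions.

```python
import re


def term_frequencies(documents, term):
    """Return {title: (raw_count, relative_frequency)} for one term.

    documents: dict mapping a document title to its raw text (hypothetical input shape).
    """
    term = term.lower()
    result = {}
    for title, text in documents.items():
        # Assumed tokenizer: lowercase words, optionally with one internal apostrophe.
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        raw = tokens.count(term)
        # Assumed definition: relative frequency = occurrences / words in the document.
        relative = raw / len(tokens) if tokens else 0.0
        result[title] = (raw, relative)
    return result


if __name__ == "__main__":
    corpus = {"a": "the cat sat on the mat", "b": "the dog"}
    for title, (raw, rel) in term_frequencies(corpus, "the").items():
        print(f"{title}: {raw} occurrences, relative frequency {rel:.3f}")
```

Comparing relative rather than raw counts keeps long documents from dominating the chart simply because they contain more words.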

Exporting

Like all Voyeur tools, Corpus Summary can be reused in a variety of ways:

  • create a link that is specific to the corpus and options that are currently being used
  • embed the current corpus and options as a tool in an external page

For more information, see exporting and reusing Voyeur Tools.