Ideas from TCD Workshop

Today I gave the second part of the Voyant workshop at Trinity College, Dublin. Last week I showed the participants how to use Voyant. This week we went over what they had tried in the meantime, what questions they had, and what other tools are available. I thought this worked very well to reinforce the first session and to get at real issues in analytics. It also allowed us to talk about issues specific to individual projects.

Some things they would like:

  • The ability to define word groups so that trend graphs would have one line for a group. A group called "Flowers" might include "Rose, Flower, Carnation ..." If I remember, TACT has such functionality.
  • The ability to have interpretative tagging.
  • Good crawling and scraping tools. This isn't really for Voyant, but it is still useful for building a corpus for Voyant.
  • The ability to add and remove texts from the corpus as they experiment.
  • Named Entity Recognition and then a geo-location tool so you could map places mentioned in a text. They liked the HyperCities interface.
  • The ability to get trends and stats for subsets of a corpus, so you could compare Chapter 1 to Chapters 1, 2, and 3.
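
To make the word-group request in the list above concrete, here is a minimal Python sketch of how a single trend line for a group might be computed. This is not how Voyant or TACT does it. The names `WORD_GROUPS`, `tokenize`, and `group_trend` are hypothetical, and I am assuming the text is simply cut into equal segments with no stemming, so "roses" would not count as "rose".

```python
import re

# Hypothetical word group for illustration; name and members are assumptions.
WORD_GROUPS = {"Flowers": {"rose", "flower", "carnation"}}


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def group_trend(text, members, segments=5):
    """Return one relative frequency per segment for the whole word group.

    The tokens are split into `segments` roughly equal parts, and every
    member of the group is counted toward the same line.
    """
    tokens = tokenize(text)
    trend = []
    for i in range(segments):
        start = i * len(tokens) // segments
        end = (i + 1) * len(tokens) // segments
        chunk = tokens[start:end]
        hits = sum(1 for token in chunk if token in members)
        # An empty segment (very short text) contributes a frequency of 0.
        trend.append(hits / len(chunk) if chunk else 0.0)
    return trend


if __name__ == "__main__":
    sample = "Rose, tree, tree, tree. Stone, stone, Flower, Carnation."
    print(group_trend(sample, WORD_GROUPS["Flowers"], segments=2))
```

The same function could be pointed at a subset of the corpus (one chapter, say) to get the kind of subset comparison people asked for, though how segments should map onto documents is a design choice I have not worked out.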

When I was showing them other tools I showed ManyEyes, and I rather like the way people can save combinations of a tool and its data there. That way people can first look at a tool with data already loaded and then build one of their own. This might be a way to deal with caching corpora.