Project Blog

Trends with Character

Word Trend with Character

We were exploring the Game Studies corpus using Word Trends and came across a trend with what looks like a game character. I wonder if there is an art form here - creating texts that can be trended into different graphics.

New Voyant Tools Screencasts

Mark Turcato, working with Stéfan, has produced some new screencasts.

Converting Files

In order to use Voyant you need to convert your files to text files, HTML files or XML files. Voyant will try to extract text from a PDF or other file types, but may not do such a good job of it. I just came across a file conversion web site called Zamzar which seems to do all sorts of conversions.
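For anyone who wants to do the conversion by hand, here is a minimal sketch of pulling plain text out of an HTML file using only Python's standard library. This is not how Voyant does its extraction; the class name `TextExtractor` and the function `extract_text` are made up for illustration, and the choice to skip script and style content is an assumption about what you would want.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text found between tags and drops the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the plain text of an HTML string, with chunks joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = "<body><p class='hangingindent'>Hello <b>world</b></p><script>var a = 1;</script></body>"
print(extract_text(sample))
```

Joining chunks with a space is a simplification: a word that is split by an inline tag would come out as two words, so a real converter needs to be more careful.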

Named Skins

We talked about having named skins that users can choose from when using Voyant. Here is a list of possible ones:

  • Simple Text Analysis: A skin for studying a single text with the Words in Documents, Trends, and KWiC.
  • Simple Corpus Analysis: A skin for studying a corpus with the Words in Entire Corpus, Words in Documents, Trends, and KWiC.
  • Corpus Reading: A skin for reading a large corpus with Cirrus, Summary, Words in Entire Corpus, Trends, Words in Documents, KWiC and Collocates. This is essentially the normal skin without the Reader and without the Collocates.
  • Exploring Correspondence: The Scatter skin that we have now.
  • Exploring Social Network: RezoViz with the Trends and KWiC as the Scatter has.

What else?

Working Together in Montreal

We are spending 3 days together in Montreal working on another experiment. This experiment is looking at 10 years of Game Studies using named entity recognition and a force-directed network graph to study the social/research network of people. You can see our experiment notes here. The essay we are working on is called Name Games: Exploring influence in a research community. As part of the experiment we are hacking the RezoViz tool.
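To give a rough idea of the data behind a network graph like this, here is a small sketch that turns lists of names per article into weighted co-occurrence edges. It assumes the named entity recognition has already been done; the function names and the sample names are invented for illustration, and this is not the RezoViz code.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(names_per_article):
    """Count how many articles mention each pair of names together."""
    edges = Counter()
    for names in names_per_article:
        # Deduplicate within an article, then count every unordered pair once.
        for a, b in combinations(sorted(set(names)), 2):
            edges[(a, b)] += 1
    return edges


def degrees(edges):
    """Count how many distinct names each name is connected to."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical input: one list of recognized names per article.
articles = [["Ada", "Ben", "Cy"], ["Ben", "Cy"], ["Ada", "Cy", "Ada"]]
edges = cooccurrence_edges(articles)
print(edges)
print(degrees(edges))
```

The edge weights could feed a force-directed layout, and the degree counts are a quick way to spot a name that is connected to nearly everyone.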

Ideas from TCD Workshop

Today I gave the second part of the Trinity College, Dublin workshop on Voyant. Last week I showed them how to use Voyant. This week we went over what they had tried over the week, what questions they had, and what other tools there were. I thought this worked very well to reinforce the first session and to get at real issues in analytics. It also allowed us to talk about issues regarding individual projects.

Some things they would like:

  • The ability to define word groups so that trend graphs would have one line for a group. A group called "Flowers" might include "Rose, Flower, Carnation ..." If I remember, TACT has such functionality.
  • The ability to have interpretative tagging.
  • Good crawling and scraping tools. This isn't really for Voyant, but it is still useful for building a corpus for Voyant.
  • The ability to add and remove texts from the corpus as they experiment.
  • Named Entity Recognition and then a geo-location tool so you could map places mentioned in a text. They liked the HyperCities interface.
  • The ability with a corpus to be able to get trends and stats for subsets so you could compare Chapter 1 to Chapters 1, 2, and 3.
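The word group request is easy to sketch outside Voyant. The following is only an illustration of the idea, not an existing Voyant feature: the function `group_trend`, the simple tokenizer and the choice of equal-sized segments are all assumptions. It returns one series of counts per group, which is what a single trend line for "Flowers" would plot.

```python
import re


def group_trend(text, groups, segments=5):
    """Return {group name: [count per segment]} over equal-sized segments of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    size = max(1, -(-len(tokens) // segments))  # ceiling division, at least 1
    # Map every member word to the name of its group.
    word_to_group = {
        word.lower(): name for name, words in groups.items() for word in words
    }
    trend = {name: [0] * segments for name in groups}
    for i, token in enumerate(tokens):
        name = word_to_group.get(token)
        if name is not None:
            trend[name][min(i // size, segments - 1)] += 1
    return trend


groups = {"Flowers": ["Rose", "Flower", "Carnation"], "Trees": ["tree", "oak"]}
text = "rose rose tree carnation flower tree oak rose"
print(group_trend(text, groups, segments=2))
```

The same function could be run on a subset of the corpus, such as a single chapter, to compare the counts for one part against another.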

When I was showing them other tools I showed ManyEyes and I rather like the way people can save sets of tool/data there. That way people can look at a tool with data first and then build one. This might be a way to deal with caching corpora.

Voyant at Trinity College Dublin

Yesterday I ran the first of two sessions of a Voyant workshop at Trinity College Dublin. You can see the script for the first session here. I'm experimenting with breaking the workshop into two sessions a week apart so that:

  • Participants can try Voyant on their own stuff for a week and then come back and ask questions.
  • I can prepare follow up materials based on what comes up in the first session.
  • If things are broken (which almost always happens) then I have a week to fix them.
  • We can have a good discussion in the second session about what can be done with text analysis or not.

I'll post after the second session about how this works, but the first session went well. I do, however, have some thoughts based on the workshop:

  • For folks in the cultural sector it would be nice to have entity recognition tools. These types of tools don't really fit the reading paradigm, but a lot of people here seem to want to do that. They want to see how people, places and organizations show up.
  • For the folks doing close literary interpretation of a single text (or small number of texts) we should think about whether we could offer a COCOA type markup environment where you could tag interpretatively and use Voyant to then analyze the tagged text. Something like this is being done in the collaboration with CATMA.

Talking about Voyant to a Digital Humanities class

I (Geoffrey Rockwell) have been asked to talk to a Digital Humanities class led by Dr. Lauren Klein that is using Voyant. They have posted some interesting questions here. Here is some of the stuff I hope to cover:

Background to Voyant

  • Slowly replacing previous generation of tools - HyperPo and TAPoRware
  • Trying to build an online tool that would scale to handle large corpora - this meant indexing
  • The TAPoR project brought us together

Agile Interpretation

  • Has come out of a practice we started of doing text analysis together - pair or extreme analysis
  • Doing analysis and building tools - trying to do it in a day - see http://tada.mcmaster.ca/Main/EtaMay0107 (sort of log about what we did)
  • This led to wanting to embed results in our papers - hermeneuti.ca

Rereading Tools

  • Another idea we were playing with (that comes from HyperPo) is that of a reading tool rather than a black-box analytic.
  • Voyant lets you compose skins to create your custom reading environment - now we are trying to figure out how to chain things properly
  • We have a small grant to create Voyant Notebooks - a notebook environment similar to Mathematica notebooks

Visualizations

  • I got started on text visualization in the early 1990s when I worked with John Bradley. We were looking at scientific visualization environments and wondering how they could be applied to analytics - we developed a prototype visual programming environment called Eye-ConTact and we did some interesting work using correspondence analysis on David Hume's Dialogues Concerning Natural Religion.
  • To the best of my knowledge Daryl Raymond at the New OED project at Waterloo was the first to experiment with visualization.
  • Some of our visualizations are standard ways to display statistical views. Some come from applying visual ideas from different spheres and some are inspired by other projects. Some even come from other projects - some from Laura Mandell that we hooked into Trombone. The code is often open source code that we have adapted. For example we are now playing with network visualization - see http://rezoviz.voyant-tools.org/humanist/#/all

Voyant Workshop at DH 2012

We will be running a workshop on Voyant at DH 2012. See Stéfan Sinclair's blog entry.

Game Studies

We are now playing with the corpus of all the articles in the journal Game Studies. It seemed like a nice way to see how a tool can give one a sense of a field or at least a journal in a field. We are trying to get RezoViz to work properly so we can do a network analysis of who is connected to whom in the game studies field. Alas we just had "Homo Ludens" connected to everyone like a big firework explosion.
