Stéfan and I were discussing the Canadian history of humanities computing and text analysis. Is it true that there is a tradition of text analysis tool development in Canada? Has this been an area of strength? How would we answer this question?
I have put together a wiki of information on the Canadian capacity in this area:
A good history of humanities computing in Canada has yet to be written.
Geoffrey R.
We are trying to figure out how to integrate the following three useful components:
Stéfan has created a cool new tool that you can see with the Humanist archive here:
http://voyant-tools.org/tool/TermsRadio/?corpus=humanist&stopList=stop.en.taporware.txt
This tool plays with ideas we have had about real-time analytics and animation of analytics. It lets you explore the evolution of themes in Humanist.
With this tool you can see clearly the explosion of interest in the web, but what is more interesting is what other words rose or fell with the web. How did the shift to the web affect humanists?
Some of the words that intrigued me were "department" and "London."
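One rough way to ask which words rose or fell with "web" is to correlate their frequency series over time. The sketch below is only an illustration of that idea, not how the tool works: the function name `pearson` and the sample numbers are made up for the example and are not drawn from the Humanist archive.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length frequency series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no trend to correlate
    return cov / sqrt(var_x * var_y)

# Hypothetical per-year relative frequencies (illustrative values only).
web = [1, 2, 3]
rising_word = [2, 4, 6]    # rises with "web"
falling_word = [3, 2, 1]   # falls as "web" rises

print(pearson(web, rising_word))
print(pearson(web, falling_word))
```

A value near 1 suggests a word that rose with the web, a value near -1 one that fell as the web rose. Correlation says nothing about why, so any such list would only be a starting point for reading.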
One thing we need to do is to figure out the relationship of literary text analysis and linguistic text analysis. Could the two merge? Should literary build on linguistic? Some initial differences:
An interesting exception is corpus linguistics, which tends to work on larger corpora, though it uses them to develop theories about language, not literature.
We have added a feature now that will collapse a set of terms in the Word Trends. That allows one to develop a group of related terms and then graph the group as a whole. This is important given the forms a word or theme can take.
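The idea of collapsing a set of terms can be sketched as summing their counts in each segment before graphing. This is a minimal illustration under stated assumptions, not Voyant's implementation: the function name `group_trend`, the simple tokenizer, and the sample segments are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def group_trend(segments, terms):
    """Return, per segment, the combined relative frequency of a group of terms."""
    group = {term.lower() for term in terms}
    trend = []
    for segment in segments:
        tokens = tokenize(segment)
        hits = sum(1 for token in tokens if token in group)
        trend.append(hits / len(tokens) if tokens else 0.0)
    return trend

# Made-up segments standing in for successive parts of a corpus.
segments = [
    "The web is new. We web users browse.",
    "Internet and web pages everywhere online.",
]
print(group_trend(segments, ["web", "internet", "online"]))
```

Treating the group as a single line on the graph means that a theme spread across several word forms is not underestimated just because no single form is frequent.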
Stéfan also ran Hume's Dialogues Concerning Natural Religion through Mallet and here are the topics proposed:
It hardly seems fair for only me to have homework. Here's yours: look through the list of 20 topic clusters below to see if there's anything of interest. I took the entire dialogue, distilled the nouns, broke it up into parts, and fed it through Mallet to do topic modelling.
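The "broke it up into parts" step could look something like the following sketch. It assumes the nouns have already been extracted into a list of words (that step needs a part-of-speech tagger and is not shown), and the function name `split_into_parts` is hypothetical. How the text was actually split for this Mallet run is not recorded here.

```python
def split_into_parts(words, n_parts):
    """Split a list of words into n_parts contiguous chunks of near-equal size."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size, extra = divmod(len(words), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        # The first `extra` parts each take one additional word.
        end = start + size + (1 if i < extra else 0)
        parts.append(words[start:end])
        start = end
    return parts

# Placeholder word list; in practice this would be the distilled nouns.
nouns = ["nature", "religion", "cause", "design", "world", "mind", "argument"]
for part in split_into_parts(nouns, 3):
    print(" ".join(part))
```

Each part can then be written to its own file so that a topic modelling tool treats it as a separate document. Topic models need many documents to work with, which is why a single dialogue is divided up first.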
We have a SSHRC-funded project that is looking at just what people do with text analysis. Some of the interesting points that came up:
I (Geoffrey Rockwell) gave a workshop on Voyant at the Kansas 2012 THATCamp. This time we had a number of backup servers set up and they all worked well. Some participants were working with Arabic, which worked to a degree. Stéfan set up a system that resolves to different servers:
http://bit.ly/VoyantCirrusFrankenstein
resolves to
http://resolve.voyant-tools.org/tool/Cirrus/?corpus=frankenstein&stopLis...
which then redirects to
http://temp.voyant-tools.org/tool/Cirrus/?corpus=frankenstein&stopList=s...
That's in "workshop" mode (where the temp instance is favoured). If you remove the incontext part of the URL
http://resolve.voyant-tools.org/tool/Cirrus/?corpus=frankenstein&stopLis...
it resolves to the main server.
http://voyant-tools.org/tool/Cirrus/?corpus=frankenstein&stopList=stop.e...
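The redirect behaviour described above can be sketched as a function that keeps the path and query of a link and swaps in the host of whichever server is favoured. This is only a guess at the general shape: the actual rules Stéfan's resolver uses are not described here, and the function name `resolve` and the `workshop_mode` flag are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

MAIN_HOST = "voyant-tools.org"        # main server named in the post
TEMP_HOST = "temp.voyant-tools.org"   # temp instance favoured in workshop mode

def resolve(url, workshop_mode=False):
    """Return the URL a resolver might redirect to, keeping path and query intact."""
    parts = urlsplit(url)
    host = TEMP_HOST if workshop_mode else MAIN_HOST
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

link = "http://resolve.voyant-tools.org/tool/Cirrus/?corpus=frankenstein"
print(resolve(link, workshop_mode=True))
print(resolve(link))
```

The advantage of putting a resolver between the short link and the tool is that handouts and slides can keep the same links while the server behind them changes.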
Some of the issues/questions that came up:
We have finished another chapter, the one that provides a history of text analysis from concordances to ubiquitous analytics. Hurrah!
The first draft of The Measured Word chapter is done. This is a chapter that goes through what we can do to texts with a computer. It introduces string processing for humanists. The chapter covers material that many digital humanists will know, but it is meant as an introduction to thinking like a computer about texts.
The frame of the chapter is a tradition of thinking about artificial life, intelligence and interpretation which includes Pygmalion, Frankenstein, Searle's Chinese room, Dreyfus on AI and Powers. Richard Powers has a brilliant novel, Galatea 2.2, where the narrator (a semi-autobiographical Richard Powers) helps train an AI designed to pass a Masters English exam as a version of a Turing test. The story uses this challenge to revisit the story of Pygmalion and Galatea - the story of an artist (trainer) getting close with their creation. The story deals with computer-assisted interpretation - an AI trained to respond to exam questions about a literary text - something we are also trying to do, though differently. We are not trying to build artificial interpreters but interpretative aides or tools to augment our interpretation.
In our conference call today we also discussed the next steps. We looked at some issues with word trends and single texts. We talked about how topic modeling should be promising for the Game Studies experiment.
We reviewed what has to be written and changed our outline a bit. We are introducing a 4th example on Hume's Dialogues Concerning Natural Religion. This will give us:
We are also going to experiment with Topic Modelling (on the Game Studies corpus) and Mandala.