Skin

Looking for trends over time

Having sent out the prospectus (and keeping our fingers crossed), we are now turning back to the second experiment, tentatively called "The Swallow Flies Swiftly Through." The idea is to try to understand humanities computing / digital analysis using text analysis. The experiment uses Humanist as its corpus, which lets us work with a large body of text. It has also forced us to think about how to do diachronic analysis. By creating a separate corpus for each year and naming them right, we can get distribution graphs over time. We also, with support from the Digging Into Data challenge, developed a correspondence analysis tool that works well.
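To make the year-by-year idea concrete, here is a minimal sketch of how a word's distribution over time could be computed once there is one text per year. It is not how Voyeur does it: the function name, the simple tokenizer, the per-10,000-token scaling, and the toy data are all assumptions for illustration.

```python
import re
from collections import Counter


def relative_frequencies(docs_by_year, word):
    """Return {year: occurrences of `word` per 10,000 tokens}, in year order.

    `docs_by_year` maps a year label to the full text for that year
    (a hypothetical structure, not Voyeur's own corpus format).
    """
    result = {}
    for year in sorted(docs_by_year):
        # Crude tokenizer (an assumption): lowercase runs of letters/apostrophes.
        tokens = re.findall(r"[a-z']+", docs_by_year[year].lower())
        counts = Counter(tokens)
        result[year] = 10000 * counts[word] / len(tokens) if tokens else 0.0
    return result


# Toy data, invented purely for illustration.
toy = {
    "1988": "humanities computing and text analysis",
    "1989": "computing in the humanities and more computing",
}
print(relative_frequencies(toy, "computing"))
```

Normalizing by the number of tokens in each year matters because the yearly sub-corpora are unlikely to be the same size; raw counts would mostly track how much was written that year.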

One way we are studying the corpus is by looking at words (across the corpus) that are skewed one way or the other. The idea is to look for words trending up or down. Here is a simple skin I developed for going through the words with the distribution tool and KWIC for checking:

http://www.voyeurtools.org/?skin=custom&corpus=humanist&layout=%5B%7B%22r1%22%3A%22c3%22%2C%22i1%22%3A%5B%7B%22x1%22%3A%22CorpusTypeFrequenciesGrid%22%7D%5D%7D%2C%7B%22r1%22%3A%22e1%22%2C%22i1%22%3A%5B%7B%22x1%22%3A%22TypeFrequenciesChart%22%7D%5D%2C%22s1%22%3Atrue%2C%22c1%22%3Afalse%2C%22c2%22%3Afalse%2C%22w1%22%3A%2289p1%22%7D%2C%7B%22r1%22%3A%22s2%22%2C%22i1%22%3A%5B%7B%22x1%22%3A%22DocumentTypeKwicsGrid%22%7D%5D%2C%22s1%22%3Atrue%2C%22c1%22%3Afalse%2C%22c2%22%3Afalse%2C%22h1%22%3A%2229p1%22%7D%5D

By playing with the settings (set TAPoRware stopwords and a z-score of 2 or more) I can narrow the list of words down and then go through them manually.
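As a rough illustration of what that filter does, here is a sketch that drops stopwords and keeps only words whose corpus frequency is at least two standard deviations above the mean. I am assuming the z-score is computed over raw word counts after stopwords are removed; the tool may well compute it differently, and the function name and sample data are hypothetical.

```python
from statistics import mean, pstdev


def high_z_words(freqs, stopwords=frozenset(), threshold=2.0):
    """Return {word: z-score} for words whose z-score is >= threshold.

    `freqs` maps each word to its raw count in the corpus. The z-score
    here is (count - mean count) / population standard deviation.
    """
    kept = {w: c for w, c in freqs.items() if w not in stopwords}
    if len(kept) < 2:
        return {}
    mu = mean(kept.values())
    sigma = pstdev(kept.values())
    if sigma == 0:
        return {}  # every word has the same count, so nothing stands out
    scores = {w: (c - mu) / sigma for w, c in kept.items()}
    return {w: z for w, z in scores.items() if z >= threshold}


# Invented counts: one frequent content word among nine rare ones.
sample = {"the": 500, "humanities": 100}
sample.update({f"rare{i}": 1 for i in range(9)})
print(high_z_words(sample, stopwords={"the"}))
```

Whether stopwords are removed before or after the mean is taken changes the scores a good deal, so the order used here is a design choice, not a given.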

Some things I need:

  • We need a way to identify high frequency phrases like "digital humanities". I can trace the phrases, but I would like to know which frequent n-grams are out there.
  • I need to understand Skew and Peakedness better. Do they really show words that trend? What do I mean by trend?
  • I want to be able to export full lists of words/KWICs and so on. I know it is expensive, but it would let Voyeur work with other tools. Perhaps that is what Voyeur Notebooks will do.
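Two of the items above can be sketched in a few lines. The first function counts frequent n-grams from a token list. The others compute skewness and peakedness (excess kurtosis) of a word's yearly frequencies, plus a least-squares slope over the year index as one possible, swapped-in measure of trend. All names are hypothetical, and I am assuming that skew is taken over the set of yearly values, which may not match what the tool reports.

```python
from collections import Counter
from statistics import mean


def frequent_ngrams(tokens, n=2, min_count=2):
    """Return [(phrase, count)] for n-grams seen at least min_count times."""
    grams = Counter(zip(*(tokens[i:] for i in range(n))))
    return [(" ".join(g), c) for g, c in grams.most_common() if c >= min_count]


def _central_moment(values, k):
    mu = mean(values)
    return sum((v - mu) ** k for v in values) / len(values)


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    m2 = _central_moment(values, 2)
    return _central_moment(values, 3) / m2 ** 1.5 if m2 else 0.0


def excess_kurtosis(values):
    """Population excess kurtosis ("peakedness"), 0 for a normal curve."""
    m2 = _central_moment(values, 2)
    return _central_moment(values, 4) / m2 ** 2 - 3 if m2 else 0.0


def slope(values):
    """Least-squares slope of values against their position 0, 1, 2, ..."""
    n = len(values)
    x_bar, y_bar = (n - 1) / 2, mean(values)
    den = sum((x - x_bar) ** 2 for x in range(n))
    num = sum((x - x_bar) * (v - y_bar) for x, v in enumerate(values))
    return num / den if den else 0.0


tokens = "digital humanities and digital humanities again".split()
print(frequent_ngrams(tokens))

late_spike = [1, 1, 1, 1, 10]   # invented yearly frequencies
early_spike = [10, 1, 1, 1, 1]
for series in (late_spike, early_spike):
    print(skewness(series), excess_kurtosis(series), slope(series))
```

Under these assumptions, the two toy series have the same skewness and kurtosis because those statistics ignore the order of the years; only the slope changes sign. That suggests skew on its own flags words with a spike somewhere rather than words that rise or fall over time.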

 

 
