Methods Commons

Welcome to the Methods Commons.

Computation has produced new and exciting ways of studying texts.  Many of these methods do not require the use of expensive programs or detailed programming knowledge, but only the know-how to combine freely accessible resources to perform various tasks.

This book describes common or interesting sequences of actions, or recipes. They are organized according to the objective of the recipe. Recipes fall into three major categories: locating and identifying ideas, themes or specific terms; analysing textual devices or themes; and constructing new entities or corpora. There is also a set of three tutorial recipes that introduce three common and specific tasks using TAPoR Tools, and a series of experimental draft recipes that are still under construction.

The Methods Commons community benefits from shared experience and learning how others make use of recipes. You can share your experience by adding your own recipes to the collection. More information about recipe and exercise structure and authoring is available on the RecipeStructure page. We also have a Glossary that we hope you will add to.


Table of Contents

Analyse

Methods in this section constitute the heart of text analysis, and allow texts to be measured, visualized, or otherwise dissected in innovative and interesting ways.

Analyze Blog Discourse

Introduction

This recipe uses List Words, Concordance and Collocation tools to explore themes in blog discourse. It is important to keep individual blog entries around the same length to ensure consistent results when analyzing your compiled text.

A concordance is a gathering of passages that "concord" or agree; usually it is a gathering of passages containing a sought-for word. Concordances are a form of reading tool that goes back to the Middle Ages. They are typically lists of words with their appearances. A concordance for the Bible, for example, would have entries for all the content words of the Bible in alphabetical order, and each entry would include information about where the word appears and some context. Searching for words on a computer now typically returns a concordance called a Key Word in Context (KWIC), with the sought word down the center and a few words of context on either side. Google returns a type of concordance when you search for a word, with an example of the word in context for each page it recommends. See the Wikipedia entry on Concordance (Publishing).

Collocation refers to the occurrence of words adjacently more often than would be expected by chance. It is the relationship between two words or groups of words that often go together and form a common expression. If the expression is heard often, the words become 'glued' together in our minds. 'Crystal clear', 'middle management', 'nuclear family' and 'cosmetic surgery' are examples of collocated pairs of words. Some words are often found together because they make up a compound noun, for example 'riding boots' or 'motor cyclist'.

 

Ingredients

  • A collection of blog entries on your area of research, compiled into a text.
  • A list words tool such as the TAPoR List Words Tool
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Concordance tool such as the TAPoR Find Concordance Tool

 

Steps

  1. Collect blog entries on your research topic using an online blog search service such as Google Blogs.
  2. Compile individual blog entries into a single text using a text editor.
  3. Use the TAPoR List Words Tool to identify words that stand out, or lead you to consider trends and themes in your collected blog discourse.
  4. Use the TAPoR Find Collocates Tool with various words identified in your list words search to identify themes in your text.
  5. Use the TAPoR Find Concordance Tool with various words identified in your find collocates search that yielded interesting results.
  6. Read through interesting concordances to consider how authors in your text of blog discourse can help you to think differently about your research question.

A concordance or keyword in context (KWIC) is usually represented as a list of occurrences of a word with some limited context shown (words to the left and words to the right). Here is an example that shows the occurrences of the word "dream" in A Midsummer Night's Dream in TACTweb:

    I.1/577.1    | Four nights will quickly dream away the time; | And
    I.1/578.2    Swift as a shadow, short as any dream; | Brief as the
    II.2/585.1   | Ay me, for pity! what a dream was here! | Lysander,
    III.2/591.1  this derision | Shall seem a dream and fruitless vision, |
    IV.1/593.1   as the fierce vexation of a dream. | But first I will
    IV.1/594.2   to me | That yet we sleep, we dream. Do not you think | The
    IV.1/594.2   rare | vision. I have had a dream, past the wit of man to
    IV.1/594.2   the wit of man to | say what dream it was: man is but an
    IV.1/594.2   he go | about to expound this dream. Methought I was--there
    IV.1/594.2   his heart to report, what my dream | was. I will get Peter
    IV.1/594.2   to write a ballad of | this dream: it shall be called
    IV.1/594.2   it shall be called Bottom's dream, | because it hath no
    V.1/599.1    | Following darkness like a dream, | Now are frolic: not a
    V.1/599.2    theme, | No more yielding but a dream, | Gentles, do not

See also the definition at Wikipedia.
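If you would like to see what a concordance tool does behind the scenes, the sketch below builds a small keyword-in-context list in Python. It is only an illustration and not the TAPoR implementation: the function names `tokenize` and `kwic`, the three-word context width, and the simple letters-and-apostrophes tokenizer are all assumptions made for this example.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` words on either side of each hit."""
    words = tokenize(text)
    lines = []
    for i, word in enumerate(words):
        if word == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

# A short sample standing in for your compiled text of blog entries.
sample = (
    "Four nights will quickly dream away the time. "
    "Swift as a shadow, short as any dream."
)
for line in kwic(sample, "dream"):
    print(line)
```

Because the tokenizer discards punctuation and case, the context shown is a simplification of the original passage; a fuller tool would keep the source text intact and record where each hit occurs.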

 

Discussion

  • Finding a Text

The easiest way to collect blog entries is to use Google Blogs. When choosing which blog entries to include, remain consistent with your search criteria. In other words, keep the size, topic and date range of your search consistent throughout to ensure consistent results.

One way to organize your blog entries is to copy them into a bibliographic database like EndNote. You can then add keywords and metadata with which to sort or select subsets. You can also create styles to export the entries with XML tags for analysis.

XML, or Extensible Markup Language, is a language used in web development to make a text readable by web browsers and/or store data. Like HTML, XML is primarily formed of paired elements. Unlike HTML, the elements are defined by the user, rather than predefined. For example, both <book></book> and <murfle></murfle> are valid element pairs. These elements apply characteristics and metadata to the text within them. One pair of elements may be nested inside another: <book><title></title></book>. Elements may also be modified by attributes and attribute values: <book format="hardcover">. In this case, the book element has the attribute 'format' and the attribute value 'hardcover'. In addition to storing metadata about the text, attribute/attribute value pairs are frequently used in combination with CSS to apply formatting to the text within the element.
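As a rough illustration of exporting entries with XML tags, the sketch below wraps a few blog entries in elements whose attributes carry the metadata, then selects a subset by keyword. It uses Python's standard `xml.etree.ElementTree` module rather than EndNote; the element names `blogcorpus` and `entry`, the attribute names, and the sample entries are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical blog entries; in practice these would come from your own collection.
entries = [
    {"date": "2009-03-01", "keywords": "reading, tools", "text": "First entry text."},
    {"date": "2009-03-08", "keywords": "archives", "text": "Second entry text."},
]

def entries_to_xml(entries):
    """Wrap each entry in an <entry> element whose attributes carry the metadata."""
    root = ET.Element("blogcorpus")
    for item in entries:
        entry = ET.SubElement(root, "entry", date=item["date"], keywords=item["keywords"])
        entry.text = item["text"]
    return root

root = entries_to_xml(entries)
print(ET.tostring(root, encoding="unicode"))

# Select a subset by metadata, here the entries tagged with a given keyword.
subset = [e for e in root.findall("entry") if "archives" in e.get("keywords")]
print(len(subset))
```

Keeping the metadata in attributes, as here, leaves the element text free of anything but the entry itself, which matters when the text is later passed to a word-counting tool.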

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Compare Texts to Verify Authorship

Introduction

This recipe takes two works purported to come from the same author and uses tools such as distribution, Word Lists, etc. to suggest whether they may have been created by the same author.

This recipe and exercise will soon be available as a PDF download.

 

Ingredients

  • Two electronic texts purported to come from the same author
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Distribution tool such as the TAPoR Pattern Distribution Tool
  • A List Words tool such as the TAPoR List Words Tool
This recipe is applied to a sample text in Exercise to Compare Texts to Verify Authorship

 

Steps

  1. Obtain comparison texts from a source such as Project Gutenberg, or use ones which you already have;
  2. Log in to the TAPoR portal;
  3. Generate a word list (sorted by frequency) using the TAPoR List Words Tool and save the results to the Databench (a temporary workspace in TAPoR where you can store your text analysis results for further use) with a unique name;
  4. Run the TAPoR Pattern Distribution Tool on the text and save the results to the Databench with a unique name;
  5. Repeat these steps on all the comparison texts;
  6. Compare the two results visually for similarities and differences in word usage and distribution.
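The comparison in the last step can also be sketched in code. The example below is a minimal illustration, not the TAPoR tools themselves: it computes relative word frequencies for two texts and lists the most frequent words of the first beside their rate in the second. The function names and the two tiny sample texts are assumptions made for this example.

```python
import re
from collections import Counter

def relative_frequencies(text):
    """Map each word to its share of all word tokens in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}

def compare(text_a, text_b, top=5):
    """List the most frequent words of text_a beside their rate in text_b."""
    freq_a = relative_frequencies(text_a)
    freq_b = relative_frequencies(text_b)
    # Sort by descending rate in text_a, breaking ties alphabetically.
    ranked = sorted(freq_a, key=lambda w: (-freq_a[w], w))[:top]
    return [(w, freq_a[w], freq_b.get(w, 0.0)) for w in ranked]

# Two tiny sample texts standing in for the works being compared.
text_a = "the cat sat on the mat and the dog sat too"
text_b = "the bird sang and the cat slept"
for word, rate_a, rate_b in compare(text_a, text_b, top=3):
    print(f"{word:<6}{rate_a:.3f}  {rate_b:.3f}")
```

Relative rates are used instead of raw counts so that texts of different lengths can be set side by side. Similar rates are only suggestive; they do not by themselves establish common authorship.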

 

Discussion

  • Finding a Text

Possible sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing a text for analysis, be aware that academic apparatus included in the text may get in the way of reading it as originally constructed. It may be useful to remove notes and other materials added to the original work by subsequent authors. You can use tools such as TAPoR Extract Text to remove added material.

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Compile Textual Data and Visualize the Results Using Excel

Introduction

This recipe takes data provided by textual analysis tools and uses Microsoft Excel to create graphs to aid in its interpretation.

This recipe is available as a PDF download.

 

Ingredients

This recipe needs a good exercise to show how to Compile Textual Data and Visualize the Results Using Excel.

 

Steps

  1. Take an electronic text from a source such as Project Gutenberg;
  2. Tidy up the text using a tool such as TAPoR Neko Transformer Tool;
  3. Generate a list of words used in the text using the TAPoR List Words Tool;
  4. Generate a frequency distribution chart using TAPoR Pattern Distribution Tool for a particular word, phrase or partial word;
  5. Save the results to your computer as an XML file;
  6. Open the results in Microsoft Excel;
  7. Graph the results as a chart.
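For readers who want to produce chartable data without the TAPoR tools, the sketch below counts a word or partial word across equal slices of a text and writes the counts to a file that Excel can open. Note that it writes CSV rather than the XML file described in the steps above, simply because CSV is easy to produce from the standard library; the function names, the five-segment split and the output file name are assumptions made for this example.

```python
import csv
import os
import re
import tempfile

def distribution(text, pattern, segments=10):
    """Count words beginning with `pattern` in each of `segments` consecutive slices."""
    words = re.findall(r"[a-z']+", text.lower())
    size = max(1, -(-len(words) // segments))  # ceiling division
    counts = []
    for start in range(0, size * segments, size):
        chunk = words[start:start + size]
        counts.append(sum(1 for w in chunk if w.startswith(pattern)))
    return counts

def write_csv(counts, path):
    """Write one row per segment so that the second column can be charted."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["segment", "count"])
        for number, count in enumerate(counts, start=1):
            writer.writerow([number, count])

# A toy text: the pattern clusters at the start and end.
text = "dream " * 3 + "sleep " * 5 + "dreaming " * 2
counts = distribution(text, "dream", segments=5)
path = os.path.join(tempfile.gettempdir(), "distribution.csv")
write_csv(counts, path)
print(counts)
```

Opening the resulting file in Excel and charting the count column against the segment column gives a simple picture of where in the text the pattern is concentrated.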

 

Next Steps/Further Information

 

Comments

Explore Dynamically Aggregated Text

Introduction

This recipe uses the Googlizer Tool to generate Dynamically Aggregated Text and uses tools such as a frequency list and concordance to explore the results.

This recipe and exercise are available as a PDF download.

 

Ingredients

  • An Aggregation Tool such as TAPoR Googlizer
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A List Words tool such as the TAPoR List Words Tool
This recipe is applied to a sample text in Exercise to Explore Dynamically Aggregated Text

 

Steps

  1. Log in to the TAPoR portal;
  2. Generate an aggregated text by supplying search terms to an aggregation tool such as TAPoR Googlizer;
  3. Save the results to the Databench;
  4. List the words by frequency in the aggregated text using a tool such as TAPoR List Words Tool;
  5. Explore the aggregated results using a tool such as TAPoR Find Words - Concordance Tool with your search term as the target.
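The frequency-list step can be pictured with the short sketch below, which joins a handful of search-result snippets into one text and lists its words by frequency. It does not query Google or TAPoR; the snippets are invented, and the small stop list is an optional refinement assumed for this example and not a documented feature of the List Words Tool.

```python
import re
from collections import Counter

# A deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def list_words(text, stop_words=STOP_WORDS):
    """Return (word, count) pairs sorted by descending count, then alphabetically."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

# Hypothetical search-result snippets standing in for an aggregated text.
snippets = [
    "Text analysis is the study of patterns in text.",
    "Tools for text analysis and visualization.",
]
aggregated = "\n".join(snippets)
for word, count in list_words(aggregated)[:3]:
    print(word, count)
```

In an aggregated text the search term itself will usually top the list, so the words just below it are often the more interesting ones to carry forward into a concordance.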

 

Glossary

 

The Databench is a temporary workspace where you can store your text analysis results in the TAPoR portal for further use.
The Googlizer queries the Google search engine using a word or phrase you provide and returns the results of the Google search.

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Explore Themes Within a Text

Introduction

This is a recipe to explore simple themes within a text.

This recipe is available as a PDF download.

 

Ingredients

  • An electronic text to explore
  • A Theme of exploratory interest
  • A List Words tool such as the TAPoR List Words Tool
  • A Synonym Finding Tool such as Eva at POETS
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A Collocation tool such as the TAPoR Find Collocates Tool
This recipe is applied to a sample text in Exercise to Explore Themes within a Text

 

Steps

  1. Take an electronic text which is rich in known themes or obtain an appropriate text from a source such as Project Gutenberg;
  2. Generate a list of words related to the theme you are interested in using a tool such as WordNet;
  3. Determine the sense(s) of the words that are related to that theme;
  4. Use those words to build a word list with synonyms, antonyms, and cognates using a tool such as WordNet;
  5. Create a concordance for your word list using TAPoR Find Words - Concordance Tool;
  6. Explore different usages of the same word using Find Collocates Tool to determine usage patterns;
  7. Repeat the above step for all words in the list.
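Once you have a theme word list, the exploration in the steps above can be sketched as follows. This is an illustration only: the word list is typed in by hand instead of being fetched from WordNet, and the names `THEME_WORDS`, `theme_counts` and `theme_sentences` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical word list for a "sleep" theme; in the recipe this list would be
# built by hand from a thesaurus-style service such as WordNet.
THEME_WORDS = {"sleep", "slumber", "dream", "wake"}

def theme_counts(text, theme_words):
    """Count how often each theme word occurs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w in theme_words)

def theme_sentences(text, theme_words):
    """Return the sentences that contain at least one theme word."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences
            if theme_words & set(re.findall(r"[a-z']+", s.lower()))]

text = "I have had a dream. The night was cold. We sleep, we dream."
print(theme_counts(text, THEME_WORDS))
print(theme_sentences(text, THEME_WORDS))
```

The matching here is on exact word forms, so "dreams" or "sleeping" would be missed unless they are added to the list; this is one reason the recipe asks you to build the list with care before searching.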

 

Discussion

  • The Sense of a Word

A word’s sense is the way in which it is used within the context of the text. Word Sense Disambiguation is the process through which the various senses of a word are considered within the context of its specific usage. A service such as Eva at POETS provides a list of senses in which a word can be used. In text analysis, context refers to the text surrounding a string of characters, which may be as short as a word or as long as a paragraph; it is particularly important when generating a concordance for a string.

  • WordNet Service

WordNet is one of many web services available which will provide word senses, synonyms, antonyms and other related words for terms that you enter. For more information see WordNet or Eva at POETS.

 

Glossary

  • Collocation
Collocation refers to the occurrence of words adjacently more often than would be expected by chance.
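To make "more often than would be expected by chance" concrete, the sketch below compares how often two words actually appear side by side with how often they would if words were placed independently. The expected count used here, count(w1) × count(w2) / N for a text of N words, is a common approximation chosen for this example; it is not necessarily the measure used by the TAPoR Find Collocates Tool.

```python
import re
from collections import Counter

def adjacent_pair_scores(text):
    """For each adjacent word pair, return (observed count, expected count).

    The expected count assumes the two words are placed independently:
    count(w1) * count(w2) / N, where N is the number of word tokens.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    word_counts = Counter(words)
    pair_counts = Counter(zip(words, words[1:]))
    return {
        pair: (observed, word_counts[pair[0]] * word_counts[pair[1]] / total)
        for pair, observed in pair_counts.items()
    }

text = ("the water was crystal clear and the sky was clear "
        "but the crystal clear lake was cold")
scores = adjacent_pair_scores(text)
observed, expected = scores[("crystal", "clear")]
print(observed, round(expected, 2))
```

A pair whose observed count is well above its expected count is a candidate collocation. In a text this short the numbers mean very little; the comparison becomes useful only on a text of realistic length.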

A Complete Glossary

 

Further approaches

  • Find words that avoid the theme
  • Cluster analysis on themes
  • Distribution of theme
  • Comparison

 

Next Steps/Further Information

 

Comments

Exploring Colloquial Word Usage in a Text

Introduction

This recipe takes a text and explores the use of colloquial words within it using tools such as Googlizer, concordance, co-occurrence and collocation.

 

Ingredients

  • An Aggregator Tool such as TAPoR Googlizer

    or
  • An electronic text to explore
  • A specific colloquial term of interest
  • A List Words Tool such as TAPoR List Words Tool
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Co-occurrence tool such as the TAPoR Find Co-occurrence Tool
This recipe is applied in Exercise Exploring Colloquial Word Usage in a Text.

 

Steps

  1. Log in to the TAPoR portal;
  2. Generate an aggregated text by supplying search terms to an aggregation tool such as TAPoR Googlizer;
  3. Save the results to the Databench;
  4. List the words by frequency in the aggregated text using a tool such as TAPoR List Words Tool;
  5. Explore more frequently returned words using a tool such as TAPoR Find Words - Concordance Tool.

 

Glossary

The Databench is a temporary workspace where you can store your text analysis results in the TAPoR portal for further use.
The Googlizer queries the Google search engine using a word or phrase you provide and returns the results of the Google search.

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Exploring Concepts

Introduction

This recipe takes a text which is rich in concepts and uses tools such as word frequency, concordance, co-occurrence and collocation to explore a specific concept.

 

Ingredients

  • An electronic text which is rich in concepts
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A Co-occurrence tool such as the TAPoR Find Co-occurrence Tool (co-occurrence is the number of times two patterns occur in a set order within a set distance of one another in a source text)
  • A List Words tool such as the TAPoR List Words Tool
This recipe needs a good exercise to demonstrate how to Explore Concepts.

 

Steps

  1. Use a List Words tool such as the TAPoR List Words Tool to list the words in a text sorted by frequency;
  2. Choose a concept term that appears in the text;
  3. Use a Concordance tool such as the TAPoR Find Words - Concordance Tool to see how the term is used;
  4. Use a Collocation tool such as the TAPoR Find Collocates Tool to see what words collocate with the original concept term;
  5. Repeat steps 1 - 3 using the collocating terms;
  6. Use a Co-occurrence tool such as the TAPoR Find Co-occurrence Tool to find passages that contain pairs of concept terms.
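
For readers who want to try the word list and concordance steps outside TAPoR, the following is a minimal Python sketch. It is not the TAPoR implementation: the function names (tokenize, list_words, concordance), the simple regular-expression tokenizer and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def list_words(text):
    """Return (word, count) pairs sorted by descending frequency."""
    return Counter(tokenize(text)).most_common()


def concordance(text, term, width=3):
    """Return keyword-in-context lines with `width` words on either side of `term`."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == term:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


if __name__ == "__main__":
    # Hypothetical sample text, used only to exercise the functions.
    sample = "Justice is a virtue. The just city needs justice, and justice needs law."
    print(list_words(sample)[:3])
    for line in concordance(sample, "justice", width=2):
        print(line)
```

The word list suggests a concept term, and the concordance lines show how that term is used. Words that recur in the context lines are candidates for the next round of exploration.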

 

Discussion

  • Concepts

Concepts are usually discussed in a text using unambiguous vocabulary, so searching for words associated with a concept will find relevant passages. This recipe shows how you can follow a web of concepts by looking at words that collocate.

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Exploring Word Sense and Tense in a Simple Text

Introduction

This recipe takes a text and explores the tenses and senses of word usage by combining a sense finding service with the Concordance and Collocation tools.

 

Ingredients

  • An electronic text to explore
  • A Sense Finding Tool such as Eva at POETS
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A List Words tool such as the TAPoR List Words Tool
This recipe needs a good exercise to demonstrate how to Explore Word Sense and Tense in a Simple Text.

 

Steps

  1. Take an electronic text from a source such as Project Gutenberg, or use one that you already have;
  2. Generate a word list (sorted by frequency) using the TAPoR List Words Tool;

    or
  3. Compile a list of words that you are particularly interested in;
  4. Explore these keywords using the TAPoR Find Words - Concordance Tool to find their context (the text surrounding a string of characters, which may be as short as a word or as long as a paragraph);
  5. Use a sense finding service such as Eva at POETS to obtain a list of known senses of particular words;
  6. Explore how the sense of particular words is related to key words by using the TAPoR Find Collocates Tool.
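
The collocation step can be approximated with a short Python sketch. This is an illustration under stated assumptions, not the TAPoR Find Collocates Tool: the function name collocates, the window parameter and the sample sentences are hypothetical. The sketch only counts neighbouring words and does not test whether they occur together more often than chance would predict.

```python
import re
from collections import Counter


def collocates(text, keyword, window=2):
    """Count the words found within `window` tokens of each occurrence of `keyword`."""
    # Assumption: a word is a run of letters or apostrophes, compared in lowercase.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbours)
    return counts


if __name__ == "__main__":
    # Hypothetical sample in which "bank" is used in two senses.
    sample = ("The bank of the river was steep. "
              "She walked to the bank to deposit money at the bank.")
    print(collocates(sample, "bank").most_common(5))
```

Neighbours such as "river" or "deposit" hint at which sense of the keyword is in play in a given passage. With a narrow window some of them fall outside the count, so it is worth trying several window sizes.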

 

Discussion

  • Finding a Text

Possible sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing a text for analysis, be aware that academic infrastructure included in the text may obstruct reading the text in its original construction. It may be useful to remove notes and other materials added to the original work by subsequent authors. You can use tools such as TAPoR Extract Text to remove the added material.

  • Using a Word List

The word list can provide a first clue about the nature of the text. Questions which can be asked of the word list may include:

  1. What are the basic preoccupations of this text?
  2. What is unusual in the text?

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Exploring a Text for Theoretical Foundation

Introduction

This recipe takes a text and explores its use of theory by using tools such as word list, concordance, and collocation.

This recipe and exercise are available as a PDF download.

 

Ingredients

  • An electronic text that is rich in theoretical discussion
  • A List Words tool such as the TAPoR List Words Tool
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A Synonym Finding Tool such as WordNet
This recipe is applied to a sample text in Exercise Exploring a Text for Theoretical Foundation

 

Steps

  1. Prepare Text;
  2. Log in to TAPoR;
  3. Generate a word list (sorted by frequency) using the TAPoR List Words Tool;
  4. Save the results to the Databench;
  5. Perform a wildcard search in the frequency list for *ism and other suffixes suggestive of theory;
  6. Explore the words found individually using the TAPoR Find Words - Concordance Tool to understand their context;
  7. Repeat this for all found words;
  8. Use an antonym finding tool such as WordNet to compile a list of words referring to concepts opposed to those previously found;
  9. Look for contrary theoretical references by running the TAPoR Find Words - Concordance Tool on the antonyms.
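
The wildcard search in step 5 can be sketched in Python using the standard fnmatch module. The helper names (frequency_list, wildcard_search) and the sample sentence are assumptions for illustration, and the matching rules of the TAPoR tools may differ.

```python
import fnmatch
import re
from collections import Counter


def frequency_list(text):
    """Return (word, count) pairs sorted by descending frequency."""
    # Assumption: a word is a run of letters, compared in lowercase.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common()


def wildcard_search(word_counts, pattern):
    """Keep the (word, count) pairs whose word matches a shell-style wildcard."""
    return [(word, count) for word, count in word_counts
            if fnmatch.fnmatchcase(word, pattern)]


if __name__ == "__main__":
    # Hypothetical sample text, used only to exercise the functions.
    sample = ("Marxism and feminism both answer liberalism; "
              "the feminist critique of liberalism is not a prism.")
    words = frequency_list(sample)
    for suffix in ("*ism", "*ist"):
        print(suffix, wildcard_search(words, suffix))
```

Note that "prism" matches *ism even though it has nothing to do with theory. This is why the recipe follows the wildcard search with a concordance check of each word in context.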

 

Discussion

  • This technique involves querying for suffixes that suggest that the subject matter is theoretical in nature. Suffixes that may prove useful include: istic, sophy, ism, ist, ize, ise and arian. Collocation with the word theory itself may also prove useful.

 

Glossary

The Databench is a temporary workspace where you can store your text analysis results in TAPoR for further use.

A Complete Glossary

 

Further approaches

  • Aggregate collected text for a particular theory and explore it using this recipe. (An aggregate is a text formed by sequentially joining together a number of texts from a variety of sources into one large text.)

 

Next Steps/Further Information

 

Comments

Extract Dialogue from a Screenplay to Explore Linear Discourse

Introduction

This recipe extracts and examines a character’s dialogue from a play to explore a particular discourse in a linear fashion.

This recipe and exercise will soon be available as a PDF download.

 

Ingredients

This recipe is applied to a sample text in Exercise to Extract Dialogue from a Screenplay to Explore Linear Discourse

 

Steps

  1. Get the electronic text of a play or screenplay;
  2. Extract the dialogue from the text using TAPoR Extract Text;
  3. Consider the evolution or form of the discourse using the TAPoR List Words Tool, or the distribution of particular words using the TAPoR Pattern Distribution Tool.
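
As a rough illustration of steps 2 and 3, the Python sketch below pulls one character's speeches out of a script and counts a word across consecutive slices of that dialogue. It assumes every speech sits on a single line shaped like "NAME: speech", which real scripts often do not follow. The function names and the sample script are hypothetical, and this is not how TAPoR Extract Text or the Pattern Distribution Tool work internally.

```python
import re


def extract_dialogue(script, character):
    """Return the speeches of `character`, in order, from lines like 'NAME: speech'."""
    speeches = []
    for line in script.splitlines():
        # Assumption: speaker names are written in capitals and end with a colon.
        match = re.match(r"\s*([A-Z][A-Z ]*):\s*(.*)", line)
        if match and match.group(1).strip() == character:
            speeches.append(match.group(2))
    return speeches


def pattern_distribution(speeches, word, segments=2):
    """Count occurrences of `word` in each of `segments` consecutive slices of the dialogue."""
    tokens = re.findall(r"[a-z']+", " ".join(speeches).lower())
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    return [tokens[start:start + size].count(word)
            for start in range(0, size * segments, size)]


if __name__ == "__main__":
    # Hypothetical three-line script, used only to exercise the functions.
    script = "\n".join([
        "ANNA: I want the truth.",
        "BEN: The truth is dull.",
        "(BEN exits.)",
        "ANNA: Then lie to me. Lie well.",
    ])
    lines = extract_dialogue(script, "ANNA")
    print(lines)
    print(pattern_distribution(lines, "lie", segments=2))
```

Reading the counts from left to right gives a linear view of the discourse: a word that is absent from the early slices and frequent in the later ones marks a shift in what the character talks about.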

 

Discussion

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Follow Changes in Language Use by a Particular Writer

Introduction

This recipe uses the Googlizer, frequency lists, concordances and collocation to explore how a writer’s use of language changes over a lifetime.

 

Ingredients

  • A collection of electronic texts by a writer whose changing use of language you wish to explore
  • A Thesaurus Tool such as Eva at POETS
  • An Aggregation Tool such as TAPoR Googlizer
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A List Words tool such as the TAPoR List Words Tool
This recipe needs a good exercise to demonstrate how to Follow Changes in Language Use by a Particular Writer

 

Steps

  1. Log in to the TAPoR portal;
  2. Build a collection of works representing the particular writer's oeuvre, either by obtaining electronic versions from a source such as the Electronic Text Center at UVA or by using the TAPoR Googlizer;
  3. Generate a frequency list from the aggregated text using TAPoR List Words Tool to appreciate the nature of the writer's vocabulary;
  4. Save the resulting list to your workbench (the analysis area of TAPoR in which you apply text analysis tools to texts) with a distinct label;
  5. Select vocabulary to explore from the word list;
  6. Apply a Collocation tool such as the TAPoR Find Collocates Tool to each of the keywords to select related vocabulary;
  7. Explore the results of your collocation for other fixed phrases of interest that don’t involve your keywords;
  8. Use a concordance tool such as the TAPoR Find Words - Concordance Tool to locate other instances of those or related phrases;
  9. Repeat this process for each text you are interested in;
  10. Compare the individual frequency lists that you saved to your workbench, considering apparent similarities and differences.
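
The comparison in the last step can be sketched in Python by turning each frequency list into relative frequencies and subtracting them. The function names and the two tiny sample texts are assumptions for illustration. Relative rather than raw counts are used here so that texts of different lengths can be compared, which is a design choice of this sketch and not something the recipe prescribes.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its share of all word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}


def compare_frequencies(early_text, late_text):
    """Return (word, change) pairs sorted from largest increase to largest decrease."""
    early = relative_frequencies(early_text)
    late = relative_frequencies(late_text)
    words = set(early) | set(late)
    changes = {w: late.get(w, 0.0) - early.get(w, 0.0) for w in words}
    return sorted(changes.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical stand-ins for an early and a late work by the same writer.
    early = "love love war"
    late = "war war peace"
    for word, change in compare_frequencies(early, late):
        print(f"{word:8s} {change:+.2f}")
```

Words at the top of the output became more prominent in the later text, and words at the bottom faded. Both ends of the list are good candidates for the concordance and collocation steps.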

 

Discussion

  • Why Log In

When using an aggregation tool such as the Googlizer, you must be able to save text to the Databench as part of the process. To make this possible, you must be logged in to the system so that it can maintain your own personal workspace. If you require access to TAPoR, please visit the TAPoR signup page.

  • Why Use the Thesaurus

After finding the list of words from the writer’s work, it is useful to use the thesaurus tool to find related words that you can use to explore further nuances of the writer’s changing use of language. The key words that you identify as points of exploration may themselves have evolved, and the subtle changes in word choice can be identified through contrasting synonyms.

 

Glossary

The Googlizer queries the Google search engine using a word or phrase you provide and returns the results of the Google search.

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Test Assumptions about Syntactic Dependencies within a Text

Introduction

This recipe takes a text with known syntactic dependencies and explores them using tools such as Word List, Concordance, Co-occurrence and Collocation.

 

Ingredients

  • An electronic text containing known dependencies to explore
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A Co-occurrence tool such as the TAPoR Find Co-occurrence Tool
  • A List Words tool such as the TAPoR List Words Tool
This recipe needs a good exercise showing how to Test Assumptions about Syntactic Dependencies within a Text.

 

Steps

  1. Take an electronic text from a source such as Project Gutenberg;
  2. Generate a word list (sorted by frequency) using the TAPoR List Words Tool;
  3. Identify words that may be syntactically dependent;
  4. Use a Co-occurrence tool such as the TAPoR Find Co-occurrence Tool to find examples of these word combinations;
  5. Use a Collocation tool such as the TAPoR Find Collocates Tool to generate a list of other relationships.
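
The co-occurrence step looks for two patterns in a set order within a set distance of one another. A minimal Python sketch of that idea follows. The function name cooccurrences, the max_distance parameter and the sample sentences are assumptions for illustration, and the sketch matches whole words only, whereas a real tool may accept more general patterns.

```python
import re


def cooccurrences(text, first, second, max_distance=3):
    """Find passages where `second` follows `first` within `max_distance` tokens."""
    # Assumption: a word is a run of letters or apostrophes, compared in lowercase.
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token != first:
            continue
        for j in range(i + 1, min(len(tokens), i + 1 + max_distance)):
            if tokens[j] == second:
                hits.append(" ".join(tokens[i:j + 1]))
    return hits


if __name__ == "__main__":
    # Hypothetical sample containing the dependent pair "either ... or".
    sample = "Either you stay or you go. You can either write or call, or do neither."
    print(cooccurrences(sample, "either", "or", max_distance=3))
```

Widening max_distance finds looser pairings at the cost of more accidental matches, so it helps to start with a short distance and read the passages returned before increasing it.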

 

Discussion

  • Finding a Text

Possible sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing a text for analysis, be aware that academic infrastructure included in the text may obstruct reading the text in its original construction. It may be useful to remove notes and other materials added to the original work by subsequent authors. You can use tools such as TAPoR Extract Text to remove the added material.

  • Using a Word List

The word list can provide a first clue about the nature of the text. Questions which can be asked of the word list may include:

  1. What are the basic preoccupations of this text?
  2. What is unusual in the text?

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Use Text Analysis to Clarify the Intentions of Your Own Writing

Introduction

This recipe uses frequency lists and concordances to determine the impact and clarity of your own writing in meeting your objectives.

This recipe is available as a PDF download.

 

Ingredients

  • A sample or complete text of your own
  • A List Words tool such as the TAPoR List Words Tool
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
This recipe needs a good exercise to demonstrate how to Use Text Analysis Tools to Clarify the Intentions of Your Own Writing.

 

Steps

  1. Appreciate the message you are hoping to communicate in your prose;
  2. Generate a list of most frequently used words in your document with a List Words tool such as the TAPoR List Words Tool;
  3. Evaluate the closeness of the words returned to your desired message;
  4. Explore possibly contradictory words through the TAPoR Find Words - Concordance Tool;
  5. Modify your text to better reflect your intentions.
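
Steps 2 and 3 can be approximated with the Python sketch below, which checks whether the terms you intend to emphasise appear among the most frequent content words of a draft. Everything named here is an assumption for illustration: the small stop list, the function names and the sample draft are hypothetical, and deciding what counts as "close" to your message remains a judgement the writer has to make.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "we", "our"}


def top_content_words(text, n=5):
    """Return the n most frequent words that are not in the stop list."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]
    return [word for word, _ in Counter(tokens).most_common(n)]


def message_coverage(text, intended_terms, n=5):
    """Split the intended terms into those present in, and missing from, the top n words."""
    top = top_content_words(text, n)
    present = [term for term in intended_terms if term in top]
    missing = [term for term in intended_terms if term not in top]
    return present, missing


if __name__ == "__main__":
    # Hypothetical draft whose author meant to write mainly about safety.
    draft = ("Our budget is tight. The budget limits the budget office, "
             "and costs rise. Safety matters.")
    print(top_content_words(draft, n=3))
    print(message_coverage(draft, ["safety", "budget"], n=3))
```

A term that lands in the missing list is a prompt to revisit the draft, either by reading its concordance lines or by revising the passages where the intended message should be carried.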

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Visualize Scholarly Trends

Introduction

This recipe uses textual visualization tools to identify streams of thought, trends and potential avenues for scholarly investigation.

 

Ingredients

  • An area of scholarly interest
  • An Aggregated Collection of Abstracts
  • A WordCloud tool such as the TAPoR Word Cloud Tool
  • A Visual Collocation tool such as the TAPoR Visual Collocator Tool
This recipe needs a good exercise to demonstrate how to Visualize Scholarly Trends.

 

Steps

  1. Consider an area of potential scholarly interest;
  2. Collect a representative series of abstracts from existing scholarship;
  3. Aggregate the text of these abstracts using a dedicated text aggregator or simply by using a text editor;
  4. Generate a word cloud from the aggregated abstracts using a tool such as the TAPoR Word Cloud Tool;
  5. Consider the word cloud, identifying the most common words and mentally constructing word groups;
  6. Generate a visual collocation diagram using a tool such as the TAPoR Visual Collocator Tool;
  7. Explore particularly interesting words through the interactive diagram returned;
  8. Construct an avenue for further exploration from the identified trends, gaps and questions raised through visual examination.
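
The counting that sits behind a collocation diagram (step 6) can be approximated in a few lines. This is a rough sketch under stated assumptions: it reports raw neighbour counts within a fixed window rather than a statistical measure of collocation, it ignores sentence boundaries, and the function name `collocates` and the sample text are invented for illustration.

```python
import re
from collections import Counter

def collocates(text, keyword, window=2):
    """Count words appearing within `window` tokens of each occurrence of keyword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            start = max(0, i - window)
            # Neighbours to the left and right of the hit, excluding the hit itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented stand-in for an aggregated collection of abstracts.
abstracts = "Digital archives support reading. Distant reading scales; close reading slows."
print(collocates(abstracts, "reading", window=1).most_common())
```

Words that recur near a keyword of interest are candidates for the word groups you construct in step 5.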

 

Discussion

  • Finding Abstracts

Possible sources for electronic abstracts are listed on the Electronic Texts Panel of TAPoR.

 

Glossary

  • Word Cloud
A visual presentation of keywords drawn from a text, visually differentiated based on their position and frequency of use in that text.

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Construct

Sometimes text analysis requires building digital objects.  Methods in this section allow for the creation of tools, diagrams, and other objects.

Build a Social Network Map from a Text

Introduction

This recipe extracts information about perceived social networks from a text populated with references to individuals.

 

Ingredients

  • An electronic text containing known relationships to explore
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A Co-occurrence Tool such as TAPoR Find Co-occurrence Tool
  • A List Words tool such as the TAPoR List Words Tool
  • A Synonym Finding Tool such as Eva at POETS
This recipe needs a good exercise to demonstrate how to Build a Social Network Map from a Text.

 

Steps

  1. Take an electronic text, known to contain relationships between entities, from a source such as Project Gutenberg;
  2. Generate a list of words that indicate relationships using a tool such as WordNet; (Can the exercise do this with something like Romeo and Juliet: determine who is related to whom, and which side each is on?)
  3. Prepare the text by removing any added infrastructure;
  4. Use a List Words tool such as the TAPoR List Words Tool to get an idea of whether relation-indicating words occur in the text you are using;
  5. Use a Concordance tool such as the TAPoR Find Words - Concordance Tool on the text with the relationship words;
  6. Take the results and use the collocation tool to assemble a list of relationships…
  7. Could word tagging be useful here to indicate actions and objects of action?
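
One simple way to get a first network out of such a text is to count how often two names share a sentence. The sketch below is an assumption-laden stand-in for the tool chain above, not a description of any TAPoR tool: the name list is supplied by hand, the sample sentences are invented, and sharing a sentence says nothing about the kind of relationship involved.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_edges(text, names):
    """Count how often each pair of names appears in the same sentence."""
    edges = Counter()
    for sentence in re.split(r"[.!?]+", text):
        # Sort so that each pair is always recorded in the same order.
        present = sorted(n for n in names if re.search(rf"\b{re.escape(n)}\b", sentence))
        edges.update(combinations(present, 2))
    return edges

# Invented sentences, loosely themed on Romeo and Juliet.
sample = "Romeo loves Juliet. Tybalt is cousin to Juliet. Romeo fights Tybalt! Juliet waits."
names = ["Romeo", "Juliet", "Tybalt"]
for (a, b), weight in cooccurrence_edges(sample, names).items():
    print(a, "--", b, weight)
```

Each pair with its count can be read as a weighted edge; the concordance lines for that pair then show what kind of relationship the text actually describes.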

 

Discussion

  • Finding a Text

Possible sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing text for analysis, be aware that academic infrastructure added to the text may obscure its original construction. It may be useful to remove notes and other materials added to the original work by subsequent authors. You can use tools such as TAPoR Extract Text to remove added material.

  • WordNet Service

WordNet is one of many web services that provide word senses, synonyms, antonyms and other related words for terms that you enter. For more information see WordNet or Eva at POETS. In this case, determining relationships involves distinguishing between objects and people, as well as between the parties relating to one another. To make this distinction…

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Create Textual Infrastructure using Text Analysis Tools

Introduction

This recipe uses text analysis tools to extract key words to create an index and table of contents from a body of text.

This recipe is available as a PDF download.

 

Ingredients

  • An electronic text to create infrastructure for
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A List Words tool such as the TAPoR List Words Tool
This recipe is applied to a sample text in Exercise to Create Textual Infrastructure using Textual Analysis Tools

 

Steps

  1. Prepare your electronic text for processing;
  2. Generate a word list (sorted by frequency) using the TAPoR List Words Tool;
  3. Identify keywords for indexing;
  4. Explore keywords using the TAPoR Find Words - Concordance Tool to find clusters of related words;
  5. Group terms of associated relevance as they should appear logically in the index;
  6. Identify collocated words using the TAPoR Find Collocates Tool to determine usage patterns;
  7. Return to your word processing tool, such as Microsoft Word, and use the generated lists to search for and tag the words for automated creation of the index and table of contents.
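
To see what the finished index might contain before tagging anything in a word processor, the keyword list can be mapped to locations in the text. This is a minimal sketch, assuming paragraphs separated by blank lines stand in for pages; the function name `build_index` and the sample text are hypothetical.

```python
import re
from collections import defaultdict

def build_index(text, keywords):
    """Map each keyword to the 1-based numbers of the paragraphs that contain it."""
    index = defaultdict(list)
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        tokens = set(re.findall(r"[a-z]+", paragraph.lower()))
        for keyword in keywords:
            if keyword.lower() in tokens:
                index[keyword].append(number)
    return dict(index)

# Invented three-paragraph sample.
sample = "Concordances list words.\n\nIndexes point to pages.\n\nWords fill indexes."
for term, places in sorted(build_index(sample, ["words", "indexes"]).items()):
    print(f"{term}: {', '.join(map(str, places))}")
```

The sketch matches whole words only, so grouped or inflected forms (step 5) still need to be merged by hand.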

 

Discussion

  • Text Preparation

You can use tools such as TAPoR Extract Text to remove added material.

  • Grouping Words for Index Inclusion

  1. How are terms related?
  2. Does a concordance help to cluster words that should be logically associated?

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Create a Chronological Timeline from a Biographical Text

Introduction

This recipe takes a biographical text and uses tools such as Googlizer and Find Dates to provide a framework for a chronological timeline.

This recipe and exercise are available as a PDF download.

 

Ingredients

This recipe is applied to a sample text in Exercise to Create a Chronological Timeline from a Biographical Text

 

Steps

  1. Take an existing biographical text of interest or get one from a source such as Project Gutenberg ;
  2. Augment the biographical information using an aggregation tool such as TAPoR Googlizer ;
  3. Generate a rough chronology using the TAPoR Date Finder Tool;
  4. Sort the results returned by date;
  5. Examine the results for duplicate information;
  6. Clean up the text in a text editor.

 

Discussion

  • Finding a Text

Possible sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing text for analysis, be aware that academic infrastructure added to the text may obscure its original construction. It may be useful to remove notes and other materials added to the original work by subsequent authors. You can use tools such as TAPoR Extract Text to remove added material.

  • Augmenting Text

The beauty of this process is in the ability to augment a standalone text with an aggregated collection of supporting textual matter. This aggregation can then be scanned to produce a rich timeline for further editing. A text aggregation tool allows for a shotgun approach to text acquisition and much of what may be trawled is irrelevant. However, there may be small nuggets of useful information acquired that a date finding tool will quickly pinpoint.

  • Building a Chronology

The first pass using a Date Finding tool will probably return redundant and possibly erroneous information. However, it allows for easy classification and sorting of the results, and thus quickly pares down biographical or other event-based narratives into a chronological timeline.
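
The first pass described here can be imitated with a regular expression. The sketch below is not the TAPoR Date Finder Tool: it assumes that dates appear as four-digit years between 1000 and 2099, and the function name `rough_chronology` and the biographical sentences are invented for illustration.

```python
import re

# Assumption: only four-digit years from 1000 to 2099 count as dates.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def rough_chronology(text):
    """Return (year, sentence) pairs sorted by year, with exact duplicates removed."""
    entries = set()
    # Split after sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for match in YEAR.finditer(sentence):
            entries.add((int(match.group(1)), sentence))
    return sorted(entries)

# Invented biography containing one repeated sentence.
bio = ("The poet moved to the coast in 1902. She was born in 1875. "
       "She was born in 1875. Her last collection appeared in 1931.")
for year, sentence in rough_chronology(bio):
    print(year, sentence)
```

Only verbatim duplicates are dropped, so near-duplicates from different aggregated sources, and any four-digit number that is not a year, still have to be caught in the manual clean-up.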

 

Next Steps/Further Information

 

Comments

Create an Online Interactive Bibliography

Introduction

This is a recipe for building and maintaining an online bibliography with TAPoR. With a TAPoR account you can manage a bibliography that links to online articles, essays, web sites and so on. The bibliography will have only your "public" items visible and will organize them by your subject tags. It will allow others to link to the items or to analyze them using TAPoR accessible tools.

 

Ingredients

  • A collection of online papers and pages you want to include in your bibliography
  • An account on the Text Analysis Portal For Research (TAPoR Portal)

 

Steps

  1. Get a TAPoR account. Go to the TAPoR Portal and click the "Sign Up for Account Here" button. Fill out the form and you will be sent confirmation by e-mail.
  2. Log in to your account on the TAPoR Portal by entering your Username and Password in the upper left-hand corner.
  3. Add the TAPoRize Bookmark to the Favorites bar of your browser. To do this, go to the myTexts page of your account, click on the Help button in the upper left, and then drag the TAPoRize link to your Favorites bar.
  4. Add texts to your account. Using the TAPoRize Bookmark, or directly in the myTexts area of your account, you can add texts to your myTexts Library. This myTexts Library is what you will publish to share your bibliography.
  5. Edit your library in the myTexts page of your TAPoR account. Any item that you added can be edited or deleted. You can make texts public or keep them private (only public texts will be visible to others in your myLinks bibliography). You can add Tags or subject keywords to organize the items. You can view the texts and enter advanced bibliographic information.
  6. View your TAPoR myLinks page, where all your public texts are visible to others, organized by tags. In the myTexts page you can click the "My Links" button to edit and view your account bibliography. Only "public" texts will be visible, organized by tags.
  7. Analyze a Text from your myLinks. Every text to which your myLinks bibliography links can be analyzed directly from the myLinks page. Click on the Analyze Text link next to the item you want to analyze and TAPoR will open a three-part window where you can call analytical tools on the page in question. This makes the bibliography interactive in a way normal lists are not.

 

Discussion

The TAPoRize Bookmark allows you to browse the web and quickly add pages to your myTexts Library. When you are looking at a web page that you want to add to your library, choose TAPoRize and you will get a small window that lets you add the text. You may be asked to log in to your account, if you haven't already.

 

Glossary

  • TAPoRize
One of the features of the TAPoR portal is that it gives you a bookmark with which to acquire texts into your myTexts Library quickly. You will find the TAPoRize bookmark on the myTexts page of your account under the Help link. Just drag it to your Bookmarks Bar or Favorites Bar. From then on you can acquire a text (add it to your myTexts) by clicking on the Bookmark when you are looking at the text.

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Generate Meta Information for a Website using Text Analysis Tools

Introduction

This recipe uses frequency lists and Googlizer to build meta tags for a web page or website.

This recipe and exercise are available as a PDF download.

 

Ingredients

This recipe is applied to a sample text in Create Generate Meta Information for a website using TA tools

 

Steps

  1. Log in to TAPoR;
  2. Generate a list of the most frequently used words on a web page or website with a List Words tool such as the TAPoR List Words Tool;
  3. Collect a compilation of similar or competing websites using an aggregation tool such as TAPoR Googlizer;
  4. Generate a list of the most frequently used words on the aggregated web pages with a List Words tool such as the TAPoR List Words Tool;
  5. Save the frequency lists to the Databench;
  6. Restrict the search to the <body> or <p> elements to limit it to the area of interest;
  7. Create Meta Tags using a Text Editor;
  8. Add Meta tags to your existing web pages.
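
Steps 7 and 8 amount to turning the chosen keywords into two lines of HTML. The following is a minimal sketch using Python's standard `html.escape`; the function name `build_meta_tags`, the keywords and the description are assumptions chosen for illustration.

```python
from html import escape

def build_meta_tags(keywords, description):
    """Return HTML meta elements for a keyword list and a short description."""
    # Escape quotes and angle brackets so the values are safe inside attributes.
    keyword_content = escape(", ".join(keywords), quote=True)
    description_content = escape(description, quote=True)
    return (
        f'<meta name="keywords" content="{keyword_content}">\n'
        f'<meta name="description" content="{description_content}">'
    )

# Hypothetical keywords and description.
print(build_meta_tags(["text analysis", "concordance"], 'Recipes for "text analysis"'))
```

The resulting lines belong inside the <head> element of each page.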

 

Discussion

  • Use of Key Words

If you find that your frequency lists do not contain as many keywords as you anticipated, you may want to consider rewriting some of your text to be richer in keywords, because many search engines carry out a similar analysis to rank the relevance of your page. You may also want to consider the lists generated by competing sites to ensure that you are highlighting your site appropriately. For most search engines, a suitable keyword density in web text is judged to be between 3% and 9% of the full text.
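
Keyword density is simply the number of occurrences of a keyword divided by the total number of words, expressed as a percentage. A small sketch, assuming single-word keywords and a simple definition of a word as a run of letters or apostrophes; the function name and sample page text are invented.

```python
import re

def keyword_density(text, keyword):
    """Return occurrences of keyword as a percentage of all words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)

# Invented page text: 15 words, 3 of which are "bread".
page = "Fresh bread daily. Our bread is baked at dawn, and bread sells out by noon."
density = keyword_density(page, "bread")
print(f"{density:.1f}% of words are 'bread'")
```

Here the density is 20%, far above the range quoted above, which is what one would expect from so short and repetitive a sample.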

 

Next Steps/Further Information

 

Comments

Experimental Draft Recipes

Recipes in this section are incomplete or untested.  As to function, they may fit into any of the other categories.  Please feel free to critique them or to add new ones. 

Analyzing discourse around cultural phenomena

1.    Introduction
 
1.1.        What is it about?
 
This recipe will show you how to gather a quick and dirty corpus of discourse about a cultural phenomenon on the web and then analyze the corpus with a text analytical tool.
 
 
1.2.        What is it good for?
 
The analysis will enable you to isolate interesting textual phenomena in a discourse on a cultural phenomenon and use these to identify its major topics and concepts. Thereafter, you can formulate hypotheses about the discourse’s orientation which are supported by textual evidence. 
 
 
1.3.        What will it produce?
 
  • a wordlist from which we may gather a set of potentially interesting keywords
  • an understanding of the context of these keywords and the concepts with which they are most closely associated
  • hypotheses concerning the major topics in the discourse and how it conceptualizes the cultural phenomenon that is its object
 
 
2.    Ingredients
 
  • a cultural phenomenon attracting a significant level of interest in current debate
  • a number of topic related documents gathered from the web
  • your curiosity
 
 
3.    Appliances
  • an online web browser with a search engine (Google etc.) to collect discourse documents from the web
  • a text editor to paste these documents into a single text file
  • a text analytical tool (such as CATMA, downloadable at www.catma.de)

 

 
4.    Steps
 
  • load text file and auto-generate concordance
  • segment text file by inserting a tag (“source tag”)
  • analyse wordlist for potential markers (keywords) of topic focus; if useful create keyword groups
  • define respective tags and tag text
  • do a distribution & collocate analysis
  • interpret results
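
For readers without one of these tools to hand, the concordance in the first step can be approximated as follows. This is a generic keyword-in-context sketch, not CATMA's implementation; the function name `kwic`, the context width and the sample sentences are assumptions.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: `width` words either side of each hit."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keyword lines up in a column.
            lines.append(f"{left:>25} [{token}] {right}")
    return lines

# Invented sample sentences.
sample = "Bond orders a drink. Later Bond drives away while the villain watches Bond."
for line in kwic(sample, "bond", width=2):
    print(line)
```

Reading down the keyword column gives a quick sense of the contexts in which a term is used before any tagging is done.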
 
 
5.    Discussion
 
This recipe is generic and can be put into practice with a variety of text analysis tools, such as TACT, Voyeur or CATMA (or a combination of these).
 
Our example analysis of collocates for the group of words tagged as referencing notions of nationality generated an unexpected result, namely a statistically significant relationship with 'Love'. However, on validating this result we found that it was misleading: we had included in the corpus a number of bibliographies and filmographies which cited the title "From Russia with Love".

The example analysis thus serves to illustrate how a badly constituted corpus can invalidate the results of textual analysis. In the group discussion of the example it was also noted that the problem could easily be eliminated in CATMA by tagging title citations in the corpus separately and then excluding them from the collocation analysis.
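
The effect is easy to reproduce on a toy corpus. The sketch below does not use CATMA tagging; as a stand-in it simply deletes a list of excluded phrases before counting, and the function name, window size and three-sentence corpus are invented for illustration.

```python
import re
from collections import Counter

def collocates(text, keyword, window=3, exclude=()):
    """Count words near keyword, after deleting any excluded phrases from the text."""
    for phrase in exclude:
        text = re.sub(re.escape(phrase), " ", text, flags=re.IGNORECASE)
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            counts.update(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    return counts

# Invented corpus: one sentence of discourse plus two title citations.
corpus = ("Bond travels to Russia on a mission. Filmography: From Russia with Love. "
          "Bibliography: From Russia with Love.")
raw = collocates(corpus, "russia")
cleaned = collocates(corpus, "russia", exclude=["From Russia with Love"])
print("with titles:", raw["love"], "| titles excluded:", cleaned["love"])
```

With the citations left in, 'love' appears three times near 'russia'; with them excluded it does not appear at all, which is the correction described above in miniature.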
 
 
6.    Known limitations
 
-       Quality of corpus: is it representative enough to support generalizing interpretive conclusions on the discourse?
 
 
7.    Glossary
 
to follow
 
8.    Next steps
 
-       expand corpus
-       relate your findings to critical literature
 
 
 
Example: Analysis of contemporary cultural discourse on James Bond as exemplified in a random selection of 7 web pages
 
Question 1: “How is the character James Bond being characterized in the discourse sample?”
 
Question 2: “How is the relationship between Bond and nationality reflected in the discourse sample?”




-------------------

If you want to re-run the examples with CATMA, use the attached file JamesBond.txt and load it into CATMA.

Annotation Recipe

  1. Load your file into an annotation tool (AT)
  2. Define your interest (positive + ex negativo: "What am I interested in and what doesn't interest me?")
  3. Annotate your text using the AT
  4. Sharpen and/or expand interest definition
  5. Review annotation (assess / expand / terminate)
  6. Next steps: Export annotation to other procedures
    a. Subjective manual analysis, summary and interpretation of annotation
    b. Ditto, in a collective mode
    c. Post-processing using other DH tools

Compare Two Texts

Introduction

This recipe compares two texts.

 

Ingredients

This recipe is applied to speeches by Obama and Wright in Now, Analyze That

 

Steps

  1. Make sure your texts are in the same format. This recipe assumes they are Plain TextPlain text refers to a text without any additional formatting affecting its human readability, often found in .txt files. Plain text files do not require a specialized program, such as a word processor, to read them. For more information, see the Wikipedia. Return to Glossary. Files, though there are comparable tools to handle HTMLHTML, or Hypertext Markup Language, is a language used in web development to make a text readable by web browsers. HTML is primarily formed of paired elements, such as < body >< /body > or < p >< /p >, that apply some characteristic to the text within it. One pair of elements may be nested inside another like this: < body >< p >< /p >< /body > In this case, < body >< /body > marks the beginning and end of the body of the document, while < p >< /p > marks the beginning and end of a paragraph within the body. Elements may also be modified by attributes and attribute values: < p class="hangingindent" > In this case, the paragraph element has the attribute 'class' and the attribute value 'hangingindent'. Attribute/attribute value pairs are frequently used in combination with CSS to apply formatting to the text within the element. Return to Glossary. and XMLXML, or Extensible Markup Language, is a language used in web development to make a text readable by web browsers and/or store data. Like HTML, XML is primarily formed of paired elements. Unlike HTML, the elements are defined by the user, rather than predefined. For example, both < book >< /book > and < murfle >< /murfle > are valid element pairs. These elements apply characteristics and metadata to the text within them. One pair of elements may be nested inside another: < book >< title >< /title >< /book > Elements may also be modified by attributes and attribute values: < book format="hardcover" > In this case, the book element has the attribute 'format' and the attribute value 'hardcover'. 
In addition to storing metadata about the text, attribute/attribute value pairs are frequently used in combination with CSS to apply formatting to the text within the element. Return to Glossary. .
  2. Use the Comparator to compare your two texts. The Comparator will show you the vocabulary common to both texts, sorted by the ratio of relative frequency (a word's count divided by the total number of words in its text). This means that words common to both texts but far more frequent in text 1 appear first, followed down the list by those more frequent in text 2.
  3. Try to see if there are themes to the words that appear more frequently in one text over the other. You can explore these themes by using the following tools:
    • The Concordance tool can be used to search for a word to see its context. You can compare KWIC concordances of the same word or phrase from the different texts.
    • The Collocation tool will let you see what words collocate with the words you are exploring. If you compare the collocates for the same word from different texts you can get a sense of the differences in how the word is used.
    • The List Word Pairs tool will let you see frequently used phrases of two words in each of the texts. These again can be compared.
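
The ordering described in step 2 can be approximated in a few lines of Python. This is a minimal sketch and not TAPoRware's actual implementation: the tokenizer (lowercased runs of letters and apostrophes), the function names `relative_frequencies` and `compare`, and the two sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its count divided by the total number of words."""
    # Assumption: a "word" is a run of letters or apostrophes, lowercased.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def compare(text1, text2):
    """Return (word, ratio) pairs for the words common to both texts.

    The ratio is the word's relative frequency in text1 divided by its
    relative frequency in text2, sorted from highest to lowest.
    """
    freq1 = relative_frequencies(text1)
    freq2 = relative_frequencies(text2)
    common = set(freq1) & set(freq2)
    ratios = [(word, freq1[word] / freq2[word]) for word in common]
    # Sort by ratio (descending), then alphabetically so ties are stable.
    return sorted(ratios, key=lambda pair: (-pair[1], pair[0]))


# Invented sample texts; replace with the contents of your own files.
text1 = "the cat sat on the mat the cat slept"
text2 = "the dog sat on the log and the dog barked at the cat"

for word, ratio in compare(text1, text2):
    print(f"{word}\t{ratio:.2f}")
```

Words at the top of the list are relatively more frequent in the first text, and words at the bottom are relatively more frequent in the second.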

 

Discussion

  • One way to think of comparing is that you are trying to find the common themes or clusters of words with the TAPoRware Comparator, and then following those themes separately through each text to compare how they play out. See the Explore Themes within a Text recipe.
  • You can use the TAPoR Portal instead of the TAPoRware tools listed here if you want to track your results. The portal also lets you save aggregations of more than one text, so you can have a text that combines the two you are comparing. Note, however, that some of the experimental tools in TAPoRware are not in the portal.

 

Glossary

  • KWIC
A Key Word In Context (or KWIC) is a display of results in which the word searched for, the keyword, is in the centre surrounded by one line of context. This is how concordances are usually displayed.
Here is an example that shows the occurrences of the word "dream" in A Midsummer Night's Dream in TACTweb:
I.1/577.1        | Four nights will quickly dream away the time; | And
I.1/578.2 Swift as a shadow, short as any dream; | Brief as the
II.2/585.1 | Ay me, for pity! what a dream was here! | Lysander,
III.2/591.1 this derision | Shall seem a dream and fruitless vision, |
IV.1/593.1 as the fierce vexation of a dream. | But first I will
IV.1/594.2 to me | That yet we sleep, we dream. Do not you think | The
IV.1/594.2 rare | vision. I have had a dream, past the wit of man to
IV.1/594.2 the wit of man to | say what dream it was: man is but an
IV.1/594.2 he go | about to expound this dream. Methought I was--there
IV.1/594.2 his heart to report, what my dream | was. I will get Peter
IV.1/594.2 to write a ballad of | this dream: it shall be called
IV.1/594.2 it shall be called Bottom's dream, | because it hath no
V.1/599.1 | Following darkness like a dream, | Now are frolic: not a
V.1/599.2 theme, | No more yielding but a dream, | Gentles, do not

  • Collocation
Collocation refers to the occurrence of words adjacently more often than would be expected by chance.
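
Both glossary terms can be illustrated with a short Python sketch. It is not the code behind the TAPoRware tools: the tokenizer, the function names `kwic` and `collocates`, and the window of three words are assumptions, and the sample joins two of the "dream" lines quoted above. Note that `collocates` only counts the words found near the keyword; it does not test whether they appear more often than chance would predict.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercased words (assumption: letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(text, keyword, width=3):
    """Return one (left, keyword, right) tuple per occurrence of keyword."""
    words = tokenize(text)
    rows = []
    for i, word in enumerate(words):
        if word == keyword:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, word, right))
    return rows


def collocates(text, keyword, width=3):
    """Count the words that occur within `width` words of keyword."""
    words = tokenize(text)
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword:
            window = words[max(0, i - width):i] + words[i + 1:i + 1 + width]
            counts.update(window)
    return counts


sample = (
    "Swift as a shadow, short as any dream. "
    "I have had a dream, past the wit of man to say what dream it was."
)

# Print the keyword down the centre with its context on either side.
for left, key, right in kwic(sample, "dream"):
    print(f"{left:>25} | {key} | {right}")

print(collocates(sample, "dream").most_common(3))
```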

A Complete Glossary

 

Next Steps/Further Information

 

Comments

 

Find Patterns in a Text File Using Regular Expressions

 


Ingredients

  1. text file: e.g. Alice’s Adventures in Wonderland - http://www.gutenberg.org/files/11/11.txt< ...
  2. Mac OSX or Linux/Unix operating system or Cygwin on Windows (cygwin.com)
    1. Specifically the grep command

 

Steps

  1. Download a copy of the text file(s)
  2. Open a shell or terminal
  3. In general you will use commands that look like: grep "PATTERN" filename.txt
    1. The PATTERN is the regular expression. Type it between plain straight quotation marks; curly quotation marks pasted from a word processor are treated as part of the pattern.
    Examples:

    Using our example of Alice’s Adventures in Wonderland, here are some searches you could execute using regular expressions. In each case grep prints every line of the file that contains a match.

    • To find every instance of the word “waistcoat” in the file, the command would be:
          grep "waistcoat" 11.txt
    • To find every two-letter word that ends in ‘t’ (such as it, at), the command would be:
          grep " .t " 11.txt
      The dot matches exactly one character of any kind. Because the pattern begins and ends with a space, it only finds words that have a space on both sides.
    • To find all of the words that end in ‘ed’, the command would be:
          grep ".*ed " 11.txt
      The dot matches exactly one character and the star means zero or more copies of the preceding character, so ".*" matches any run of characters before the ‘ed’.
    • To find either the word ‘she’ or ‘the’ in the text, the command would be:
          grep " [st]he " 11.txt
      The square brackets match ‘he’ preceded by either ‘s’ or ‘t’. The pattern will not match the word ‘he’.
    • To search more than one text file, or a whole corpus, add the -R option, which searches every file in a directory and its subdirectories:
          grep -R "PATTERN" directoryname
      Here directoryname stands for the folder that holds your text files.
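
The same patterns can be tried out in Python's `re` module, which is a convenient way to check what a pattern matches before running grep over a whole file. This is a sketch under assumptions: the sample sentence is invented for illustration, and Python's regular expression syntax differs from grep's in some details, although these simple patterns behave the same way in both. Unlike grep, `re.findall` returns the matched text rather than the whole line.

```python
import re

# Invented sample text; substitute lines from 11.txt to experiment.
sample = "Alice looked at the Rabbit and she asked it why he jumped."

# " .t " : a space, any one character, the letter 't', a space.
two_letter = re.findall(r" .t ", sample)

# " [st]he " : 'she' or 'the' between spaces, but not 'he'.
she_or_the = re.findall(r" [st]he ", sample)

# Words ending in 'ed'. \w+ keeps the match inside a single word,
# whereas ".*ed " matches everything on the line up to the last "ed ".
ed_words = re.findall(r"\w+ed\b", sample)

print(two_letter)
print(she_or_the)
print(ed_words)
```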



Resources:

The grep man page: on the terminal type "man grep" or online go to http://www.ss64.com/bash/grep.html

 

A good online tutorial is: A Tao of Regular Expressions http://jmason.org/software/sitescooper/tao_re ...

 

Mastering Regular Expressions (by Jeffrey Friedl) - the O’Reilly manual for regular expressions (2nd edition published 2002)



Discussion

Regular expressions can also be used in many different programming languages.

Things to look out for:

  • Every program that implements regular expressions has slightly different syntax.
  • Some patterns cannot be described with regular expressions. These often involve nested or paired elements.
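
To illustrate the second point: checking that brackets are correctly nested means keeping count of how many are currently open, which a classical regular expression cannot do for an arbitrary depth of nesting. A few lines of ordinary code can. The following is a minimal sketch in Python; the function name `is_balanced` is an assumption made for illustration.

```python
def is_balanced(text, open_char="(", close_char=")"):
    """Return True if every open_char has a matching, later close_char."""
    depth = 0
    for char in text:
        if char == open_char:
            depth += 1
        elif char == close_char:
            depth -= 1
            if depth < 0:
                # A closing character appeared before its opening partner.
                return False
    # Anything still open at the end was never closed.
    return depth == 0


print(is_balanced("(a (nested) example)"))
print(is_balanced("(unclosed (example)"))
```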



Glossary

Regular Expression: a means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

 

Finite State Machine: a mathematical abstraction sometimes used to design digital logic or computer programs. It is a behavior model composed of a finite number of states, transitions between those states, and actions. The operation of an FSM begins from one of the states (called a start state) and goes through transitions to different states depending on the input. It can end in any of the available states, but only a certain set of states (called accept states) mark a successful flow of operation.
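
As a small illustration of the definition, the pattern [st]he from the grep examples can be written out as a finite state machine. This is a sketch only: the state names and the function name `accepts` are arbitrary labels chosen here, and real regular expression processors build their machines automatically and in more elaborate forms.

```python
# Transition table for a machine that accepts exactly "she" or "the",
# the same strings matched by the pattern [st]he.
# Each key is (current state, input character); each value is the next state.
TRANSITIONS = {
    ("start", "s"): "seen_first",
    ("start", "t"): "seen_first",
    ("seen_first", "h"): "seen_h",
    ("seen_h", "e"): "accept",
}
ACCEPT_STATES = {"accept"}


def accepts(word):
    """Run the machine over word; True if it finishes in an accept state."""
    state = "start"
    for char in word:
        state = TRANSITIONS.get((state, char))
        if state is None:
            # No transition is defined for this input, so the word is rejected.
            return False
    return state in ACCEPT_STATES


for candidate in ["she", "the", "he", "them"]:
    print(candidate, accepts(candidate))
```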


Next Steps / Further Information

  • There are interesting connections between regular expressions, formal languages, and finite state machines or automata.
  • Once you understand how to use regular expressions they can be used for searching and replacing in text editors.
  • Using the UNIX or Linux pipeline ( | ) character to string together (in sequence) multiple commands allows manipulations that would be difficult without specialized tools.

How to annotate and present areas of interest in an image

Function:

 

Ingredients:

image file (see X for converting TIFF to JPG)

associated TEI/TILE-compliant text about the image

 

Appliances:

TILE

 

Steps:

1.            Open up the TEI/TILE-compliant file with a text editor.

2.            Add line elements for the text of each annotation. Follow this model:

            <pb facs="#idforfirstimage" />

            <lb/>

<l>Something of Interest 1</l>

<lb/>

<l>Something of Interest 2</l>

 

3. Use the importer script to bring the text commentary into the TILE workspace. Use this model:

            http://<server>/TILE/importWidgets/impo ... for TEI XML>&rname=text&rnum=0&ipath=<Full path to image folder>

 

 

 

4. Go to a page [Image/Map/etc.].

 

5. Select an annotation and mark its area(s) of interest using the shape tools provided.

 

6. Repeat 4-5 as necessary until all annotations on every page are complete.

 

 

 

7. Save your session (JSON data).

7b. For archival purposes, export to XML.

 

Examples:

Here’s a TEI file

<?xml version="1.0" encoding="UTF-8"?>

<?oxygen RNGSchema="..//tei_all.rng" type="xml"?>

<?xml-stylesheet type="text/css" href="tei-11-08-08.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">

    <teiHeader></teiHeader>

            <facsimile>

                        <surface xml:id="ham-1611-22277x-bod-c01-image001"><desc>Image 001</desc><graphic url="promontorymoc.png"/></surface>

                        <surface xml:id="ham-1611-22277x-bod-c01-image002"><desc>Image 002</desc><graphic url="rockart.png"/></surface>

                        <surface xml:id="ham-1611-22277x-bod-c01-image003"><desc>Image 003</desc><graphic url="DeneYeniseianMap.png"/></surface>

            </facsimile>

            <text>

                        <pb facs="#ham-1611-22277x-bod-c01-image001" />

                        <lb/>

                        <l>Semi-circular puckered vamp</l>

                        <lb/>

                        <l>quill work</l>

                        <lb/>

                        <l>grass-lining</l>

                        <pb facs="#ham-1611-22277x-bod-c01-image002" />

                        <lb/>

                        <l>Funky Shoulders</l>

                        <lb/>

                        <l>Spear</l>

                        <pb facs="#ham-1611-22277x-bod-c01-image003" />

                        <lb/>

                        <l>Location of the Dene-Yeniseian Languages</l>

                        <lb/>

                        <l>Location of the Promontory Point Moccasins</l>

            </text>

</TEI>
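
Before importing a file like this into TILE, it can help to confirm that it is well-formed and that every page break points at a declared image. The following Python sketch does this with the standard library's `xml.etree.ElementTree`. It is not part of TILE: the function name `annotations_by_image` is an assumption, and the sample is a shortened copy of the file above with the processing instructions left out.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Shortened copy of the example file; a real file could be read with
# ET.parse("yourfile.xml").getroot() instead of ET.fromstring().
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader></teiHeader>
  <facsimile>
    <surface xml:id="ham-1611-22277x-bod-c01-image001"><desc>Image 001</desc><graphic url="promontorymoc.png"/></surface>
    <surface xml:id="ham-1611-22277x-bod-c01-image002"><desc>Image 002</desc><graphic url="rockart.png"/></surface>
  </facsimile>
  <text>
    <pb facs="#ham-1611-22277x-bod-c01-image001" />
    <lb/>
    <l>Semi-circular puckered vamp</l>
    <lb/>
    <l>quill work</l>
    <pb facs="#ham-1611-22277x-bod-c01-image002" />
    <lb/>
    <l>Spear</l>
  </text>
</TEI>
"""


def annotations_by_image(tei_source):
    """Return a dict mapping each image file name to its annotation texts."""
    # fromstring raises ParseError if the file is not well-formed XML.
    root = ET.fromstring(tei_source)

    # Map each surface's xml:id to the image named in its <graphic> element.
    images = {}
    for surface in root.iter(TEI_NS + "surface"):
        graphic = surface.find(TEI_NS + "graphic")
        images[surface.get(XML_ID)] = graphic.get("url")

    result = {}
    current = None
    for element in root.find(TEI_NS + "text"):
        if element.tag == TEI_NS + "pb":
            # facs="#some-id" points at the surface with that xml:id;
            # a KeyError here means the page break names an unknown image.
            current = images[element.get("facs").lstrip("#")]
            result[current] = []
        elif element.tag == TEI_NS + "l" and current is not None:
            result[current].append(element.text)
    return result


for image, notes in annotations_by_image(SAMPLE).items():
    print(image, notes)
```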

 

 

 

After using TILE, your data may look like this (Notice the new zone tags where areas of interest have been drawn):

 

<?xml version="1.0" encoding="UTF-8"?>

<?oxygen RNGSchema="..//tei_all.rng" type="xml"?>

<?xml-stylesheet type="text/css" href="tei-11-08-08.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">

    <teiHeader/>

            <facsimile>

                <surface xml:id="ham-1611-22277x-bod-c01-image001">
                    <desc>Image 001</desc>
                    <graphic url="promontorymoc.png"/>
                    <zone xmlns="http://www.w3.org/1999/xhtml" lry="8093.23741778644035" lrx="46468.6227394908201" uly="80" ulx="464" xml:id="1282164543676_0"></zone>
                </surface>

                <surface xml:id="ham-1611-22277x-bod-c01-image002">
                    <desc>Image 002</desc>
                    <graphic url="rockart.png"/>
                    <zone xmlns="http://www.w3.org/1999/xhtml" lry="2737" lrx="110.2000122070312552" uly="27" ulx="110.20001220703125" xml:id="1282164644663_0"></zone>
                    <zone xmlns="http://www.w3.org/1999/xhtml" lry="2581" lrx="70.2000122070312535" uly="25" ulx="70.20001220703125" xml:id="1282164654271_0"></zone>
                </surface>

                <surface xml:id="ham-1611-22277x-bod-c01-image003">
                    <desc>Image 003</desc>
                    <graphic url="DeneYeniseianMap.png"/>
                    <zone xmlns="http://www.w3.org/1999/xhtml" lry="8196" lrx="47.20001220703125134" uly="81" ulx="47.20001220703125" xml:id="1282164715057_3"></zone>
                    <zone xmlns="http://www.w3.org/1999/xhtml" lry="18524" lrx="109.2000122070312516" uly="185" ulx="109.20001220703125" xml:id="1282164719345_4"></zone>
                    <zone xmlns="http://www.w3.org/1999/xhtml" lry="20728" lrx="141.2000122070312543" uly="207" ulx="141.20001220703125" xml:id="1282164724041_5"></zone>
                    <zone xmlns="http://www.w3.org/1999/xhtml" lry="10379" lrx="546.200012207031257" uly="103" ulx="546.2000122070312" xml:id="1282164726809_6"></zone>
                </surface>

            </facsimile>

            <text>

                        <pb facs="#ham-1611-22277x-bod-c01-image001"/>

                        <lb/>

                        <l>Semi-circular puckered vamp</l>

                        <lb/>

                        <l>quill work</l>

                        <lb facs="1282164543676_0"/>

                        <l>grass-lining</l>

                        <pb facs="#ham-1611-22277x-bod-c01-image002"/>

                        <lb facs="1282164644663_0"/>

                        <l>Funky Shoulders</l>

                        <lb facs="1282164654271_0"/>

                        <l>Spear</l>

                        <pb facs="#ham-1611-22277x-bod-c01-image003"/>

                        <lb facs="1282164726809_6"/>

                        <l>Location of the Dene-Yeniseian Languages</l>

                        <lb/>

                        <l>Location of the Promontory Point Moccasins</l>

            </text>

</TEI>
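To see how the zones and the text lines in a file like the sample above relate to one another, the links can be followed programmatically. The following is a minimal sketch in Python, not part of TILE itself. It assumes that each `<lb facs="...">` refers to the zone with the matching `xml:id` and labels the `<l>` element that follows it; that reading is inferred from the sample, not taken from the TILE documentation. The function name `zones_with_labels` and the abbreviated `SAMPLE` string are illustrative only.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abbreviated copy of the sample: one surface with one zone.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="ham-1611-22277x-bod-c01-image002">
      <graphic url="rockart.png"/>
      <zone xmlns="http://www.w3.org/1999/xhtml" lry="2737" lrx="110.2000122070312552" uly="27" ulx="110.20001220703125" xml:id="1282164644663_0"></zone>
    </surface>
  </facsimile>
  <text>
    <pb facs="#ham-1611-22277x-bod-c01-image002"/>
    <lb facs="1282164644663_0"/>
    <l>Funky Shoulders</l>
    <lb/>
    <l>Spear</l>
  </text>
</TEI>"""


def zones_with_labels(tei_xml):
    """Return (zone id, image file, label text) for every labeled zone."""
    root = ET.fromstring(tei_xml)

    # Map each zone id to the image file of the surface that contains it.
    zone_image = {}
    for surface in root.iter(f"{{{TEI_NS}}}surface"):
        graphic = surface.find(f"{{{TEI_NS}}}graphic")
        image = graphic.get("url") if graphic is not None else None
        for zone in surface.iter(f"{{{XHTML_NS}}}zone"):
            zone_id = zone.get(XML_ID)
            if zone_id is not None:
                zone_image[zone_id] = image

    # Assumption: an <lb facs="..."> labels the <l> that follows it.
    results = []
    text = root.find(f"{{{TEI_NS}}}text")
    pending = None
    for element in text:
        if element.tag == f"{{{TEI_NS}}}lb":
            pending = element.get("facs")
        elif element.tag == f"{{{TEI_NS}}}l":
            if pending is not None and pending in zone_image:
                results.append((pending, zone_image[pending], element.text))
            pending = None
    return results


print(zones_with_labels(SAMPLE))
```

Lines whose `<lb/>` carries no `facs` attribute (such as "Spear" in the abbreviated sample) are skipped, since no zone has been drawn for them.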

 

Discussion

 

The result is a JSON file that can be used to present your image(s) with annotations and associated areas using the TILE interface. You also have a saved XML file for archival purposes.

 

Known Limitations

Using TIF or raw image data can slow down the interface considerably.

 

Only TEI/TILE-compliant XML files should be used within TILE. Such files use certain elements that TILE needs, which are specified in the TILE documentation under “Making your XML TILE-Ready”.

 

Resources

 

* TILE Documentation Page: http://mith.umd.edu/tile/documentation/

 

Next Steps

 

·       Using manuscript data in TILE

·       Making your own customized JSON file, using the structure written out in the TILE Documentation

·       Adding the text of annotations in TILE

Show graphically key facts about the distribution of a word in a text

Introduction:
This recipe performs three different tasks:

(1) plot the cumulative type/token ratio in a text;
(2) track the occurrence of a particular word in a text and plot all occurrences of the word in a dispersion plot;
(3) show graphically the relative frequencies of the word across n equal sub-parts of the text and add to the plot chi-square and a dispersion measure (default is Juilland's D).

Ingredients:
Raw text file
The R programming/statistical package (base package)

User-specified input:
A search word
Number of parts (n) which the text file will be divided into (task 3)
Dispersion measure to use (task 3)

Steps:
Read into R a text file.
If necessary, clean/organize text.
Tokenize the text file into words.
Make one vector containing all the words of the text file in the order in which they occur in the original text.
Calculate the type/token ratio incrementally for each position and plot it. Show the positions where a search word occurs in red. (task 1 above)

Identify the positions where a search word occurs in the vector.
Make a dispersion plot to graphically show the positions of each occurrence of the search word. (task 2 above)

Divide the vector of words into n equal sub-parts.
Make a barplot showing frequency of the search word within each sub-part.
Calculate frequency and percentages, chi-square and the selected dispersion measure indicating how even/uneven the dispersion of the search word is within a text. Add all these measures to the barplot. (task 3 above)
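The calculations behind these steps can be sketched in a few lines. The recipe itself calls for R; the sketch below uses Python purely to illustrate the arithmetic and leaves out the plotting. All function names are hypothetical. Two assumptions are made: leftover tokens that do not fill the last sub-part are dropped, and Juilland's D is computed as 1 - V / sqrt(n - 1), where V is the population standard deviation of the per-part counts divided by their mean (the form discussed in the dispersion literature; check it against the measure you intend to report).

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cumulative_ttr(tokens):
    """Type/token ratio after each successive token (task 1)."""
    seen = set()
    ratios = []
    for count, token in enumerate(tokens, start=1):
        seen.add(token)
        ratios.append(len(seen) / count)
    return ratios


def positions(tokens, word):
    """Zero-based positions at which the search word occurs (task 2)."""
    return [i for i, token in enumerate(tokens) if token == word]


def part_counts(tokens, word, n):
    """Occurrences of the word in each of n equal sub-parts (task 3).

    Assumption: tokens left over after n equal parts are dropped.
    """
    size = len(tokens) // n
    return [tokens[k * size:(k + 1) * size].count(word) for k in range(n)]


def chi_square(counts):
    """Chi-square against an even spread of the word over all parts."""
    expected = sum(counts) / len(counts)
    if expected == 0:
        raise ValueError("the search word does not occur in the text")
    return sum((c - expected) ** 2 / expected for c in counts)


def juilland_d(counts):
    """Juilland's D: 1 means perfectly even, 0 means all in one part."""
    n = len(counts)
    mean = sum(counts) / n
    if n < 2 or mean == 0:
        raise ValueError("need at least two parts and one occurrence")
    sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / n)  # population sd
    return 1 - (sd / mean) / math.sqrt(n - 1)


tokens = tokenize("The cat sat on the mat. The end.")
counts = part_counts(tokens, "the", 2)
print(positions(tokens, "the"), counts, chi_square(counts), juilland_d(counts))
```

Because the sub-parts are of equal size, using raw counts or percentages per part gives the same value of D.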

Discussion:
The recipe produces three ".png" plots.

For a critical overview of various dispersion measures, see Gries (2008):

Gries, Stefan Th. 2008. Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics 13(4). 403-437.
Available here: http://www.linguistics.ucsb.edu/faculty/stgries/research/Dispersion@IJCL....
Additional web resources (dispersion scripts) for this paper available here: http://www.linguistics.ucsb.edu/faculty/stgries/research/dispersion/link...

Gries, Stefan Th. 2009. Dispersions and adjusted frequencies in corpora: further explorations. In Stefan Th. Gries, Stefanie Wulff, & Mark Davies (eds.), Corpus linguistic applications: current studies, new directions, 197-212. Amsterdam: Rodopi.
Available here: http://www.linguistics.ucsb.edu/faculty/stgries/research/Dispersion_Rodo....


Glossary:
dispersion
vector (in R)

Next steps:


Use a Concordancing Tool to Learn Something About a Topic

This is a recipe to compare different texts.


Ingredients

  1. A concordancing tool (e.g. Wordsmith, AntConc, Voyeur, etc.)
  2. A plain text version of the text(s) or document(s)
  3. Search term(s)

 

Steps

  1. Find a concordancing tool such as AntConc, MonoConc, Wordsmith, or use this one: TAPoR
  2. Locate a plain text file of the document(s) or corpus you would like to use. (See how to convert a text from XML to Plain Text if necessary.) You can also use Mark Davies' online BNC interface, which allows you to apply a concordancing tool to the 100-million-word British National Corpus.
  3. Download the text file(s) if necessary
  4. Upload your text/corpus to the concordancing tool
  5. Identify a search term or search terms (such as a character’s name, or a phrase)
  6. Identify the contextual parameters of the search term (how many characters or words on either side of the search, search within a sentence or across sentences etc.)
  7. Create a concordance for the search term(s)
  8. Tab separate the search term from the left and right contexts if possible
  9. Export to a readable format (spreadsheet)
  10. Begin to sort and/or annotate the concordance lines by adding comments in category columns in the spreadsheet
  11. Analyze results
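If no concordancing tool is at hand, steps 5 to 9 can be approximated with a short script. The sketch below is written in Python and is only an illustration, not how AntConc, Wordsmith, or TAPoR work internally. The names `kwic` and `to_tsv` are hypothetical, the context window is counted in words, and the search is a case-insensitive match on whole tokens.

```python
import re


def kwic(text, term, window=5):
    """Return (left context, keyword, right context) for each hit.

    `window` is the number of words kept on either side of the keyword.
    """
    # Keep apostrophes inside a word so that "witch's" stays one token.
    tokens = re.findall(r"\w+(?:['’]\w+)*", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == term.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows


def to_tsv(rows):
    """Tab separate the keyword from its left and right contexts."""
    return "\n".join("\t".join(row) for row in rows)


sample = "The witch rode a broomstick. Another witch kept a cat."
print(to_tsv(kwic(sample, "witch", window=2)))
```

The tab-separated output can be saved to a file and opened in a spreadsheet, where further columns can be added for sorting and annotation.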


Discussion

Glossary

Concordance line -
Context parameter -
Readable Format -
Window span -
XML -

Next Steps / Further Information

Where to get digitized versions of texts: Project Gutenberg

Case Study
We wish to explore the search term ‘witch(es)’ in contemporary British usage (spoken and written). Specifically we are interested in what type of objects are described as being possessed by witches in this group.

In this case we have chosen to use a site that provides both the corpus of contemporary British texts as well as a built-in concordancing tool  (Mark Davies’ online BNC ...).

We searched on the lemma WITCH (=witch, witches, witch’s, witches’) and chose 100 lines of the concordance, using the default settings of the interface.

We brought the 100 lines into a spreadsheet with the search word tab-separated from the left and right contexts.

We coded each concordance line for any noun possessed by the search word, e.g. ‘broomstick’ in “a witch’s broomstick”.
Results: Objects possessed by witches in our sample set include: “all of their belongings”, broom, broomstick, cat, “cone of power”, coven, cow, cottage, hat, “microphone headsets & miniature televisions”, stew.

 

Locate and Identify

A text may be full of valuable information, but sometimes something specific is sought.  Recipes in this section allow users to quickly separate the wheat from the chaff and locate what they are seeking.

Add a French Language Text to TAPoR

Introduction

This recipe takes a French language text and adds it to the TAPoR workspace for textual analysis.

ALERT!

This recipe ensures that the fundamental task of loading text into a text analysis environment is accomplished correctly. For proper analysis, the text must be interpreted by the computer in the same way in which you enter it, including accented characters. There are a variety of ways in which text can be encoded by operating systems and applications during text entry and storage. This recipe will ensure that your text has been entered and encoded properly for analysis and that you can enter search terms and parameters from your browser to complete analytical tasks.

 

This recipe is available as a PDF download.

 

Ingredients

  • An electronic text in the French language
  • A List Words Tool such as TAPoR List Words Tool
  • A Concordance A concordance is a gathering of passages that "concord" or agree. Usually it is a gathering of passages with a sought for word. Concordances are a form of reading tool that go back to the Middle Ages. They are typically lists of words with their appearances. A concordance for the bible, for example, would have entries for all the content words of the bible in alphabetical order. Each entry would include information about where the word appears and some context. Searching for words on a computer now typically returns a concordance called a Key Word in Context (KWIC) with the sought word down the center and a few words of context on either side. Google returns a type of concordance when you search for a word with an example of the word in context for each page it recommends. See the Wikipedia entry on Concordance (Publishing) Return to Glossary. tool such as the TAPoR Find Words - Concordance Tool
  • A Text Editor capable of converting between character encodings
This recipe is applied to a sample text in Exercise to Add a French language Text to TAPoR

 

Steps

  1. Prepare Text in an external editor to ensure that it is encoded correctly ;
    or
  2. Confirm that the French language web page that you wish to use is encoded properly ;
  3. Log in to TAPoR;
  4. Add your encoded French language text file to MyTexts;
  5. Generate a word list (sorted by frequency) using the TAPoR List Words Tool;
  6. Explore an accented word individually using TAPoR Find Words - Concordance Tool

 

Discussion

 

  • Text Editors

You may require a text editor to encode your text as UTF-8 or Latin-1 so that the accents and special characters of the language are preserved. On a Windows system, this can be done with Notepad, and under Mac OS X with TextEdit. On Unix-based systems, you will find a text editor installed as part of the standard system install. Word processors typically provide a much deeper tool set for formatting text and generally save documents in their native format, which is not appropriate for importing into a text analysis environment. However, they too can be used to save a plain text file with an appropriate encoding by following the appropriate steps.

    • Instructions for saving as UTF-8 or Latin-1 using NotePad
    • Instructions for saving as UTF-8 or Latin-1 using TextEdit
    • Instructions for saving as UTF-8 or Latin-1 using MicrosoftWord
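For readers who prefer a script to an editor menu, the conversion described above can be sketched in Python. This is a minimal illustration, not part of TAPoR: the function name `convert_to_utf8` and the file names are hypothetical, and it assumes you already know the encoding the file was saved in (Latin-1 by default here).

```python
from pathlib import Path


def convert_to_utf8(source_path, target_path, source_encoding="latin-1"):
    """Re-encode a plain text file as UTF-8 and return the decoded text."""
    # Decode the bytes using the encoding the file was originally saved in.
    text = Path(source_path).read_text(encoding=source_encoding)
    # Write the same characters back out as UTF-8.
    Path(target_path).write_text(text, encoding="utf-8")
    return text
```

A call such as `convert_to_utf8("texte.txt", "texte-utf8.txt")` would leave the original file untouched and write a UTF-8 copy beside it. If the source encoding is guessed wrongly, accented characters will come out garbled, so it is worth opening the result and checking a few accented words.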
  • Web Page Encoding

To verify that the web page that you wish to import into TAPoR is encoded in either UTF-8 or Latin-1, check your browser settings. In Internet Explorer, go to the View menu and select the Encoding option. This should read Unicode (UTF-8). In Firefox, the option is Character Encoding under the View menu. This should also read Unicode (UTF-8). If this is not the case, you can manually select the encoding you wish to use from this menu. On other web browsers, the process should be similar. Please consult their help files for specific instructions on character encoding. If you view the page source for your web page, it may contain one of the following HTML lines:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

or

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

Either line indicates that the page is encoded properly for text analysis.
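The same check can be made on the page source with a few lines of Python. The sketch below is an assumption-laden illustration rather than a TAPoR feature: the names `declared_charset` and `is_ready_for_analysis` are hypothetical, and the regular expression only looks for a `charset=` value inside a `<meta>` tag, so it will miss pages that declare their encoding only in HTTP headers.

```python
import re

# Matches charset=... inside a <meta> tag, case-insensitively.
META_CHARSET = re.compile(r"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE)


def declared_charset(html_source):
    """Return the charset named in the first <meta> declaration, or None."""
    match = META_CHARSET.search(html_source)
    return match.group(1).lower() if match else None


def is_ready_for_analysis(html_source):
    """True when the page declares UTF-8 or Latin-1 (ISO-8859-1)."""
    return declared_charset(html_source) in {"utf-8", "iso-8859-1"}
```

A declaration states what the author intended, not what the bytes actually are, so a quick visual check of accented words after import is still worthwhile.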

 

Glossary

  • Text Encoding
One of the most important aspects of the text input process is the encoding of the text you are working with. It must be encoded as either UTF-8 or Latin-1, which provide proper mapping of accented and other extended characters. For example, when properly encoded, the character 'e' is differentiated from the character 'é', and 'é' is not seen as the character 'e' plus some symbol. See the entries below for more background information on encoding. For more information on text encoding, see Text Encoding at Wikipedia.
  • UTF-8 (8-bit Unicode Character Encoding)
Unicode character encoding is an evolution of the ASCII set to permit support of a greater number of alphanumeric characters, including those with diacritical marks such as accents. More information on UTF-8 is available at: Wikipedia
  • Latin-1 (ISO 8859-1)
Latin-1 character encoding is an evolution of the ASCII character set to permit support of a greater number of alphanumeric characters, including those with diacritical marks such as accents. It is being supplanted by the UTF-8 character encoding. More information on Latin-1 is available at: Wikipedia
  • MyTexts
This is an area of TAPoR in which you collect your private texts for analysis. It is also a portal to publicly available texts that have been added by other users. In this area you can view the catalogue of texts available to you or add, edit, tag, and view the contents of specific texts. For more information see the TAPoR Tutorial.
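The Text Encoding entry above says that a properly encoded 'é' is not seen as 'e' plus some symbol. The short Python sketch below shows what goes wrong when text is saved in one encoding and read in another. The helper name `decode_as` is hypothetical and the sample word is chosen only for illustration.

```python
def decode_as(raw_bytes, encoding):
    """Decode raw bytes with the named encoding."""
    return raw_bytes.decode(encoding)


word = "été"
utf8_bytes = word.encode("utf-8")      # each "é" is stored as two bytes
latin1_bytes = word.encode("latin-1")  # each "é" is stored as one byte

# Matching encodings round-trip cleanly.
print(decode_as(utf8_bytes, "utf-8"))      # été
print(decode_as(latin1_bytes, "latin-1"))  # été

# UTF-8 bytes read as Latin-1: each "é" becomes two unrelated characters.
print(decode_as(utf8_bytes, "latin-1"))    # Ã©tÃ©
```

A word list built from the garbled version would count "Ã©tÃ©" as a different word from "été", which is why the encoding needs to be settled before analysis begins.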

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Aggregate Information from the Web to Explore a Particular Concept

Introduction

This recipe uses the Googlizer, frequency lists, concordances and collocation to efficiently explore information from the web for a particular topic.

This recipe is available as a PDF download.

 

Ingredients

  • A topic in which you are interested
  • A Thesaurus Tool such as Eva at POETS
  • An Aggregation Tool such as the TAPoR Googlizer
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A List Words tool such as the TAPoR List Words Tool
This recipe is applied in Exercise to Aggregate Information from the Web To Explore a Particular Concept.

 

Steps

  1. Identify your topic;
  2. Use a thesaurus tool such as Eva at POETS to find related words that may help to identify your topic;
  3. Log in to the TAPoR portal;
  4. Input topic keywords into the TAPoR Googlizer to get a list of URLs and associated HTML;
  5. Compile the results into an Aggregated text;
  6. Generate a frequency list from the aggregated text using the TAPoR List Words Tool to find interesting words for further exploration;
  7. Choose some interesting words from the resulting list;
  8. Use those words one at a time in TAPoR Find Collocates Tool to explore words that appear together;
  9. Choose some interesting combinations of words from the resulting lists;
  10. Use TAPoR Find Words - Concordance Tool to find them in the aggregated text file or Google them as phrases in quotation marks.
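Steps 5 and 6 above, compiling results into one aggregated text and generating a frequency list from it, can be sketched outside TAPoR in Python. This is only an illustration under stated assumptions: the page texts are taken to be already downloaded and stripped of HTML, and the names `tokenize` and `frequency_list` are hypothetical rather than anything the Googlizer or List Words Tool exposes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words made of letters."""
    # [^\W\d_] matches any Unicode letter, so accented words stay whole.
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def frequency_list(pages, top=10):
    """Aggregate several page texts and return the most frequent words."""
    aggregated = "\n".join(pages)
    return Counter(tokenize(aggregated)).most_common(top)
```

Calling `frequency_list(pages, top=20)` on a list of page texts returns (word, count) pairs, from which interesting words can be chosen for the collocation and concordance steps that follow.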

 

Discussion

  • Why Login

When using an aggregation tool such as the Googlizer, you must be able to save text to the Databench (a temporary workspace where you can store your text analysis results in TAPoR for further use) as part of the process. To make this possible you must be logged in to the system to maintain your own personal workspace. If you require access to TAPoR please visit the TAPoR signup page.

 

Glossary

The Googlizer queries the Google search engine using a word or phrase you provide and returns the results of the Google search. For more information see this TAPoR tutorial.

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Explore A Text in a Foreign Language

Introduction

This recipe examines a text in a language in which you are not fluent and demonstrates a strategic approach to comprehension using text analysis tools.

 

Ingredients

  • An electronic text in a language foreign to you to explore and attempt to comprehend;
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A List Words tool such as the TAPoR List Words Tool
This recipe needs a good exercise to demonstrate how to Explore a Text in a Foreign Language.

 

Steps

  1. Take an electronic text in a language in which you are not fluent;
  2. Generate a word list (sorted by frequency) using the TAPoR List Words Tool;
  3. Consider the contexts of the particular words that appear in the resulting list using TAPoR Find Words - Concordance Tool;

 

Discussion

  • In this recipe we will demonstrate a very rudimentary, logical way in which you could explore comprehending a foreign language text by examining the context of words that you do not recognize. Even without a dictionary, noting recurrence and adjacency allows a limited comprehension of a text to be attempted.

 

Glossary

 

  • Stop List
A stop list is a series of words that you may choose to exclude from a particular operation because you deem them to be irrelevant or obstructive to your analysis task. If you are searching for descriptive terms, for example, you may choose to exclude function words normally occurring as part of everyday speech. Your interest may lie only in extraordinary words.

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Identify Simple Themes within a Text

Introduction

This is a recipe to identify simple themes within a sample text.

This recipe and exercise is available as a PDF download.

 

Ingredients

  • An electronic text to explore
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A List Words tool such as the TAPoR List Words Tool
This recipe is applied to a sample text in Identifying Themes within a Text

 

Steps

  1. Find an electronic text at a source such as Project Gutenberg ;
  2. Prepare text by removing any added infrastructure ;
  3. Generate a word list (sorted by frequency) using the TAPoR List Words Tool;
  4. Examine list to see if anything unusual stands out;
  5. Refine word list by applying a stop list;
  6. Re-examine list for particular words you expect or don't expect to see ;
  7. Explore keywords using Find Words - Concordance Tool to find their context;
  8. Identify collocated words using Find Collocates Tool to determine usage patterns;
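The effect of step 5, refining a word list with a stop list, can be seen in a small Python sketch. The stop list here is a tiny hypothetical sample chosen for illustration (real stop lists are much longer), and `word_list` is an assumed helper name, not the TAPoR List Words Tool itself.

```python
import re
from collections import Counter

# A tiny illustrative stop list; real stop lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def word_list(text, stop_words=frozenset()):
    """Return (word, count) pairs, most frequent first, skipping stop words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Comparing `word_list(text)` with `word_list(text, STOP_WORDS)` shows how function words dominate the unfiltered list and how content words surface once they are excluded.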

 

Discussion

  • Finding a Text

Possible sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing text for analysis, you should be aware that academic infrastructure included in the text may obstruct reading the text for its original construction. It may be useful to remove notes and other materials added by subsequent authors from the original work. You can use tools such as TAPoR Extract Text to remove added material.

  • Using a Word List

The word list can provide a first clue about the nature of the text. Questions which can be asked of the word list may include:

  1. What are the basic preoccupations of this text?
  2. What is unusual in the text?
  3. Are there any patterns in the tenses of words used?
  4. Given any expectations, are there words missing from the word list?

 

Glossary

  • Stop List

A stop list is a series of words that you may choose to exclude from a particular operation because you deem them to be irrelevant or obstructive to your analysis task. If you are searching for descriptive terms, for example, you may choose to exclude function words normally occurring as part of everyday speech. Your interest may lie only in extraordinary words.

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Identify Syntactic Dependencies Within a Text

Introduction

This recipe uses tools such as Collocation and Co-occurrence (the number of times two patterns occur in a set order within a set distance of one another in a source text) to explore the syntactic dependencies within the textual construction.

 

Ingredients

  • A text with known dependencies
  • A Collocation tool such as the TAPoR Find Collocates Tool
  • A Co-occurrence tool such as the TAPoR Find Co-occurrence Tool
This recipe needs a good exercise to demonstrate how to Identify Syntactic Dependencies within a Text.

 

Steps

  1. Take an electronic text from a source such as Project Gutenberg or provide one of your own ;
  2. Use the TAPoR Find Collocates Tool using one of your target words as the input;
  3. Examine list for words whose mutual occurrence you are interested in;
  4. Use the TAPoR Find Co-occurrence Tool with various word pairs indicated by the Collocates search to explore the relationship between words.
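Step 4 can be illustrated with a short Python sketch that counts how often one word follows another within a fixed number of words, taking co-occurrence in the sense of two patterns occurring in a set order within a set distance. The function name `cooccurrences` and the default distance of five words are assumptions made for this example and do not describe how the TAPoR Find Co-occurrence Tool is implemented.

```python
import re


def cooccurrences(text, first, second, max_distance=5):
    """Count the times `second` follows `first` within `max_distance` words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    first, second = first.lower(), second.lower()
    count = 0
    for i, word in enumerate(words):
        if word == first:
            # Look only at the words that follow, up to the distance limit.
            window = words[i + 1 : i + 1 + max_distance]
            count += window.count(second)
    return count
```

Because order matters, `cooccurrences(text, "king", "queen")` and `cooccurrences(text, "queen", "king")` can give different counts, and comparing the two is one simple way to probe a dependency between words.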

 

Discussion

  • Finding a Text

Sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing text for analysis, you should be aware that… what improves… what can obstruct?.

 

Glossary

 

A Complete Glossary

 

Next Steps/Further Information

Tutorial

Recipes in this section teach the use of basic tools.  Mastering the use of these tools will greatly increase users' abilities to make use of the methods in the other sections, as well as being useful in and of themselves.

Build a Simple Concordance

Introduction

This is a recipe to build a simple concordance from a text.

 

Ingredients

  • An electronic text to explore
  • A Concordance tool such as the TAPoR Find Words - Concordance Tool
This recipe is applied to a sample text in Build a Simple Concordance

 

Steps

  1. Get an electronic text from a source such as Project Gutenberg ;
  2. Concord keywords using Find Words - Concordance Tool;
  3. Explore the resulting concordance to appreciate the keyword in its context;
  4. Export the concordance to an external editor.
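To show what the concordance in steps 2 and 3 contains, here is a minimal keyword-in-context (KWIC) sketch in Python. It is an illustration only: the function name `concordance` and the 30-character context width are assumptions, and the TAPoR Find Words - Concordance Tool offers options this sketch does not attempt.

```python
import re


def concordance(text, keyword, width=30):
    """Return keyword-in-context lines with the keyword aligned in the centre."""
    # Collapse line breaks so context does not stop at the end of a line.
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width) : match.start()]
        right = flat[match.end() : match.end() + width]
        # Pad both sides so every keyword starts in the same column.
        lines.append(f"{left:>{width}}{match.group()}{right:<{width}}")
    return lines
```

Joining the returned lines with newlines and writing them to a file gives a plain text concordance that can be opened in an external editor, as in step 4.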

 

Discussion

  • Finding a Text

Possible sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing text for analysis, you should be aware that academic infrastructure included in the text may obstruct reading the text for its original construction. It may be useful to remove notes and other materials added by subsequent authors from the original work. You can use tools such as TAPoR Extract Text to remove added material.

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Find Collocated Words

Introduction

This is a recipe to find the words collocated with a keyword.

 

Ingredients

  • An electronic text to explore
  • A collocation tool such as the TAPoR Find Collocates Tool
This recipe is applied to a sample text in Find Collocated Words

 

Steps

  1. Get an electronic text from a source such as Project Gutenberg;
  2. Choose a keyword you are interested in;
  3. Find collocated words using the TAPoR Find Collocates Tool;
  4. Explore the results for interesting combinations of words.
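
The idea behind a collocation tool can be sketched in a few lines of Python. This is a simplified illustration, not the TAPoR Find Collocates Tool: the function names `tokenize` and `collocates` and the `window` parameter are assumptions for this example. It reports raw co-occurrence counts only, whereas a fuller treatment would also test whether a pairing occurs more often than chance would predict.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into simple word tokens.

    Simplification: only the letters a-z and apostrophes are kept, so
    accented characters and digits are dropped.
    """
    return re.findall(r"[a-z']+", text.lower())

def collocates(text, keyword, window=2):
    """Count the words that appear within `window` tokens of each keyword hit."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    counts = Counter()
    for index, token in enumerate(tokens):
        if token != keyword:
            continue
        low = max(0, index - window)
        high = min(len(tokens), index + window + 1)
        for neighbour in range(low, high):
            if neighbour != index:  # skip the keyword occurrence itself
                counts[tokens[neighbour]] += 1
    return counts
```

`collocates(text, "whale").most_common(20)` lists the twenty words found most often near the keyword, which is a reasonable starting point for spotting interesting combinations.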

 

Discussion

  • Finding a Text

Possible sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing a text for analysis, be aware that academic apparatus included in the text may get in the way of reading the work as it was originally written. It may be useful to remove notes and other materials added by subsequent authors from the original work. You can use tools such as TAPoR Extract Text to remove added material.

 

Glossary

A Complete Glossary

 

Next Steps/Further Information

 

Comments

Listing Words to Find Themes

Introduction

This is a recipe to list the words in a text in order to suggest its themes.

 

Ingredients

  • An electronic text to explore
  • A word list tool such as the TAPoR List Words Tool
This recipe is applied to a sample text in List Words to Identify Themes

 

Steps

  1. Get an electronic text from a source such as Project Gutenberg;
  2. Generate a word list (sorted by frequency) using the TAPoR List Words Tool;
  3. Examine the list to see if anything unusual stands out;
  4. Refine the word list by applying a stop list;
  5. Re-examine the list for particular words you expect or don't expect to see.
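
The steps above can be approximated in Python for readers curious about what a word list tool computes. This is a minimal sketch, not the TAPoR List Words Tool: the function name `word_list` and the tiny `STOP_WORDS` set are hypothetical, and a real stop list would be much longer and chosen to suit the analysis.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list used only for illustration.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "was", "it", "is"})

def word_list(text, stop_words=frozenset()):
    """Return (word, count) pairs sorted from most to least frequent.

    Words found in `stop_words` are left out of the list. Simplification:
    tokens are runs of the letters a-z and apostrophes after lower-casing.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token not in stop_words)
    return counts.most_common()
```

Comparing `word_list(text)` with `word_list(text, STOP_WORDS)` shows how applying a stop list pushes function words out of the way so that content words rise to the top.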

 

Discussion

  • Finding a Text

Possible sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing a text for analysis, be aware that academic apparatus included in the text may get in the way of reading the work as it was originally written. It may be useful to remove notes and other materials added by subsequent authors from the original work. You can use tools such as TAPoR Extract Text to remove added material.

  • Using a Word List

The word list can provide a first clue about the nature of the text. Questions which can be asked of the word list may include:

  1. What are the basic preoccupations of this text?
  2. What is unusual in the text?
  3. Are there any patterns in the tenses of words used?
  4. Given any expectations, are there words missing from the word list?

 

Glossary

  • Stop List
A stop list is a series of words that you may choose to exclude from a particular operation because you deem them to be irrelevant or obstructive to your analysis task. If you are searching for descriptive terms, for example, you may choose to exclude function words that normally occur as part of everyday speech. Your interest may lie only in extraordinary words.

A Complete Glossary

 

Next Steps/Further Information

 

Comments