In tokenization, a separator is the character or string of characters delineating where one token ends and the next begins.
When tokenizing, the separator may be retained at the beginning or end of the token, or stripped from it entirely.
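
The three ways of handling a separator can be illustrated with a minimal Python sketch. The function name `tokenize` and its `keep` parameter are hypothetical names chosen for this illustration, not part of any particular tool; the sketch also assumes a literal (non-regex) separator and drops empty tokens.

```python
def tokenize(text, separator, keep="strip"):
    """Split text on a literal separator.

    keep="strip": remove the separator from the tokens entirely
    keep="end":   retain the separator at the end of the preceding token
    keep="start": retain the separator at the beginning of the following token
    """
    if not separator:
        raise ValueError("separator must be a non-empty string")

    pieces = text.split(separator)

    if keep == "strip":
        tokens = pieces
    elif keep == "end":
        # Every piece except the last was followed by a separator.
        tokens = [p + separator for p in pieces[:-1]] + [pieces[-1]]
    elif keep == "start":
        # Every piece except the first was preceded by a separator.
        tokens = [pieces[0]] + [separator + p for p in pieces[1:]]
    else:
        raise ValueError("keep must be 'strip', 'end', or 'start'")

    # Drop empty tokens (an assumption made for this sketch).
    return [t for t in tokens if t]


if __name__ == "__main__":
    sample = "one,two,three"
    print(tokenize(sample, ","))           # ['one', 'two', 'three']
    print(tokenize(sample, ",", "end"))    # ['one,', 'two,', 'three']
    print(tokenize(sample, ",", "start"))  # ['one', ',two', ',three']
```

Whether to keep the separator depends on the analysis: stripping it suits word counts, while retaining it (for example, sentence-ending punctuation) preserves information about where each token sat in the source text.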