i6_core.corpus.stats
- class i6_core.corpus.stats.CountCorpusWordFrequenciesJob(*args, **kwargs)
Extracts a list of words and their counts from the provided Bliss corpus.
- Parameters:
bliss_corpus (Path) – path to corpus file
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
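To illustrate what "words and their counts" means for a Bliss corpus, here is a minimal standalone sketch. It is not the i6_core implementation and does not use Sisyphus. The helper name `count_corpus_word_frequencies` is hypothetical, and the sketch assumes the corpus is passed as an uncompressed XML string and that words are the whitespace-separated tokens of each `<orth>` element.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_corpus_word_frequencies(corpus_xml: str) -> Counter:
    """Count whitespace-separated words in all <orth> elements of a Bliss corpus.

    Hypothetical helper for illustration only; not the i6_core job itself.
    Assumes an uncompressed XML string rather than a (possibly gzipped) file path.
    """
    root = ET.fromstring(corpus_xml)
    counts = Counter()
    # iter() walks the whole tree, so segments in nested subcorpora are included too
    for orth in root.iter("orth"):
        if orth.text:
            counts.update(orth.text.split())
    return counts


example_corpus = """<corpus name="demo">
  <recording name="rec1" audio="rec1.wav">
    <segment name="1" start="0.0" end="1.5"><orth>hello world</orth></segment>
    <segment name="2" start="1.5" end="3.0"><orth>hello again</orth></segment>
  </recording>
</corpus>"""

# Print words from most to least frequent
for word, count in count_corpus_word_frequencies(example_corpus).most_common():
    print(word, count)
```

Using `collections.Counter` keeps the counting to a single `update` call per segment and gives frequency-sorted output via `most_common()`.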
- class i6_core.corpus.stats.ExtractOovWordsFromCorpusJob(*args, **kwargs)
Extracts the out-of-vocabulary (OOV) words of a given corpus with respect to a given lexicon.
- Parameters:
bliss_corpus (Union[Path, str]) – path to corpus file
bliss_lexicon (Union[Path, str]) – path to lexicon
casing (str) – changes the casing of the orthography (options: upper, lower, none). Note that str.upper() is problematic for German, since ß -> SS (see https://bugs.python.org/issue34928).
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
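The OOV extraction and the effect of the `casing` option can be sketched as follows. This is a hypothetical standalone helper (`extract_oov_words`), not the i6_core implementation. It assumes both corpus and lexicon are given as uncompressed XML strings, that the lexicon vocabulary is the set of `<orth>` entries under `<lemma>` elements, and (an assumption, since the documentation does not say) that `casing` is applied only to the corpus orthography while lexicon entries are compared as-is.

```python
import xml.etree.ElementTree as ET


def _apply_casing(text: str, casing: str) -> str:
    # Mirrors the documented options: "upper", "lower", "none"
    if casing == "upper":
        return text.upper()  # caution: "ß".upper() == "SS", so the word length can change
    if casing == "lower":
        return text.lower()
    if casing == "none":
        return text
    raise ValueError(f"unknown casing: {casing!r}")


def extract_oov_words(corpus_xml: str, lexicon_xml: str, casing: str = "none") -> list:
    """Return the sorted corpus words that have no matching lemma orth in the lexicon.

    Hypothetical helper for illustration only; casing is applied to the corpus side only.
    """
    lexicon_root = ET.fromstring(lexicon_xml)
    vocabulary = {
        orth.text.strip()
        for lemma in lexicon_root.iter("lemma")
        for orth in lemma.findall("orth")
        if orth.text
    }
    corpus_root = ET.fromstring(corpus_xml)
    oov = set()
    for orth in corpus_root.iter("orth"):
        if orth.text:
            for word in _apply_casing(orth.text, casing).split():
                if word not in vocabulary:
                    oov.add(word)
    return sorted(oov)


lexicon = """<lexicon>
  <lemma><orth>HALLO</orth><phon>h a l o</phon></lemma>
  <lemma><orth>STRASSE</orth><phon>S t r a s @</phon></lemma>
</lexicon>"""

corpus = """<corpus name="demo">
  <recording name="rec1" audio="rec1.wav">
    <segment name="1" start="0.0" end="2.0"><orth>hallo straße welt</orth></segment>
  </recording>
</corpus>"""

print(extract_oov_words(corpus, lexicon, casing="upper"))
print(extract_oov_words(corpus, lexicon, casing="none"))
```

With `casing="upper"`, "straße" becomes "STRASSE" and silently matches the lexicon entry, which is the kind of German-specific side effect the `casing` note warns about.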