i6_core.corpus.stats

class i6_core.corpus.stats.CountCorpusWordFrequenciesJob(*args, **kwargs)

Extracts a list of words and their counts from the provided bliss corpus.

Parameters:

bliss_corpus (Path) – path to corpus file

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
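
As a rough illustration of what counting word frequencies over a bliss corpus involves, the following minimal sketch parses a small in-memory corpus and counts whitespace-separated words. It is not the job's actual implementation: the helper name `count_word_frequencies`, the sample data, and the assumption that orthographies are stored in `<orth>` elements inside `<segment>` elements are all illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical in-memory corpus; the element layout is an assumption
# about the bliss XML format, not taken from the documentation above.
EXAMPLE_CORPUS = """<corpus name="demo">
  <recording name="rec1" audio="rec1.wav">
    <segment name="seg1" start="0.0" end="1.5">
      <orth> hello world </orth>
    </segment>
    <segment name="seg2" start="1.5" end="3.0">
      <orth> hello again </orth>
    </segment>
  </recording>
</corpus>"""


def count_word_frequencies(corpus_xml: str) -> Counter:
    """Count whitespace-separated words over all <orth> elements."""
    root = ET.fromstring(corpus_xml)
    counts = Counter()
    for orth in root.iter("orth"):
        # An empty <orth/> element has text None, so fall back to "".
        counts.update((orth.text or "").split())
    return counts


word_counts = count_word_frequencies(EXAMPLE_CORPUS)

# Print the most frequent words first, one "count word" pair per line.
for word, count in word_counts.most_common():
    print(count, word)
```

The real job takes a path to a corpus file and runs inside a Sisyphus workflow; the sketch only mirrors the counting step.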

class i6_core.corpus.stats.ExtractOovWordsFromCorpusJob(*args, **kwargs)

Extracts the out-of-vocabulary (OOV) words based on a given corpus and lexicon.

Parameters:
  • bliss_corpus (Union[Path, str]) – path to corpus file

  • bliss_lexicon (Union[Path, str]) – path to lexicon

  • casing (str) – changes the casing of the orthography (options: upper, lower, none). Note that str.upper() is problematic for German text, since ß is converted to SS; see https://bugs.python.org/issue34928
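
The casing pitfall can be reproduced with plain Python strings. The helper `apply_casing` below is hypothetical and only mirrors the three documented option names; how the job applies the option internally is not shown here.

```python
def apply_casing(text: str, casing: str = "none") -> str:
    """Hypothetical helper mirroring the options upper, lower and none."""
    if casing == "upper":
        return text.upper()
    if casing == "lower":
        return text.lower()
    if casing == "none":
        return text
    raise ValueError(f"unknown casing option: {casing!r}")


word = "straße"
upper = apply_casing(word, "upper")
round_trip = apply_casing(upper, "lower")

print(upper)                  # STRASSE
print(len(word), len(upper))  # 6 7
print(round_trip == word)     # False
```

Upper-casing replaces ß with two characters, so the word changes length and lower-casing the result does not restore the original spelling. A word that is present in the lexicon as "straße" may therefore no longer match after upper-casing.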

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
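
The idea behind out-of-vocabulary extraction can be sketched as a set difference between the words in the corpus and the orthographies in the lexicon. This is an illustrative sketch under stated assumptions, not the job's implementation: the helper `find_oov_words`, its `lowercase` flag, the sample data, and the XML layout (`<orth>` elements in both corpus segments and lexicon lemmas) are hypothetical, and only the corpus side is lower-cased here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element layout is an assumption.
EXAMPLE_LEXICON = """<lexicon>
  <lemma><orth>hello</orth><phon>hh ah l ow</phon></lemma>
  <lemma><orth>world</orth><phon>w er l d</phon></lemma>
</lexicon>"""

EXAMPLE_OOV_CORPUS = """<corpus name="demo">
  <recording name="rec1" audio="rec1.wav">
    <segment name="seg1" start="0.0" end="1.5">
      <orth> Hello brave new world </orth>
    </segment>
  </recording>
</corpus>"""


def find_oov_words(corpus_xml: str, lexicon_xml: str, lowercase: bool = False) -> list:
    """Return the sorted corpus words that have no lexicon orthography."""
    vocabulary = {
        (orth.text or "").strip()
        for orth in ET.fromstring(lexicon_xml).iter("orth")
    }
    oov = set()
    for orth in ET.fromstring(corpus_xml).iter("orth"):
        text = orth.text or ""
        if lowercase:
            # Assumption: only the corpus orthography is re-cased.
            text = text.lower()
        oov.update(word for word in text.split() if word not in vocabulary)
    return sorted(oov)


oov_case_sensitive = find_oov_words(EXAMPLE_OOV_CORPUS, EXAMPLE_LEXICON)
oov_lowercased = find_oov_words(EXAMPLE_OOV_CORPUS, EXAMPLE_LEXICON, lowercase=True)

print(oov_case_sensitive)  # ['Hello', 'brave', 'new']
print(oov_lowercased)      # ['brave', 'new']
```

Without case normalization, "Hello" is reported as out of vocabulary even though "hello" is in the lexicon, which is the kind of mismatch a casing option is meant to address.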