`i6_core.lm.vocabulary`¶

class i6_core.lm.vocabulary.LmIndexVocabulary(vocab: sisyphus.job_path.Path, vocab_size: sisyphus.job_path.Variable, unknown_token: Union[sisyphus.job_path.Variable, str])¶

unknown_token: Union[Variable, str]¶

vocab: Path¶

vocab_size: Variable¶

class i6_core.lm.vocabulary.LmIndexVocabularyFromLexiconJob(*args, **kwargs)¶

Computes a <word>: <index> vocabulary file from a bliss lexicon for word-level LM training.

Sentence begin and sentence end both point to index 0, and unknown points to index 1. These tokens are taken directly from the lexicon via the “special” marking:

<lemma special="sentence-begin"> -> index 0

<lemma special="sentence-end"> -> index 0

<lemma special="unknown"> -> index 1

If <synt> tokens are provided in a lemma, they are used instead of <orth>.

CAUTION: Be aware of: https://github.com/rwth-i6/returnn/issues/1245 when using Returnn’s LmDataset

Parameters:

bliss_lexicon – use the lemmas from the lexicon to define the indices
count_ordering_text – optional text that can be used to define the index order based on the lemma count
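
The index assignment described above can be illustrated with a minimal standalone sketch. It is not the job's actual implementation: `build_index_vocab`, `lemma_token` and the sample lexicon are hypothetical names, and the sketch assumes that regular lemmas start at index 2, that only the first `<tok>` or `<orth>` of a lemma is used, and that the optional count text orders regular lemmas by descending count.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical toy lexicon in bliss-like XML, for illustration only.
LEXICON_XML = """<lexicon>
  <lemma special="sentence-begin"><orth>[SENTENCE-BEGIN]</orth><synt><tok>&lt;s&gt;</tok></synt></lemma>
  <lemma special="sentence-end"><orth>[SENTENCE-END]</orth><synt><tok>&lt;/s&gt;</tok></synt></lemma>
  <lemma special="unknown"><orth>[UNKNOWN]</orth><synt><tok>&lt;UNK&gt;</tok></synt></lemma>
  <lemma><orth>hello</orth></lemma>
  <lemma><orth>world</orth></lemma>
</lexicon>"""


def lemma_token(lemma):
    """Return the <synt> token if present, else the first <orth>, else None."""
    for path in ("synt/tok", "orth"):
        node = lemma.find(path)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None


def build_index_vocab(lexicon_xml, count_text=None):
    """Map tokens to indices: sentence begin/end -> 0, unknown -> 1, rest -> 2..."""
    root = ET.fromstring(lexicon_xml)
    vocab = {}
    regular = []
    for lemma in root.iter("lemma"):
        token = lemma_token(lemma)
        if token is None:
            continue  # lemma without a usable token, skipped in this sketch
        special = lemma.get("special")
        if special in ("sentence-begin", "sentence-end"):
            vocab[token] = 0
        elif special == "unknown":
            vocab[token] = 1
        elif special is None:
            regular.append(token)
    if count_text is not None:
        # Assumption: more frequent words get smaller indices; ties keep lexicon order.
        counts = Counter(count_text.split())
        regular = sorted(regular, key=lambda word: -counts[word])
    for index, word in enumerate(regular, start=2):
        vocab[word] = index
    return vocab
```

Calling `build_index_vocab(LEXICON_XML)` keeps the lexicon order for regular lemmas, while passing a text such as `"world world hello"` as `count_text` moves `world` ahead of `hello`.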

run()¶

tasks()¶

Returns: yields Tasks
Return type: list[sisyphus.task.Task]

class i6_core.lm.vocabulary.VocabularyFromLmJob(*args, **kwargs)¶

Extracts the vocabulary from an existing LM. Currently only ARPA files are supported as input.

Parameters: lm_file (Path) – path to the LM ARPA file
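
As a rough illustration of what extracting a vocabulary from an ARPA file involves, the sketch below collects the words listed in the `\1-grams:` section. The helper `vocabulary_from_arpa` and the sample data are hypothetical, and the job's real `run()` may work differently (for example in how it reads the file or which tokens it keeps).

```python
# Hypothetical miniature ARPA file, for illustration only.
ARPA_TEXT = r"""\data\
ngram 1=4
ngram 2=1

\1-grams:
-1.0 <s> -0.5
-1.0 </s>
-2.0 hello -0.3
-2.0 world

\2-grams:
-0.5 <s> hello

\end\
"""


def vocabulary_from_arpa(lines):
    """Return the words of the unigram section in file order."""
    vocab = []
    in_unigrams = False
    for raw_line in lines:
        line = raw_line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if not in_unigrams or not line:
            continue
        if line.startswith("\\"):
            break  # next section header (e.g. \2-grams:) or \end\
        # Unigram entry: <log10 prob> <word> [<backoff weight>]
        fields = line.split()
        vocab.append(fields[1])
    return vocab
```

For the sample above, `vocabulary_from_arpa(ARPA_TEXT.splitlines())` yields the four unigram entries, including the sentence boundary tokens.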

run()¶

tasks()¶

Returns: yields Tasks
Return type: list[sisyphus.task.Task]
