i6_core.lm.srilm
- class i6_core.lm.srilm.ComputeBestMixJob(*args, **kwargs)
Compute the best mixture weights for a combination of count-based LMs from the given PPL logs
- Parameters:
ppl_logs – List of PPL logs to compute the weights from
compute_best_mix_exe – Path to the SRILM compute-best-mix executable
- run()
Calls the SRILM script, extracts the mixture weights from the log, and relinks the log to the output folder
- tasks()
- Returns:
yields Task objects
- Return type:
list[sisyphus.task.Task]
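The weight extraction in run() can be sketched as follows. This is a minimal illustration, not the job's actual implementation; the sample log line is hypothetical, and the exact "best lambda" format may vary across SRILM versions.

```python
import re


def parse_best_mix_weights(log_text: str) -> list[float]:
    """Extract mixture weights from a compute-best-mix log.

    Assumes the log contains a line like 'best lambda (0.52 0.31 0.17)';
    verify against the output of your SRILM version.
    """
    match = re.search(r"best lambda \(([^)]+)\)", log_text)
    if match is None:
        raise ValueError("no 'best lambda' line found in log")
    return [float(w) for w in match.group(1).split()]


# Hypothetical log excerpt for illustration:
log = "iteration 9, ppl = 123.4\nbest lambda (0.52 0.31 0.17)\n"
weights = parse_best_mix_weights(log)
```

The extracted weights are in the same order as the PPL logs passed to the job and sum to one.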
- class i6_core.lm.srilm.ComputeNgramLmJob(*args, **kwargs)
Generate a count-based LM with SRILM
- Parameters:
ngram_order – Maximum n-gram order
data – Either a text file or a counts file to read from; set data_mode accordingly. A counts file can come from CountNgramsJob.out_counts
data_mode – Defines whether the input format is text based or count based
vocab – Vocabulary file, one word per line
extra_ngram_args – Extra arguments for the execution call, e.g. ['-kndiscount']
count_exe – Path to the SRILM ngram-count executable
mem_rqmt – Memory requirement of the Job (not hashed)
time_rqmt – Time requirement of the Job (not hashed)
cpu_rqmt – Number of CPUs required for the Job (not hashed)
fs_rqmt – Space on the fileserver required for the Job, e.g. "200G" (not hashed)
Example options for extra_ngram_args: -kndiscount -interpolate -debug <int> -addsmooth <int>
- compress()
Executes the previously created compression script and relinks the LM from the work folder to the output folder
- create_files()
Creates the bash scripts for LM creation and compression that are executed in the run Task
- classmethod hash(kwargs)
Removes the queue requirements from the hashing
- run()
Executes the previously created LM script and relinks the vocabulary from the work folder to the output folder
- tasks()
- Returns:
yields Task objects
- Return type:
list[sisyphus.task.Task]
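The command the job assembles can be sketched like this. The flags (-order, -text, -read, -vocab, -lm) are standard ngram-count options, but the exact command the job writes into its bash script, and the output file name used here, are assumptions for illustration.

```python
def build_ngram_count_cmd(
    count_exe: str,
    ngram_order: int,
    data: str,
    vocab: str,
    extra_ngram_args: list[str],
    data_mode: str = "text",
) -> list[str]:
    """Sketch of an ngram-count invocation.

    -text reads raw training text, -read reads an existing counts file;
    which one is used depends on data_mode. Output name 'lm.gz' is
    an assumption for this example.
    """
    data_flag = "-text" if data_mode == "text" else "-read"
    cmd = [count_exe, "-order", str(ngram_order), data_flag, data, "-vocab", vocab]
    cmd += extra_ngram_args  # e.g. ["-kndiscount", "-interpolate"]
    cmd += ["-lm", "lm.gz"]
    return cmd


cmd = build_ngram_count_cmd(
    "ngram-count", 4, "train.txt", "vocab.txt", ["-kndiscount", "-interpolate"]
)
```

Passing a counts file instead of text only changes the data flag, which is why data and data_mode must be set consistently.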
- class i6_core.lm.srilm.ComputeNgramLmPerplexityJob(*args, **kwargs)
Calculate the perplexity of an n-gram LM via SRILM
- Parameters:
ngram_order – Maximum n-gram order
lm – LM to evaluate
eval_data – Data to calculate the PPL on
vocab – Vocabulary file
set_unknown_flag – whether to set the unknown-word flag
extra_ppl_args – Extra arguments for the execution call, e.g. '-debug 2'
ngram_exe – Path to the SRILM ngram executable
mem_rqmt – Memory requirement of the Job (not hashed)
time_rqmt – Time requirement of the Job (not hashed)
cpu_rqmt – Number of CPUs required for the Job (not hashed)
fs_rqmt – Space on the fileserver required for the Job, e.g. "200G" (not hashed)
- create_files()
Creates the bash script that is executed in the run Task
- get_ppl()
Extracts the various outputs from the ppl.log file
- classmethod hash(kwargs)
Removes the queue requirements from the hashing
- run()
Executes the previously created script and relinks the log file from the work folder to the output folder
- tasks()
- Returns:
yields Task objects
- Return type:
list[sisyphus.task.Task]
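For reference, the perplexity value that ngram -ppl reports (and that get_ppl() extracts from the log) is derived from the total log10 probability and the token counts. A minimal sketch of that relationship, assuming SRILM's usual convention that each sentence-end token counts as one prediction and OOV words are excluded:

```python
def srilm_ppl(logprob: float, words: int, sentences: int, oovs: int = 0) -> float:
    """Perplexity as reported by SRILM's ngram -ppl:
    10 ** (-logprob / (words - oovs + sentences)),
    where logprob is the total base-10 log probability of the data.
    """
    return 10.0 ** (-logprob / (words - oovs + sentences))


# E.g. logprob = -200 over 90 in-vocabulary words and 10 sentences:
ppl = srilm_ppl(-200.0, words=90, sentences=10)  # 10 ** 2 = 100.0
```

SRILM additionally reports ppl1, which uses the same formula without the sentence count in the denominator.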
- class i6_core.lm.srilm.CountNgramsJob(*args, **kwargs)
Count n-grams with SRILM
- Parameters:
ngram_order – Maximum n-gram order
data – Input data, read as a text file
extra_count_args – Extra arguments for the execution call, e.g. ['-unk']
count_exe – Path to the SRILM ngram-count executable
mem_rqmt – Memory requirement of the Job (not hashed)
time_rqmt – Time requirement of the Job (not hashed)
cpu_rqmt – Number of CPUs required for the Job (not hashed)
fs_rqmt – Space on the fileserver required for the Job, e.g. "200G" (not hashed)
Example options for extra_count_args: -unk
- create_files()
Creates the bash script that is executed in the run Task
- classmethod hash(kwargs)
Removes the queue requirements from the hashing
- run()
Executes the previously created bash script and relinks the outputs from the work folder to the output folder
- tasks()
- Returns:
yields Task objects
- Return type:
list[sisyphus.task.Task]
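Conceptually, the counting that ngram-count performs can be sketched in pure Python. This simplified version only illustrates the idea (sentence-boundary markers, all orders up to the maximum); it omits everything else ngram-count does, such as count merging, class expansion, and discounting statistics.

```python
from collections import Counter


def count_ngrams(sentences: list[str], order: int) -> Counter:
    """Count all n-grams up to `order`, wrapping each sentence in
    <s> ... </s> boundary markers as SRILM does."""
    counts: Counter = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i : i + n])] += 1
    return counts


counts = count_ngrams(["a b", "a"], order=2)
```

The resulting counts file is exactly what ComputeNgramLmJob consumes when data_mode is count based.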
- class i6_core.lm.srilm.InterpolateNgramLmJob(*args, **kwargs)
Uses SRILM to interpolate different LMs with previously calculated weights
- Parameters:
ngram_lms – List of language models to interpolate; format: ARPA or compressed ARPA
weights – Weights of the different language models; must be in the same order as ngram_lms
ngram_order – Maximum n-gram order
extra_interpolation_args – Additional arguments for the interpolation
ngram_exe – Path to the SRILM ngram executable
mem_rqmt – Memory requirement of the Job (not hashed)
time_rqmt – Time requirement of the Job (not hashed)
cpu_rqmt – Number of CPUs required for the Job (not hashed)
fs_rqmt – Space on the fileserver required for the Job, e.g. "200G" (not hashed)
- classmethod hash(parsed_args)
Removes the queue requirements from the hashing
- run()
Runs the interpolation call and relinks the resulting LM from the work folder to the output folder
- tasks()
- Returns:
yields Task objects
- Return type:
list[sisyphus.task.Task]
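The mathematical operation behind this job is static linear interpolation: for each word and history, the combined probability is the weighted sum of the component LM probabilities. A minimal sketch of that combination rule (not the ARPA-file merging SRILM actually performs):

```python
def interpolate(probs: list[float], weights: list[float]) -> float:
    """Static linear interpolation: p(w|h) = sum_i weight_i * p_i(w|h).

    weights must be in the same order as the component LMs and sum to 1,
    matching the requirement on the job's `weights` parameter.
    """
    assert len(probs) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-6, "weights must sum to 1"
    return sum(w * p for w, p in zip(weights, probs))


# Two LMs assigning 0.2 and 0.4 to the same event, mixed 50/50:
p = interpolate([0.2, 0.4], [0.5, 0.5])  # 0.3
```

The weights are typically the output of ComputeBestMixJob.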
- class i6_core.lm.srilm.PruneLMWithHelperLMJob(*args, **kwargs)
Job that prunes the given LM with the help of a helper LM
- Parameters:
ngram_order – Maximum n-gram order
lm – LM to be pruned
prune_thresh – Pruning threshold
helper_lm – helper ("Katz") LM used to prune the other LM
ngram_exe – Path to the SRILM ngram executable
mem_rqmt – Memory requirement of the Job (not hashed)
time_rqmt – Time requirement of the Job (not hashed)
cpu_rqmt – Number of CPUs required for the Job (not hashed)
fs_rqmt – Space on the fileserver required for the Job, e.g. "200G" (not hashed)
- create_files()
Creates the bash script that is executed in the run Task
- classmethod hash(kwargs)
Removes the queue requirements from the hashing
- run()
Executes the previously created script and relinks the LM from the work folder to the output folder
- tasks()
- Returns:
yields Task objects
- Return type:
list[sisyphus.task.Task]
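A sketch of the kind of ngram invocation such a pruning job assembles. This is an assumption for illustration, not the job's actual script: -lm, -order, -prune, and -write-lm are standard ngram options, while the -prune-history-lm flag (which supplies the helper LM for the history marginals) and the output name should be verified against your SRILM version.

```python
def build_prune_cmd(
    ngram_exe: str,
    ngram_order: int,
    lm: str,
    prune_thresh: float,
    helper_lm: str,
    out_lm: str = "pruned_lm.gz",  # output name assumed for this example
) -> list[str]:
    """Sketch of an entropy-pruning call with a helper ('Katz') LM.

    -prune drops n-grams whose removal changes the model entropy by
    less than the threshold; the helper LM (flag name assumed) avoids
    the known problems of entropy-pruning Kneser-Ney models directly.
    """
    return [
        ngram_exe,
        "-order", str(ngram_order),
        "-lm", lm,
        "-prune", str(prune_thresh),
        "-prune-history-lm", helper_lm,
        "-write-lm", out_lm,
    ]


cmd = build_prune_cmd("ngram", 4, "lm.gz", 1e-8, "katz_lm.gz")
```

The threshold trades model size against perplexity: larger values prune more aggressively.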