i6_core.lm.srilm

class i6_core.lm.srilm.ComputeBestMixJob(*args, **kwargs)

Compute the best mixture weights for a combination of count LMs based on the given PPL logs

Parameters:
  • ppl_logs – List of PPL Logs to compute the weights from

  • compute_best_mix_exe – Path to srilm compute_best_mix executable
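
A minimal usage sketch inside a Sisyphus config; all paths are placeholders, and the PPL logs would typically be produced by ComputeNgramLmPerplexityJob (whose output attribute names are not shown on this page):

    from sisyphus import tk
    from i6_core.lm.srilm import ComputeBestMixJob

    # one PPL log per LM to be mixed, all computed on the same evaluation data
    ppl_logs = [tk.Path("/path/to/lm_a.ppl.log"), tk.Path("/path/to/lm_b.ppl.log")]

    best_mix = ComputeBestMixJob(
        ppl_logs=ppl_logs,
        compute_best_mix_exe=tk.Path("/opt/srilm/bin/compute-best-mix"),  # placeholder install path
    )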

run()

Calls the SRILM script, extracts the different weights from the log, then relinks the log to the output folder

tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.lm.srilm.ComputeNgramLmJob(*args, **kwargs)

Generate count based LM with SRILM

Parameters:
  • ngram_order – Maximum n-gram order

  • data – Either a text file or a counts file to read from; set data_mode accordingly. The counts file can come from CountNgramsJob.out_counts

  • data_mode – Defines whether input format is text based or count based

  • vocab – Vocabulary file, one word per line

  • extra_ngram_args – Extra arguments for the execution call e.g. [‘-kndiscount’]

  • count_exe – Path to srilm ngram-count exe

  • mem_rqmt – Memory requirements of Job (not hashed)

  • time_rqmt – Time requirements of Job (not hashed)

  • cpu_rqmt – Number of CPUs required for the Job (not hashed)

  • fs_rqmt – Space on fileserver required for Job, example: “200G” (not hashed)

Example options for extra_ngram_args: -kndiscount -interpolate -debug <int> -addsmooth <int>
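
A usage sketch for estimating a Kneser-Ney trigram LM directly from text, assuming a Sisyphus config; all paths are placeholders:

    from sisyphus import tk
    from i6_core.lm.srilm import ComputeNgramLmJob

    lm_job = ComputeNgramLmJob(
        ngram_order=3,
        data=tk.Path("/path/to/corpus.txt"),        # raw text input
        data_mode=ComputeNgramLmJob.DataMode.TEXT,  # use DataMode.COUNT for a counts file
        vocab=tk.Path("/path/to/vocab.txt"),        # one word per line
        extra_ngram_args=["-kndiscount", "-interpolate"],
        count_exe=tk.Path("/opt/srilm/bin/ngram-count"),  # placeholder install path
    )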

class DataMode(value)

Enumeration of the supported input data modes (see data_mode): text input or counts input.

COUNT = 2
TEXT = 1
compress()

Executes the previously created compression script and relinks the LM from the work folder to the output folder

create_files()

Creates the bash script for LM creation and compression that will be executed in the run Task

classmethod hash(kwargs)

Deletes the queue requirements from the hashing

run()

Executes the previously created LM script and relinks the vocabulary from the work folder to the output folder

tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.lm.srilm.ComputeNgramLmPerplexityJob(*args, **kwargs)

Calculate the Perplexity of a Ngram LM via SRILM

Parameters:
  • ngram_order – Maximum n-gram order

  • lm – LM to evaluate

  • eval_data – Data to calculate PPL on

  • vocab – Vocabulary file

  • set_unknown_flag – Whether to set the unknown-word flag

  • extra_ppl_args – Extra arguments for the execution call e.g. ‘-debug 2’

  • ngram_exe – Path to srilm ngram exe

  • mem_rqmt – Memory requirements of Job (not hashed)

  • time_rqmt – Time requirements of Job (not hashed)

  • cpu_rqmt – Number of CPUs required for the Job (not hashed)

  • fs_rqmt – Space on fileserver required for Job, example: “200G” (not hashed)
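
A usage sketch, assuming a Sisyphus config; lm and eval_data are placeholder paths (in practice lm would typically be the output of ComputeNgramLmJob, whose output attribute names are not shown on this page):

    from sisyphus import tk
    from i6_core.lm.srilm import ComputeNgramLmPerplexityJob

    ppl_job = ComputeNgramLmPerplexityJob(
        ngram_order=3,
        lm=tk.Path("/path/to/lm.gz"),           # LM to evaluate
        eval_data=tk.Path("/path/to/dev.txt"),  # data to compute the PPL on
        vocab=tk.Path("/path/to/vocab.txt"),
        set_unknown_flag=True,
        extra_ppl_args="-debug 2",
        ngram_exe=tk.Path("/opt/srilm/bin/ngram"),  # placeholder install path
    )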

create_files()

Creates the bash script that will be executed in the run Task

get_ppl()

Extracts various outputs from the ppl.log file

classmethod hash(kwargs)

Deletes the queue requirements from the hashing

run()

Executes the previously created script and relinks the log file from the work folder to the output folder

tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.lm.srilm.CountNgramsJob(*args, **kwargs)

Count ngrams with SRILM

Parameters:
  • ngram_order – Maximum n-gram order

  • data – Input data to be read as a text file

  • extra_count_args – Extra arguments for the execution call e.g. [‘-unk’]

  • count_exe – Path to srilm ngram-count executable

  • mem_rqmt – Memory requirements of Job (not hashed)

  • time_rqmt – Time requirements of Job (not hashed)

  • cpu_rqmt – Number of CPUs required for the Job (not hashed)

  • fs_rqmt – Space on fileserver required for Job, example: “200G” (not hashed)

Example options for extra_count_args: -unk
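
A usage sketch that chains the counts into a count-based LM estimation, assuming a Sisyphus config; paths are placeholders:

    from sisyphus import tk
    from i6_core.lm.srilm import CountNgramsJob, ComputeNgramLmJob

    count_job = CountNgramsJob(
        ngram_order=4,
        data=tk.Path("/path/to/corpus.txt"),
        extra_count_args=["-unk"],
        count_exe=tk.Path("/opt/srilm/bin/ngram-count"),  # placeholder install path
    )

    # out_counts is the counts output referenced in the ComputeNgramLmJob parameters above
    lm_from_counts = ComputeNgramLmJob(
        ngram_order=4,
        data=count_job.out_counts,
        data_mode=ComputeNgramLmJob.DataMode.COUNT,
        vocab=tk.Path("/path/to/vocab.txt"),
        count_exe=tk.Path("/opt/srilm/bin/ngram-count"),
    )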

create_files()

Creates the bash script that will be executed in the run Task

classmethod hash(kwargs)

Deletes the queue requirements from the hashing

run()

Executes the previously created bash script and relinks the outputs from the work folder to the output folder

tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.lm.srilm.InterpolateNgramLmJob(*args, **kwargs)

Uses SRILM to interpolate different LMs with previously calculated weights

Parameters:
  • ngram_lms – List of language models to interpolate; format: ARPA or compressed ARPA

  • weights – Weights of the different language models; must be in the same order as ngram_lms

  • ngram_order – Maximum n-gram order

  • extra_interpolation_args – Additional arguments for interpolation

  • ngram_exe – Path to srilm ngram executable

  • mem_rqmt – Memory requirements of Job (not hashed)

  • time_rqmt – Time requirements of Job (not hashed)

  • cpu_rqmt – Number of CPUs required for the Job (not hashed)

  • fs_rqmt – Space on fileserver required for Job, example: “200G” (not hashed)
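
A usage sketch, assuming a Sisyphus config; the LM paths and weights are placeholders (the weights would usually come from ComputeBestMixJob, whose output attribute names are not shown on this page):

    from sisyphus import tk
    from i6_core.lm.srilm import InterpolateNgramLmJob

    interp_job = InterpolateNgramLmJob(
        ngram_lms=[tk.Path("/path/to/lm_a.gz"), tk.Path("/path/to/lm_b.gz")],  # ARPA or compressed ARPA
        weights=[0.7, 0.3],  # same order as ngram_lms
        ngram_order=3,
        ngram_exe=tk.Path("/opt/srilm/bin/ngram"),  # placeholder install path
    )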

classmethod hash(parsed_args)

Deletes the queue requirements from the hashing

run()

Runs the SRILM interpolation and relinks the interpolated LM from the work folder to the output folder

tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.lm.srilm.PruneLMWithHelperLMJob(*args, **kwargs)

Job that prunes the given LM with the help of a helper LM

Parameters:
  • ngram_order – Maximum n-gram order

  • lm – LM to be pruned

  • prune_thresh – Pruning threshold

  • helper_lm – Helper ('Katz') LM to prune the given LM with

  • ngram_exe – Path to srilm ngram executable

  • mem_rqmt – Memory requirements of Job (not hashed)

  • time_rqmt – Time requirements of Job (not hashed)

  • cpu_rqmt – Number of CPUs required for the Job (not hashed)

  • fs_rqmt – Space on fileserver required for Job, example: “200G” (not hashed)
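
A usage sketch, assuming a Sisyphus config; the LM paths and the threshold are placeholders (the LM to be pruned would typically be an interpolated LM from InterpolateNgramLmJob):

    from sisyphus import tk
    from i6_core.lm.srilm import PruneLMWithHelperLMJob

    prune_job = PruneLMWithHelperLMJob(
        ngram_order=4,
        lm=tk.Path("/path/to/interpolated.lm.gz"),        # LM to be pruned
        prune_thresh=1e-8,                                # placeholder pruning threshold
        helper_lm=tk.Path("/path/to/katz_helper.lm.gz"),  # helper/'Katz' LM
        ngram_exe=tk.Path("/opt/srilm/bin/ngram"),        # placeholder install path
    )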

create_files()

Creates the bash script that will be executed in the run Task

classmethod hash(kwargs)

Deletes the queue requirements from the hashing

run()

Executes the previously created script and relinks the LM from the work folder to the output folder

tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]