i6_core.text.label.subword_nmt.apply

class i6_core.text.label.subword_nmt.apply.ApplyBPEModelToLexiconJob(*args, **kwargs)

Apply BPE codes to a Bliss lexicon file

Parameters:
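  • bliss_lexicon (Path) – Bliss lexicon file to apply the BPE codes to

  • bpe_codes (Path) – BPE codes file

  • bpe_vocab (Path|None) – optional BPE vocabulary file

  • subword_nmt_repo (Optional[Path]) – optional path to the subword-nmt repository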

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.text.label.subword_nmt.apply.ApplyBPEToTextJob(*args, **kwargs)

Apply BPE codes on a text file

Parameters:
  • text_file – text file of words to convert to BPE

  • bpe_codes – BPE codes file, e.g. ReturnnTrainBpeJob.out_bpe_codes

  • bpe_vocab – BPE vocabulary file; if provided, merge operations that would produce out-of-vocabulary (OOV) subwords are reverted. Use e.g. ReturnnTrainBpeJob.out_bpe_dummy_count_vocab

  • subword_nmt_repo – path to the subword-nmt repository; see also CloneGitRepositoryJob

  • gzip_output – whether to gzip-compress the output text

  • mini_task – whether the Job should run locally, e.g. when only a small text (<1M lines) is processed
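
To make "applying BPE codes" concrete, the following is a minimal pure-Python sketch of the idea, assuming the usual subword-nmt conventions (an end-of-word marker `</w>` and `@@` as the subword separator). It is not the job's implementation, which delegates to the subword-nmt repository; the function names `apply_bpe_to_word` and `apply_bpe_to_line` and the toy merge list are hypothetical, and the `bpe_vocab` filtering step is left out.

```python
# Illustrative sketch only: a simplified BPE application, not the code run by
# ApplyBPEToTextJob. Names and the toy merge list are hypothetical.

END_OF_WORD = "</w>"  # end-of-word marker, assumed as in subword-nmt codes files


def apply_bpe_to_word(word, merges, separator="@@"):
    """Split one word into subwords using merge pairs given in priority order."""
    if not word:
        return []
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    # Start from single characters; the last one carries the end-of-word marker.
    symbols = list(word[:-1]) + [word[-1] + END_OF_WORD]
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    # The last symbol always ends with the marker; strip it again.
    symbols[-1] = symbols[-1][: -len(END_OF_WORD)]
    # Every subword except the last one is tagged with the separator.
    return [s + separator for s in symbols[:-1]] + [symbols[-1]]


def apply_bpe_to_line(line, merges, separator="@@"):
    """Apply BPE to every whitespace-separated word of a text line."""
    pieces = []
    for word in line.split():
        pieces.extend(apply_bpe_to_word(word, merges, separator))
    return " ".join(pieces)


# Toy merge operations in priority order (what a codes file would list).
merges = [("l", "o"), ("lo", "w" + END_OF_WORD), ("e", "r" + END_OF_WORD)]
print(apply_bpe_to_line("low lower", merges))  # low lo@@ w@@ er
```

In this toy example "low" is fully merged, while "lower" stays split because no merge joins "lo" with a word-internal "w".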

classmethod hash(parsed_args)
Parameters:

parsed_args (dict[str]) – arguments of the job

Returns:

hash for job given the arguments

Return type:

str
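
One plausible reason for a job to override `hash` is to keep arguments that only affect scheduling, not the output, from changing the job's identity. The sketch below illustrates that idea under the assumption that `mini_task` is such an argument; the helper `job_hash` and its use of `hashlib` and `json` are hypothetical stand-ins and do not reproduce Sisyphus's actual hashing.

```python
# Illustrative sketch only: a stand-in hash over job arguments that ignores
# scheduling-only keys. The ignored key "mini_task" is an assumption.
import hashlib
import json


def job_hash(parsed_args, ignored=("mini_task",)):
    """Return a hex digest over all arguments except the ignored ones."""
    relevant = {k: v for k, v in parsed_args.items() if k not in ignored}
    # sort_keys makes the digest independent of argument order;
    # default=str lets non-JSON values (e.g. path objects) be serialized.
    payload = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


local_run = {"text_file": "corpus.txt", "bpe_codes": "codes", "mini_task": True}
cluster_run = {"text_file": "corpus.txt", "bpe_codes": "codes", "mini_task": False}
other_codes = {"text_file": "corpus.txt", "bpe_codes": "codes2", "mini_task": True}

same = job_hash(local_run) == job_hash(cluster_run)
different = job_hash(local_run) != job_hash(other_codes)
```

With such a scheme, switching between a local and a scheduled run would reuse the same outputs, while changing an input such as the codes file would yield a new job.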

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]