i6_core.bpe.apply

This is the old location of the BPE jobs, kept for backwards compatibility. New setups using the subword-nmt based BPE should use i6_core.label.bpe; other setups should switch to the SentencePiece implementation.

class i6_core.bpe.apply.ApplyBPEModelToLexiconJob(*args, **kwargs)

Apply BPE codes to a Bliss lexicon file

Parameters:
  • bliss_lexicon (Path) –

  • bpe_codes (Path) –

  • bpe_vocab (Optional[Path]) –

  • subword_nmt_repo (Optional[Path]) –
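
As a rough illustration of the input side of this job, the sketch below collects the orthographies from a tiny Bliss-style lexicon, since these are the strings a BPE model would segment. The toy lexicon, the helper name collect_orths, and the choice to read every <orth> element are assumptions made for illustration; what the job itself writes back into the lexicon is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon for illustration only, not taken from a real setup.
TOY_LEXICON = """<lexicon>
  <lemma><orth>low</orth><phon>l ow</phon></lemma>
  <lemma><orth>lower</orth><orth>LOWER</orth><phon>l ow er</phon></lemma>
</lexicon>
"""


def collect_orths(lexicon_xml):
    """Return all non-empty orthographies in document order (hypothetical helper)."""
    root = ET.fromstring(lexicon_xml)
    orths = []
    for lemma in root.iter("lemma"):
        for orth in lemma.findall("orth"):
            text = (orth.text or "").strip()
            if text:
                orths.append(text)
    return orths


orths = collect_orths(TOY_LEXICON)
```

Each collected orthography could then be segmented with the BPE codes, one word at a time.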

class i6_core.bpe.apply.ApplyBPEToTextJob(*args, **kwargs)

Apply BPE codes to a text file

Parameters:
  • text_file – word-level text file to convert to BPE

  • bpe_codes – BPE codes file, e.g. ReturnnTrainBpeJob.out_bpe_codes

  • bpe_vocab – if provided, merge operations that produce OOV tokens are reverted; use e.g. ReturnnTrainBpeJob.out_bpe_dummy_count_vocab

  • subword_nmt_repo – path to the subword-nmt repository; see also CloneGitRepositoryJob

  • gzip_output – whether to gzip the output text

  • mini_task – whether the Job should run locally, e.g. when only a small text (<1M lines) is processed
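
To make "applying BPE codes" concrete, here is a minimal pure-Python sketch of the merge procedure in the style of subword-nmt. It is not the code the job runs. The names segment_word, segment_line and TOY_MERGES are hypothetical, and the merge list is a made-up stand-in for the contents of a bpe_codes file. It assumes the subword-nmt conventions of an end-of-word marker "</w>" and the "@@" continuation suffix. The vocabulary filter controlled by bpe_vocab is not modelled.

```python
END_OF_WORD = "</w>"


def segment_word(word, merges, separator="@@"):
    """Segment one non-empty word using merge pairs ordered by priority."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    # Start from single characters; the last one carries the end-of-word marker.
    symbols = list(word[:-1]) + [word[-1] + END_OF_WORD]
    while len(symbols) > 1:
        pairs = list(zip(symbols, symbols[1:]))
        # Pick the adjacent pair with the best (lowest) merge rank.
        best = min(pairs, key=lambda pair: ranks.get(pair, len(ranks)))
        if best not in ranks:
            break  # no known merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    # Drop the end-of-word marker and mark all non-final pieces as continued.
    symbols[-1] = symbols[-1][: -len(END_OF_WORD)]
    return [s + separator for s in symbols[:-1]] + [symbols[-1]]


def segment_line(line, merges):
    """Segment every whitespace-separated word of a line."""
    return " ".join(tok for word in line.split() for tok in segment_word(word, merges))


# Hypothetical merge list, highest priority first.
TOY_MERGES = [("l", "o"), ("lo", "w</w>"), ("e", "r</w>")]
segmented = segment_line("low lower", TOY_MERGES)
```

With these toy merges, "low" stays a single token, while "lower" is split into pieces because no merge joins "lo" with a non-final "w".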