i6_core.corpus.transform
- class i6_core.corpus.transform.AddCacheToCorpusJob(*args, **kwargs)
Adds a cache manager call to all audio paths in a corpus file.
- Parameters:
bliss_corpus (Path) – path to a bliss corpus file
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
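The sketch below illustrates the kind of rewrite this job performs on a Bliss corpus XML string, using only the standard library. The exact form of the cache manager call is an assumption: here it is taken to be a backtick-wrapped shell call such as `cf <path>`, and the real job may write something different. `add_cache_manager_call` and its `cache_command` parameter are hypothetical names.

```python
import xml.etree.ElementTree as ET


def add_cache_manager_call(corpus_xml: str, cache_command: str = "cf") -> str:
    """Return a copy of a Bliss corpus XML string in which every recording's
    audio path is wrapped in a cache manager call.

    Assumption: the call is written as a backtick-wrapped shell command,
    e.g. "`cf /data/r1.wav`"; the real job's format may differ.
    """
    root = ET.fromstring(corpus_xml)
    # iter() walks the whole tree, so recordings nested in subcorpora are included
    for recording in root.iter("recording"):
        audio = recording.get("audio")
        if audio is not None:
            recording.set("audio", f"`{cache_command} {audio}`")
    return ET.tostring(root, encoding="unicode")
```

Working on the parsed tree rather than on raw text avoids accidentally touching other attributes that merely contain an audio path.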
- class i6_core.corpus.transform.ApplyLexiconToCorpusJob(*args, **kwargs)
Uses a bliss lexicon to convert all words in a bliss corpus into their phoneme representation.
Currently, only picking the first pronunciation of each word is supported.
- Parameters:
bliss_corpus (Path) – path to a bliss corpus xml
bliss_lexicon (Path) – path to a bliss lexicon file
word_separation_orth (str|None) – orth of a lemma used as the default word separator. Its phoneme (or, in some special cases, phonemes) is inserted between consecutive words. Typical choices are “[SILENCE]” or “[space]”.
strategy (LexiconStrategy) – strategy to determine which representation is selected
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
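A minimal sketch of the conversion for a single orthography, assuming the lexicon has already been loaded into a plain dict mapping each orth to its pronunciations (space-separated phoneme strings). As described above, only the first pronunciation is used, and the separator's phonemes go between words, not before the first or after the last. `apply_lexicon` is a hypothetical helper; unknown words simply raise a `KeyError` here, which is not necessarily how the job handles them.

```python
from typing import Dict, List, Optional


def apply_lexicon(
    orth: str,
    lexicon: Dict[str, List[str]],
    word_separation_orth: Optional[str] = None,
) -> str:
    """Convert a whitespace-separated orthography into a phoneme string.

    `lexicon` maps orth -> list of pronunciations; the first one is picked.
    """
    separator = None
    if word_separation_orth is not None:
        separator = lexicon[word_separation_orth][0]
    tokens: List[str] = []
    for i, word in enumerate(orth.split()):
        if i > 0 and separator is not None:
            tokens.append(separator)  # only between consecutive words
        tokens.append(lexicon[word][0])  # pick the first pronunciation
    return " ".join(tokens)
```

For example, with `"hello"` mapped to `["hh ah l ow", "hh eh l ow"]`, `"world"` to `["w er l d"]` and `"[SILENCE]"` to `["[SILENCE]"]`, the call `apply_lexicon("hello world", lexicon, "[SILENCE]")` returns `"hh ah l ow [SILENCE] w er l d"`.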
- class i6_core.corpus.transform.CompressCorpusJob(*args, **kwargs)
Compresses a corpus by concatenating its audio files and encoding them with a compression codec. Corpora with subcorpora are currently not supported, and the audio files must be .wav files.
- Parameters:
bliss_corpus (Path) – path to an xml corpus file with wave recordings
format (str) – output file format; currently only mp3 is supported
bitrate (str|int) – bitrate as a string, e.g. ‘32k’ or ‘192k’, or as an integer, e.g. 192000
max_num_splits (int) – maximum number of resulting audio files
- add_duration_to_recordings(c)
Opens each recording, extracts its duration and adds it to the recording object. This is a lengthy operation, but so far no alternative has been found.
- Parameters:
c (corpus.Corpus) – the corpus whose recordings are updated
- info()
Reads the log.run file to extract the current status of the compression job.
- run()
- run_ffmpeg(ffmpeg_inputs, output_path)
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
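The following sketch shows two pieces such a compression job plausibly needs; it is not the job's actual code. When recordings are concatenated, each one starts where the previous ones end, which is presumably why per-recording durations are required: segment times of recording k move by the sum of the preceding durations. The ffmpeg call uses the standard `concat` audio filter, and the encoder is picked from the output file extension. `recording_offsets` and `build_ffmpeg_command` are hypothetical helpers, and how the job groups files into at most `max_num_splits` outputs is not shown.

```python
from itertools import accumulate
from typing import List, Sequence, Union


def recording_offsets(durations: Sequence[float]) -> List[float]:
    """Start time (seconds) of each recording inside the concatenated file."""
    # accumulate(..., initial=0.0) yields 0, d0, d0+d1, ...; drop the final total
    return list(accumulate(durations, initial=0.0))[:-1]


def build_ffmpeg_command(
    wav_paths: Sequence[str],
    output_path: str,
    bitrate: Union[str, int] = "32k",
) -> List[str]:
    """Build an ffmpeg call that concatenates wav files and encodes the result.

    ffmpeg chooses the encoder from the output extension (e.g. ".mp3").
    """
    bitrate = str(bitrate)  # ffmpeg accepts "32k" as well as plain bits/s like "192000"
    cmd = ["ffmpeg", "-y"]  # -y: overwrite the output file if it already exists
    for path in wav_paths:
        cmd += ["-i", path]
    # "[0:a][1:a]concat=n=2:v=0:a=1[out]" joins the audio streams in input order
    streams = "".join(f"[{i}:a]" for i in range(len(wav_paths)))
    cmd += [
        "-filter_complex", f"{streams}concat=n={len(wav_paths)}:v=0:a=1[out]",
        "-map", "[out]",
        "-b:a", bitrate,
        output_path,
    ]
    return cmd
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with the filter graph.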
- class i6_core.corpus.transform.MergeCorporaJob(*args, **kwargs)
Merges bliss corpus files into a single file, e.g. as subcorpora or flat (see MergeStrategy).
- Parameters:
bliss_corpora (Iterable[Path]) – any iterable of bliss corpus file paths to merge
name (str) – name of the new corpus (subcorpora will keep the original names)
merge_strategy (MergeStrategy) – how the corpora should be merged, e.g. as subcorpora or flat
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
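A simplified sketch of the SUBCORPORA and FLAT strategies on a toy in-memory corpus model (the `Corpus` dataclass and `merge_corpora` are hypothetical stand-ins, not the real bliss corpus classes). It assumes that FLAT also pulls recordings out of nested subcorpora. The semantics of CONCATENATE are not described in this reference, so the sketch deliberately leaves it unimplemented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, Iterator, List


class MergeStrategy(Enum):
    # Same member values as documented for i6_core.corpus.transform.MergeStrategy
    SUBCORPORA = 0
    FLAT = 1
    CONCATENATE = 2


@dataclass
class Corpus:
    # Hypothetical, heavily simplified stand-in for a bliss corpus
    name: str
    recordings: List[str] = field(default_factory=list)
    subcorpora: List["Corpus"] = field(default_factory=list)


def all_recordings(c: Corpus) -> Iterator[str]:
    """Yield recordings of c, including those inside nested subcorpora."""
    yield from c.recordings
    for sub in c.subcorpora:
        yield from all_recordings(sub)


def merge_corpora(corpora: Iterable[Corpus], name: str, strategy: MergeStrategy) -> Corpus:
    merged = Corpus(name)
    for c in corpora:
        if strategy is MergeStrategy.SUBCORPORA:
            # each input becomes a subcorpus and keeps its original name
            merged.subcorpora.append(c)
        elif strategy is MergeStrategy.FLAT:
            # assumption: all recordings end up directly in the new corpus
            merged.recordings.extend(all_recordings(c))
        else:
            # CONCATENATE is not described in this reference
            raise NotImplementedError(strategy)
    return merged
```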
- class i6_core.corpus.transform.MergeCorpusSegmentsAndAudioJob(*args, **kwargs)
Merges segments and audio files based on a RASR cluster map and a list of cluster names. The cluster map should map each segment to a label of the form cluster.XXX, where XXX is a natural number starting at 1. The lines of the cluster_names file are used as the names of the recordings in the new corpus.
The job outputs a new corpus file and the corresponding audio files.
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
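The naming rule above (label cluster.XXX selects line XXX of cluster_names, counting from 1) can be sketched as a grouping step. This assumes the cluster map has already been parsed into a dict from segment name to cluster label; parsing the RASR file and merging the actual audio are left out, and `group_segments_by_cluster` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, List


def group_segments_by_cluster(
    cluster_map: Dict[str, str], cluster_names: List[str]
) -> Dict[str, List[str]]:
    """Group segment names into new recordings named after their cluster.

    cluster_map: segment name -> "cluster.XXX" with XXX >= 1
    cluster_names: line XXX (index XXX - 1) names the new recording
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for segment, label in cluster_map.items():
        match = re.fullmatch(r"cluster\.(\d+)", label)
        if match is None or int(match.group(1)) < 1:
            raise ValueError(f"unexpected cluster label {label!r} for {segment!r}")
        index = int(match.group(1))  # 1-based, as described above
        groups[cluster_names[index - 1]].append(segment)
    return dict(groups)
```

Rejecting `cluster.0` explicitly matters here: without the check, index 0 would silently select the last name via Python's negative indexing.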
- class i6_core.corpus.transform.MergeStrategy(value)
Enumeration of the strategies MergeCorporaJob can use to merge corpora.
- CONCATENATE = 2
- FLAT = 1
- SUBCORPORA = 0
- class i6_core.corpus.transform.ReplaceTranscriptionFromCtmJob(*args, **kwargs)
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
- class i6_core.corpus.transform.ShiftCorpusSegmentStartJob(*args, **kwargs)
Shifts the segment start times of a corpus to change the FFT window offset.
- Parameters:
bliss_corpus (Path) – path to a bliss corpus file
corpus_name (str) – name of the new corpus
shift (int) – shift in seconds
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
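A minimal sketch of the shift on a toy segment model, assuming times are in seconds (as the parameter description states) and that only the start time moves while the end time stays unchanged. `Segment` and `shift_segment_starts` are hypothetical names; the real job operates on bliss corpus objects and writes a new corpus named `corpus_name`.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    # Hypothetical minimal segment; times in seconds
    name: str
    start: float
    end: float


def shift_segment_starts(segments: List[Segment], shift: float) -> List[Segment]:
    """Return copies of the segments with each start time moved by `shift`.

    Assumption: only the start changes; the end time is kept as-is.
    """
    return [replace(s, start=s.start + shift) for s in segments]
```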