`i6_core.corpus.transform`¶

class i6_core.corpus.transform.AddCacheToCorpusJob(*args, **kwargs)¶

Adds a cache manager call to all audio paths in a corpus file.

Parameters:

bliss_corpus (Path) – bliss corpus file path
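A minimal sketch of the idea, using only the standard library rather than this job's actual implementation. It assumes that recordings are `recording` elements with an `audio` attribute, and that the cache manager call is written as a backtick-wrapped `cf <path>` string. Both the helper name `add_cache_call` and that wrapper format are assumptions for illustration; the real call depends on the site configuration.

```python
import xml.etree.ElementTree as ET


def add_cache_call(corpus, wrap=lambda path: "`cf %s`" % path):
    """Wrap the audio path of every recording with a cache manager call.

    `wrap` is a hypothetical stand-in for the configured cache manager call.
    """
    for recording in corpus.iter("recording"):
        audio = recording.get("audio")
        if audio is not None:
            recording.set("audio", wrap(audio))
    return corpus


# Illustrative corpus with a single recording (made-up path).
doc = ET.fromstring(
    '<corpus name="c"><recording name="r" audio="/data/a.wav"/></corpus>'
)
add_cache_call(doc)
cached_audio = doc.find("recording").get("audio")
```

Passing the wrapper in as a function keeps the sketch independent of any particular cache manager.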

run()¶

tasks()¶

Returns: yields Tasks
Return type: list[sisyphus.task.Task]

class i6_core.corpus.transform.ApplyLexiconToCorpusJob(*args, **kwargs)¶

Use a bliss lexicon to convert all words in a bliss corpus into their phoneme representation.

Currently only supports picking the first phoneme.

Parameters:

bliss_corpus (Path) – path to a bliss corpus xml
bliss_lexicon (Path) – path to a bliss lexicon file
word_separation_orth (str|None) – a default word separation lemma orth. The corresponding phoneme (or phonemes in some special cases) is inserted between words. Usually it makes sense to use something like “[SILENCE]” or “[space]”.
strategy (LexiconStrategy) – strategy to determine which representation is selected
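The conversion described above can be sketched with plain dictionaries. This is an illustration only, not the job's implementation: the helper `apply_lexicon`, the dictionary-based lexicon, and the example phonemes are all hypothetical. It always picks the first entry for each word and inserts the separator's phonemes between words.

```python
def apply_lexicon(orth, lexicon, word_separation_orth=None):
    """Convert an orthography into phonemes, always taking the first entry.

    `lexicon` maps an orth to a list of phoneme strings (hypothetical layout).
    """
    pieces = [lexicon[word][0] for word in orth.split()]
    if word_separation_orth is None:
        return " ".join(pieces)
    # Phoneme(s) of the separation lemma go between each pair of words.
    separator = lexicon[word_separation_orth][0]
    return (" " + separator + " ").join(pieces)


# Illustrative lexicon; the phoneme inventory is made up.
lexicon = {
    "hello": ["HH AH L OW", "HH EH L OW"],
    "world": ["W ER L D"],
    "[SILENCE]": ["SIL"],
}
with_separator = apply_lexicon("hello world", lexicon, "[SILENCE]")
without_separator = apply_lexicon("hello world", lexicon)
```

The second pronunciation of "hello" is never used here, which matches the "first entry" behaviour described above.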

run()¶

tasks()¶

Returns: yields Tasks
Return type: list[sisyphus.task.Task]

class i6_core.corpus.transform.CompressCorpusJob(*args, **kwargs)¶

Compresses a corpus by concatenating audio files and applying a compression codec. Corpora with subcorpora are currently not supported, and the audio files need to be .wav.

Parameters:

bliss_corpus (Path) – path to an xml corpus file with wave recordings
format (str) – supported file formats, currently limited to mp3
bitrate (str) – bitrate as string, e.g. ‘32k’ or ‘192k’; can also be an integer, e.g. 192000
max_num_splits (int) – maximum number of resulting audio files

add_duration_to_recordings(c)¶

Opens each recording, extracts its duration, and adds the duration to the recording object. TODO: this is a lengthy operation, but so far there was no alternative.

Parameters:

c (corpus.Corpus) – the corpus whose recordings are processed

info()¶

Reads the log.run file to extract the current status of the compression job.

run()¶

run_ffmpeg(ffmpeg_inputs, output_path)¶

tasks()¶

Returns: yields Tasks
Return type: list[sisyphus.task.Task]

class i6_core.corpus.transform.MergeCorporaJob(*args, **kwargs)¶

Merges bliss corpus files into a single file, either as subcorpora or flat.

Parameters:

bliss_corpora (Iterable[Path]) – any iterable of bliss corpora file paths to merge
name (str) – name of the new corpus (subcorpora will keep the original names)
merge_strategy (MergeStrategy) – how the corpora should be merged, e.g. as subcorpora or flat
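The difference between merging as subcorpora and merging flat can be sketched with the standard library. This is an illustration under stated assumptions, not the job's code: `merge_corpora` is a hypothetical helper, the corpora are assumed to be `corpus` elements containing `recording` children, and the enum below is a local copy of the values documented for MergeStrategy. The CONCATENATE strategy is left out because its behaviour is not described on this page.

```python
import enum
import xml.etree.ElementTree as ET


class MergeStrategy(enum.Enum):
    # Local copy of the values documented for this module.
    SUBCORPORA = 0
    FLAT = 1
    CONCATENATE = 2


def merge_corpora(corpora, name, strategy):
    """Merge corpus elements into one new corpus element (hypothetical helper)."""
    merged = ET.Element("corpus", {"name": name})
    for corpus in corpora:
        if strategy is MergeStrategy.SUBCORPORA:
            # Each input becomes a subcorpus that keeps its original name.
            sub = ET.SubElement(merged, "subcorpus", {"name": corpus.get("name")})
            sub.extend(list(corpus))
        elif strategy is MergeStrategy.FLAT:
            # All children are placed directly under the new corpus.
            merged.extend(list(corpus))
        else:
            raise NotImplementedError("not covered by this sketch")
    return merged


a = ET.fromstring('<corpus name="a"><recording name="r1"/></corpus>')
b = ET.fromstring('<corpus name="b"><recording name="r2"/></corpus>')
flat = merge_corpora([a, b], "merged", MergeStrategy.FLAT)
nested = merge_corpora([a, b], "merged", MergeStrategy.SUBCORPORA)
```

With FLAT the two recordings become direct children of the new corpus; with SUBCORPORA they stay grouped under subcorpora named "a" and "b".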

run()¶

tasks()¶

Returns: yields Tasks
Return type: list[sisyphus.task.Task]

class i6_core.corpus.transform.MergeCorpusSegmentsAndAudioJob(*args, **kwargs)¶

This job merges segments and audio files based on a rasr cluster map and a list of cluster_names. The cluster map should map segments to something like cluster.XXX where XXX is a natural number (starting with 1). The lines in the cluster_names file will be used as names for the recordings in the new corpus.

The job outputs a new corpus file + the corresponding audio files.

run()¶

tasks()¶

Returns: yields Tasks
Return type: list[sisyphus.task.Task]

class i6_core.corpus.transform.MergeStrategy(value)¶

An enumeration.

CONCATENATE = 2¶

FLAT = 1¶

SUBCORPORA = 0¶

class i6_core.corpus.transform.ReplaceTranscriptionFromCtmJob(*args, **kwargs)¶

run()¶

tasks()¶

Returns: yields Tasks
Return type: list[sisyphus.task.Task]

class i6_core.corpus.transform.ShiftCorpusSegmentStartJob(*args, **kwargs)¶

Shifts the segment start times of a corpus to change the FFT window offset.

Parameters:

bliss_corpus (Path) – path to a bliss corpus file
corpus_name (str) – name of the new corpus
shift (int) – shift in seconds
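What the shift does to a corpus can be sketched as follows. This is a standard-library illustration, not the job's implementation: `shift_segment_starts` is a hypothetical helper, and it assumes segments are `segment` elements with `start` and `end` attributes given in seconds. Only the start time is changed; the end time is left as it is.

```python
import xml.etree.ElementTree as ET


def shift_segment_starts(corpus, shift):
    """Add `shift` seconds to the start time of every segment."""
    for segment in corpus.iter("segment"):
        start = float(segment.get("start")) + shift
        segment.set("start", str(start))
    return corpus


# Illustrative corpus with one segment (made-up values).
doc = ET.fromstring(
    '<corpus name="c"><recording name="r" audio="a.wav">'
    '<segment name="s1" start="0.5" end="2.0"/>'
    "</recording></corpus>"
)
shift_segment_starts(doc, 1)
shifted = doc.find("recording/segment")
```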

run()¶

tasks()¶

Returns: yields Tasks
Return type: list[sisyphus.task.Task]
