i6_core.datasets.librispeech

class i6_core.datasets.librispeech.DownloadLibriSpeechCorpusJob(*args, **kwargs)

Downloads a part of the LibriSpeech corpus from https://www.openslr.org/resources/12 and checks the file integrity via md5sum

(see also: https://www.openslr.org/12/)
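The md5 integrity check mentioned above can be pictured with the minimal sketch below. It uses only the standard hashlib module and is not the job's actual implementation. The helper names md5sum and check_integrity are hypothetical, and the expected digest is assumed to be known in advance.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file.

    The file is read in chunks so that multi-GB archives never have to
    fit into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(path, expected_md5):
    """Raise ValueError if the file does not match the expected checksum (hypothetical helper)."""
    actual = md5sum(path)
    if actual != expected_md5:
        raise ValueError(f"md5 mismatch for {path}: expected {expected_md5}, got {actual}")
    return True
```

Hashing incrementally with update() produces the same digest as hashing the whole file in one call, while keeping memory use bounded by chunk_size.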

To get the corpus metadata, use DownloadLibriSpeechMetadataJob

self.out_corpus_folder links to the root of the speaker_id/chapter/* folder structure
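A sketch of how the speaker_id/chapter/* layout below out_corpus_folder can be traversed. It assumes the usual LibriSpeech convention of one <speaker_id>-<chapter_id>.trans.txt file per chapter folder, where each line holds an utterance ID followed by its transcript and the audio lives in <utterance_id>.flac. The function iter_librispeech_segments is hypothetical and not part of i6_core.

```python
import os


def iter_librispeech_segments(corpus_folder):
    """Yield (speaker_id, chapter_id, utterance_id, flac_path, transcript).

    Assumes the layout <speaker_id>/<chapter_id>/<speaker_id>-<chapter_id>.trans.txt
    below corpus_folder; stray files next to the speaker folders are skipped.
    """
    for speaker_id in sorted(os.listdir(corpus_folder)):
        speaker_dir = os.path.join(corpus_folder, speaker_id)
        if not os.path.isdir(speaker_dir):
            continue
        for chapter_id in sorted(os.listdir(speaker_dir)):
            chapter_dir = os.path.join(speaker_dir, chapter_id)
            if not os.path.isdir(chapter_dir):
                continue
            trans_path = os.path.join(chapter_dir, f"{speaker_id}-{chapter_id}.trans.txt")
            with open(trans_path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    # first token is the utterance ID, the rest is the transcript
                    utt_id, _, text = line.partition(" ")
                    flac_path = os.path.join(chapter_dir, f"{utt_id}.flac")
                    yield speaker_id, chapter_id, utt_id, flac_path, text
```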

Parameters:

corpus_key (str) – corpus identifier, e.g. “train-clean-100”
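Assuming each corpus_key corresponds to an archive named <corpus_key>.tar.gz under the download URL given above, the mapping from key to URL can be sketched as follows. The subset list and the helper librispeech_archive_url are illustrative assumptions; the job's own URL handling may differ.

```python
BASE_URL = "https://www.openslr.org/resources/12"

# Audio subsets assumed to be available for download (illustrative list).
LIBRISPEECH_SUBSETS = (
    "dev-clean", "dev-other", "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
)


def librispeech_archive_url(corpus_key):
    """Map a corpus_key such as "train-clean-100" to its (assumed) archive URL."""
    if corpus_key not in LIBRISPEECH_SUBSETS:
        raise ValueError(f"unknown LibriSpeech corpus_key: {corpus_key!r}")
    return f"{BASE_URL}/{corpus_key}.tar.gz"
```

Validating the key up front turns a typo such as "train-clean100" into an immediate error instead of a failed download.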

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.datasets.librispeech.DownloadLibriSpeechMetadataJob(*args, **kwargs)

Downloads the metadata file and checks its integrity via md5sum

Defines outputs for SPEAKERS.TXT, CHAPTERS.TXT and BOOKS.TXT
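SPEAKERS.TXT is a pipe-separated table with ';'-prefixed comment lines. The parser below is a hypothetical sketch, not i6_core code, and assumes the column order ID | SEX | SUBSET | MINUTES | NAME.

```python
def parse_speakers_txt(text):
    """Parse the contents of SPEAKERS.TXT into {speaker_id: info dict}.

    Lines starting with ';' are treated as comments; data lines are assumed
    to have the form ID | SEX | SUBSET | MINUTES | NAME.
    """
    speakers = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith(";"):
            continue
        # maxsplit=4 keeps any '|' inside the NAME column intact
        speaker_id, sex, subset, minutes, name = (
            field.strip() for field in line.split("|", 4)
        )
        speakers[speaker_id] = {
            "sex": sex,
            "subset": subset,
            "minutes": float(minutes),
            "name": name,
        }
    return speakers
```

Splitting with maxsplit=4 means a speaker name that itself contains '|' still ends up whole in the last field.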

Parameters:

corpus_key (str) – corpus identifier, e.g. “train-clean-100”

class i6_core.datasets.librispeech.LibriSpeechCreateBlissCorpusJob(*args, **kwargs)

Creates a Bliss corpus from a LibriSpeech corpus folder, additionally using the speaker information from the metadata

Outputs a single Bliss .xml.gz file
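To illustrate what a single gzipped Bliss file can look like, here is a minimal sketch built with xml.etree.ElementTree and gzip. The element and attribute names (corpus, speaker-description, recording, segment, speaker, orth) follow the general Bliss corpus layout as an assumption; the real job writes its output through i6_core's own corpus utilities. The function write_bliss_corpus, its arguments and the mapping of SEX values to gender strings are hypothetical, and segment durations must be supplied by the caller (e.g. read from the audio files).

```python
import gzip
import xml.etree.ElementTree as ET


def write_bliss_corpus(path, corpus_name, speakers, segments):
    """Write a minimal Bliss-style corpus as gzipped XML (illustrative sketch).

    speakers: {speaker_id: "M" or "F"}, e.g. the SEX column of SPEAKERS.TXT
    segments: iterable of (utt_id, speaker_id, audio_path, duration_sec, orth)
    """
    corpus = ET.Element("corpus", name=corpus_name)
    for speaker_id, sex in sorted(speakers.items()):
        gender = {"M": "male", "F": "female"}.get(sex, "unknown")
        ET.SubElement(corpus, "speaker-description", name=speaker_id, gender=gender)
    for utt_id, speaker_id, audio_path, duration, orth in segments:
        # one recording per utterance, holding a single segment that spans the whole file
        recording = ET.SubElement(corpus, "recording", name=utt_id, audio=audio_path)
        segment = ET.SubElement(
            recording, "segment", name=utt_id, start="0.0000", end=f"{duration:.4f}"
        )
        ET.SubElement(segment, "speaker", name=speaker_id)
        ET.SubElement(segment, "orth").text = orth
    data = ET.tostring(corpus, encoding="utf-8", xml_declaration=True)
    with gzip.open(path, "wb") as f:
        f.write(data)
```

One recording per utterance fits LibriSpeech, where every FLAC file contains exactly one utterance.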

Parameters:
  • corpus_folder (Path) – Path to a LibriSpeech corpus folder

  • speaker_metadata (Path) – Path to the SPEAKERS.TXT file from DownloadLibriSpeechMetadataJob (out_speakers)

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]