`i6_core.datasets.librispeech`

class i6_core.datasets.librispeech.DownloadLibriSpeechCorpusJob(*args, **kwargs)

Downloads a part of the LibriSpeech corpus from https://www.openslr.org/resources/12 and checks file integrity via md5sum
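The md5sum check mentioned above can be sketched with the standard library alone. This is a minimal illustration of the idea, not the job's actual implementation; the helper names `md5_of_file` and `verify_md5` are hypothetical, and the expected checksum would have to come from whatever the download source provides.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory usage flat for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected_md5: str) -> bool:
    """Compare the computed digest with the expected one (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A caller would typically abort and delete the archive when `verify_md5` returns `False`, so that a corrupted download is never unpacked.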

To get the corpus metadata, use DownloadLibriSpeechMetadataJob

self.out_corpus_folder links to the root of the speaker_id/chapter/* folder structure
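To illustrate the speaker_id/chapter/* layout, the following sketch walks such a folder with `pathlib`. It only assumes the two directory levels named above; the function name `iter_chapters` is hypothetical and not part of i6_core.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_chapters(corpus_root: Path) -> Iterator[Tuple[str, str, Path]]:
    """Yield (speaker_id, chapter_id, chapter_dir) for every chapter folder.

    Assumes the layout <corpus_root>/<speaker_id>/<chapter_id>/<files>.
    Plain files on the speaker or chapter level are skipped.
    """
    for speaker_dir in sorted(p for p in corpus_root.iterdir() if p.is_dir()):
        for chapter_dir in sorted(p for p in speaker_dir.iterdir() if p.is_dir()):
            yield speaker_dir.name, chapter_dir.name, chapter_dir
```

Sorting makes the traversal order deterministic, which helps when the result feeds into a reproducible corpus file. Note that the sort is lexicographic, so the speaker folder "103" comes before "19".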

Parameters: corpus_key (str) – corpus identifier, e.g. “train-clean-100”

tasks()

class i6_core.datasets.librispeech.DownloadLibriSpeechMetadataJob(*args, **kwargs)

Downloads the metadata file and checks its integrity via md5sum

Defines outputs for SPEAKERS.TXT, CHAPTERS.TXT and BOOKS.TXT
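As an illustration of how the speaker metadata might be consumed, here is a minimal parser sketch. It assumes the layout used by the LibriSpeech distribution: comment lines starting with ";" and pipe-separated fields in the order ID, SEX, SUBSET, MINUTES, NAME. The `Speaker` type and `parse_speakers` function are hypothetical names, and the sample rows in the usage note are made up.

```python
from typing import Dict, NamedTuple


class Speaker(NamedTuple):
    speaker_id: str
    sex: str
    subset: str
    minutes: float
    name: str


def parse_speakers(text: str) -> Dict[str, Speaker]:
    """Parse SPEAKERS.TXT-style content into a dict keyed by speaker ID."""
    speakers: Dict[str, Speaker] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and ';' comment lines (assumed header convention)
        if not line or line.startswith(";"):
            continue
        # maxsplit=4 keeps a '|' inside the name field intact
        fields = [field.strip() for field in line.split("|", 4)]
        if len(fields) != 5:
            continue  # ignore malformed rows instead of failing
        speaker_id, sex, subset, minutes, name = fields
        speakers[speaker_id] = Speaker(speaker_id, sex, subset, float(minutes), name)
    return speakers
```

For example, with made-up rows:

```python
sample = """; ID | SEX | SUBSET | MINUTES | NAME
1001 | F | train-clean-100 | 25.00 | Example Reader
1002 | M | dev-clean | 8.50 | Another|Reader
"""
speakers = parse_speakers(sample)
print(speakers["1001"].subset)
```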

Parameters: corpus_key (str) – corpus identifier, e.g. “train-clean-100”

class i6_core.datasets.librispeech.LibriSpeechCreateBlissCorpusJob(*args, **kwargs)

Creates a Bliss corpus from a LibriSpeech corpus folder, additionally using the speaker information from the metadata

Outputs a single Bliss .xml.gz file

Parameters:

corpus_folder (Path) – Path to a LibriSpeech corpus folder
speaker_metadata (Path) – Path to the SPEAKERS.TXT file from DownloadLibriSpeechMetadataJob (out_speakers)
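The three jobs on this page are meant to be chained. The sketch below shows one plausible wiring in a Sisyphus setup. Since the signatures are rendered only as `(*args, **kwargs)`, the keyword names are taken from the parameter lists on this page, and constructing the metadata job without arguments is an assumption. The wrapper function `build_librispeech_bliss_job` is hypothetical.

```python
def build_librispeech_bliss_job(corpus_key: str = "train-clean-100"):
    """Wire download and conversion jobs together and return the Bliss job."""
    # Imported lazily so this module loads even where i6_core is not installed.
    from i6_core.datasets.librispeech import (
        DownloadLibriSpeechCorpusJob,
        DownloadLibriSpeechMetadataJob,
        LibriSpeechCreateBlissCorpusJob,
    )

    download_corpus_job = DownloadLibriSpeechCorpusJob(corpus_key=corpus_key)
    # Assumption: the metadata job needs no constructor arguments.
    download_metadata_job = DownloadLibriSpeechMetadataJob()

    # out_corpus_folder and out_speakers are the outputs named on this page.
    return LibriSpeechCreateBlissCorpusJob(
        corpus_folder=download_corpus_job.out_corpus_folder,
        speaker_metadata=download_metadata_job.out_speakers,
    )
```

Passing job outputs rather than concrete file paths lets the workflow manager resolve the dependency order on its own.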

tasks()
