i6_core.datasets.librispeech
- class i6_core.datasets.librispeech.DownloadLibriSpeechCorpusJob(*args, **kwargs)
Downloads a part of the LibriSpeech corpus from https://www.openslr.org/resources/12 and checks file integrity via md5sum
(see also: https://www.openslr.org/12/).
To get the corpus metadata, use DownloadLibriSpeechMetadataJob.
self.out_corpus_folder links to the root of the speaker_id/chapter/* folder structure.
- Parameters:
corpus_key (str) – corpus identifier, e.g. “train-clean-100”
- run()
- tasks()
- Returns:
yields Task objects
- Return type:
list[sisyphus.task.Task]
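The download-and-verify step described for DownloadLibriSpeechCorpusJob can be illustrated with a minimal, standard-library-only sketch. It is not the job's actual code: the helper names `archive_url`, `md5_of_file` and `verify_archive` are hypothetical, and the `<corpus_key>.tar.gz` file naming under https://www.openslr.org/resources/12 is an assumption. The expected checksum is supplied by the caller rather than looked up.

```python
import hashlib
from pathlib import Path

# Base URL from the job description; the "<corpus_key>.tar.gz" naming
# is an assumption about how the OpenSLR resource is laid out.
OPENSLR_BASE = "https://www.openslr.org/resources/12"


def archive_url(corpus_key: str) -> str:
    """Build the (assumed) download URL for one LibriSpeech part."""
    return f"{OPENSLR_BASE}/{corpus_key}.tar.gz"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected_md5: str) -> None:
    """Raise ValueError if the file does not match the expected checksum."""
    actual = md5_of_file(path)
    if actual != expected_md5.lower():
        raise ValueError(
            f"md5 mismatch for {path}: expected {expected_md5}, got {actual}"
        )
```

Hashing in chunks keeps memory use constant regardless of archive size, which matters for the large training parts.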
- class i6_core.datasets.librispeech.DownloadLibriSpeechMetadataJob(*args, **kwargs)
Downloads the metadata file and checks its integrity via md5sum.
Defines outputs for SPEAKERS.TXT, CHAPTERS.TXT and BOOKS.TXT.
- Parameters:
corpus_key (str) – corpus identifier, e.g. “train-clean-100”
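To show what the outputs of DownloadLibriSpeechMetadataJob contain, here is a rough sketch of reading a SPEAKERS.TXT-style file into a dictionary. The column layout (`;` comment lines, then `ID | SEX | SUBSET | MINUTES | NAME`) is an assumption about the file, and `SpeakerInfo` and `parse_speakers` are hypothetical names, not part of i6_core.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class SpeakerInfo:
    """One (assumed) SPEAKERS.TXT row, minus the speaker ID."""
    sex: str
    subset: str
    minutes: float
    name: str


def parse_speakers(lines: Iterable[str]) -> Dict[str, SpeakerInfo]:
    """Parse SPEAKERS.TXT-style lines into a dict keyed by speaker ID.

    Assumed format: lines starting with ';' are comments, data lines
    are 'ID | SEX | SUBSET | MINUTES | NAME'.
    """
    speakers: Dict[str, SpeakerInfo] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip header comments and blank lines
        # maxsplit=4 keeps any '|' inside the NAME field intact
        spk_id, sex, subset, minutes, name = (f.strip() for f in line.split("|", 4))
        speakers[spk_id] = SpeakerInfo(sex, subset, float(minutes), name)
    return speakers
```

Splitting with `maxsplit=4` is a defensive choice: a `|` inside a speaker name then stays in the name instead of shifting the columns.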
- class i6_core.datasets.librispeech.LibriSpeechCreateBlissCorpusJob(*args, **kwargs)
Creates a Bliss corpus from a LibriSpeech corpus folder, additionally using the speaker metadata.
Outputs a single Bliss .xml.gz file.
- Parameters:
corpus_folder (Path) – Path to a LibriSpeech corpus folder
speaker_metadata (Path) – Path to the SPEAKERS.TXT file from DownloadLibriSpeechMetadataJob (out_speakers)
- run()
- tasks()
- Returns:
yields Task objects
- Return type:
list[sisyphus.task.Task]
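A simplified sketch of the kind of conversion LibriSpeechCreateBlissCorpusJob performs: walk the speaker_id/chapter/* folder structure, read the per-chapter transcription files, and write a single gzipped XML file. The `<speaker>-<chapter>.trans.txt` naming, the one-recording-per-utterance layout and the element names (`corpus`, `speaker-description`, `recording`, `segment`, `speaker`, `orth`) are assumptions for illustration; the real job may write a different Bliss structure. `build_bliss_sketch` is a hypothetical helper that takes speaker sexes as a plain `{speaker_id: "F" or "M"}` dict.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict

# Assumed mapping from the SEX column of SPEAKERS.TXT to Bliss gender values.
GENDER = {"F": "female", "M": "male"}


def build_bliss_sketch(
    corpus_folder: Path,
    speaker_sex: Dict[str, str],
    out_file: Path,
    corpus_name: str = "librispeech",
) -> int:
    """Write a simplified Bliss-like corpus as .xml.gz; return the segment count."""
    corpus = ET.Element("corpus", name=corpus_name)
    # One speaker description per speaker with known metadata.
    for spk_id, sex in sorted(speaker_sex.items()):
        ET.SubElement(
            corpus, "speaker-description", name=spk_id, gender=GENDER.get(sex, "unknown")
        )
    num_segments = 0
    # Assumed layout: <speaker_id>/<chapter>/<speaker_id>-<chapter>.trans.txt
    for trans in sorted(corpus_folder.glob("*/*/*.trans.txt")):
        for line in trans.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            # Each line: "<speaker>-<chapter>-<utt> TRANSCRIPTION"
            utt_id, orth = line.split(" ", 1)
            spk_id = utt_id.split("-")[0]
            audio = trans.parent / f"{utt_id}.flac"
            rec = ET.SubElement(corpus, "recording", name=utt_id, audio=str(audio))
            # One segment spanning the whole audio file.
            seg = ET.SubElement(rec, "segment", name=utt_id, start="0", end="inf")
            ET.SubElement(seg, "speaker", name=spk_id)
            ET.SubElement(seg, "orth").text = orth
            num_segments += 1
    with gzip.open(out_file, "wb") as f:
        ET.ElementTree(corpus).write(f, encoding="utf-8", xml_declaration=True)
    return num_segments
```

Deriving the speaker ID from the utterance ID prefix avoids a second directory walk; the speaker descriptions come only from the metadata dict, mirroring how the job combines the corpus folder with the SPEAKERS.TXT information.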