`i6_core.datasets.huggingface`

https://huggingface.co/docs/datasets/

class i6_core.datasets.huggingface.DownloadAndPrepareHuggingFaceDatasetJob(*args, **kwargs)

https://huggingface.co/docs/datasets/ https://huggingface.co/datasets

Requires the datasets package: pip install datasets

Basically wraps datasets.load_dataset(...).save_to_disk(out_dir).
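The wrapped call can be sketched roughly as follows. This is a minimal illustration, not the job's actual implementation: the helper name `download_and_prepare` and its injectable `load_dataset` argument are hypothetical and exist only so that the sketch can be exercised without network access. When no loader is passed, it falls back to `datasets.load_dataset`.

```python
from typing import Any, Callable, Optional


def download_and_prepare(
    path: str,
    name: Optional[str] = None,
    *,
    out_dir: str,
    data_files: Any = None,
    revision: Optional[str] = None,
    load_dataset: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical helper mirroring load_dataset(...).save_to_disk(out_dir)."""
    if load_dataset is None:
        # Imported lazily so the sketch does not need `datasets` at import time.
        import datasets

        load_dataset = datasets.load_dataset

    # Download (or reuse the cache) and prepare the dataset.
    ds = load_dataset(path, name, data_files=data_files, revision=revision)
    # Write the prepared dataset to the output directory.
    ds.save_to_disk(out_dir)
    return ds
```

With the real loader, `download_and_prepare("librispeech_asr", "clean", out_dir="out")` would download the data, so it needs network access and enough disk space.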

Example for Librispeech:

DownloadAndPrepareHuggingFaceDatasetJob("librispeech_asr", "clean")

See also https://github.com/huggingface/datasets/issues/4179

Parameters:

path – Path or name of the dataset, passed to datasets.load_dataset
name – Name of the dataset configuration, passed to datasets.load_dataset
data_files – Path(s) to the source data file(s), passed to datasets.load_dataset
revision – Version of the dataset script, passed to datasets.load_dataset
time_rqmt (float) – time requirement in hours
mem_rqmt (float) – memory requirement in GB
cpu_rqmt (int) – number of CPU cores
mini_task (bool) – whether the job should be run as a mini_task

classmethod hash(kwargs)

Parameters: parsed_args (dict[str]) –
Returns: hash for job given the arguments
Return type: str
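The documentation does not state which arguments enter the hash. As a purely illustrative sketch, the code below assumes that only the dataset-defining arguments are hashed and that the resource requirements are left out, so that changing them does not alter the job identity. The names `sketch_hash` and `HASHED_KEYS` are hypothetical, and the digest scheme is not Sisyphus's actual hashing.

```python
import hashlib
import json
from typing import Any, Dict

# Assumption: only these arguments determine the job's output.
HASHED_KEYS = ("path", "name", "data_files", "revision")


def sketch_hash(parsed_args: Dict[str, Any]) -> str:
    """Return a deterministic hex digest over the dataset-defining arguments."""
    # Missing keys are treated as None; all other keys are ignored.
    relevant = {key: parsed_args.get(key) for key in HASHED_KEYS}
    # Sorted keys give a stable serialization regardless of input order.
    payload = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this assumption, two argument dicts that differ only in `time_rqmt` or `mem_rqmt` map to the same digest, while a different `name` gives a different one.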

run()

tasks()

Returns: yields Tasks
Return type: list[sisyphus.task.Task]