i6_core.datasets.huggingface
https://huggingface.co/docs/datasets/
- class i6_core.datasets.huggingface.DownloadAndPrepareHuggingFaceDatasetJob(*args, **kwargs)
https://huggingface.co/docs/datasets/ https://huggingface.co/datasets
Requires the datasets package:
pip install datasets
Essentially wraps
datasets.load_dataset(...).save_to_disk(out_dir)
Example for Librispeech:
DownloadAndPrepareHuggingFaceDatasetJob("librispeech_asr", "clean")
Related issue: https://github.com/huggingface/datasets/issues/4179
- Parameters:
path – Path or name of the dataset, passed to datasets.load_dataset
name – Name of the dataset configuration, passed to datasets.load_dataset
data_files – Path(s) to the source data file(s), passed to datasets.load_dataset
revision – Version of the dataset script, passed to datasets.load_dataset
time_rqmt (float) – time requirement in hours
mem_rqmt (float) – memory requirement in GB
cpu_rqmt (int) – number of CPU cores required
mini_task (bool) – whether the job should be run as a mini task
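The description above says the job essentially wraps datasets.load_dataset(...).save_to_disk(out_dir). A minimal sketch of that call chain follows. The names build_load_dataset_kwargs and download_and_prepare are hypothetical helpers introduced for illustration, not part of i6_core. Dropping unset options before the call is an assumption of this sketch, not a documented behaviour of the job.

```python
from typing import Any, Dict, Optional


def build_load_dataset_kwargs(
    path: str,
    name: Optional[str] = None,
    data_files: Any = None,
    revision: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect the arguments forwarded to datasets.load_dataset (hypothetical helper)."""
    kwargs = {"path": path, "name": name, "data_files": data_files, "revision": revision}
    # Assumption: unset options are dropped so load_dataset uses its own defaults.
    return {key: value for key, value in kwargs.items() if value is not None}


def download_and_prepare(out_dir: str, **job_args: Any) -> None:
    """Hypothetical stand-in for the work done by the job: load, then save to disk."""
    # Imported lazily so the helper above can be used without the package installed.
    import datasets

    ds = datasets.load_dataset(**build_load_dataset_kwargs(**job_args))
    ds.save_to_disk(out_dir)


if __name__ == "__main__":
    # Only builds the keyword arguments; nothing is downloaded here.
    print(build_load_dataset_kwargs("librispeech_asr", "clean"))
```

Calling download_and_prepare("out", path="librispeech_asr", name="clean") would correspond to the Librispeech example, but it needs the datasets package and network access.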
- classmethod hash(kwargs)
- Parameters:
parsed_args (dict[str]) –
- Returns:
hash for job given the arguments
- Return type:
str
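This reference does not say how the hash is derived. One plausible reason for a job to override hash is to keep resource settings such as time_rqmt or mem_rqmt from changing the job identity. The sketch below illustrates that idea only. The function sketch_hash, the HASHED_KEYS tuple, and the use of SHA-256 over JSON are all assumptions, not the Sisyphus or i6_core implementation.

```python
import hashlib
import json
from typing import Any, Dict

# Assumption: only these arguments determine what the job produces.
HASHED_KEYS = ("path", "name", "data_files", "revision")


def sketch_hash(parsed_args: Dict[str, Any]) -> str:
    """Illustrative hash over dataset-defining arguments only."""
    relevant = {key: parsed_args.get(key) for key in HASHED_KEYS}
    # sort_keys gives a stable serialization; default=str covers non-JSON values.
    payload = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this scheme, two argument sets that differ only in resource requirements map to the same hash, while a different dataset configuration yields a different one.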
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]