i6_core.datasets.tf_datasets

This module provides jobs for TensorFlow Datasets (TFDS), documented here: https://www.tensorflow.org/datasets

class i6_core.datasets.tf_datasets.DownloadAndPrepareTfDatasetJob(*args, **kwargs)

This job downloads and prepares a TF dataset. The processed files are stored in a data_dir folder, from which the dataset can be loaded again (see https://www.tensorflow.org/datasets/overview#load_a_dataset).
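The TFDS side of this workflow (prepare into data_dir, then load from it) can be sketched as follows. This is a minimal sketch that assumes the standard tfds.builder, download_and_prepare, as_dataset flow. The helper prepare_and_load is hypothetical and not part of i6_core, and the job's own implementation may differ.

```python
from typing import Any


def prepare_and_load(
    dataset_name: str,
    data_dir: str,
    split: str = "train",
    prepare: bool = False,
) -> Any:
    """Hypothetical helper: load a TFDS dataset from data_dir, optionally preparing it first."""
    # Imported lazily so this module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    # The builder reads from and writes to data_dir.
    builder = tfds.builder(dataset_name, data_dir=data_dir)
    if prepare:
        # Download the raw files and write the processed records into data_dir.
        builder.download_and_prepare()
    # Read the prepared records back from data_dir as a tf.data.Dataset.
    return builder.as_dataset(split=split)
```

If you only need to read an already prepared dataset, tfds.load(dataset_name, data_dir=data_dir, split="train", download=False) is a one-call alternative that does not attempt a download.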

Install the dependencies:

pip install tensorflow-datasets

Some datasets need extra dependencies, for example ‘librispeech’:

pip install apache_beam
pip install pydub  # also requires ffmpeg to be installed

See the TFDS setup.py for the extra dependencies of other datasets: https://github.com/tensorflow/datasets/blob/master/setup.py

Possibly also needed:

pip install datasets  # for Huggingface community datasets
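Before starting the job, a small pre-flight check can report which of the dependencies above are available. This is a sketch that assumes the import names tensorflow_datasets, apache_beam, pydub and datasets, and an ffmpeg binary on PATH. The function name check_tfds_dependencies is hypothetical.

```python
import importlib.util
import shutil
from typing import Dict

# pip package name -> Python import name, for the dependencies listed above.
PYTHON_DEPS = {
    "tensorflow-datasets": "tensorflow_datasets",
    "apache_beam": "apache_beam",
    "pydub": "pydub",
    "datasets": "datasets",
}


def check_tfds_dependencies() -> Dict[str, bool]:
    """Report which optional dependencies are available (True = found)."""
    # find_spec returns None for a top-level module that is not installed.
    status = {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in PYTHON_DEPS.items()
    }
    # pydub relies on an external ffmpeg binary, so check PATH as well.
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_tfds_dependencies().items():
        print(f"{name:20s} {'ok' if ok else 'MISSING'}")
```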
Parameters:
classmethod hash(kwargs)
Parameters:

parsed_args (dict[str]) –

Returns:

hash for job given the arguments

Return type:

str
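A hash override like this decides which arguments identify the job's output, so that two jobs with equivalent arguments map to the same result. The sketch below illustrates the idea with a plain SHA-256 over a sorted JSON dump. It is not Sisyphus's actual hashing scheme, and the excluded key max_workers and the function job_hash are hypothetical examples.

```python
import hashlib
import json
from typing import Any, Dict, FrozenSet

# Hypothetical: arguments that only affect how the job runs, not what it produces.
NON_HASHED_ARGS: FrozenSet[str] = frozenset({"max_workers"})


def job_hash(kwargs: Dict[str, Any], exclude: FrozenSet[str] = NON_HASHED_ARGS) -> str:
    """Deterministic hash over the job arguments, ignoring excluded keys."""
    relevant = {k: v for k, v in kwargs.items() if k not in exclude}
    # sort_keys makes the hash independent of the order the arguments were given in.
    payload = json.dumps(relevant, sort_keys=True, default=repr)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this design, changing only max_workers leaves the hash (and thus the job's output location) unchanged, while changing the dataset name produces a different hash.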

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
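The tasks()/run() pair follows a generator pattern: tasks() yields task descriptions, each naming the job method to execute. The sketch below uses a hypothetical stand-in TaskSpec instead of sisyphus.task.Task so that it runs without Sisyphus installed. ExampleJob and its resource values are hypothetical as well.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator


@dataclass
class TaskSpec:
    """Hypothetical stand-in for sisyphus.task.Task, for illustration only."""

    start: str  # name of the job method to execute
    rqmt: Dict[str, Any] = field(default_factory=dict)  # resource requirements


class ExampleJob:
    """Hypothetical job showing the tasks()/run() pattern."""

    def tasks(self) -> Iterator[TaskSpec]:
        # A single task that executes self.run() once.
        yield TaskSpec("run", rqmt={"cpu": 1, "mem": 4, "time": 1})

    def run(self) -> None:
        # Placeholder: a real job would download and prepare the dataset here.
        pass
```

Because tasks() is a generator, list(ExampleJob().tasks()) materializes it into the list named in the return type.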