i6_core.datasets.tedlium2

class i6_core.datasets.tedlium2.CreateTEDLIUM2BlissCorpusJob(*args, **kwargs)

Processes STM files from the TEDLIUM2 corpus folders and creates Bliss corpus files. Outputs an STM file and a Bliss .xml.gz file for each of the train/dev/test sets.

Parameters:

corpus_folders (Dict of {corpus_key: Path}) – TEDLIUM2 corpus folder for each corpus key

load_stm_data(stm_file)

Parameters:

stm_file (str) – the STM file to load
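For orientation, reading an STM file amounts to splitting each line into its fixed leading fields and the trailing transcript. The following is a minimal sketch of that idea, not the implementation of load_stm_data. It assumes the common NIST STM layout (recording, channel, speaker, start, end, optional <label>, transcript) and ";;" comment lines. The names StmSegment and parse_stm_line and the sample line are hypothetical.

```python
from typing import NamedTuple, Optional


class StmSegment(NamedTuple):
    """One utterance line of an STM file (hypothetical helper type)."""

    recording: str
    channel: str
    speaker: str
    start: float
    end: float
    label: Optional[str]
    text: str


def parse_stm_line(line: str) -> Optional[StmSegment]:
    """Parse a single STM line; return None for blank and ';;' comment lines."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"malformed STM line: {line!r}")
    recording, channel, speaker, start, end = fields[:5]
    rest = fields[5:]
    label = None
    # Assumption: the optional label field is a single token in angle brackets.
    if rest and rest[0].startswith("<") and rest[0].endswith(">"):
        label = rest[0]
        rest = rest[1:]
    return StmSegment(
        recording, channel, speaker, float(start), float(end), label, " ".join(rest)
    )


# Made-up sample line, for illustration only.
example = "talk_a 1 speaker_a 16.37 27.53 <o,f0,male> hello world"
segment = parse_stm_line(example)
```

Returning None for comment lines lets a caller filter a whole file with a simple list comprehension.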

make_corpus()

Creates a Bliss corpus from the STM file (speakers are always included).
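To illustrate what such a conversion produces, the sketch below builds a small Bliss-style XML tree from already-parsed segments, using only the standard library. It is not the code of make_corpus. The element names (corpus, speaker-description, recording, segment, speaker, orth) reflect the Bliss corpus layout as assumed here. The audio file naming, the segment numbering, and the functions build_bliss_corpus and write_bliss_corpus are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET


def build_bliss_corpus(corpus_name, segments):
    """Build a Bliss-style XML tree.

    segments: list of (recording, speaker, start, end, text) tuples.
    """
    root = ET.Element("corpus", name=corpus_name)
    # Declare every speaker once at corpus level.
    for speaker in sorted({seg[1] for seg in segments}):
        ET.SubElement(root, "speaker-description", name=speaker)
    recordings = {}
    for recording, speaker, start, end, text in segments:
        if recording not in recordings:
            # Assumption: the audio file is named after the recording.
            recordings[recording] = ET.SubElement(
                root, "recording", name=recording, audio=f"{recording}.wav"
            )
        rec_elem = recordings[recording]
        # Segments are numbered 1, 2, ... within their recording.
        seg_elem = ET.SubElement(
            rec_elem,
            "segment",
            name=str(len(rec_elem) + 1),
            start=f"{start:.2f}",
            end=f"{end:.2f}",
        )
        ET.SubElement(seg_elem, "speaker", name=speaker)
        ET.SubElement(seg_elem, "orth").text = text
    return root


def write_bliss_corpus(root, path):
    """Serialize the tree and write it gzip-compressed (e.g. to corpus.xml.gz)."""
    with gzip.open(path, "wb") as f:
        f.write(ET.tostring(root, encoding="utf-8"))


# Made-up segments, for illustration only.
segments = [
    ("talk_a", "speaker_a", 0.0, 2.5, "hello world"),
    ("talk_a", "speaker_a", 2.5, 4.0, "second segment"),
]
corpus = build_bliss_corpus("demo", segments)
xml_bytes = ET.tostring(corpus, encoding="utf-8")
```

Attaching a speaker element to every segment mirrors the "always include speakers" behavior described above.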

make_stm()

tasks()

Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.datasets.tedlium2.DownloadTEDLIUM2CorpusJob(*args, **kwargs)

Downloads the full TED-LIUM Release 2 corpus from https://projets-lium.univ-lemans.fr/wp-content/uploads/corpus/TED-LIUM/ (includes all train/dev/test/LM/dictionary data).

process_dict()

Applies minor modifications to the dictionary (see the comments in the source code).

run()

tasks()

Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]