i6_core.datasets.tedlium2
- class i6_core.datasets.tedlium2.CreateTEDLIUM2BlissCorpusJob(*args, **kwargs)
Processes the stm files from the TEDLIUM2 corpus folders and creates Bliss corpus files. Outputs an stm file and a Bliss .xml.gz file for each of the train/dev/test sets.
- Parameters:
corpus_folders (Dict[str, Path]) – maps each corpus key (e.g. train, dev, test) to its corpus folder
- load_stm_data(stm_file)
- Parameters:
stm_file (str) – path to the stm file to load
- make_corpus()
Creates a Bliss corpus from the stm file (speakers are always included).
- make_stm()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
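The stm-to-Bliss conversion this job performs can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not i6_core's implementation: it assumes the NIST stm line layout `recording channel speaker start end [<label>] transcript`, TEDLIUM-style comma-separated labels such as `<o,f0,female>`, and one `.sph` audio file per recording in a single directory. The helper names `parse_stm_lines`, `build_bliss_corpus` and `write_bliss_gz`, the `StmSegment` container and the numeric segment names are all hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, NamedTuple


class StmSegment(NamedTuple):
    """One utterance line of an stm file (hypothetical container)."""

    recording: str
    channel: str
    speaker: str
    start: float
    end: float
    label: str
    orth: str


def parse_stm_lines(lines: List[str]) -> List[StmSegment]:
    """Parse lines of the form 'recording channel speaker start end [<label>] transcript'."""
    segments = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):  # skip blank lines and stm comment lines
            continue
        fields = line.split(maxsplit=5)
        recording, channel, speaker, start, end = fields[:5]
        rest = fields[5] if len(fields) > 5 else ""
        label = ""
        first, _, remainder = rest.partition(" ")
        # Assumption: labels look like <o,f0,female>; requiring a comma keeps
        # transcript tokens such as <unk> from being mistaken for a label.
        if first.startswith("<") and first.endswith(">") and "," in first:
            label, rest = first, remainder
        segments.append(
            StmSegment(recording, channel, speaker, float(start), float(end), label, rest.strip())
        )
    return segments


def build_bliss_corpus(name: str, segments: List[StmSegment], audio_dir: str) -> ET.Element:
    """Build a Bliss <corpus> element; speakers are always written out."""
    corpus = ET.Element("corpus", name=name)
    for speaker in sorted({seg.speaker for seg in segments}):
        ET.SubElement(corpus, "speaker-description", name=speaker)
    # Group segments by recording, keeping the order in which recordings first appear.
    by_recording: Dict[str, List[StmSegment]] = defaultdict(list)
    for seg in segments:
        by_recording[seg.recording].append(seg)
    for rec_name, rec_segments in by_recording.items():
        # Assumption: one .sph audio file per recording inside audio_dir.
        recording = ET.SubElement(
            corpus, "recording", name=rec_name, audio=f"{audio_dir}/{rec_name}.sph"
        )
        for idx, seg in enumerate(rec_segments, start=1):
            segment = ET.SubElement(
                recording, "segment", name=str(idx), start=f"{seg.start:.2f}", end=f"{seg.end:.2f}"
            )
            ET.SubElement(segment, "speaker", name=seg.speaker)
            ET.SubElement(segment, "orth").text = seg.orth
    return corpus


def write_bliss_gz(corpus: ET.Element, path: str) -> None:
    """Write the corpus as gzip-compressed UTF-8 XML (the .xml.gz output)."""
    with gzip.open(path, "wb") as f:
        ET.ElementTree(corpus).write(f, encoding="utf-8", xml_declaration=True)
```

Declaring every speaker once at corpus level and referencing it from each segment is what makes the output "always include speakers"; the real job may name segments and resolve audio paths differently.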
- class i6_core.datasets.tedlium2.DownloadTEDLIUM2CorpusJob(*args, **kwargs)
Downloads the full TED-LIUM Release 2 corpus from https://projets-lium.univ-lemans.fr/wp-content/uploads/corpus/TED-LIUM/ (includes all train/dev/test, LM and dictionary data).
- process_dict()
Applies minor modifications to the dictionary (see the comments in the source code).
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
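The download step can be sketched as a minimal standard-library routine, assuming the corpus is distributed as a single (possibly compressed) tar archive under the URL above. The helper `download_and_extract` is hypothetical and does not reproduce whatever tooling i6_core itself uses; the exact archive file name is left to the caller rather than guessed here.

```python
import shutil
import tarfile
import urllib.parse
import urllib.request
from pathlib import Path


def download_and_extract(url: str, target_dir: str) -> Path:
    """Fetch a tar archive from `url` and unpack it into `target_dir` (hypothetical helper)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    archive = target / Path(urllib.parse.urlparse(url).path).name
    # Stream the download to disk instead of holding the whole corpus in memory.
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out)
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
    with tarfile.open(archive, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject absolute paths,
            # members escaping target_dir, device files, etc.
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)
    return archive
```

Because `urllib.request.urlopen` also handles `file://` URLs, the same function can be exercised against a local copy of the archive, which avoids re-downloading the full corpus while testing.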