i6_core.corpus.segments

class i6_core.corpus.segments.DynamicSplitSegmentFileJob(*args, **kwargs)

Split the segments into as many shares as given by concurrent. This is a variant of the existing SplitSegmentFileJob that requires a tk.Delayed variable (instead of an int) for the concurrent argument.

Parameters:
  • segment_file (tk.Path|str) – segment file

  • concurrent (tk.Delayed) – number of splits
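
As a rough illustration of what splitting a segment list into a given number of shares can look like, the sketch below uses plain Python. It is not the job's actual implementation: the helper name split_into_shares, the contiguous split, and the way a remainder is distributed are all assumptions made for this example. In the real job, concurrent is a tk.Delayed that is only resolved at run time; here it is an ordinary int.

```python
def split_into_shares(segments, concurrent):
    """Split a list of segment names into `concurrent` contiguous shares.

    Hypothetical helper for illustration; not part of i6_core.
    """
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    base, remainder = divmod(len(segments), concurrent)
    shares = []
    start = 0
    for i in range(concurrent):
        # Assumption: the first `remainder` shares receive one extra segment.
        size = base + (1 if i < remainder else 0)
        shares.append(segments[start:start + size])
        start += size
    return shares


# Made-up segment names, used only to exercise the helper.
segments = ["corpus/rec1/seg%d" % i for i in range(1, 8)]
shares = split_into_shares(segments, 3)
for index, share in enumerate(shares, start=1):
    print(index, share)
```

With seven segments and three shares, this sketch produces shares of three, two, and two segments, and concatenating the shares gives back the original order.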

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.corpus.segments.SegmentCorpusByRegexJob(*args, **kwargs)
run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.corpus.segments.SegmentCorpusBySpeakerJob(*args, **kwargs)
run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.corpus.segments.SegmentCorpusJob(*args, **kwargs)
run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.corpus.segments.ShuffleAndSplitSegmentsJob(*args, **kwargs)
default_split = {'dev': 0.1, 'train': 0.9}
classmethod hash(kwargs)
Parameters:

parsed_args (dict[str]) –

Returns:

hash for job given the arguments

Return type:

str

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
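
The default_split attribute of ShuffleAndSplitSegmentsJob maps split names to fractions. A minimal sketch of shuffling a segment list and cutting it by such fractions might look as follows. The function name shuffle_and_split, the seed argument, the rounding, and the rule that the last split takes the remainder are assumptions for this example, not the job's documented behaviour.

```python
import random


def shuffle_and_split(segments, split, seed=0):
    """Shuffle segment names and cut them into named parts by fraction.

    Hypothetical helper for illustration; not part of i6_core.
    """
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    result = {}
    start = 0
    names = sorted(split)
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # Assumption: the last part takes whatever is left over.
            end = len(shuffled)
        else:
            end = start + int(round(split[name] * len(shuffled)))
        result[name] = shuffled[start:end]
        start = end
    return result


# Made-up segment names; the fractions mirror default_split.
segments = ["seg%03d" % i for i in range(100)]
parts = shuffle_and_split(segments, {"dev": 0.1, "train": 0.9})
print({name: len(items) for name, items in parts.items()})
```

Every segment ends up in exactly one part, so the parts together are a permutation of the input.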

class i6_core.corpus.segments.SortSegmentsByLengthAndShuffleJob(*args, **kwargs)
Parameters:
  • crp – rasr.crp.CommonRasrParameters

  • shuffle_strength – float in [0, inf) that determines how much the segment length affects the sorting: 0 -> completely random; inf -> strictly sorted

  • shuffle_seed – random number seed
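
One way to picture the effect of shuffle_strength is to sort segments by a key that mixes their length with random noise. The sketch below does this in plain Python. It is an assumed scheme for illustration only and does not reproduce the job's actual algorithm: the function name, the normalisation, and the uniform noise are all hypothetical, and the strength must be a finite number here.

```python
import random


def sort_by_length_and_shuffle(lengths, shuffle_strength, shuffle_seed=0):
    """Order segment names by a noisy length key.

    Hypothetical helper for illustration; not part of i6_core.
    `lengths` maps segment name -> duration; `shuffle_strength` is finite.
    """
    rng = random.Random(shuffle_seed)
    max_len = max(lengths.values())
    keyed = []
    for name in sorted(lengths):  # fixed iteration order for reproducibility
        # Normalised length scaled by the strength, plus uniform noise in [0, 1).
        # Strength 0 leaves only the noise; a large strength drowns it out.
        key = shuffle_strength * lengths[name] / max_len + rng.random()
        keyed.append((key, name))
    keyed.sort()
    return [name for _, name in keyed]


# Made-up durations in seconds.
lengths = {"seg_c": 4.0, "seg_a": 1.0, "seg_b": 2.5}
nearly_sorted = sort_by_length_and_shuffle(lengths, shuffle_strength=1e9)
random_order = sort_by_length_and_shuffle(lengths, shuffle_strength=0.0)
print(nearly_sorted)
print(random_order)
```

With a very large strength the output is ordered by length, while with strength 0 the order depends only on the seed.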

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.corpus.segments.SplitSegmentFileJob(*args, **kwargs)
run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.corpus.segments.UpdateSegmentsWithSegmentMapJob(*args, **kwargs)

Update a segment file with a segment mapping file (e.g. from corpus compression)

Parameters:
  • segment_file (Path) – path to the segment text file (uncompressed)

  • segment_map (Path) – path to the segment map (gz or uncompressed)
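
Conceptually, updating a segment file with a mapping amounts to replacing each segment name by its mapped counterpart. The sketch below shows that idea on in-memory data. It assumes the map has already been loaded into a dict from old to new names and that names without a map entry are kept unchanged; both the helper and this behaviour are assumptions for illustration, and the on-disk format of the segment map is not covered here.

```python
def update_segments(segment_names, segment_map):
    """Replace each segment name by its mapped name, if one exists.

    Hypothetical helper for illustration; not part of i6_core.
    """
    # Assumption: names without an entry in the map are kept as they are.
    return [segment_map.get(name, name) for name in segment_names]


# Made-up segment names and mapping.
old_segments = ["corpus/rec1/1", "corpus/rec1/2", "corpus/rec2/1"]
mapping = {
    "corpus/rec1/1": "corpus/rec1/a",
    "corpus/rec1/2": "corpus/rec1/b",
}
new_segments = update_segments(old_segments, mapping)
print("\n".join(new_segments))
```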

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]