i6_core.corpus.filter

class i6_core.corpus.filter.FilterCorpusBySegmentDurationJob(*args, **kwargs)
Parameters:
  • bliss_corpus – path of the corpus file

  • min_duration – minimum duration for a segment to keep (in seconds)

  • max_duration – maximum duration for a segment to keep (in seconds)

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
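
The duration criterion of FilterCorpusBySegmentDurationJob can be sketched in plain Python. This is a minimal illustration, not the job's actual code: the `Segment` dataclass and the `filter_by_duration` helper are hypothetical names, and treating both bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a corpus segment; times are in seconds."""
    name: str
    start: float
    end: float


def filter_by_duration(segments, min_duration, max_duration):
    """Keep segments whose duration lies in [min_duration, max_duration].

    Inclusive bounds are an assumption made for this sketch.
    """
    return [s for s in segments if min_duration <= s.end - s.start <= max_duration]


segments = [
    Segment("too_short", 0.0, 0.05),
    Segment("ok", 1.0, 4.5),
    Segment("too_long", 0.0, 300.0),
]
kept_names = [s.name for s in filter_by_duration(segments, 0.1, 120.0)]
print(kept_names)
```

The bounds 0.1 and 120.0 are example values chosen for the illustration only.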

class i6_core.corpus.filter.FilterCorpusBySegmentsJob(*args, **kwargs)
Parameters:
  • bliss_corpus

  • segment_file – a single segment file or a list of segment files

  • compressed

  • invert_match

  • delete_empty_recordings – if true, empty recordings will be removed

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]

class i6_core.corpus.filter.FilterCorpusRemoveUnknownWordSegmentsJob(*args, **kwargs)

Removes segments of a bliss corpus that contain words unknown to a given lexicon

Parameters:
  • bliss_corpus

  • bliss_lexicon

  • case_sensitive – take casing into account when checking words against the lexicon

  • all_unknown – all words have to be unknown in order for the segment to be discarded

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
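
How case_sensitive and all_unknown interact in FilterCorpusRemoveUnknownWordSegmentsJob can be illustrated with a small sketch. The helper `should_discard` is hypothetical and works on a plain whitespace-split transcription and a set of lexicon words. The real job reads bliss corpus and lexicon files, which is not shown here. Keeping segments with an empty transcription is an assumption of this sketch.

```python
def should_discard(orth, lexicon_words, case_sensitive=False, all_unknown=True):
    """Decide whether a segment with transcription `orth` would be discarded.

    all_unknown=True:  discard only if every word is missing from the lexicon.
    all_unknown=False: discard as soon as one word is missing.
    """
    words = orth.split()
    if not words:
        # Assumption: a segment without words is kept.
        return False
    if not case_sensitive:
        words = [w.lower() for w in words]
        lexicon_words = {w.lower() for w in lexicon_words}
    unknown = [w not in lexicon_words for w in words]
    return all(unknown) if all_unknown else any(unknown)


lexicon = {"hello", "world"}
print(should_discard("hello there", lexicon, all_unknown=True))
print(should_discard("hello there", lexicon, all_unknown=False))
```

With all_unknown=True the segment "hello there" is kept because "hello" is known. With all_unknown=False it is discarded because "there" is not in the lexicon.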

class i6_core.corpus.filter.FilterSegmentsByAlignmentConfidenceJob(*args, **kwargs)
Parameters:
  • alignment_logs – log files of an alignment job (alignment_job.out_log_file), given as a mapping task_id -> log_file

  • percentile – percentage of alignment segments to keep; must be in (0, 100]. The value is passed to np.percentile()

  • crp – used to set the number of output segments. If None, the number of alignment log files is used instead

  • plot – plot the distribution of alignment scores

  • absolute_threshold – alignments with a score above this value are discarded

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
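
The selection rule described by percentile and absolute_threshold can be sketched as follows. This is an illustration under stated assumptions, not the job's implementation: `percentile` is a pure-Python stand-in for np.percentile() with linear interpolation, `select_segments` and the `scores` mapping are hypothetical, lower scores are assumed to be better, and taking the stricter of the two thresholds is an assumption about how they combine.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in (0, 100].

    Stand-in for np.percentile() so the sketch needs no third-party package.
    """
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def select_segments(scores, keep_percentile, absolute_threshold=None):
    """Return the names of segments whose score is at or below the threshold."""
    threshold = percentile(list(scores.values()), keep_percentile)
    if absolute_threshold is not None:
        # Assumption: the stricter (smaller) of the two thresholds applies.
        threshold = min(threshold, absolute_threshold)
    return [name for name, score in scores.items() if score <= threshold]


# Hypothetical per-segment alignment scores (lower is assumed to be better).
scores = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 10.0}
print(select_segments(scores, 50))
print(select_segments(scores, 50, absolute_threshold=2.5))
```

In this example the 50th percentile of the scores is 3.0, so segments a, b and c are kept. Adding an absolute threshold of 2.5 additionally removes c.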

class i6_core.corpus.filter.FilterSegmentsByListJob(*args, **kwargs)

Filters a segment list file using a given list of segments, which is used either as a black list or as a white list

Parameters:
  • segment_files – original segment list files to be filtered

  • filter_list – list used for filtering, or a path to a text file with the entries of that list, one per line

  • invert_match – use the list as a black list (if False) or as a white list (if True)

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
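
The black list and white list behaviour controlled by invert_match in FilterSegmentsByListJob can be illustrated with a short sketch. The helper `filter_segments_by_list` and the segment names are hypothetical; reading and writing the segment files is left out.

```python
def filter_segments_by_list(segments, filter_list, invert_match=False):
    """Filter segment names against filter_list.

    invert_match=False: filter_list is a black list, listed segments are dropped.
    invert_match=True:  filter_list is a white list, only listed segments are kept.
    """
    filter_set = set(filter_list)
    if invert_match:
        return [s for s in segments if s in filter_set]
    return [s for s in segments if s not in filter_set]


segments = ["corpus/rec1/1", "corpus/rec1/2", "corpus/rec2/1"]
filter_list = ["corpus/rec1/2"]

black_listed = filter_segments_by_list(segments, filter_list)
white_listed = filter_segments_by_list(segments, filter_list, invert_match=True)
print(black_listed)
print(white_listed)
```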

class i6_core.corpus.filter.FilterSegmentsByRegexJob(*args, **kwargs)

Filters a segment list file using a given regular expression

Parameters:
  • segment_files – original segment list files to be filtered

  • filter_regex – regex used for filtering

  • invert_match – keep a segment if the regex does not match (if False) or if it does match (if True)

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
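
The effect of invert_match in FilterSegmentsByRegexJob can be sketched with the standard re module. The helper `filter_segments_by_regex` is hypothetical, and using `re.search` (a match anywhere in the segment name) rather than a match anchored at the start is an assumption of this sketch.

```python
import re


def filter_segments_by_regex(segments, filter_regex, invert_match=False):
    """Filter segment names with a regular expression.

    invert_match=False: keep segments the regex does NOT match.
    invert_match=True:  keep segments the regex DOES match.
    """
    pattern = re.compile(filter_regex)
    # Assumption: the pattern may match anywhere in the name (re.search).
    return [s for s in segments if bool(pattern.search(s)) == invert_match]


segments = ["corpus/rec1/1", "corpus/rec2/1", "corpus/rec1/2"]

without_rec1 = filter_segments_by_regex(segments, r"rec1/")
only_rec1 = filter_segments_by_regex(segments, r"rec1/", invert_match=True)
print(without_rec1)
print(only_rec1)
```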