i6_core.corpus.filter
- class i6_core.corpus.filter.FilterCorpusBySegmentDurationJob(*args, **kwargs)
- Parameters:
bliss_corpus – path of the corpus file
min_duration – minimum duration for a segment to keep (in seconds)
max_duration – maximum duration for a segment to keep (in seconds)
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
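The duration filter described above can be illustrated with a minimal standalone sketch. It does not use i6_core or reproduce the job's implementation. It assumes a bliss-style XML layout in which each `<segment>` carries `start` and `end` attributes in seconds, treats both bounds as inclusive, and uses the hypothetical function name `filter_by_duration` with made-up default limits.

```python
import xml.etree.ElementTree as ET


def filter_by_duration(corpus_xml, min_duration=0.1, max_duration=120.0):
    """Drop <segment> elements whose duration is outside [min_duration, max_duration].

    Hypothetical helper: the name, the defaults and the inclusive bounds are
    assumptions made for illustration only.
    """
    root = ET.fromstring(corpus_xml)
    for recording in root.iter("recording"):
        # Copy the child list because elements are removed while looping.
        for segment in list(recording.findall("segment")):
            duration = float(segment.get("end")) - float(segment.get("start"))
            if not (min_duration <= duration <= max_duration):
                recording.remove(segment)
    return root


# Made-up example corpus with one too-short, one valid and one too-long segment.
CORPUS = """<corpus name="demo">
  <recording name="rec1" audio="rec1.wav">
    <segment name="short" start="0.0" end="0.05"><orth>hi</orth></segment>
    <segment name="ok" start="1.0" end="3.5"><orth>hello world</orth></segment>
    <segment name="long" start="4.0" end="400.0"><orth>a very long one</orth></segment>
  </recording>
</corpus>"""

kept = [s.get("name") for s in filter_by_duration(CORPUS).iter("segment")]
print(kept)
```

Only the segment named `ok` survives here, because the other two fall below `min_duration` or above `max_duration`.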
- class i6_core.corpus.filter.FilterCorpusBySegmentsJob(*args, **kwargs)
- Parameters:
bliss_corpus – path of the corpus file
segment_file – a single segment file or a list of segment files
compressed –
invert_match –
delete_empty_recordings – if True, recordings left without any segments are removed
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
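The effect of `delete_empty_recordings` can be sketched without i6_core. The example below is an illustration under stated assumptions, not the job's code: it assumes segments are identified by full names of the form `corpus/recording/segment`, that listed segments are the ones kept, and it leaves `compressed` and `invert_match` out entirely. The function name `filter_by_segments` is hypothetical.

```python
import xml.etree.ElementTree as ET


def filter_by_segments(corpus_xml, segment_names, delete_empty_recordings=False):
    """Keep only segments whose full name is in segment_names.

    Hypothetical helper: the 'corpus/recording/segment' naming scheme and the
    keep-if-listed behaviour are assumptions for illustration.
    """
    root = ET.fromstring(corpus_xml)
    corpus_name = root.get("name")
    for recording in list(root.findall("recording")):
        for segment in list(recording.findall("segment")):
            full_name = "/".join(
                [corpus_name, recording.get("name"), segment.get("name")]
            )
            if full_name not in segment_names:
                recording.remove(segment)
        # A recording is "empty" once it has no <segment> children left.
        if delete_empty_recordings and recording.find("segment") is None:
            root.remove(recording)
    return root


# Made-up corpus: rec2 loses its only segment and becomes empty.
CORPUS = """<corpus name="demo">
  <recording name="rec1" audio="rec1.wav">
    <segment name="a" start="0.0" end="1.0"/>
    <segment name="b" start="1.0" end="2.0"/>
  </recording>
  <recording name="rec2" audio="rec2.wav">
    <segment name="c" start="0.0" end="1.0"/>
  </recording>
</corpus>"""

wanted = {"demo/rec1/a"}
kept_all = filter_by_segments(CORPUS, wanted)
pruned = filter_by_segments(CORPUS, wanted, delete_empty_recordings=True)

print([r.get("name") for r in kept_all.iter("recording")])
print([r.get("name") for r in pruned.iter("recording")])
```

With the flag off, `rec2` stays in the corpus as a recording without segments. With the flag on, it is dropped along with its segment.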
- class i6_core.corpus.filter.FilterCorpusRemoveUnknownWordSegmentsJob(*args, **kwargs)
Removes segments from a bliss corpus that contain words unknown with respect to a given lexicon
- Parameters:
bliss_corpus – path of the corpus file
bliss_lexicon –
case_sensitive – whether letter case is taken into account when checking words against the lexicon
all_unknown – if set, a segment is discarded only when all of its words are unknown
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
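How `case_sensitive` and `all_unknown` interact can be shown with a small standalone sketch. It is not taken from i6_core. It assumes that a segment's transcription is a whitespace-separated string, that a single unknown word is enough to discard a segment when `all_unknown` is off, and that an empty transcription is kept. The function name `should_discard` and its defaults are hypothetical.

```python
def should_discard(orth, lexicon_words, case_sensitive=False, all_unknown=True):
    """Decide whether a segment with transcription `orth` would be discarded.

    Hypothetical helper: whitespace tokenisation, the defaults and the
    handling of empty transcriptions are assumptions for illustration.
    """
    words = orth.split()
    if not words:
        # Assumption: a segment without words is never discarded.
        return False
    vocabulary = set(lexicon_words)
    if not case_sensitive:
        # Compare lower-cased forms on both sides.
        words = [w.lower() for w in words]
        vocabulary = {w.lower() for w in vocabulary}
    unknown = [w not in vocabulary for w in words]
    # all_unknown=True requires every word to be unknown.
    return all(unknown) if all_unknown else any(unknown)


lexicon = {"hello", "world"}

print(should_discard("Hello there", lexicon, all_unknown=False))
print(should_discard("Hello there", lexicon, all_unknown=True))
print(should_discard("Hello world", lexicon, case_sensitive=True, all_unknown=False))
print(should_discard("foo bar", lexicon, all_unknown=True))
```

The first call discards the segment because `there` is unknown. The second keeps it because `hello` is known. The third discards it because `Hello` does not match `hello` once case matters. The last discards it because both words are unknown.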
- class i6_core.corpus.filter.FilterSegmentsByAlignmentConfidenceJob(*args, **kwargs)
- Parameters:
alignment_logs – alignment log files, e.g. alignment_job.out_log_file; a mapping task_id -> log_file
percentile – percentage of alignment segments to keep. Should be in (0, 100]. Passed to np.percentile()
crp – used to set the number of output segments. If None, the number of alignment log files is used instead.
plot – plot the distribution of alignment scores
absolute_threshold – alignments with a score above this number are discarded
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
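The combination of `percentile` and `absolute_threshold` can be sketched in plain Python. This is an illustration only and does not parse alignment logs or reproduce the job. It assumes that lower scores are better, that a segment whose score equals the threshold is kept, and that the stricter of the two thresholds wins. The helper `percentile` reimplements the linear-interpolation rule that `np.percentile` uses by default, so the sketch needs no third-party packages. All function names and scores are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def select_segments(scores, keep_percentile, absolute_threshold=None):
    """Return the names of segments whose score does not exceed the threshold.

    Hypothetical helper: `scores` maps segment name -> alignment score.
    Keeping ties and taking the minimum of both thresholds are assumptions.
    """
    threshold = percentile(scores.values(), keep_percentile)
    if absolute_threshold is not None:
        threshold = min(threshold, absolute_threshold)
    return sorted(name for name, score in scores.items() if score <= threshold)


# Made-up scores with one clear outlier.
scores = {"seg1": 1.0, "seg2": 2.0, "seg3": 3.0, "seg4": 4.0, "seg5": 50.0}

print(select_segments(scores, 50))
print(select_segments(scores, 100, absolute_threshold=10.0))
```

With a percentile of 50 the threshold is the median score 3.0, so `seg1` to `seg3` are kept. With a percentile of 100 nothing is cut by rank, but the absolute threshold of 10.0 still removes the outlier `seg5`.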
- class i6_core.corpus.filter.FilterSegmentsByListJob(*args, **kwargs)
Filters segment list file using a given list of segments, which is either used as black or as white list
- Parameters:
segment_files – original segment list files to be filtered
filter_list – list used for filtering or a path to a text file with the entries of that list one per line
invert_match – black list (if False) or white list (if True) usage
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
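The black list and white list behaviour controlled by `invert_match` can be sketched as follows. This is a standalone illustration, not the job's code: it works on in-memory lists instead of segment files, does not cover the case where the filter list is given as a path to a text file, and uses the hypothetical function name `filter_segments_by_list`.

```python
def filter_segments_by_list(segments, filter_list, invert_match=False):
    """Filter segment names against a list of entries.

    Hypothetical helper working on plain Python lists.
    invert_match=False: filter_list is a black list, its entries are dropped.
    invert_match=True:  filter_list is a white list, only its entries are kept.
    """
    entries = set(filter_list)
    if invert_match:
        return [s for s in segments if s in entries]
    return [s for s in segments if s not in entries]


# Made-up segment names.
segments = ["demo/rec1/a", "demo/rec1/b", "demo/rec2/c"]
filter_list = ["demo/rec1/b"]

print(filter_segments_by_list(segments, filter_list))
print(filter_segments_by_list(segments, filter_list, invert_match=True))
```

The first call removes the listed segment and keeps the other two. The second keeps only the listed segment.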
- class i6_core.corpus.filter.FilterSegmentsByRegexJob(*args, **kwargs)
Filters segment list file using a given regular expression
- Parameters:
segment_files – original segment list files to be filtered
filter_regex – regex used for filtering
invert_match – keep segment if regex does not match (if False) or does match (if True)
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
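The regex variant follows the same pattern and can be sketched with the standard `re` module. This is an illustration under assumptions, not the job's implementation: it filters an in-memory list instead of segment files, and it uses `re.search`, so the pattern may match anywhere in the segment name. Whether the job anchors the match at the start of the name is not stated above. The function name `filter_segments_by_regex` is hypothetical.

```python
import re


def filter_segments_by_regex(segments, filter_regex, invert_match=False):
    """Filter segment names with a regular expression.

    Hypothetical helper; using re.search (match anywhere) is an assumption.
    invert_match=False: keep a segment if the regex does NOT match.
    invert_match=True:  keep a segment if the regex DOES match.
    """
    pattern = re.compile(filter_regex)
    return [s for s in segments if bool(pattern.search(s)) == invert_match]


# Made-up segment names.
segments = ["demo/rec1/a", "demo/rec2/b", "demo/noise/c"]

print(filter_segments_by_regex(segments, r"/noise/"))
print(filter_segments_by_regex(segments, r"/noise/", invert_match=True))
```

The first call drops the segment whose name contains `/noise/`. The second keeps only that segment.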