i6_core.text.processing
- class i6_core.text.processing.ConcatenateJob(*args, **kwargs)
Concatenate all given input files (gz or raw)
- Parameters:
text_files (list[Path]) – input text files
zip_out (bool) – apply gzip to the output
out_name (str) – user-specified name of the output file
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
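As a rough illustration of what ConcatenateJob does, the following plain-Python sketch concatenates raw and gzipped inputs into a single output and optionally gzips the result. It is not the i6_core implementation: detecting gzip input by its magic bytes, and the helper names `open_text_input` and `concatenate`, are assumptions made for this example.

```python
import gzip
import shutil
from pathlib import Path
from typing import IO, List


def open_text_input(path: Path) -> IO[bytes]:
    """Open a file for binary reading, transparently decompressing gzip input.

    Assumption: gzip is detected via its two-byte magic number, not the extension.
    """
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rb")
    return open(path, "rb")


def concatenate(text_files: List[Path], out_path: Path, zip_out: bool = True) -> None:
    """Concatenate raw or gzipped inputs into one output, gzipped if zip_out is set."""
    opener = gzip.open if zip_out else open
    with opener(out_path, "wb") as out:
        for path in text_files:
            # Stream each (decompressed) input into the output in order, like `cat`.
            with open_text_input(path) as src:
                shutil.copyfileobj(src, out)
```

Streaming with `shutil.copyfileobj` keeps memory use constant regardless of the input size.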
- class i6_core.text.processing.HeadJob(*args, **kwargs)
Return the head of a text file, given either as an absolute number of lines or as a ratio (provide exactly one of the two)
- Parameters:
text_file (Path) – text file (gz or raw)
num_lines (int) – number of lines to extract
ratio (float) – ratio of lines to extract
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
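A minimal sketch of HeadJob's semantics in plain Python, assuming that a ratio is converted to a line count by rounding down and that the whole file fits in memory. `read_lines` and `head` are hypothetical helpers, not part of i6_core.

```python
import gzip
from pathlib import Path
from typing import List, Optional


def read_lines(path: Path) -> List[str]:
    """Read every line of a raw or gzipped text file."""
    # Detect gzip by its two-byte magic number instead of trusting the extension.
    with open(path, "rb") as f:
        is_gz = f.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gz else open
    with opener(path, "rt", encoding="utf-8") as f:
        return f.readlines()


def head(path: Path, num_lines: Optional[int] = None, ratio: Optional[float] = None) -> List[str]:
    """Return the first lines of a file, given either num_lines or ratio (not both)."""
    if (num_lines is None) == (ratio is None):
        raise ValueError("provide exactly one of num_lines or ratio")
    lines = read_lines(path)
    if ratio is not None:
        # Assumption: the ratio is applied to the total line count and rounded down.
        num_lines = int(len(lines) * ratio)
    # Slicing past the end is safe: a short file is returned whole.
    return lines[:num_lines]
```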
- class i6_core.text.processing.PipelineJob(*args, **kwargs)
Reads a text file and applies a list of piped shell commands
- Parameters:
text_files (iterable[Path]|Path) – text file (raw or gz) or list of files to be processed
pipeline (list[str|DelayedBase]) – list of shell commands to form the pipeline, can be empty to use the job for concatenation or gzip compression only.
zip_output (bool) – apply gzip to the output
check_equal_length (bool) – check that the input and output have the same line count
mini_task (bool) – run the pipeline as a mini_task
- classmethod hash(parsed_args)
- Parameters:
parsed_args (dict[str]) –
- Returns:
hash of the job for the given arguments
- Return type:
str
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
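To show how the documented parameters could combine into a single shell pipeline, here is a sketch that only assembles the command string. It assumes `zcat -f` as the reader (it decompresses gzip input and passes raw files through unchanged) and treats every pipeline element as a plain string. The actual job may build its command differently and also accepts DelayedBase elements; `build_pipeline_command` is a hypothetical name.

```python
import shlex
from typing import Iterable, List


def build_pipeline_command(
    text_files: Iterable[str], pipeline: List[str], out_path: str, zip_output: bool = False
) -> str:
    """Assemble one shell command: read inputs, pipe through each stage, write output."""
    # Assumption: `zcat -f` reads both gzipped and raw files.
    reader = "zcat -f " + " ".join(shlex.quote(f) for f in text_files)
    stages = [reader] + list(pipeline)
    if zip_output:
        # Compression is just one more stage at the end of the pipe.
        stages.append("gzip")
    # With an empty pipeline this reduces to concatenation (plus optional gzip).
    return " | ".join(stages) + " > " + shlex.quote(out_path)
```

Appending `gzip` as a final stage is why an empty pipeline still makes the job useful for concatenation or compression alone.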
- class i6_core.text.processing.SetDifferenceJob(*args, **kwargs)
Return the set difference of two text files, where each line is one element.
This job computes the set difference minuend - subtrahend. Unlike the bash utility comm, it does not require the two files to be sorted.
- Parameters:
minuend (Path) – left-hand side of the set subtraction
subtrahend (Path) – right-hand side of the set subtraction
gzipped (bool) – whether the output should be compressed in gzip format
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
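The set-difference semantics can be sketched in a few lines of Python. Because a hash set is used for membership tests, neither input needs to be sorted, which is the contrast with comm drawn above. Removing duplicates, stripping trailing newlines, and keeping the minuend's first-occurrence order are assumptions of this sketch; `set_difference` is a hypothetical helper.

```python
from typing import Iterable, List


def set_difference(minuend: Iterable[str], subtrahend: Iterable[str]) -> List[str]:
    """Lines of `minuend` that do not occur in `subtrahend`; no sorting required."""
    # Strip trailing newlines so "x" and "x\n" (e.g. an unterminated last line) match.
    remove = {line.rstrip("\n") for line in subtrahend}
    seen = set()
    result = []
    for raw in minuend:
        line = raw.rstrip("\n")
        # Set semantics: skip lines found in the subtrahend and drop duplicates.
        if line in remove or line in seen:
            continue
        seen.add(line)
        result.append(line)
    return result
```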
- class i6_core.text.processing.TailJob(*args, **kwargs)
Return the tail of a text file, given either as an absolute number of lines or as a ratio (provide exactly one of the two)
- Parameters:
text_file (Path) – text file (gz or raw)
num_lines (int) – number of lines to extract
ratio (float) – ratio of lines to extract
- run()
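TailJob is the counterpart of HeadJob, taking lines from the end of the file instead of the start. The sketch below works on an in-memory list of lines for brevity; rounding a ratio down and the helper name `tail` are assumptions, not documented behavior.

```python
from typing import List, Optional, Sequence


def tail(lines: Sequence[str], num_lines: Optional[int] = None, ratio: Optional[float] = None) -> List[str]:
    """Return the last lines, given either num_lines or ratio (not both)."""
    if (num_lines is None) == (ratio is None):
        raise ValueError("provide exactly one of num_lines or ratio")
    total = len(lines)
    # Assumption: a ratio is converted to a line count by rounding down.
    n = min(num_lines, total) if num_lines is not None else int(total * ratio)
    # Slice from total - n rather than -n, because lines[-0:] would return everything.
    return list(lines[total - n:])
```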
- class i6_core.text.processing.WriteToTextFileJob(*args, **kwargs)
Write the given content into a text file, one entry per line
- Parameters:
content (list|dict|str) – input which will be written into a text file
- run()
- tasks()
- Returns:
yields Tasks
- Return type:
list[sisyphus.task.Task]
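A hedged sketch of how the three accepted content types might be written. The `key: value` format for dict entries and writing a string unchanged are assumptions of this example, not documented behavior; `write_to_text_file` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, List, Union


def write_to_text_file(content: Union[List, Dict, str], out_path: Path) -> None:
    """Write content to a text file, one entry per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        if isinstance(content, str):
            # Assumption: a plain string is written as-is.
            f.write(content)
        elif isinstance(content, dict):
            # Assumption: one "key: value" entry per line.
            for key, value in content.items():
                f.write(f"{key}: {value}\n")
        else:
            # Lists (or other iterables): one element per line.
            for entry in content:
                f.write(f"{entry}\n")
```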