i6_core.text.processing

class i6_core.text.processing.ConcatenateJob(*args, **kwargs)

Concatenate all given input files (gz or raw)

Parameters:
  • text_files (list[Path]) – input text files

  • zip_out (bool) – apply gzip to the output

  • out_name (str) – user-specified name

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
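
As an illustration of what concatenating mixed gz and raw inputs involves, here is a minimal standalone sketch using only the Python standard library. It is not the job's implementation: the helper names `open_input` and `concatenate` are hypothetical, and detecting gzip input by the `.gz` suffix is an assumption of this sketch.

```python
import gzip
import shutil


def open_input(path):
    """Open a text file for reading; assumes gzip input is marked by a .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def concatenate(text_files, out_path, zip_out=False):
    """Append the content of every input file to out_path, in the given order."""
    # Pick the writer according to zip_out, mirroring the documented parameter.
    opener = gzip.open if zip_out else open
    with opener(out_path, "wt", encoding="utf-8") as out:
        for path in text_files:
            with open_input(path) as src:
                shutil.copyfileobj(src, out)
```

Calling `concatenate(["a.txt", "b.txt.gz"], "all.txt.gz", zip_out=True)` would write the decompressed content of both inputs, in order, into one gzip-compressed file.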

class i6_core.text.processing.HeadJob(*args, **kwargs)

Return the head of a text file, specified either as an absolute number of lines or as a ratio of the total line count (provide exactly one of the two)

Parameters:
  • text_file (Path) – text file (gz or raw)

  • num_lines (int) – number of lines to extract

  • ratio (float) – ratio of lines to extract

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
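
The choice between `num_lines` and `ratio` can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the job's code: the function `head` is hypothetical, and truncating `ratio * total_lines` to an integer is an assumption about how the ratio is turned into a line count.

```python
def head(lines, num_lines=None, ratio=None):
    """Return the first lines of a sequence, by absolute count or by ratio."""
    # Exactly one of the two selectors must be given.
    if (num_lines is None) == (ratio is None):
        raise ValueError("provide exactly one of num_lines or ratio")
    lines = list(lines)
    if ratio is not None:
        # Assumption: the ratio is applied to the total line count and truncated.
        num_lines = int(len(lines) * ratio)
    return lines[:num_lines]
```

For a four-line input, `head(lines, num_lines=2)` and `head(lines, ratio=0.5)` select the same two lines.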

class i6_core.text.processing.PipelineJob(*args, **kwargs)

Reads a text file and applies a list of piped shell commands

Parameters:
  • text_files (iterable[Path]|Path) – text file (raw or gz) or list of files to be processed

  • pipeline (list[str|DelayedBase]) – list of shell commands to form the pipeline, can be empty to use the job for concatenation or gzip compression only.

  • zip_output (bool) – apply gzip to the output

  • check_equal_length (bool) – the line count of the input and output should match

  • mini_task (bool) – the pipeline should be run as mini_task

classmethod hash(parsed_args)
Parameters:

parsed_args (dict[str]) –

Returns:

hash for job given the arguments

Return type:

str

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
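
To make the idea of "a list of piped shell commands" concrete, the sketch below assembles such a command line as a string. It is a hypothetical helper, not the job's implementation: the name `build_pipeline_command`, the use of `zcat -f` to read both raw and gz inputs (GNU gzip passes uncompressed data through unchanged with `-f`), and the final `gzip` stage are all assumptions of this example.

```python
import shlex


def build_pipeline_command(text_files, pipeline, out_path, zip_output=False):
    """Join input files and shell commands into one piped command line."""
    inputs = " ".join(shlex.quote(str(f)) for f in text_files)
    # Assumption: `zcat -f` reads gz files and passes raw files through unchanged.
    stages = [f"zcat -f {inputs}"] + list(pipeline)
    if zip_output:
        stages.append("gzip")
    return " | ".join(stages) + f" > {shlex.quote(str(out_path))}"


# An empty pipeline reduces to plain concatenation (optionally with compression).
print(build_pipeline_command(["a.txt"], [], "out.txt"))
```

With an empty `pipeline`, only the read stage and the optional compression stage remain, which matches the documented use of the job for concatenation or gzip compression only.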

class i6_core.text.processing.SetDifferenceJob(*args, **kwargs)

Return the set difference of two text files, where one line is one element.

This job performs the set difference minuend - subtrahend. Unlike the bash utility comm, the two files do not need to be sorted.

Parameters:
  • minuend (Path) – left-hand side of the set subtraction

  • subtrahend (Path) – right-hand side of the set subtraction

  • gzipped (bool) – whether the output should be compressed in gzip format

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
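
The line-wise set difference on unsorted input can be sketched with Python sets. This is an illustrative example rather than the job's code: the function name `set_difference` is hypothetical, and keeping the surviving lines in first-occurrence order is a choice of this sketch, not a documented property of the job.

```python
def set_difference(minuend_lines, subtrahend_lines):
    """Return the lines of the minuend that do not occur in the subtrahend."""
    subtrahend = set(subtrahend_lines)
    seen = set()
    result = []
    for line in minuend_lines:
        # Each line is one set element, so duplicates are emitted only once.
        if line not in subtrahend and line not in seen:
            seen.add(line)
            result.append(line)
    return result
```

Because membership is checked against a hash set, neither input has to be sorted, in contrast to `comm`.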

class i6_core.text.processing.TailJob(*args, **kwargs)

Return the tail of a text file, specified either as an absolute number of lines or as a ratio of the total line count (provide exactly one of the two)

Parameters:
  • text_file (Path) – text file (gz or raw)

  • num_lines (int) – number of lines to extract

  • ratio (float) – ratio of lines to extract
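
The parameters above can be illustrated with a small standalone sketch. It is not the job's implementation: the function `tail` is hypothetical, and truncating `ratio * total_lines` to an integer is an assumption about how the ratio maps to a line count.

```python
from collections import deque


def tail(lines, num_lines=None, ratio=None):
    """Return the last lines of a sequence, by absolute count or by ratio."""
    # Exactly one of the two selectors must be given.
    if (num_lines is None) == (ratio is None):
        raise ValueError("provide exactly one of num_lines or ratio")
    if ratio is not None:
        lines = list(lines)
        # Assumption: the ratio is applied to the total line count and truncated.
        num_lines = int(len(lines) * ratio)
    # A bounded deque keeps only the last num_lines items (none if num_lines is 0).
    return list(deque(lines, maxlen=num_lines))
```

The bounded deque avoids the slicing pitfall where `lines[-0:]` would return the whole input instead of nothing.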

run()

class i6_core.text.processing.WriteToTextFileJob(*args, **kwargs)

Write a given content into a text file, one entry per line

Parameters:

content (list|dict|str) – input which will be written into a text file

run()
tasks()
Returns:

yields Tasks

Return type:

list[sisyphus.task.Task]
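
How the three accepted content types could map to "one entry per line" is sketched below. This is an assumption-laden illustration, not the job's code: the helpers `content_to_lines` and `write_text_file` are hypothetical, and rendering each dict entry as `key: value` is an assumption of this example.

```python
def content_to_lines(content):
    """Turn a str, dict, or list into a list of output lines."""
    if isinstance(content, str):
        return content.splitlines()
    if isinstance(content, dict):
        # Assumption: each dict entry is rendered as "key: value".
        return [f"{key}: {value}" for key, value in content.items()]
    return [str(item) for item in content]


def write_text_file(content, out_path):
    """Write the content to out_path, one entry per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for line in content_to_lines(content):
            f.write(line + "\n")
```

Separating the conversion from the file write keeps the type handling easy to inspect on its own.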