i6_core.lib.rasr_cache

This module reads (and may later also write) the Rasr archive format.

class i6_core.lib.rasr_cache.AllophoneLabeling(silence_phone, allophone_file, phoneme_file=None, state_tying_file=None, verbose_out=None)

Allophone labeling.

Parameters:
  • silence_phone (str) – e.g. “si”

  • allophone_file (str) – list of allophones

  • phoneme_file (str|None) – list of phonemes

  • state_tying_file (str|None) – allophone state tying (e.g. via CART); maps each allophone state to a class label

  • verbose_out (file) – stream to dump log messages

get_label_idx(allo_idx, state_idx)
Parameters:
  • allo_idx (int) –

  • state_idx (int) –

Return type:

int

get_label_idx_by_allo_state_idx(allo_state_idx)
Parameters:

allo_state_idx (int) –

Return type:

int

class i6_core.lib.rasr_cache.FileArchive(filename, must_exists=False, encoding='ascii')

File archive.

RasrCacheHeader = 'SP_ARC1\x00'
addAttributes(filename, dim, duration)
Parameters:
  • filename (str) –

  • dim (int) –

  • duration (float) –

addFeatureCache(filename, features, times)
Parameters:
  • filename (str) –

  • features

  • times

end_recovery_tag = 1437226410
file_list()
Return type:

list[str]

finalize()

Finalize.

getState(mix)
Parameters:

mix (int) –

Returns:

(mix, state)

Return type:

(int,int)

has_entry(filename)
Parameters:

filename (str) – argument for self.read()

Returns:

True if we have this entry

read(filename, typ)
Parameters:
  • filename (str) – the entry-name in the archive

  • typ (str) – “str”, “feat” or “align”

Returns:

depending on typ: “str” -> string, “feat” -> (time, data), “align” -> align, where

  • string is a str,

  • time is a list of time-stamp tuples (start-time, end-time) in milliseconds,

  • data is a list of features, each a numpy vector,

  • align is a list of (time, allophone, state) tuples, where time is an int from 0 to the length of align, allophone is some int, and state is e.g. in [0,1,2].

Return type:

str|(list[numpy.ndarray],list[numpy.ndarray])|list[(int,int,int)]
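To illustrate the “align” return shape described above, the following minimal sketch counts how many frames each allophone index covers. The helper name frames_per_allophone and the sample alignment are made up for this example; real data would come from read(filename, "align").

```python
from collections import Counter


def frames_per_allophone(align):
    """Count frames per allophone index.

    `align` is assumed to be a list of (time, allophone, state) tuples,
    as documented for read(..., "align").
    """
    counts = Counter()
    for _time, allophone, _state in align:
        counts[allophone] += 1
    return dict(counts)


# Hypothetical alignment: five frames, two allophones, states in [0, 1, 2].
sample_align = [(0, 17, 0), (1, 17, 1), (2, 17, 2), (3, 42, 0), (4, 42, 1)]
print(frames_per_allophone(sample_align))
```

The function only relies on the tuple layout, so it works on any list of that shape, whichever archive it was read from.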

readFileInfoTable()

Read file info table.

read_S16()
Return type:

float

read_U32()
Return type:

int

read_U8()
Return type:

int

read_bytes(l)
Return type:

bytes

read_char()
Return type:

int

read_f32()
Return type:

float

read_f64()
Return type:

float

read_packed_U32()
Return type:

int

read_str(l, enc='ascii')
Return type:

str

read_u32()
Return type:

int

read_u64()
Return type:

int

read_v(typ, size)
Parameters:
  • typ (str) – “f” for float (float32) or “d” for double (float64)

  • size (int) – number of elements to return

Returns:

numpy array of shape (size,) of dtype depending on typ

Return type:

numpy.ndarray
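The typ codes “f” and “d” match the format characters of Python's struct module, so the idea behind read_v() can be sketched with the standard library alone. This is not the library's implementation: the function name read_vector is hypothetical, little-endian byte order is an assumption, and the sketch returns a plain list where read_v() is documented to return a numpy array.

```python
import io
import struct


def read_vector(stream, typ, size):
    """Read `size` values of type `typ` ("f" = float32, "d" = float64).

    Assumption: values are stored little-endian and densely packed.
    """
    if typ not in ("f", "d"):
        raise ValueError("typ must be 'f' or 'd', got %r" % (typ,))
    fmt = "<%d%s" % (size, typ)
    num_bytes = struct.calcsize(fmt)
    raw = stream.read(num_bytes)
    if len(raw) != num_bytes:
        raise EOFError("stream ended before %d values were read" % size)
    return list(struct.unpack(fmt, raw))


# Round trip through an in-memory buffer instead of a real archive.
buf = io.BytesIO(struct.pack("<3f", 1.0, 2.5, -4.0))
print(read_vector(buf, "f", 3))
```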

scanArchive()

Scan archive.

setAllophones(f)
Parameters:

f (str) – allophone filename. The file is line-separated; lines starting with “#” are ignored
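The file format described for this parameter can be parsed in a few lines. The sketch below is an illustration and not the code behind setAllophones(): the function name load_allophones is hypothetical, skipping blank lines is an assumption, and the allophone strings in the demo are placeholders.

```python
import os
import tempfile


def load_allophones(path, encoding="ascii"):
    """Return the non-comment lines of a line-separated allophone file."""
    allophones = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            # Lines starting with "#" are comments; skipping blank lines
            # as well is an assumption of this sketch.
            if not line or line.startswith("#"):
                continue
            allophones.append(line)
    return allophones


# Demo with a temporary file holding placeholder allophone names.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "allophones")
    with open(demo_path, "w", encoding="ascii") as f:
        f.write("# comment line\nsi{#+#}@i@f\na{#+b}@i\n")
    print(load_allophones(demo_path))
```

Only a “#” at the start of a line marks a comment, so names that contain “#” elsewhere are kept.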

start_recovery_tag = 2857740885
writeFileInfoTable()

Write file info table.

write_U32(i)
Parameters:

i (int) –

Return type:

int

write_char(i)
Parameters:

i (int) –

Return type:

int

write_f32(i)
Parameters:

i (float) –

Return type:

int

write_f64(i)
Parameters:

i (float) –

Return type:

int

write_str(s, enc='ascii')
Parameters:

s (str) –

Return type:

int

write_u32(i)
Parameters:

i (int) –

Return type:

int

write_u64(i)
Parameters:

i (int) –

Return type:

int

class i6_core.lib.rasr_cache.FileArchiveBundle(filename, encoding='ascii')

File archive bundle.

Parameters:
  • filename (str) – .bundle file

  • encoding (str) – encoding used in the files

file_list()
Return type:

list[str]

Returns:

list of content-filenames (which can be used for self.read())

has_entry(filename)
Parameters:

filename (str) – argument for self.read()

Returns:

True if we have this entry

read(filename, typ)
Parameters:
  • filename (str) – the entry-name in the archive

  • typ (str) – “str”, “feat” or “align”

Returns:

depending on typ: “str” -> string, “feat” -> (time, data), “align” -> align, where

  • string is a str,

  • time is a list of time-stamp tuples (start-time, end-time) in milliseconds,

  • data is a list of features, each a numpy vector,

  • align is a list of (time, allophone, state) tuples, where time is an int from 0 to the length of align, allophone is some int, and state is e.g. in [0,1,2].

Return type:

str|(list[numpy.ndarray],list[numpy.ndarray])|list[(int,int,int)]

Uses FileArchive.read().

setAllophones(filename)
Parameters:

filename (str) – allophone filename

class i6_core.lib.rasr_cache.FileInfo(name, pos, size, compressed, index)

File info.

Parameters:
  • name (str) –

  • pos (int) –

  • size (int) –

  • compressed (bool|int) –

  • index (int) –

class i6_core.lib.rasr_cache.MixtureSet(filename)

Mixture set.

Parameters:

filename (str) –

getCovByIdx(idx)
Parameters:

idx (int) –

Return type:

numpy.ndarray

getMeanByIdx(idx)
Parameters:

idx (int) –

Return type:

numpy.ndarray

getNumberMixtures()
Return type:

int

read_U32()
Return type:

int

read_char()
Return type:

int

read_f32()
Return type:

float

read_f64()
Return type:

float

read_str(l, enc='ascii')
Parameters:
  • l (int) –

  • enc (str) –

Return type:

str

read_u32()
Return type:

int

read_u64()
Return type:

int

read_v(size, a)
Parameters:
  • size (int) –

  • a (array.array) –

Return type:

array.array

write(filename)
Parameters:

filename (str) –

write_U32(i)
Parameters:

i (int) –

Return type:

int

write_char(i)
Parameters:

i (int) –

Return type:

int

write_f32(i)
Parameters:

i (float) –

Return type:

int

write_f64(i)
Parameters:

i (float) –

Return type:

int

write_str(s, enc='ascii')
Parameters:
  • s (str) –

  • enc (str) –

Return type:

int

write_u32(i)
Parameters:

i (int) –

Return type:

int

write_u64(i)
Parameters:

i (int) –

Return type:

int

class i6_core.lib.rasr_cache.WordBoundaries(filename)

Word boundaries.

Parameters:

filename (str) –

read_str(l, enc='ascii')
Return type:

str

read_u16()
Return type:

int

read_u32()
Return type:

int

i6_core.lib.rasr_cache.is_rasr_cache_file(filename)
Parameters:

filename (str) – file to check; must exist

Returns:

True iff this is a rasr cache (which can be loaded with open_file_archive())

Return type:

bool
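One plausible way to perform such a check is to compare the first bytes of the file with FileArchive.RasrCacheHeader ('SP_ARC1\x00'). Whether is_rasr_cache_file() does exactly this is an assumption; the function name looks_like_rasr_cache and the demo files below are hypothetical.

```python
import os
import tempfile

# Bytes form of the documented FileArchive.RasrCacheHeader value.
RASR_CACHE_HEADER = b"SP_ARC1\x00"


def looks_like_rasr_cache(filename):
    """Return True if the file starts with the 8-byte archive header."""
    with open(filename, "rb") as f:
        return f.read(len(RASR_CACHE_HEADER)) == RASR_CACHE_HEADER


# Demo: one file with the header, one without.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.cache")
    bad = os.path.join(tmp, "bad.txt")
    with open(good, "wb") as f:
        f.write(RASR_CACHE_HEADER + b"\x00" * 16)
    with open(bad, "wb") as f:
        f.write(b"not an archive")
    print(looks_like_rasr_cache(good), looks_like_rasr_cache(bad))
```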

i6_core.lib.rasr_cache.open_file_archive(archive_filename, must_exists=True, encoding='ascii')
Parameters:
  • archive_filename (str) –

  • must_exists (bool) –

  • encoding (str) –

Return type:

FileArchiveBundle|FileArchive
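Both possible return types document file_list(), has_entry() and read(filename, typ), so calling code can treat the result uniformly. The sketch below relies only on those three methods. The helper read_entries and the FakeArchive class are hypothetical stand-ins so that the example runs without i6_core or real archive data; in practice the archive object would come from open_file_archive().

```python
def read_entries(archive, names, typ):
    """Read the requested entries that exist in `archive`.

    `archive` may be any object offering has_entry() and
    read(filename, typ), e.g. a FileArchive or FileArchiveBundle.
    Names that are not present are skipped.
    """
    result = {}
    for name in names:
        if archive.has_entry(name):
            result[name] = archive.read(name, typ)
    return result


class FakeArchive:
    """Hypothetical in-memory stand-in; not part of i6_core."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def file_list(self):
        return list(self._entries)

    def has_entry(self, filename):
        return filename in self._entries

    def read(self, filename, typ):
        if typ != "str":
            raise ValueError("this stand-in only supports typ='str'")
        return self._entries[filename]


# Placeholder entry names and contents.
archive = FakeArchive({"corpus/rec1/seg1": "hello", "corpus/rec1/seg2": "world"})
wanted = archive.file_list() + ["corpus/rec9/missing"]
print(read_entries(archive, wanted, "str"))
```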