rcx_tk.sequence

Functions

process_sequence_file(→ None)

Processes a metadata file, keeping and renaming specific columns.

process_sequence(→ pandas.DataFrame)

Processes the metadata dataframe.

cleanup(→ pandas.DataFrame)

Removes the file Name column and moves the sampleName column.

validate_injection_order(→ bool)

Checks whether the injectionOrder column is of integer type.

derive_additional_metadata(→ pandas.DataFrame)

Derives additional metadata columns.

rearrange_columns(→ pandas.DataFrame)

Rearranges the columns.

validate_filenames_column(→ None)

Validates the file names.

add_local_order(→ int)

Returns the localOrder value, i.e. the digits after the last underscore.

add_sequence_identifier(→ str)

Returns the sequenceIdentifier value, i.e. everything before the last _[digits].

separate_filename(→ Tuple[str, str])

Splits a filename into the non-numeric prefix and the trailing numeric suffix.

add_subject_identifier(→ str)

Returns the subjectIdentifier value, i.e. everything between [digit_] and [_digit].

Module Contents

rcx_tk.sequence.process_sequence_file(file_path: str, out_path: str) → None

Processes a metadata file, keeping and renaming specific columns.

Parameters:
  • file_path (str) – A path to the metadata file.

  • out_path (str) – The path to which the processed metadata dataframe is exported.

rcx_tk.sequence.process_sequence(df: pandas.DataFrame) → pandas.DataFrame

Processes the metadata dataframe.

Parameters:

df (pd.DataFrame) – The metadata dataframe.

Returns:

A metadata dataframe with rearranged and newly derived columns.

Return type:

pd.DataFrame

rcx_tk.sequence.cleanup(df: pandas.DataFrame) → pandas.DataFrame

Removes the file Name column and moves the sampleName column.

Parameters:

df (pd.DataFrame) – The metadata dataframe.

Returns:

The processed dataframe.

Return type:

pd.DataFrame

rcx_tk.sequence.validate_injection_order(df: pandas.DataFrame) → bool

Checks whether the injectionOrder column is of integer type.

Parameters:

df (pd.DataFrame) – The metadata dataframe.

Returns:

Whether the injectionOrder column is of integer type.

Return type:

bool
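As an illustration only, a dtype check of this kind could be sketched with pandas' own type helpers. The function name `injection_order_is_integer` is hypothetical, and the actual rcx_tk implementation may differ (for example, it might also inspect individual values).

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def injection_order_is_integer(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the injectionOrder column has an integer dtype."""
    # Assumption: only the column dtype is checked, not each individual value.
    return bool(is_integer_dtype(df["injectionOrder"]))


integer_case = injection_order_is_integer(pd.DataFrame({"injectionOrder": [1, 2, 3]}))
float_case = injection_order_is_integer(pd.DataFrame({"injectionOrder": [1.5, 2.0, 3.0]}))
```

Under these assumptions, `integer_case` is true and `float_case` is false, because the second column is stored with a floating-point dtype.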

rcx_tk.sequence.derive_additional_metadata(df: pandas.DataFrame) → pandas.DataFrame

Derives additional metadata columns.

Parameters:

df (pd.DataFrame) – The metadata dataframe.

Returns:

The processed dataframe.

Return type:

pd.DataFrame
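The derived columns are presumably the localOrder, sequenceIdentifier and subjectIdentifier values described for the helper functions on this page. The sketch below shows one way such columns could be derived with vectorized string operations. The function name `derive_columns_sketch`, the input column name `"File Name"`, and the example file names are all assumptions made for illustration, not the library's actual code.

```python
import pandas as pd


def derive_columns_sketch(df: pd.DataFrame, column: str = "File Name") -> pd.DataFrame:
    """Hypothetical sketch: derive three metadata columns from a file-name column."""
    names = df[column].astype(str)
    out = df.copy()
    # Everything before the last "_<digits>" group.
    out["sequenceIdentifier"] = names.str.replace(r"_\d+$", "", regex=True)
    # The digits after the last underscore, as integers.
    out["localOrder"] = names.str.extract(r"_(\d+)$", expand=False).astype(int)
    # Everything between the leading "<digits>_" and the trailing "_<digits>".
    out["subjectIdentifier"] = names.str.extract(r"^\d+_(.+)_\d+$", expand=False)
    return out


# Assumed example file names of the form <digits>_<subject>_<digits>.
example = pd.DataFrame({"File Name": ["7_instrument-blank_012", "7_QC_4_013"]})
derived = derive_columns_sketch(example)
```

The sketch returns a copy so that the caller's dataframe is left unmodified; whether the real function mutates its input is not stated in this reference.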

rcx_tk.sequence.rearrange_columns(df: pandas.DataFrame) → pandas.DataFrame

Rearranges the columns.

Parameters:

df (pd.DataFrame) – The metadata dataframe.

Returns:

The processed dataframe.

Return type:

pd.DataFrame

rcx_tk.sequence.validate_filenames_column(df: pandas.DataFrame) → None

Validates the file names.

Parameters:

df (pd.DataFrame) – A dataframe to process.

Raises:

ValueError – If any file name is invalid.
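This reference does not state what makes a file name valid. Purely as a sketch, the check below assumes that a valid name has the form `<digits>_<subject>_<digits>` and that the names live in a column called `"File Name"`. Both are assumptions, as is the helper name `check_file_names_sketch`.

```python
import re

import pandas as pd

# Assumed pattern: leading digits, an arbitrary middle part, trailing digits.
ASSUMED_PATTERN = re.compile(r"\d+_.+_\d+")


def check_file_names_sketch(df: pd.DataFrame, column: str = "File Name") -> None:
    """Hypothetical sketch: raise ValueError if any name fails the assumed pattern."""
    invalid = [name for name in df[column] if not ASSUMED_PATTERN.fullmatch(str(name))]
    if invalid:
        raise ValueError(f"Invalid file names: {invalid}")


# Passes silently: both names match the assumed pattern.
check_file_names_sketch(pd.DataFrame({"File Name": ["7_instrument-blank_012", "7_QC_4_013"]}))
```

Collecting all offending names before raising lets the error message report every problem at once rather than only the first.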

rcx_tk.sequence.add_local_order(file_name: str) → int

Returns the localOrder value, i.e. the digits after the last underscore.

Parameters:

file_name (str) – The filename.

Returns:

The localOrder value.

Return type:

int
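To make the description concrete, the extraction could be sketched with a regular expression as below. The name `local_order_sketch` and the example file name are hypothetical, the sketch assumes the file name carries no extension, and the library's own implementation may handle edge cases differently.

```python
import re


def local_order_sketch(file_name: str) -> int:
    """Hypothetical sketch: return the digits after the last underscore as an int."""
    match = re.search(r"_(\d+)$", file_name)
    if match is None:
        # Assumption: a name without a trailing "_<digits>" group is an error.
        raise ValueError(f"No trailing '_<digits>' group in {file_name!r}")
    return int(match.group(1))


order = local_order_sketch("7_instrument-blank_012")  # 12 (leading zeros are dropped)
```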

rcx_tk.sequence.add_sequence_identifier(file_name: str) → str

Returns the sequenceIdentifier value, i.e. everything before the last _[digits].

Parameters:

file_name (str) – The filename.

Returns:

The sequenceIdentifier value.

Return type:

str
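A minimal sketch of this rule, assuming the file name has no extension, strips the final underscore-plus-digits group with a regular expression. The name `sequence_identifier_sketch` and the example file name are hypothetical.

```python
import re


def sequence_identifier_sketch(file_name: str) -> str:
    """Hypothetical sketch: drop the trailing "_<digits>" group from a file name."""
    # Names without such a trailing group are returned unchanged (an assumption).
    return re.sub(r"_\d+$", "", file_name)


identifier = sequence_identifier_sketch("7_instrument-blank_012")  # "7_instrument-blank"
```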

rcx_tk.sequence.separate_filename(file_name: str) → Tuple[str, str]

Splits a filename into the non-numeric prefix and the trailing numeric suffix.

Parameters:

file_name (str) – The filename.

Returns:

The split file name, as a (prefix, suffix) tuple.

Return type:

Tuple[str, str]
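One possible reading of this split is sketched below: the suffix is the longest run of digits at the end of the name, and the prefix is everything before it. Whether the real function keeps the separating underscore in the prefix is not stated here, so that detail, the name `separate_filename_sketch`, and the example file name are assumptions.

```python
import re
from typing import Tuple


def separate_filename_sketch(file_name: str) -> Tuple[str, str]:
    """Hypothetical sketch: split into (prefix, trailing digits)."""
    # The lazy first group stops as soon as the rest of the string is all digits.
    match = re.fullmatch(r"(.*?)(\d*)", file_name, flags=re.DOTALL)
    assert match is not None  # this pattern matches every string
    return match.group(1), match.group(2)


prefix, suffix = separate_filename_sketch("7_instrument-blank_012")
# prefix == "7_instrument-blank_", suffix == "012"
```

In this sketch the suffix stays a string, so leading zeros are preserved; a name with no trailing digits yields an empty suffix.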

rcx_tk.sequence.add_subject_identifier(file_name: str) → str

Returns the subjectIdentifier value, i.e. everything between [digit_] and [_digit].

Parameters:

file_name (str) – The filename.

Returns:

The subjectIdentifier value.

Return type:

str
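As a sketch of this rule, the middle part of a name can be captured between a leading `<digits>_` and a trailing `_<digits>`. The sketch assumes that each bracketed part may contain more than one digit. The name `subject_identifier_sketch` and the example file names are hypothetical.

```python
import re


def subject_identifier_sketch(file_name: str) -> str:
    """Hypothetical sketch: return the part between "<digits>_" and "_<digits>"."""
    match = re.fullmatch(r"\d+_(.+)_\d+", file_name)
    if match is None:
        # Assumption: names that do not follow the pattern are rejected.
        raise ValueError(f"Unexpected file name format: {file_name!r}")
    return match.group(1)


subject = subject_identifier_sketch("7_instrument-blank_012")  # "instrument-blank"
```

Because the middle group is greedy, underscores inside the subject part are kept: under these assumptions, `"7_QC_4_013"` gives `"QC_4"`.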