rcx_tk.msdial

Functions

`process_msdial_file`(→ None)	Process MSDial output file to group duplicate alignments.
`get_n_samples`(→ int)	Obtain the number of samples from an MSDial file.
`process_msdial`(→ pandas.DataFrame)	Process a DataFrame of MSDial results to group duplicate alignments.
`refine`(→ list[pandas.Index])	Refine clusters based on mz tolerance, splitting them if the quant mass is different.
`aggregations`(→ dict[str, collections.abc.Callable])	Generate aggregation functions based on column types.
`find_clusters`(→ list[pandas.Index])	Transitive merging of all duplicate indices into groups, where groups are merged if there is any overlap.
`union`(→ pandas.Index)	Combine a list of indices into a single union index.
`find_all_duplicates`(→ list[pandas.Index])	Get index of any duplicate values in any column.

Module Contents

rcx_tk.msdial.process_msdial_file(file_path: str, out_path: str, mz_tol_ppm: int) → None

Process MSDial output file to group duplicate alignments.

Parameters:

file_path (str) – Input file path.
out_path (str) – Output file path.
mz_tol_ppm (int) – m/z tolerance in ppm to use for splitting clustered alignments.

rcx_tk.msdial.get_n_samples(file_path: str) → int

Obtain the number of samples from an MSDial file.

Parameters: file_path (str) – Path to the MSDial file.
Returns: Number of samples contained in the file.
Return type: int

rcx_tk.msdial.process_msdial(df: pandas.DataFrame, n_samples: int, mz_tol_ppm: int, metadata_cols: int = 27, index_col: str = 'Alignment ID') → pandas.DataFrame

Process a DataFrame of MSDial results to group duplicate alignments.

Parameters:

df (pd.DataFrame) – Dataframe with MSDial results.
n_samples (int) – Number of samples; required to determine the number of intensity columns in df.
mz_tol_ppm (int) – m/z tolerance in ppm to use for splitting clustered alignments.
metadata_cols (int, optional) – Number of columns containing data prior to feature abundances. Defaults to 27.
index_col (str, optional) – Column to denote the index. Defaults to “Alignment ID”.

Returns:

DataFrame with clustered alignment ids.

Return type:

pd.DataFrame
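
The roles of `n_samples` and `metadata_cols` can be illustrated with a minimal sketch. It assumes the abundance columns directly follow the metadata columns, as the parameter descriptions suggest. The helper `split_columns` and the toy column names are hypothetical and are not part of `rcx_tk`.

```python
import pandas as pd


def split_columns(
    df: pd.DataFrame, n_samples: int, metadata_cols: int
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: separate metadata columns from abundance columns by position."""
    # Leading columns hold per-alignment metadata.
    metadata = df.iloc[:, :metadata_cols]
    # Assumption: the next n_samples columns hold per-sample abundances.
    abundances = df.iloc[:, metadata_cols:metadata_cols + n_samples]
    return metadata, abundances


# Toy table with 2 metadata columns instead of the default 27.
df = pd.DataFrame(
    {
        "Alignment ID": [0, 1],
        "Average Mz": [100.0, 200.0],
        "sample_1": [5, 6],
        "sample_2": [7, 8],
        "Average": [6.0, 7.0],  # trailing statistics column, not an abundance
    }
)
metadata, abundances = split_columns(df, n_samples=2, metadata_cols=2)
```

Any columns after the first `n_samples` abundance columns, such as the trailing statistics column here, are left out of both slices.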

rcx_tk.msdial.refine(clusters: list[pandas.Index], metadata: pandas.DataFrame, mz_tol_ppm: int) → list[pandas.Index]

Refine clusters based on mz tolerance, splitting them if the quant mass is different.

Parameters:

clusters (list[pd.Index]) – List of clusters to refine.
metadata (pd.DataFrame) – Metadata section of the msdial file to use for refining clusters.
mz_tol_ppm (int) – m/z tolerance in ppm to use to split clusters.

Returns:

Refined list of clusters.

Return type:

list[pd.Index]
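
One way to picture a ppm-based split is the sketch below. It is an illustration under stated assumptions, not the library's implementation: `within_ppm` and `split_by_mass` are hypothetical helpers, the masses are passed as a plain Series rather than read from a metadata column, and a cluster is split wherever two neighbouring masses (after sorting) differ by more than the tolerance.

```python
import pandas as pd


def within_ppm(mz_a: float, mz_b: float, tol_ppm: float) -> bool:
    """Return True if the two masses differ by at most tol_ppm parts per million."""
    return abs(mz_a - mz_b) / mz_b * 1e6 <= tol_ppm


def split_by_mass(cluster: pd.Index, masses: pd.Series, tol_ppm: float) -> list[pd.Index]:
    """Hypothetical helper: split one cluster into runs of members with similar mass."""
    ordered = masses.loc[cluster].sort_values()
    parts: list[pd.Index] = []
    current: list = []
    previous = None
    for label, mz in ordered.items():
        # Start a new sub-cluster when the gap to the previous mass is too large.
        if previous is not None and not within_ppm(mz, previous, tol_ppm):
            parts.append(pd.Index(current))
            current = []
        current.append(label)
        previous = mz
    if current:
        parts.append(pd.Index(current))
    return parts


masses = pd.Series({1: 100.0000, 2: 100.0005, 3: 150.0000})
parts = split_by_mass(pd.Index([1, 2, 3]), masses, tol_ppm=10)
```

Here alignments 1 and 2 are about 5 ppm apart and stay together, while alignment 3 is split off.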

rcx_tk.msdial.aggregations(mean_columns: list[str], concat_columns: list[str], abundance_columns: list[str]) → dict[str, collections.abc.Callable]

Generate aggregation functions based on column types.

Parameters:

mean_columns (list[str]) – List of columns to aggregate using mean.
concat_columns (list[str]) – List of columns to aggregate using concatenation.
abundance_columns (list[str]) – List of columns to aggregate using max.

Returns:

Dictionary mapping column names to aggregation functions, for use with pandas aggregate (for example DataFrame.aggregate).

Return type:

dict[str, Callable]
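
A dictionary of this shape might be built and used as in the following sketch. The helper `build_aggregations`, the `";"` separator used for concatenation, and the toy column names are assumptions made for illustration; the library's actual functions may differ.

```python
from collections.abc import Callable

import pandas as pd


def build_aggregations(
    mean_columns: list[str],
    concat_columns: list[str],
    abundance_columns: list[str],
) -> dict[str, Callable]:
    """Hypothetical helper: map each column name to the function that aggregates it."""

    def mean(values: pd.Series) -> float:
        return values.mean()

    def concat(values: pd.Series) -> str:
        # Assumption: concatenated values are joined with ';'.
        return ";".join(str(value) for value in values)

    def maximum(values: pd.Series):
        return values.max()

    aggs: dict[str, Callable] = {}
    aggs.update({column: mean for column in mean_columns})
    aggs.update({column: concat for column in concat_columns})
    aggs.update({column: maximum for column in abundance_columns})
    return aggs


table = pd.DataFrame(
    {
        "cluster": [0, 0, 1],
        "rt": [1.0, 3.0, 5.0],
        "name": ["a", "b", "c"],
        "sample_1": [10, 40, 7],
    }
)
result = table.groupby("cluster").agg(build_aggregations(["rt"], ["name"], ["sample_1"]))
```

Each cluster collapses to one row: the mean of `rt`, the joined `name` values, and the maximum of `sample_1`.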

rcx_tk.msdial.find_clusters(all_duplicates: list[pandas.Index]) → list[pandas.Index]

Transitive merging of all duplicate indices into groups, where groups are merged if there is any overlap.

Parameters: all_duplicates (list[pd.Index]) – List of all duplicate indices.
Returns: Clusters of connected duplicates.
Return type: list[pd.Index]
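
Transitive merging means that if group A overlaps group B and group B overlaps group C, all three end up in one cluster even when A and C share nothing directly. The sketch below shows one straightforward way to do this. `merge_overlapping` is a hypothetical name and is not claimed to match the library's algorithm.

```python
import pandas as pd


def merge_overlapping(groups: list[pd.Index]) -> list[pd.Index]:
    """Hypothetical helper: merge index groups until no two clusters share a label."""
    clusters: list[set] = []
    for group in groups:
        current = set(group)
        remaining = []
        for cluster in clusters:
            if cluster & current:
                # Overlap found: absorb the existing cluster into the current one.
                current |= cluster
            else:
                remaining.append(cluster)
        remaining.append(current)
        clusters = remaining
    return [pd.Index(sorted(cluster)) for cluster in clusters]


groups = [pd.Index([1, 2]), pd.Index([2, 3]), pd.Index([7, 8]), pd.Index([3, 4])]
clusters = merge_overlapping(groups)
```

In this toy input, labels 1 to 4 are chained together through shared members, while 7 and 8 form their own cluster.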

rcx_tk.msdial.union(all_duplicates: list[pandas.Index]) → pandas.Index

Combine a list of indices into a single union index.

Parameters: all_duplicates (list[pd.Index]) – All indices to combine.
Returns: Union of all indices.
Return type: pd.Index
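
A union over a list of indices can be sketched by folding `pandas.Index.union` across the list. The name `union_all` is hypothetical, and the empty-list handling is an assumption made for this example.

```python
from functools import reduce

import pandas as pd


def union_all(indices: list[pd.Index]) -> pd.Index:
    """Hypothetical helper: fold Index.union over a list of indices."""
    if not indices:
        # Assumption: an empty input yields an empty index.
        return pd.Index([])
    return reduce(lambda left, right: left.union(right), indices)


combined = union_all([pd.Index([3, 1]), pd.Index([2, 3]), pd.Index([5])])
```

Each label appears once in the result, regardless of how many input indices contain it.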

rcx_tk.msdial.find_all_duplicates(data_matrix: pandas.DataFrame) → list[pandas.Index]

Get index of any duplicate values in any column.

Parameters: data_matrix (pd.DataFrame) – DataFrame to check column-by-column for duplicate values.
Returns: All indexes of duplicates.
Return type: list[pd.Index]
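
A column-by-column duplicate search could look like the sketch below. It assumes that one index is returned per repeated value per column, which the description does not spell out. `duplicate_groups` and the toy data are hypothetical.

```python
import pandas as pd


def duplicate_groups(data: pd.DataFrame) -> list[pd.Index]:
    """Hypothetical helper: for each column, collect row labels that share a value."""
    groups: list[pd.Index] = []
    for column in data.columns:
        # groupby(...).groups maps each distinct value to the row labels holding it;
        # missing values are skipped by pandas' default dropna=True.
        for rows in data.groupby(column).groups.values():
            if len(rows) > 1:
                groups.append(rows)
    return groups


data = pd.DataFrame(
    {"sample_a": [10.0, 10.0, 5.0, 7.0], "sample_b": [1.0, 2.0, 2.0, 3.0]},
    index=[100, 101, 102, 103],
)
found = duplicate_groups(data)
```

Rows 100 and 101 share a value in `sample_a`, and rows 101 and 102 share a value in `sample_b`, so two overlapping groups are reported.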