rcx_tk.msdial
Functions
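process_msdial_file(file_path, out_path, mz_tol_ppm) | Process MSDial output file to group duplicate alignments.
| Obtain number of samples from MSDial file.
process_msdial(df, n_samples, mz_tol_ppm[, metadata_cols, index_col]) | Process a DataFrame of MSDial results to group duplicate alignments.
refine(clusters, metadata, mz_tol_ppm) | Refine clusters based on m/z tolerance, splitting them if the quant mass is different.
aggregations(mean_columns, concat_columns, abundance_columns) | Generate aggregation functions based on column types.
find_clusters(all_duplicates) | Transitive merging of all duplicate indices into groups, where groups are merged if there is any overlap.
union(all_duplicates) | Combine a list of indices into a single union index.
find_all_duplicates(data_matrix) | Get index of any duplicate values in any column.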
Module Contents
- rcx_tk.msdial.process_msdial_file(file_path: str, out_path: str, mz_tol_ppm: int) → None
Process MSDial output file to group duplicate alignments.
- rcx_tk.msdial.process_msdial(df: pandas.DataFrame, n_samples: int, mz_tol_ppm: int, metadata_cols: int = 27, index_col: str = 'Alignment ID') → pandas.DataFrame
Process a DataFrame of MSDial results to group duplicate alignments.
- Parameters:
df (pd.DataFrame) – Dataframe with MSDial results.
n_samples (int) – Number of samples; required to determine the number of intensity columns in df.
mz_tol_ppm (int) – m/z tolerance in ppm to use for splitting clustered alignments.
metadata_cols (int, optional) – Number of columns containing data prior to feature abundances. Defaults to 27.
index_col (str, optional) – Column to denote the index. Defaults to “Alignment ID”.
- Returns:
DataFrame with clustered alignment IDs.
- Return type:
pd.DataFrame
- rcx_tk.msdial.refine(clusters: list[pandas.Index], metadata: pandas.DataFrame, mz_tol_ppm: int) → list[pandas.Index]
Refine clusters based on m/z tolerance, splitting them if the quant mass is different.
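The splitting step can be illustrated with a minimal sketch, assuming plain Python lists and a dict in place of pandas objects. The helper names within_ppm and split_by_mz, and the rule of starting a new sub-cluster whenever the gap to the previous mass exceeds the tolerance, are illustrative assumptions and not the rcx_tk implementation.

```python
def within_ppm(mz_a: float, mz_b: float, tol_ppm: float) -> bool:
    """Return True if mz_a deviates from mz_b by at most tol_ppm parts per million."""
    return abs(mz_a - mz_b) / mz_b * 1e6 <= tol_ppm


def split_by_mz(
    cluster: list[int], quant_mass: dict[int, float], tol_ppm: float
) -> list[list[int]]:
    """Split one cluster into sub-clusters whose members have similar quant mass.

    Assumed rule: sort members by mass and start a new sub-cluster whenever
    the gap to the previous member exceeds the ppm tolerance.
    """
    ordered = sorted(cluster, key=lambda idx: quant_mass[idx])
    groups: list[list[int]] = []
    for idx in ordered:
        if groups and within_ppm(quant_mass[idx], quant_mass[groups[-1][-1]], tol_ppm):
            groups[-1].append(idx)
        else:
            groups.append([idx])
    return groups


# Hypothetical quant masses keyed by alignment ID.
quant_mass = {0: 100.0000, 1: 100.0004, 2: 150.0000}
result = split_by_mz([0, 1, 2], quant_mass, tol_ppm=5)
print(result)
```

Here alignments 0 and 1 differ by about 4 ppm and stay together, while alignment 2 is moved into its own cluster.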
- rcx_tk.msdial.aggregations(mean_columns: list[str], concat_columns: list[str], abundance_columns: list[str]) → dict[str, collections.abc.Callable]
Generate aggregation functions based on column types.
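- Parameters:
mean_columns (list[str])
concat_columns (list[str])
abundance_columns (list[str])
- Returns:
Dictionary with functions to pass to pandas aggregate (for example DataFrame.aggregate).
- Return type:
dict[str, collections.abc.Callable]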
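As a rough illustration of a per-column aggregation dictionary, the sketch below uses only the standard library and applies the functions to plain lists rather than pandas groups. The function name build_aggregations, the ";" separator, the choice of max for abundance columns, and the example column names are all assumptions made for this example; the source does not state which functions rcx_tk actually assigns.

```python
from collections.abc import Callable
from statistics import mean


def concat_values(values: list) -> str:
    """Join all values of a group into one ';'-separated string (assumed format)."""
    return ";".join(str(v) for v in values)


def build_aggregations(
    mean_columns: list[str],
    concat_columns: list[str],
    abundance_columns: list[str],
) -> dict[str, Callable]:
    """Map each column name to the function used to collapse a group of rows."""
    aggs: dict[str, Callable] = {}
    for col in mean_columns:
        aggs[col] = mean
    for col in concat_columns:
        aggs[col] = concat_values
    for col in abundance_columns:
        aggs[col] = max  # assumption: keep the largest abundance in the group
    return aggs


# One hypothetical group of two duplicate alignments, stored column-wise.
group = {
    "Average Rt(min)": [5.0, 5.5],
    "Metabolite name": ["A", "B"],
    "sample_1": [5.0, 5.0],
}
aggs = build_aggregations(["Average Rt(min)"], ["Metabolite name"], ["sample_1"])
merged = {col: aggs[col](values) for col, values in group.items()}
print(merged)
```

With pandas, a dictionary of this shape is what a grouped aggregate call accepts: keys are column names and values are the functions applied to each group.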
- rcx_tk.msdial.find_clusters(all_duplicates: list[pandas.Index]) → list[pandas.Index]
Transitive merging of all duplicate indices into groups, where groups are merged if there is any overlap.
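Transitive merging means that if group A overlaps group B and group B overlaps group C, all three end up in one cluster even when A and C share nothing directly. A minimal sketch of that idea, assuming Python sets in place of pandas.Index objects (merge_overlapping is a hypothetical name, not the library function):

```python
def merge_overlapping(groups: list[set[int]]) -> list[set[int]]:
    """Merge groups that share at least one member, transitively.

    The clusters built so far are kept pairwise disjoint, so each incoming
    group only needs one pass: it absorbs every cluster it overlaps with.
    """
    merged: list[set[int]] = []
    for group in groups:
        current = set(group)
        untouched = []
        for existing in merged:
            if existing & current:
                current |= existing  # absorb the overlapping cluster
            else:
                untouched.append(existing)
        untouched.append(current)
        merged = untouched
    return merged


# {1, 2} and {3, 4} do not overlap directly, but {2, 3} links them.
clusters = merge_overlapping([{1, 2}, {3, 4}, {2, 3}, {7}])
print(clusters)
```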
- rcx_tk.msdial.union(all_duplicates: list[pandas.Index]) → pandas.Index
Combine a list of indices into a single union index.
- Parameters:
all_duplicates (list[pd.Index]) – All indices to combine.
- Returns:
Union of all indices.
- Return type:
pd.Index
- rcx_tk.msdial.find_all_duplicates(data_matrix: pandas.DataFrame) → list[pandas.Index]
Get index of any duplicate values in any column.
- Parameters:
data_matrix (pd.DataFrame) – DataFrame to check column-by-column for duplicate values.
- Returns:
All indexes of duplicates.
- Return type:
list[pd.Index]
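The column-by-column duplicate search and the subsequent union can be sketched without pandas as follows. This is an illustration under stated assumptions: the table is a dict of column name to list of values, row positions stand in for index labels, one group is emitted per duplicated value, and the names duplicate_rows and union_rows are hypothetical. The actual grouping returned by rcx_tk may differ.

```python
from collections import defaultdict


def duplicate_rows(table: dict[str, list]) -> list[list[int]]:
    """Collect, for every column, the row positions that share a repeated value."""
    all_duplicates: list[list[int]] = []
    for values in table.values():
        positions: dict[object, list[int]] = defaultdict(list)
        for row, value in enumerate(values):
            positions[value].append(row)
        # Keep only values that occur in more than one row of this column.
        all_duplicates.extend(rows for rows in positions.values() if len(rows) > 1)
    return all_duplicates


def union_rows(all_duplicates: list[list[int]]) -> list[int]:
    """Combine all groups into one sorted list of unique row positions."""
    return sorted(set().union(*all_duplicates))


# Hypothetical abundance matrix: rows 0 and 1 share a value in sample_1,
# rows 1 and 2 share a value in sample_2.
table = {
    "sample_1": [10.0, 10.0, 7.5, 3.2],
    "sample_2": [1.1, 4.4, 4.4, 9.9],
}
groups = duplicate_rows(table)
rows = union_rows(groups)
print(groups, rows)
```

The overlapping groups produced here are the kind of input that a transitive merge would then collapse into a single cluster.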