rcx_tk.msdial
=============

.. py:module:: rcx_tk.msdial


Functions
---------

.. autoapisummary::

   rcx_tk.msdial.process_msdial_file
   rcx_tk.msdial.get_n_samples
   rcx_tk.msdial.process_msdial
   rcx_tk.msdial.refine
   rcx_tk.msdial.aggregations
   rcx_tk.msdial.find_clusters
   rcx_tk.msdial.union
   rcx_tk.msdial.find_all_duplicates


Module Contents
---------------

.. py:function:: process_msdial_file(file_path: str, out_path: str, mz_tol_ppm: int) -> None

   Process MSDial output file to group duplicate alignments.

   :param file_path: Input file path.
   :type file_path: str
   :param out_path: Output file path.
   :type out_path: str
   :param mz_tol_ppm: m/z tolerance in ppm to use for splitting clustered alignments.
   :type mz_tol_ppm: int


.. py:function:: get_n_samples(file_path: str) -> int

   Obtain number of samples from msdial file.

   :param file_path: Path to msdial file.
   :type file_path: str

   :returns: Number of samples contained in the file.
   :rtype: int


.. py:function:: process_msdial(df: pandas.DataFrame, n_samples: int, mz_tol_ppm: int, metadata_cols: int = 27, index_col: str = 'Alignment ID') -> pandas.DataFrame

   Function to process a DataFrame of MSDial results to group duplicate alignments.

   :param df: Dataframe with MSDial results.
   :type df: pd.DataFrame
   :param n_samples: Number of samples - required to determine number of intensity cols in df.
   :type n_samples: int
   :param mz_tol_ppm: m/z tolerance in ppm to use for splitting clustered alignments.
   :type mz_tol_ppm: int
   :param metadata_cols: Number of columns containing data prior to feature abundances. Defaults to 27.
   :type metadata_cols: int, optional
   :param index_col: Column to denote the index. Defaults to "Alignment ID".
   :type index_col: str, optional

   :returns: DataFrame with clustered alignment ids.
   :rtype: pd.DataFrame


.. py:function:: refine(clusters: list[pandas.Index], metadata: pandas.DataFrame, mz_tol_ppm: int) -> list[pandas.Index]

   Refine clusters based on mz tolerance, splitting them if the quant mass is different.

   :param clusters: List of clusters to refine.
   :type clusters: list[pd.Index]
   :param metadata: Metadata section of the msdial file to use for refining clusters.
   :type metadata: pd.DataFrame
   :param mz_tol_ppm: m/z tolerance in ppm to use to split clusters.
   :type mz_tol_ppm: int

   :returns: Refined list of clusters.
   :rtype: list[pd.Index]


.. py:function:: aggregations(mean_columns: list[str], concat_columns: list[str], abundance_columns: list[str]) -> dict[str, collections.abc.Callable]

   Generate aggregation functions based on column types.

   :param mean_columns: List of columns to aggregate using mean.
   :type mean_columns: list[str]
   :param concat_columns: List of columns to aggregate using concatenation.
   :type concat_columns: list[str]
   :param abundance_columns: List of columns to aggregate using max.
   :type abundance_columns: list[str]

   :returns: Dictionary with functions to use for pd.aggregate
   :rtype: dict[str, function]


.. py:function:: find_clusters(all_duplicates: list[pandas.Index]) -> list[pandas.Index]

   Transitive merging of all duplicate indices into groups, where groups are merged if there is any overlap.

   :param all_duplicates: List of all duplicate indices.
   :type all_duplicates: list[pd.Index]

   :returns: Clusters of connected duplicates.
   :rtype: list[pd.Index]


.. py:function:: union(all_duplicates: list[pandas.Index]) -> pandas.Index

   Function to combine list of indices to union index.

   :param all_duplicates: All indices to combine.
   :type all_duplicates: list[pd.Index]

   :returns: Union of all indices.
   :rtype: pd.Index


.. py:function:: find_all_duplicates(data_matrix: pandas.DataFrame) -> list[pandas.Index]

   Get index of any duplicate values in any column.

   :param data_matrix: DataFrame to check column-by-column for duplicate values.
   :type data_matrix: pd.DataFrame

   :returns: All indexes of duplicates.
   :rtype: list[pd.Index]
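
The transitive merging described for ``find_clusters`` can be illustrated with a minimal sketch. It is not the ``rcx_tk`` implementation: ``merge_overlapping`` is a hypothetical helper, and plain Python sets stand in for ``pandas.Index`` objects so the example runs without pandas. Groups that share any alignment id are merged, including groups that are only linked through a third group.

```python
def merge_overlapping(groups: list[list[int]]) -> list[list[int]]:
    """Merge groups of ids that share at least one member (hypothetical helper)."""
    clusters: list[set[int]] = []  # invariant: clusters are mutually disjoint
    for group in groups:
        current = set(group)
        disjoint = []
        for cluster in clusters:
            if cluster & current:
                # Overlap found: absorb the existing cluster into the new one.
                current |= cluster
            else:
                disjoint.append(cluster)
        disjoint.append(current)
        clusters = disjoint
    return [sorted(cluster) for cluster in clusters]


# Ids 1-2 and 2-3 are linked through id 2; ids 7-8 stay separate.
duplicates = [[1, 2], [2, 3], [7, 8]]
clusters = merge_overlapping(duplicates)

# Groups [1, 2] and [3, 4] only become connected once [2, 3] arrives.
bridged = merge_overlapping([[1, 2], [3, 4], [2, 3]])
```

Here ``clusters`` holds two groups and ``bridged`` holds a single group, which is the behaviour the phrase "merged if there is any overlap" suggests.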
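
The ``mz_tol_ppm`` parameter used by ``refine`` expresses a relative mass tolerance in parts per million. The sketch below shows one way such a check could split a cluster. It rests on assumptions not confirmed by this reference: ``within_ppm`` and ``split_by_mz`` are hypothetical names, the first mass is used as the reference in the ppm formula, and neighbouring masses are compared after sorting. The actual splitting rule in ``refine`` may differ.

```python
def within_ppm(mz_a: float, mz_b: float, tol_ppm: float) -> bool:
    """Return True if mz_b lies within tol_ppm parts per million of mz_a."""
    return abs(mz_a - mz_b) / mz_a * 1e6 <= tol_ppm


def split_by_mz(masses: dict[int, float], tol_ppm: float) -> list[list[int]]:
    """Split one cluster (alignment id -> quant mass) wherever the gap
    between neighbouring sorted masses exceeds the tolerance."""
    if not masses:
        return []
    ordered = sorted(masses, key=masses.get)  # ids sorted by quant mass
    groups = [[ordered[0]]]
    for prev, curr in zip(ordered, ordered[1:]):
        if within_ppm(masses[prev], masses[curr], tol_ppm):
            groups[-1].append(curr)
        else:
            groups.append([curr])
    return groups


# Ids 10 and 11 differ by about 5 ppm; id 12 has a clearly different mass.
cluster = {10: 200.0000, 11: 200.0010, 12: 350.1234}
parts = split_by_mz(cluster, tol_ppm=10)
```

With a tolerance of 10 ppm, ids 10 and 11 remain together and id 12 is split into its own group.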