rcx_tk.msdial
=============

.. py:module:: rcx_tk.msdial


Functions
---------

.. autoapisummary::

   rcx_tk.msdial.process_msdial_file
   rcx_tk.msdial.get_n_samples
   rcx_tk.msdial.process_msdial
   rcx_tk.msdial.refine
   rcx_tk.msdial.aggregations
   rcx_tk.msdial.find_clusters
   rcx_tk.msdial.union
   rcx_tk.msdial.find_all_duplicates


Module Contents
---------------

.. py:function:: process_msdial_file(file_path: str, out_path: str, mz_tol_ppm: int) -> None

   Process an MSDial output file, grouping duplicate alignments, and write the result to ``out_path``.

   :param file_path: Input file path.
   :type file_path: str
   :param out_path: Output file path.
   :type out_path: str
   :param mz_tol_ppm: m/z tolerance in ppm to use for splitting clustered alignments.
   :type mz_tol_ppm: int


.. py:function:: get_n_samples(file_path: str) -> int

   Obtain the number of samples from an MSDial file.

   :param file_path: Path to the MSDial file.
   :type file_path: str

   :returns: Number of samples contained in the file.
   :rtype: int


.. py:function:: process_msdial(df: pandas.DataFrame, n_samples: int, mz_tol_ppm: int, metadata_cols: int = 27, index_col: str = 'Alignment ID') -> pandas.DataFrame

   Process a DataFrame of MSDial results to group duplicate alignments.

   :param df: DataFrame with MSDial results.
   :type df: pd.DataFrame
   :param n_samples: Number of samples; required to determine the number of intensity columns in ``df``.
   :type n_samples: int
   :param mz_tol_ppm: m/z tolerance in ppm to use for splitting clustered alignments.
   :type mz_tol_ppm: int
   :param metadata_cols: Number of metadata columns that precede the feature abundance columns. Defaults to 27.
   :type metadata_cols: int, optional
   :param index_col: Name of the column to use as the index. Defaults to "Alignment ID".
   :type index_col: str, optional

   :returns: DataFrame with clustered alignment IDs.
   :rtype: pd.DataFrame
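
   As an illustration of how ``metadata_cols`` and ``n_samples`` locate the intensity columns, the following is a minimal sketch and not the module's actual code. The helper ``split_columns`` and the column names are hypothetical, and it assumes the abundance columns directly follow the metadata columns.

   .. code-block:: python

      import pandas as pd


      def split_columns(df: pd.DataFrame, n_samples: int, metadata_cols: int):
          """Split a table by position into metadata and abundance parts."""
          # Assumption: metadata columns come first, then one column per sample.
          metadata = df.iloc[:, :metadata_cols]
          abundances = df.iloc[:, metadata_cols:metadata_cols + n_samples]
          return metadata, abundances


      # Hypothetical table: 2 metadata columns, 2 samples, 1 trailing column.
      table = pd.DataFrame(
          {
              "Alignment ID": [0, 1],
              "Quant mass": [100.0, 200.0],
              "sample_1": [10, 20],
              "sample_2": [30, 40],
              "Average": [20, 30],
          }
      )
      metadata, abundances = split_columns(table, n_samples=2, metadata_cols=2)
      print(list(abundances.columns))

   Slicing by position is why ``n_samples`` is needed: any columns after the sample intensities must be excluded from the abundance block.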


.. py:function:: refine(clusters: list[pandas.Index], metadata: pandas.DataFrame, mz_tol_ppm: int) -> list[pandas.Index]

   Refine clusters using the m/z tolerance, splitting a cluster when the quant masses of its members differ.

   :param clusters: List of clusters to refine.
   :type clusters: list[pd.Index]
   :param metadata: Metadata section of the MSDial file to use for refining clusters.
   :type metadata: pd.DataFrame
   :param mz_tol_ppm: m/z tolerance in ppm to use to split clusters.
   :type mz_tol_ppm: int

   :returns: Refined list of clusters.
   :rtype: list[pd.Index]
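
   To make the ppm tolerance concrete, here is a minimal sketch in plain Python. It is not the module's implementation: the helpers ``within_ppm`` and ``split_by_ppm`` are hypothetical, and the splitting rule (sort by mass, start a new group when the gap to the previous member exceeds the tolerance) is an assumption.

   .. code-block:: python

      def within_ppm(mz_a: float, mz_b: float, tol_ppm: float) -> bool:
          """Return True if two m/z values differ by at most tol_ppm ppm."""
          # Relative difference scaled to parts per million, relative to mz_a.
          return abs(mz_a - mz_b) / mz_a * 1e6 <= tol_ppm


      def split_by_ppm(masses: dict[int, float], tol_ppm: float) -> list[list[int]]:
          """Split one cluster (alignment ID -> quant mass) into sub-clusters."""
          groups: list[list[int]] = []
          last_mz = None
          # Walk members by increasing mass; open a new group whenever the gap
          # to the previous member exceeds the tolerance.
          for alignment_id, mz in sorted(masses.items(), key=lambda item: item[1]):
              if last_mz is None or not within_ppm(last_mz, mz, tol_ppm):
                  groups.append([])
              groups[-1].append(alignment_id)
              last_mz = mz
          return groups


      cluster = {1: 100.0000, 2: 100.0003, 3: 100.0500}
      print(split_by_ppm(cluster, tol_ppm=5))

   Here IDs 1 and 2 differ by about 3 ppm and stay together, while ID 3 is roughly 500 ppm away and ends up in its own group.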


.. py:function:: aggregations(mean_columns: list[str], concat_columns: list[str], abundance_columns: list[str]) -> dict[str, collections.abc.Callable]

   Generate aggregation functions based on column types.

   :param mean_columns: List of columns to aggregate using mean.
   :type mean_columns: list[str]
   :param concat_columns: List of columns to aggregate using concatenation.
   :type concat_columns: list[str]
   :param abundance_columns: List of columns to aggregate using max.
   :type abundance_columns: list[str]

   :returns: Dictionary mapping column names to the functions to pass to ``DataFrame.aggregate``.
   :rtype: dict[str, Callable]
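
   The sketch below shows one way such a mapping could be built and applied with pandas. It is illustrative only: ``build_aggregations``, the ``";"`` separator, and the column names are assumptions, and it uses pandas' string aliases ``"mean"`` and ``"max"`` where the real function returns callables.

   .. code-block:: python

      import pandas as pd


      def build_aggregations(mean_columns, concat_columns, abundance_columns):
          """Map each column name to the aggregation applied to it."""
          aggs = {}
          for col in mean_columns:
              aggs[col] = "mean"
          for col in concat_columns:
              # Join all values of a group into one string (separator assumed).
              aggs[col] = lambda values: ";".join(str(v) for v in values)
          for col in abundance_columns:
              aggs[col] = "max"
          return aggs


      df = pd.DataFrame(
          {
              "cluster": [0, 0, 1],
              "rt": [1.0, 3.0, 5.0],
              "name": ["a", "b", "c"],
              "sample_1": [10, 30, 20],
          }
      )
      aggs = build_aggregations(["rt"], ["name"], ["sample_1"])
      result = df.groupby("cluster").agg(aggs)
      print(result)

   Each cluster collapses to a single row: retention times are averaged, names are concatenated, and the abundance keeps the largest value.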


.. py:function:: find_clusters(all_duplicates: list[pandas.Index]) -> list[pandas.Index]

   Transitively merge duplicate indices into groups: two groups are merged whenever they share at least one index.

   :param all_duplicates: List of all duplicate indices.
   :type all_duplicates: list[pd.Index]

   :returns: Clusters of connected duplicates.
   :rtype: list[pd.Index]
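
   The transitive merge can be sketched with plain Python sets standing in for ``pandas.Index`` objects. This is a minimal illustration under that assumption, not the module's code, and ``merge_overlapping`` is a hypothetical name.

   .. code-block:: python

      def merge_overlapping(groups: list[set[int]]) -> list[set[int]]:
          """Merge groups so that any two sharing a member end up together."""
          clusters: list[set[int]] = []
          for group in groups:
              merged = set(group)
              remaining = []
              for cluster in clusters:
                  if cluster & merged:
                      # Overlap found: absorb the existing cluster.
                      merged |= cluster
                  else:
                      remaining.append(cluster)
              # Existing clusters stay pairwise disjoint after each step.
              clusters = remaining + [merged]
          return clusters


      duplicates = [{1, 2}, {3, 4}, {2, 3}, {5, 6}]
      print(merge_overlapping(duplicates))

   The group ``{2, 3}`` links ``{1, 2}`` and ``{3, 4}``, so all four IDs form one cluster even though the first two groups never overlap directly.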


.. py:function:: union(all_duplicates: list[pandas.Index]) -> pandas.Index

   Combine a list of indices into a single index containing their union.

   :param all_duplicates: All indices to combine.
   :type all_duplicates: list[pd.Index]

   :returns: Union of all indices.
   :rtype: pd.Index
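
   One way to express this with pandas is to fold ``Index.union`` over the list. The sketch below is an assumption about the approach, not the module's code; ``union_of`` is a hypothetical helper and it expects a non-empty list.

   .. code-block:: python

      from functools import reduce

      import pandas as pd


      def union_of(indices: list[pd.Index]) -> pd.Index:
          """Fold Index.union over a non-empty list of indices."""
          return reduce(lambda acc, idx: acc.union(idx), indices)


      combined = union_of([pd.Index([1, 2]), pd.Index([2, 3]), pd.Index([7])])
      print(combined.tolist())

   Labels that appear in several inputs, such as ``2`` here, occur only once in the result.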


.. py:function:: find_all_duplicates(data_matrix: pandas.DataFrame) -> list[pandas.Index]

   Get the indices of rows that share a duplicate value in any column.

   :param data_matrix: DataFrame to check column-by-column for duplicate values.
   :type data_matrix: pd.DataFrame

   :returns: Indices of all duplicates.
   :rtype: list[pd.Index]
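
   A column-by-column duplicate search could look like the sketch below. It is a guess at the general idea rather than the module's implementation: ``duplicate_indices`` is a hypothetical helper, the column names are invented, and it assumes each returned index holds the rows sharing one value in one column.

   .. code-block:: python

      import pandas as pd


      def duplicate_indices(data: pd.DataFrame) -> list[pd.Index]:
          """Collect the row labels of every value repeated within a column."""
          found = []
          for column in data.columns:
              # groups maps each distinct value to the labels of its rows.
              for rows in data.groupby(column).groups.values():
                  if len(rows) > 1:
                      found.append(rows)
          return found


      data = pd.DataFrame(
          {"sample_1": [5.0, 5.0, 7.0], "sample_2": [1.0, 2.0, 2.0]},
          index=pd.Index([10, 11, 12], name="Alignment ID"),
      )
      dups = duplicate_indices(data)
      print([idx.tolist() for idx in dups])

   Row 11 appears in both groups, which is the kind of overlap that ``find_clusters`` would then merge into a single cluster.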