{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\" Created on March 26, 2026 // @author: Sarah Shi \"\"\"\n",
    "\n",
    "import os\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "import mineralML as mm\n",
    "import Thermobar as pt\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'png'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# mineralML Quickstart for Tabular Microanalytical Data\n",
    "\n",
    "This notebook shows **how to load and run your microanalytical data through mineralML**, starting from the raw tabular exports of Cameca, Probe for EPMA, and AZtec (Oxford Instruments). This is a five-step process:\n",
    "\n",
    "1. Extract and standardize data from raw instrument files (Excel or CSV) using `mm.extract_cameca`, `mm.extract_probe4epma`, or `mm.extract_aztec`. For reference, example outputs from these three instrument types are provided on GitHub (https://github.com/sarahshi/mineralML/tree/main/docs/TabularData/microanalysis.xlsx), one instrument per sheet. Clean and align columns with `mm.prep_df`.\n",
    "2. Run data through the neural network with `mm.predict_class_prob` to derive classifications and prediction scores. \n",
    "3. Export predictions and prediction scores with `mm.export_predictions_to_excel`.\n",
    "4. Calculate mineral components and plot compositions in empirical classification space (ternaries, quadrilaterals).\n",
    "5. Run data through `Thermobar` for thermobarometric estimates. \n",
    "\n",
    "We imported the **mineralML** Python package as `mm`. **mineralML** provides trained machine learning models for classifying minerals. This implementation aims to get your electron microprobe or quantitative EDS compositions classified and processed, fixing several options to keep the workflow as simple as possible. The minerals considered include: Amphibole, Apatite, Biotite, Calcite, Chlorite, Epidote, Feldspar (Alkali Feldspar and Plagioclase), Garnet, Glass, Kalsilite, Leucite, Melilite, Muscovite, Nepheline, Olivine, Oxide (Rhombohedral_Oxides including Hematite-Ilmenite, Spinel_Group including Magnetite-Spinel), Pyroxene (Clinopyroxene, Orthopyroxene, Na-Pyroxene), Quartz, Rutile, Serpentine, Titanite, Tourmaline, and Zircon.\n",
    "\n",
    "An Excel or CSV file containing your electron microprobe or EDS analyses is required. Thanks to the integrated extraction functions, you no longer need to manually reformat complex headers or convert relative errors! Just point the appropriate extractor at your raw instrument export. The required oxides are SiO$_2$, TiO$_2$, Al$_2$O$_3$, FeO$_t$, MnO, MgO, CaO, Na$_2$O, K$_2$O, Cr$_2$O$_3$, P$_2$O$_5$, and ZrO$_2$ (if you aim to classify zircon). Oxides that were not analyzed for a given mineral appear as NaN values, which preprocessing fills with 0.\n",
    "\n",
    "We will apply the neural network method to the dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Load and prepare data for analysis\n",
    "\n",
    "We will use `mm.extract_cameca`, `mm.extract_probe4epma`, and `mm.extract_aztec` to extract the tabular data from their respective sheets in our example workbook. Because these functions standardize the output format, we can easily combine the results into a single dataframe before running it through `mm.prep_df`.\n",
    "\n",
    "Note that Cameca EPMA exports report uncertainties at 3 sigma! These are converted to 1 sigma during extraction."
   ]
  },
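  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the 3 sigma to 1 sigma conversion simply divides each uncertainty by 3. Below is a minimal pandas sketch of that arithmetic. It assumes the uncertainties sit in columns with a hypothetical `_3sigma` suffix; the column names in your Cameca export, and those produced by `mm.extract_cameca`, may differ, so treat this as an illustration rather than mineralML's implementation.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical Cameca-style table: oxide wt% plus 3-sigma uncertainties.\n",
    "# The '_3sigma' column suffix is an assumption made for illustration only.\n",
    "raw = pd.DataFrame({\n",
    "    'SiO2': [50.1, 48.7],\n",
    "    'SiO2_3sigma': [0.30, 0.27],\n",
    "    'MgO': [15.2, 9.8],\n",
    "    'MgO_3sigma': [0.12, 0.09],\n",
    "})\n",
    "\n",
    "\n",
    "def to_one_sigma(df, suffix='_3sigma', new_suffix='_1sigma'):\n",
    "    # Divide each 3-sigma column by 3 and rename it with the 1-sigma suffix.\n",
    "    out = df.copy()\n",
    "    for col in [c for c in df.columns if c.endswith(suffix)]:\n",
    "        out[col[:-len(suffix)] + new_suffix] = out.pop(col) / 3.0\n",
    "    return out\n",
    "\n",
    "\n",
    "converted = to_one_sigma(raw)\n",
    "print(converted)\n",
    "```"
   ]
  },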
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the path to your microanalysis data output. Here, I've written three functions for dealing with Cameca, Probe for EPMA, and AZtec data outputs. If you have other outputs you'd like to work with, reach out and I can expand the functions available. \n",
    "file_path = 'TabularData/microanalysis.xlsx'\n",
    "\n",
    "# Extract data from each respective sheet/instrument\n",
    "df_cameca = mm.extract_cameca(file_path, sheet_name='Cameca')\n",
    "display('Cameca:', df_cameca.head())\n",
    "\n",
    "df_p4e = mm.extract_probe4epma(file_path, sheet_name='Probe4EPMA')\n",
    "display('Probe for EPMA:', df_p4e.head())\n",
    "\n",
    "df_aztec = mm.extract_aztec(file_path, sheet_name='AZtec')\n",
    "display('AZtec:', df_aztec.head())\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Combine them into a single standardized dataframe\n",
    "df_load = pd.concat([df_cameca, df_p4e, df_aztec], ignore_index=True)\n",
    "\n",
    "# Prepare the dataframe: fill unanalyzed oxides with zeros and, optionally, drop rows with too few analyzed oxides.\n",
    "df_nn = mm.prep_df(df_load, # dataframe to prepare\n",
    "                   renormalize=False, # optionally renormalize rows to sum to 100 wt%\n",
    "                   convert_fe=False, # optionally convert disparate input formats of Fe all to FeOt\n",
    "                   drop_empty_rows=False, # optionally drop rows with fewer than min_oxide_count analyzed (non-NaN) oxides\n",
    "                   min_oxide_count=2, # minimum number of oxides in a row to keep that analysis\n",
    "                   verbose=True\n",
    "                   )\n",
    "\n",
    "# Examine the prepared dataframe\n",
    "display(df_nn.head())"
   ]
  },
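  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To build intuition for what this preparation step does, here is a rough pandas sketch. It is not mineralML's implementation: it assumes the oxide list from the introduction (with Fe as `FeOt`), fills unanalyzed oxides with 0, optionally drops rows with fewer than `min_oxide_count` analyzed oxides, and optionally rescales each row to 100 wt%. The call above sets `drop_empty_rows=False`, so no rows are dropped there. `sketch_prep` is a hypothetical helper.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Oxide list from the introduction; the Fe column is assumed to be named 'FeOt'.\n",
    "OXIDES = ['SiO2', 'TiO2', 'Al2O3', 'FeOt', 'MnO', 'MgO', 'CaO',\n",
    "          'Na2O', 'K2O', 'Cr2O3', 'P2O5', 'ZrO2']\n",
    "\n",
    "\n",
    "def sketch_prep(df, min_oxide_count=2, drop_empty_rows=True, renormalize=False):\n",
    "    # Add any oxide columns missing entirely, so every row has the same features.\n",
    "    missing = [ox for ox in OXIDES if ox not in df.columns]\n",
    "    out = df.reindex(columns=list(df.columns) + missing).copy()\n",
    "    if drop_empty_rows:\n",
    "        # Keep rows with at least min_oxide_count analyzed (non-NaN) oxides.\n",
    "        keep = out[OXIDES].notna().sum(axis=1) >= min_oxide_count\n",
    "        out = out.loc[keep].copy()\n",
    "    # Treat unanalyzed oxides as 0 wt%.\n",
    "    out[OXIDES] = out[OXIDES].fillna(0.0)\n",
    "    if renormalize:\n",
    "        # Rescale each row so that its oxides sum to 100 wt%.\n",
    "        out[OXIDES] = out[OXIDES].div(out[OXIDES].sum(axis=1), axis=0) * 100\n",
    "    return out\n",
    "\n",
    "\n",
    "# Toy input: the second row has only one analyzed oxide and is dropped.\n",
    "demo = pd.DataFrame({'SiO2': [40.5, np.nan], 'MgO': [49.0, np.nan], 'FeOt': [10.0, 5.0]})\n",
    "prepped = sketch_prep(demo, min_oxide_count=2)\n",
    "print(prepped[['SiO2', 'MgO', 'FeOt', 'CaO']])\n",
    "```"
   ]
  },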
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
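    "## 2. Apply the trained neural network (`mm.predict_class_prob`)\n",
    "\n",
    "`mm.predict_class_prob` runs each prepared analysis through the trained neural network and returns the predicted mineral along with its prediction scores."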
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The trained neural network can be applied in just one line. It returns predictions in columns called \"Predict_Mineral\", \"Submineral\" (if applicable, for pyroxenes, feldspars, and oxides), \"Predict_Probability\", \"Second_Predict_Mineral\", \"Second_Predict_Probability\".\n",
    "\n",
    "df_pred_nn = mm.predict_class_prob(df_nn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Examine the predicted mineral classifications\n",
    "\n",
    "display(df_pred_nn.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This dataframe contains a good deal of new information. The predicted mineral is in the `Predict_Mineral` column, its prediction score (how strongly the model favors that class) is in the `Prediction_Score` column, and the standard deviation of that score is in the `Prediction_Score_Sigma` column."
   ]
  },
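  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A natural follow-up is to flag analyses with low prediction scores before interpreting them. The sketch below is illustrative only: the 0.8 threshold is an arbitrary assumption, and because the score column is called `Predict_Probability` in the code comment above but `Prediction_Score` in the text, the hypothetical helper looks for either name (check `df_pred_nn.columns` for the actual one).\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def flag_low_confidence(df, threshold=0.8,\n",
    "                        score_cols=('Prediction_Score', 'Predict_Probability')):\n",
    "    # Use whichever score column is present; both names are assumptions to verify.\n",
    "    col = next((c for c in score_cols if c in df.columns), None)\n",
    "    if col is None:\n",
    "        raise KeyError('No prediction score column found in the dataframe')\n",
    "    out = df.copy()\n",
    "    # Mark rows whose score falls below the (arbitrary) threshold.\n",
    "    out['Low_Confidence'] = out[col] < threshold\n",
    "    return out\n",
    "\n",
    "\n",
    "# Toy example with made-up scores; in this notebook you would pass df_pred_nn.\n",
    "toy = pd.DataFrame({'Predict_Mineral': ['Olivine', 'Glass', 'Plagioclase'],\n",
    "                    'Prediction_Score': [0.99, 0.55, 0.91]})\n",
    "flagged = flag_low_confidence(toy)\n",
    "print(flagged[flagged['Low_Confidence']])\n",
    "```"
   ]
  },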
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Examine the unique predicted minerals. \n",
    "\n",
    "print(np.unique(df_pred_nn.Predict_Mineral))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Export prediction results\n",
    "\n",
    "Say you would like to go back to working in Excel. Use `mm.export_predictions_to_excel` to export the predictions and prediction scores. All analyses are returned in the first sheet, and each remaining sheet contains the analyses for one predicted mineral phase."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Export prediction results to an Excel workbook with one sheet called \"All\" containing all rows, and additional sheets for each predicted mineral.\n",
    "\n",
    "mm.export_predictions_to_excel(df_pred_nn, filename='TabularData/probe_prediction_results.xlsx')"
   ]
  },
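  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want a different sheet layout, a workbook like the one described above can be assembled with plain pandas. This is a hedged sketch, not the implementation of `mm.export_predictions_to_excel`: the hypothetical helpers build an `All` sheet plus one sheet per `Predict_Mineral` value, truncating names to Excel's 31-character sheet-name limit. Writing `.xlsx` files requires an Excel engine such as `openpyxl`, and the output path is a placeholder.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def split_by_mineral(df, col='Predict_Mineral'):\n",
    "    # One entry for all rows, then one per predicted mineral in sorted order.\n",
    "    sheets = {'All': df}\n",
    "    for mineral in sorted(df[col].dropna().unique()):\n",
    "        # Excel limits sheet names to 31 characters.\n",
    "        sheets[str(mineral)[:31]] = df[df[col] == mineral]\n",
    "    return sheets\n",
    "\n",
    "\n",
    "def write_workbook(sheets, path):\n",
    "    # Requires an Excel writer engine (e.g., openpyxl) to be installed.\n",
    "    with pd.ExcelWriter(path) as writer:\n",
    "        for name, frame in sheets.items():\n",
    "            frame.to_excel(writer, sheet_name=name, index=False)\n",
    "\n",
    "\n",
    "toy = pd.DataFrame({'Predict_Mineral': ['Olivine', 'Glass', 'Olivine'],\n",
    "                    'SiO2': [40.5, 50.2, 39.8]})\n",
    "sheets = split_by_mineral(toy)\n",
    "print({name: len(frame) for name, frame in sheets.items()})\n",
    "# write_workbook(sheets, 'my_predictions.xlsx')  # placeholder output path\n",
    "```"
   ]
  },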
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Calculate mineral components, plot minerals in empirical classification space\n",
    "\n",
    "We have plagioclase feldspars, olivines, clinopyroxenes, and glasses here. Let's calculate olivine forsterite contents, plot some pyroxene compositions in quadrilateral space, plot some feldspars in ternary space, and examine some glass compositions in TAS (total alkali-silica) space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Separate the analyses by their mineralML-predicted class in the Predict_Mineral column.\n",
    "\n",
    "fspars = df_pred_nn[df_pred_nn.Predict_Mineral=='Plagioclase']\n",
    "ols = df_pred_nn[df_pred_nn.Predict_Mineral=='Olivine']\n",
    "pxs = df_pred_nn[df_pred_nn.Predict_Mineral=='Clinopyroxene']\n",
    "gls = df_pred_nn[df_pred_nn.Predict_Mineral=='Glass']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A. Let's look at the feldspars and plot them in component space on the ternary diagram!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use FeldsparClassifier to examine the component space (XAn, XAb, XOr)\n",
    "fspar_comp = mm.FeldsparClassifier(fspars).calculate_components()\n",
    "display(fspar_comp)\n",
    "\n",
    "# Use FeldsparClassifier to plot up these data. \n",
    "fig = mm.FeldsparClassifier(fspars).plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These are albites, and they do indeed plot in the albite field!"
   ]
  },
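  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For intuition about the component values above, the An-Ab-Or proportions can be approximated from molar cations. A minimal sketch, assuming the components are simply Ca, Na, and K normalized to their sum (mineralML's `calculate_components` may use a fuller stoichiometric recalculation). Each Na$_2$O and K$_2$O contributes two cations; the helper name and composition are illustrative.\n",
    "\n",
    "```python\n",
    "# Approximate molar masses (g/mol) of the relevant oxides.\n",
    "MOLAR_MASS = {'CaO': 56.077, 'Na2O': 61.979, 'K2O': 94.196}\n",
    "\n",
    "\n",
    "def feldspar_components(cao, na2o, k2o):\n",
    "    # Convert wt% oxides to moles of Ca, Na, and K cations.\n",
    "    ca = cao / MOLAR_MASS['CaO']\n",
    "    na = 2 * na2o / MOLAR_MASS['Na2O']\n",
    "    k = 2 * k2o / MOLAR_MASS['K2O']\n",
    "    total = ca + na + k\n",
    "    # Return mole fractions XAn, XAb, XOr.\n",
    "    return ca / total, na / total, k / total\n",
    "\n",
    "\n",
    "# Roughly albite-like composition (illustrative wt% values).\n",
    "x_an, x_ab, x_or = feldspar_components(cao=0.5, na2o=11.5, k2o=0.2)\n",
    "print(f'XAn={x_an:.3f}, XAb={x_ab:.3f}, XOr={x_or:.3f}')\n",
    "```"
   ]
  },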
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "B. Let's look at the olivines and calculate their stoichiometric site occupancies and XFo values!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use OlivineCalculator to examine stoichiometric/component space (XFo)\n",
    "ols_comp = mm.OlivineCalculator(ols).calculate_components()\n",
    "display(ols_comp.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All the relevant site information (M, T sites) and forsterite content are calculated and returned! "
   ]
  },
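  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check on the returned forsterite contents, XFo is the molar ratio Mg/(Mg + Fe). A minimal sketch, assuming all Fe is ferrous and reported as FeOt, and ignoring Mn and Ni (mineralML's calculation may differ); the helper and composition are illustrative.\n",
    "\n",
    "```python\n",
    "# Molar masses (g/mol) of MgO and FeO.\n",
    "M_MGO, M_FEO = 40.304, 71.844\n",
    "\n",
    "\n",
    "def forsterite(mgo_wt, feot_wt):\n",
    "    # Convert wt% to moles of Mg and Fe (one cation per oxide formula unit).\n",
    "    mg = mgo_wt / M_MGO\n",
    "    fe = feot_wt / M_FEO\n",
    "    return mg / (mg + fe)\n",
    "\n",
    "\n",
    "# Illustrative olivine: 49 wt% MgO and 10 wt% FeOt.\n",
    "x_fo = forsterite(49.0, 10.0)\n",
    "print(round(x_fo, 3))\n",
    "```"
   ]
  },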
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "C. Let's look at the pyroxenes, calculate their stoichiometric site occupancies, and examine the component space (En, Wo, Fs)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use PyroxeneClassifier to examine stoichiometric/component space (En, Wo, Fs).\n",
    "pxs_comp = mm.PyroxeneClassifier(pxs).calculate_components()\n",
    "display(pxs_comp.head())\n",
    "\n",
    "# Use PyroxeneClassifier to plot up these data. \n",
    "fig = mm.PyroxeneClassifier(pxs).plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These are diopsides, and they do indeed plot in the diopside field!"
   ]
  },
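  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The quadrilateral coordinates follow from molar Ca, Mg, and Fe. A minimal sketch, assuming Wo, En, and Fs are simply Ca, Mg, and total Fe (as Fe$^{2+}$) normalized to their sum; mineralML's `calculate_components` may handle Fe$^{3+}$ and other cations differently. The helper and composition are illustrative.\n",
    "\n",
    "```python\n",
    "# Molar masses (g/mol) of CaO, MgO, and FeO.\n",
    "M_CAO, M_MGO, M_FEO = 56.077, 40.304, 71.844\n",
    "\n",
    "\n",
    "def wo_en_fs(cao_wt, mgo_wt, feot_wt):\n",
    "    # Moles of Ca, Mg, and Fe, treating all Fe as Fe2+.\n",
    "    ca = cao_wt / M_CAO\n",
    "    mg = mgo_wt / M_MGO\n",
    "    fe = feot_wt / M_FEO\n",
    "    total = ca + mg + fe\n",
    "    # Return (Wo, En, Fs) as percentages summing to 100.\n",
    "    return 100 * ca / total, 100 * mg / total, 100 * fe / total\n",
    "\n",
    "\n",
    "# Illustrative diopside-like composition (wt%).\n",
    "wo, en, fs = wo_en_fs(cao_wt=22.0, mgo_wt=15.0, feot_wt=6.0)\n",
    "print(f'Wo={wo:.1f}, En={en:.1f}, Fs={fs:.1f}')\n",
    "```"
   ]
  },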
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "D. Let's look at the glasses, calculate their Mg#s, and classify them in TAS space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use GlassClassifier to return Mg#s and determine the TAS classification.\n",
    "gls_comp = mm.GlassClassifier(gls).calculate_components()\n",
    "display(gls_comp)\n",
    "\n",
    "# Use GlassClassifier to plot up these data. \n",
    "fig = mm.GlassClassifier(gls).plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Neat. These are indeed basalts! "
   ]
  },
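  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The TAS diagram plots total alkalis (Na$_2$O + K$_2$O, wt%) against SiO$_2$ (wt%), and Mg# is the molar ratio 100 Mg/(Mg + Fe$^{2+}$). A minimal sketch of both quantities, assuming all Fe is ferrous; `GlassClassifier` may treat Fe differently, and the TAS field boundaries are not reproduced here. The helper and compositions are illustrative.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Molar masses (g/mol) of MgO and FeO.\n",
    "M_MGO, M_FEO = 40.304, 71.844\n",
    "\n",
    "\n",
    "def tas_and_mg_number(df):\n",
    "    out = pd.DataFrame(index=df.index)\n",
    "    # TAS coordinates in wt%.\n",
    "    out['SiO2'] = df['SiO2']\n",
    "    out['Total_Alkalis'] = df['Na2O'] + df['K2O']\n",
    "    # Molar Mg#, treating all Fe as Fe2+.\n",
    "    mg = df['MgO'] / M_MGO\n",
    "    fe = df['FeOt'] / M_FEO\n",
    "    out['Mg_Number'] = 100 * mg / (mg + fe)\n",
    "    return out\n",
    "\n",
    "\n",
    "# Illustrative basaltic glass compositions (wt%).\n",
    "glass_demo = pd.DataFrame({'SiO2': [50.5, 49.8], 'Na2O': [2.6, 2.3], 'K2O': [0.4, 0.5],\n",
    "                           'MgO': [7.0, 7.5], 'FeOt': [11.0, 10.5]})\n",
    "glass_summary = tas_and_mg_number(glass_demo)\n",
    "print(glass_summary.round(2))\n",
    "```"
   ]
  },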
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Perform Thermobarometry\n",
    "\n",
    "Say you analyzed these points for thermobarometry and now want to run them through the `Thermobar` package (Wieser et al., 2022). How might you do so? I have written some functions for compatibility with `Thermobar`, so let's test things out using the glass analyses as an example."
   ]
  },
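  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Thermobar` identifies each phase by a column suffix, such as `SiO2_Liq` for liquid or `SiO2_Cpx` for clinopyroxene. The renaming idea can be sketched in a few lines of pandas, assuming `mm.format_for_thermobar` essentially appends the suffix to the oxide columns. The real function may do more, so use it in practice rather than this hypothetical `add_phase_suffix` helper.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "OXIDES = ['SiO2', 'TiO2', 'Al2O3', 'FeOt', 'MnO', 'MgO', 'CaO',\n",
    "          'Na2O', 'K2O', 'Cr2O3', 'P2O5']\n",
    "\n",
    "\n",
    "def add_phase_suffix(df, suffix):\n",
    "    # Rename only the oxide columns present; leave labels and predictions untouched.\n",
    "    mapping = {ox: ox + suffix for ox in OXIDES if ox in df.columns}\n",
    "    return df.rename(columns=mapping)\n",
    "\n",
    "\n",
    "demo = pd.DataFrame({'Sample': ['g1'], 'SiO2': [50.5], 'MgO': [7.0],\n",
    "                     'Predict_Mineral': ['Glass']})\n",
    "renamed = add_phase_suffix(demo, '_Liq')\n",
    "print(renamed.columns.tolist())\n",
    "```"
   ]
  },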
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select the glass analyses using the mineralML prediction in the Predict_Mineral column. You could also reuse the gls dataframe from above; it is recreated here for reference.\n",
    "glass = df_pred_nn[df_pred_nn.Predict_Mineral=='Glass']\n",
    "\n",
    "# Prepare dataframe for thermobarometry by appending the suffix of _Liq to the oxide columns\n",
    "glass_pt = mm.format_for_thermobar(glass, suffix='_Liq')\n",
    "display(glass_pt)\n",
    "\n",
    "# Calculate temperatures with Thermobar, with the Shea et al., 2022 liquid thermometer.\n",
    "liq_ts = pt.calculate_liq_only_temp(liq_comps=glass_pt,\n",
    "                                    equationT='T_Shea2022_MgO')\n",
    "display(liq_ts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can do the same thing for any other mineral or melt thermobarometer or chemometer: just append the suffix `Thermobar` expects for that phase. Below is one more example, using clinopyroxene-only thermobarometry."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select the clinopyroxene analyses using the mineralML prediction in the Predict_Mineral column. You could also reuse the pxs dataframe from above; it is recreated here for reference.\n",
    "clinopyroxene = df_pred_nn[df_pred_nn.Predict_Mineral=='Clinopyroxene']\n",
    "\n",
    "# Prepare dataframe for thermobarometry by appending the suffix of _Cpx to the oxide columns\n",
    "clinopyroxene_pt = mm.format_for_thermobar(clinopyroxene, suffix='_Cpx')\n",
    "display(clinopyroxene_pt)\n",
    "\n",
    "# Calculate pressures and temperatures with Thermobar using the Putirka (2008) clinopyroxene-only thermometer (eq. 32d) and barometer (eq. 32b), assuming 3 wt% H2O in the melt (H2O_Liq).\n",
    "cpx_ps = pt.calculate_cpx_only_press_temp(cpx_comps=clinopyroxene_pt, \n",
    "                                          equationT=\"T_Put2008_eq32d\",\n",
    "                                          equationP=\"P_Put2008_eq32b\",\n",
    "                                          H2O_Liq=3)\n",
    "display(cpx_ps)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the first five analyses contain no Na$_2$O, and therefore no jadeite component, so they return NaN pressures: this barometer cannot estimate a pressure for these clinopyroxenes."
   ]
  },
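  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To anticipate these NaN pressures, you could flag clinopyroxene analyses with no Na$_2$O before running the barometer. This is a hedged sketch: the zero-Na$_2$O criterion comes from the note above, while the helper name and toy data are hypothetical. After `mm.format_for_thermobar` with the `_Cpx` suffix, the relevant column is assumed to be `Na2O_Cpx`.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def flag_no_jadeite(cpx, na_col='Na2O_Cpx'):\n",
    "    # Without Na2O there is no jadeite component, so the barometer returns NaN.\n",
    "    # Missing Na2O values are treated as zero here.\n",
    "    return cpx[na_col].fillna(0) <= 0\n",
    "\n",
    "\n",
    "cpx_demo = pd.DataFrame({'SiO2_Cpx': [51.2, 50.8, 52.0],\n",
    "                         'Na2O_Cpx': [0.0, 0.35, None]})\n",
    "no_jd = flag_no_jadeite(cpx_demo)\n",
    "print(no_jd.tolist())\n",
    "```"
   ]
  },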
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hopefully, this sort of workflow helps establish confidence in your mineral identifications before you perform thermobarometry!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "science",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}