{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\" Created on August 22, 2025 // Updated on March 20, 2026 // @author: Sarah Shi \"\"\"\n",
    "\n",
    "import os\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "import mineralML as mm\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'png'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "# Helper Functions Quickstart\n\nThis notebook demonstrates the helper functions in mineralML:\n1. Load a CSV with `mm.load_df` (or `pd.read_csv` directly).\n2. Clean and align columns with `mm.prep_df`.\n3. Convert between oxide and elemental wt% with `mm.oxide_to_element` and `mm.element_to_oxide`.\n\nWe import the **mineralML** Python package as `mm`. **mineralML** provides trained machine learning models for classifying minerals, and this workflow is designed to classify and process electron microprobe or quantitative EDS compositions. We remove some degrees of freedom to keep the process as simple as possible. The minerals considered for this study include: Amphibole, Apatite, Biotite, Calcite, Chlorite, Epidote, Feldspar (Alkali Feldspar and Plagioclase), Garnet, Glass, Kalsilite, Leucite, Melilite, Muscovite, Nepheline, Olivine, Oxide (Rhombohedral_Oxides, including Hematite-Ilmenite; Spinel_Group, including Magnetite-Spinel), Pyroxene (Clinopyroxene, Orthopyroxene, Na-Pyroxene), Quartz, Rutile, Serpentine, Titanite, Tourmaline, and Zircon.\n\nYou need one CSV file containing your electron microprobe analyses as oxide weight percentages. Find an example [here](https://github.com/sarahshi/mineralML/blob/main/docs/examples/training_hundred.csv). The required oxides are SiO$_2$, TiO$_2$, Al$_2$O$_3$, FeO$_t$, MnO, MgO, CaO, Na$_2$O, K$_2$O, Cr$_2$O$_3$, P$_2$O$_5$, and ZrO$_2$ (needed only if you aim to classify zircon). For oxides that were not analyzed in a given mineral, preprocessing fills the NaN values with 0."
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Load and prepare data for analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the example mineral dataset (training_hundred.csv) into a dataframe.\n",
    "\n",
    "df_load = mm.load_df('TabularData/training_hundred.csv')\n",
    "display(df_load.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Clean and align columns of dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare the dataframe: fill unanalyzed oxides with 0 and, if drop_empty_rows=True, remove rows with too few analyzed oxides.\n",
    "\n",
    "df_nn = mm.prep_df(df_load, # dataframe to prepare\n",
    "                   renormalize=False, # optionally renormalize rows to sum to 100 wt%\n",
    "                   convert_fe=False, # optionally convert Fe reported in other forms (e.g., FeO, Fe2O3) to total FeOt\n",
    "                   drop_empty_rows=False, # optionally drop rows with fewer than min_oxide_count analyzed (non-NaN) oxides\n",
    "                   min_oxide_count=2, # minimum number of oxides in a row to keep that analysis\n",
    "                   verbose=True\n",
    "                   )\n",
    "display(df_nn.head())"
   ]
  },
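  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The cell below is a minimal plain-pandas sketch of the kind of cleaning described for `mm.prep_df`. It is for illustration only and is **not** mineralML's implementation: the helper names `sketch_prep` and `to_feot`, the toy data, the `FeOt` column name, and the Fe conversion are assumptions. It aligns columns to the oxide list above, optionally drops analyses with fewer than `min_oxide_count` measured oxides, and fills the remaining NaNs with 0. `to_feot` shows the standard stoichiometric way to express Fe$_2$O$_3$ as FeO-equivalent, FeO$_t$ = FeO + 0.8998 × Fe$_2$O$_3$, where 0.8998 = 2 × 71.844 / 159.687 (the molar masses of FeO and Fe$_2$O$_3$). Unlike `mm.prep_df`, this sketch also discards non-oxide columns such as sample labels."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch only: hypothetical helpers, not mineralML's code.\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Expected oxide columns (column names assumed; ZrO2 matters only for zircon).\n",
    "OXIDES = ['SiO2', 'TiO2', 'Al2O3', 'FeOt', 'MnO', 'MgO', 'CaO',\n",
    "          'Na2O', 'K2O', 'Cr2O3', 'P2O5', 'ZrO2']\n",
    "\n",
    "# 2 * M(FeO) / M(Fe2O3), approximately 0.8998\n",
    "FE2O3_TO_FEO = 2 * 71.844 / 159.687\n",
    "\n",
    "def to_feot(feo, fe2o3):\n",
    "    # Total Fe expressed as FeO: FeOt = FeO + 0.8998 * Fe2O3\n",
    "    return feo + FE2O3_TO_FEO * fe2o3\n",
    "\n",
    "def sketch_prep(df, drop_empty_rows=False, min_oxide_count=2):\n",
    "    # Keep only the expected oxide columns (non-oxide columns are dropped),\n",
    "    # adding any missing oxide columns as NaN.\n",
    "    out = df.reindex(columns=OXIDES)\n",
    "    if drop_empty_rows:\n",
    "        # Keep analyses with at least min_oxide_count measured oxides.\n",
    "        out = out[out.notna().sum(axis=1) >= min_oxide_count]\n",
    "    # Oxides that were not analyzed are treated as 0 wt%.\n",
    "    return out.fillna(0)\n",
    "\n",
    "# Toy analyses (wt%), made up for illustration.\n",
    "toy = pd.DataFrame({'SiO2': [49.9, 40.1, np.nan],\n",
    "                    'MgO': [8.2, 48.7, np.nan],\n",
    "                    'FeOt': [9.5, 10.8, np.nan],\n",
    "                    'CaO': [11.3, np.nan, 55.0],\n",
    "                    'Label': ['a', 'b', 'c']})\n",
    "\n",
    "# The third row has only one measured oxide, so it is dropped.\n",
    "display(sketch_prep(toy, drop_empty_rows=True, min_oxide_count=2))\n",
    "print(round(to_feot(8.0, 2.0), 2))  # 8.0 + 0.8998 * 2.0, about 9.80"
   ]
  },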
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## 3. Convert between oxide and elemental wt%\n\nThese functions convert compositions between oxide and elemental form. Use `mm.oxide_to_element` to go from oxide wt% to elemental wt%, and `mm.element_to_oxide` to go the other direction. Each returns the converted dataframe together with the conversion factors it applied."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_nn_elemental, factors_ox2el = mm.oxide_to_element(df_nn)\n",
    "display(df_nn_elemental)\n",
    "display(factors_ox2el)"
   ]
  },
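  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "As a hand check on `factors_ox2el`, the sketch below computes oxide-to-element factors from standard atomic weights. For an oxide El$_n$O$_m$, element wt% = oxide wt% × $n A_{El} / (n A_{El} + m A_O)$, where $A$ is the atomic weight; the element-to-oxide factor is the reciprocal. For example, for SiO$_2$ the factor is 28.085 / (28.085 + 2 × 15.999), about 0.4674. The dictionaries, the rounded atomic weights, and the treatment of FeO$_t$ as FeO are assumptions, so the values may differ from mineralML's in the last decimal places."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Standalone sketch: oxide -> element wt% factors from atomic weights (g/mol).\n",
    "# Rounded standard atomic weights; mineralML may use slightly different values.\n",
    "ATOMIC_WEIGHT = {'Si': 28.085, 'Ti': 47.867, 'Al': 26.982, 'Fe': 55.845,\n",
    "                 'Mn': 54.938, 'Mg': 24.305, 'Ca': 40.078, 'Na': 22.990,\n",
    "                 'K': 39.098, 'Cr': 51.996, 'P': 30.974, 'Zr': 91.224,\n",
    "                 'O': 15.999}\n",
    "\n",
    "# (element, cations n, oxygens m) for each oxide El_n O_m; FeOt treated as FeO.\n",
    "OXIDE_FORMULA = {'SiO2': ('Si', 1, 2), 'TiO2': ('Ti', 1, 2), 'Al2O3': ('Al', 2, 3),\n",
    "                 'FeOt': ('Fe', 1, 1), 'MnO': ('Mn', 1, 1), 'MgO': ('Mg', 1, 1),\n",
    "                 'CaO': ('Ca', 1, 1), 'Na2O': ('Na', 2, 1), 'K2O': ('K', 2, 1),\n",
    "                 'Cr2O3': ('Cr', 2, 3), 'P2O5': ('P', 2, 5), 'ZrO2': ('Zr', 1, 2)}\n",
    "\n",
    "def ox_to_el_factor(oxide):\n",
    "    # Mass fraction of the cation in one formula unit of the oxide.\n",
    "    el, n, m = OXIDE_FORMULA[oxide]\n",
    "    cation_mass = n * ATOMIC_WEIGHT[el]\n",
    "    return cation_mass / (cation_mass + m * ATOMIC_WEIGHT['O'])\n",
    "\n",
    "for ox in OXIDE_FORMULA:\n",
    "    f = ox_to_el_factor(ox)\n",
    "    # The element -> oxide factor is the reciprocal.\n",
    "    print(f'{ox:>6}: ox->el {f:.4f}   el->ox {1 / f:.4f}')"
   ]
  },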
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_nn_oxide, factors_el2ox = mm.element_to_oxide(df_nn_elemental)\n",
    "display(df_nn_oxide)\n",
    "display(factors_el2ox)"
   ]
  },
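  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "When comparing compositional tables after a round trip, a numerical comparison with a tolerance is more informative than a test for exact equality: multiplying by a factor and then dividing by it need not return bit-identical floating-point values. The standalone sketch below shows the idea on a made-up two-oxide table with factors computed from atomic weights; the data and factors are illustrative assumptions, not mineralML output. The same `np.allclose` comparison can be applied to the oxide columns that `df_nn` and `df_nn_oxide` share."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Standalone sketch: check an oxide -> element -> oxide round trip numerically.\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up oxide analyses (wt%), not from training_hundred.csv.\n",
    "ox = pd.DataFrame({'SiO2': [49.9, 40.1], 'MgO': [8.2, 48.7]})\n",
    "\n",
    "# Oxide -> element factors from atomic weights (Si 28.085, Mg 24.305, O 15.999).\n",
    "factors = pd.Series({'SiO2': 28.085 / (28.085 + 2 * 15.999),\n",
    "                     'MgO': 24.305 / (24.305 + 15.999)})\n",
    "\n",
    "el = ox * factors    # oxide wt% -> element wt%, aligned by column name\n",
    "back = el / factors  # element wt% -> oxide wt%\n",
    "\n",
    "# Report the largest deviation, then compare within floating-point tolerance.\n",
    "print('max abs difference:', (back - ox).abs().max().max())\n",
    "print('allclose:', np.allclose(back.to_numpy(), ox.to_numpy()))"
   ]
  },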
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "Compare the `df_nn_oxide` and `df_nn` dataframes. Does the round trip (oxide to element and back to oxide) recover the original data?"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "science",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}