{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\" Created on March 28, 2026 // @author: Sarah Shi \"\"\"\n",
    "\n",
    "import os\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "import mineralML as mm\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'png'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# mineralML Quickstart for Mapped EBSD and/or EDS Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook shows **how to load and run your quantitative EDS and EBSD data through mineralML** with an example Mount Hood andesite: `MH0811b`. These data were collected for the mineralML manuscript (find the preprint here: https://doi.org/10.31223/X53J2M) and are available in the GitHub repository (https://github.com/sarahshi/mineralML/tree/main/docs/examples/Maps/MountHood_MH0811b). Please refer to the paper for more information about this sample. The previous example workbook highlights how EDS maps are processed. We will apply the same procedure here, and additionally highlight how mineralML processes EBSD CTF files. \n",
    "\n",
    "This is a seven-step process: \n",
    "1. Load and plot a phase map directly from an EBSD CTF file. \n",
    "2. Load a directory containing all your CSVs of mapped chemical data with `mm.load_df` (or `pd.read_csv` directly). [Optional] Convert the input from element to oxide wt%.\n",
    "3. Predict the mineral class with mineralML and automatically plot the mineral phase map, mineral phase counts, and prediction score histograms.\n",
    "4. Plot EBSD and mineralML-generated EDS maps side-by-side. \n",
    "5. Plot prediction score map and individual oxide maps.\n",
    "6. Plot compositions of mapped minerals in various classification diagrams (ternary, quadrilateral).\n",
    "7. Plot chemical variation maps.\n",
    "\n",
    "I have conveniently (I hope!) wrapped all of these bits into two functions: `mm.plot_ctf_phases` for the EBSD phase map and `mm.run_map` for the EDS workflow.\n",
    "\n",
    "We loaded in the **mineralML** Python package as `mm`. **mineralML** has trained machine learning models for classifying minerals. This implementation aims to get your electron microprobe or quantitative EDS compositions classified and processed. We remove some degrees of freedom to simplify the process as much as possible. The minerals considered for this study include: Amphibole, Apatite, Biotite, Calcite, Chlorite, Epidote, Feldspar (Alkali Feldspar and Plagioclase), Garnet, Glass, Kalsilite, Leucite, Melilite, Muscovite, Nepheline, Olivine, Oxide (Rhombohedral_Oxides including Hematite-Ilmenite, Spinel_Group including Magnetite-Spinel), Pyroxene (Clinopyroxene, Orthopyroxene, Na-Pyroxene), Quartz, Rutile, Serpentine, Titanite, Tourmaline, and Zircon. \n",
    "\n",
    "CSV files containing your mapped data in oxide weight percent (or in element weight percent, which can be converted) are required. Find an example [here](https://github.com/sarahshi/mineralML/tree/main/docs/examples/Maps/MountHood_MH0811b). The necessary oxides are SiO$_2$, TiO$_2$, Al$_2$O$_3$, FeO$_t$, MnO, MgO, CaO, Na$_2$O, K$_2$O, Cr$_2$O$_3$, P$_2$O$_5$, and ZrO$_2$ (if you are aiming to classify zircon). For oxides not analyzed for specific minerals, the preprocessing fills the NaN values with 0."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Load and plot a phase map from an EBSD CTF file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# --- MH0811b Configs ---\n",
    "mh_file_path = \"Maps/MH0811b_EBSD_EDS.ctf\" # path to the CTF file exported from AZtec\n",
    "\n",
    "# Merge more verbose EBSD phase names into broader mineral groups, matching with mineralML naming convention\n",
    "mh_merge_rules = {\n",
    "    \"Andesine\": \"Plagioclase\", \"Orthoclase\": \"Alkali_Feldspar\", # feldspar endmembers to group names\n",
    "    \"Augite\": \"Clinopyroxene\", \"Enstatite\": \"Orthopyroxene\", # pyroxene endmembers to group names\n",
    "    \"Magnetite\": \"Oxide\", \"Ilmenite\": \"Oxide\", # Fe-Ti oxides to single oxide group\n",
    "    \"Quartz-new\": \"SiO2_Polymorph\", \"Cristobalite\": \"SiO2_Polymorph\", # silica phases to single group\n",
    "}\n",
    "\n",
    "# Pin each mineral group to a consistent color across figures\n",
    "mh_base_cols = {\n",
    "    \"Plagioclase\": \"#66C4C4\", \"Alkali_Feldspar\": \"#FEF7C2\", \"Feldspar_Miscibility_Gap\": \"#003D36\",\n",
    "    \"Clinopyroxene\": \"#E57A7A\", \"Orthopyroxene\": \"#931D1D\",\n",
    "    \"Oxide\": \"#2E2DCE\", \"Glass\": \"#F9C300\",\n",
    "    \"Apatite\": \"#5B6768\", \"SiO2_Polymorph\": \"#CEC6CD\",\n",
    "    \"Unindexed\": \"#FFFFFF\" # white background for unindexed pixels\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot the EBSD phase map from the CTF file. This parses the CTF header for\n",
    "# grid dimensions and phase definitions, maps each pixel's phase ID to its\n",
    "# name, applies rename_dict for partial case-insensitive matching, and plots\n",
    "# a 2D categorical phase map with legend ordered by abundance.\n",
    "\n",
    "mh_ebsd_fig, mh_ebsd_phase_map, _, _, _ = mm.plot_ctf_phases(mh_file_path, # load and plot the CTF phase map\n",
    "                                                             rename_dict=mh_merge_rules, # apply the merge rules defined above\n",
    "                                                             phase_colors=mh_base_cols, # apply the color scheme defined above\n",
    "                                                             title=None, # suppress the auto-generated title\n",
    "                                                             scalebar_um=100, # 100 µm scale bar, computed from CTF step size\n",
    "                                                             )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot a stacked horizontal bar showing area proportions of each phase,\n",
    "# normalized to classified pixels only. Phases below min_frac are excluded,\n",
    "# and each segment is annotated with its percentage by default.\n",
    "\n",
    "fig = mm.plot_phase_proportions(mh_ebsd_phase_map, # input the EBSD phase map to compute proportions\n",
    "                                title=\"MH0811b EBSD Phase Proportions\", # provide a title\n",
    "                                min_frac=0.0001, # set a minimum fraction threshold to exclude very rare phases\n",
    "                                phase_colors=mh_base_cols, # apply the same color scheme as the phase map\n",
    "                                annotate=True, # annotate each segment with its percentage\n",
    "                                )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The remaining steps follow the same procedure detailed in the first mapping `.ipynb` on Read the Docs. We will go through it again here for reference, so that we can make side-by-side figures."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Load and prepare EDS data for analysis\n",
    "\n",
    "Here, we will work with EDS data that are in elemental weight percent. This means that we will have to do a conversion to oxide weight percent. The data directory is `Maps/MountHood_MH0811b`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Find your directory of mapped mineral data, stored in Maps/MountHood_MH0811b. \n",
    "# This code walks through Maps and appends every directory containing a CSV file to map_dirs. \n",
    "\n",
    "base = \"Maps\"\n",
    "map_dirs = []\n",
    "for root, subdirs, files in os.walk(base):\n",
    "    # Skip any path that includes 'Ignore' in its folder names\n",
    "    if \"Ignore\" in root.split(os.sep):\n",
    "        continue\n",
    "    \n",
    "    if any(f.lower().endswith(\".csv\") for f in files):\n",
    "        map_dirs.append(root)\n",
    "\n",
    "print(map_dirs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Apply the trained neural network with mm.run_map\n",
    "\n",
    "We will use `mm.run_map`, which returns everything you need!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect the inputs and outputs of mm.run_map\n",
    "help(mm.run_map)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here is our all-in-one function! Read the inputs and outputs provided above. \n",
    " \n",
    "output = mm.run_map(next((s for s in map_dirs if 'MountHood_MH0811b' in s), None), # provide the directory of interest\n",
    "                    renormalize=True, # optionally renormalize totals to 100 wt%\n",
    "                    total_threshold=None, # optionally filter out total values below a given value, for when EDS picks up epoxy pixels\n",
    "                    pred_score_threshold=0.6, # provide a prediction score threshold. Here, I only want values with >= 0.6 prediction score\n",
    "                    min_frac=0.001, # provide a minimum pixel fraction for the phase to be displayed\n",
    "                    units='element_wt%', # provide the unit. can choose 'element_wt%' or 'oxide_wt%'\n",
    "                    phases=mh_base_cols.keys(), # phases of interest\n",
    "                    scalebar_um=100, # define size of scalebar desired, in microns\n",
    "                    pixel_size_um=2.0, # define the size of each pixel, in microns \n",
    "                    scalebar_loc='lower left', # specify location for scalebar\n",
    "                    scalebar_col='black', # specify color for scalebar\n",
    "                    phase_colors=mh_base_cols, # provide a color scheme for the phases, as a dictionary mapping phase names to color codes\n",
    "                    )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot a stacked horizontal bar showing area proportions of each phase,\n",
    "# normalized to classified pixels only. Phases below min_frac are excluded,\n",
    "# and each segment is annotated with its percentage by default.\n",
    "\n",
    "fig = mm.plot_phase_proportions(output['mineral_map'], # input the EDS phase map to compute proportions\n",
    "                                title=\"MH0811b EDS Phase Proportions\", # provide a title\n",
    "                                phases=mh_base_cols.keys(), # specify the phases to include\n",
    "                                min_frac=0.0001, # set a minimum fraction threshold to exclude very rare phases\n",
    "                                phase_colors=mh_base_cols, # apply the same color scheme as the phase map\n",
    "                                annotate=True, # annotate each segment with its percentage\n",
    "                                )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect what is in the outputs\n",
    "output.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's say you now want to work with these data in dataframe form rather than dictionary form. How would you do this? Access `output['df_pred']` to retrieve the predictions dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pull the dataframe of predictions\n",
    "df_pred = output['df_pred']\n",
    "display(df_pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Plot EBSD and mineralML-generated EDS phase maps side-by-side\n",
    "\n",
    "Let's plot the two phase maps side by side! "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# EBSD phase map \n",
    "mh_ebsd_fig"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# EDS phase map\n",
    "output['figs'][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now examine the phase maps produced by the two methods and compare their performance. See the preprint for a more detailed description of these maps. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Plot oxide concentration maps and prediction score maps\n",
    "\n",
    "Let's plot the original oxide maps loaded from the directory. We can examine how well the predicted phase map matches some of the observations made in oxide space. We have a handy function for doing so, with `mm.plot_oxide_map`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = mm.plot_oxide_map(\n",
    "    output, # take the output from run_map\n",
    "    oxide_name='SiO2', # specify the oxide of interest\n",
    "    scalebar_um=50, # define size of scalebar desired, in microns\n",
    "    pixel_size_um=2, # define size of each pixel of scalebar, in microns \n",
    "    scalebar_loc='upper right', # specify location for scalebar\n",
    "    scalebar_col='black', # specify color for scalebar\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's plot the prediction scores from the output, in mapped form. This allows for further investigation to determine where predictions are more and less certain. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = mm.plot_score_map(\n",
    "    output, # take the output from run_map\n",
    "    scalebar_um=50, # define size of scalebar desired, in microns\n",
    "    pixel_size_um=2, # define size of each pixel of scalebar, in microns \n",
    "    scalebar_loc='upper right', # specify location for scalebar\n",
    "    scalebar_col='black', # specify color for scalebar\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Plot compositions of mapped minerals in various classification diagrams (ternary, quadrilateral)\n",
    "\n",
    "We can do some more with mineralML now. Let's plot all the feldspars, pyroxenes, and oxides on their classification diagrams. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Identify the phases present."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here are our feldspars. Only pixels predicted as 'Plagioclase' are selected here. \n",
    "fspars = df_pred[df_pred.Predict_Mineral == 'Plagioclase']\n",
    "display('Feldspars:', fspars)\n",
    "\n",
    "# Here are all our pyroxenes \n",
    "pxs_names = ['Clinopyroxene', 'Orthopyroxene']\n",
    "pxs = df_pred[df_pred.Predict_Mineral.isin(pxs_names)]\n",
    "display('Pyroxenes:', pxs)\n",
    "\n",
    "# Here are all our oxides \n",
    "ox_names = ['Oxide']\n",
    "oxs = df_pred[df_pred.Predict_Mineral.isin(ox_names)]\n",
    "display('Oxides:', oxs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plot these feldspars, pyroxenes, and oxides! "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use FeldsparClassifier to examine the component space (XAn, XAb, XOr)\n",
    "fspar_comp = mm.FeldsparClassifier(fspars).calculate_components()\n",
    "display(fspar_comp)\n",
    "\n",
    "# Use FeldsparClassifier to plot up these data. \n",
    "fig = mm.FeldsparClassifier(fspars).plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use PyroxeneClassifier to examine the component space (En, Wo, Fs). If sodic pyroxenes are also in the input, they are plotted in the sodic pyroxene ternary\n",
    "pxs_comp = mm.PyroxeneClassifier(pxs).calculate_components()\n",
    "display(pxs_comp)\n",
    "\n",
    "# Use PyroxeneClassifier to plot up these data. \n",
    "fig = mm.PyroxeneClassifier(pxs).plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use OxideClassifier to examine the component space.\n",
    "oxs_comp = mm.OxideClassifier(oxs).calculate_components()\n",
    "display(oxs_comp)\n",
    "\n",
    "# Use OxideClassifier to plot up these data. \n",
    "fig = mm.OxideClassifier(oxs).plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You might note that the structure of these three `...Classifier` classes is identical. That is intentional! `mm.FeldsparClassifier`, `mm.PyroxeneClassifier`, and `mm.OxideClassifier` all have `calculate_components` and `plot` methods embedded."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Plot chemical variation maps\n",
    "\n",
    "We know the mineralogy now. What if you want to inspect the chemical variation within individual crystals? Pull the component maps created for each sample and plot them with `mm.plot_component_composite`.\n",
    "\n",
    "This function currently does this calculation for feldspars, pyroxenes, olivines, and amphiboles. This can easily be expanded with all the stoichiometric mineral functions. Here, I will just show this for these common igneous phases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect what’s available:\n",
    "print(sorted(output[\"component_maps\"].keys()))\n",
    "\n",
    "# Plot map highlighting internal compositional variation\n",
    "fig = mm.plot_component_composite(output, # specify output from above\n",
    "                                  title=\"MH0811b\", # optionally add a title to this plot\n",
    "                                  phases=mh_base_cols.keys(), # phases of interest\n",
    "                                  phase_colors=mh_base_cols, # provide a color scheme for the phases, as a dictionary mapping phase names to color codes\n",
    "                                  smooth_sigma=0.25, # Gaussian blur sigma for smoothing compositional data, usually turned off\n",
    "                                  scalebar_um=100, # define size of scalebar desired, in microns\n",
    "                                  pixel_size_um=2.0, # define the size of each pixel, in microns \n",
    "                                  scalebar_loc='lower left', # specify location for scalebar\n",
    "                                  scalebar_col='black', # specify color for scalebar\n",
    "                                  )\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One could alternatively use all the functions within `mineralML.mapping` to do these same things, in a more stepwise manner. Look through the documentation if you would like to use individual bits of this code."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "science",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}