This page was generated from
docs/examples/mineralML_helpers.ipynb.
[1]:
""" Created on August 22, 2025 // Updated on March 20, 2026 // @author: Sarah Shi """
import os
import numpy as np
import pandas as pd
import mineralML as mm
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'png'
Helper Functions Quickstart
This notebook demonstrates how to use helper functions defined in mineralML:
- Load a CSV with mm.load_df (or pd.read_csv directly).
- Clean and align columns with mm.prep_df.
- Convert between oxide and elemental wt% with mm.oxide_to_element / mm.element_to_oxide.
The mineralML Python package is imported above as mm. mineralML provides trained machine learning models for classifying minerals from electron microprobe or quantitative EDS compositions, and this implementation aims to get those compositions classified and processed. We remove some degrees of freedom to keep the process as simple as possible. The minerals considered for this study include: Amphibole, Apatite, Biotite, Calcite, Chlorite, Epidote, Feldspar (Alkali Feldspar and Plagioclase), Garnet, Glass, Kalsilite, Leucite, Melilite, Muscovite, Nepheline, Olivine, Oxide (Rhombohedral_Oxides including Hematite-Ilmenite, Spinel_Group including Magnetite-Spinel), Pyroxene (Clinopyroxene, Orthopyroxene, Na-Pyroxene), Quartz, Rutile, Serpentine, Titanite, Tourmaline, and Zircon.
You need one CSV file containing your electron microprobe analyses in oxide weight percent. Find an example here. The required oxides are SiO\(_2\), TiO\(_2\), Al\(_2\)O\(_3\), FeO\(_t\), MnO, MgO, CaO, Na\(_2\)O, K\(_2\)O, Cr\(_2\)O\(_3\), P\(_2\)O\(_5\), and ZrO\(_2\) (only if you are aiming to classify zircon). Where an oxide was not analyzed for a given mineral, the preprocessing fills the resulting NaN values with 0.
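As a rough illustration of the expected input, the sketch below reads a small in-memory CSV with pandas and checks it against the oxide list above. `REQUIRED_OXIDES` and `check_oxide_columns` are hypothetical names introduced here and are not part of mineralML; the two rows are illustrative, and the fill step only mimics the behavior described in the text rather than reproducing the package's own preprocessing.

```python
import io

import pandas as pd

# Oxide columns named in the text above; ZrO2 only matters for zircon.
REQUIRED_OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOt", "MnO", "MgO",
                   "CaO", "Na2O", "K2O", "Cr2O3", "P2O5", "ZrO2"]


def check_oxide_columns(df):
    """Return the required oxide columns absent from df (hypothetical helper)."""
    return [ox for ox in REQUIRED_OXIDES if ox not in df.columns]


# Two illustrative rows; P2O5 and ZrO2 were not analyzed, so they are absent.
csv_text = """Sample Name,SiO2,TiO2,Al2O3,FeOt,MnO,MgO,CaO,Na2O,K2O,Cr2O3
Z2099,42.96,1.80,14.33,4.07,0.07,17.39,12.03,3.10,0.03,0.65
Z2070,43.03,2.39,13.35,4.09,0.06,17.01,11.71,2.97,0.05,0.74
"""
df_example = pd.read_csv(io.StringIO(csv_text))

missing = check_oxide_columns(df_example)
print("Oxides not analyzed:", missing)

# Add the missing oxides as NaN columns, then fill them with 0.
df_filled = df_example.reindex(columns=["Sample Name"] + REQUIRED_OXIDES).fillna(0)
print(df_filled)
```

Checking the column names before calling any mineralML function makes it easier to catch a misspelled header (for example `FeO` instead of `FeOt`) before it is silently treated as a missing oxide.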
1. Load and prepare data for analysis
[2]:
# Read in your dataframe of mineral data, called training_hundred.csv.
df_load = mm.load_df('TabularData/training_hundred.csv')
display(df_load.head())
| | Sample Name | SiO2 | TiO2 | Al2O3 | FeOt | MnO | MgO | CaO | Na2O | K2O | P2O5 | Cr2O3 | ZrO2 | Mineral | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Z2099 | 42.96 | 1.80 | 14.33 | 4.07 | 0.07 | 17.39 | 12.03 | 3.10 | 0.03 | NaN | 0.65 | NaN | Amphibole | Vannuccietal1995 |
| 1 | Z2070 | 43.03 | 2.39 | 13.35 | 4.09 | 0.06 | 17.01 | 11.71 | 2.97 | 0.05 | NaN | 0.74 | NaN | Amphibole | Vannuccietal1995 |
| 2 | Z2073 | 42.95 | 3.02 | 14.12 | 4.35 | 0.06 | 17.53 | 12.02 | 3.04 | 0.07 | NaN | 0.76 | NaN | Amphibole | Vannuccietal1995 |
| 3 | Z2067 | 43.01 | 4.65 | 12.83 | 4.39 | 0.07 | 17.14 | 12.14 | 2.88 | 0.03 | NaN | 0.84 | NaN | Amphibole | Vannuccietal1995 |
| 4 | Z2068 | 42.13 | 4.87 | 12.15 | 4.08 | 0.05 | 16.42 | 11.89 | 2.75 | 0.02 | NaN | 1.26 | NaN | Amphibole | Vannuccietal1995 |
2. Clean and align columns of dataframe
[3]:
# Prepare the dataframe: align the oxide columns and fill missing values with zeros.
# Rows with too few oxides are dropped only when drop_empty_rows=True.
df_nn = mm.prep_df(
    df_load,                # dataframe to prepare
    renormalize=False,      # optionally renormalize rows to sum to 100 wt%
    convert_fe=False,       # optionally convert Fe reported in other forms to FeOt
    drop_empty_rows=False,  # optionally drop rows with fewer than min_oxide_count oxides
    min_oxide_count=2,      # minimum number of oxides a row needs to be kept
    verbose=True,           # print a summary of rows processed and dropped
)
display(df_nn.head())
prep_df: 2800 row(s) processed (of 2800 input, 0 dropped).
| | Sample Name | SiO2 | TiO2 | Al2O3 | FeOt | MnO | MgO | CaO | Na2O | K2O | Cr2O3 | P2O5 | ZrO2 | Mineral | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Z2099 | 42.96 | 1.80 | 14.33 | 4.07 | 0.07 | 17.39 | 12.03 | 3.10 | 0.03 | 0.65 | 0.0 | 0.0 | Amphibole | Vannuccietal1995 |
| 1 | Z2070 | 43.03 | 2.39 | 13.35 | 4.09 | 0.06 | 17.01 | 11.71 | 2.97 | 0.05 | 0.74 | 0.0 | 0.0 | Amphibole | Vannuccietal1995 |
| 2 | Z2073 | 42.95 | 3.02 | 14.12 | 4.35 | 0.06 | 17.53 | 12.02 | 3.04 | 0.07 | 0.76 | 0.0 | 0.0 | Amphibole | Vannuccietal1995 |
| 3 | Z2067 | 43.01 | 4.65 | 12.83 | 4.39 | 0.07 | 17.14 | 12.14 | 2.88 | 0.03 | 0.84 | 0.0 | 0.0 | Amphibole | Vannuccietal1995 |
| 4 | Z2068 | 42.13 | 4.87 | 12.15 | 4.08 | 0.05 | 16.42 | 11.89 | 2.75 | 0.02 | 1.26 | 0.0 | 0.0 | Amphibole | Vannuccietal1995 |
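The `renormalize` and `convert_fe` options are left off in the cell above. The sketch below shows one plausible reading of what they do, assuming renormalization means scaling each row so its oxides sum to 100 wt% and that iron conversion means expressing Fe2O3 as its FeO mass equivalent. `renormalize_rows` and `total_iron_as_feo` are hypothetical helpers written for this illustration; the actual mineralML implementation may differ.

```python
import pandas as pd

OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOt", "MnO", "MgO",
          "CaO", "Na2O", "K2O", "Cr2O3", "P2O5", "ZrO2"]

# Rounded standard atomic masses (g/mol).
M_FE, M_O = 55.845, 15.999

# Mass of FeO equivalent to one unit mass of Fe2O3: 2 * M(FeO) / M(Fe2O3).
FE2O3_TO_FEO = 2 * (M_FE + M_O) / (2 * M_FE + 3 * M_O)


def total_iron_as_feo(feo, fe2o3):
    """Express separately reported FeO and Fe2O3 (wt%) as total iron, FeOt."""
    return feo + FE2O3_TO_FEO * fe2o3


def renormalize_rows(df, oxides=OXIDES):
    """Scale the oxide columns of each row so that they sum to 100 wt%."""
    out = df.copy()
    totals = out[oxides].sum(axis=1)
    out[oxides] = out[oxides].div(totals, axis=0) * 100
    return out


# First row of the prepared table above.
row = pd.DataFrame([{"SiO2": 42.96, "TiO2": 1.80, "Al2O3": 14.33, "FeOt": 4.07,
                     "MnO": 0.07, "MgO": 17.39, "CaO": 12.03, "Na2O": 3.10,
                     "K2O": 0.03, "Cr2O3": 0.65, "P2O5": 0.0, "ZrO2": 0.0}])

total = row[OXIDES].sum(axis=1).iloc[0]
renormalized = renormalize_rows(row)
print("Analytical total (wt%):", round(total, 2))
print(renormalized.round(2))
```

Renormalizing hides low analytical totals, which are expected for hydrous minerals such as amphibole, so it is worth inspecting the raw totals before switching the option on.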
3. Convert between oxide and elemental wt%
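These functions are useful beyond classification whenever you need to convert between oxide and elemental data. Use mm.oxide_to_element to go from oxide wt% to elemental wt%, and mm.element_to_oxide to go the other direction. Each function returns the converted dataframe together with the conversion factors it applied.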
[4]:
df_nn_elemental, factors_ox2el = mm.oxide_to_element(df_nn)
display(df_nn_elemental)
display(factors_ox2el)
| | Si | Ti | Al | Fe | Mn | Mg | Ca | Na | K | P | Cr | Zr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20.081004 | 1.078815 | 7.584172 | 3.163648 | 0.054212 | 10.486794 | 8.597730 | 2.299683 | 0.024904 | 0.000000 | 0.444732 | 0.0 |
| 1 | 20.113725 | 1.432426 | 7.065506 | 3.179195 | 0.046467 | 10.257641 | 8.369029 | 2.203244 | 0.041507 | 0.000000 | 0.506310 | 0.0 |
| 2 | 20.076330 | 1.810011 | 7.473029 | 3.381295 | 0.046467 | 10.571219 | 8.590583 | 2.255173 | 0.058110 | 0.000000 | 0.519994 | 0.0 |
| 3 | 20.104376 | 2.786937 | 6.790295 | 3.412387 | 0.054212 | 10.336035 | 8.676346 | 2.136479 | 0.024904 | 0.000000 | 0.574730 | 0.0 |
| 4 | 19.693033 | 2.918793 | 6.430404 | 3.171421 | 0.038723 | 9.901849 | 8.497673 | 2.040041 | 0.016603 | 0.000000 | 0.862096 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2795 | 23.554046 | 1.024874 | 7.383057 | 9.755230 | 0.170381 | 4.160948 | 8.419057 | 1.787818 | 0.273949 | 0.065464 | 0.000000 | 0.0 |
| 2796 | 23.044542 | 0.994907 | 7.404227 | 9.281071 | 0.162636 | 4.317737 | 8.604877 | 1.602360 | 0.265647 | 0.065464 | 0.000000 | 0.0 |
| 2797 | 23.245539 | 0.964940 | 7.213696 | 9.444306 | 0.162636 | 4.299646 | 8.504820 | 1.706216 | 0.273949 | 0.069828 | 0.000000 | 0.0 |
| 2798 | 23.161400 | 0.928979 | 7.446567 | 9.249979 | 0.162636 | 4.293616 | 8.540555 | 1.713634 | 0.257346 | 0.074192 | 0.000000 | 0.0 |
| 2799 | 23.217493 | 1.000900 | 7.123724 | 9.249979 | 0.162636 | 4.311707 | 8.483379 | 1.735889 | 0.240743 | 0.069828 | 0.000000 | 0.0 |
2800 rows × 12 columns
SiO2 0.467435
TiO2 0.599341
Al2O3 0.529251
FeOt 0.777309
MnO 0.774457
MgO 0.603036
CaO 0.714691
Na2O 0.741833
K2O 0.830148
P2O5 0.436426
Cr2O3 0.684203
ZrO2 0.740346
dtype: float64
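The factors printed above are consistent with the mass fraction of the cation in each oxide, that is, (number of cations × cation atomic mass) / oxide molar mass. The sketch below recomputes them from rounded standard atomic masses as a sanity check. The dictionaries and the helper `oxide_to_element_factor` are written for this illustration and are not taken from mineralML, so small differences in the last digits are expected.

```python
# Rounded standard atomic masses (g/mol).
ATOMIC_MASS = {
    "O": 15.999, "Si": 28.0855, "Ti": 47.867, "Al": 26.9815, "Fe": 55.845,
    "Mn": 54.938, "Mg": 24.305, "Ca": 40.078, "Na": 22.98977, "K": 39.0983,
    "P": 30.97376, "Cr": 51.9961, "Zr": 91.224,
}

# Oxide label -> (cation, cations per formula unit, oxygens per formula unit).
OXIDE_FORMULA = {
    "SiO2": ("Si", 1, 2), "TiO2": ("Ti", 1, 2), "Al2O3": ("Al", 2, 3),
    "FeOt": ("Fe", 1, 1), "MnO": ("Mn", 1, 1), "MgO": ("Mg", 1, 1),
    "CaO": ("Ca", 1, 1), "Na2O": ("Na", 2, 1), "K2O": ("K", 2, 1),
    "P2O5": ("P", 2, 5), "Cr2O3": ("Cr", 2, 3), "ZrO2": ("Zr", 1, 2),
}


def oxide_to_element_factor(oxide):
    """Mass fraction of the cation in the oxide (hypothetical helper)."""
    cation, n_cation, n_oxygen = OXIDE_FORMULA[oxide]
    cation_mass = n_cation * ATOMIC_MASS[cation]
    return cation_mass / (cation_mass + n_oxygen * ATOMIC_MASS["O"])


factors = {oxide: oxide_to_element_factor(oxide) for oxide in OXIDE_FORMULA}

for oxide, factor in factors.items():
    # The element-to-oxide factor is simply the reciprocal.
    print(f"{oxide:6s} oxide->element {factor:.4f}   element->oxide {1 / factor:.4f}")
```

Because the reverse factor is the reciprocal, multiplying an oxide value by its factor and then by the inverse should return the original value up to floating-point rounding.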
[5]:
df_nn_oxide, factors_el2ox = mm.element_to_oxide(df_nn_elemental)
display(df_nn_oxide)
display(factors_el2ox)
| | SiO2 | TiO2 | Al2O3 | FeOt | MnO | MgO | CaO | Na2O | K2O | P2O5 | Cr2O3 | ZrO2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42.96 | 1.80 | 14.33 | 4.07 | 0.07 | 17.39 | 12.03 | 3.10 | 0.03 | 0.00 | 0.65 | 0.0 |
| 1 | 43.03 | 2.39 | 13.35 | 4.09 | 0.06 | 17.01 | 11.71 | 2.97 | 0.05 | 0.00 | 0.74 | 0.0 |
| 2 | 42.95 | 3.02 | 14.12 | 4.35 | 0.06 | 17.53 | 12.02 | 3.04 | 0.07 | 0.00 | 0.76 | 0.0 |
| 3 | 43.01 | 4.65 | 12.83 | 4.39 | 0.07 | 17.14 | 12.14 | 2.88 | 0.03 | 0.00 | 0.84 | 0.0 |
| 4 | 42.13 | 4.87 | 12.15 | 4.08 | 0.05 | 16.42 | 11.89 | 2.75 | 0.02 | 0.00 | 1.26 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2795 | 50.39 | 1.71 | 13.95 | 12.55 | 0.22 | 6.90 | 11.78 | 2.41 | 0.33 | 0.15 | 0.00 | 0.0 |
| 2796 | 49.30 | 1.66 | 13.99 | 11.94 | 0.21 | 7.16 | 12.04 | 2.16 | 0.32 | 0.15 | 0.00 | 0.0 |
| 2797 | 49.73 | 1.61 | 13.63 | 12.15 | 0.21 | 7.13 | 11.90 | 2.30 | 0.33 | 0.16 | 0.00 | 0.0 |
| 2798 | 49.55 | 1.55 | 14.07 | 11.90 | 0.21 | 7.12 | 11.95 | 2.31 | 0.31 | 0.17 | 0.00 | 0.0 |
| 2799 | 49.67 | 1.67 | 13.46 | 11.90 | 0.21 | 7.15 | 11.87 | 2.34 | 0.29 | 0.16 | 0.00 | 0.0 |
2800 rows × 12 columns
SiO2 2.139335
TiO2 1.668498
Al2O3 1.889461
FeOt 1.286489
MnO 1.291226
MgO 1.658276
CaO 1.399207
Na2O 1.348012
K2O 1.204605
P2O5 2.291341
Cr2O3 1.461555
ZrO2 1.350719
dtype: float64
Compare the df_nn_oxide and df_nn dataframes. Are these data the same?
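One way to answer this programmatically is to compare the shared oxide columns by name with a floating-point tolerance, since the two dataframes list Cr2O3 and P2O5 in a different order and the round trip can introduce tiny rounding errors. The helper `frames_match` below is a hypothetical sketch, demonstrated on toy stand-ins so that it runs without mineralML.

```python
import numpy as np
import pandas as pd


def frames_match(original, round_tripped, atol=1e-8):
    """Compare shared columns by name, ignoring column order (hypothetical helper)."""
    shared = [col for col in round_tripped.columns if col in original.columns]
    a = original[shared].to_numpy(dtype=float)
    b = round_tripped[shared].to_numpy(dtype=float)
    return bool(np.allclose(a, b, atol=atol))


# Toy stand-ins for df_nn and df_nn_oxide: same values, different column order,
# with SiO2 passed through a multiply-then-divide round trip.
toy_original = pd.DataFrame({
    "Sample Name": ["Z2099", "Z2070"],
    "SiO2": [42.96, 43.03],
    "Cr2O3": [0.65, 0.74],
    "P2O5": [0.0, 0.0],
})
factor = 0.467435
toy_round_trip = pd.DataFrame({
    "SiO2": toy_original["SiO2"] * factor / factor,
    "P2O5": toy_original["P2O5"],
    "Cr2O3": toy_original["Cr2O3"],
})

same = frames_match(toy_original, toy_round_trip)
print("Round trip matches original:", same)
```

In the notebook, the equivalent call would be `frames_match(df_nn, df_nn_oxide)`; the non-numeric columns of df_nn (Sample Name, Mineral, Source) are skipped because they do not appear in df_nn_oxide.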