This page was generated from docs/examples/mineralML_helpers.ipynb.

[1]:
""" Created on August 22, 2025 // Updated on March 20, 2026 // @author: Sarah Shi """

import os
import numpy as np
import pandas as pd

import mineralML as mm

import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'png'

Helper Functions Quickstart

This notebook demonstrates how to use helper functions defined in mineralML:

  1. Load a CSV with mm.load_df (or pd.read_csv directly).

  2. Clean and align columns with mm.prep_df.

  3. Convert between oxide and elemental wt% with mm.oxide_to_element / mm.element_to_oxide.

The mineralML Python package is imported above as mm. mineralML provides trained machine learning models for classifying minerals, and this implementation aims to get your electron microprobe or quantitative EDS compositions classified and processed. We remove some degrees of freedom to keep the process as simple as possible.

The minerals considered are: Amphibole, Apatite, Biotite, Calcite, Chlorite, Epidote, Feldspar (Alkali Feldspar and Plagioclase), Garnet, Glass, Kalsilite, Leucite, Melilite, Muscovite, Nepheline, Olivine, Oxide (Rhombohedral_Oxides including Hematite-Ilmenite, Spinel_Group including Magnetite-Spinel), Pyroxene (Clinopyroxene, Orthopyroxene, Na-Pyroxene), Quartz, Rutile, Serpentine, Titanite, Tourmaline, and Zircon.

You need one CSV file containing your electron microprobe analyses in oxide weight percent. Find an example here. The required oxides are SiO\(_2\), TiO\(_2\), Al\(_2\)O\(_3\), FeO\(_t\), MnO, MgO, CaO, Na\(_2\)O, K\(_2\)O, Cr\(_2\)O\(_3\), P\(_2\)O\(_5\), and ZrO\(_2\) (the last only if you are aiming to classify zircon). Where an oxide was not analyzed for a given mineral, preprocessing fills the resulting NaN values with 0.
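
Before loading your own file, it can help to check which of the oxide columns listed above are present. The snippet below is a minimal pandas-only sketch, not part of mineralML: the helper name check_oxide_columns is hypothetical, the column names are taken from the example table in this notebook, and the zero fill only imitates the behaviour described above.

```python
import pandas as pd

# Oxide column names as they appear in the example table in this notebook.
REQUIRED_OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOt", "MnO", "MgO",
                   "CaO", "Na2O", "K2O", "Cr2O3", "P2O5", "ZrO2"]


def check_oxide_columns(df):
    """Hypothetical helper: return (present, missing) oxide column names."""
    present = [ox for ox in REQUIRED_OXIDES if ox in df.columns]
    missing = [ox for ox in REQUIRED_OXIDES if ox not in df.columns]
    return present, missing


# Two analyses copied from the example data; P2O5 and ZrO2 are left out
# on purpose to stand in for oxides that were not analyzed.
example = pd.DataFrame({
    "Sample Name": ["Z2099", "Z2070"],
    "SiO2": [42.96, 43.03], "TiO2": [1.80, 2.39], "Al2O3": [14.33, 13.35],
    "FeOt": [4.07, 4.09], "MnO": [0.07, 0.06], "MgO": [17.39, 17.01],
    "CaO": [12.03, 11.71], "Na2O": [3.10, 2.97], "K2O": [0.03, 0.05],
    "Cr2O3": [0.65, 0.74],
})

present, missing = check_oxide_columns(example)
print("missing oxides:", missing)

# Add the unanalyzed oxides as NaN columns, then fill them with 0.
filled = example.reindex(columns=["Sample Name"] + REQUIRED_OXIDES)
filled[REQUIRED_OXIDES] = filled[REQUIRED_OXIDES].fillna(0.0)
print(filled)
```

In practice mm.prep_df handles this step for you; the sketch is only meant to make the expected input layout concrete.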

1. Load and prepare data for analysis

[2]:
# Read in your dataframe of mineral data, called training_hundred.csv.

df_load = mm.load_df('TabularData/training_hundred.csv')
display(df_load.head())
Sample Name SiO2 TiO2 Al2O3 FeOt MnO MgO CaO Na2O K2O P2O5 Cr2O3 ZrO2 Mineral Source
0 Z2099 42.96 1.80 14.33 4.07 0.07 17.39 12.03 3.10 0.03 NaN 0.65 NaN Amphibole Vannuccietal1995
1 Z2070 43.03 2.39 13.35 4.09 0.06 17.01 11.71 2.97 0.05 NaN 0.74 NaN Amphibole Vannuccietal1995
2 Z2073 42.95 3.02 14.12 4.35 0.06 17.53 12.02 3.04 0.07 NaN 0.76 NaN Amphibole Vannuccietal1995
3 Z2067 43.01 4.65 12.83 4.39 0.07 17.14 12.14 2.88 0.03 NaN 0.84 NaN Amphibole Vannuccietal1995
4 Z2068 42.13 4.87 12.15 4.08 0.05 16.42 11.89 2.75 0.02 NaN 1.26 NaN Amphibole Vannuccietal1995
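
The quickstart list mentions that pd.read_csv can be used instead of mm.load_df. As a rough sketch of that route, the block below parses two rows of the table above from an in-memory string, so it runs without any file on disk. The comma-separated layout is an assumption for illustration; the layout of the real training_hundred.csv may differ.

```python
import io

import pandas as pd

# Two rows from the table above, written as CSV text. Empty fields stand in
# for the NaN entries shown for P2O5 and ZrO2.
csv_text = (
    "Sample Name,SiO2,TiO2,Al2O3,FeOt,MnO,MgO,CaO,Na2O,K2O,P2O5,Cr2O3,ZrO2,Mineral,Source\n"
    "Z2099,42.96,1.80,14.33,4.07,0.07,17.39,12.03,3.10,0.03,,0.65,,Amphibole,Vannuccietal1995\n"
    "Z2070,43.03,2.39,13.35,4.09,0.06,17.01,11.71,2.97,0.05,,0.74,,Amphibole,Vannuccietal1995\n"
)

# pd.read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column to see which oxides were not analyzed.
nan_counts = df_csv.isna().sum()
print(nan_counts[nan_counts > 0])
```

With a real file, replace io.StringIO(csv_text) with the path to your CSV.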

2. Clean and align columns of dataframe

[3]:
# Prepare the dataframe: align the oxide columns and fill NaN values with zeros.
# Row dropping is disabled here (drop_empty_rows=False), so all rows are kept.

df_nn = mm.prep_df(df_load, # dataframe to prepare
                   renormalize=False, # optionally renormalize rows to sum to 100 wt%
                   convert_fe=False, # optionally convert disparate input formats of Fe all to FeOt
                   drop_empty_rows=False, # optionally drop rows with fewer non-NaN oxides than min_oxide_count
                   min_oxide_count=2, # minimum number of oxides in a row to keep that analysis
                   verbose=True
                   )
display(df_nn.head())
prep_df: 2800 row(s) processed (of 2800 input, 0 dropped).
Sample Name SiO2 TiO2 Al2O3 FeOt MnO MgO CaO Na2O K2O Cr2O3 P2O5 ZrO2 Mineral Source
0 Z2099 42.96 1.80 14.33 4.07 0.07 17.39 12.03 3.10 0.03 0.65 0.0 0.0 Amphibole Vannuccietal1995
1 Z2070 43.03 2.39 13.35 4.09 0.06 17.01 11.71 2.97 0.05 0.74 0.0 0.0 Amphibole Vannuccietal1995
2 Z2073 42.95 3.02 14.12 4.35 0.06 17.53 12.02 3.04 0.07 0.76 0.0 0.0 Amphibole Vannuccietal1995
3 Z2067 43.01 4.65 12.83 4.39 0.07 17.14 12.14 2.88 0.03 0.84 0.0 0.0 Amphibole Vannuccietal1995
4 Z2068 42.13 4.87 12.15 4.08 0.05 16.42 11.89 2.75 0.02 1.26 0.0 0.0 Amphibole Vannuccietal1995
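
To make the renormalize, drop_empty_rows, and min_oxide_count options more concrete, here is a pandas-only sketch of what such steps could look like. It is an illustration under stated assumptions and not mineralML's implementation: the function name sketch_prep is hypothetical, and the row filter assumes that min_oxide_count counts non-NaN oxides per row.

```python
import numpy as np
import pandas as pd

OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOt", "MnO", "MgO",
          "CaO", "Na2O", "K2O", "Cr2O3", "P2O5", "ZrO2"]


def sketch_prep(df, renormalize=False, drop_empty_rows=False, min_oxide_count=2):
    """Hypothetical stand-in for the cleaning steps described above."""
    out = df.copy()
    if drop_empty_rows:
        # Assumption: keep rows reporting at least min_oxide_count oxides.
        n_reported = out[OXIDES].notna().sum(axis=1)
        out = out.loc[n_reported >= min_oxide_count].copy()
    # Oxides that were not analyzed become 0.
    out[OXIDES] = out[OXIDES].fillna(0.0)
    if renormalize:
        # Scale each row so its oxides sum to 100 wt%.
        # Rows whose total is 0 would produce NaN here.
        totals = out[OXIDES].sum(axis=1)
        out[OXIDES] = out[OXIDES].div(totals, axis=0) * 100.0
    return out


# First row: an amphibole analysis from the table above (P2O5, ZrO2 missing).
# Second row: a made-up analysis reporting only SiO2.
row_amph = {"SiO2": 42.96, "TiO2": 1.80, "Al2O3": 14.33, "FeOt": 4.07,
            "MnO": 0.07, "MgO": 17.39, "CaO": 12.03, "Na2O": 3.10,
            "K2O": 0.03, "Cr2O3": 0.65, "P2O5": np.nan, "ZrO2": np.nan}
row_sparse = {"SiO2": 99.0}
demo = pd.DataFrame([row_amph, row_sparse], columns=OXIDES)

raw_like = sketch_prep(demo)  # same settings as the cell above
strict = sketch_prep(demo, renormalize=True, drop_empty_rows=True)

print(len(raw_like), "rows kept with default settings")
print(len(strict), "row(s) kept with drop_empty_rows=True")
print(strict[OXIDES].sum(axis=1))
```

With the settings used in the cell above (all options off), only the zero fill applies, which matches the "0 dropped" message in the output.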

3. Convert between oxide and elemental wt%

These functions are generally useful for converting between oxide and elemental data. Use mm.oxide_to_element to go from oxide wt% to elemental wt%, and mm.element_to_oxide to go the other direction. Each returns the converted dataframe together with the conversion factors it applied.
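
As background, an oxide-to-element factor is the mass fraction of the cation in the oxide: the number of cations times the cation's atomic mass, divided by the molar mass of the oxide. The sketch below works this out for a few oxides. The atomic masses are rounded standard values chosen for illustration, FeOt is treated as FeO, and the helper name is hypothetical, so the results may differ from mineralML's factors in the last decimal places.

```python
# Rounded standard atomic masses in g/mol (assumed for this illustration).
ATOMIC_MASS = {"O": 15.999, "Si": 28.0855, "Ti": 47.867, "Al": 26.9815,
               "Fe": 55.845, "Mg": 24.305, "Ca": 40.078}

# oxide -> (cation, cations per formula unit, oxygens per formula unit)
STOICHIOMETRY = {
    "SiO2": ("Si", 1, 2),
    "TiO2": ("Ti", 1, 2),
    "Al2O3": ("Al", 2, 3),
    "FeOt": ("Fe", 1, 1),  # total iron expressed as FeO
    "MgO": ("Mg", 1, 1),
    "CaO": ("Ca", 1, 1),
}


def oxide_to_element_factor(oxide):
    """Hypothetical helper: mass fraction of the cation in the given oxide."""
    cation, n_cation, n_oxygen = STOICHIOMETRY[oxide]
    cation_mass = n_cation * ATOMIC_MASS[cation]
    oxide_mass = cation_mass + n_oxygen * ATOMIC_MASS["O"]
    return cation_mass / oxide_mass


factors = {ox: oxide_to_element_factor(ox) for ox in STOICHIOMETRY}
for ox, f in factors.items():
    # The element-to-oxide factor is simply the reciprocal.
    print(f"{ox:6s} oxide->element {f:.4f}   element->oxide {1.0 / f:.4f}")

# Worked example: 42.96 wt% SiO2 expressed as elemental Si wt%.
si_wt = 42.96 * factors["SiO2"]
print(f"Si = {si_wt:.2f} wt%")
```

The element-to-oxide factors are the reciprocals of these values, which is why converting in one direction and then back should recover the original numbers.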

[4]:
df_nn_elemental, factors_ox2el = mm.oxide_to_element(df_nn)
display(df_nn_elemental)
display(factors_ox2el)
Si Ti Al Fe Mn Mg Ca Na K P Cr Zr
0 20.081004 1.078815 7.584172 3.163648 0.054212 10.486794 8.597730 2.299683 0.024904 0.000000 0.444732 0.0
1 20.113725 1.432426 7.065506 3.179195 0.046467 10.257641 8.369029 2.203244 0.041507 0.000000 0.506310 0.0
2 20.076330 1.810011 7.473029 3.381295 0.046467 10.571219 8.590583 2.255173 0.058110 0.000000 0.519994 0.0
3 20.104376 2.786937 6.790295 3.412387 0.054212 10.336035 8.676346 2.136479 0.024904 0.000000 0.574730 0.0
4 19.693033 2.918793 6.430404 3.171421 0.038723 9.901849 8.497673 2.040041 0.016603 0.000000 0.862096 0.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2795 23.554046 1.024874 7.383057 9.755230 0.170381 4.160948 8.419057 1.787818 0.273949 0.065464 0.000000 0.0
2796 23.044542 0.994907 7.404227 9.281071 0.162636 4.317737 8.604877 1.602360 0.265647 0.065464 0.000000 0.0
2797 23.245539 0.964940 7.213696 9.444306 0.162636 4.299646 8.504820 1.706216 0.273949 0.069828 0.000000 0.0
2798 23.161400 0.928979 7.446567 9.249979 0.162636 4.293616 8.540555 1.713634 0.257346 0.074192 0.000000 0.0
2799 23.217493 1.000900 7.123724 9.249979 0.162636 4.311707 8.483379 1.735889 0.240743 0.069828 0.000000 0.0

2800 rows × 12 columns

SiO2     0.467435
TiO2     0.599341
Al2O3    0.529251
FeOt     0.777309
MnO      0.774457
MgO      0.603036
CaO      0.714691
Na2O     0.741833
K2O      0.830148
P2O5     0.436426
Cr2O3    0.684203
ZrO2     0.740346
dtype: float64

[5]:
df_nn_oxide, factors_el2ox = mm.element_to_oxide(df_nn_elemental)
display(df_nn_oxide)
display(factors_el2ox)
SiO2 TiO2 Al2O3 FeOt MnO MgO CaO Na2O K2O P2O5 Cr2O3 ZrO2
0 42.96 1.80 14.33 4.07 0.07 17.39 12.03 3.10 0.03 0.00 0.65 0.0
1 43.03 2.39 13.35 4.09 0.06 17.01 11.71 2.97 0.05 0.00 0.74 0.0
2 42.95 3.02 14.12 4.35 0.06 17.53 12.02 3.04 0.07 0.00 0.76 0.0
3 43.01 4.65 12.83 4.39 0.07 17.14 12.14 2.88 0.03 0.00 0.84 0.0
4 42.13 4.87 12.15 4.08 0.05 16.42 11.89 2.75 0.02 0.00 1.26 0.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2795 50.39 1.71 13.95 12.55 0.22 6.90 11.78 2.41 0.33 0.15 0.00 0.0
2796 49.30 1.66 13.99 11.94 0.21 7.16 12.04 2.16 0.32 0.15 0.00 0.0
2797 49.73 1.61 13.63 12.15 0.21 7.13 11.90 2.30 0.33 0.16 0.00 0.0
2798 49.55 1.55 14.07 11.90 0.21 7.12 11.95 2.31 0.31 0.17 0.00 0.0
2799 49.67 1.67 13.46 11.90 0.21 7.15 11.87 2.34 0.29 0.16 0.00 0.0

2800 rows × 12 columns

SiO2     2.139335
TiO2     1.668498
Al2O3    1.889461
FeOt     1.286489
MnO      1.291226
MgO      1.658276
CaO      1.399207
Na2O     1.348012
K2O      1.204605
P2O5     2.291341
Cr2O3    1.461555
ZrO2     1.350719
dtype: float64

Compare the df_nn_oxide and df_nn dataframes. Are these data the same?
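
One way to answer this is to compare the oxide columns the two dataframes share, using a numerical tolerance rather than exact equality. The block below is a minimal sketch: compare_round_trip is a hypothetical helper, and the small frames are stand-ins built from two rows and two factors shown above so that the example runs on its own.

```python
import numpy as np
import pandas as pd


def compare_round_trip(original, round_tripped, atol=1e-8):
    """Hypothetical helper: compare the columns both frames have in common."""
    shared = [c for c in round_tripped.columns if c in original.columns]
    a = original[shared].to_numpy(dtype=float)
    b = round_tripped[shared].to_numpy(dtype=float)
    max_abs_diff = float(np.max(np.abs(a - b)))
    return bool(np.allclose(a, b, atol=atol)), max_abs_diff, shared


# Stand-in for df_nn: a metadata column plus two oxides from the table above.
original = pd.DataFrame({
    "Sample Name": ["Z2099", "Z2070"],
    "SiO2": [42.96, 43.03],
    "MgO": [17.39, 17.01],
})

# Oxide -> element factors as displayed earlier in this notebook.
ox2el = pd.Series({"SiO2": 0.467435, "MgO": 0.603036})

# Convert to elemental wt% and back again (stand-in for df_nn_oxide).
elemental = original[["SiO2", "MgO"]] * ox2el
round_tripped = elemental / ox2el

same, max_abs_diff, shared = compare_round_trip(original, round_tripped)
print("columns compared:", shared)
print("match within tolerance:", same)
print("largest absolute difference:", max_abs_diff)
```

With the real data you could call compare_round_trip(df_nn, df_nn_oxide). Judging from the outputs above, df_nn_oxide holds only the 12 oxide columns, while df_nn also carries the Sample Name, Mineral, and Source columns, so only the shared oxide columns are compared.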