This page was generated from docs/examples/mineralML_synthetic_data.ipynb.

[1]:
""" Created on November 13, 2023 // @author: Sarah Shi """

import os
import numpy as np
import pandas as pd

import mineralML as mm

import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'png'

Synthetic Mineral Generator

This notebook shows how the synthetic mineral generator in mineralML works, using an example CSV for groundtruthing (loaded below from TabularData/synth_groundtruth.csv). This is a three-step process:

  1. Load and prepare data for analysis.

  2. Define endmembers and generator settings (e.g., oxygen_basis, mixing distribution/parameters, minor elements, noise scales).

  3. Generate synthetic compositions and evaluate them (convert to oxide wt% and cations; optionally use compare_distributions to compare against the natural dataset).
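
Step 3 relies on converting cation proportions to oxide wt%. The arithmetic can be sketched as follows. This is a minimal illustration, not mineralML's internal code: the helper name cations_to_oxide_wt and the table of approximate molar masses are assumptions introduced here.

```python
# Illustrative only: hypothetical helper, not part of the mineralML API.
# cation -> (oxide name, cations per oxide formula, approximate oxide molar mass in g/mol)
OXIDES = {
    "Si": ("SiO2", 1, 60.083),
    "Mg": ("MgO", 1, 40.304),
    "Fe2t": ("FeOt", 1, 71.844),
    "Al": ("Al2O3", 2, 101.961),
    "Ca": ("CaO", 1, 56.077),
    "Na": ("Na2O", 2, 61.979),
    "K": ("K2O", 2, 94.196),
}


def cations_to_oxide_wt(cations):
    """Convert cations per formula unit to oxide wt%, normalized to 100."""
    masses = {}
    for cation, count in cations.items():
        oxide, cations_per_oxide, molar_mass = OXIDES[cation]
        # Moles of oxide per formula unit, times molar mass, gives grams of oxide.
        masses[oxide] = count / cations_per_oxide * molar_mass
    total = sum(masses.values())
    return {oxide: 100.0 * mass / total for oxide, mass in masses.items()}


# Linear mix of forsterite (Mg2SiO4) and fayalite (Fe2SiO4) at 90 mol% forsterite.
x_fo = 0.9
fo90 = {"Mg": 2 * x_fo, "Fe2t": 2 * (1 - x_fo), "Si": 1.0}
fo90_wt = cations_to_oxide_wt(fo90)
print({oxide: round(wt, 2) for oxide, wt in fo90_wt.items()})
```

The generator additionally applies noise and minor elements, so its output will scatter around values like these rather than reproduce them exactly.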

We loaded in the mineralML Python package as mm. mineralML has trained machine learning models for classifying minerals. This implementation aims to get your electron microprobe or quantitative EDS compositions classified and processed. We remove some degrees of freedom to simplify the process as much as possible. The minerals considered for this study include: Amphibole, Apatite, Biotite, Calcite, Chlorite, Epidote, Feldspar (KFeldspar and Plagioclase), Garnet, Glass, Kalsilite, Leucite, Melilite, Muscovite, Nepheline, Olivine, Pyroxene (Clinopyroxene and Orthopyroxene), Quartz, Rhombohedral_Oxides (Hematite-Ilmenite), Rutile, Serpentine, Spinels (Magnetite-Spinel), Titanite, Tourmaline, and Zircon.

You need one CSV file containing your electron microprobe analyses in oxide weight percentages. Find an example here. The necessary oxides are \(SiO_2\), \(TiO_2\), \(Al_2O_3\), \(FeO_t\), \(MnO\), \(MgO\), \(CaO\), \(Na_2O\), \(K_2O\), \(Cr_2O_3\), and \(P_2O_5\). For oxides that were not analyzed for a given mineral, the preprocessing fills the NaN values with 0.
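
The zero-filling described above can be illustrated with plain pandas. This is a sketch under stated assumptions: fill_missing_oxides is a hypothetical helper written for this example, and what mm.load_df does internally may differ.

```python
import numpy as np
import pandas as pd

# Oxide columns listed in the text above.
OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOt", "MnO", "MgO",
          "CaO", "Na2O", "K2O", "Cr2O3", "P2O5"]


def fill_missing_oxides(df):
    """Hypothetical helper: add absent oxide columns and set NaNs to 0."""
    out = df.copy()
    for oxide in OXIDES:
        if oxide not in out.columns:
            out[oxide] = np.nan  # oxide was never analyzed for these samples
    out[OXIDES] = out[OXIDES].fillna(0.0)
    return out


# A made-up olivine analysis with Cr2O3 unmeasured and several oxides absent.
raw = pd.DataFrame({
    "Sample Name": ["ol_1"],
    "SiO2": [39.8],
    "FeOt": [17.4],
    "MgO": [43.1],
    "Cr2O3": [np.nan],
    "Mineral": ["Olivine"],
})
clean = fill_missing_oxides(raw)
print(clean[OXIDES])
```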

Load and prepare data for groundtruthing

[2]:
# Read in your dataframe of mineral data, here synth_groundtruth.csv.
# Prepare the dataframe by removing rows with too many NaNs, and filling in zeros.

df_load = mm.load_df('TabularData/synth_groundtruth.csv')
[3]:
# Examine the prepared dataframe

display(df_load.head())
       Sample Name        SiO2       TiO2      Al2O3       FeOt        MnO        MgO        CaO       Na2O        K2O       P2O5  Cr2O3  Mineral
43110  CN_C_Ol1      39.846040   0.000020   0.019150  17.398750   0.243865  43.126690   0.219630   0.014950   0.007775   0.013685    NaN  Olivine
43111  CN_C_Ol1'     39.787840   0.010270   0.026165  17.446295   0.324905  43.227635   0.188190   0.015305   0.006115   0.017815    NaN  Olivine
43112  CN_C_Ol2      38.896455   0.014615   0.005655  21.791545   0.349310  39.472920   0.204635   0.004635   0.006005   0.013985    NaN  Olivine
43113  CN_C_Ol3      39.451170   0.000020   0.028775  19.528820   0.298085  41.429130   0.231755   0.000010   0.001825   0.019815    NaN  Olivine
43114  CN_C_Ol3_MI2  39.680195   0.006940   0.019405  18.502340   0.303840  42.207405   0.226190   0.018970   0.000010   0.000020    NaN  Olivine

Olivine

Let’s apply the generator to olivine as a simple binary solid solution between forsterite (Mg₂SiO₄) and fayalite (Fe₂SiO₄). We understand olivine systematics quite well, so we can test this before applying this to a more complex system. We will keep everything on a 4-oxygen basis, add small amounts of Ca and Mn as minors, and then check that the synthetic cloud matches the natural data. The steps are as follows:

  1. Define endmembers (4 oxygen basis). Use cation counts per formula unit; iron as total cations (Fe2t) so the framework can convert to FeOt downstream.

  2. Specify minor elements (optional but realistic).

  3. Instantiate the generator with mm.SolidSolutionGenerator.

  4. Generate synthetic compositions.

  5. Compute sites/derived components.

  6. Compare synthetic vs natural distributions with compare_distributions.

  7. Plot paired violin distributions for cations (and matching oxides if present). Report KS statistics (ks_stat, p_value) plus means/stds. Lower ks_stat / higher p_value means a better match.
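
For readers unfamiliar with the KS statistic in step 7: it is the largest vertical gap between the empirical cumulative distribution functions of two samples, so 0 means the samples overlap perfectly and 1 means they do not overlap at all. The sketch below computes it from that definition in pure Python. How compare_distributions computes ks_stat and p_value internally is not stated here, so treat this as an illustration of the quantity only; ks_statistic is a name made up for this example.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is always reached at one of the observed values.
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


identical = ks_statistic([1, 2, 3], [1, 2, 3])       # same samples
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])        # no overlap
partial = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])   # half overlapping
print(identical, disjoint, partial)
```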

Gotchas:

  • Keep iron conventions straight: Fe2t (cations) aligns with FeOt (oxide). Don’t mix FeO/Fe₂O₃ with FeOt in the same row.
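
If your natural data report FeO and Fe₂O₃ separately, they can be combined into FeOt before comparison. The conversion below uses the standard stoichiometric factor (two FeO per Fe₂O₃, about 0.8998 by mass). The function name total_iron_as_feo is an assumption for this example and is not part of mineralML.

```python
FEO_MOLAR_MASS = 71.844      # g/mol, approximate
FE2O3_MOLAR_MASS = 159.687   # g/mol, approximate

# One Fe2O3 carries the same iron as two FeO, so the mass factor is about 0.8998.
FE2O3_TO_FEO = 2 * FEO_MOLAR_MASS / FE2O3_MOLAR_MASS


def total_iron_as_feo(feo_wt=0.0, fe2o3_wt=0.0):
    """Hypothetical helper: express all iron as FeO wt% (FeOt)."""
    return feo_wt + FE2O3_TO_FEO * fe2o3_wt


feot = total_iron_as_feo(feo_wt=10.0, fe2o3_wt=2.0)
print(round(FE2O3_TO_FEO, 4), round(feot, 3))
```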

[4]:
# Pull natural data
df_ol_natural = df_load[df_load["Mineral"]=="Olivine"]
ol_calc_natural = mm.OlivineCalculator(df_ol_natural)
ol_comp_natural = ol_calc_natural.calculate_components()

# Define endmembers
ol_endmembers = {
    # Forsterite: Mg₂SiO₄
    'Fo': {'Mg': 2, 'Si': 1, 'O': 4},
    # Fayalite: Fe₂SiO₄
    'Fa': {'Fe2t': 2, 'Si': 1, 'O': 4}
}

# Specify minor elements
ol_minors = {
    'Ca': {'distribution': 'exponential', 'scale': 0.01, 'max_fraction': 0.01},
    'Mn': {'distribution': 'exponential', 'scale': 0.01, 'max_fraction': 0.01}
}

# Instantiate generator
ol_gen = mm.SolidSolutionGenerator(
    endmembers=ol_endmembers,
    oxygen_basis=4,
    element_noise_scale=0.025,
    min_site_fraction=0.2,
    minor_elements=ol_minors,
    mixing_dist='beta',
    mixing_params={'a': 1, 'b': 1}
)

# Generate samples, use the olivine calculator to calculate site allocations, etc.
df_ol = ol_gen.generate(1000)
ol_calc_synth = mm.OlivineCalculator(df_ol)
ol_comp_synth = ol_calc_synth.calculate_components()
display(ol_comp_synth)

# Calculate and compare the distributions of the output data
stats_ol = ol_gen.compare_distributions(base_df=ol_comp_natural, synth_df=ol_comp_synth, suptitle="Olivine")
display(stats_ol)

# Scatter plots comparing natural vs. synthetic oxide proportions
fig, ax = plt.subplots(1, 3, figsize=(18, 5))
ax[0].scatter(ol_comp_natural["FeOt"], ol_comp_natural["MgO"], s=20, c="g", lw=0.25, ec='k')
ax[0].scatter(ol_comp_synth["FeOt"], ol_comp_synth["MgO"], s=20, c="r", lw=0.25, ec='k')
ax[0].set_xlabel("FeOt")
ax[0].set_ylabel("MgO")

ax[1].scatter(ol_comp_natural["SiO2"], ol_comp_natural["MgO"], s=20, c="g", lw=0.25, ec='k')
ax[1].scatter(ol_comp_synth["SiO2"], ol_comp_synth["MgO"], s=20, c="r", lw=0.25, ec='k')
ax[1].set_xlabel("SiO2")
ax[1].set_ylabel("MgO")

# The calculator output stores the forsterite fraction in the "Fo" column;
# there is no "XFo" column, so indexing "XFo" raises a KeyError.
ax[2].scatter(ol_comp_natural["Fo"], ol_comp_natural["M_site_expanded"], s=20, c="g", lw=0.25, ec='k', label="Natural")
ax[2].scatter(ol_comp_synth["Fo"], ol_comp_synth["M_site_expanded"], s=20, c="r", lw=0.25, ec='k', label="Synthetic")
ax[2].set_xlabel("Fo (Mg/(Mg+Fe))")
ax[2].set_ylabel("M-site Expanded")
ax[2].legend()
plt.tight_layout()
Charge mismatch: 9.01 vs 8
Charge mismatch: 9.39 vs 8
Charge mismatch: 8.87 vs 8
Charge mismatch: 7.12 vs 8
Charge mismatch: 8.97 vs 8
Charge mismatch: 7.19 vs 8
Charge mismatch: 8.82 vs 8
Charge mismatch: 8.92 vs 8
Charge mismatch: 9.43 vs 8
Charge mismatch: 7.07 vs 8
Charge mismatch: 8.95 vs 8
Charge mismatch: 7.18 vs 8
Charge mismatch: 9.01 vs 8
Charge mismatch: 7.08 vs 8
Charge mismatch: 7.06 vs 8
Charge mismatch: 8.85 vs 8
Charge mismatch: 9.11 vs 8
Charge mismatch: 8.85 vs 8
Charge mismatch: 8.83 vs 8
Charge mismatch: 9.11 vs 8
Charge mismatch: 7.01 vs 8
Charge mismatch: 6.98 vs 8
Charge mismatch: 9.05 vs 8
Charge mismatch: 8.86 vs 8
Charge mismatch: 8.81 vs 8
Charge mismatch: 7.14 vs 8
Charge mismatch: 7.15 vs 8
Charge mismatch: 8.82 vs 8
Charge mismatch: 8.81 vs 8
Charge mismatch: 7.00 vs 8
Charge mismatch: 7.15 vs 8
Charge mismatch: 8.90 vs 8
Charge mismatch: 9.21 vs 8
Charge mismatch: 7.18 vs 8
Charge mismatch: 6.95 vs 8
Charge mismatch: 7.05 vs 8
Charge mismatch: 7.08 vs 8
Charge mismatch: 7.16 vs 8
Charge mismatch: 7.09 vs 8
Charge mismatch: 7.13 vs 8
Charge mismatch: 8.81 vs 8
Charge mismatch: 7.20 vs 8
Charge mismatch: 9.19 vs 8
Charge mismatch: 7.16 vs 8
Charge mismatch: 7.12 vs 8
Charge mismatch: 9.54 vs 8
Charge mismatch: 6.79 vs 8
Charge mismatch: 8.95 vs 8
Charge mismatch: 9.07 vs 8
Charge mismatch: 9.29 vs 8
Charge mismatch: 8.86 vs 8
Charge mismatch: 8.81 vs 8
Charge mismatch: 8.87 vs 8
Charge mismatch: 8.84 vs 8
Charge mismatch: 8.88 vs 8
Charge mismatch: 8.97 vs 8
Charge mismatch: 9.77 vs 8
Sample SiO2 FeOt MnO MgO CaO SiO2_mols FeOt_mols MnO_mols MgO_mols ... Predict_Mineral Prediction_Score Prediction_Score_Sigma Second_Predict_Mineral Second_Prediction_Score Cation_Sum M_site T_site M_site_expanded Fo
0 NaN 34.622234 41.032973 0.541935 23.626166 0.176691 0.576269 0.571140 0.007640 0.586193 ... NaN NaN NaN NaN NaN 3.006716 1.994833 0.993284 2.013432 0.506503
1 NaN 36.787297 30.920356 0.630162 31.473580 0.188606 0.612305 0.430382 0.008883 0.780897 ... NaN NaN NaN NaN NaN 2.999557 1.979104 1.000443 1.999114 0.644688
2 NaN 29.930481 65.841998 0.553641 3.647932 0.025948 0.498177 0.916458 0.007805 0.090510 ... NaN NaN NaN NaN NaN 3.009386 2.002332 0.990614 2.018772 0.089883
3 NaN 38.550577 18.016443 0.170479 43.039413 0.223088 0.641654 0.250772 0.002403 1.067859 ... NaN NaN NaN NaN NaN 3.015989 2.022191 0.984011 2.031978 0.809824
4 NaN 37.177048 32.068149 0.758269 29.973814 0.022720 0.618792 0.446358 0.010689 0.743686 ... NaN NaN NaN NaN NaN 2.985055 1.951913 1.014945 1.970110 0.624923
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 NaN 36.612935 27.910138 0.174822 34.608587 0.693518 0.609403 0.388483 0.002464 0.858680 ... NaN NaN NaN NaN NaN 3.017409 2.010904 0.982591 2.034818 0.688507
996 NaN 43.656266 1.857720 0.477573 53.888113 0.120327 0.726636 0.025858 0.006732 1.337028 ... NaN NaN NaN NaN NaN 2.971148 1.929726 1.028852 1.942296 0.981027
997 NaN 36.625604 27.030119 0.793513 34.874823 0.675940 0.609614 0.376233 0.011186 0.865286 ... NaN NaN NaN NaN NaN 3.018330 1.999236 0.981670 2.036660 0.696957
998 NaN 40.456203 8.932501 0.083621 50.038740 0.488935 0.673372 0.124332 0.001179 1.241521 ... NaN NaN NaN NaN NaN 3.010654 2.006766 0.989346 2.021308 0.908971
999 NaN 31.582903 55.343212 0.243916 12.732789 0.097179 0.525681 0.770325 0.003438 0.315916 ... NaN NaN NaN NaN NaN 3.018691 2.027728 0.981309 2.037382 0.290834

1000 rows × 33 columns

../_images/examples_mineralML_synthetic_data_7_2.png
(<Figure size 1200x600 with 11 Axes>,
                ks_stat        p_value  mean_base  mean_synth  std_base  \
 cation
 Si_cat_4ox    0.213778   7.300022e-39   0.994964    0.995663  0.007191
 Fe2t_cat_4ox  0.703333   0.000000e+00   0.327584    0.998053  0.094729
 Mn_cat_4ox    0.380164  7.802087e-125   0.005196    0.009274  0.002395
 Mg_cat_4ox    0.703054   0.000000e+00   1.664797    0.992028  0.104283
 Ca_cat_4ox    0.348305  2.934413e-104   0.009132    0.009319  0.032880

               std_synth
 cation
 Si_cat_4ox     0.016431
 Fe2t_cat_4ox   0.589282
 Mn_cat_4ox     0.008350
 Mg_cat_4ox     0.587416
 Ca_cat_4ox     0.008193  )
../_images/examples_mineralML_synthetic_data_7_5.png

Feldspar

Let’s double check that this generator works, and apply this to plagioclase as a simple binary solid solution between albite (NaAlSi₃O₈) and anorthite (CaAl₂Si₂O₈). We understand plagioclase feldspar systematics quite well, so we can test this before applying this to a more complex system. We will have an 8-oxygen basis, add small amounts of K as a minor element, and then check that the synthetic cloud matches the natural data. The steps are as above.
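
A quick way to sanity-check endmember definitions is charge balance: on an n-oxygen basis, the cation charges should sum to 2n. The "Charge mismatch: X vs 16" messages printed by the cell below look consistent with this kind of check (16 = 2 × 8), although that reading is an inference rather than documented behavior. The sketch assumes the nominal valences listed in CATION_CHARGE; cation_charge is a name introduced for this example.

```python
# Nominal cation valences (assumption: all iron as Fe2+, matching the Fe2t convention).
CATION_CHARGE = {"Si": 4, "Ti": 4, "Al": 3, "Cr": 3, "Fe2t": 2,
                 "Mn": 2, "Mg": 2, "Ca": 2, "Na": 1, "K": 1}


def cation_charge(formula):
    """Sum of cation charges in a formula dict; the 'O' entry is skipped."""
    return sum(CATION_CHARGE[element] * count
               for element, count in formula.items() if element != "O")


albite = {"Na": 1, "Al": 1, "Si": 3, "O": 8}
anorthite = {"Ca": 1, "Al": 2, "Si": 2, "O": 8}

for name, formula in {"Ab": albite, "An": anorthite}.items():
    expected = 2 * formula["O"]  # each oxygen carries a charge of -2
    print(name, cation_charge(formula), "vs", expected)
```

Both endmembers balance exactly, so any mismatch in generated samples comes from the added noise and minor elements rather than from the endmember definitions.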

[5]:
# Pull natural data
df_plag_natural = df_load[df_load["Mineral"]=="Plagioclase"]
plag_calc_natural = mm.FeldsparCalculator(df_plag_natural)
plag_comp_natural = plag_calc_natural.calculate_components()

# Define endmembers
plag_endmembers = {
    # Albite: NaAlSi₃O₈
    'Ab': {'Na': 1, 'Al': 1, 'Si': 3, 'O': 8},
    # Anorthite: CaAl₂Si₂O₈
    'An': {'Ca': 1, 'Al': 2, 'Si': 2, 'O': 8},
}

# Specify minor elements
plag_minors = {'K': {'distribution': 'exponential', 'scale': 0.01, 'max_fraction': 0.02}}

# Instantiate generator
plag_gen = mm.SolidSolutionGenerator(
    endmembers=plag_endmembers,
    oxygen_basis=8,
    element_noise_scale=0.05,
    min_site_fraction=0.2,
    minor_elements=plag_minors,
    mixing_dist='beta',
    mixing_params={'a': 2, 'b': 2}
)

# Generate samples
df_plag = plag_gen.generate(1000)
plag_calc_synth = mm.FeldsparCalculator(df_plag)
plag_comp_synth = plag_calc_synth.calculate_components()
display(plag_comp_synth)

# Calculate and compare the distributions of the output data
stats_pl = plag_gen.compare_distributions(base_df=plag_comp_natural, synth_df=plag_comp_synth, suptitle="Plagioclase")
display(stats_pl)

# Scatter‐plot comparing base vs. synthetic oxide proportions
fig, ax = plt.subplots(1, 3, figsize=(18, 5))
ax[0].scatter(plag_comp_natural["Na2O"], plag_comp_natural["CaO"], s=20, c="g", lw=0.25, ec='k')
ax[0].scatter(plag_comp_synth["Na2O"], plag_comp_synth["CaO"], s=20, c="r", lw=0.25, ec='k')
ax[0].set_xlabel("Na2O")
ax[0].set_ylabel("CaO")

ax[1].scatter(plag_comp_natural["Al2O3"], plag_comp_natural["SiO2"], s=20, c="g", lw=0.25, ec='k')
ax[1].scatter(plag_comp_synth["Al2O3"], plag_comp_synth["SiO2"], s=20, c="r", lw=0.25, ec='k')
ax[1].set_xlabel("Al2O3")
ax[1].set_ylabel("SiO2")

ax[2].scatter(plag_comp_natural["An"], plag_comp_natural["Ab"], s=20, c="g", lw=0.25, ec='k', label="Natural")
ax[2].scatter(plag_comp_synth["An"], plag_comp_synth["Ab"], s=20, c="r", lw=0.25, ec='k', label="Synthetic")
ax[2].set_xlabel("An (Ca/(Ca+Na+K))")
ax[2].set_ylabel("Ab (Na/(Ca+Na+K))")
ax[2].legend()
plt.tight_layout()

Charge mismatch: 17.65 vs 16
Charge mismatch: 17.64 vs 16
Charge mismatch: 14.22 vs 16
Charge mismatch: 18.35 vs 16
Charge mismatch: 14.32 vs 16
Charge mismatch: 17.83 vs 16
Charge mismatch: 14.04 vs 16
Charge mismatch: 14.03 vs 16
Charge mismatch: 18.14 vs 16
Charge mismatch: 18.18 vs 16
Charge mismatch: 18.16 vs 16
Charge mismatch: 17.81 vs 16
Charge mismatch: 13.94 vs 16
Charge mismatch: 13.87 vs 16
Charge mismatch: 18.64 vs 16
Charge mismatch: 17.90 vs 16
Charge mismatch: 17.92 vs 16
Charge mismatch: 17.71 vs 16
Charge mismatch: 14.29 vs 16
Charge mismatch: 13.88 vs 16
Charge mismatch: 17.78 vs 16
Charge mismatch: 14.40 vs 16
Charge mismatch: 14.24 vs 16
Charge mismatch: 14.31 vs 16
Charge mismatch: 17.61 vs 16
Charge mismatch: 14.36 vs 16
Charge mismatch: 14.38 vs 16
Charge mismatch: 17.87 vs 16
Charge mismatch: 17.90 vs 16
Charge mismatch: 18.10 vs 16
Charge mismatch: 17.65 vs 16
Charge mismatch: 18.08 vs 16
Charge mismatch: 17.72 vs 16
Charge mismatch: 14.38 vs 16
Charge mismatch: 13.92 vs 16
Charge mismatch: 17.68 vs 16
Charge mismatch: 13.80 vs 16
Sample SiO2 Al2O3 CaO Na2O K2O SiO2_mols Al2O3_mols CaO_mols Na2O_mols ... Prediction_Score Prediction_Score_Sigma Second_Predict_Mineral Second_Prediction_Score Cation_Sum M_site T_site An Ab Or
0 NaN 59.460622 25.952514 5.182643 8.819577 0.584644 0.989691 0.254536 0.092419 0.142300 ... NaN NaN NaN NaN 5.062328 1.044083 4.018245 0.237319 0.730806 0.031876
1 NaN 64.047671 22.197230 2.714266 11.028512 0.012321 1.066040 0.217705 0.048402 0.177940 ... NaN NaN NaN NaN 5.062958 1.074603 3.988355 0.119646 0.879707 0.000647
2 NaN 62.040566 23.074472 4.113203 10.702734 0.069024 1.032633 0.226309 0.073349 0.172684 ... NaN NaN NaN NaN 5.096514 1.123871 3.972643 0.174564 0.821948 0.003488
3 NaN 53.926853 29.236092 12.578089 4.123420 0.135547 0.897584 0.286741 0.224299 0.066529 ... NaN NaN NaN NaN 4.970187 0.977686 3.992501 0.622645 0.369366 0.007989
4 NaN 59.268809 25.385215 6.827795 8.012397 0.505784 0.986498 0.248972 0.121757 0.129276 ... NaN NaN NaN NaN 5.041107 1.051093 3.990014 0.311360 0.661178 0.027462
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 NaN 59.050816 26.019549 5.518501 9.383394 0.027740 0.982870 0.255194 0.098409 0.151397 ... NaN NaN NaN NaN 5.084953 1.078119 4.006834 0.244925 0.753609 0.001466
996 NaN 53.598291 29.890896 10.429338 5.912343 0.169132 0.892115 0.293163 0.185981 0.095393 ... NaN NaN NaN NaN 5.046133 1.032568 4.013565 0.488964 0.501595 0.009441
997 NaN 51.959308 30.779872 12.255669 4.929267 0.075884 0.864835 0.301882 0.218549 0.079531 ... NaN NaN NaN NaN 5.038022 1.033939 4.004084 0.576308 0.419444 0.004249
998 NaN 58.209449 26.679834 9.683447 5.410578 0.016691 0.968866 0.261670 0.172680 0.087297 ... NaN NaN NaN NaN 4.934358 0.932326 4.002032 0.496737 0.502243 0.001019
999 NaN 59.241188 25.734320 7.480539 7.336771 0.207182 0.986038 0.252396 0.133397 0.118375 ... NaN NaN NaN NaN 5.002290 1.004402 3.997888 0.356155 0.632100 0.011745

1000 rows × 34 columns

../_images/examples_mineralML_synthetic_data_9_2.png
(<Figure size 1200x600 with 11 Axes>,
              ks_stat        p_value  mean_base  mean_synth  std_base  \
 cation
 Si_cat_8ox  0.323833   5.514531e-89   2.427908    2.487417  0.307028
 Al_cat_8ox  0.284369   2.484052e-68   1.554179    1.510300  0.296910
 Ca_cat_8ox  0.339222   7.681478e-98   0.582427    0.508972  0.308634
 Na_cat_8ox  0.351959  1.613700e-105   0.385243    0.491474  0.268054
 K_cat_8ox   0.342515  8.537683e-100   0.028198    0.010015  0.035713

             std_synth
 cation
 Si_cat_8ox   0.229870
 Al_cat_8ox   0.233183
 Ca_cat_8ox   0.225994
 Na_cat_8ox   0.227782
 K_cat_8ox    0.010432  )
../_images/examples_mineralML_synthetic_data_9_4.png

Kalsilite

Success, this mm.SolidSolutionGenerator works with familiar solid solution minerals. Let’s test wonkier (less common) minerals, such as kalsilite. Kalsilite is a feldspathoid mineral that shares the tridymite framework. There is a Na-K exchange between nepheline and kalsilite. We will have a 4-oxygen basis and check that the synthetic cloud matches the natural data. The steps are as above.
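
The cell below sets mixing_params to a = 1, b = 200, whereas the olivine and plagioclase cells used (1, 1) and (2, 2). A beta distribution has mean a / (a + b), so (1, 200) concentrates the drawn mixing fraction near 0.005, i.e. close to one pure endmember. The sketch below compares the three settings using only the standard library. Which endmember the drawn fraction is assigned to inside SolidSolutionGenerator is not stated here, so that mapping is left open.

```python
import random


def beta_mean(a, b):
    """Analytical mean of a Beta(a, b) distribution."""
    return a / (a + b)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
settings = {"olivine": (1, 1), "plagioclase": (2, 2), "kalsilite": (1, 200)}

sample_means = {}
for name, (a, b) in settings.items():
    draws = [rng.betavariate(a, b) for _ in range(10_000)]
    sample_means[name] = sum(draws) / len(draws)
    print(f"{name}: analytical mean {beta_mean(a, b):.4f}, "
          f"sample mean {sample_means[name]:.4f}")
```

Beta(1, 1) is uniform across the whole solid solution, Beta(2, 2) favors intermediate compositions, and Beta(1, 200) keeps nearly all samples next to a single endmember.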

[6]:
# Pull natural data
df_ks_natural = df_load[df_load["Mineral"]=="Kalsilite"]
ks_calc_natural = mm.KalsiliteCalculator(df_ks_natural)
ks_comp_natural = ks_calc_natural.calculate_components()

# Define endmembers
ks_endmembers = {
    # Kalsilite K[AlSiO₄]
    "Ks": {"K":  1, "Al": 1, "Si": 1, "O": 4},
    # Nepheline Na[AlSiO₄] ~ simplification
    "Ne": {"Na": 1, "Al": 1, "Si": 1, "O": 4}
}

# Specify minor elements
ks_minors = {} # no minors for pure K[AlSiO₄]-Na[AlSiO₄]

# Instantiate generator
gen_ks = mm.SolidSolutionGenerator(
    endmembers = ks_endmembers,
    oxygen_basis = 4,
    minor_elements = ks_minors,
    element_noise_scale = 0.02,
    min_site_fraction = 0.2,
    mixing_dist = "beta",
    mixing_params = {"a": 1, "b": 200},
)

# Generate samples, use the kalsilite calculator to calculate site allocations, etc.
df_ks = gen_ks.generate(n_samples=500)
ks_calc_synth = mm.KalsiliteCalculator(df_ks)
ks_comp_synth = ks_calc_synth.calculate_components()
ks_comp_synth['Mineral'] = 'Kalsilite'
display(ks_comp_synth)

# Calculate and compare the distributions of the output data
stats_ks = gen_ks.compare_distributions(base_df=ks_comp_natural, synth_df=ks_comp_synth, suptitle="Kalsilite")
display(stats_ks)

fig, ax = plt.subplots(1, 4, figsize = (20, 5))
ax = ax.flatten()
ax[0].scatter(ks_comp_natural['Cation_Sum'], ks_comp_natural['Cation_Sum'], s=20, c="g", lw=0.25, ec='k')
ax[0].scatter(ks_comp_synth['Cation_Sum'], ks_comp_synth['Cation_Sum'], s=20, c="r", lw=0.25, ec='k')
ax[0].set_xlabel('Cation_Sum')
ax[0].set_ylabel('Cation_Sum')
ax[1].scatter(ks_comp_natural['A_B_site'], ks_comp_natural['T_site'], s=20, c="g", lw=0.25, ec='k')
ax[1].scatter(ks_comp_synth['A_B_site'], ks_comp_synth['T_site'], s=20, c="r", lw=0.25, ec='k')
ax[1].set_xlabel('A_B_site (K+Na)')
ax[1].set_ylabel('T_site')
ax[2].scatter(ks_comp_natural['K2O'], ks_comp_natural['Na2O'], s=20, c="g", lw=0.25, ec='k')
ax[2].scatter(ks_comp_synth['K2O'], ks_comp_synth['Na2O'], s=20, c="r", lw=0.25, ec='k')
ax[2].set_xlabel('K2O')
ax[2].set_ylabel('Na2O')
ax[3].scatter(ks_comp_natural['SiO2'], ks_comp_natural['Al2O3'], s=20, c="g", lw=0.25, ec='k', label='Natural')
ax[3].scatter(ks_comp_synth['SiO2'], ks_comp_synth['Al2O3'], s=20, c="r", lw=0.25, ec='k', label='Synthetic')
ax[3].set_xlabel('SiO2')
ax[3].set_ylabel('Al2O3')
ax[3].legend()
plt.tight_layout()
plt.show()

Charge mismatch: 9.05 vs 8
Charge mismatch: 6.78 vs 8
Charge mismatch: 9.14 vs 8
Charge mismatch: 8.90 vs 8
Charge mismatch: 8.89 vs 8
Charge mismatch: 9.05 vs 8
Charge mismatch: 8.80 vs 8
Charge mismatch: 7.09 vs 8
Charge mismatch: 8.96 vs 8
Charge mismatch: 7.13 vs 8
Charge mismatch: 7.13 vs 8
Charge mismatch: 6.90 vs 8
Charge mismatch: 9.18 vs 8
Charge mismatch: 7.06 vs 8
Charge mismatch: 7.17 vs 8
Charge mismatch: 8.84 vs 8
Charge mismatch: 8.87 vs 8
Charge mismatch: 8.84 vs 8
Charge mismatch: 8.93 vs 8
Charge mismatch: 8.95 vs 8
Charge mismatch: 8.82 vs 8
Charge mismatch: 8.85 vs 8
Charge mismatch: 7.04 vs 8
Sample SiO2 Al2O3 Na2O K2O SiO2_mols Al2O3_mols Na2O_mols K2O_mols SiO2_ox ... Predict_Mineral Prediction_Score Prediction_Score_Sigma Second_Predict_Mineral Second_Prediction_Score Cation_Sum A_B_site A_site B_site T_site
0 NaN 39.398207 31.499564 0.151414 28.950815 0.655762 0.308940 0.002443 0.307347 1.311525 ... NaN NaN NaN NaN NaN 2.971933 0.972600 0.964930 0.007670 1.999334
1 NaN 37.887370 32.515291 0.019159 29.578180 0.630615 0.318902 0.000309 0.314007 1.261231 ... NaN NaN NaN NaN NaN 2.996622 0.993000 0.992023 0.000977 2.003622
2 NaN 37.966272 32.135924 0.035911 29.861893 0.631929 0.315182 0.000579 0.317019 1.263857 ... NaN NaN NaN NaN NaN 3.003542 1.005455 1.003621 0.001834 1.998088
3 NaN 38.260329 32.347487 0.004657 29.387528 0.636823 0.317257 0.000075 0.311983 1.273646 ... NaN NaN NaN NaN NaN 2.987936 0.983838 0.983601 0.000237 2.004098
4 NaN 36.991351 32.277292 0.002631 30.728727 0.615702 0.316568 0.000042 0.326221 1.231403 ... NaN NaN NaN NaN NaN 3.033241 1.040974 1.040839 0.000135 1.992266
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
495 NaN 37.337217 32.480681 0.025390 30.156712 0.621458 0.318563 0.000410 0.320149 1.242917 ... NaN NaN NaN NaN NaN 3.016399 1.017983 1.016682 0.001301 1.998416
496 NaN 37.672011 31.853838 0.031845 30.442306 0.627031 0.312415 0.000514 0.323180 1.254062 ... NaN NaN NaN NaN NaN 3.020674 1.029643 1.028009 0.001634 1.991030
497 NaN 37.864325 32.902749 0.264994 28.967932 0.630232 0.322703 0.004276 0.307528 1.260464 ... NaN NaN NaN NaN NaN 2.990495 0.981914 0.968450 0.013464 2.008580
498 NaN 38.217503 31.696522 0.261382 29.824593 0.636110 0.310872 0.004217 0.316623 1.272220 ... NaN NaN NaN NaN NaN 3.008357 1.016250 1.002892 0.013358 1.992107
499 NaN 38.211059 32.335294 0.113227 29.340420 0.636003 0.317137 0.001827 0.311483 1.272006 ... NaN NaN NaN NaN NaN 2.991093 0.988075 0.982314 0.005761 2.003018

500 rows × 29 columns

../_images/examples_mineralML_synthetic_data_11_2.png
(<Figure size 1200x600 with 10 Axes>,
             ks_stat   p_value  mean_base  mean_synth  std_base  std_synth
 cation
 Si_cat_4ox    0.054  0.282183   1.000535    0.999837  0.012252   0.012897
 Al_cat_4ox    0.122  0.000094   0.996509    1.000027  0.017107   0.016177
 Na_cat_4ox    0.063  0.139869   0.005185    0.004836  0.006651   0.004619
 K_cat_4ox     0.060  0.178957   0.996766    0.995734  0.020745   0.022908)
../_images/examples_mineralML_synthetic_data_11_4.png