This page was generated from docs/examples/mineralML_synthetic_data.ipynb.

[1]:
""" Created on November 13, 2023 // @author: Sarah Shi """

import os
import numpy as np
import pandas as pd

import mineralML as mm

import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'png'

Synthetic Mineral Generator

This notebook shows how the synthetic mineral generator in mineralML works, using an example CSV of natural analyses for groundtruthing (loaded below from TabularData/synth_groundtruth.csv). This is a three-step process:

  1. Load and prepare data for analysis.

  2. Define endmembers and generator settings (e.g., oxygen_basis, mixing distribution/parameters, minor elements, noise scales).

  3. Generate synthetic compositions and evaluate them (convert to oxide wt% and cations; optionally use compare_distributions to compare against the natural dataset).
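
As a rough picture of step 2, a binary solid solution can be sampled by drawing a mixing fraction from a beta distribution and linearly combining the endmember cation counts. The sketch below is a standalone illustration under that assumption. `mix_binary` and `sample_binary` are hypothetical helpers, not mineralML functions, and the package may handle noise, minor elements, and site constraints differently.

```python
import random

def mix_binary(endmember_a, endmember_b, x):
    """Linearly mix two endmember formulas; x is the mole fraction of endmember_a."""
    elements = set(endmember_a) | set(endmember_b)
    return {
        el: x * endmember_a.get(el, 0.0) + (1.0 - x) * endmember_b.get(el, 0.0)
        for el in sorted(elements)
    }

def sample_binary(endmember_a, endmember_b, n, a=1.0, b=1.0, seed=0):
    """Draw n mixing fractions from Beta(a, b) and mix the two endmembers."""
    rng = random.Random(seed)
    return [
        mix_binary(endmember_a, endmember_b, rng.betavariate(a, b))
        for _ in range(n)
    ]

# Endmember dictionaries in the same style as the cells on this page
fo = {"Mg": 2, "Si": 1, "O": 4}    # forsterite
fa = {"Fe2t": 2, "Si": 1, "O": 4}  # fayalite

samples = sample_binary(fo, fa, n=5)
for s in samples:
    print({el: round(count, 3) for el, count in s.items()})
```

Every mixed formula keeps Mg + Fe2t = 2 and Si = 1 on 4 oxygens, which is the stoichiometric constraint that noise and minor elements then perturb.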

We import the mineralML Python package as mm. mineralML provides trained machine learning models for classifying minerals from electron microprobe or quantitative EDS compositions. We remove some degrees of freedom to simplify the process as much as possible. The minerals considered for this study include: Amphibole, Apatite, Biotite, Calcite, Chlorite, Epidote, Feldspar (KFeldspar and Plagioclase), Garnet, Glass, Kalsilite, Leucite, Melilite, Muscovite, Nepheline, Olivine, Pyroxene (Clinopyroxene and Orthopyroxene), Quartz, Rhombohedral_Oxides (Hematite-Ilmenite), Rutile, Serpentine, Spinels (Magnetite-Spinel), Titanite, Tourmaline, and Zircon.

One CSV file containing your electron microprobe analyses in oxide weight percentages is necessary. Find an example here. The necessary oxides are \(SiO_2\), \(TiO_2\), \(Al_2O_3\), \(FeO_t\), \(MnO\), \(MgO\), \(CaO\), \(Na_2O\), \(K_2O\), \(Cr_2O_3\), and \(P_2O_5\). For oxides that were not analyzed for a given mineral, the preprocessing fills the NaN values with 0.
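
The zero-filling convention can be pictured with a small standalone sketch. `REQUIRED_OXIDES` uses the column names from the example table on this page, and `fill_missing_oxides` is a hypothetical helper written for illustration. mineralML's own preprocessing may differ in detail.

```python
import math

# Column names as they appear in the example table on this page
REQUIRED_OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOt", "MnO", "MgO",
                   "CaO", "Na2O", "K2O", "Cr2O3", "P2O5"]

def fill_missing_oxides(row):
    """Return a copy of row in which every required oxide is present.

    Oxides that are absent or NaN are set to 0.0; other values are kept.
    """
    filled = dict(row)
    for oxide in REQUIRED_OXIDES:
        value = filled.get(oxide)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            filled[oxide] = 0.0
    return filled

# An olivine-like analysis with Cr2O3 not measured and several oxides absent
analysis = {"SiO2": 39.85, "FeOt": 17.40, "MgO": 43.13, "Cr2O3": float("nan")}
clean = fill_missing_oxides(analysis)
print(clean)
```

With a pandas DataFrame, calling `fillna(0)` on the oxide columns has the same effect.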

Load and prepare data for groundtruthing

[2]:
# Read in your dataframe of mineral data (here, synth_groundtruth.csv).
# Prepare the dataframe by removing rows with too many NaNs, and filling in zeros.

df_load = mm.load_df('TabularData/synth_groundtruth.csv')
[3]:
# Examine the prepared dataframe

display(df_load.head())
        Sample Name       SiO2      TiO2     Al2O3       FeOt       MnO        MgO       CaO      Na2O       K2O      P2O5  Cr2O3  Mineral
43110      CN_C_Ol1  39.846040  0.000020  0.019150  17.398750  0.243865  43.126690  0.219630  0.014950  0.007775  0.013685    NaN  Olivine
43111     CN_C_Ol1'  39.787840  0.010270  0.026165  17.446295  0.324905  43.227635  0.188190  0.015305  0.006115  0.017815    NaN  Olivine
43112      CN_C_Ol2  38.896455  0.014615  0.005655  21.791545  0.349310  39.472920  0.204635  0.004635  0.006005  0.013985    NaN  Olivine
43113      CN_C_Ol3  39.451170  0.000020  0.028775  19.528820  0.298085  41.429130  0.231755  0.000010  0.001825  0.019815    NaN  Olivine
43114  CN_C_Ol3_MI2  39.680195  0.006940  0.019405  18.502340  0.303840  42.207405  0.226190  0.018970  0.000010  0.000020    NaN  Olivine

Olivine

Let’s apply the generator to olivine, treated as a simple binary solid solution between forsterite (Mg₂SiO₄) and fayalite (Fe₂SiO₄). Olivine systematics are well understood, so this is a good test before moving to a more complex system. We keep everything on a 4-oxygen basis, add small amounts of Ca and Mn as minor elements, and then check whether the synthetic cloud matches the natural data. The steps are as follows:

  1. Define endmembers (4-oxygen basis). Use cation counts per formula unit, with iron given as total Fe (Fe2t) so that the framework can convert it to FeOt downstream.

  2. Specify minor elements (optional but realistic).

  3. Instantiate the generator with mm.SolidSolutionGenerator.

  4. Generate synthetic compositions.

  5. Compute sites/derived components.

  6. Compare synthetic vs natural distributions with compare_distributions.

  7. Plot paired violin distributions for cations (and matching oxides if present). Report KS statistics (ks_stat, p_value) plus means/stds. Lower ks_stat / higher p_value means a better match.

Gotchas:

  • Keep iron conventions straight: Fe2t (cations) aligns with FeOt (oxide). Don’t mix FeO/Fe₂O₃ with FeOt in the same row.
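
To see how cations per formula unit relate to oxide wt% and to a charge check, here is a minimal standalone sketch. The molar masses and cation valences are standard values. `cations_to_oxide_wt` and `cation_charge` are hypothetical helpers, and mineralML's internal conversion may differ. All iron is treated as Fe2+, matching the Fe2t/FeOt convention.

```python
# element: (oxide name, cations per oxide formula, oxide molar mass in g/mol, cation charge)
OXIDES = {
    "Si":   ("SiO2", 1, 60.084, 4),
    "Mg":   ("MgO",  1, 40.304, 2),
    "Fe2t": ("FeOt", 1, 71.844, 2),  # total iron expressed as FeO
    "Mn":   ("MnO",  1, 70.937, 2),
    "Ca":   ("CaO",  1, 56.077, 2),
}

def cations_to_oxide_wt(cations):
    """Convert cations per formula unit to oxide wt%, normalized to 100."""
    grams = {}
    for element, count in cations.items():
        oxide, cations_per_oxide, molar_mass, _ = OXIDES[element]
        grams[oxide] = count / cations_per_oxide * molar_mass
    total = sum(grams.values())
    return {oxide: 100.0 * g / total for oxide, g in grams.items()}

def cation_charge(cations):
    """Total positive charge carried by the cations."""
    return sum(count * OXIDES[element][3] for element, count in cations.items())

# An Fo80 olivine on a 4-oxygen basis: (Mg1.6 Fe0.4) Si O4
fo80 = {"Mg": 1.6, "Fe2t": 0.4, "Si": 1.0}
oxygen_basis = 4

wt = cations_to_oxide_wt(fo80)
charge = cation_charge(fo80)
print({oxide: round(value, 2) for oxide, value in wt.items()})
print("cation charge:", round(charge, 2), "target:", 2 * oxygen_basis)
```

For a charge-balanced formula the cation charge equals twice the oxygen basis (8 for 4 oxygens). The "vs 8" and "vs 16" in the Charge mismatch messages printed by the cells on this page are consistent with that target, although the exact check that mineralML applies is not shown here.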

[4]:
# Pull natural data
df_ol_natural = df_load[df_load["Mineral"]=="Olivine"]
ol_calc_natural = mm.OlivineCalculator(df_ol_natural)
ol_comp_natural = ol_calc_natural.calculate_components()

# Define endmembers
ol_endmembers = {
    # Forsterite: Mg₂SiO₄
    'Fo': {'Mg': 2, 'Si': 1, 'O': 4},
    # Fayalite: Fe₂SiO₄
    'Fa': {'Fe2t': 2, 'Si': 1, 'O': 4}
}

# Specify minor elements
ol_minors = {
    'Ca': {'distribution': 'exponential', 'scale': 0.01, 'max_fraction': 0.01},
    'Mn': {'distribution': 'exponential', 'scale': 0.01, 'max_fraction': 0.01}
}

# Instantiate generator
ol_gen = mm.SolidSolutionGenerator(
    endmembers=ol_endmembers,
    oxygen_basis=4,
    element_noise_scale=0.025,
    min_site_fraction=0.2,
    minor_elements=ol_minors,
    mixing_dist='beta',
    mixing_params={'a': 1, 'b': 1}
)

# Generate samples, use the olivine calculator to calculate site allocations, etc.
df_ol = ol_gen.generate(1000)
ol_calc_synth = mm.OlivineCalculator(df_ol)
ol_comp_synth = ol_calc_synth.calculate_components()
display(ol_comp_synth)

# Calculate and compare the distributions of the output data
stats_ol = ol_gen.compare_distributions(base_df=ol_comp_natural, synth_df=ol_comp_synth, suptitle="Olivine")
display(stats_ol)

# Scatter‐plot comparing base vs. synthetic oxide proportions
fig, ax = plt.subplots(1, 3, figsize=(18, 5))
ax[0].scatter(ol_comp_natural["FeOt"], ol_comp_natural["MgO"], s=20, c="g", lw=0.25, ec='k')
ax[0].scatter(ol_comp_synth["FeOt"], ol_comp_synth["MgO"], s=20, c="r", lw=0.5, ec='k')
ax[0].set_xlabel("FeOt")
ax[0].set_ylabel("MgO")

ax[1].scatter(ol_comp_natural["SiO2"], ol_comp_natural["MgO"], s=20, c="g", lw=0.25, ec='k')
ax[1].scatter(ol_comp_synth["SiO2"], ol_comp_synth["MgO"], s=20, c="r", lw=0.5, ec='k')
ax[1].set_xlabel("SiO2")
ax[1].set_ylabel("MgO")

# The calculator output stores the forsterite fraction in the "Fo" column; there is no "XFo" column.
ax[2].scatter(ol_comp_natural["Fo"], ol_comp_natural["M_site_expanded"], s=20, c="g", lw=0.25, ec='k', label="Natural")
ax[2].scatter(ol_comp_synth["Fo"], ol_comp_synth["M_site_expanded"], s=20, c="r", lw=0.25, ec='k', label="Synthetic")
ax[2].set_xlabel("Fo (Mg/(Mg+Fe))")
ax[2].set_ylabel("M-site Expanded")
ax[2].legend()
plt.tight_layout()
Charge mismatch: 9.01 vs 8
Charge mismatch: 6.94 vs 8
Charge mismatch: 6.94 vs 8
Charge mismatch: 7.05 vs 8
Charge mismatch: 8.93 vs 8
Charge mismatch: 7.01 vs 8
Charge mismatch: 8.85 vs 8
Charge mismatch: 8.81 vs 8
Charge mismatch: 7.18 vs 8
Charge mismatch: 9.24 vs 8
Charge mismatch: 9.19 vs 8
Charge mismatch: 8.93 vs 8
Charge mismatch: 8.86 vs 8
Charge mismatch: 9.37 vs 8
Charge mismatch: 8.85 vs 8
Charge mismatch: 8.96 vs 8
Charge mismatch: 9.21 vs 8
Charge mismatch: 8.82 vs 8
Charge mismatch: 8.84 vs 8
Charge mismatch: 8.82 vs 8
Charge mismatch: 8.84 vs 8
Charge mismatch: 8.95 vs 8
Charge mismatch: 6.89 vs 8
Charge mismatch: 8.94 vs 8
Charge mismatch: 8.87 vs 8
Charge mismatch: 7.19 vs 8
Charge mismatch: 8.87 vs 8
Charge mismatch: 6.91 vs 8
Charge mismatch: 9.04 vs 8
Charge mismatch: 9.06 vs 8
Charge mismatch: 9.18 vs 8
Charge mismatch: 8.84 vs 8
Charge mismatch: 8.86 vs 8
Charge mismatch: 9.16 vs 8
Charge mismatch: 8.92 vs 8
Charge mismatch: 9.15 vs 8
Charge mismatch: 9.25 vs 8
Charge mismatch: 8.93 vs 8
Charge mismatch: 8.84 vs 8
Charge mismatch: 9.12 vs 8
Charge mismatch: 7.15 vs 8
Charge mismatch: 8.87 vs 8
Charge mismatch: 9.02 vs 8
Charge mismatch: 8.83 vs 8
Charge mismatch: 6.86 vs 8
Charge mismatch: 8.85 vs 8
Sample SiO2 FeOt MnO MgO CaO SiO2_mols FeOt_mols MnO_mols MgO_mols ... Predict_Mineral Prediction_Score Prediction_Score_Sigma Second_Predict_Mineral Second_Prediction_Score Cation_Sum M_site T_site M_site_expanded Fo
0 NaN 38.057656 18.582042 0.101790 42.611955 0.646556 0.633450 0.258644 0.001435 1.057253 ... NaN NaN NaN NaN NaN 3.023871 2.027763 0.976129 2.047742 0.803446
1 NaN 41.232638 4.610927 1.003885 52.534122 0.618427 0.686296 0.064180 0.014152 1.303434 ... NaN NaN NaN NaN NaN 3.007305 1.978189 0.992695 2.014611 0.953072
2 NaN 30.725403 62.024168 0.396335 6.601908 0.252185 0.511408 0.863317 0.005587 0.163801 ... NaN NaN NaN NaN NaN 3.006984 1.994386 0.993016 2.013967 0.159476
3 NaN 34.542700 42.354007 0.516885 22.496470 0.089939 0.574945 0.589527 0.007286 0.558164 ... NaN NaN NaN NaN NaN 3.002901 1.990384 0.997099 2.005803 0.486336
4 NaN 28.552581 70.722991 0.137032 0.437402 0.149994 0.475243 0.984397 0.001932 0.010852 ... NaN NaN NaN NaN NaN 3.025314 2.041180 0.974686 2.050627 0.010904
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 NaN 38.465519 26.446324 0.002434 34.764191 0.321532 0.640238 0.368108 0.000034 0.862541 ... NaN NaN NaN NaN NaN 2.982494 1.955822 1.017506 1.964988 0.700883
996 NaN 41.084110 5.753936 1.076025 52.053290 0.032639 0.683823 0.080089 0.015169 1.291504 ... NaN NaN NaN NaN NaN 3.007150 1.991431 0.992850 2.014299 0.941609
997 NaN 30.447684 67.769091 0.462822 1.254588 0.065814 0.506786 0.943281 0.006524 0.031128 ... NaN NaN NaN NaN NaN 2.984234 1.953038 1.015766 1.968467 0.031945
998 NaN 39.740200 12.603179 0.252967 47.327458 0.076196 0.661455 0.175424 0.003566 1.174250 ... NaN NaN NaN NaN NaN 3.011836 2.016314 0.988164 2.023671 0.870025
999 NaN 30.798780 61.377310 0.221952 7.419692 0.182265 0.512629 0.854314 0.003129 0.184091 ... NaN NaN NaN NaN NaN 3.009432 2.006538 0.990568 2.018864 0.177283

1000 rows × 33 columns

../_images/examples_mineralML_synthetic_data_7_2.png
(<Figure size 1200x600 with 11 Axes>,
                ks_stat        p_value  mean_base  mean_synth  std_base  \
 cation
 Si_cat_4ox    0.207446   1.339550e-36   0.994964    0.995683  0.007191
 Fe2t_cat_4ox  0.717734   0.000000e+00   0.327584    1.005634  0.094729
 Mn_cat_4ox    0.409477  1.305635e-145   0.005196    0.009560  0.002395
 Mg_cat_4ox    0.711328   0.000000e+00   1.664797    0.984216  0.104283
 Ca_cat_4ox    0.353214  2.691866e-107   0.009132    0.009224  0.032880

               std_synth
 cation
 Si_cat_4ox     0.016142
 Fe2t_cat_4ox   0.571602
 Mn_cat_4ox     0.008282
 Mg_cat_4ox     0.572773
 Ca_cat_4ox     0.008208  )
KeyError: 'XFo'

In the rendered run, the third scatter panel raised this KeyError (Cell In[4], line 53) because the plotting code indexed a column named "XFo". The calculator output has no such column: the forsterite fraction is stored as "Fo", the last column of the table above, so the third panel has to index "Fo" to plot.
../_images/examples_mineralML_synthetic_data_7_5.png
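
The ks_stat column reported above is a two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical cumulative distribution functions of the natural and synthetic values. The sketch below computes that statistic from scratch so the number is easier to read. `ks_statistic` is a hypothetical helper and the sample values are made up for illustration. How compare_distributions computes its statistics internally is not shown on this page; `scipy.stats.ks_2samp` is a standard routine that returns both the statistic and a p-value.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max over x of |ECDF_a(x) - ECDF_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # Both ECDFs are step functions, so the maximum gap occurs at a data point.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap

# Illustrative values only: a narrow "natural" Fe distribution
# against a "synthetic" one that spans the whole solid solution.
natural_fe = [0.30, 0.32, 0.33, 0.35]
synthetic_fe = [0.10, 0.50, 1.00, 1.50]

print(ks_statistic(natural_fe, synthetic_fe))
print(ks_statistic(natural_fe, natural_fe))
```

Read this way, the olivine table is informative: Si has a small ks_stat (about 0.21), while Fe2t and Mg sit near 0.71 to 0.72. The natural olivines are Mg-rich (mean Mg about 1.66 and Fe about 0.33 cations), whereas mixing_params={'a': 1, 'b': 1} spreads the synthetic samples across the whole forsterite-fayalite range (mean Fe about 1.01), so the two distributions differ by construction.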

Feldspar

Let’s check the generator on a second system: plagioclase, treated as a simple binary solid solution between albite (NaAlSi₃O₈) and anorthite (CaAl₂Si₂O₈). Plagioclase feldspar systematics are also well understood, which makes this a useful test before moving to a more complex system. We use an 8-oxygen basis, add a small amount of K as a minor element, and then check whether the synthetic cloud matches the natural data. The steps are as above.

[5]:
# Pull natural data
df_plag_natural = df_load[df_load["Mineral"]=="Plagioclase"]
plag_calc_natural = mm.FeldsparCalculator(df_plag_natural)
plag_comp_natural = plag_calc_natural.calculate_components()

# Define endmembers
plag_endmembers = {
    # Albite: NaAlSi₃O₈
    'Ab': {'Na': 1, 'Al': 1, 'Si': 3, 'O': 8},
    # Anorthite: CaAl₂Si₂O₈
    'An': {'Ca': 1, 'Al': 2, 'Si': 2, 'O': 8},
}

# Specify minor elements
plag_minors = {'K': {'distribution': 'exponential', 'scale': 0.01, 'max_fraction': 0.02}}

# Instantiate generator
plag_gen = mm.SolidSolutionGenerator(
    endmembers=plag_endmembers,
    oxygen_basis=8,
    element_noise_scale=0.05,
    min_site_fraction=0.2,
    minor_elements=plag_minors,
    mixing_dist='beta',
    mixing_params={'a': 2, 'b': 2}
)

# Generate samples
df_plag = plag_gen.generate(1000)
plag_calc_synth = mm.FeldsparCalculator(df_plag)
plag_comp_synth = plag_calc_synth.calculate_components()
display(plag_comp_synth)

# Calculate and compare the distributions of the output data
stats_pl = plag_gen.compare_distributions(base_df=plag_comp_natural, synth_df=plag_comp_synth, suptitle="Plagioclase")
display(stats_pl)

# Scatter‐plot comparing base vs. synthetic oxide proportions
fig, ax = plt.subplots(1, 3, figsize=(18, 5))
ax[0].scatter(plag_comp_natural["Na2O"], plag_comp_natural["CaO"], s=20, c="g", lw=0.25, ec='k')
ax[0].scatter(plag_comp_synth["Na2O"], plag_comp_synth["CaO"], s=20, c="r", lw=0.25, ec='k')
ax[0].set_xlabel("Na2O")
ax[0].set_ylabel("CaO")

ax[1].scatter(plag_comp_natural["Al2O3"], plag_comp_natural["SiO2"], s=20, c="g", lw=0.25, ec='k')
ax[1].scatter(plag_comp_synth["Al2O3"], plag_comp_synth["SiO2"], s=20, c="r", lw=0.25, ec='k')
ax[1].set_xlabel("Al2O3")
ax[1].set_ylabel("SiO2")

ax[2].scatter(plag_comp_natural["An"], plag_comp_natural["Ab"], s=20, c="g", lw=0.25, ec='k', label="Natural")
ax[2].scatter(plag_comp_synth["An"], plag_comp_synth["Ab"], s=20, c="r", lw=0.25, ec='k', label="Synthetic")
ax[2].set_xlabel("An (Ca/(Ca+Na+K))")
ax[2].set_ylabel("Ab (Na/(Ca+Na+K))")
ax[2].legend()
plt.tight_layout()

Charge mismatch: 17.60 vs 16
Charge mismatch: 17.68 vs 16
Charge mismatch: 18.45 vs 16
Charge mismatch: 18.23 vs 16
Charge mismatch: 17.87 vs 16
Charge mismatch: 14.39 vs 16
Charge mismatch: 13.86 vs 16
Charge mismatch: 14.35 vs 16
Charge mismatch: 13.66 vs 16
Charge mismatch: 18.58 vs 16
Charge mismatch: 14.23 vs 16
Charge mismatch: 17.81 vs 16
Charge mismatch: 14.24 vs 16
Charge mismatch: 17.98 vs 16
Charge mismatch: 14.37 vs 16
Charge mismatch: 14.35 vs 16
Charge mismatch: 17.69 vs 16
Charge mismatch: 14.40 vs 16
Charge mismatch: 14.23 vs 16
Charge mismatch: 14.27 vs 16
Charge mismatch: 17.70 vs 16
Charge mismatch: 14.37 vs 16
Charge mismatch: 17.69 vs 16
Charge mismatch: 17.74 vs 16
Charge mismatch: 18.45 vs 16
Charge mismatch: 18.14 vs 16
Charge mismatch: 14.25 vs 16
Charge mismatch: 18.44 vs 16
Charge mismatch: 14.37 vs 16
Charge mismatch: 17.71 vs 16
Charge mismatch: 14.31 vs 16
Charge mismatch: 18.08 vs 16
Charge mismatch: 17.65 vs 16
Charge mismatch: 17.73 vs 16
Charge mismatch: 14.30 vs 16
Charge mismatch: 18.18 vs 16
Charge mismatch: 17.80 vs 16
Charge mismatch: 17.79 vs 16
Charge mismatch: 14.36 vs 16
Charge mismatch: 14.17 vs 16
Charge mismatch: 17.78 vs 16
Charge mismatch: 18.61 vs 16
Charge mismatch: 17.82 vs 16
Charge mismatch: 13.83 vs 16
Charge mismatch: 17.73 vs 16
Charge mismatch: 18.10 vs 16
Charge mismatch: 17.76 vs 16
Charge mismatch: 17.66 vs 16
Charge mismatch: 13.88 vs 16
Charge mismatch: 18.17 vs 16
Sample SiO2 Al2O3 CaO Na2O K2O SiO2_mols Al2O3_mols CaO_mols Na2O_mols ... Prediction_Score Prediction_Score_Sigma Second_Predict_Mineral Second_Prediction_Score Cation_Sum M_site T_site An Ab Or
0 NaN 59.897825 25.019196 7.890367 6.956200 0.236413 0.996968 0.245382 0.140705 0.112235 ... NaN NaN NaN NaN 4.978481 0.991969 3.986512 0.380084 0.606357 0.013559
1 NaN 50.954046 30.840758 15.155543 2.589603 0.460050 0.848103 0.302479 0.270261 0.041782 ... NaN NaN NaN NaN 4.976163 0.995951 3.980212 0.743307 0.229828 0.026865
2 NaN 55.272688 28.753490 9.867316 5.854656 0.251851 0.919985 0.282008 0.175959 0.094462 ... NaN NaN NaN NaN 5.012980 1.000931 4.012048 0.475269 0.510288 0.014443
3 NaN 60.255596 24.377683 7.676166 7.524016 0.166539 1.002923 0.239091 0.136885 0.121396 ... NaN NaN NaN NaN 4.999567 1.027670 3.971897 0.357203 0.633570 0.009227
4 NaN 55.247496 28.611137 10.669933 5.094758 0.376676 0.919566 0.280611 0.190272 0.082201 ... NaN NaN NaN NaN 4.986643 0.981045 4.005599 0.524638 0.453310 0.022052
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 NaN 55.367246 28.637984 9.755576 6.044619 0.194574 0.921559 0.280875 0.173966 0.097527 ... NaN NaN NaN NaN 5.018645 1.008756 4.009888 0.466208 0.522721 0.011071
996 NaN 57.519438 26.860582 6.753130 8.712587 0.154263 0.957381 0.263442 0.120425 0.140573 ... NaN NaN NaN NaN 5.092421 1.091334 4.001086 0.297458 0.694451 0.008090
997 NaN 50.354261 31.377032 15.659059 2.278380 0.331268 0.838120 0.307739 0.279240 0.036761 ... NaN NaN NaN NaN 4.969945 0.986086 3.983859 0.776109 0.204342 0.019549
998 NaN 64.693460 22.247595 3.837613 8.732941 0.488392 1.076789 0.218199 0.068434 0.140902 ... NaN NaN NaN NaN 4.959268 0.954400 4.004868 0.189775 0.781469 0.028756
999 NaN 53.919921 28.599028 12.483694 4.795270 0.202086 0.897469 0.280493 0.222615 0.077369 ... NaN NaN NaN NaN 5.009550 1.039004 3.970547 0.583305 0.405452 0.011243

1000 rows × 34 columns

../_images/examples_mineralML_synthetic_data_9_2.png
(<Figure size 1200x600 with 11 Axes>,
              ks_stat        p_value  mean_base  mean_synth  std_base  \
 cation
 Si_cat_8ox  0.333277   2.295830e-94   2.427908    2.501358  0.307028
 Al_cat_8ox  0.312092   1.557395e-82   1.554179    1.495628  0.296910
 Ca_cat_8ox  0.331242   3.424084e-93   0.582427    0.499233  0.308634
 Na_cat_8ox  0.340598   1.178878e-98   0.385243    0.498897  0.268054
 K_cat_8ox   0.343153  3.548568e-100   0.028198    0.010322  0.035713

             std_synth
 cation
 Si_cat_8ox   0.232774
 Al_cat_8ox   0.236659
 Ca_cat_8ox   0.227723
 Na_cat_8ox   0.229164
 K_cat_8ox    0.010488  )
../_images/examples_mineralML_synthetic_data_9_4.png
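
The An, Ab, and Or columns in the table above sum to 1 in every row, which is what a ternary normalization over Ca, Na, and K gives. The sketch below reproduces the first synthetic row from its oxide wt% values. `feldspar_components` is a hypothetical helper using standard molar masses, and FeldsparCalculator may differ in detail.

```python
# Oxide molar masses (g/mol) and cations per oxide formula
MOLAR_MASS = {"CaO": 56.077, "Na2O": 61.979, "K2O": 94.196}
CATIONS_PER_OXIDE = {"CaO": 1, "Na2O": 2, "K2O": 2}

def feldspar_components(oxide_wt):
    """Return An, Ab, Or as cation fractions of Ca, Na, K (summing to 1)."""
    moles = {
        oxide: oxide_wt.get(oxide, 0.0) / MOLAR_MASS[oxide] * CATIONS_PER_OXIDE[oxide]
        for oxide in MOLAR_MASS
    }
    total = sum(moles.values())
    return {
        "An": moles["CaO"] / total,
        "Ab": moles["Na2O"] / total,
        "Or": moles["K2O"] / total,
    }

# Oxide wt% from row 0 of the synthetic plagioclase table above
row0 = {"CaO": 7.890367, "Na2O": 6.956200, "K2O": 0.236413}
components = feldspar_components(row0)
print({name: round(value, 4) for name, value in components.items()})
```

The result agrees with the tabulated An = 0.380084, Ab = 0.606357, and Or = 0.013559 to about four decimal places, so An here is Ca/(Ca+Na+K) rather than Ca/(Ca+Na).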

Kalsilite

The mm.SolidSolutionGenerator handles familiar solid-solution minerals. Let’s test a less common mineral: kalsilite. Kalsilite is a feldspathoid whose framework, like that of nepheline, is derived from tridymite, and Na-K exchange links kalsilite and nepheline. We use a 4-oxygen basis and check whether the synthetic cloud matches the natural data. The steps are as above.

[6]:
# Pull natural data
df_ks_natural = df_load[df_load["Mineral"]=="Kalsilite"]
ks_calc_natural = mm.KalsiliteCalculator(df_ks_natural)
ks_comp_natural = ks_calc_natural.calculate_components()

# Define endmembers
ks_endmembers = {
    # Kalsilite K[AlSiO₄]
    "Ks": {"K":  1, "Al": 1, "Si": 1, "O": 4},
    # Nepheline Na[AlSiO₄] ~ simplification
    "Ne": {"Na": 1, "Al": 1, "Si": 1, "O": 4}
}

# Specify minor elements
ks_minors = {} # no minors for pure K[AlSiO₄]-Na[AlSiO₄]

# Instantiate generator
gen_ks = mm.SolidSolutionGenerator(
    endmembers = ks_endmembers,
    oxygen_basis = 4,
    minor_elements = ks_minors,
    element_noise_scale = 0.02,
    min_site_fraction = 0.2,
    mixing_dist = "beta",
    mixing_params = {"a": 1, "b": 200},
)

# Generate samples, use the kalsilite calculator to calculate site allocations, etc.
df_ks = gen_ks.generate(n_samples=500)
ks_calc_synth = mm.KalsiliteCalculator(df_ks)
ks_comp_synth = ks_calc_synth.calculate_components()
ks_comp_synth['Mineral'] = 'Kalsilite'
display(ks_comp_synth)

# Calculate and compare the distributions of the output data
stats_ks = gen_ks.compare_distributions(base_df=ks_comp_natural, synth_df=ks_comp_synth, suptitle="Kalsilite")
display(stats_ks)

fig, ax = plt.subplots(1, 4, figsize = (20, 5))
ax = ax.flatten()
ax[0].scatter(ks_comp_natural['Cation_Sum'], ks_comp_natural['Cation_Sum'], s=20, c="g", lw=0.25, ec='k')
ax[0].scatter(ks_comp_synth['Cation_Sum'], ks_comp_synth['Cation_Sum'], s=20, c="r", lw=0.25, ec='k')
ax[0].set_xlabel('Cation_Sum')
ax[0].set_ylabel('Cation_Sum')
ax[1].scatter(ks_comp_natural['A_B_site'], ks_comp_natural['T_site'], s=20, c="g", lw=0.25, ec='k')
ax[1].scatter(ks_comp_synth['A_B_site'], ks_comp_synth['T_site'], s=20, c="r", lw=0.25, ec='k')
ax[1].set_xlabel('A_B_site (K+Na)')
ax[1].set_ylabel('T_site')
ax[2].scatter(ks_comp_natural['K2O'], ks_comp_natural['Na2O'], s=20, c="g", lw=0.25, ec='k')
ax[2].scatter(ks_comp_synth['K2O'], ks_comp_synth['Na2O'], s=20, c="r", lw=0.25, ec='k')
ax[2].set_xlabel('K2O')
ax[2].set_ylabel('Na2O')
ax[3].scatter(ks_comp_natural['SiO2'], ks_comp_natural['Al2O3'], s=20, c="g", lw=0.25, ec='k', label='Natural')
ax[3].scatter(ks_comp_synth['SiO2'], ks_comp_synth['Al2O3'], s=20, c="r", lw=0.25, ec='k', label='Synthetic')
ax[3].set_xlabel('SiO2')
ax[3].set_ylabel('Al2O3')
ax[3].legend()
plt.tight_layout()
plt.show()

Charge mismatch: 8.88 vs 8
Charge mismatch: 8.87 vs 8
Charge mismatch: 8.97 vs 8
Charge mismatch: 8.81 vs 8
Charge mismatch: 9.11 vs 8
Charge mismatch: 8.85 vs 8
Charge mismatch: 8.86 vs 8
Charge mismatch: 8.98 vs 8
Charge mismatch: 7.19 vs 8
Charge mismatch: 8.81 vs 8
Charge mismatch: 8.97 vs 8
Charge mismatch: 8.85 vs 8
Charge mismatch: 8.97 vs 8
Charge mismatch: 9.29 vs 8
Charge mismatch: 8.86 vs 8
Sample SiO2 Al2O3 Na2O K2O SiO2_mols Al2O3_mols Na2O_mols K2O_mols SiO2_ox ... Predict_Mineral Prediction_Score Prediction_Score_Sigma Second_Predict_Mineral Second_Prediction_Score Cation_Sum A_B_site A_site B_site T_site
0 NaN 38.159790 31.795911 0.096860 29.947439 0.635150 0.311847 0.001563 0.317927 1.270299 ... NaN NaN NaN NaN NaN 3.006059 1.012112 1.007162 0.004951 1.993947
1 NaN 38.191066 31.424981 0.297396 30.086556 0.635670 0.308209 0.004798 0.319404 1.271340 ... NaN NaN NaN NaN NaN 3.016452 1.029144 1.013912 0.015232 1.987308
2 NaN 38.052940 31.930824 0.357440 29.658796 0.633371 0.313170 0.005767 0.314863 1.266742 ... NaN NaN NaN NaN NaN 3.009196 1.015100 0.996841 0.018258 1.994096
3 NaN 37.921714 32.742921 0.165985 29.169380 0.631187 0.321135 0.002678 0.309667 1.262374 ... NaN NaN NaN NaN NaN 2.991417 0.984491 0.976050 0.008441 2.006926
4 NaN 38.716823 31.910334 0.057868 29.314976 0.644421 0.312969 0.000934 0.311213 1.288842 ... NaN NaN NaN NaN NaN 2.983826 0.983178 0.980237 0.002941 2.000648
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
495 NaN 38.314994 32.429322 0.053727 29.201957 0.637733 0.318059 0.000867 0.310013 1.275466 ... NaN NaN NaN NaN NaN 2.984599 0.978947 0.976217 0.002730 2.005652
496 NaN 38.593452 31.805957 0.116333 29.484258 0.642368 0.311945 0.001877 0.313010 1.284735 ... NaN NaN NaN NaN NaN 2.991225 0.993545 0.987623 0.005922 1.997680
497 NaN 38.347049 31.655401 0.024592 29.972958 0.638266 0.310469 0.000397 0.318198 1.276533 ... NaN NaN NaN NaN NaN 3.002363 1.008796 1.007539 0.001256 1.993568
498 NaN 37.776901 32.535484 0.058867 29.628748 0.628777 0.319100 0.000950 0.314544 1.257553 ... NaN NaN NaN NaN NaN 3.000321 0.997470 0.994467 0.003003 2.002851
499 NaN 37.999079 32.048646 0.145399 29.806876 0.632475 0.314326 0.002346 0.316435 1.264949 ... NaN NaN NaN NaN NaN 3.005789 1.009316 1.001888 0.007428 1.996474

500 rows × 29 columns

../_images/examples_mineralML_synthetic_data_11_2.png
(<Figure size 1200x600 with 10 Axes>,
             ks_stat   p_value  mean_base  mean_synth  std_base  std_synth
 cation
 Si_cat_4ox    0.098  0.003224   1.000535    1.001127  0.012252   0.012679
 Al_cat_4ox    0.071  0.068304   0.996509    0.998752  0.017107   0.016009
 Na_cat_4ox    0.086  0.014132   0.005185    0.005302  0.006651   0.005168
 K_cat_4ox     0.107  0.000937   0.996766    0.993933  0.020745   0.022256)
../_images/examples_mineralML_synthetic_data_11_4.png
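
The choice of mixing_params controls where the synthetic samples sit along the solid solution. A Beta(a, b) distribution has mean a/(a+b), so a=1, b=200 gives a mean of about 0.005, in line with the Na_cat_4ox mean_synth of 0.0053 in the kalsilite table above. That comparison assumes the beta draw sets the nepheline fraction, which is an inference from the output rather than documented behavior. The sketch below shows the mean and variance formulas and a method-of-moments estimate that could serve as a starting point for choosing a and b from a natural dataset. The function names are hypothetical and not part of mineralML.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def beta_from_moments(mean, variance):
    """Method-of-moments estimate of (a, b) from a sample mean and variance.

    Valid only for 0 < mean < 1 and 0 < variance < mean * (1 - mean).
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    common = mean * (1.0 - mean) / variance - 1.0
    return mean * common, (1.0 - mean) * common

# Mixing settings used in the kalsilite and plagioclase cells on this page
print(beta_mean(1, 200))   # mean mixing fraction for a=1, b=200
print(beta_mean(2, 2))     # symmetric case, centered on the middle of the join

# Round trip: recover (a, b) from the moments of Beta(2, 2)
a_est, b_est = beta_from_moments(beta_mean(2, 2), beta_variance(2, 2))
print(round(a_est, 6), round(b_est, 6))
```

Applied to a natural mole-fraction column (for example Fo or An), the moment estimate gives a first guess for mixing_params that can then be checked with compare_distributions.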