{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\" Created on November 13, 2023 // @author: Sarah Shi \"\"\"\n", "\n", "import os\n", "import numpy as np\n", "import pandas as pd\n", "\n", "import mineralML as mm\n", "\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'png'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Synthetic Mineral Generator" ] }, { "cell_type": "markdown", "metadata": {}, "source": "This notebook shows **how the synthetic mineral generator in mineralML works**, using an example CSV for groundtruthing: `training_hundred.csv`. This is a three-step process:\n1. Load and prepare data for analysis.\n2. Define endmembers and generator settings (e.g., `oxygen_basis`, mixing distribution/parameters, minor elements, noise scales).\n3. Generate synthetic compositions and evaluate them (convert to oxide wt% and cations; optionally use `compare_distributions` to compare against the natural dataset).\n\nWe load the **mineralML** Python package as `mm`. **mineralML** provides trained machine learning models for classifying minerals and is designed to classify and process electron microprobe or quantitative EDS compositions. We remove some degrees of freedom to simplify the process as much as possible. The minerals considered in this study are: Amphibole, Apatite, Biotite, Calcite, Chlorite, Epidote, Feldspar (KFeldspar and Plagioclase), Garnet, Glass, Kalsilite, Leucite, Melilite, Muscovite, Nepheline, Olivine, Pyroxene (Clinopyroxene and Orthopyroxene), Quartz, Rhombohedral_Oxides (Hematite-Ilmenite), Rutile, Serpentine, Spinels (Magnetite-Spinel), Titanite, Tourmaline, and Zircon.\n\nYou need one CSV file containing your electron microprobe analyses in oxide weight percent. Find an example [here](https://github.com/sarahshi/mineralML/blob/main/docs/examples/training_hundred.csv). 
The required oxides are $SiO_2$, $TiO_2$, $Al_2O_3$, $FeO_t$, $MnO$, $MgO$, $CaO$, $Na_2O$, $K_2O$, $Cr_2O_3$, and $P_2O_5$. For oxides not analyzed in a given mineral, preprocessing fills the NaN values with 0." }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load and prepare data for groundtruthing" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read in your dataframe of mineral data (here, TabularData/synth_groundtruth.csv).\n", "# Preparation removes rows with too many NaNs and fills the remaining NaNs with zeros.\n", "\n", "df_load = mm.load_df('TabularData/synth_groundtruth.csv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Examine the prepared dataframe\n", "\n", "display(df_load.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": "# Olivine\n\nLet's apply the generator to olivine, a simple binary solid solution between forsterite (Mg₂SiO₄) and fayalite (Fe₂SiO₄). Olivine systematics are well understood, which makes it a good test case before moving to more complex systems. We keep everything on a 4-oxygen basis, add small amounts of Ca and Mn as minor elements, and then check that the synthetic cloud matches the natural data. The steps are as follows:\n\n1. Define endmembers on a 4-oxygen basis using cation counts per formula unit. Express iron as total iron cations (`Fe2t`) so the framework can convert it to `FeOt` downstream.\n2. Specify minor elements (optional, but more realistic).\n3. Instantiate the generator with `mm.SolidSolutionGenerator`.\n4. Generate synthetic compositions.\n5. Compute site allocations and derived components (here with `mm.OlivineCalculator`).\n6. Compare synthetic and natural distributions with `compare_distributions`.\n7. Plot paired violin distributions for cations (and matching oxides, if present). Report KS statistics (`ks_stat`, `p_value`) plus means and standard deviations. 
A lower `ks_stat` and a higher `p_value` indicate a closer match.\n\nGotchas:\n- Keep iron conventions straight: `Fe2t` (cations) aligns with `FeOt` (oxide). Don't mix FeO/Fe₂O₃ with FeOt in the same row." }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Pull natural data\n", "df_ol_natural = df_load[df_load[\"Mineral\"]==\"Olivine\"]\n", "ol_calc_natural = mm.OlivineCalculator(df_ol_natural)\n", "ol_comp_natural = ol_calc_natural.calculate_components()\n", "\n", "# Define endmembers\n", "ol_endmembers = {\n", " # Forsterite: Mg₂SiO₄\n", " 'Fo': {'Mg': 2, 'Si': 1, 'O': 4},\n", " # Fayalite: Fe₂SiO₄\n", " 'Fa': {'Fe2t': 2, 'Si': 1, 'O': 4}\n", "}\n", "\n", "# Specify minor elements\n", "ol_minors = {\n", " 'Ca': {'distribution': 'exponential', 'scale': 0.01, 'max_fraction': 0.01},\n", " 'Mn': {'distribution': 'exponential', 'scale': 0.01, 'max_fraction': 0.01}\n", "}\n", "\n", "# Instantiate generator\n", "ol_gen = mm.SolidSolutionGenerator(\n", " endmembers=ol_endmembers,\n", " oxygen_basis=4,\n", " element_noise_scale=0.025,\n", " min_site_fraction=0.2,\n", " minor_elements=ol_minors,\n", " mixing_dist='beta',\n", " mixing_params={'a': 1, 'b': 1} # Beta(1, 1) is uniform on [0, 1]\n", ")\n", "\n", "# Generate samples, then use the olivine calculator to compute site allocations and derived components\n", "df_ol = ol_gen.generate(1000)\n", "ol_calc_synth = mm.OlivineCalculator(df_ol)\n", "ol_comp_synth = ol_calc_synth.calculate_components()\n", "display(ol_comp_synth)\n", "\n", "# Calculate and compare the distributions of the output data\n", "stats_ol = ol_gen.compare_distributions(base_df=ol_comp_natural, synth_df=ol_comp_synth, suptitle=\"Olivine\")\n", "display(stats_ol)\n", "\n", "# Scatter plots comparing natural (green) vs. 
synthetic (red) compositions\n", "fig, ax = plt.subplots(1, 3, figsize=(18, 5))\n", "ax[0].scatter(ol_comp_natural[\"FeOt\"], ol_comp_natural[\"MgO\"], s=20, c=\"g\", lw=0.25, ec='k')\n", "ax[0].scatter(ol_comp_synth[\"FeOt\"], ol_comp_synth[\"MgO\"], s=20, c=\"r\", lw=0.25, ec='k')\n", "ax[0].set_xlabel(\"FeOt\")\n", "ax[0].set_ylabel(\"MgO\")\n", "\n", "ax[1].scatter(ol_comp_natural[\"SiO2\"], ol_comp_natural[\"MgO\"], s=20, c=\"g\", lw=0.25, ec='k')\n", "ax[1].scatter(ol_comp_synth[\"SiO2\"], ol_comp_synth[\"MgO\"], s=20, c=\"r\", lw=0.25, ec='k')\n", "ax[1].set_xlabel(\"SiO2\")\n", "ax[1].set_ylabel(\"MgO\")\n", "\n", "ax[2].scatter(ol_comp_natural[\"XFo\"], ol_comp_natural[\"M_site_expanded\"], s=20, c=\"g\", lw=0.25, ec='k', label=\"Natural\")\n", "ax[2].scatter(ol_comp_synth[\"XFo\"], ol_comp_synth[\"M_site_expanded\"], s=20, c=\"r\", lw=0.25, ec='k', label=\"Synthetic\")\n", "ax[2].set_xlabel(\"XFo (Mg/(Mg+Fe))\")\n", "ax[2].set_ylabel(\"M-site Expanded\")\n", "ax[2].legend()\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": "# Feldspar\n\nNext, let's double-check the generator on plagioclase, a simple binary solid solution between albite (NaAlSi₃O₈) and anorthite (CaAl₂Si₂O₈). Plagioclase systematics are also well understood, so it is another useful test before moving to more complex systems. We use an 8-oxygen basis, add small amounts of K as a minor element, and then check that the synthetic cloud matches the natural data. The steps follow the olivine workflow above."
}, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Pull natural data\n", "df_plag_natural = df_load[df_load[\"Mineral\"]==\"Plagioclase\"]\n", "plag_calc_natural = mm.FeldsparCalculator(df_plag_natural)\n", "plag_comp_natural = plag_calc_natural.calculate_components()\n", "\n", "# Define endmembers\n", "plag_endmembers = {\n", " # Albite: NaAlSi₃O₈\n", " 'Ab': {'Na': 1, 'Al': 1, 'Si': 3, 'O': 8},\n", " # Anorthite: CaAl₂Si₂O₈\n", " 'An': {'Ca': 1, 'Al': 2, 'Si': 2, 'O': 8},\n", "}\n", "\n", "# Specify minor elements\n", "plag_minors = {'K': {'distribution': 'exponential', 'scale': 0.01, 'max_fraction': 0.02}}\n", "\n", "# Instantiate generator\n", "plag_gen = mm.SolidSolutionGenerator(\n", " endmembers=plag_endmembers,\n", " oxygen_basis=8,\n", " element_noise_scale=0.05,\n", " min_site_fraction=0.2,\n", " minor_elements=plag_minors,\n", " mixing_dist='beta',\n", " mixing_params={'a': 2, 'b': 2} # Beta(2, 2) is symmetric and peaks at 0.5\n", ")\n", "\n", "# Generate samples, then use the feldspar calculator to compute site allocations and derived components\n", "df_plag = plag_gen.generate(1000)\n", "plag_calc_synth = mm.FeldsparCalculator(df_plag)\n", "plag_comp_synth = plag_calc_synth.calculate_components()\n", "display(plag_comp_synth)\n", "\n", "# Calculate and compare the distributions of the output data\n", "stats_pl = plag_gen.compare_distributions(base_df=plag_comp_natural, synth_df=plag_comp_synth, suptitle=\"Plagioclase\")\n", "display(stats_pl)\n", "\n", "# Scatter plots comparing natural (green) vs. 
synthetic (red) compositions\n", "fig, ax = plt.subplots(1, 3, figsize=(18, 5))\n", "ax[0].scatter(plag_comp_natural[\"Na2O\"], plag_comp_natural[\"CaO\"], s=20, c=\"g\", lw=0.25, ec='k')\n", "ax[0].scatter(plag_comp_synth[\"Na2O\"], plag_comp_synth[\"CaO\"], s=20, c=\"r\", lw=0.25, ec='k')\n", "ax[0].set_xlabel(\"Na2O\")\n", "ax[0].set_ylabel(\"CaO\")\n", "\n", "ax[1].scatter(plag_comp_natural[\"Al2O3\"], plag_comp_natural[\"SiO2\"], s=20, c=\"g\", lw=0.25, ec='k')\n", "ax[1].scatter(plag_comp_synth[\"Al2O3\"], plag_comp_synth[\"SiO2\"], s=20, c=\"r\", lw=0.25, ec='k')\n", "ax[1].set_xlabel(\"Al2O3\")\n", "ax[1].set_ylabel(\"SiO2\")\n", "\n", "ax[2].scatter(plag_comp_natural[\"An\"], plag_comp_natural[\"Ab\"], s=20, c=\"g\", lw=0.25, ec='k', label=\"Natural\")\n", "ax[2].scatter(plag_comp_synth[\"An\"], plag_comp_synth[\"Ab\"], s=20, c=\"r\", lw=0.25, ec='k', label=\"Synthetic\")\n", "ax[2].set_xlabel(\"An (Ca/(Ca+Na))\")\n", "ax[2].set_ylabel(\"Ab (Na/(Ca+Na))\")\n", "ax[2].legend()\n", "plt.tight_layout()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": "# Kalsilite\n\nSuccess: `mm.SolidSolutionGenerator` works with familiar solid-solution minerals. Next, let's test a less common mineral, kalsilite (K[AlSiO₄]). Kalsilite is a feldspathoid with a tridymite-derived framework, and it is related to nepheline by Na-K exchange. We use a 4-oxygen basis and check that the synthetic cloud matches the natural data. The steps follow the olivine workflow above."
}, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Pull natural data \n", "df_ks_natural = df_load[df_load[\"Mineral\"]==\"Kalsilite\"]\n", "ks_calc_natural = mm.KalsiliteCalculator(df_ks_natural)\n", "ks_comp_natural = ks_calc_natural.calculate_components()\n", "\n", "# Define endmembers \n", "ks_endmembers = {\n", " # Kalsilite K[AlSiO₄]\n", " \"Ks\": {\"K\": 1, \"Al\": 1, \"Si\": 1, \"O\": 4},\n", " # Nepheline Na[AlSiO₄] ~ simplification\n", " \"Ne\": {\"Na\": 1, \"Al\": 1, \"Si\": 1, \"O\": 4}\n", "}\n", "\n", "# Specify minor elements\n", "ks_minors = {} # no minors for pure K[AlSiO₄]-Na[AlSiO₄]\n", "\n", "# Instantiate generator\n", "gen_ks = mm.SolidSolutionGenerator(\n", " endmembers = ks_endmembers,\n", " oxygen_basis = 4,\n", " minor_elements = ks_minors,\n", " element_noise_scale = 0.02,\n", " min_site_fraction = 0.2,\n", " mixing_dist = \"beta\",\n", " mixing_params = {\"a\": 1, \"b\": 200},\n", ")\n", "\n", "# Generate samples, use the kalsilite calculator to calculate site allocations, etc. 
\n", "df_ks = gen_ks.generate(n_samples=500)\n", "ks_calc_synth = mm.KalsiliteCalculator(df_ks)\n", "ks_comp_synth = ks_calc_synth.calculate_components()\n", "ks_comp_synth['Mineral'] = 'Kalsilite'\n", "display(ks_comp_synth)\n", "\n", "# Calculate and compare the distributions of the output data\n", "stats_ks = gen_ks.compare_distributions(base_df=ks_comp_natural, synth_df=ks_comp_synth, suptitle=\"Kalsilite\")\n", "display(stats_ks)\n", "\n", "# Scatter plots comparing natural (green) vs. synthetic (red) compositions\n", "fig, ax = plt.subplots(1, 4, figsize=(20, 5))\n", "ax = ax.flatten()\n", "ax[0].scatter(ks_comp_natural['Cation_Sum'], ks_comp_natural['Cation_Sum'], s=20, c=\"g\", lw=0.25, ec='k')\n", "ax[0].scatter(ks_comp_synth['Cation_Sum'], ks_comp_synth['Cation_Sum'], s=20, c=\"r\", lw=0.25, ec='k')\n", "ax[0].set_xlabel('Cation_Sum')\n", "ax[0].set_ylabel('Cation_Sum')\n", "ax[1].scatter(ks_comp_natural['A_B_site'], ks_comp_natural['T_site'], s=20, c=\"g\", lw=0.25, ec='k')\n", "ax[1].scatter(ks_comp_synth['A_B_site'], ks_comp_synth['T_site'], s=20, c=\"r\", lw=0.25, ec='k')\n", "ax[1].set_xlabel('A_B_site (K+Na)')\n", "ax[1].set_ylabel('T_site')\n", "ax[2].scatter(ks_comp_natural['K2O'], ks_comp_natural['Na2O'], s=20, c=\"g\", lw=0.25, ec='k')\n", "ax[2].scatter(ks_comp_synth['K2O'], ks_comp_synth['Na2O'], s=20, c=\"r\", lw=0.25, ec='k')\n", "ax[2].set_xlabel('K2O')\n", "ax[2].set_ylabel('Na2O')\n", "ax[3].scatter(ks_comp_natural['SiO2'], ks_comp_natural['Al2O3'], s=20, c=\"g\", lw=0.25, ec='k', label='Natural')\n", "ax[3].scatter(ks_comp_synth['SiO2'], ks_comp_synth['Al2O3'], s=20, c=\"r\", lw=0.25, ec='k', label='Synthetic')\n", "ax[3].set_xlabel('SiO2')\n", "ax[3].set_ylabel('Al2O3')\n", "ax[3].legend()\n", "plt.tight_layout()\n", "plt.show()\n" ] } ], "metadata": { "kernelspec": { "display_name": "science", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", 
"mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": 
"ipython3", "version": "3.10.18" } }, "nbformat": 4, "nbformat_minor": 2 }