mineralML Documentation
Core Functions
- class mineralML.core.LabelDataset(x, labels)[source]
A PyTorch Dataset subclass designed to contain features and labels for machine learning. It verifies and maintains the input features as a float tensor and labels as a long tensor in a shape that’s compatible with model training requirements. If the input data are not already in a 2D shape, the data are reshaped to ensure compatibility with PyTorch’s batch processing.
- Parameters:
x (ndarray) – The array of input features, expected to be a 2D array (samples by features).
labels (ndarray) – The array of labels corresponding to the input data, expected to be a 1D array.
- mineralML.core.export_predictions_to_excel(results_df, filename='prediction_results.xlsx')[source]
Export prediction results to an Excel workbook with one sheet called “All” containing all rows, and additional sheets for each predicted mineral.
- mineralML.core.load_df(filepath, index_col=0, **kwargs)[source]
Loads a DataFrame from a CSV or Excel file specified by the given file path. By default, the first column of the file is set as the index of the DataFrame.
- Parameters:
filepath (str) – The path to the CSV or Excel file to be loaded.
index_col (int | str | None) – Column to use as the row labels of the DataFrame. Defaults to 0.
**kwargs – Passed through to the pandas reader: pd.read_csv for CSV files, pd.read_excel for Excel files.
- Returns:
Pandas DataFrame containing the data from the CSV file.
- Return type:
df (DataFrame)
- mineralML.core.load_model(model, optimizer=None, path='')[source]
Loads a model’s state and optionally an optimizer’s state from a saved checkpoint file. The function updates the model’s parameters with those found in the checkpoint and, if an optimizer is provided, also updates the optimizer’s state.
- Parameters:
model (nn.Module) – The PyTorch model to which the saved state will be loaded.
optimizer (torch.optim.Optimizer, optional) – The optimizer for which the state is to be loaded. If None, only the model state is loaded. Defaults to None.
path (str) – The path to the file containing the saved checkpoint. The checkpoint file should have a dictionary containing ‘params’ and ‘optimizer’ keys.
It is assumed that the checkpoint file at the specified ‘path’ is accessible and contains a valid state dictionary for the model and, optionally, the optimizer.
- mineralML.core.load_scaler(scaler_path)[source]
Loads a pre-fitted scaler’s mean and std from a .npz file. This scaler is a StandardScaler for normalizing or standardizing input data before passing it to a machine learning model.
- Returns:
The mean and std from the scaler object ‘scaler_ae/nn.npz’.
- Return type:
mean, std (pandas Series)
- Raises:
FileNotFoundError – If ‘scaler_ae/nn.npz’ is not found in the expected directory.
Exception – Propagates any exception raised during the scaler loading process.
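As a rough illustration of what a mean/std scaler file can look like, the sketch below writes and reads an .npz archive with NumPy and applies z-score standardization. The key names 'mean' and 'std', the helper name standardize, and the numbers are assumptions made for illustration; they are not taken from the bundled scaler files.

```python
import io

import numpy as np


def standardize(x, mean, std):
    """Apply z-score scaling column by column: (x - mean) / std."""
    return (x - mean) / std


# Hypothetical training matrix: rows are analyses, columns are oxides.
train = np.array([[50.0, 1.0], [52.0, 3.0], [54.0, 2.0]])

# Save the fitted statistics to an in-memory .npz archive (key names assumed).
buffer = io.BytesIO()
np.savez(buffer, mean=train.mean(axis=0), std=train.std(axis=0))
buffer.seek(0)

# Load the statistics back and standardize the same data.
with np.load(buffer) as archive:
    mean, std = archive["mean"], archive["std"]

scaled = standardize(train, mean, std)
```

After scaling with statistics fitted on the same data, every column has zero mean and unit standard deviation.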
- mineralML.core.same_seeds(seed)[source]
Sets the seed for generating random numbers to the provided value for various libraries including PyTorch, NumPy, and Python’s random module to ensure reproducibility across multiple runs. It also sets the CuDNN backend to operate in a deterministic mode. This function is helpful for debugging and to ensure that experimental runs are repeatable with the same sequence of random numbers being generated each time. It is particularly useful when working with stochastic processes in machine learning experiments where reproducibility is crucial.
- Parameters:
seed (int) – The seed value to use for all random number generators.
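A minimal sketch of the seeding pattern described above, not the library's own code. The helper name seed_everything is hypothetical, and the PyTorch import is guarded so the example also runs where PyTorch is not installed.

```python
import random

import numpy as np


def seed_everything(seed):
    """Seed Python's random module and NumPy; seed PyTorch too if installed."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch is not installed, so there is nothing more to seed.
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # Safe to call without a GPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def draw():
    """Return one draw from each seeded generator."""
    return random.random(), float(np.random.rand())


seed_everything(42)
first = draw()
seed_everything(42)
second = draw()
```

Reseeding with the same value reproduces the same draws, which is what makes repeated runs comparable.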
- mineralML.core.save_model_nn(optimizer, best_model_state, path)[source]
Saves the state dictionary of a neural network’s best model along with the state of its optimizer to a file. The checkpoint is saved as a dictionary with ‘params’ holding the model state and ‘optimizer’ holding the optimizer state. The saved file can be used to load the model and continue training or for evaluation without the need to retrain the model from scratch.
- mineralML.core.weights_init(m)[source]
Applies an initialization scheme to the weights and biases of a Batch Normalization layer in a neural network. If the module ‘m’ is of the class ‘BatchNorm’, it initializes the layer’s weights with a normal distribution centered around 1.0 with a standard deviation of 0.02, and sets the biases to 0.
- Parameters:
m (nn.Module) – The module to initialize.
This function is typically passed to the apply method of nn.Module when initializing the weights of a neural network.
Sequential, Transfer-Learning Machine Learning Functions
- class mineralML.hybrid.FeatureExtractor(input_dim=11, classes=23, hidden_layer_sizes=[64, 32, 16], dropout_rate=0.1, use_bayesian_feature_layer=True, use_bayesian_classifier=False)[source]
Stage A classifier: extracts features and returns logits, optionally with the intermediate feature embedding h.
- Parameters:
input_dim (int) – Number of input oxide features.
classes (int) – Number of output mineral classes.
dropout_rate (float) – Dropout probability (0.0 = no dropout).
use_bayesian_feature_layer (bool) – If True, the final feature layer is a VariationalLayer instead of a standard Linear layer.
use_bayesian_classifier (bool) – If True, the classification head is a VariationalLayer.
- class mineralML.hybrid.LatentProjector(feat_dim, hidden=32, dropout_rate=0.0, nonlinear=True)[source]
Stage B: trainable mapper from feature embedding h to a 2D latent space z2.
- class mineralML.hybrid.ReconstructionDecoder(z_dim, output_dim, decoder_hidden_sizes=[64, 32], dropout_rate=0.0)[source]
Stage B: trainable decoder from 2D latent z2 back to oxide space x.
- class mineralML.hybrid.ReconstructionWrapper(classifier: FeatureExtractor, mapper2d: LatentProjector, decoder: ReconstructionDecoder)[source]
Inference wrapper combining classifier, mapper, and decoder. Returns (logits, reconstructed oxides, z2) on forward pass.
- Parameters:
classifier (FeatureExtractor) – Trained Stage A classifier.
mapper2d (LatentProjector) – Trained Stage B latent projector.
decoder (ReconstructionDecoder) – Trained Stage B decoder.
- class mineralML.hybrid.VariationalLayer(in_features, out_features)[source]
Bayesian linear layer using variational inference. Models weights and biases as Gaussian distributions rather than point estimates, enabling uncertainty quantification through weight sampling.
- Parameters:
- weight_mu
Mean of the weight distributions.
- Type:
Parameter
- weight_rho
Unconstrained std parameters for weight distributions.
- Type:
Parameter
- bias_mu
Mean of the bias distributions.
- Type:
Parameter
- bias_rho
Unconstrained std parameters for bias distributions.
- Type:
Parameter
- softplus
Softplus activation ensuring positive standard deviations.
- Type:
nn.Softplus
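The attributes above suggest the usual reparameterization pattern for variational layers: a positive standard deviation is obtained as softplus(rho), and a weight sample is drawn as mu + sigma * eps. The NumPy sketch below illustrates that idea under this assumption; the function names are hypothetical and the code is not the layer's actual forward pass.

```python
import numpy as np


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); the result is always positive."""
    return np.logaddexp(0.0, x)


def sample_weights(weight_mu, weight_rho, rng):
    """Draw one weight sample as mu + softplus(rho) * eps with eps ~ N(0, 1)."""
    sigma = softplus(weight_rho)
    eps = rng.standard_normal(weight_mu.shape)
    return weight_mu + sigma * eps


rng = np.random.default_rng(0)
weight_mu = np.zeros((3, 4))        # Means, shape (out_features, in_features).
weight_rho = np.full((3, 4), -3.0)  # Unconstrained std parameters.

x = np.ones(4)  # A single input vector.

# Repeated forward passes differ because the weights are resampled each time.
outputs = np.stack(
    [sample_weights(weight_mu, weight_rho, rng) @ x for _ in range(200)]
)
spread = outputs.std(axis=0)
```

The spread of the outputs across passes is what a Monte Carlo prediction loop can turn into an uncertainty estimate.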
- mineralML.hybrid.balance(df, n=1000)[source]
Groups to 2000 total:
- Pyroxene group (clinopyroxene + orthopyroxene -> ‘pyroxene’), kmeans for representative sampling
- Feldspar group (plagioclase + k-feldspar -> ‘feldspar’), kmeans for representative sampling
- Olivine, kmeans for representative sampling
- Amphibole, kmeans for representative sampling to capture tremolite and actinolite
- Rhombohedral oxide group (hematite + ilmenite -> ‘rhombohedral oxide’)
- Spinel group (magnetite + spinel -> ‘spinel’)
- Glass (separate group with 2000 samples), TAS stratified sampling
Groups to 1000 total:
- Garnet group
- All other classes get standard n samples (default 1000). If count <1250, shuffle+oversample.
- Parameters:
df (pd.DataFrame) – Input DataFrame with a ‘Mineral’ column and oxide columns.
n (int) – Base target sample count per member class (default 1000).
- Returns:
Resampled DataFrame with balanced class counts.
- Return type:
df_balanced (pd.DataFrame)
- mineralML.hybrid.build_model_from_config(model_config, device=None)[source]
Build a ReconstructionWrapper from a saved model_config dictionary.
- mineralML.hybrid.class2mineral(pred_class)[source]
Translates predicted class codes into mineral names using a mapping from the trained neural network.
- Parameters:
pred_class (array-like) – Array of predicted class codes (integers).
- Returns:
Array of mineral names corresponding to the predicted class codes.
- Return type:
pred_mineral (ndarray)
- mineralML.hybrid.compute_z2_from_df(df, wrapper, batch_size=256, device=None)[source]
Computes 2D latent representations (z2) and predicted class labels for a DataFrame.
- Parameters:
df (pd.DataFrame) – Input DataFrame with oxide columns.
wrapper (ReconstructionWrapper) – Loaded model wrapper.
batch_size (int) – Batch size for inference.
device (str|None) – Device string. If None, uses the wrapper’s current device.
- Returns:
(N, 2) array of 2D latent coordinates. Preds_out (ndarray): (N,) array of predicted class indices.
- Return type:
Z2_out (ndarray)
- mineralML.hybrid.convert_fe_to_feot(df)[source]
Handle inconsistent Fe speciation in databases by converting all to FeOt.
- Parameters:
df (pd.DataFrame) – Array of oxide compositions.
- Returns:
Array of oxide compositions with converted Fe.
- Return type:
df (pd.DataFrame)
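For orientation, the mass conversion behind an FeOt column is FeOt = FeO + Fe2O3 × 2·M(FeO)/M(Fe2O3), a factor of about 0.8998. The sketch below applies it with pandas. The helper name to_feot and the column handling are assumptions; the library function may treat other columns (for example Fe2O3t) differently.

```python
import pandas as pd

# Mass of FeO equivalent to one unit mass of Fe2O3: 2 * M(FeO) / M(Fe2O3).
FE2O3_TO_FEO = 2 * 71.844 / 159.687


def to_feot(df):
    """Return a copy with FeO and Fe2O3 combined into a single FeOt column."""
    out = df.copy()
    feo = out["FeO"].fillna(0.0) if "FeO" in out else 0.0
    fe2o3 = out["Fe2O3"].fillna(0.0) if "Fe2O3" in out else 0.0
    out["FeOt"] = feo + fe2o3 * FE2O3_TO_FEO
    # Drop the speciated columns now that total iron is expressed as FeOt.
    return out.drop(columns=[c for c in ("FeO", "Fe2O3") if c in out])


# Hypothetical analyses: the second row reports iron only as Fe2O3.
analyses = pd.DataFrame(
    {"SiO2": [40.0, 45.0], "FeO": [5.0, None], "Fe2O3": [2.0, 10.0]}
)
converted = to_feot(analyses)
```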
- mineralML.hybrid.enable_mc_sampling(model, *, enable_dropout: bool)[source]
Enables stochasticity for MC inference without breaking BatchNorm. Keeps BatchNorm in eval() mode, enables VariationalLayer sampling, and optionally enables Dropout.
- Parameters:
model (nn.Module) – The model to configure for MC sampling.
enable_dropout (bool) – If True, sets Dropout layers to train() mode.
- Returns:
The configured model (modified in-place).
- Return type:
model (nn.Module)
- mineralML.hybrid.format_oxide_label(label)[source]
Format oxide names with subscripts for plot labels. Adapts to the active matplotlib text renderer — uses mathtext by default, falls back to plain LaTeX syntax if usetex is enabled, or returns the raw string if neither is available.
- mineralML.hybrid.kl_divergence_sum(model)[source]
Sums KL divergences across all VariationalLayer modules in a model.
- Parameters:
model (nn.Module) – PyTorch model containing VariationalLayer submodules.
- Returns:
Total KL divergence.
- Return type:
kl_div (float)
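To make the summed quantity concrete, the sketch below computes the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, then adds it up over several layers. The N(0, 1) prior and the dictionary layout of each layer are assumptions for illustration; the source does not state which prior the library uses.

```python
import numpy as np


def gaussian_kl(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over all elements."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))


def kl_sum(layers):
    """Add the weight and bias KL terms of every layer in the list."""
    total = 0.0
    for layer in layers:
        total += gaussian_kl(layer["weight_mu"], layer["weight_sigma"])
        total += gaussian_kl(layer["bias_mu"], layer["bias_sigma"])
    return float(total)


# Layer whose posterior equals the prior: contributes zero KL.
prior_like = {
    "weight_mu": np.zeros((2, 3)),
    "weight_sigma": np.ones((2, 3)),
    "bias_mu": np.zeros(2),
    "bias_sigma": np.ones(2),
}
# Layer with a narrow, shifted posterior: contributes a positive KL.
confident = {
    "weight_mu": np.full((2, 3), 0.5),
    "weight_sigma": np.full((2, 3), 0.1),
    "bias_mu": np.zeros(2),
    "bias_sigma": np.full(2, 0.1),
}

total_kl = kl_sum([prior_like, confident])
```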
- mineralML.hybrid.load_hybrid_checkpoint(model_path=None, device=None, optimizer=None, strict=True, eval_mode=True)[source]
Load a hybrid checkpoint, rebuild the model from model_config, and restore model weights. Optionally restore optimizer state.
- Parameters:
model_path (str | None) – Path to the checkpoint. If None, uses the bundled default model file.
device (str | torch.device | None) – Device to load the model on. If None, uses CUDA when available, otherwise CPU.
optimizer (torch.optim.Optimizer | None) – Optimizer to restore from the checkpoint if optimizer state is present.
strict (bool) – Passed to model.load_state_dict().
eval_mode (bool) – If True, calls model.eval() before returning.
- Returns:
Loaded ReconstructionWrapper. checkpoint (dict): Full checkpoint dictionary. model_config (dict): The checkpoint model_config dictionary.
- Return type:
model (nn.Module)
- mineralML.hybrid.load_minclass_nn(minclass_path='mineral_classes_nn_v0030.npz')[source]
Deprecated — use load_mineral_classes instead.
- mineralML.hybrid.load_mineral_classes(minclass_path='mineral_classes_nn_v0030.npz')[source]
Loads mineral classes and their corresponding mappings from a .npz file. The file is expected to contain an array of class names under the ‘classes’ key. This function creates a dictionary that maps an integer code to each class name.
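The mapping described above can be sketched in a few lines of NumPy. The 'classes' key comes from the description; the class names, the in-memory archive, and the helper name build_class_mapping are illustrative assumptions.

```python
import io

import numpy as np


def build_class_mapping(classes):
    """Map integer codes 0..K-1 to class names, in array order."""
    return {code: str(name) for code, name in enumerate(classes)}


# Write a small stand-in archive with a 'classes' array, then read it back.
buffer = io.BytesIO()
np.savez(buffer, classes=np.array(["Amphibole", "Feldspar", "Olivine"]))
buffer.seek(0)
with np.load(buffer) as archive:
    mapping = build_class_mapping(archive["classes"])

# Translate predicted integer codes into names with the same mapping.
pred_class = np.array([2, 0, 2, 1])
pred_mineral = np.array([mapping[int(code)] for code in pred_class])
```

The same dictionary lookup is the core of translating class codes back to mineral names after prediction.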
- mineralML.hybrid.norm_data(df, scaler_path='scaler_nn_v0030.npz')[source]
Normalizes oxide composition data using a predefined StandardScaler. Ensures that the DataFrame has been preprocessed before applying the transformation.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing oxide composition data.
scaler_path (str) – Filename or relative path to the saved scaler .npz file.
- Returns:
Transformed oxide composition data.
- Return type:
array_x (ndarray)
- mineralML.hybrid.plot_harker(df_train=None, train_minerals=None, overlay_datasets=None, oxides=['SiO2', 'TiO2', 'Al2O3', 'FeOt', 'MnO', 'MgO', 'CaO', 'Na2O', 'K2O', 'Cr2O3', 'P2O5'], x_oxide='SiO2', extra_pairs=None, plot_totals=False, title=None, train_mineral_col='Mineral', train_kws=None, new_kws=None)[source]
Plots Harker diagrams for training data with optional study dataset overlays.
- Parameters:
df_train (pd.DataFrame|None) – Primary dataset containing training geochemical data.
train_minerals (list|None) – Mineral phases to filter and plot as background points.
overlay_datasets (dict|None) – {study_name: DataFrame} or {study_name: (DataFrame, kws_dict)} for datasets plotted at full opacity.
oxides (list) – Oxide names to plot on the Y-axes against x_oxide.
x_oxide (str) – Independent variable on the X-axis.
extra_pairs (list[tuple]|None) – Additional specific plots, e.g., [(“CaO”, “Na2O”)].
plot_totals (bool) – If True, calculates and plots x_oxide vs. oxide sum.
title (str|None) – Figure suptitle.
train_mineral_col (str) – Column name for mineral labels in df_train.
train_kws (dict|None) – Scatter keywords for training data. Defaults to {“s”: 20, “alpha”: 0.1, “ec”: “k”, “lw”: 0.25}.
new_kws (dict|None) – Default scatter keywords for overlay datasets. Defaults to {“s”: 60, “alpha”: 1.0, “ec”: “k”, “lw”: 1}.
- mineralML.hybrid.plot_latent_space(df, label_column='Predict_Mineral', submineral_column='Submineral', title='Latent Space (z2) Overlay', ref_kws=None, new_kws=None, max_points=250000, filename=None, seed=88)[source]
Plots a 2D latent space overlaying new data on top of reference (training) data. Loads pre-computed training latents as a background and projects the provided DataFrame samples as a foreground overlay.
- Parameters:
df (pd.DataFrame) – Input data to be projected into the latent space.
label_column (str) – Column name in df representing pre-computed labels.
submineral_column (str) – Fallback column for resolving ‘Oxide’ labels (e.g., ‘Oxide’ -> ‘Magnetite’ -> ‘Spinel’).
title (str) – Title displayed at the top of the plot.
ref_kws (dict|None) – Keyword arguments for the background (training) scatter. Defaults to {“s”: 10, “alpha”: 0.10, “marker”: “x”}.
new_kws (dict|None) – Keyword arguments for the foreground (new data) scatter.
max_points (int) – Maximum number of points to plot per layer.
filename (str|None) – Path to save the figure. If None, displays interactively.
seed (int|None) – If provided, calls same_seeds(seed) to make predictions fully reproducible. If None, results are non-deterministic. Defaults to 88.
- mineralML.hybrid.plot_latent_space_training(model, dataset, title, filename, batch_size=256)[source]
Plots latent space representations (z) for a dataset. If latent_dim > 2, uses PCA to reduce to 2D for visualization. Saves a PDF to filename.
- Parameters:
- Returns:
Latent vectors with shape (N, latent_dim). labels (ndarray): Integer labels with shape (N,).
- Return type:
latents (ndarray)
- mineralML.hybrid.plot_loss_curves(train_losses, valid_losses, filename)[source]
Plots Stage A (classification, KL, total) and Stage B (reconstruction) loss curves, then saves to disk.
- mineralML.hybrid.plot_z2_overlay(df, label_column='Predict_Mineral', title='Latent Space (z2) Overlay', ref_kws=None, new_kws=None, max_points=250000, filename=None)[source]
Deprecated: use plot_latent_space instead.
- mineralML.hybrid.predict_class_prob(df, n_iterations=250, *, model_path=None, mc_dropout=True, return_recon_oxides=False, scaler_path='scaler_nn_v0030.npz', verbose=True, seed=88)[source]
Predicts mineral classes with Monte Carlo Bayesian averaging using the neural network with reconstruction classifier.
- Parameters:
df (pd.DataFrame) – Input oxide compositions. Metadata columns (‘Mineral’, ‘Source’, ‘SampleID’, ‘Sample’, ‘Sample Name’, ‘Sample ID’) are preserved in the output when present.
n_iterations (int) – Number of MC forward passes for prediction score averaging.
model_path (str|None) – Path to the .pt checkpoint. If None, defaults to the bundled model in the same directory as this module.
mc_dropout (bool) – If True, enables dropout during inference for MC sampling.
return_recon_oxides (bool) – If True, appends reconstructed oxide columns to the output DataFrame.
scaler_path (str) – Filename or relative path to the saved scaler .npz file.
seed (int|None) – If provided, calls same_seeds(seed) before MC sampling to make predictions fully reproducible. If None, MC draws are non-deterministic. Defaults to 88.
- Returns:
Predictions including ‘Predict_Mineral’, ‘Prediction_Score’, ‘Prediction_Score_Sigma’, ‘Second_Predict_Mineral’, and ‘Second_Prediction_Score’.
- Return type:
result_df (pd.DataFrame)
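The averaging step can be pictured as follows: run many stochastic forward passes, convert each to class probabilities, then take the per-class mean and standard deviation and rank the classes. This is a NumPy sketch of that idea with simulated logits. The function names and the use of the standard deviation for the sigma column are assumptions, not the library's code.

```python
import numpy as np


def softmax(logits):
    """Softmax over the last axis, shifted for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def summarize_mc(logit_samples):
    """Summarize logits of shape (n_iterations, n_samples, n_classes)."""
    probs = softmax(logit_samples)
    mean_prob = probs.mean(axis=0)  # Averaged prediction score per class.
    std_prob = probs.std(axis=0)    # Spread across the MC passes.
    order = np.argsort(-mean_prob, axis=1)
    top, second = order[:, 0], order[:, 1]
    rows = np.arange(mean_prob.shape[0])
    return {
        "top_class": top,
        "top_score": mean_prob[rows, top],
        "top_sigma": std_prob[rows, top],
        "second_class": second,
        "second_score": mean_prob[rows, second],
    }


# Simulate 250 noisy passes for two samples and three classes.
rng = np.random.default_rng(0)
base_logits = np.array([[4.0, 1.0, 0.0], [0.0, 0.5, 3.0]])
logit_samples = base_logits + 0.3 * rng.standard_normal((250, 2, 3))
summary = summarize_mc(logit_samples)
```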
- mineralML.hybrid.predict_class_prob_nnwr(df, n_iterations=50, *, model_path=None, mc_dropout=True, return_recon_oxides=False, scaler_path='scaler_nn_v0030.npz', verbose=True, seed=88)[source]
Deprecated — use predict_class_prob instead.
- mineralML.hybrid.prep_df(df, renormalize=False, convert_fe=False, drop_empty_rows=False, min_oxide_count=2, verbose=True)[source]
Prepares a DataFrame for analysis by performing data cleaning specific to mineralogical data. Handles missing values and ensures the presence of required oxide columns. Fills missing oxide values with zero while preserving all original columns in the dataset.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing mineral composition data. Metadata columns (‘Mineral’, ‘Source’, ‘SampleID’, ‘Sample’, ‘Sample Name’, ‘Sample ID’) are preserved in the output when present.
renormalize (bool) – If True, renormalizes the oxide columns to 100 wt%.
convert_fe (bool) – If True, automatically converts FeO, Fe2O3, and Fe2O3t columns to FeOt using Fe_Conversion(). If False (the default), raises a ValueError when these columns are present without a corresponding FeOt column.
drop_empty_rows (bool) – If True, drops rows where fewer than min_oxide_count oxide columns have non-zero values. Useful for large datasets with many blank or near-blank analyses.
min_oxide_count (int) – Minimum number of oxide columns that must have non-zero values for a row to be kept. Only used when drop_empty_rows=True. Default is 2.
verbose (bool) – If True, prints a summary of the number of rows processed and any rows dropped or coerced.
- Returns:
Cleaned DataFrame with NaN filled with zero for oxides.
- Return type:
df (pd.DataFrame)
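The cleaning steps described above (add missing oxide columns, fill gaps with zero, optionally drop near-empty rows and renormalize) can be sketched with pandas as below. The oxide list mirrors the default shown for plot_harker; the function name clean_oxides and the exact order of operations are assumptions, and the sketch omits the Fe conversion and verbose reporting.

```python
import pandas as pd

OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOt", "MnO", "MgO",
          "CaO", "Na2O", "K2O", "Cr2O3", "P2O5"]


def clean_oxides(df, renormalize=False, drop_empty_rows=False, min_oxide_count=2):
    """Return a cleaned copy; non-oxide columns are left untouched."""
    out = df.copy()
    # Add any missing oxide columns, coerce to numeric, fill gaps with zero.
    for col in OXIDES:
        if col not in out:
            out[col] = 0.0
    out[OXIDES] = out[OXIDES].apply(pd.to_numeric, errors="coerce").fillna(0.0)
    if drop_empty_rows:
        # Keep rows with at least min_oxide_count non-zero oxide values.
        keep = (out[OXIDES] != 0).sum(axis=1) >= min_oxide_count
        out = out.loc[keep].copy()
    if renormalize:
        # Rescale each row to 100 wt%, leaving all-zero rows unchanged.
        totals = out[OXIDES].sum(axis=1)
        out[OXIDES] = out[OXIDES].div(totals.where(totals > 0, 1.0), axis=0) * 100.0
    return out


raw = pd.DataFrame({
    "Mineral": ["Olivine", "Unknown", "Unknown"],
    "SiO2": [50.0, None, 0.0],
    "MgO": [25.0, 10.0, None],
})
cleaned = clean_oxides(raw, renormalize=True, drop_empty_rows=True)
```

In this example only the first row survives, because the other two have fewer than two non-zero oxides.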
- mineralML.hybrid.train_hybrid_model(df, hls_list, kl_weight_decay_list, lr, wd, dr, ep, n, balanced, ep_bottle=1000, lr_bottle=0.0001, wd_bottle=0.0001, mapper_hidden=16, mapper_nonlinear=True, decoder_hidden_sizes=(64, 32), kl_decay_epochs=750, use_bayesian_feature_layer=True, use_bayesian_classifier=False, name='nn2d', plot_latent_during_training=False, plot_every=50, plot_on='valid')[source]
Full training pipeline for the neural network classifier with reconstruction.
Stage A trains the classifier (CE + KL annealing), picking the best model by validation CE. Stage B freezes the classifier and trains a mapper (h -> z2) plus decoder (z2 -> x) with MSE reconstruction loss.
- Parameters:
df (pd.DataFrame) – Training DataFrame with ‘Mineral’ column and oxide columns.
hls_list (list[list[int]]) – Hidden layer size configurations to sweep.
kl_weight_decay_list (list[float]) – KL weight decay values to sweep.
lr (float) – Learning rate for Stage A.
wd (float) – Weight decay for Stage A.
dr (float) – Dropout rate.
ep (int) – Number of epochs for Stage A.
n (int) – Validation split size (number of samples).
balanced (bool) – If True, balance the training set via balance().
ep_bottle (int) – Number of epochs for Stage B.
lr_bottle (float) – Learning rate for Stage B.
wd_bottle (float) – Weight decay for Stage B.
mapper_hidden (int) – Hidden layer size for the latent projector.
mapper_nonlinear (bool) – If True, use a nonlinear mapper.
decoder_hidden_sizes (tuple[int]) – Hidden layer sizes for the decoder.
kl_decay_epochs (int) – Number of epochs over which to anneal the KL weight.
use_bayesian_feature_layer (bool) – If True, use VariationalLayer for features.
use_bayesian_classifier (bool) – If True, use VariationalLayer for the classifier head.
name (str) – Run name used for output filenames.
plot_latent_during_training (bool) – If True, plot z2 during Stage B training.
plot_every (int) – Plot interval in epochs during Stage B.
plot_on (str) – ‘valid’ or ‘train’ — which dataset to plot during training.
- Returns:
State dict of the best ReconstructionWrapper.
- Return type:
best_model_state (dict)
- mineralML.hybrid.train_nn_hybrid_bottleneck(classifier, mapper2d, decoder, optimizer, train_loader, valid_loader, n_epoch, criterion_recon=None, patience=50, plot_latent=False, plot_every=100, max_plot_points=10000, plot_on='valid')[source]
Stage B training: freezes the classifier and trains the mapper (h -> z2) and decoder (z2 -> x) with MSE reconstruction loss.
- Parameters:
classifier (FeatureExtractor) – Frozen Stage A classifier.
mapper2d (LatentProjector) – Trainable latent projector.
decoder (ReconstructionDecoder) – Trainable reconstruction decoder.
optimizer (torch.optim.Optimizer) – Optimizer for mapper + decoder parameters.
train_loader (DataLoader) – Training data loader.
valid_loader (DataLoader) – Validation data loader.
n_epoch (int) – Maximum number of training epochs.
criterion_recon (nn.Module|None) – Reconstruction loss function. Defaults to MSELoss.
patience (int) – Early stopping patience (epochs without improvement).
plot_latent (bool) – If True, periodically plot the 2D latent space.
plot_every (int) – Plot interval in epochs.
max_plot_points (int) – Maximum number of points to plot.
plot_on (str) – ‘valid’ or ‘train’ — which dataset to plot.
- Returns:
Training reconstruction loss history. valid_losses (dict): Validation reconstruction loss history. best_valid (float): Best validation reconstruction loss achieved. best_mapper_state (dict): State dict of the best mapper. best_decoder_state (dict): State dict of the best decoder.
- Return type:
train_losses (dict)
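The patience parameter follows the common early-stopping pattern: track the best validation loss and stop once it has not improved for a set number of consecutive epochs. The pure-Python sketch below shows that bookkeeping on a fixed list of losses. It is an illustration of the pattern under that assumption, not the training loop itself, and the function name is hypothetical.

```python
def run_with_early_stopping(valid_losses, patience):
    """Scan per-epoch validation losses and stop after `patience` bad epochs."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(valid_losses):
        if loss < best_loss:
            # A real loop would also copy the model state dict here.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_loss, best_epoch, epoch


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.69, 0.9]
best_loss, best_epoch, stopped_at = run_with_early_stopping(losses, patience=3)
```

With patience=3 the loop stops at epoch 5, so the later improvement to 0.69 is never reached; a larger patience trades training time for a chance of finding it.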
- mineralML.hybrid.train_nn_hybrid_classifier(model, optimizer, train_loader, valid_loader, n_epoch, criterion_cls=None, kl_weight_decay=0.001, kl_decay_epochs=750, patience=50)[source]
Stage A training: trains the classifier with cross-entropy loss and annealed KL divergence regularization from VariationalLayers.
- Parameters:
model (FeatureExtractor) – Classifier model to train.
optimizer (torch.optim.Optimizer) – Optimizer for model parameters.
train_loader (DataLoader) – Training data loader.
valid_loader (DataLoader) – Validation data loader.
n_epoch (int) – Maximum number of training epochs.
criterion_cls (nn.Module|None) – Classification loss. Defaults to CrossEntropyLoss.
kl_weight_decay (float) – Maximum KL weight after annealing.
kl_decay_epochs (int) – Number of epochs over which to anneal the KL weight.
patience (int) – Early stopping patience (epochs without improvement).
- Returns:
Training loss histories (‘total’, ‘classification’, ‘kl’). valid_losses (dict): Validation loss histories. best_valid (float): Best validation classification loss achieved. best_state (dict): State dict of the best model.
- Return type:
train_losses (dict)
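One way to read the two KL parameters is as a schedule in which the KL weight grows to its maximum (kl_weight_decay) over kl_decay_epochs and then stays constant. The sketch below assumes a linear ramp starting at zero; the source does not specify the shape of the schedule, and the function names are hypothetical.

```python
def kl_weight(epoch, kl_weight_max=0.001, kl_decay_epochs=750):
    """Linear ramp from 0 to kl_weight_max over kl_decay_epochs, then constant."""
    fraction = min(1.0, epoch / kl_decay_epochs)
    return kl_weight_max * fraction


def total_loss(classification_loss, kl, epoch):
    """Combine cross-entropy with the annealed KL term for a given epoch."""
    return classification_loss + kl_weight(epoch) * kl


# Weights at the start, midpoint, and past the end of the ramp.
schedule = [kl_weight(epoch) for epoch in (0, 375, 750, 2000)]
```

Starting with a small KL weight lets the classifier fit the data first, with the regularization term phased in gradually.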
- mineralML.hybrid.unique_mapping(pred_class)[source]
Generates a mapping of unique class codes from predicted class labels. Loads a predefined category list and creates a subset mapping for the unique classes found. Unknown classes are assigned a code of -1.
- Parameters:
pred_class (array-like) – Array of predicted class labels (integer codes).
- Returns:
Array of unique class codes found in pred_class. valid_mapping (dict): Dictionary mapping class codes to their corresponding mineral names, including ‘Unknown’ for code -1.
- Return type:
unique (ndarray)
Mapping Functions
- mineralML.mapping.batch_extract_line_profiles(res, transects, keys=None, source='auto', *, pixel_size_um=None, method='mean', smooth_window=1, return_long=False)[source]
Batch-extract profiles for one or more map keys from saved transect coordinates.
- Parameters:
res (dict) – Result dictionary returned by run_map().
transects (pd.DataFrame|list[dict]) – Table with at least x0, y0, x1, y1. If width_px or n_bins are present they are reused per transect.
keys (str|list[str]|None) – Oxide/component keys to extract. If None, defaults to available oxide maps in the canonical OXIDES order.
source (str) – One of "auto", "oxide", or "component".
pixel_size_um (float|None) – Override physical pixel size. If None, uses the value from each transect row when present.
method (str) – Aggregation method passed to extract_line_profile(). Use "none" to keep individual pixels, "mean" or "median" to bin along the transect.
smooth_window (int) – Rolling smoothing window, in bins (or pixels when method="none").
return_long (bool) – If True, also return the long-format profile table as a second output.
- Returns:
One row per pixel (method="none") or per bin, with each key as its own column. profiles_long_df (pd.DataFrame, optional): Long-format table returned only when return_long=True.
- Return type:
profiles_df (pd.DataFrame)
- mineralML.mapping.df_to_maps(df, shape)[source]
Convert a flattened DataFrame back into dict of 2D arrays.
- mineralML.mapping.extract_line_profile(data, start, end, width_px=1.0, n_bins=None, pixel_size_um=None, method='none', smooth_window=1)[source]
Extract a line profile from a 2-D map with a finite-width strip. Pixels with centers inside the strip are projected onto the transect axis. With method="none" every pixel is returned as its own row (no binning). Otherwise pixels are aggregated into distance bins.
- Parameters:
data (array-like) – 2-D map to sample.
start (array-like) – (x, y) start coordinate in pixel space.
end (array-like) – (x, y) end coordinate in pixel space.
width_px (float) – Total projection width in pixels.
n_bins (int|None) – Number of bins along the profile. Ignored when method="none". Defaults to roughly one bin per pixel of line length.
pixel_size_um (float|None) – Micrometers per pixel, used to populate physical-distance columns.
method (str) – Aggregation method, one of "mean", "median", or "none". Use "none" to skip binning and return each pixel as an individual data point.
smooth_window (int) – Optional rolling window, in bins (or pixels when method="none").
- Returns:
Aggregated profile table, or one row per pixel when method="none". samples (pd.DataFrame): Raw projected pixels inside the strip.
- Return type:
profile_df (pd.DataFrame)
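The strip-and-project idea can be sketched in NumPy: express each pixel center relative to the start point, split it into a distance along the transect and an offset across it, keep pixels within half the strip width, and average values in distance bins. This is a simplified illustration with a hypothetical function name. It assumes x is the column index and y the row index, and it returns only bin centers and means rather than the DataFrames described above.

```python
import numpy as np


def line_profile(data, start, end, width_px=1.0, n_bins=None):
    """Mean value in distance bins along the transect from start to end."""
    data = np.asarray(data, dtype=float)
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    axis = end - start
    length = np.hypot(*axis)
    unit = axis / length                       # Direction along the transect.
    normal = np.array([-unit[1], unit[0]])     # Direction across the transect.

    # Pixel-center coordinates relative to the start (x = column, y = row).
    rows, cols = np.indices(data.shape)
    rel = np.stack([cols.ravel() - start[0], rows.ravel() - start[1]], axis=1)
    along = rel @ unit
    across = rel @ normal

    # Keep finite pixels whose centers fall inside the strip.
    flat = data.ravel()
    inside = (along >= 0) & (along <= length) & (np.abs(across) <= width_px / 2)
    inside &= np.isfinite(flat)
    dist, values = along[inside], flat[inside]

    if n_bins is None:
        n_bins = max(1, int(round(length)))
    edges = np.linspace(0.0, length, n_bins + 1)
    idx = np.clip(np.digitize(dist, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=values, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    means = np.divide(sums, counts, out=np.full(n_bins, np.nan), where=counts > 0)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, means


# Map whose value equals the column index, sampled along the middle row.
grid = np.tile(np.arange(10.0), (5, 1))
centers, means = line_profile(grid, start=(0, 2), end=(9, 2), width_px=3.0, n_bins=3)
```

Here the last bin also receives the pixel sitting exactly on the end point, which is why its mean is slightly higher than its center.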
- mineralML.mapping.fill_phase_holes(mineral_map, max_hole_size=10, exclude_phases=None)[source]
Fills holes strictly within individual continuous phases.
This function prevents accidental “spillover” by ensuring that it only fills empty spaces (NaNs or Unindexed) completely enclosed by a single phase. It intentionally preserves natural interstitial networks.
- Parameters:
mineral_map (ndarray) – 2D array of phase labels (strings or IDs).
max_hole_size (int) – The maximum area (in pixels) of an enclosed empty space allowed to be filled.
exclude_phases (list[str], optional) – Phases that naturally exist as interstitial material and should NOT be artificially expanded. Defaults to [“Glass”, “Vesicles”].
- Returns:
A new 2D array with enclosed holes filled.
- Return type:
filled_map (ndarray)
- mineralML.mapping.get_profile_map(res, key, source='auto')[source]
Resolve a 2-D map for line-profile extraction from a run_map() result or a plain oxide-map dict returned by load_maps_from_dir().
- Parameters:
- Returns:
2-D float array for the requested map.
- Return type:
data (ndarray)
- mineralML.mapping.interactive_line_profile(res, key, method='none', source='auto', *, phase=None, width_px=3.0, n_bins=None, pixel_size_um=None, smooth_window=1, cmap='viridis', vmin=None, vmax=None, title=None, cbar_label=None, layout='vertical', multi=True, figsize=None)[source]
Launch a clickable Jupyter transect tool for oxide or component maps.
Intended for notebook use with an interactive Matplotlib backend such as %matplotlib widget. Click once for the profile start and a second time for the profile end. Press r to clear and redraw.
- Parameters:
res (dict) – Result dictionary returned by run_map().
key (str) – Oxide or component key, e.g. "SiO2" or "Feldspar.An".
method (str) – Aggregation method passed to extract_line_profile().
source (str) – One of "auto", "oxide", or "component".
phase (str|list[str]|None) – If provided, mask the map so only pixels matching this phase are shown; all others are set to NaN.
width_px (float) – Transect-strip width in pixels.
n_bins (int|None) – Number of distance bins.
pixel_size_um (float|None) – Micrometers per pixel.
smooth_window (int) – Rolling smoothing window, in bins.
cmap (str) – Colormap for the source map.
vmin (float|None) – Lower display limit.
vmax (float|None) – Upper display limit.
title (str|None) – Title for the map panel.
cbar_label (str|None) – Colorbar label for the map panel.
layout (str) – "vertical" for stacked axes or "horizontal" for side-by-side axes.
multi (bool) – If True, each completed click-pair is retained as a new profile. If False, a new transect replaces the previous one.
figsize (tuple) – Figure size.
- Returns:
Dictionary with keys fig, profiles (list of per-transect DataFrames), profiles_df (all transects concatenated), samples (list of per-transect raw pixel DataFrames), samples_df (all concatenated), coordinates_df (transect metadata), and helper accessors get_profile, get_samples, get_coordinates.
- Return type:
controller (dict)
- mineralML.mapping.interactive_pixels(result, region=1, cmap_name='tab20', phase_colors=None, phase=None, oxide_key=None, oxide_cmap='viridis', vmin=None, vmax=None)[source]
Display the phase map and collect oxide compositions by clicking pixels. Each click places a marker on the map, prints the oxide values, and appends a row to controller["picks"]. Set region to an odd integer greater than 1 to average over an n×n box of same-phase pixels around each click rather than recording a single pixel. Requires an interactive Matplotlib backend (%matplotlib widget in a notebook).
- Keybindings:
r/u: undo the last picked pixel
c: clear all picks
q: quit (disconnect click and key handlers)
- Parameters:
result (dict) – Result dictionary returned by run_map().
region (int) – Odd integer side length of the square region to average around each clicked pixel. Default region=1 records the single clicked pixel. Set to 3, 5, etc. to average over an n×n box; only pixels matching the clicked pixel's phase are included.
cmap_name (str) – Matplotlib colormap for the phase map display.
phase_colors (dict|None) – Optional manual color overrides {PhaseName: color}.
phase (str|list[str]|None) – If provided, only pixels matching this phase are shown and clickable. Others are rendered as background.
oxide_key (str|None) – If provided, display this oxide or component as a heatmap instead of the phase map (e.g. "SiO2"). The phase map legend is replaced by a colorbar.
oxide_cmap (str) – Colormap for the oxide heatmap when oxide_key is set.
vmin (float|None) – Lower display limit for the oxide heatmap.
vmax (float|None) – Upper display limit for the oxide heatmap.
- Returns:
Dict with keys 'fig' and 'picks'. picks is a pd.DataFrame that grows with each click, with columns x, y, phase, and one column per oxide.
- Return type:
controller (dict)
- mineralML.mapping.load_element_maps(path, drop_trailing_blank=False, verbose=True)[source]
Load element maps from a directory of CSVs into a dictionary of 2D arrays.
- Parameters:
- Returns:
Dictionary mapping element symbols (str) to 2D numpy arrays (float). NaNs are preserved. Trailing blank columns are dropped when drop_trailing_blank is True.
- Return type:
out (dict)
- mineralML.mapping.load_maps_from_dir(path, units='element_wt%', renormalize=False)[source]
Load per-element CSV maps from a directory and return oxide wt% maps.
- Parameters:
path (str) – Path to directory containing element CSV maps.
units (str) – Interpretation of the input map values. Use "element_wt%" if the CSV maps contain elemental wt% values that should be stoichiometrically converted to oxide wt%, or "oxide_wt%" if the CSV maps already contain oxide wt% values and only need to be relabeled from element-style names to oxide names.
renormalize (bool) – If True, rescale each pixel so oxides sum to 100 wt%.
- Returns:
Dictionary mapping oxide names (str) to 2D numpy arrays (float).
- Return type:
ox_maps (dict)
- mineralML.mapping.maps_to_df(E)[source]
Convert a dictionary of 2D arrays into a flat DataFrame.
- Parameters:
E (dict) – Dictionary mapping element symbols to 2D numpy arrays (maps).
- Returns:
df (pd.DataFrame): Flattened DataFrame with each element as a column.
shape (tuple): Original 2D shape (H, W) of the maps.
- Return type:
tuple
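The flattening step can be illustrated with plain NumPy and pandas. The helper flatten_maps below is a hypothetical stand-in for the idea, assuming row-major flattening; it is not the library function itself.

```python
import numpy as np
import pandas as pd

def flatten_maps(maps):
    """Flatten a dict of equally shaped 2D arrays into one column per key."""
    arrays = {name: np.asarray(arr) for name, arr in maps.items()}
    shapes = {arr.shape for arr in arrays.values()}
    if len(shapes) != 1:
        raise ValueError("all maps must share one 2D shape")
    shape = shapes.pop()
    # ravel() flattens row by row, so reshape(shape) restores the map.
    df = pd.DataFrame({name: arr.ravel() for name, arr in arrays.items()})
    return df, shape

maps = {"Si": np.array([[1.0, 2.0], [3.0, 4.0]]),
        "Mg": np.array([[5.0, 6.0], [7.0, np.nan]])}
df, shape = flatten_maps(maps)

# Any column can be turned back into a map with the returned shape.
si_again = df["Si"].to_numpy().reshape(shape)
```

Returning the shape alongside the DataFrame is what makes it possible to reshape per-pixel predictions back into a 2D map later.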
- mineralML.mapping.parse_ctf_header(filepath)[source]
Parses the header of a .ctf file to extract grid dimensions and phase mappings. The file is expected to contain ‘XCells’, ‘YCells’, ‘Phases’, and a data table whose header line starts with the tab-separated columns ‘Phase’, ‘X’, ‘Y’.
- Parameters:
filepath (str) – The path to the .ctf file.
- Returns:
x_cells (int): The number of cells in the X direction.
y_cells (int): The number of cells in the Y direction.
data_start (int): The line index where the actual data table begins.
phase_mapping (dict): A dictionary that maps each integer ID to its phase name.
- Return type:
tuple
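A rough sketch of this kind of header parsing is shown below. It works on a list of lines rather than a file, and it rests on several assumptions that are not confirmed by this documentation: fields are tab-separated, each phase line carries the phase name in its third field, phase IDs start at 1, and data_start points at the first data row. The function parse_header and the sample lines are hypothetical.

```python
def parse_header(lines):
    """Extract grid size, data start index, and phase names from header lines."""
    x_cells = y_cells = data_start = None
    phase_mapping = {}
    i = 0
    while i < len(lines):
        fields = lines[i].rstrip("\n").split("\t")
        if fields[0] == "XCells":
            x_cells = int(fields[1])
        elif fields[0] == "YCells":
            y_cells = int(fields[1])
        elif fields[0] == "Phases":
            n_phases = int(fields[1])
            # Assumed layout: one line per phase, name in the third field.
            for k in range(1, n_phases + 1):
                phase_mapping[k] = lines[i + k].rstrip("\n").split("\t")[2]
            i += n_phases
        elif fields[:3] == ["Phase", "X", "Y"]:
            data_start = i + 1  # assumed: first row after the column names
            break
        i += 1
    return x_cells, y_cells, data_start, phase_mapping

sample = [
    "Channel Text File",
    "XCells\t3",
    "YCells\t2",
    "Phases\t2",
    "4.9;4.9;5.4\t90;90;120\tQuartz\t9\t152",
    "18.2;8.8;5.2\t90;90;90\tEnstatite\t3\t61",
    "Phase\tX\tY\tBands",
    "1\t0.0\t0.0\t8",
]
x_cells, y_cells, data_start, phase_mapping = parse_header(sample)
```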
- mineralML.mapping.pick_common_phases(mineral_map, top_k=None)[source]
Select abundant phases by pixel fraction, optionally capped at top_k.
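One way to rank phases by pixel fraction is sketched below. The helper common_phases, its min_frac argument, and the choice to compute fractions over classified (string-labelled) pixels only are assumptions made for illustration.

```python
import numpy as np

def common_phases(mineral_map, min_frac=0.0, top_k=None):
    """Return (phase, fraction) pairs sorted by decreasing pixel fraction."""
    flat = np.asarray(mineral_map, dtype=object).ravel()
    # Keep only string labels; None/NaN pixels count as unclassified.
    labels = np.array([v for v in flat if isinstance(v, str)])
    names, counts = np.unique(labels, return_counts=True)
    frac = counts / counts.sum()
    order = np.argsort(-frac, kind="stable")
    ranked = [(str(names[i]), float(frac[i])) for i in order if frac[i] >= min_frac]
    return ranked if top_k is None else ranked[:top_k]

mineral_map = np.array([["Ol", "Ol", "Plag", None],
                        ["Ol", "Cpx", "Plag", None]], dtype=object)
top2 = common_phases(mineral_map, top_k=2)
```

Here olivine covers half of the classified pixels and plagioclase a third, so top_k=2 drops clinopyroxene.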
- mineralML.mapping.plot_component_composite(res, title='Composite', save_path=None, remove_islands_flag=True, fill_holes_flag=True, hole_size=10, min_frac=0.0, phases=None, mask_config=None, phase_colors=None, smooth_sigma=0.0, limits_mode='std', percentile=(5, 95), legend_on=True, legend_cols=1, ax=None, scalebar_um=None, pixel_size_um=None, scalebar_loc='lower left', scalebar_col='black', cbar_hgap=0.015, cbar_vgap=-0.05, cbar_height=0.03, dpi=300)[source]
Renders a composite map overlaying continuous solid-solution compositions (e.g., Plagioclase An%, Olivine Fo%) on top of a categorical phase mask. Small holes in the phase map and compositional data can be filled with fill_holes_flag, and isolated pixel clusters removed via remove_islands_flag, matching the same parameters in run_map.
- Parameters:
res (dict) – The result dictionary returned by run_map(), containing ‘mineral_map’ and ‘component_maps’.
title (str) – Title of the composite plot.
save_path (str, optional) – Filepath to save the figure (e.g., ‘plot.png’).
remove_islands_flag (bool) – If True, removes isolated pixel clusters smaller than 2 pixels from the phase map.
fill_holes_flag (bool) – If True, fills small holes in both the phase map and continuous component data up to hole_size pixels.
hole_size (int) – Maximum hole area (in pixels) to fill.
min_frac (float) – Minimum pixel fraction required for a phase to be included in the composite. Fractions are computed from the cleaned mineral map after island removal / hole filling.
phases (list[str], optional) – Explicit list of phases to plot.
mask_config (dict, optional) – Custom layer masking configuration (e.g., mapping Glass).
phase_colors (dict, optional) – Custom categorical colors for leftover phases.
smooth_sigma (float) – Gaussian blur sigma for smoothing compositional data.
limits_mode (str) – ‘percentile’ or ‘std’ for auto-scaling color ramps.
percentile (tuple) – (min, max) percentiles for color limits.
legend_on (bool) – If True, display the legend.
legend_cols (int) – Number of columns in the legend.
ax (matplotlib.axes.Axes, optional) – Pre-existing axes to plot onto.
scalebar_um (float, optional) – Length of the scale bar in micrometers.
pixel_size_um (float) – Physical size of a single pixel in micrometers.
scalebar_loc (str) – Location of the scale bar (e.g., ‘lower left’).
scalebar_col (str) – Color of the scale bar text/line.
cbar_hgap (float) – Horizontal gap between adjacent colorbars (axes fraction).
cbar_vgap (float) – Vertical offset of the colorbar row below the map (axes fraction; negative = below the axes).
cbar_height (float) – Height of each colorbar (axes fraction).
dpi (int) – Resolution of the figure.
- Returns:
fig (matplotlib.figure.Figure): The generated composite figure.
mineral_map (ndarray): The cleaned 2D categorical map used as the base.
processed_comp_maps (dict): The smoothed/filled 2D continuous component data.
- Return type:
tuple
- mineralML.mapping.plot_ctf_phases(filepath: str, max_legend=25, rename_dict=None, phase_colors=None, ax=None, title='default', scalebar_um=None, scalebar_loc='lower left', scalebar_col='black', legend_on=True)[source]
Loads phase data from a .ctf file and generates a 2D categorical phase map. It maps raw phase IDs to their corresponding names, optionally renames them, and orders the legend by phase abundance.
- Parameters:
filepath (str) – The path to the .ctf file.
max_legend (int, optional) – The maximum number of phases to display in the legend. Defaults to 25.
rename_dict (dict, optional) – A dictionary mapping messy phase names (or partial matches) to clean names. Defaults to None.
phase_colors (dict, optional) – A dictionary mapping clean phase names to specific matplotlib colors (e.g., {‘Quartz’: ‘red’, ‘Enstatite’: ‘#00FF00’}). Defaults to None.
ax (matplotlib.axes.Axes, optional) – An existing axes object to plot on. If None, a new figure and axes will be created.
title (str or None, optional) – The title for the plot. If “default”, creates an auto-generated title with dimensions. If None, no title is shown.
scalebar_um (float, optional) – Length of the scale bar in micrometers.
scalebar_loc (str) – Location of the scale bar (e.g., ‘lower left’).
scalebar_col (str) – Color of the scale bar text/line.
legend_on (bool) – If True, displays the legend. Defaults to True.
- Returns:
fig (matplotlib.figure.Figure): The figure object.
phase_map (ndarray): A 2D array of the mapped phase names as strings.
raw_ids (ndarray): A 2D array of the raw numeric phase IDs from the file.
phase_mapping (dict): A dictionary mapping raw IDs to phase names.
unique_names (ndarray): An array of the unique phase names sorted by abundance.
- Return type:
tuple
- mineralML.mapping.plot_line_profile(profile_df, ax=None, label=None, color='black', show_counts=False)[source]
Plot a line-profile dataframe produced by extract_line_profile().
- Parameters:
- Returns:
Axis containing the profile.
- Return type:
ax (matplotlib.axes.Axes)
- mineralML.mapping.plot_locations(res, transects, map_key=None, source='auto', *, cmap='viridis', vmin=None, vmax=None, title=None, cbar_label=None, show_width=True, annotate=True, annotate_offset_px=4.0, ax=None, figsize=(7, 7))[source]
Plot transect lines or pixel pick locations on top of a map or blank canvas.
Accepts either a transects table (from interactive_line_profile or batch_line_profiles) with columns x0, y0, x1, y1, or a pixel picks table (from extract_pixel_comp) with columns x, y. The input type is detected automatically.
- Parameters:
res (dict) – Result dictionary returned by run_map().
transects (pd.DataFrame|list[dict]) – Transect table with x0, y0, x1, y1, or pixel picks table with x, y.
map_key (str|None) – Background oxide/component key. If None, uses a blank pixel-space canvas based on res['shape'].
source (str) – One of "auto", "oxide", or "component".
cmap (str) – Background colormap when map_key is provided.
vmin (float|None) – Lower display limit for the background map.
vmax (float|None) – Upper display limit for the background map.
title (str|None) – Plot title.
cbar_label (str|None) – Colorbar label for the background map.
show_width (bool) – If True, draw the finite-width strip outline when width_px is available (transect mode only).
annotate (bool) – If True, label each point or transect with its index.
annotate_offset_px (float) – Pixel offset for transect annotation labels.
ax (matplotlib.axes.Axes|None) – Existing axis to draw on.
figsize (tuple) – Figure size when creating a new figure.
- Returns:
fig (matplotlib.figure.Figure): Figure containing the overlay.
ax (matplotlib.axes.Axes): Axis containing the map and overlay.
- Return type:
tuple
- mineralML.mapping.plot_oxide_map(res, oxide_name, title=None, cmap='viridis', vmin=None, vmax=None, cbar_label=None, **kwargs)[source]
Plots a 2-D oxide-concentration map with a colourbar.
- Parameters:
res (dict) – The result dictionary returned by run_map(), containing 'oxide_maps'.
oxide_name (str) – Oxide name to plot from res['oxide_maps'] (e.g., 'SiO2', 'FeOt').
title (str, optional) – Plot title. Defaults to '{oxide_name} Map'.
cmap (str or Colormap, optional) – Colormap name. Defaults to 'viridis'.
vmin (float, optional) – Lower colour-scale limit. Defaults to the data minimum, ignoring background values.
vmax (float, optional) – Upper colour-scale limit. Defaults to the data maximum, ignoring background values.
cbar_label (str, optional) – Colourbar label. Defaults to '{oxide_name} (wt.%)'.
**kwargs – Forwarded to _plot_continuous_map (e.g., bg_value, scalebar_um, pixel_size_um, scalebar_col, scalebar_loc, ax, dpi).
- Returns:
fig (matplotlib.figure.Figure): The generated figure.
ax_map (matplotlib.axes.Axes): The axes containing the oxide map.
- Return type:
tuple
- mineralML.mapping.plot_phase_counts(mineral_map_2d, title='Mineral Phases (count)', phases=None, normalize=True, min_frac=0.0001, ax=None)[source]
Bar chart of pixel counts (or fractions) per phase with auto figure width.
- Parameters:
- Returns:
(fig, ax) with the bar chart.
- Return type:
fig_ax (tuple)
- mineralML.mapping.plot_phase_map(mineral_map_2d, phases=None, title='Phase Map', bg_color=(0.08, 0.08, 0.08), cmap_name='tab20', legend_side='right', legend_cols=1, remove_islands_flag=False, fill_holes_flag=False, cleanup_min_size=2, hole_size=10, min_frac=1e-05, scalebar_um=None, pixel_size_um=None, scalebar_loc='lower left', scalebar_col='black', phase_colors=None, legend_on=True, ax=None, dpi=300)[source]
Generates a 2D categorical phase map colored by mineral type.
Applies optional morphological cleaning (removing small islands and filling small holes) before rendering. Automatically handles legend placement, figure sizing, and scale bar generation.
- Parameters:
mineral_map_2d (array-like) – 2D array of phase labels (strings or ints).
phases (list[str], optional) – Explicit list of phases to include in the legend. If None, common phases are automatically extracted.
title (str) – The title of the plot.
bg_color (tuple) – RGB tuple for the background (unindexed/NaN) color.
cmap_name (str) – Name of the matplotlib colormap to sample from.
legend_side (str) – Placement of the legend (‘right’, ‘left’, ‘top’, ‘bottom’).
legend_cols (int) – Number of columns for the legend text.
remove_islands_flag (bool) – If True, removes isolated pixels.
fill_holes_flag (bool) – If True, fills small holes within continuous phases.
cleanup_min_size (int) – Minimum pixel area to keep if remove_islands_flag is True.
hole_size (int) – Maximum hole area (in pixels) to fill if fill_holes_flag is True.
min_frac (float) – Minimum pixel fraction (default 0.00001, i.e., 0.001%) required to keep a phase. Rare phases below this are grouped into ‘Unknown’.
scalebar_um (float, optional) – Length of the scale bar in micrometers.
pixel_size_um (float) – Physical size of a single pixel in micrometers.
scalebar_loc (str) – Location of the scale bar (e.g., ‘lower left’).
scalebar_col (str) – Color of the scale bar text/line.
phase_colors (dict, optional) – Custom mapping of {PhaseName: (R,G,B)}.
legend_on (bool) – If True, displays the legend.
ax (matplotlib.axes.Axes, optional) – Pre-existing axes to plot on.
dpi (int) – Resolution for the generated figure.
- Returns:
fig (matplotlib.figure.Figure): The generated matplotlib figure.
ax_map (matplotlib.axes.Axes): The axes containing the image map.
cleaned_mineral_map (ndarray): The processed 2D mineral map after cleanup.
- Return type:
tuple
- mineralML.mapping.plot_phase_proportions(mineral_map, title='Phase Proportions', phases=None, min_frac=0.0001, phase_colors=None, cmap_name='tab20', annotate=True, annotate_kw=None, ax=None)[source]
Stacked horizontal bar of phase area proportions. Proportions are normalized to classified pixels only. The fraction of unclassified pixels (NaN, epoxy, low-confidence) is printed below the x-axis as a note.
- Parameters:
mineral_map (array-like) – (H,W) or (N,) phase labels.
title (str) – Axes title / y-label for the bar.
phases (list[str]|None) – Subset of phases in display order (None→auto).
min_frac (float) – Minimum pixel fraction required to include a phase.
phase_colors (dict|None) – {PhaseName: color}. Falls back to cmap_name.
cmap_name (str) – Matplotlib colormap used when phase_colors is incomplete.
annotate (bool) – If True, label each segment with its percentage.
annotate_kw (dict|None) – Extra keyword arguments forwarded to _annotate_stacked_bar.
ax (matplotlib.axes.Axes|None) – Axes to plot on (None→create new).
- Returns:
(fig, ax) with the stacked bar chart.
- Return type:
fig_ax (tuple)
- mineralML.mapping.plot_pred_score_histograms(pred_score_map, mineral_map, pred_score_threshold, phases=None, bins=50, min_frac=0.0001, share_y=True, title='Prediction Scores', empirical_phases=('Zircon', 'SiO2_Polymorph', 'Carbonate'))[source]
Horizontal histograms of per-phase prediction scores (auto grid).
- Parameters:
pred_score_map (array-like) – (H,W) max class prediction scores per pixel.
mineral_map (array-like) – (H,W) predicted labels (NaN allowed).
pred_score_threshold (float) – Lower bound for the y-axis.
phases (list[str] | None) – Subset of phases to plot (None -> auto).
bins (int) – Histogram bins.
min_frac (float) – Minimum pixel fraction required to plot a phase.
share_y (bool) – Share prediction score axis across panels.
title (str) – Figure suptitle text.
empirical_phases (iterable[str]) – Phases assigned empirically and therefore lacking prediction scores.
- Returns:
(fig, axes)
- Return type:
tuple
- mineralML.mapping.plot_score_map(res, phases=None, title='Prediction Score Map', cmap='magma', vmin=0, vmax=1, cbar_label='Prediction Score', **kwargs)[source]
Plots a continuous prediction-score map with a colourbar.
- Parameters:
res (dict) – The result dictionary returned by run_map(), containing ‘mineral_map’ and ‘pred_score_map’.
phases (list[str] | None) – If provided, only show prediction scores for these phases; all other pixels are masked to background.
title (str) – Plot title.
cmap (str or Colormap) – Colourmap name (default ‘magma’).
vmin (float) – Lower colour-scale limit.
vmax (float) – Upper colour-scale limit.
cbar_label (str) – Label for the colourbar.
**kwargs – Forwarded to _plot_continuous_map (bg_value, scalebar_um, pixel_size_um, scalebar_col, scalebar_loc, ax, dpi).
- Returns:
fig (matplotlib.figure.Figure): The generated figure.
ax_map (matplotlib.axes.Axes): The axes containing the score map.
- Return type:
tuple
- mineralML.mapping.remove_islands(mineral_map, min_size=2, connectivity=1, fill_val=0, phase_min_sizes=None, grouped_phases=None, ignore_vals=None)[source]
Removes isolated islands of minerals using morphological size filtering.
This works with both string and integer arrays. Small objects below the specified area threshold are overwritten with fill_val.
- Parameters:
mineral_map (ndarray) – 2D array of phase labels or IDs.
min_size (int) – Default minimum area (in pixels) for an object to be kept.
connectivity (int) – Neighborhood definition (1 for 4-connected, 2 for 8-connected).
fill_val (any) – The value to insert where pixels are removed (e.g., “nan”, 0).
phase_min_sizes (dict, optional) – Custom minimum sizes mapped per phase.
grouped_phases (list[tuple], optional) – Lists of phases that should be treated as a single continuous unit when evaluating size (e.g., grouped clinopyroxene and orthopyroxene).
ignore_vals (set, optional) – Values that are considered background and skipped (e.g., ‘NaN’, ‘Unindexed’).
- Returns:
A new 2D array with small islands removed.
- Return type:
cleaned_map (ndarray)
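The size-filtering idea can be sketched with a simple flood fill. This is an illustrative stand-in written in plain NumPy, not the library's algorithm: drop_small_islands is a hypothetical name, only 4-connectivity is handled, and per-phase sizes, grouped phases, and ignore values are left out.

```python
import numpy as np

def drop_small_islands(label_map, min_size=2, fill_val=0):
    """Overwrite 4-connected regions smaller than min_size with fill_val."""
    labels = np.asarray(label_map)
    out = labels.copy()
    h, w = labels.shape
    seen = np.zeros((h, w), dtype=bool)
    for r in range(h):
        for c in range(w):
            if seen[r, c] or labels[r, c] == fill_val:
                continue
            # Flood-fill the region that shares this pixel's label.
            region, stack = [], [(r, c)]
            seen[r, c] = True
            while stack:
                i, j = stack.pop()
                region.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < h and 0 <= nj < w and not seen[ni, nj]
                            and labels[ni, nj] == labels[r, c]):
                        seen[ni, nj] = True
                        stack.append((ni, nj))
            if len(region) < min_size:
                for i, j in region:
                    out[i, j] = fill_val
    return out

phase_ids = np.array([[1, 1, 0, 2],
                      [1, 1, 0, 0],
                      [0, 0, 3, 3]])
cleaned = drop_small_islands(phase_ids, min_size=2)
```

The single pixel of phase 2 is overwritten with fill_val, while the two-pixel region of phase 3 meets min_size and is kept. The input array is left unchanged.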
- mineralML.mapping.renormalize_maps(ox_maps)[source]
Scale each pixel so oxide totals sum to 100 wt%.
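As a minimal sketch of per-pixel renormalization, assuming the maps are a dict of equally shaped arrays and that pixels with no data should stay NaN (both assumptions; renormalize is a hypothetical helper):

```python
import numpy as np

def renormalize(ox_maps):
    """Rescale every pixel so the oxide values sum to 100 wt%."""
    total = np.nansum(np.stack(list(ox_maps.values())), axis=0)
    # Pixels with no data (total of 0) become NaN instead of dividing by zero.
    total = np.where(total > 0, total, np.nan)
    return {name: arr / total * 100.0 for name, arr in ox_maps.items()}

ox_maps = {"SiO2": np.array([[40.0, 50.0, np.nan]]),
           "MgO": np.array([[40.0, 0.0, np.nan]])}
norm = renormalize(ox_maps)
# First pixel: 40 + 40 = 80 wt% total, so each oxide is rescaled to 50 wt%.
```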
- mineralML.mapping.run_map(sample_input, renormalize=False, total_threshold=None, n_iterations=50, min_frac=1e-05, pred_score_threshold=0.6, units='element_wt%', top_k=None, phases=None, exclude_phases=None, phase_colors=None, bar_style='vertical', components_spec=None, remove_islands_flag=False, fill_holes_flag=False, hole_size=10, scalebar_um=None, pixel_size_um=None, scalebar_loc='lower left', scalebar_col='black', show=True)[source]
Load, convert, predict, and plot for one folder of CSV maps. Always computes mineral components and returns a full results dictionary. Use remove_islands_flag and fill_holes_flag to clean the phase map before plotting and downstream analysis.
- Parameters:
sample_input (str | Path | dict) – Directory path or a dict of oxide maps.
renormalize (bool) – If True, scale each pixel so oxides sum to 100 wt%. Applied after total masking.
total_threshold (float | None) – Pixels with oxide total below this value (wt%) are set to NaN before renormalization and prediction. Use this to mask epoxy/background.
n_iterations (int) – MC forward passes for prediction.
min_frac (float) – Minimum pixel fraction required to keep a phase.
pred_score_threshold (float) – Label NaN where max prediction score < threshold.
units (str) – Input format — ‘element_wt%’ or ‘oxide_wt%’.
top_k (int|None) – Cap displayed phases after filtering.
phases (list[str]|None) – Explicit phases to plot (overrides auto-pick).
exclude_phases (list[str]|None) – Phases to remove from auto-pick.
phase_colors (dict|None) – Manual color mapping {PhaseName: HexColor}.
bar_style (str) – “vertical” for the default bar chart (plot_phase_counts), or “stacked” for a stacked horizontal bar (plot_phase_proportions).
components_spec (dict|None) – Custom mineral formula logic.
remove_islands_flag (bool) – If True, removes isolated pixel clusters smaller than 2 pixels (4-connected) from the phase map. Useful for cleaning up salt-and-pepper noise in the epoxy region.
fill_holes_flag (bool) – If True, fills enclosed background holes within continuous phase regions up to hole_size pixels. Useful for patching small gaps inside large mineral grains.
hole_size (int) – Maximum hole area (in pixels) to fill when fill_holes_flag is True.
scalebar_um (float, optional) – Length of the scale bar in micrometers.
pixel_size_um (float) – Physical size of a single pixel in micrometers.
scalebar_loc (str) – Location of the scale bar (e.g., ‘lower left’).
scalebar_col (str) – Color of the scale bar text/line.
show (bool) – If True, calls plt.show().
- Returns:
Dictionary with keys ‘figs’, ‘shape’, ‘oxide_maps’, ‘df_pred’, ‘mineral_map’, ‘pred_score_map’, ‘kept_phases’, ‘component_maps’, ‘component_frames’. oxide_maps includes a 'Total' key with the per-pixel oxide sum.
- Return type:
result (dict)
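The two masking thresholds can be illustrated together. The sketch below is a simplification: in the pipeline described above, total_threshold is applied to the oxide maps before prediction, whereas here both masks are applied to the final labels for brevity. The helper mask_labels and the sample arrays are hypothetical.

```python
import numpy as np

def mask_labels(labels, scores, totals, pred_score_threshold=0.6, total_threshold=None):
    """Set labels to NaN where the score or the oxide total is too low."""
    out = np.asarray(labels, dtype=object).copy()
    low = np.asarray(scores) < pred_score_threshold
    if total_threshold is not None:
        # Low oxide totals typically indicate epoxy or background.
        low |= np.asarray(totals) < total_threshold
    out[low] = np.nan
    return out

labels = [["Ol", "Plag"], ["Ol", "Cpx"]]
scores = [[0.90, 0.50], [0.70, 0.95]]
totals = [[99.0, 98.0], [20.0, 100.0]]
masked = mask_labels(labels, scores, totals, pred_score_threshold=0.6, total_threshold=50.0)
```

The top-right pixel is dropped for its low prediction score and the bottom-left pixel for its low oxide total.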
Stoichiometric Functions
- class mineralML.stoichiometry.AmphiboleClassifier(comps)[source]
General amphibole calculations for classification and plotting.
- classify(subclass=True, eps=1e-09)[source]
Classify amphibole analyses using Leake-style Si (apfu) and Mg# produced by AmphiboleCalculator.calculate_components().
Returns a DataFrame with Mineral and (optionally) Submineral.
Note: only calcic amphiboles (1.5 <= Ca_B <= 2.05) are supported by the Leake ternary classifier. Rows outside that range are flagged but not classified further.
- class mineralML.stoichiometry.ApatiteCalculator(comps)[source]
Apatite-specific calculations. Ca5(PO4)3(F,OH,Cl).
- class mineralML.stoichiometry.BaseMineralCalculator(comps)[source]
Base class for mineral composition calculations. Implement calculate_components() for each mineral.
- class mineralML.stoichiometry.BiotiteCalculator(comps)[source]
Biotite-specific calculations. XM^{2+}3[Si3Al]O10(OH)2.
- class mineralML.stoichiometry.CalciteCalculator(comps)[source]
Calcite-specific calculations. CaCO3.
- class mineralML.stoichiometry.ChloriteCalculator(comps)[source]
Chlorite-specific calculations. (Mg,Fe)10Al2[Al2Si6O20](OH)16
- class mineralML.stoichiometry.ClinopyroxeneCalculator(comps)[source]
Clinopyroxene-specific calculations. Ca(Mg,Fe)Si2O6.
- class mineralML.stoichiometry.EpidoteCalculator(comps)[source]
Epidote-specific calculations. A2M3Z3(O,OH,F)12.
- class mineralML.stoichiometry.FeldsparClassifier(comps)[source]
General feldspar calculations for classification and plotting.
- class mineralML.stoichiometry.GlassClassifier(comps)[source]
General glass calculations with built-in TAS classification and plotting.
- calculate_components(subclass=True, which_name='volcanic')[source]
Calculates base components and optionally appends TAS classification.
- plot(df_class=None, subclass=True, which_name='volcanic', figsize=(8, 6), ax=None, **scatter_kwargs)[source]
Plot glasses on a TAS diagram.
- Parameters:
df_class (pd.DataFrame|None) – Pre-computed output of calculate_components(). Runs it automatically if None.
subclass (bool) – If True, colour-codes points by TAS rock type.
which_name (str) – ‘volcanic’ or ‘intrusive’ label variant for both field labels and the legend.
figsize (tuple) – Figure size passed to plt.subplots if creating a new figure.
ax (matplotlib.axes.Axes|None) – Existing axis to plot onto. If None, a new figure and axis are created.
**scatter_kwargs – Passed to ax.scatter (e.g. s=40, alpha=0.9).
- Returns:
fig (matplotlib.figure.Figure): The figure object.
ax (matplotlib.axes.Axes): The axis object.
- Return type:
tuple
- class mineralML.stoichiometry.KalsiliteCalculator(comps)[source]
Kalsilite-specific calculations. K[AlSiO4].
- class mineralML.stoichiometry.LeuciteCalculator(comps)[source]
Leucite-specific calculations. K[AlSi2O6].
- class mineralML.stoichiometry.MeliliteCalculator(comps)[source]
Melilite-specific calculations. (Ca,Na)2[(Mg,Fe2+,Al,Si)3O7].
- class mineralML.stoichiometry.MuscoviteCalculator(comps)[source]
Muscovite-specific calculations. XM^{3+}2[Si3Al]O10(OH)2.
- class mineralML.stoichiometry.NephelineCalculator(comps)[source]
Nepheline-specific calculations. Na3(Na,K)[Al4Si4O16].
- class mineralML.stoichiometry.OlivineCalculator(comps)[source]
Olivine-specific calculations. (Mg,Fe)2SiO4.
- class mineralML.stoichiometry.OrthopyroxeneCalculator(comps)[source]
Orthopyroxene-specific calculations.
- class mineralML.stoichiometry.OxideClassifier(df)[source]
General classifier for rhombohedral oxides and spinels. Use either:
- RhombohedralOxideCalculator for rhombohedral oxides (e.g., Hematite, Ilmenite)
- SpinelCalculator for spinels (e.g., Spinel, Magnetite)
- Returns:
Compositions/sites from the specific calculators with XR2, XR3, XTi already populated by those calculators. Non-routed rows are passed through.
- class mineralML.stoichiometry.PyroxeneClassifier(comps)[source]
General pyroxene calculations for classification.
- calculate_components()[source]
Return complete pyroxene composition with site assignments and enstatite, ferrosilite, wollastonite.
- classify(subclass=True)[source]
Classify pyroxene analyses into broad classes (ortho vs. clino) and optional DHZ subclasses. Uses Morimoto (1988) scheme to first separate sodic vs. non-sodic pyroxenes.
- Parameters:
subclass (bool) – If True, determine Submineral classification.
- Returns:
DataFrame with new columns for Mineral, Submineral, En, Fs, and Wo.
- Return type:
classified_df (pd.DataFrame)
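The En, Fs, and Wo columns are the standard quadrilateral end-member proportions. A minimal sketch of that calculation from cation proportions is given below; it assumes total Fe is used for Fs and reports percentages, and the library may differ on both points. The function pyroxene_end_members is a hypothetical helper.

```python
def pyroxene_end_members(ca, mg, fe):
    """Wo-En-Fs percentages from Ca, Mg, and Fe cations (apfu)."""
    total = ca + mg + fe
    if total <= 0:
        raise ValueError("Ca + Mg + Fe must be positive")
    return {
        "Wo": 100.0 * ca / total,  # wollastonite component
        "En": 100.0 * mg / total,  # enstatite component
        "Fs": 100.0 * fe / total,  # ferrosilite component
    }

# An augite-like composition: Ca 0.8, Mg 0.9, Fe 0.3 apfu.
augite = pyroxene_end_members(ca=0.8, mg=0.9, fe=0.3)
```

For this composition the three cations sum to 2.0 apfu, giving roughly Wo 40, En 45, Fs 15.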
- plot(df_class=None, subclass=True, labels='short', figsize=(8, 5), quad_only=True, ax=None, **kw)[source]
Plot pyroxene compositions on the DHZ quadrilateral.
- Parameters:
df_class – Output of .classify(). If None, will call .classify(subclass).
subclass – Whether to color by Submineral (if False, colors by Mineral).
figsize – Default (8,5)
**kw – Passed to the field-boundary tax.line(…) calls (e.g. ls=’:’, lw=0.5).
- Returns:
fig: matplotlib.figure.Figure
tax: ternary.TernaryAxesSubplot
- Return type:
tuple
- class mineralML.stoichiometry.RhombohedralOxideCalculator(comps)[source]
Rhombohedral oxide-specific calculations. Hematite-Ilmenite, Fe2O3-(FeTi)2O3.
- class mineralML.stoichiometry.SerpentineCalculator(comps)[source]
Serpentine-specific calculations. Mg3[Si2O5](OH)4.
- class mineralML.stoichiometry.SodicPyroxeneCalculator(comps)[source]
Sodic Pyroxene-specific calculations. (Na,Ca)(Mg,Fe3+,Al)Si2O6.
- class mineralML.stoichiometry.SpinelCalculator(comps)[source]
Spinel group-specific calculations. MgAl2O4, Fe3O4, AB2X4.
- class mineralML.stoichiometry.TASClassifier(config: dict = {'axes': {'x': 'SiO2', 'y': 'Na2O + K2O'}, 'fields': {'Bs': {'name': ['Basalt', 'Gabbro'], 'poly': [[45, 0], [45, 5], [52, 5], [52, 0]]}, 'F': {'name': ['Foidite', 'Foidolite'], 'poly': [[35, 9], [37, 14], [52.5, 18], [52.5, 14], [48.4, 11.5], [45, 9.4], [41, 7], [41, 3], [37, 3]]}, 'O1': {'label': ['Basaltic\nAndesite', 'Gabbroic\nDiorite'], 'name': ['Basaltic Andesite', 'Gabbroic Diorite'], 'poly': [[52, 0], [52, 5], [57, 5.9], [57, 0]]}, 'O2': {'name': ['Andesite', 'Diorite'], 'poly': [[57, 0], [57, 5.9], [63, 7], [63, 0]]}, 'O3': {'name': ['Dacite', 'Granodiorite'], 'poly': [[63, 0], [63, 7], [69, 8], [77.3, 0]]}, 'Pc': {'name': ['Picrite', 'Peridotgabbro'], 'poly': [[41, 3], [45, 3], [45, 2], [45, 0], [41, 0]]}, 'Ph': {'label': ['Phonolite', 'Foid\nSyenite'], 'name': ['Phonolite', 'Foid Syenite'], 'poly': [[52.5, 14], [52.5, 18], [57, 18], [63, 16.2], [61, 13.5], [57.6, 11.7]]}, 'R': {'name': ['Rhyolite', 'Granite'], 'poly': [[69, 8], [69, 13], [85.9, 6.8], [87.5, 4.7], [77.3, 0]]}, 'S1': {'label': ['Trachy-\nbasalt', 'Foidolite'], 'name': ['Trachybasalt', 'Foidolite'], 'poly': [[45, 5], [49.4, 7.3], [52, 5]]}, 'S2': {'label': ['Basaltic\nTrachy-\nandesite', 'Foidolite'], 'name': ['Basaltic Trachyandesite', 'Foidolite'], 'poly': [[49.4, 7.3], [53, 9.3], [57, 5.9], [52, 5]]}, 'S3': {'label': ['Trachy-\nandesite', 'Foidolite'], 'name': ['Trachyandesite', 'Foidolite'], 'poly': [[53, 9.3], [57.6, 11.7], [61, 8.6], [63, 7], [57, 5.9]]}, 'T1T2': {'label': ['Trachyte/\nTrachy-\ndacite', 'Syenite'], 'name': ['Trachyte-Trachydacite', 'Syenite'], 'poly': [[57.6, 11.7], [61, 13.5], [63, 16.2], [69, 13], [69, 8], [63, 7], [61, 8.6]]}, 'U1': {'label': ['Tephrite', 'Foid\nGabbro'], 'name': ['Tephrite', 'Foid Gabbro'], 'poly': [[41, 3], [41, 7], [45, 9.4], [49.4, 7.3], [45, 5], [45, 3]]}, 'U2': {'label': ['Phono-\ntephrite', 'Foid\nMonzo-\ndiorite'], 'name': ['Phonotephrite', 'Foid Monzodiorite'], 'poly': [[45, 
9.4], [48.4, 11.5], [53, 9.3], [49.4, 7.3]]}, 'U3': {'label': ['Tephri-\nphonolite', 'Foid\nMonzo-\nsyenite'], 'name': ['Tephriphonolite', 'Foid Monzosyenite'], 'poly': [[48.4, 11.5], [52.5, 14], [57.6, 11.7], [53, 9.3]]}, 'nan': {'name': 'N/A', 'poly': []}, 'none': {'name': ['N/A'], 'poly': []}}, 'name': 'TAS'})[source]
Total Alkali-Silica classifier backed by a polygon JSON config.
Fields with empty or missing ‘poly’ entries are skipped. Classification uses matplotlib.path.Path for point-in-polygon tests and is vectorized over all valid fields.
- Parameters:
config (dict) – TAS config dict with ‘fields’ key mapping IDs to dicts containing ‘name’ (list[str]) and ‘poly’ (list of [x, y] vertices).
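The point-in-polygon test can be sketched directly with matplotlib.path.Path. The example uses two polygons copied from the default config shown in the signature; the classify function is a hypothetical illustration of the approach, not the class's actual method.

```python
import numpy as np
from matplotlib.path import Path

fields = {
    "Bs": {"name": ["Basalt", "Gabbro"], "poly": [[45, 0], [45, 5], [52, 5], [52, 0]]},
    "O2": {"name": ["Andesite", "Diorite"], "poly": [[57, 0], [57, 5.9], [63, 7], [63, 0]]},
    "none": {"name": ["N/A"], "poly": []},
}

def classify(sio2, alkalis, fields):
    """Return the field ID for each (SiO2, Na2O + K2O) point, or None."""
    points = np.column_stack([sio2, alkalis])
    field_ids = np.full(len(points), None, dtype=object)
    assigned = np.zeros(len(points), dtype=bool)
    for fid, spec in fields.items():
        if not spec.get("poly"):
            continue  # fields with empty or missing polygons are skipped
        # contains_points treats the path as closed and tests all points at once.
        inside = Path(spec["poly"]).contains_points(points)
        new = inside & ~assigned
        field_ids[new] = fid
        assigned |= new
    return field_ids

ids = classify(sio2=[50.0, 60.0, 80.0], alkalis=[3.0, 4.0, 12.0], fields=fields)
```

The first point falls in the basalt field, the second in the andesite field, and the third lies outside both polygons and stays unclassified.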
- add_to_axes(ax, add_labels=True, which_labels='volcanic', label_fontsize=7, **patch_kwargs)[source]
Draw TAS polygon boundaries (and optionally labels) onto an axis.
- Parameters:
ax (matplotlib.axes.Axes) – Target axis.
add_labels (bool) – Whether to annotate each field at its centroid.
which_labels (str) – ‘volcanic’ or ‘intrusive’ — selects which rock name to show.
label_fontsize (int) – Font size for field labels.
**patch_kwargs – Passed to matplotlib.patches.Polygon (e.g. linewidth, alpha, edgecolor).
- Returns:
The modified axis.
- Return type:
ax (matplotlib.axes.Axes)
- get_rock_name(field_id, which='volcanic', for_display=False)[source]
Resolve a field ID to its rock name.
- Parameters:
field_id (str|float) – A field ID string or NaN for unclassified points.
which (str) – ‘volcanic’ returns index 0 of the name list; ‘intrusive’ returns index -1.
for_display (bool) – If True, returns the ‘label’ entry (with line breaks) when available, falling back to ‘name’. If False, always uses ‘name’.
- Returns:
Rock name, or ‘Unclassified’ if the field ID is not found.
- Return type:
str
- class mineralML.stoichiometry.TitaniteCalculator(comps)[source]
Titanite-specific calculations. CaTiSiO5.
- class mineralML.stoichiometry.TourmalineCalculator(comps)[source]
Tourmaline-specific calculations. XY3Z6[Si6O18](BO3)3(O,OH)3(OH,F,O).
- class mineralML.stoichiometry.ZirconCalculator(comps)[source]
Zircon-specific calculations. ZrSiO4.
- mineralML.stoichiometry.element_to_oxide(df)[source]
Convert elemental wt% to oxide wt%, and return the conversion factors used.
- Parameters:
df (pd.DataFrame) – DataFrame with elemental wt% columns.
- Returns:
DataFrame with oxide wt% columns.
Series mapping each oxide to its conversion factor from element wt%.
- Return type:
Tuple[pd.DataFrame, pd.Series]
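The conversion factor for each element is the oxide's molar mass divided by the mass of the cations it contains. The sketch below works on plain dicts rather than DataFrames and covers only three elements; the tables and helper names are hypothetical, and the atomic masses are rounded standard values.

```python
ATOMIC_MASS = {"O": 15.999, "Si": 28.0855, "Mg": 24.305, "Al": 26.9815}
# element -> (oxide name, cations per formula unit, oxygens per formula unit)
OXIDE_FORMULA = {"Si": ("SiO2", 1, 2), "Mg": ("MgO", 1, 1), "Al": ("Al2O3", 2, 3)}

def oxide_factor(element):
    """Return (oxide name, element wt% -> oxide wt% factor)."""
    oxide, n_cation, n_oxygen = OXIDE_FORMULA[element]
    cation_mass = n_cation * ATOMIC_MASS[element]
    oxide_mass = cation_mass + n_oxygen * ATOMIC_MASS["O"]
    return oxide, oxide_mass / cation_mass

def element_to_oxide_wt(element_wt):
    """Convert a dict of element wt% to oxide wt% and report the factors used."""
    oxides, factors = {}, {}
    for element, wt in element_wt.items():
        oxide, factor = oxide_factor(element)
        oxides[oxide] = wt * factor
        factors[oxide] = factor
    return oxides, factors

oxides, factors = element_to_oxide_wt({"Si": 18.7, "Mg": 30.0})
```

For silicon the factor is about 2.139, so 18.7 wt% Si corresponds to roughly 40 wt% SiO2.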
- mineralML.stoichiometry.element_to_oxide_identity(df)[source]
Rename element-labeled columns to oxide-labeled columns without applying stoichiometric mass conversion. Intended for mapped EDS data that are already reported in oxide wt% but may use element-style channel names.
- Parameters:
df (pd.DataFrame) – DataFrame with element-labeled columns whose values are already oxide wt%.
- Returns:
DataFrame with oxide wt% columns.
Series mapping each oxide to its conversion factor from element wt%.
- Return type:
Tuple[pd.DataFrame, pd.Series]
- mineralML.stoichiometry.oxide_to_element(df)[source]
Convert oxide wt% to elemental wt%, and return the conversion factors used.
- Parameters:
df (pd.DataFrame) – DataFrame with oxide wt% columns.
- Returns:
DataFrame with elemental wt% columns.
Series mapping each element to its conversion factor from oxide wt%.
- Return type:
Tuple[pd.DataFrame, pd.Series]
Synthetic Mineral Generator
Confusion Matrix Functions
- mineralML.confusion_matrix.config_cell_text_and_colors(array_df, lin, col, oText, facecolors, posi, fz, fmt, show_null_values=0)[source]
Configures cell text and colors for confusion matrix visualization.
Adjusts the text and background colors of cells in the confusion matrix based on their values. Totals and percentages are calculated for the last row and column cells.
- Parameters:
array_df (np.ndarray) – 2D numpy array of the confusion matrix.
lin (int) – Row index of the cell to configure.
col (int) – Column index of the cell to configure.
oText (matplotlib.text.Text) – Text object of the cell.
facecolors (np.ndarray) – Array of facecolors for the cells.
posi (int) – Position index in the flattened array of cells.
fz (int) – Font size for cell text.
fmt (str) – Format string for cell text.
show_null_values (int, optional) – Flag to show null values. Default is 0.
- Returns:
A tuple containing two lists: text elements to add and to delete.
- Return type:
tuple
Note
The function modifies text and background colors based on the value in each cell.
- mineralML.confusion_matrix.confusion_matrix_df(given_min, pred_min)[source]
Constructs a confusion matrix as a pandas DataFrame for easy visualization and analysis. The function first finds the unique classes and maps them to their corresponding mineral names. Then, it uses these mappings to construct the confusion matrix, which compares the given and predicted classes.
When parent labels such as “Feldspar” or “Pyroxene” are present in either the given or predicted arrays, child labels (e.g., “Alkali_Feldspar”, “Plagioclase”) are automatically merged into the parent label so the confusion matrix dimensions remain consistent.
Labels that do not match any entry in the canonical mineral list after all merges are applied will trigger a UserWarning and the corresponding rows will be excluded from the confusion matrix.
- Parameters:
given_min (array-like) – The true class labels.
pred_min (array-like) – The predicted class labels.
- Returns:
A DataFrame representing the confusion matrix, with rows and columns labeled by the unique mineral names found in the given and predicted class arrays.
- Return type:
cm_df (DataFrame)
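The parent-label merge can be sketched with pandas.crosstab. The child-to-parent table below is a hypothetical two-entry subset, and confusion_df is an illustrative helper that does not reproduce the canonical-list check or the warning described above.

```python
import pandas as pd

# Hypothetical subset of the child -> parent relationships.
CHILD_TO_PARENT = {"Plagioclase": "Feldspar", "Alkali_Feldspar": "Feldspar"}

def confusion_df(given_min, pred_min):
    """Square confusion matrix; children merge into a parent label when it is present."""
    given = pd.Series(list(given_min), name="given")
    pred = pd.Series(list(pred_min), name="predicted")
    present = set(given) | set(pred)
    # Only merge a child when its parent label actually occurs in the data.
    mapping = {c: p for c, p in CHILD_TO_PARENT.items() if p in present}
    given = given.map(lambda label: mapping.get(label, label))
    pred = pred.map(lambda label: mapping.get(label, label))
    classes = sorted(set(given) | set(pred))
    cm = pd.crosstab(given, pred)
    # Reindex so rows and columns cover the same classes in the same order.
    return cm.reindex(index=classes, columns=classes, fill_value=0)

cm = confusion_df(
    given_min=["Feldspar", "Plagioclase", "Olivine", "Olivine"],
    pred_min=["Plagioclase", "Feldspar", "Olivine", "Feldspar"],
)
```

Because "Feldspar" appears in the inputs, both plagioclase entries are counted under it, which keeps the matrix square.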
- mineralML.confusion_matrix.insert_totals(df_cm)[source]
Inserts total sums for each row and column into the confusion matrix DataFrame.
This function adds a ‘sum_row’ column and a ‘sum_col’ row to the DataFrame, representing the total counts across each row and column, respectively. It also sets the bottom-right cell to the grand total.
- Parameters:
df_cm (pd.DataFrame) – DataFrame representing the confusion matrix.
- Returns:
The function modifies the DataFrame in place.
- Return type:
None
Note
If ‘sum_row’ or ‘sum_col’ already exist in the DataFrame, they will be recalculated.
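A minimal sketch of the in-place totals logic, assuming the ‘sum_row’ and ‘sum_col’ names given above (add_totals is a hypothetical helper, not the library function):

```python
import pandas as pd

def add_totals(df_cm):
    """Add a 'sum_row' column and a 'sum_col' row to df_cm in place."""
    # Drop stale totals first so repeated calls recalculate instead of double counting.
    df_cm.drop(columns="sum_row", errors="ignore", inplace=True)
    df_cm.drop(index="sum_col", errors="ignore", inplace=True)
    df_cm["sum_row"] = df_cm.sum(axis=1)
    # Summing columns after adding 'sum_row' puts the grand total bottom-right.
    df_cm.loc["sum_col"] = df_cm.sum(axis=0)

df_cm = pd.DataFrame([[2, 0], [1, 1]],
                     index=["Feldspar", "Olivine"],
                     columns=["Feldspar", "Olivine"])
add_totals(df_cm)
add_totals(df_cm)  # a second call leaves the totals unchanged
```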
- mineralML.confusion_matrix.pp_matrix(df_cm, annot=True, cmap='BuGn', fmt='.2f', fz=12, lw=0.5, cbar=False, figsize=[14, 14], show_null_values=0, pred_val_axis='x', savefig=None)[source]
Creates and displays a confusion matrix visualization using Seaborn’s heatmap function.
- Parameters:
df_cm (pd.DataFrame) – DataFrame containing the confusion matrix without totals.
annot (bool, optional) – If True, display the text in each cell. Default is True.
cmap (str, optional) – Color map for the heatmap. Default is ‘BuGn’.
fmt (str, optional) – String format for annotating. Default is ‘.2f’.
fz (int, optional) – Font size for text annotations. Default is 12.
lw (float, optional) – Line width for cell borders. Default is 0.5.
cbar (bool, optional) – If True, display the color bar. Default is False.
figsize (list, optional) – Figure size. Default is [14, 14].
show_null_values (int, optional) – Show null values, 0 or 1. Default is 0.
pred_val_axis (str, optional) – Axis to show prediction values (‘x’ or ‘y’). Default is ‘x’.
savefig (str, optional) – If provided, saves the plot to the specified path with a ‘.pdf’ extension.
- Returns:
None. The function creates and displays the heatmap of the confusion matrix.
Note
The function modifies the input DataFrame to include total counts and adjusts text and color configurations. The original code is from: https://github.com/wcipriano/pretty-print-confusion-matrix/blob/master/pretty_confusion_matrix/pretty_confusion_matrix.py