CrysMTM: a multiphase, temperature-resolved, multimodal dataset for crystalline materials

Bibliographic Details
Main Authors: Can Polat, Erchin Serpedin, Mustafa Kurban, Hasan Kurban
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: Machine Learning: Science and Technology
Online Access: https://doi.org/10.1088/2632-2153/adf9bc
Description
Summary: We present CrysMTM, a large-scale, multimodal dataset designed to benchmark temperature- and phase-sensitive machine learning models for crystalline materials. The dataset comprises approximately 30 000 atomistic samples covering the three primary polymorphs of titanium dioxide (anatase, brookite, and rutile), each evaluated across a temperature spectrum ranging from cryogenic to ambient and elevated conditions. Each data entry integrates three complementary modalities: (1) three-dimensional atomic coordinates, (2) RGBA molecular visualizations, and (3) structured textual metadata encompassing geometric descriptors, local bonding environments, and phase transformation parameters. This multimodal structure enables both supervised and self-supervised learning across graph-based, image-based, and language-based architectures. CrysMTM supports rigorous evaluation of model robustness under thermal perturbations and crystallographic phase transitions. Baseline benchmarking across 18 models, including graph neural networks (GNNs), convolutional neural networks, and foundation models, reveals significant property-specific challenges. For example, bandgap predictions exhibit errors exceeding 25%, while volumetric expansion and atomic displacement estimations frequently deviate by more than 100%. Even state-of-the-art GNNs, which achieve an average in-distribution (ID) mean absolute percentage error of approximately 20%, show a threefold increase under out-of-distribution (OOD) thermal conditions. In contrast, a few-shot multimodal large language model reduces global prediction error from 96% to 23% and narrows the performance gap between ID and OOD cases to just four percentage points. These results highlight both the selective difficulty posed by temperature-sensitive geometric targets and the considerable room for innovation in model design.
All dataset files, model implementations, and pretrained checkpoints are available at https://github.com/KurbanIntelligenceLab/CrysMTM.
ISSN: 2632-2153
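
Note on the error metric: the summary reports results as mean absolute percentage error (MAPE) and as a gap between in-distribution (ID) and out-of-distribution (OOD) performance. The following is a minimal sketch of how such figures are conventionally computed, not the authors' evaluation code. The function names `mape` and `id_ood_gap` and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 0:
            # MAPE is undefined when a reference value is zero.
            raise ValueError("reference values must be non-zero")
        total += abs((p - t) / t)
    return 100.0 * total / len(y_true)


def id_ood_gap(
    id_true: Sequence[float],
    id_pred: Sequence[float],
    ood_true: Sequence[float],
    ood_pred: Sequence[float],
) -> float:
    """OOD MAPE minus ID MAPE, in percentage points."""
    return mape(ood_true, ood_pred) - mape(id_true, id_pred)


# Hypothetical property values (not taken from the dataset).
id_true, id_pred = [3.0, 3.2], [3.3, 2.88]      # each prediction is off by 10%
ood_true, ood_pred = [3.0, 3.2], [3.9, 2.24]    # each prediction is off by 30%

id_error = mape(id_true, id_pred)
ood_error = mape(ood_true, ood_pred)
gap = id_ood_gap(id_true, id_pred, ood_true, ood_pred)
print(f"ID MAPE: {id_error:.1f}%  OOD MAPE: {ood_error:.1f}%  gap: {gap:.1f} points")
```

In this toy case the OOD error is three times the ID error, the same kind of ratio the summary describes for the GNN baselines. The gap is expressed in percentage points, that is, as a difference of two MAPE values and not as a ratio.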