iMESc – an interactive machine learning app for environmental sciences

Bibliographic Details
Main Authors: Danilo Cândido Vieira, Fabiana S. Paula, Luciana Erika Yaginuma, Gustavo Fonseca
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-01-01
Series: Frontiers in Environmental Science
Online Access: https://www.frontiersin.org/articles/10.3389/fenvs.2025.1533292/full
Description
Summary: As environmental sciences increasingly rely on complex datasets, machine learning (ML) has become crucial for identifying patterns and relationships. However, integrating ML into workflows can be challenging because of technical barriers and the time-intensive nature of coding. To address these issues, we developed iMESc, an interactive ML app designed to streamline and simplify ML workflows for environmental data. Developed in R and built on the Shiny platform, iMESc integrates supervised and unsupervised ML methods with tools for data preprocessing, visualization, descriptive statistics, and spatial analysis. The Datalist system ensures seamless transitions between analytical workflows, while the “savepoints” feature enhances reproducibility by preserving the analysis state. We demonstrate iMESc’s flexibility with four workflows applied to a case study predicting nematode community structure from environmental data. The classical statistical approaches, Redundancy Analysis (RDA) and Piecewise RDA (pwRDA), explained 30.7% and 53% of the variance, respectively. The SuperSOM model achieved an R² of 0.60 for training and 0.291 for testing, identifying spatial patterns across depth zones. Finally, a hybrid model combining an unsupervised SOM followed by a supervised Random Forest model returned an accuracy of 83.47% on the training set and 80.77% on the test set, with Bathymetry, Chlorophyll, and Coarse Sand as key predictive variables. iMESc permits the customization of plots and the saving of workflows into “savepoints”, guaranteeing reproducibility. iMESc bridges the gap between the complexity of machine learning algorithms and the need for user-friendly interfaces in environmental research. By reducing the technical burden of coding, iMESc allows researchers to focus on scientific inquiry, improving both the efficiency and depth of their analyses.
ISSN: 2296-665X