Foundation models in plant molecular biology: advances, challenges, and future directions

Bibliographic Details
Main Authors: Feng Xu, Tianhao Wu, Qian Cheng, Xiangfeng Wang, Jun Yan
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-06-01
Series: Frontiers in Plant Science
Online Access: https://www.frontiersin.org/articles/10.3389/fpls.2025.1611992/full
Description
Summary: A foundation model (FM) is a neural network trained on large-scale data using unsupervised or self-supervised learning, capable of adapting to a wide range of downstream tasks. This review provides a comprehensive overview of FMs in plant molecular biology, emphasizing recent advances and future directions. It begins by tracing the evolution of biological FMs across the DNA, RNA, protein, and single-cell levels, from tools inspired by natural language processing (NLP) to transformative models for decoding complex biological sequences. The review then focuses on plant-specific FMs such as GPN, AgroNT, PDLLMs, PlantCaduceus, and PlantRNA-FM, which address challenges that are widespread among plant genomes, including polyploidy, high repetitive sequence content, and environment-responsive regulatory elements, alongside universal FMs like GENERator and Evo 2, which leverage extensive cross-species training data for sequence design and prediction of mutation effects. Key opportunities and challenges in plant molecular biology FM development are further outlined, such as data heterogeneity, biologically informed architectures, cross-species generalization, and computational efficiency. Future research should prioritize improvements in model generalization, multi-modal data integration, and computational optimization to overcome existing limitations and unlock the potential of FMs in plant science. This review serves as an essential resource for plant molecular biologists and offers a clear snapshot of the current state and future potential of FMs in the field.
ISSN: 1664-462X