Expression‐based machine learning models for predicting plant tissue identity

Bibliographic Details
Main Authors: Sourabh Palande, Jeremy Arsenault, Patricia Basurto‐Lozada, Andrew Bleich, Brianna N. I. Brown, Sophia F. Buysse, Noelle A. Connors, Sikta Das Adhikari, Kara C. Dobson, Francisco Xavier Guerra‐Castillo, Maria F. Guerrero‐Carrillo, Sophia Harlow, Héctor Herrera‐Orozco, Asia T. Hightower, Paulo Izquierdo, MacKenzie Jacobs, Nicholas A. Johnson, Wendy Leuenberger, Alessandro Lopez‐Hernandez, Alicia Luckie‐Duque, Camila Martínez‐Avila, Eddy J. Mendoza‐Galindo, David Cruz Plancarte, Jenny M. Schuster, Harry Shomer, Sidney C. Sitar, Anne K. Steensma, Joanne Elise Thomson, Damián Villaseñor‐Amador, Robin Waterman, Brandon M. Webster, Madison Whyte, Sofía Zorilla‐Azcué, Beronda L. Montgomery, Aman Y. Husbands, Arjun Krishnan, Sarah Percival, Elizabeth Munch, Robert VanBuren, Daniel H. Chitwood, Alejandra Rougon‐Cardoso
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Applications in Plant Sciences
Online Access: https://doi.org/10.1002/aps3.11621
Collection: DOAJ

Abstract:
Premise: The selection of Arabidopsis as a model organism played a pivotal role in advancing genomic science. The competing frameworks to select an agricultural‐ or ecological‐based model species were rejected in favor of building knowledge in a species that would facilitate genome‐enabled research.
Methods: Here, we examine the ability of models based on Arabidopsis gene expression data to predict tissue identity in other flowering plants. Across the machine learning algorithms compared, models trained and tested on Arabidopsis data achieved near‐perfect precision and recall values, whereas models trained on Arabidopsis data and used to predict tissue identity across the flowering plants achieved precision values of 0.69 to 0.74 and recall values of 0.54 to 0.64.
Results: The identity of belowground tissue can be predicted more accurately than that of other tissue types, and the ability to predict tissue identity is not correlated with phylogenetic distance from Arabidopsis. k‐nearest neighbors is the most successful algorithm, suggesting that gene expression signatures, rather than marker genes, are more valuable for creating models for tissue and cell type prediction in plants.
Discussion: Our data‐driven results highlight that the assertion that knowledge from Arabidopsis is translatable to other plants is not always true. Considering the current landscape of abundant sequencing data, we should reevaluate the scientific emphasis on Arabidopsis and prioritize plant diversity.
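
The abstract credits the success of k‐nearest neighbors to whole gene expression signatures rather than individual marker genes. The Python sketch below illustrates that idea only; it is not the authors' pipeline. The toy expression values, the four‐gene profiles, Euclidean distance, k = 3, and macro‐averaged precision and recall are all assumptions made here for illustration. A query sample is labeled by majority vote among its nearest labeled reference profiles (standing in for Arabidopsis samples), and the predictions are then scored with per‐tissue precision and recall.

```python
import math
from collections import Counter

# Hypothetical toy data (invented for illustration, not from the study): each
# profile lists expression of the same four genes, e.g. shared orthologs, on a
# log scale. REFERENCE stands in for labeled Arabidopsis samples.
REFERENCE = [
    ([9.0, 1.0, 2.0, 0.5], "root"),
    ([8.5, 1.5, 2.5, 0.0], "root"),
    ([1.0, 9.0, 3.0, 0.5], "leaf"),
    ([1.5, 8.0, 2.0, 1.0], "leaf"),
    ([2.0, 2.0, 8.0, 7.0], "flower"),
    ([1.0, 3.0, 9.0, 6.0], "flower"),
]

# Hypothetical query samples from another species, with their true tissue labels.
QUERIES = [
    ([7.0, 2.0, 3.0, 1.0], "root"),
    ([2.0, 7.0, 4.0, 2.0], "leaf"),
    ([2.0, 6.0, 5.0, 3.0], "flower"),
    ([1.5, 2.5, 8.5, 6.5], "flower"),
]


def distance(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(profile, reference, k=3):
    """Label a profile by majority vote among its k nearest reference profiles.

    The whole profile drives the distance, so no single marker gene decides
    the label. Ties go to the label encountered first, i.e. the nearer one.
    """
    nearest = sorted(reference, key=lambda item: distance(profile, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def macro_precision_recall(true_labels, predicted_labels):
    """Precision and recall per tissue class, averaged over the true classes."""
    classes = sorted(set(true_labels))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(true_labels, predicted_labels))
        n_predicted = sum(p == c for p in predicted_labels)
        n_actual = sum(t == c for t in true_labels)
        precisions.append(tp / n_predicted if n_predicted else 0.0)
        recalls.append(tp / n_actual)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


if __name__ == "__main__":
    truth = [label for _, label in QUERIES]
    predicted = [knn_predict(profile, REFERENCE) for profile, _ in QUERIES]
    precision, recall = macro_precision_recall(truth, predicted)
    print(predicted)
    print(f"macro precision={precision:.2f}, macro recall={recall:.2f}")
```

On this toy data the third query, a flower sample whose profile sits closer to the leaf references, is labeled leaf. The script prints ['root', 'leaf', 'leaf', 'flower'], with macro‐averaged precision and recall both about 0.83.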

Record ID: doaj-art-445b8cf3380b4722a063fff37af1dfb6
Institution: Kabale University
ISSN: 2168-0450
Volume: 13
Issue: 1

Affiliations:
Sourabh Palande: Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
Jeremy Arsenault: Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA
Patricia Basurto‐Lozada: Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México, Juriquilla, Querétaro, Mexico
Andrew Bleich: Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
Brianna N. I. Brown: Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
Sophia F. Buysse: Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
Noelle A. Connors: Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
Sikta Das Adhikari: Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
Kara C. Dobson: Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, USA
Francisco Xavier Guerra‐Castillo: Unidad de Investigación Médica en Inmunología e Infectología, Instituto Mexicano del Seguro Social, Ciudad de México, Mexico
Maria F. Guerrero‐Carrillo: Laboratory of Agrigenomic Sciences, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México, León, Guanajuato, Mexico
Sophia Harlow: Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
Héctor Herrera‐Orozco: Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
Asia T. Hightower: Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
Paulo Izquierdo: Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
MacKenzie Jacobs: Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
Nicholas A. Johnson: Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, USA
Wendy Leuenberger: Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, USA
Alessandro Lopez‐Hernandez: Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México, Juriquilla, Querétaro, Mexico
Alicia Luckie‐Duque: Laboratory of Agrigenomic Sciences, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México, León, Guanajuato, Mexico
Camila Martínez‐Avila: Colección Nacional de Aves, Posgrado en Ciencias Biológicas, Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
Eddy J. Mendoza‐Galindo: Laboratory of Agrigenomic Sciences, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México, León, Guanajuato, Mexico
David Cruz Plancarte: Departamento de Botánica, Posgrado en Ciencias Biológicas, Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
Jenny M. Schuster: Molecular Plant Sciences Program, Michigan State University, East Lansing, Michigan, USA
Harry Shomer: Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA
Sidney C. Sitar: Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
Anne K. Steensma: Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
Joanne Elise Thomson: Molecular Plant Sciences Program, Michigan State University, East Lansing, Michigan, USA
Damián Villaseñor‐Amador: Programa de Posgrado en Ciencias Biológicas, Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
Robin Waterman: Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
Brandon M. Webster: Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
Madison Whyte: Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
Sofía Zorilla‐Azcué: Programa de Posgrado en Ciencias Biológicas, Escuela Nacional de Estudios Superiores (ENES) Unidad Morelia, Universidad Nacional Autónoma de México, Morelia, Michoacán, Mexico
Beronda L. Montgomery: Department of Biology, Grinnell College, Grinnell, Iowa, USA
Aman Y. Husbands: Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Arjun Krishnan: Department of Biomedical Informatics, Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
Sarah Percival: Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
Elizabeth Munch: Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
Robert VanBuren: Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
Daniel H. Chitwood: Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
Alejandra Rougon‐Cardoso: Laboratory of Agrigenomic Sciences, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México, León, Guanajuato, Mexico

Keywords: Arabidopsis; flowering plants; gene expression; machine learning; model species; tissue identity