Annotating the microbial dark matter with HiFi-NN

Bibliographic Details
Main Authors: Gavin Ayres, Geraldene Munsamy, Michael Heinzinger, Noelia Ferruz, Kevin Yang, Bastiaan Bergman, Philipp Lorenz
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: iScience
Subjects: Microbiology, Computer science
Online Access: http://www.sciencedirect.com/science/article/pii/S2589004225007412
collection DOAJ
description Summary: The accurate computational annotation of protein sequences with enzymatic function remains a fundamental challenge in bioinformatics. Here, we present HiFi-NN (Hierarchically-Finetuned Nearest Neighbor search) which annotates protein sequences to the 4th level of Enzyme Commission (EC) number with greater precision and recall than state-of-the-art deep learning methods. Furthermore, we show that this method can correctly identify the EC number of a given sequence to lower identities than BLASTp. We show that performance can be improved by increasing the diversity of the lookup set in both sequence space and the environment the sequence has been sampled from. We proceed to show that we can correct specific mis-annotations in the BRENDA enzymes database reproducing results found by others. Finally, we use HiFi-NN to annotate functional dark-matter protein sequences from NMPFamDB. Our findings pave the way for more accurate functional annotation in silico, especially for proteins from distant sequence space.
id doaj-art-28d38f991adb4c9e88de9c009b7bfa88
institution OA Journals
issn 2589-0042
spelling doaj-art-28d38f991adb4c9e88de9c009b7bfa88
Record timestamp: 2025-08-20T01:55:37Z
Citation: iScience, volume 28, issue 6, article 112480 (2025-06-01)
DOI: 10.1016/j.isci.2025.112480
Author affiliations:
Gavin Ayres: Basecamp Research Ltd., London, UK (corresponding author)
Geraldene Munsamy: Basecamp Research Ltd., London, UK
Michael Heinzinger: School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics & Computational Biology, TUM (Technical University of Munich), Munich, Germany
Noelia Ferruz: Centre for Genomic Regulation, Barcelona, Spain
Kevin Yang: Microsoft Research New England, Cambridge, MA, USA
Bastiaan Bergman: Basecamp Research Ltd., London, UK
Philipp Lorenz: Basecamp Research Ltd., London, UK
topic Microbiology
Computer science