A new method for detecting mixed Mycobacterium tuberculosis infection and reconstructing constituent strains provides insights into transmission
Abstract Background Mixed infection with multiple strains of the same pathogen in a single host can present clinical and analytical challenges. Whole genome sequence (WGS) data can identify signals of multiple strains in samples, though the precision of previous methods can be improved. Here, we pre...
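Main Authors: | Benjamin Sobkowiak, Patrick Cudahy, Melanie H. Chitwood, Taane G. Clark, Caroline Colijn, Louis Grandjean, Katharine S. Walter, Valeriu Crudu, Ted Cohen |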
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2025-01-01 |
Series: | Genome Medicine |
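Subjects: | Mycobacterium tuberculosis; Mixed infection; Genomic epidemiology; Tuberculosis; Whole genome sequencing; Bioinformatics |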
Online Access: | https://doi.org/10.1186/s13073-025-01430-y |
_version_ | 1832571458097774592 |
---|---|
author | Benjamin Sobkowiak Patrick Cudahy Melanie H. Chitwood Taane G. Clark Caroline Colijn Louis Grandjean Katharine S. Walter Valeriu Crudu Ted Cohen |
author_sort | Benjamin Sobkowiak |
collection | DOAJ |
description | Abstract Background Mixed infection with multiple strains of the same pathogen in a single host can present clinical and analytical challenges. Whole genome sequence (WGS) data can identify signals of multiple strains in samples, though the precision of previous methods can be improved. Here, we present MixInfect2, a new tool to accurately detect mixed samples from Mycobacterium tuberculosis short-read WGS data. We then evaluate three approaches for reconstructing the underlying mixed constituent strain sequences. This allows these samples to be included in downstream analysis to gain insights into the epidemiology and transmission of mixed infections. Methods We employed a Gaussian mixture model to cluster allele frequencies at mixed sites (hSNPs) in each sample to identify signals of multiple strains. Building upon our previous tool, MixInfect, we increased the accuracy of classifying in vitro mixed samples through multiple improvements to the bioinformatic pipeline. Major and minor proportion constituent strains were reconstructed using three approaches and assessed by comparing the estimated sequence to the known constituent strain sequence. Lastly, mixed infections in a real-world Mycobacterium tuberculosis population from Moldova were detected with MixInfect2 and clusters of recent transmission that included major and minor constituent strains were built. Results All 36/36 in vitro mixed and 12/12 non-mixed samples were correctly classified with MixInfect2, and major strain proportions were estimated with high accuracy (within 3% of the true strain proportion), outperforming previous tools. 
Reconstructed major strain sequences closely matched the true constituent sequence by taking the allele at the highest frequency at hSNPs, while the best-performing approach to reconstruct the minor proportion strain sequence was identifying the closest non-mixed isolate in the same population, though no approach was effective when the minor strain proportion was at 5%. Finally, fewer mixed infections were identified in Moldova than previous estimates (6.6% vs 17.4%) and we found multiple instances where the constituent strains of mixed samples were present in transmission clusters. Conclusions MixInfect2 accurately detects samples with evidence of mixed infection from short-read WGS data and provides an excellent estimate of the mixture proportions. While there are limitations in reconstructing the constituent strain sequences of mixed samples, we present recommendations for the best approach to include these isolates in further analyses. |
format | Article |
id | doaj-art-8adfcb6876d8425e8ab4b9fbde4264cb |
institution | Kabale University |
issn | 1756-994X |
language | English |
publishDate | 2025-01-01 |
publisher | BMC |
record_format | Article |
series | Genome Medicine |
spelling | doaj-art-8adfcb6876d8425e8ab4b9fbde4264cb; 2025-02-02T12:35:38Z; eng; BMC; Genome Medicine; 1756-994X; 2025-01-01; 17(1):1-13; 10.1186/s13073-025-01430-y; A new method for detecting mixed Mycobacterium tuberculosis infection and reconstructing constituent strains provides insights into transmission; Benjamin Sobkowiak (Department of Epidemiology of Microbial Disease, Yale School of Public Health); Patrick Cudahy (Division of Infectious Diseases, Department of Internal Medicine, Yale School of Medicine); Melanie H. Chitwood (Department of Epidemiology of Microbial Disease, Yale School of Public Health); Taane G. Clark (Faculty of Infectious and Tropical Diseases, School of Hygiene and Tropical Medicine); Caroline Colijn (Department of Mathematics, Simon Fraser University); Louis Grandjean (Department of Infection, Immunity and Inflammation, Institute of Child Health, University College London); Katharine S. Walter (Division of Epidemiology, University of Utah); Valeriu Crudu (Phthisiopneumology Institute); Ted Cohen (Department of Epidemiology of Microbial Disease, Yale School of Public Health); https://doi.org/10.1186/s13073-025-01430-y; Mycobacterium tuberculosis; Mixed infection; Genomic epidemiology; Tuberculosis; Whole genome sequencing; Bioinformatics |
title | A new method for detecting mixed Mycobacterium tuberculosis infection and reconstructing constituent strains provides insights into transmission |
topic | Mycobacterium tuberculosis Mixed infection Genomic epidemiology Tuberculosis Whole genome sequencing Bioinformatics |
url | https://doi.org/10.1186/s13073-025-01430-y |
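
The Methods in the description field mention clustering allele frequencies at mixed sites (hSNPs) with a Gaussian mixture model. The sketch below illustrates that general idea only and is not the MixInfect2 implementation: it fits a two-component, one-dimensional Gaussian mixture by expectation-maximisation and reads a major strain proportion off the two cluster means. The function names, the simulated frequencies, the 70/30 mixture and the noise level are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(freqs, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, sds). Hypothetical helper, not part of MixInfect2.
    """
    lo, hi = min(freqs), max(freqs)
    span = hi - lo
    # Start the two means inside the observed range, one low and one high.
    means = [lo + 0.25 * span, lo + 0.75 * span]
    sds = [max(span / 4.0, 1e-3)] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frequency.
        resp = []
        for x in freqs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: update weight, mean and standard deviation per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(freqs)
            means[k] = sum(r[k] * x for r, x in zip(resp, freqs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, freqs)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return weights, means, sds


def estimate_major_proportion(freqs):
    """Average the upper cluster mean with one minus the lower cluster mean."""
    _, means, _ = fit_two_gaussians(freqs)
    return 0.5 * (max(means) + 1.0 - min(means))


# Simulated alternate-allele frequencies at 200 hSNPs for an assumed 70/30 mixture:
# half the sites carry the alternate allele in the major strain, half in the minor.
rng = random.Random(0)
true_major = 0.7
freqs = [
    min(1.0, max(0.0, rng.gauss(true_major if i % 2 == 0 else 1.0 - true_major, 0.03)))
    for i in range(200)
]
estimate = estimate_major_proportion(freqs)
print(round(estimate, 3))
```

This sketch fixes the number of components at two and skips the step of deciding whether a sample is mixed at all; the abstract does not say how MixInfect2 makes that call.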