A new method for detecting mixed Mycobacterium tuberculosis infection and reconstructing constituent strains provides insights into transmission

Abstract Background Mixed infection with multiple strains of the same pathogen in a single host can present clinical and analytical challenges. Whole genome sequence (WGS) data can identify signals of multiple strains in samples, though the precision of previous methods can be improved. Here, we pre...

Bibliographic Details
Main Authors: Benjamin Sobkowiak, Patrick Cudahy, Melanie H. Chitwood, Taane G. Clark, Caroline Colijn, Louis Grandjean, Katharine S. Walter, Valeriu Crudu, Ted Cohen
Format: Article
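Language: English
Published: BMC 2025-01-01
Series: Genome Medicine
Subjects: Mycobacterium tuberculosis; Mixed infection; Genomic epidemiology; Tuberculosis; Whole genome sequencing; Bioinformatics
Online Access: https://doi.org/10.1186/s13073-025-01430-y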
collection DOAJ
description Abstract
Background: Mixed infection with multiple strains of the same pathogen in a single host can present clinical and analytical challenges. Whole genome sequence (WGS) data can identify signals of multiple strains in samples, though the precision of previous methods can be improved. Here, we present MixInfect2, a new tool to accurately detect mixed samples from Mycobacterium tuberculosis short-read WGS data. We then evaluate three approaches for reconstructing the underlying mixed constituent strain sequences. This allows these samples to be included in downstream analysis to gain insights into the epidemiology and transmission of mixed infections.
Methods: We employed a Gaussian mixture model to cluster allele frequencies at mixed sites (hSNPs) in each sample to identify signals of multiple strains. Building upon our previous tool, MixInfect, we increased the accuracy of classifying in vitro mixed samples through multiple improvements to the bioinformatic pipeline. Major and minor proportion constituent strains were reconstructed using three approaches and assessed by comparing the estimated sequence to the known constituent strain sequence. Lastly, mixed infections in a real-world Mycobacterium tuberculosis population from Moldova were detected with MixInfect2, and clusters of recent transmission that included major and minor constituent strains were built.
Results: All 36/36 in vitro mixed and 12/12 non-mixed samples were correctly classified with MixInfect2, and major strain proportions were estimated with high accuracy (within 3% of the true strain proportion), outperforming previous tools. Reconstructed major strain sequences closely matched the true constituent sequence by taking the allele at the highest frequency at hSNPs, while the best-performing approach to reconstruct the minor proportion strain sequence was identifying the closest non-mixed isolate in the same population, though no approach was effective when the minor strain proportion was at 5%. Finally, fewer mixed infections were identified in Moldova than previous estimates (6.6% vs 17.4%), and we found multiple instances where the constituent strains of mixed samples were present in transmission clusters.
Conclusions: MixInfect2 accurately detects samples with evidence of mixed infection from short-read WGS data and provides an excellent estimate of the mixture proportions. While there are limitations in reconstructing the constituent strain sequences of mixed samples, we present recommendations for the best approach to include these isolates in further analyses.
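The Methods summary above describes clustering allele frequencies at hSNPs with a Gaussian mixture model. The following is a minimal illustrative sketch of that general idea, not the MixInfect2 implementation: it fits a fixed two-component one-dimensional Gaussian mixture by expectation-maximization to simulated allele frequencies and reads the major strain proportion off the larger component mean. The function names (`simulate_hsnp_freqs`, `fit_two_gaussians`), the noise level, the number of sites, and the choice of exactly two components are all assumptions made for illustration; the record does not state how the tool selects the number of components or filters sites.

```python
import math
import random


def simulate_hsnp_freqs(major_prop, n_sites=100, noise_sd=0.03, seed=0):
    """Simulate allele frequencies at hSNPs for a two-strain mixture (hypothetical data).

    Each site contributes two frequencies: one for the major-strain allele
    and its complement for the minor-strain allele.
    """
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_sites):
        f = min(max(rng.gauss(major_prop, noise_sd), 0.0), 1.0)
        freqs.extend([f, 1.0 - f])
    return freqs


def fit_two_gaussians(freqs, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM; return (weights, means, sds)."""
    if len(freqs) < 2:
        raise ValueError("need at least two allele frequencies")
    xs = sorted(freqs)
    half = len(xs) // 2
    # Deterministic start: one component per half of the sorted data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    sds = [0.1, 0.1]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, sds)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and spread of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor keeps the density finite
    return weights, means, sds


# Simulated 70:30 mixture; the true proportion is an assumption of this example.
freqs = simulate_hsnp_freqs(0.7)
weights, means, sds = fit_two_gaussians(freqs)
major_estimate = max(means)
print(f"estimated major strain proportion: {major_estimate:.3f}")
```

In this sketch a non-mixed sample would have few or no hSNPs to cluster, so a real pipeline would need a separate rule for calling a sample non-mixed; that rule is not described in this record and is not attempted here.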
format Article
id doaj-art-8adfcb6876d8425e8ab4b9fbde4264cb
institution Kabale University
issn 1756-994X
language English
publishDate 2025-01-01
publisher BMC
record_format Article
series Genome Medicine
spelling doaj-art-8adfcb6876d8425e8ab4b9fbde4264cb 2025-02-02T12:35:38Z eng BMC Genome Medicine 1756-994X 2025-01-01 17 1 1 13 10.1186/s13073-025-01430-y
A new method for detecting mixed Mycobacterium tuberculosis infection and reconstructing constituent strains provides insights into transmission
Benjamin Sobkowiak (0), Patrick Cudahy (1), Melanie H. Chitwood (2), Taane G. Clark (3), Caroline Colijn (4), Louis Grandjean (5), Katharine S. Walter (6), Valeriu Crudu (7), Ted Cohen (8)
0 Department of Epidemiology of Microbial Disease, Yale School of Public Health
1 Division of Infectious Diseases, Department of Internal Medicine, Yale School of Medicine
2 Department of Epidemiology of Microbial Disease, Yale School of Public Health
3 Faculty of Infectious and Tropical Diseases, School of Hygiene and Tropical Medicine
4 Department of Mathematics, Simon Fraser University
5 Department of Infection, Immunity and Inflammation, Institute of Child Health, University College London
6 Division of Epidemiology, University of Utah
7 Phthisiopneumology Institute
8 Department of Epidemiology of Microbial Disease, Yale School of Public Health
title A new method for detecting mixed Mycobacterium tuberculosis infection and reconstructing constituent strains provides insights into transmission
topic Mycobacterium tuberculosis
Mixed infection
Genomic epidemiology
Tuberculosis
Whole genome sequencing
Bioinformatics
url https://doi.org/10.1186/s13073-025-01430-y