Diagnostics of lung cancer by fragmentated blood circulating cell-free DNA based on machine learning methods

Bibliographic Details
Main Authors: Ivan O. Meshkov, Alexander P. Koturgin, Pavel V. Ershov, Liubov A. Safonova, Julia A. Remizova, Valentina V. Maksyutina, Ekaterina D. Maralova, Vasilisa A. Astafieva, Alexey A. Ivashechkin, Boris D. Ignatiev, Antonida V. Makhotenko, Ekaterina A. Snigir, Valentin V. Makarov, Vladimir S. Yudin, Anton A. Keskinov, Sergey M. Yudin, Anna S. Makarova, Veronika I. Skvortsova
Format: Article
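Language: English
Published: Frontiers Media S.A. 2025-01-01
Series: Frontiers in Medicine
Subjects: machine learning methods; lung cancer; fragmentome; circulating cell-free DNA; cfDNA; diagnostic classification model
Online Access: https://www.frontiersin.org/articles/10.3389/fmed.2025.1435428/full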
_version_ 1832582900183203840
author Ivan O. Meshkov
Alexander P. Koturgin
Pavel V. Ershov
Liubov A. Safonova
Julia A. Remizova
Valentina V. Maksyutina
Ekaterina D. Maralova
Vasilisa A. Astafieva
Alexey A. Ivashechkin
Boris D. Ignatiev
Antonida V. Makhotenko
Ekaterina A. Snigir
Valentin V. Makarov
Vladimir S. Yudin
Anton A. Keskinov
Sergey M. Yudin
Anna S. Makarova
Veronika I. Skvortsova
collection DOAJ
description Introduction: Minimally invasive diagnostics based on liquid biopsy enable early detection of lung cancer (LC). Circulating cell-free DNA (cfDNA) fragments in blood plasma reflect genome and chromatin status and are regarded as integral cancer biomarkers and as a basis for ‘cancer-of-origin’ prediction. The aim of this work is to create a method for processing next-generation sequencing (NGS) data and an interpretable binary classification model (CM) that analyzes cfDNA fragmentation features to distinguish healthy subjects from subjects with LC.
Methods: The study included 148 healthy subjects and 138 subjects with LC. cfDNA fractions isolated from blood plasma biospecimens were used for DNA library preparation and NGS on the Illumina NovaSeq 6000 system with a coverage of 100 million reads per sample. Genomic fragmentation was characterized by twelve variables describing the abundance and length distribution of cfDNA fragments within each genomic interval, and by 40 variables based on the values of position-weight matrices describing combinations of 5-bp-long terminal motifs of cfDNA fragments. The first-phase classification models were based either on logistic regression with L1- and L2-regularization or on Gaussian processes (probabilistic CMs). The second-phase CM was based on kernel logistic regression.
Results: The final CM distinguishes healthy subjects from subjects with LC with AUC values of 0.872–0.875. The performance of the developed CM was evaluated using datum and testing sets for each LC stage category. Sensitivity ranged from 66.7 to 85.7%, from 77.8 to 100%, and from 70 to 80% for LC stages I, II, and III, respectively. Specificity ranged from 79.3 to 90.0%.
Discussion: The CM has good diagnostic value and does not require clinical or other data on tumor-associated biomarkers. The method has several advantages for future clinical implementation as a decision-support system: the CM requires data exclusively from NGS analysis of blood plasma cfDNA fragmentation; its accuracy does not depend on any additional clinical data; it is highly interpretable and traceable; and it has an appropriate modular architecture.
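
The description mentions variables derived from position-weight matrices over 5-bp terminal motifs of cfDNA fragments. As a rough illustration only, the sketch below tallies 5' end motifs and per-position base frequencies, which are the raw counts a position-weight matrix is typically built from. The function names, the toy fragment sequences, and the rule of skipping short or ambiguous reads are assumptions made for this example; the record does not describe the authors' actual pipeline.

```python
from collections import Counter

BASES = "ACGT"
MOTIF_LEN = 5  # motif length taken from the description (5-bp terminal motifs)


def end_motifs(fragments, k=MOTIF_LEN):
    """Yield the k-base 5' end motif of each fragment.

    Assumption for this sketch: fragments shorter than k, or with
    non-ACGT characters in the motif, are skipped.
    """
    for seq in fragments:
        motif = seq[:k].upper()
        if len(motif) == k and set(motif) <= set(BASES):
            yield motif


def motif_frequencies(fragments, k=MOTIF_LEN):
    """Return the relative frequency of each observed end motif."""
    counts = Counter(end_motifs(fragments, k))
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}


def position_frequency_matrix(fragments, k=MOTIF_LEN):
    """Return per-position base frequencies (one dict per motif position)."""
    motifs = list(end_motifs(fragments, k))
    n = len(motifs)
    matrix = []
    for pos in range(k):
        column = Counter(m[pos] for m in motifs)
        matrix.append({b: (column[b] / n if n else 0.0) for b in BASES})
    return matrix


# Hypothetical toy fragments; real input would come from aligned NGS reads.
fragments = ["CCCAGTTTGA", "CCCAGAAACT", "GGTTACCAGT", "ccnagTTT", "ACG"]
freqs = motif_frequencies(fragments)
pfm = position_frequency_matrix(fragments)
```

Here the last two toy fragments are dropped (one contains an ambiguous base, one is too short), so the frequencies are computed over three motifs. Turning such frequencies into position-weight-matrix scores would additionally need a background model, which the record does not specify.
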
format Article
id doaj-art-4bdbc3c72d234dc2abb392b292b9ead8
institution Kabale University
issn 2296-858X
language English
publishDate 2025-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Medicine
spelling doaj-art-4bdbc3c72d234dc2abb392b292b9ead8 | 2025-01-29T06:45:51Z | eng | Frontiers Media S.A. | Frontiers in Medicine | ISSN 2296-858X | 2025-01-01 | vol. 12 | DOI 10.3389/fmed.2025.1435428 | article 1435428
affiliation (all authors except Veronika I. Skvortsova) Federal State Budgetary Institution “Centre for Strategic Planning and Management of Biomedical Health Risks” of the Federal Medical and Biological Agency (Centre for Strategic Planning, of the Federal Medical and Biological Agency), Moscow, Russia
affiliation (Veronika I. Skvortsova) The Federal Medical and Biological Agency (FMBA of Russia), Moscow, Russia
title Diagnostics of lung cancer by fragmentated blood circulating cell-free DNA based on machine learning methods
topic machine learning methods
lung cancer
fragmentome
circulating cell-free DNA
cfDNA
diagnostic classification model
url https://www.frontiersin.org/articles/10.3389/fmed.2025.1435428/full