Diagnostics of lung cancer by fragmentated blood circulating cell-free DNA based on machine learning methods
Introduction: Minimally invasive diagnostics based on liquid biopsy makes early detection of lung cancer (LC) possible. Blood plasma circulating cell-free DNA (cfDNA) fragments reflect the genome and chromatin status and are considered integral cancer biomarkers and the biological entities f...
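Main Authors: | Ivan O. Meshkov, Alexander P. Koturgin, Pavel V. Ershov, Liubov A. Safonova, Julia A. Remizova, Valentina V. Maksyutina, Ekaterina D. Maralova, Vasilisa A. Astafieva, Alexey A. Ivashechkin, Boris D. Ignatiev, Antonida V. Makhotenko, Ekaterina A. Snigir, Valentin V. Makarov, Vladimir S. Yudin, Anton A. Keskinov, Sergey M. Yudin, Anna S. Makarova, Veronika I. Skvortsova |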
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Medicine |
Subjects: | |
Online Access: | https://www.frontiersin.org/articles/10.3389/fmed.2025.1435428/full |
_version_ | 1832582900183203840 |
---|---|
author | Ivan O. Meshkov Alexander P. Koturgin Pavel V. Ershov Liubov A. Safonova Julia A. Remizova Valentina V. Maksyutina Ekaterina D. Maralova Vasilisa A. Astafieva Alexey A. Ivashechkin Boris D. Ignatiev Antonida V. Makhotenko Ekaterina A. Snigir Valentin V. Makarov Vladimir S. Yudin Anton A. Keskinov Sergey M. Yudin Anna S. Makarova Veronika I. Skvortsova |
author_sort | Ivan O. Meshkov |
collection | DOAJ |
description | Introduction: Minimally invasive diagnostics based on liquid biopsy makes early detection of lung cancer (LC) possible. Blood plasma circulating cell-free DNA (cfDNA) fragments reflect the genome and chromatin status and are considered integral cancer biomarkers and the biological entities for ‘cancer-of-origin’ prediction. The aim of this work is to create a method for processing next-generation sequencing (NGS) data and an interpretable binary classification model (CM) that analyzes cfDNA fragmentation features to distinguish healthy subjects from subjects with LC. Methods: 148 healthy subjects and 138 subjects with LC were included in the study. cfDNA fractions isolated from blood plasma biospecimens were used for DNA library preparation and NGS on the Illumina NovaSeq 6000 system with a coverage of 100 million reads/sample. Twelve variables, describing the abundance and length distribution of cfDNA fragments within each genomic interval, and 40 variables based on the values of position-weight matrices, describing combinations of 5-bp-long terminal motifs of cfDNA fragments, were used to characterize genomic fragmentation. The classification models of the first phase of machine learning were either logistic regressions with L1- and L2-regularization or probabilistic CMs based on Gaussian processes. The second-phase CM was based on kernel logistic regression. Results: The final CM can distinguish healthy subjects from subjects with LC with AUC values of 0.872–0.875. The performance of the developed CM was evaluated using datum and testing sets for each LC stage category. Sensitivity values ranged from 66.7 to 85.7%, from 77.8 to 100%, and from 70 to 80% for LC stages I, II, and III, respectively. Specificity values ranged from 79.3 to 90.0%. Discussion: The CM thus has good diagnostic value and does not require clinical or other data on tumor-associated biomarkers. The current method for LC detection has several advantages for future clinical implementation as a decision-making support system: the CM requires data exclusively from NGS analysis of blood plasma cfDNA fragmentation; its accuracy does not depend on any additional clinical data; it is highly interpretable and traceable; and it has an appropriate modular architecture. |
format | Article |
id | doaj-art-4bdbc3c72d234dc2abb392b292b9ead8 |
institution | Kabale University |
issn | 2296-858X |
language | English |
publishDate | 2025-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Medicine |
spelling | doaj-art-4bdbc3c72d234dc2abb392b292b9ead8; 2025-01-29T06:45:51Z; eng; Frontiers Media S.A.; Frontiers in Medicine; 2296-858X; 2025-01-01; 12; 10.3389/fmed.2025.1435428; 1435428. Author affiliations: authors 0 to 16 (Ivan O. Meshkov through Anna S. Makarova): Federal State Budgetary Institution “Centre for Strategic Planning and Management of Biomedical Health Risks” of the Federal Medical and Biological Agency (Centre for Strategic Planning, of the Federal Medical and Biological Agency), Moscow, Russia; author 17 (Veronika I. Skvortsova): The Federal Medical and Biological Agency (FMBA of Russia), Moscow, Russia. |
title | Diagnostics of lung cancer by fragmentated blood circulating cell-free DNA based on machine learning methods |
topic | machine learning methods lung cancer fragmentome circulating cell-free DNA cfDNA diagnostic classification model |
url | https://www.frontiersin.org/articles/10.3389/fmed.2025.1435428/full |