Comparative analysis of regression algorithms for drug response prediction using GDSC dataset

Bibliographic Details
Main Authors: Soojung Ha, Juho Park, Kyuri Jo
Format: Article
Language: English
Published: BMC 2025-01-01
Series: BMC Research Notes
Subjects: Drug response; Regression; Gene expression; Multiomics; GDSC dataset
Online Access: https://doi.org/10.1186/s13104-024-07026-w
_version_ 1832595044492640256
author Soojung Ha
Juho Park
Kyuri Jo
collection DOAJ
description Abstract. Background: Drug response prediction can infer the relationship between an individual’s genetic profile and a drug, which can be used to determine the choice of treatment for an individual patient. Drug response prediction has recently been performed using machine learning. However, high-throughput sequencing data produce thousands of features per patient. In addition, because many regression and feature selection algorithms exist, it is difficult for researchers to know which algorithm is appropriate for prediction. Methods: We compared and evaluated the performance of 13 representative regression algorithms using the Genomics of Drug Sensitivity in Cancer (GDSC) dataset. Three analyses were conducted to show the effect of feature selection methods, multiomics information, and drug categories on drug response prediction. Results: In the experiments, the Support Vector Regression algorithm and gene features selected with the LINCS L1000 dataset showed the best performance in terms of accuracy and execution time. However, integration of mutation and copy number variation information did not contribute to the prediction. Among the drug groups, responses of drugs related to hormone-related pathways were predicted with relatively high accuracy. Conclusion: This study can help bioinformatics researchers design data processing steps, select algorithms for drug response prediction, and develop new drug response prediction models based on the GDSC or other high-throughput sequencing datasets.
format Article
id doaj-art-581e97cc08d54217853880a1851520bc
institution Kabale University
issn 1756-0500
language English
publishDate 2025-01-01
publisher BMC
record_format Article
series BMC Research Notes
spelling doaj-art-581e97cc08d54217853880a1851520bc; 2025-01-19T12:08:44Z; eng; BMC; BMC Research Notes; 1756-0500; 2025-01-01; 18S119; 10.1186/s13104-024-07026-w; Comparative analysis of regression algorithms for drug response prediction using GDSC dataset; Soojung Ha (0), Juho Park (1), Kyuri Jo (2); Department of Computer Engineering, Chungbuk National University (all three authors); https://doi.org/10.1186/s13104-024-07026-w; Drug response; Regression; Gene expression; Multiomics; GDSC dataset
title Comparative analysis of regression algorithms for drug response prediction using GDSC dataset
topic Drug response
Regression
Gene expression
Multiomics
GDSC dataset
url https://doi.org/10.1186/s13104-024-07026-w