A robust transfer learning approach for high-dimensional linear regression to support integration of multi-source gene expression data.

Bibliographic Details
Main Authors: Lulu Pan, Qian Gao, Kecheng Wei, Yongfu Yu, Guoyou Qin, Tong Wang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1012739
collection DOAJ
description Transfer learning aims to integrate useful information from multi-source datasets to improve learning performance on target data. It applies naturally in genomics: when learning gene associations in a target tissue, data from other tissues can be integrated. However, heavy-tailed distributions and outliers are common in genomics data, which challenges the effectiveness of current transfer learning approaches. In this paper, we study the transfer learning problem under high-dimensional linear models with t-distributed errors (Trans-PtLR), which aims to improve estimation and prediction on target data by borrowing information from useful source data while offering robustness to heavy tails and outliers. In the oracle case with known transferable source datasets, we establish a transfer learning algorithm based on penalized maximum likelihood and the expectation-maximization algorithm. To avoid including non-informative sources, we propose selecting the transferable sources by cross-validation. Extensive simulation experiments and an application show that, when heavy tails and outliers are present, Trans-PtLR is more robust and achieves better estimation and prediction than transfer learning for linear regression with normally distributed errors. Keywords: Data integration, Variable selection, t distribution, Expectation-maximization algorithm, Genotype-Tissue Expression, Cross-validation.
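The description combines penalized maximum likelihood with an expectation-maximization algorithm under t-distributed errors. The sketch below illustrates only that single-dataset building block; it is not the authors' Trans-PtLR implementation and omits the transfer step and the cross-validated source selection. All function names are hypothetical. It assumes an L1 (lasso) penalty, a fixed degrees-of-freedom value `nu`, and a simplified penalty scaling in which the noise scale is absorbed into `lam`.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by the lasso update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso(X, y, w, lam, beta, n_sweeps=50):
    """Coordinate descent for (1/2n) * sum_i w_i (y_i - x_i'b)^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = beta.copy()
    r = y - X @ beta                      # current residuals
    for _ in range(n_sweeps):
        for j in range(p):
            xj = X[:, j]
            denom = np.sum(w * xj * xj) / n
            if denom == 0.0:
                continue
            r += xj * beta[j]             # partial residual without feature j
            rho = np.sum(w * xj * r) / n
            beta[j] = soft_threshold(rho, lam) / denom
            r -= xj * beta[j]
    return beta

def em_t_lasso(X, y, lam=0.1, nu=3.0, n_iter=30):
    """EM for linear regression with t errors (nu fixed, an assumption here)."""
    n, p = X.shape
    beta = np.zeros(p)
    sigma2 = np.var(y)
    for _ in range(n_iter):
        # E-step: observations with large residuals receive small weights.
        r = y - X @ beta
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        # M-step: weighted lasso for beta, then weighted update of the scale.
        beta = weighted_lasso(X, y, w, lam, beta)
        r = y - X @ beta
        sigma2 = max(np.sum(w * r ** 2) / n, 1e-12)
    return beta, sigma2
```

The E-step weights are what give the robustness described above: an outlying response inflates its own residual, which shrinks its weight in the next weighted lasso fit.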
id doaj-art-8cae2cfb58c949e89c7a5b260eae588d
institution Kabale University
issn 1553-734X
1553-7358