A robust transfer learning approach for high-dimensional linear regression to support integration of multi-source gene expression data.

Bibliographic Details
Main Authors:	Lulu Pan, Qian Gao, Kecheng Wei, Yongfu Yu, Guoyou Qin, Tong Wang
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2025-01-01
Series:	PLoS Computational Biology
Online Access:	https://doi.org/10.1371/journal.pcbi.1012739
ISSN:	1553-734X, 1553-7358
Citation:	PLoS Computational Biology 21(1): e1012739

Abstract
Transfer learning aims to integrate useful information from multi-source datasets to improve learning performance on target data. It can be applied effectively in genomics, for example when learning gene associations in a target tissue while integrating data from other tissues. However, heavy-tailed distributions and outliers are common in genomics data, which challenges the effectiveness of current transfer learning approaches. In this paper, we study the transfer learning problem under high-dimensional linear models with t-distributed errors (Trans-PtLR), which aims to improve estimation and prediction on the target data by borrowing information from useful source data while remaining robust to complex data with heavy tails and outliers. In the oracle case, where the transferable source datasets are known, we establish a transfer learning algorithm based on penalized maximum likelihood and the expectation-maximization (EM) algorithm. To avoid including non-informative sources, we propose selecting the transferable sources by cross-validation. Extensive simulation experiments and an application show that Trans-PtLR is robust and, when heavy tails and outliers are present, achieves better estimation and prediction performance than transfer learning for the linear regression model with normally distributed errors.

Keywords:	Data integration, Variable selection, t distribution, Expectation maximization algorithm, Genotype-Tissue Expression, Cross validation.
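
The abstract combines two ingredients: penalized maximum likelihood for a linear model with t-distributed errors, fitted by EM, and borrowing strength from sources assumed to be transferable. The Python sketch below illustrates one common way to put these together. It is not the authors' Trans-PtLR implementation, and all function names are hypothetical. Assumptions: the degrees of freedom ν are fixed (here ν = 3); the penalty is a plain lasso applied to the weighted least-squares criterion of the M-step; and the transfer step is a generic two-step "pool, then correct on the target" scheme in the style of Trans-Lasso estimators, swapped in for illustration. The cross-validation step for selecting sources is omitted.

```python
# Hypothetical sketch (not the Trans-PtLR reference code): lasso-penalized
# t-regression fitted by EM, plus a two-step oracle transfer scheme.
# Assumes fixed nu and a plain L1 penalty; pure standard library.
import random

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty (lasso shrinkage)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def residuals(X, y, beta, offset):
    return [y[i] - offset[i] - sum(xij * bj for xij, bj in zip(X[i], beta))
            for i in range(len(y))]

def weighted_lasso(X, y, w, lam, beta, offset, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/2n) * sum_i w_i * r_i^2 + lam * ||beta||_1."""
    n, p = len(X), len(X[0])
    beta = list(beta)
    r = residuals(X, y, beta, offset)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            denom = sum(w[i] * X[i][j] ** 2 for i in range(n)) / n
            if denom == 0.0:
                continue
            # Weighted correlation of x_j with the partial residual (coordinate j added back)
            rho = sum(w[i] * X[i][j] * (r[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / denom
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

def t_lasso_em(X, y, lam, nu=3.0, offset=None, n_em=30):
    """Penalized EM for y = offset + X beta + e, with e ~ t_nu(0, sigma^2) and nu fixed."""
    n, p = len(X), len(X[0])
    offset = offset if offset is not None else [0.0] * n
    beta = [0.0] * p
    sigma2 = max(sum(v * v for v in residuals(X, y, beta, offset)) / n, 1e-12)
    for _ in range(n_em):
        r = residuals(X, y, beta, offset)
        # E-step: expected latent precision given r_i; large residuals get small weights
        w = [(nu + 1.0) / (nu + ri * ri / sigma2) for ri in r]
        # M-step: weighted lasso for beta, then closed-form update of the scale
        beta = weighted_lasso(X, y, w, lam, beta, offset)
        r = residuals(X, y, beta, offset)
        sigma2 = max(sum(wi * ri * ri for wi, ri in zip(w, r)) / n, 1e-12)
    return beta, sigma2

def transfer_t_lasso(X_t, y_t, sources, lam_pool, lam_delta, nu=3.0):
    """Hypothetical two-step transfer; every source is assumed transferable (oracle case)."""
    # Step 1: fit on the target pooled with all sources
    X_pool = X_t + [row for Xs, _ in sources for row in Xs]
    y_pool = y_t + [v for _, ys in sources for v in ys]
    w_hat, _ = t_lasso_em(X_pool, y_pool, lam_pool, nu)
    # Step 2: sparse correction fitted on the target only, with X_t w_hat as a fixed offset
    offset = [sum(a * b for a, b in zip(row, w_hat)) for row in X_t]
    delta, _ = t_lasso_em(X_t, y_t, lam_delta, nu, offset=offset)
    return [a + b for a, b in zip(w_hat, delta)]

def simulate(n, beta, rng, n_outliers=0):
    X = [[rng.gauss(0, 1) for _ in beta] for _ in range(n)]
    y = [sum(a * b for a, b in zip(row, beta)) + rng.gauss(0, 1) for row in X]
    for i in range(n_outliers):
        y[i] += 25.0  # gross outliers planted in the target response
    return X, y

rng = random.Random(0)
X_t, y_t = simulate(60, [2.0, -1.5, 0.0, 0.0, 0.0], rng, n_outliers=3)
X_s, y_s = simulate(100, [2.3, -1.5, 0.0, 0.0, 0.0], rng)  # source with a small shift
est = transfer_t_lasso(X_t, y_t, [(X_s, y_s)], lam_pool=0.05, lam_delta=0.1)
print([round(b, 2) for b in est])
```

The E-step weight (ν + 1)/(ν + r²/σ²) is what supplies robustness in this sketch: once σ² settles, a response shifted by 25 receives a weight close to zero, so the planted outliers barely move the fit. Fixing all weights at 1 turns each M-step into an ordinary lasso, which is the Gaussian-error analogue of the same scheme.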