Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data

In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required in order to identify disease/trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistic...

Bibliographic Details
Main Authors: Hyoseok Ko, Kipoong Kim, Hokeun Sun
Format: Article
Language: English
Published: BioMed Central 2016-12-01
Series: Genomics & Informatics
Subjects: genetic association studies; genetic selection; genetic testing; principal component analysis
Online Access: http://genominfo.org/upload/pdf/gni-14-187.pdf
author Hyoseok Ko
Kipoong Kim
Hokeun Sun
collection DOAJ
description In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required to identify disease- or trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistical testing procedures based on individual tests suffer from multiple testing issues such as control of the family-wise error rate and dependence among tests. Moreover, the main interest in genetic association studies is detecting the few genes associated with a phenotype outcome among tens of thousands of genes. For this reason, regularization procedures, in which a phenotype outcome is regressed on all genomic markers and the regression coefficients are estimated from a penalized likelihood, have been considered a good alternative approach to the analysis of high-dimensional genomic data. However, the selection performance of regularization procedures has rarely been compared with that of statistical group testing procedures. In this article, we performed extensive simulation studies in which commonly used group testing procedures, such as principal component analysis, Hotelling's T² test, and the permutation test, are compared with the group lasso (least absolute shrinkage and selection operator) in terms of true positive selection. We also applied all methods considered in the simulation studies to identify genes associated with ovarian cancer from over 20,000 genetic sites generated from the Illumina Infinium HumanMethylation27K Beadchip. We found a large discrepancy in the selected genes between the multiple group testing procedures and the group lasso.
format Article
id doaj-art-828aa5cb78dc4baf91e7c31538a3c82f
institution Kabale University
issn 1598-866X
2234-0742
language English
publishDate 2016-12-01
publisher BioMed Central
record_format Article
series Genomics & Informatics
spelling doaj-art-828aa5cb78dc4baf91e7c31538a3c82f; 2025-02-02T21:04:22Z; eng; BioMed Central; Genomics & Informatics; ISSN 1598-866X, 2234-0742; 2016-12-01; vol. 14, no. 4, pp. 187-195; doi:10.5808/GI.2016.14.4.187; 175; Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data; Hyoseok Ko, Kipoong Kim, Hokeun Sun (all: Department of Statistics, Pusan National University, Busan 46241, Korea); http://genominfo.org/upload/pdf/gni-14-187.pdf; genetic association studies; genetic selection; genetic testing; principal component analysis
title Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data
topic genetic association studies
genetic selection
genetic testing
principal component analysis
url http://genominfo.org/upload/pdf/gni-14-187.pdf