A Group Feature Screening Procedure Based on Pearson Chi-Square Statistic for Biology Data with Categorical Response



Bibliographic Details
Main Authors: Hanji He, Jianfeng He, Guangming Deng
Format: Article
Language: English
Published: Wiley 2024-01-01
Series: Journal of Mathematics
Online Access: http://dx.doi.org/10.1155/2024/9014764
Description
Summary: The analysis of biogenetic data contributes substantially to understanding disease mechanisms and diagnosing rare diseases. In such analyses, selecting the significant features that affect a disease provides an effective basis for subsequent diagnosis and for choosing a treatment direction. This is not a simple task, however, because biogenetic data pose challenges such as ultra-high dimensionality of candidate features, imbalanced response variables, and genetic associations among features. This study focuses on group structure in feature screening for biogenetic data: because such data exhibit group structure, the entire genome, rather than individual strongly correlated genes, needs to be analyzed. To address this issue, the study proposes a group feature screening method that takes group correlations into account by using an adjusted Pearson chi-square statistic. The method can be applied to both continuous and discrete covariates. Simulation studies illustrate its performance and show that it performs well with imbalanced data and multicategory responses. In an application to lung cancer diagnosis, the method performs impressively in classifying imbalanced data, and dimension reduction followed by linear discriminant analysis also performs well.
ISSN: 2314-4785
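The abstract does not state the adjusted statistic itself, so the following is only a minimal Python sketch of the general idea of chi-square-based group screening, not the authors' procedure. It assumes that continuous covariates are discretized into equal-frequency bins, that each sample in a group is cross-classified by the joint tuple of its bin labels (so dependence within the group is retained), and that the "adjustment" is simply division of the Pearson chi-square statistic by its degrees of freedom. The functions `quantile_bins`, `pearson_chi2`, `group_score`, and `screen_groups`, and the parameters `n_bins`, `keep`, and `discrete`, are hypothetical names introduced for illustration.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def quantile_bins(x: Sequence[float], n_bins: int = 4) -> List[int]:
    """Discretize a continuous covariate into equal-frequency bins by rank (ties broken by index)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    labels = [0] * len(x)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(x)
    return labels


def pearson_chi2(u: Sequence[Hashable], y: Sequence[Hashable]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for the contingency table of u vs. y."""
    n = len(y)
    joint = Counter(zip(u, y))
    rows, cols = Counter(u), Counter(y)
    stat = 0.0
    for a, n_a in rows.items():
        for b, n_b in cols.items():
            expected = n_a * n_b / n  # expected count under independence; always > 0 here
            stat += (joint.get((a, b), 0) - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df


def group_score(group_cols: Sequence[Sequence], y: Sequence[Hashable],
                n_bins: int = 4, discrete: bool = False) -> float:
    """Hypothetical adjusted score: chi-square of the group's joint cells vs. y, divided by df."""
    # Discrete covariates are used as-is; continuous ones are binned first (assumption).
    labels = [list(col) if discrete else quantile_bins(col, n_bins) for col in group_cols]
    cells = list(zip(*labels))  # one tuple of labels per sample
    stat, df = pearson_chi2(cells, y)
    return stat / df if df > 0 else 0.0


def screen_groups(groups: Dict[str, List[Sequence[float]]], y: Sequence[Hashable], keep: int,
                  n_bins: int = 4, discrete: bool = False) -> Tuple[List[str], Dict[str, float]]:
    """Score every group, rank by score, and keep the names of the top `keep` groups."""
    scores = {name: group_score(cols, y, n_bins, discrete) for name, cols in groups.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


if __name__ == "__main__":
    rng = random.Random(0)
    y = [i % 2 for i in range(200)]  # balanced binary response for the demo
    groups = {
        "signal": [[yi + rng.gauss(0, 0.3) for yi in y], [rng.gauss(0, 1) for _ in y]],
        "noise": [[rng.gauss(0, 1) for _ in y], [rng.gauss(0, 1) for _ in y]],
    }
    top, scores = screen_groups(groups, y, keep=1)
    print(top, scores)
```

Because `pearson_chi2` works with any hashable labels, the same sketch handles a multicategory response without changes. The joint-cell table grows as `n_bins` raised to the group size, so for large groups a practical version would need a different aggregation, and dividing by the degrees of freedom is only one simple way to make groups with different numbers of cells comparable.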