A Group Feature Screening Procedure Based on Pearson Chi-Square Statistic for Biology Data with Categorical Response

Bibliographic Details
Main Authors: Hanji He, Jianfeng He, Guangming Deng
Format: Article
Language: English
Published: Wiley 2024-01-01
Series: Journal of Mathematics
Online Access: http://dx.doi.org/10.1155/2024/9014764
author Hanji He
Jianfeng He
Guangming Deng
collection DOAJ
description The analysis of biogenetic data contributes substantially to understanding disease mechanisms and diagnosing rare diseases. In such analysis, selecting the features that significantly affect a disease provides an effective basis for subsequent diagnosis and treatment decisions. This is not a simple task, however, because biogenetic data pose challenges such as ultra-high dimensionality of potential features, imbalanced response variables, and genetic associations. This study focuses on group structure in feature screening for biogenetic data. Because biogenetic data have group structure, the entire genome must be analyzed rather than individual strongly correlated genes. To address this, the study proposes a group feature screening method that accounts for group correlations using an adjusted Pearson chi-square statistic. The method can be applied to both continuous and discrete covariates. Simulation studies illustrate its performance: the proposed method performs well with imbalanced data and multicategorical responses. In an application to lung cancer diagnosis, the method classifies imbalanced data well, and dimension reduction using linear discriminant analysis also performs well.
format Article
id doaj-art-d30addae35ec4108a97aa6b4988790c1
institution Kabale University
issn 2314-4785
language English
publishDate 2024-01-01
publisher Wiley
record_format Article
series Journal of Mathematics
spelling doaj-art-d30addae35ec4108a97aa6b4988790c1 2025-02-03T11:53:34Z eng Wiley Journal of Mathematics 2314-4785 2024-01-01 2024 10.1155/2024/9014764
A Group Feature Screening Procedure Based on Pearson Chi-Square Statistic for Biology Data with Categorical Response
Hanji He (School of Mathematics and Statistics)
Jianfeng He (School of Economics and Finance)
Guangming Deng (School of Mathematics and Statistics)
http://dx.doi.org/10.1155/2024/9014764
title A Group Feature Screening Procedure Based on Pearson Chi-Square Statistic for Biology Data with Categorical Response
url http://dx.doi.org/10.1155/2024/9014764