Weighting health-related estimates in the GCAT cohort and the general population of Catalonia
Abstract Population-based cohorts play a key role in personalized medicine. However, cohorts are known to be affected by the “healthy volunteer bias”, where participants are generally healthier than the broader population, which compromises their representativeness. Here, we assess the healthy volunteer bias, id...
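| Main Authors: | Natalia Blay, Lucía A. Carrasco-Ribelles, Xavier Farré, Susana Iraola-Guzmán, Marc Danés-Castells, Concepción Violán, Rafael de Cid |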
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Subjects: | GCAT; Cohort; Bias; Raked weights; Representativeness; Population health |
| Online Access: | https://doi.org/10.1038/s41598-025-01284-9 |
| _version_ | 1849726125492666368 |
|---|---|
| author | Natalia Blay; Lucía A. Carrasco-Ribelles; Xavier Farré; Susana Iraola-Guzmán; Marc Danés-Castells; Concepción Violán; Rafael de Cid |
| author_sort | Natalia Blay |
| collection | DOAJ |
| description | Abstract Population-based cohorts play a key role in personalized medicine. However, cohorts are known to be affected by the “healthy volunteer bias”, where participants are generally healthier than the broader population, which compromises their representativeness. Here, we assess the healthy volunteer bias, identifying key indicators of bias for the representativeness of the GCAT cohort, which encompasses 20,000 adult participants from Catalonia, and generating raked survey weights to enhance the cohort’s comparability. To assess and correct the bias, we compare multiple variables across the sociodemographic, lifestyle, disease and medication domains. Electronic health records of Catalonia (SIDIAP), the Health Survey of Catalonia (ESCA) and registers from the statistics institutes of Catalonia (IDESCAT) and Spain (INE) were used to make the comparisons. We observed that the GCAT cohort is enriched in women and younger individuals, in people with higher socioeconomic status, and in more health-conscious and healthier individuals in terms of mortality and chronic disease prevalence. Raked survey weighting identified sex, birth year, rurality, education level, civil status, occupation status, smoking habit, household size, self-perceived health status and number of primary care visits as key weight variables. On average, raked weights reduced the differences by 70% for compared variables, and by 26% in disease prevalence estimates. We conclude that the application of raked weights enhanced the cohort’s representativeness, improved comparability, and yielded more precise estimates when analysing GCAT data. |
| format | Article |
| id | doaj-art-be00aafed84a4ebbb30cf60ecbf731e0 |
| institution | DOAJ |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-be00aafed84a4ebbb30cf60ecbf731e0; 2025-08-20T03:10:17Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-05-01; volume 15, issue 1, pages 1-12; 10.1038/s41598-025-01284-9; Weighting health-related estimates in the GCAT cohort and the general population of Catalonia; Natalia Blay (Genomes for Life-GCAT Lab, CORE Program, Germans Trias i Pujol Research Institute (IGTP)); Lucía A. Carrasco-Ribelles (Grup de Recerca en Impacte de les Malalties Cròniques i les seves Trajectòries (GRIMTra)); Xavier Farré (Genomes for Life-GCAT Lab, CORE Program, Germans Trias i Pujol Research Institute (IGTP)); Susana Iraola-Guzmán (Genomes for Life-GCAT Lab, CORE Program, Germans Trias i Pujol Research Institute (IGTP)); Marc Danés-Castells (Grup de Recerca en Impacte de les Malalties Cròniques i les seves Trajectòries (GRIMTra)); Concepción Violán (Grup de Recerca en Impacte de les Malalties Cròniques i les seves Trajectòries (GRIMTra)); Rafael de Cid (Genomes for Life-GCAT Lab, CORE Program, Germans Trias i Pujol Research Institute (IGTP)); https://doi.org/10.1038/s41598-025-01284-9; GCAT; Cohort; Bias; Raked weights; Representativeness; Population health |
| title | Weighting health-related estimates in the GCAT cohort and the general population of Catalonia |
| topic | GCAT; Cohort; Bias; Raked weights; Representativeness; Population health |
| url | https://doi.org/10.1038/s41598-025-01284-9 |
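
The abstract refers to raked survey weights without describing the procedure. As a rough illustration only, raking (iterative proportional fitting) can be sketched as below: weights are repeatedly rescaled, one variable at a time, until the weighted category shares of the sample match known population shares. The record does not state which software or settings the authors used, so the function names (`rake`, `weighted_shares`), the two variables, and all numbers here are hypothetical toy values, not GCAT or population data.

```python
import numpy as np


def weighted_shares(codes, weights, n_levels):
    """Weighted share of each category (codes are integers 0..n_levels-1)."""
    totals = np.bincount(codes, weights=weights, minlength=n_levels)
    return totals / weights.sum()


def rake(sample, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting over the variables in `sample`.

    sample:  dict of variable name -> integer-coded array, one entry per participant
    targets: dict of variable name -> array of population shares per category
    Assumes every category with a target is present in the sample.
    """
    n = len(next(iter(sample.values())))
    weights = np.ones(n)
    for _ in range(max_iter):
        worst_gap = 0.0
        for name, codes in sample.items():
            target = targets[name]
            current = weighted_shares(codes, weights, len(target))
            worst_gap = max(worst_gap, float(np.abs(current - target).max()))
            # Rescale each participant by target share / current share of their category.
            weights = weights * (target / current)[codes]
        if worst_gap < tol:
            break
    # Normalise so the weights sum to the sample size.
    return weights * n / weights.sum()


# Hypothetical toy sample: 10 participants, over-representing women and younger people.
sex = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])  # 0 = men, 1 = women
age = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 1])  # 0 = younger, 1 = older
sample = {"sex": sex, "age": age}

# Hypothetical population shares (illustrative, not taken from IDESCAT or INE).
targets = {"sex": np.array([0.49, 0.51]), "age": np.array([0.45, 0.55])}

weights = rake(sample, targets)
print("unweighted sex shares:", weighted_shares(sex, np.ones(len(sex)), 2))
print("raked sex shares:     ", weighted_shares(sex, weights, 2))
print("raked age shares:     ", weighted_shares(age, weights, 2))
```

In this sketch only the marginal distributions are matched, which is what distinguishes raking from post-stratification on the full cross-classification; a study with ten weighting variables, as listed in the abstract, would follow the same loop with more entries in `sample` and `targets`.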