Weighting health-related estimates in the GCAT cohort and the general population of Catalonia

Abstract Population-based cohorts play a key role in personalized medicine. However, cohorts are known to be affected by the “healthy volunteer bias”, where participants are generally healthier than the broader population, compromising their representativeness. Here, we assess the healthy bias, id...

Bibliographic Details
Main Authors: Natalia Blay, Lucía A. Carrasco-Ribelles, Xavier Farré, Susana Iraola-Guzmán, Marc Danés-Castells, Concepción Violán, Rafael de Cid
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Subjects:
Online Access: https://doi.org/10.1038/s41598-025-01284-9
collection DOAJ
description Abstract Population-based cohorts play a key role in personalized medicine. However, cohorts are known to be affected by the “healthy volunteer bias”, where participants are generally healthier than the broader population, compromising their representativeness. Here, we assess the healthy volunteer bias in the GCAT cohort, which encompasses 20,000 adult participants from Catalonia, identify key bias indicators for representativeness, and generate raked survey weights to enhance the cohort’s comparability. To assess and correct the bias, we compare multiple variables across the sociodemographic, lifestyle, disease and medication domains. Electronic health records of Catalonia (SIDIAP), the Health Survey of Catalonia (ESCA) and registers from the statistics institutes of Catalonia (IDESCAT) and Spain (INE) were used to make the comparisons. We observed that the GCAT cohort is enriched in women and younger individuals, people with higher socioeconomic status, and more health-conscious and healthier individuals in terms of mortality and chronic disease prevalence. Raked survey weighting identified sex, birth year, rurality, education level, civil status, occupation status, smoking habit, household size, self-perceived health status and number of primary care visits as key weighting variables. On average, raked weights reduced the differences by 70% for compared variables, and by 26% in disease prevalence estimates. We conclude that applying raked weights enhanced the cohort’s representativeness, improved comparability, and yielded more precise estimates when analysing GCAT data.
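For readers unfamiliar with raking (iterative proportional fitting), the sketch below illustrates the general idea behind the "raked weights" mentioned in the description. It is not the authors' code: the function name `rake`, the toy participants, the two variables and the target proportions are all hypothetical, chosen only to show how weights are adjusted one margin at a time until the weighted sample matches known population shares.

```python
import numpy as np

def rake(categories, targets, max_iter=100, tol=1e-8):
    """Minimal raking sketch (hypothetical helper, not the GCAT pipeline).

    categories: dict mapping variable name -> integer category code per participant
    targets:    dict mapping variable name -> population share per category (sums to 1)
    Returns one weight per participant, scaled to sum to the sample size.
    """
    n = len(next(iter(categories.values())))
    w = np.ones(n)
    for _ in range(max_iter):
        max_shift = 0.0
        for name, codes in categories.items():
            target = np.asarray(targets[name], dtype=float)
            # Current weighted share of each category of this variable.
            current = np.bincount(codes, weights=w, minlength=len(target)) / w.sum()
            # Scale every participant by target share / current share of their category.
            factor = target / current
            w = w * factor[codes]
            max_shift = max(max_shift, np.abs(factor - 1).max())
        # Stop once a full pass over all variables barely changes the weights.
        if max_shift < tol:
            break
    return w * n / w.sum()

def weighted_share(codes, w, k):
    """Weighted share of each of the k categories."""
    return np.bincount(codes, weights=w, minlength=k) / w.sum()

# Toy sample of 8 participants (made-up data).
sex = np.array([0, 0, 0, 0, 0, 1, 1, 1])   # 0 = women, 1 = men
edu = np.array([0, 1, 1, 1, 0, 1, 1, 0])   # 0 = lower, 1 = higher education

# Made-up population margins the weighted sample should reproduce.
targets = {"sex": [0.5, 0.5], "edu": [0.6, 0.4]}

weights = rake({"sex": sex, "edu": edu}, targets)
print(weighted_share(sex, weights, 2))
print(weighted_share(edu, weights, 2))
```

In this toy sample women and higher-educated participants are over-represented, so raking down-weights them until both weighted margins agree with the targets. A real analysis would use many more variables and a dedicated survey package.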
id doaj-art-be00aafed84a4ebbb30cf60ecbf731e0
institution DOAJ
issn 2045-2322
spelling doaj-art-be00aafed84a4ebbb30cf60ecbf731e0 (record timestamp 2025-08-20T03:10:17Z; language eng)
Nature Portfolio, Scientific Reports, ISSN 2045-2322, 2025-05-01, volume 15, issue 1, pages 1-12
10.1038/s41598-025-01284-9
Weighting health-related estimates in the GCAT cohort and the general population of Catalonia
Author affiliations:
Natalia Blay: Genomes for Life-GCAT Lab, CORE Program, Germans Trias i Pujol Research Institute (IGTP)
Lucía A. Carrasco-Ribelles: Grup de Recerca en Impacte de les Malalties Cròniques i les seves Trajectòries (GRIMTra)
Xavier Farré: Genomes for Life-GCAT Lab, CORE Program, Germans Trias i Pujol Research Institute (IGTP)
Susana Iraola-Guzmán: Genomes for Life-GCAT Lab, CORE Program, Germans Trias i Pujol Research Institute (IGTP)
Marc Danés-Castells: Grup de Recerca en Impacte de les Malalties Cròniques i les seves Trajectòries (GRIMTra)
Concepción Violán: Grup de Recerca en Impacte de les Malalties Cròniques i les seves Trajectòries (GRIMTra)
Rafael de Cid: Genomes for Life-GCAT Lab, CORE Program, Germans Trias i Pujol Research Institute (IGTP)
title Weighting health-related estimates in the GCAT cohort and the general population of Catalonia
topic GCAT
Cohort
Bias
Raked weights
Representativeness
Population health
url https://doi.org/10.1038/s41598-025-01284-9