Construction of the cancer patients’ database based on the US National Health and Nutrition Examination Survey (NHANES) datasets for cancer epidemiology research
Abstract Background The US National Health and Nutrition Examination Survey (NHANES) dataset does not include a specific question or laboratory test to confirm a history of cancer diagnosis. However, if straightforward variables for cancer history are introduced, US NHANES could be effectively utili...
Main Authors: | Jinyoung Moon, Yongseok Mun |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2025-01-01 |
Series: | BMC Medical Research Methodology |
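Subjects: | National health and nutrition examination survey; Cancer history; Cancer incidence; Cancer epidemiology; Environmental toxicant exposure |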
Online Access: | https://doi.org/10.1186/s12874-025-02478-5 |
_version_ | 1832585610716512256 |
---|---|
author | Jinyoung Moon, Yongseok Mun |
collection | DOAJ |
description | Abstract. Background: The US National Health and Nutrition Examination Survey (NHANES) dataset does not include a specific question or laboratory test to confirm a history of cancer diagnosis. However, if straightforward variables for cancer history are introduced, US NHANES could be used effectively in future cancer epidemiology studies. To address this gap, the authors developed a cancer patient database from the US NHANES datasets using R code. Methods: To illustrate the practical application of this methodology to a real-world problem, the authors extracted the R code applied in an academic paper published in another journal on January 30, 2024 ( https://doi.org/10.1016/j.heliyon.2024.e24337 ). This paper focuses on the construction of the database and the analysis using R code. Entire. Results: In the first example, the urine concentrations of monocarboxynonyl phthalate, monocarboxyoctyl phthalate, mono-2-ethyl-5-carboxypentyl phthalate, and mono-2-hydroxy-iso-butyl phthalate (all ng/mL) were used as the independent variables, in place of the serum concentrations of perfluorooctanoic acid (PFOA), perfluorooctane sulfonic acid (PFOS), perfluorohexane sulfonic acid (PFHxS), and perfluorononanoic acid (PFNA), respectively. In the second example, the serum concentrations of 2,3,3’,4,4’-Pentachlorobiphenyl (PCB105), 2,3,4,4’,5-Pentachlorobiphenyl (PCB114), 2,3’,4,4’,5-Pentachlorobiphenyl (PCB118), and 2,2’,3,4,4’,5’- and 2,3,3’,4,4’,6-Hexachlorobiphenyl (PCB138) were used as the independent variables, in place of the serum concentrations of PFOA, PFOS, PFHxS, and PFNA, respectively. Discussion: This research offers a comprehensive set of R code for creating a single, user-friendly variable that captures the history of each type of cancer together with the age at which the diagnosis was made. The US NHANES provides a wealth of critical data on environmental toxicant exposures. With this R code, researchers can potentially discover new associations between environmental toxicant exposures and cancer diagnoses. Ultimately, the code could advance the field of cancer epidemiology in relation to environmental toxicant exposure. |
format | Article |
id | doaj-art-0000ea8042054e8b90b724bdfbbaeeb6 |
institution | Kabale University |
issn | 1471-2288 |
language | English |
publishDate | 2025-01-01 |
publisher | BMC |
record_format | Article |
series | BMC Medical Research Methodology |
spelling | doaj-art-0000ea8042054e8b90b724bdfbbaeeb6; 2025-01-26T12:39:34Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2025-01-01; 25117; 10.1186/s12874-025-02478-5; Construction of the cancer patients’ database based on the US National Health and Nutrition Examination Survey (NHANES) datasets for cancer epidemiology research; Jinyoung Moon (Interdisciplinary Program in Bioinformatics, College of Natural Sciences, Seoul National University); Yongseok Mun (Department of Ophthalmology, Kangnam Sacred Heart Hospital, Hallym University College of Medicine); https://doi.org/10.1186/s12874-025-02478-5; National health and nutrition examination survey; Cancer history; Cancer incidence; Cancer epidemiology; Environmental toxicant exposure |
title | Construction of the cancer patients’ database based on the US National Health and Nutrition Examination Survey (NHANES) datasets for cancer epidemiology research |
topic | National health and nutrition examination survey; Cancer history; Cancer incidence; Cancer epidemiology; Environmental toxicant exposure |
url | https://doi.org/10.1186/s12874-025-02478-5 |
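
The abstract describes collapsing questionnaire responses into a single variable per cancer type that also carries the age at diagnosis. The sketch below shows one way that idea could look. It is written in Python for illustration rather than in the authors' R, and it is not their code: the column names (`cancer_code_1`, `age_dx_1`, ...), the number of slots, and the code-to-label mapping are all hypothetical placeholders, not taken from the NHANES codebook.

```python
from typing import Optional

# Hypothetical code book: numeric codes and labels are illustrative only.
CANCER_LABELS = {14: "breast", 16: "colon", 30: "prostate"}


def cancer_history(record: dict, slots: int = 3) -> dict:
    """Collapse repeated (cancer code, age at diagnosis) slots into one
    entry per cancer type: the earliest reported age at diagnosis,
    or None if that cancer type was never reported."""
    history: dict[str, Optional[int]] = {
        label: None for label in CANCER_LABELS.values()
    }
    for i in range(1, slots + 1):
        # Assumed layout: one code column and one age column per slot.
        code = record.get(f"cancer_code_{i}")
        age = record.get(f"age_dx_{i}")
        label = CANCER_LABELS.get(code)
        if label is None or age is None:
            continue  # unknown code, empty slot, or missing age
        current = history[label]
        if current is None or age < current:
            history[label] = age
    return history


# Made-up participant record, for illustration only.
participant = {
    "id": 1,
    "cancer_code_1": 14, "age_dx_1": 52,
    "cancer_code_2": 16, "age_dx_2": 60,
}
print(cancer_history(participant))
# → {'breast': 52, 'colon': 60, 'prostate': None}
```

Storing the age at diagnosis rather than a yes/no flag keeps both pieces of information in one variable: a non-missing value means the cancer was reported, and the value itself can be compared with the age at which an exposure was measured.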