With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research
Abstract Electronic health records (EHRs), though they are maintained and utilized for clinical and billing purposes, may provide a wealth of information for research. Currently, sources are available that offer insight into the health histories of well over a quarter of a billion people. Their use,...
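Main Authors: | Veronica R. Olaker, Sarah Fry, Pauline Terebuh, Pamela B. Davis, Daniel J. Tisch, Rong Xu, Margaret G. Miller, Ian Dorney, Matvey B. Palchuk, David C. Kaelber |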
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2025-01-01 |
Series: | Clinical and Translational Science |
Online Access: | https://doi.org/10.1111/cts.70093 |
_version_ | 1832589777427234816 |
---|---|
author | Veronica R. Olaker Sarah Fry Pauline Terebuh Pamela B. Davis Daniel J. Tisch Rong Xu Margaret G. Miller Ian Dorney Matvey B. Palchuk David C. Kaelber |
author_sort | Veronica R. Olaker |
collection | DOAJ |
description | Abstract Electronic health records (EHRs), though they are maintained and utilized for clinical and billing purposes, may provide a wealth of information for research. Currently, sources are available that offer insight into the health histories of well over a quarter of a billion people. Their use, however, is fraught with hazards, including introduction or reinforcement of biases, clarity of disease definitions, protection of patient privacy, definitions of covariates or confounders, accuracy of medication usage compared with prescriptions, the need to introduce other data sources such as vaccination or death records and the ensuing potential for inaccuracy, duplicative records, and understanding and interpreting the outcomes of data queries. On the other hand, the possibility of study of rare disorders or the ability to link apparently disparate events are extremely valuable. Strategies for avoiding the worst pitfalls and hewing to conservative interpretations are essential. This article summarizes many of the approaches that have been used to avoid the most common pitfalls and extract the maximum information from aggregated, standardized, and de‐identified EHR data. This article describes 26 topics broken into three major areas: (1) 14 topics related to design issues for observational study using EHR data, (2) 7 topics related to analysis issues when analyzing EHR data, and (3) 5 topics related to reporting studies using EHR data. |
format | Article |
id | doaj-art-8b6e6bafad294e879ede6b6368800aa0 |
institution | Kabale University |
issn | 1752-8054 1752-8062 |
language | English |
publishDate | 2025-01-01 |
publisher | Wiley |
record_format | Article |
series | Clinical and Translational Science |
spelling | doaj-art-8b6e6bafad294e879ede6b6368800aa0; 2025-01-24T08:17:46Z; eng; Wiley; Clinical and Translational Science; 1752-8054; 1752-8062; 2025-01-01; 18; 1; n/a; n/a; 10.1111/cts.70093; With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research; Veronica R. Olaker (0), Sarah Fry (1), Pauline Terebuh (2), Pamela B. Davis (3), Daniel J. Tisch (4), Rong Xu (5), Margaret G. Miller (6), Ian Dorney (7), Matvey B. Palchuk (8), David C. Kaelber (9); (0) Center for Artificial Intelligence in Drug Discovery Case Western Reserve University School of Medicine Cleveland Ohio USA; (1) Center for Artificial Intelligence in Drug Discovery Case Western Reserve University School of Medicine Cleveland Ohio USA; (2) Center for Artificial Intelligence in Drug Discovery Case Western Reserve University School of Medicine Cleveland Ohio USA; (3) Center for Community Health Integration Case Western Reserve University School of Medicine Cleveland Ohio USA; (4) Department of Population and Quantitative Health Sciences Case Western Reserve University School of Medicine Cleveland Ohio USA; (5) Center for Artificial Intelligence in Drug Discovery Case Western Reserve University School of Medicine Cleveland Ohio USA; (6) Center for Artificial Intelligence in Drug Discovery Case Western Reserve University School of Medicine Cleveland Ohio USA; (7) The Center for Clinical Informatics Research and Education The MetroHealth System Cleveland Ohio USA; (8) TriNetX, LLC Cambridge Massachusetts USA; (9) The Center for Clinical Informatics Research and Education The MetroHealth System Cleveland Ohio USA; Abstract Electronic health records (EHRs), though they are maintained and utilized for clinical and billing purposes, may provide a wealth of information for research. Currently, sources are available that offer insight into the health histories of well over a quarter of a billion people. Their use, however, is fraught with hazards, including introduction or reinforcement of biases, clarity of disease definitions, protection of patient privacy, definitions of covariates or confounders, accuracy of medication usage compared with prescriptions, the need to introduce other data sources such as vaccination or death records and the ensuing potential for inaccuracy, duplicative records, and understanding and interpreting the outcomes of data queries. On the other hand, the possibility of study of rare disorders or the ability to link apparently disparate events are extremely valuable. Strategies for avoiding the worst pitfalls and hewing to conservative interpretations are essential. This article summarizes many of the approaches that have been used to avoid the most common pitfalls and extract the maximum information from aggregated, standardized, and de‐identified EHR data. This article describes 26 topics broken into three major areas: (1) 14 topics related to design issues for observational study using EHR data, (2) 7 topics related to analysis issues when analyzing EHR data, and (3) 5 topics related to reporting studies using EHR data.; https://doi.org/10.1111/cts.70093 |
title | With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research |
url | https://doi.org/10.1111/cts.70093 |