With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research

Abstract Electronic health records (EHRs), though they are maintained and utilized for clinical and billing purposes, may provide a wealth of information for research. Currently, sources are available that offer insight into the health histories of well over a quarter of a billion people. Their use,...

Bibliographic Details
Main Authors: Veronica R. Olaker, Sarah Fry, Pauline Terebuh, Pamela B. Davis, Daniel J. Tisch, Rong Xu, Margaret G. Miller, Ian Dorney, Matvey B. Palchuk, David C. Kaelber
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Clinical and Translational Science
Online Access: https://doi.org/10.1111/cts.70093
_version_ 1832589777427234816
author Veronica R. Olaker
Sarah Fry
Pauline Terebuh
Pamela B. Davis
Daniel J. Tisch
Rong Xu
Margaret G. Miller
Ian Dorney
Matvey B. Palchuk
David C. Kaelber
author_facet Veronica R. Olaker
Sarah Fry
Pauline Terebuh
Pamela B. Davis
Daniel J. Tisch
Rong Xu
Margaret G. Miller
Ian Dorney
Matvey B. Palchuk
David C. Kaelber
author_sort Veronica R. Olaker
collection DOAJ
description Abstract Electronic health records (EHRs), though they are maintained and utilized for clinical and billing purposes, may provide a wealth of information for research. Currently, sources are available that offer insight into the health histories of well over a quarter of a billion people. Their use, however, is fraught with hazards, including introduction or reinforcement of biases, clarity of disease definitions, protection of patient privacy, definitions of covariates or confounders, accuracy of medication usage compared with prescriptions, the need to introduce other data sources such as vaccination or death records and the ensuing potential for inaccuracy, duplicative records, and understanding and interpreting the outcomes of data queries. On the other hand, the possibility of study of rare disorders or the ability to link apparently disparate events are extremely valuable. Strategies for avoiding the worst pitfalls and hewing to conservative interpretations are essential. This article summarizes many of the approaches that have been used to avoid the most common pitfalls and extract the maximum information from aggregated, standardized, and de‐identified EHR data. This article describes 26 topics broken into three major areas: (1) 14 topics related to design issues for observational study using EHR data, (2) 7 topics related to analysis issues when analyzing EHR data, and (3) 5 topics related to reporting studies using EHR data.
format Article
id doaj-art-8b6e6bafad294e879ede6b6368800aa0
institution Kabale University
issn 1752-8054
1752-8062
language English
publishDate 2025-01-01
publisher Wiley
record_format Article
series Clinical and Translational Science
spelling doaj-art-8b6e6bafad294e879ede6b6368800aa0
2025-01-24T08:17:46Z
eng
Wiley
Clinical and Translational Science
1752-8054
1752-8062
2025-01-01
18
1
n/a
n/a
10.1111/cts.70093
With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research
Veronica R. Olaker: Center for Artificial Intelligence in Drug Discovery Case Western Reserve University School of Medicine Cleveland Ohio USA
Sarah Fry: Center for Artificial Intelligence in Drug Discovery Case Western Reserve University School of Medicine Cleveland Ohio USA
Pauline Terebuh: Center for Artificial Intelligence in Drug Discovery Case Western Reserve University School of Medicine Cleveland Ohio USA
Pamela B. Davis: Center for Community Health Integration Case Western Reserve University School of Medicine Cleveland Ohio USA
Daniel J. Tisch: Department of Population and Quantitative Health Sciences Case Western Reserve University School of Medicine Cleveland Ohio USA
Rong Xu: Center for Artificial Intelligence in Drug Discovery Case Western Reserve University School of Medicine Cleveland Ohio USA
Margaret G. Miller: Center for Artificial Intelligence in Drug Discovery Case Western Reserve University School of Medicine Cleveland Ohio USA
Ian Dorney: The Center for Clinical Informatics Research and Education The MetroHealth System Cleveland Ohio USA
Matvey B. Palchuk: TriNetX, LLC Cambridge Massachusetts USA
David C. Kaelber: The Center for Clinical Informatics Research and Education The MetroHealth System Cleveland Ohio USA
Abstract Electronic health records (EHRs), though they are maintained and utilized for clinical and billing purposes, may provide a wealth of information for research. Currently, sources are available that offer insight into the health histories of well over a quarter of a billion people. Their use, however, is fraught with hazards, including introduction or reinforcement of biases, clarity of disease definitions, protection of patient privacy, definitions of covariates or confounders, accuracy of medication usage compared with prescriptions, the need to introduce other data sources such as vaccination or death records and the ensuing potential for inaccuracy, duplicative records, and understanding and interpreting the outcomes of data queries. On the other hand, the possibility of study of rare disorders or the ability to link apparently disparate events are extremely valuable. Strategies for avoiding the worst pitfalls and hewing to conservative interpretations are essential. This article summarizes many of the approaches that have been used to avoid the most common pitfalls and extract the maximum information from aggregated, standardized, and de‐identified EHR data. This article describes 26 topics broken into three major areas: (1) 14 topics related to design issues for observational study using EHR data, (2) 7 topics related to analysis issues when analyzing EHR data, and (3) 5 topics related to reporting studies using EHR data.
https://doi.org/10.1111/cts.70093
spellingShingle Veronica R. Olaker
Sarah Fry
Pauline Terebuh
Pamela B. Davis
Daniel J. Tisch
Rong Xu
Margaret G. Miller
Ian Dorney
Matvey B. Palchuk
David C. Kaelber
With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research
Clinical and Translational Science
title With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research
title_full With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research
title_fullStr With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research
title_full_unstemmed With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research
title_short With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de‐identified electronic health record data for research
title_sort with big data comes big responsibility strategies for utilizing aggregated standardized de identified electronic health record data for research
url https://doi.org/10.1111/cts.70093