Automation of historical weather data rescue

Bibliographic Details
Main Authors: Y. Zhang, R. E. Sieber
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Geoscience Data Journal
Subjects: artificial intelligence; automation; data rescue; OCR
Online Access: https://doi.org/10.1002/gdj3.261
author Y. Zhang
R. E. Sieber
collection DOAJ
description Abstract: Data rescuers worldwide have been working to retrieve millions of valuable historical weather records so that the observations they contain are preserved, searchable, analysable and machine readable. Most of these records are handwritten, in print or cursive script. To date, automatic transcription has not been reliable or accurate enough on handwritten data, so most historical records are transcribed manually. Recent attempts have integrated artificial intelligence (AI) to transcribe the records automatically, but the results have not been promising. Currently, there is no end-to-end workflow for automatically transcribing historical handwritten tabular records into digital datasets. We propose a workflow that uses AI to automate handwriting transcription and test it on historical climate records from the Data Rescue: Archives and Weather (DRAW) project. The workflow comprises five steps: (1) image pre-processing, (2) text line segmentation, (3) bounding box detection, (4) AI-enabled optical character recognition (OCR) and (5) layout re-arrangement. The steps are modular so that they can accommodate future advances (e.g., new image training data, better layout detectors). We hope the proposed workflow can serve as an easily replicable guideline for transcribing other historical datasets.
format Article
id doaj-art-3c4c413ac0174b84b120e0ea8d7c0de1
institution Kabale University
issn 2049-6060
language English
publishDate 2025-01-01
publisher Wiley
record_format Article
series Geoscience Data Journal
spelling doaj-art-3c4c413ac0174b84b120e0ea8d7c0de1; 2025-01-27T08:26:33Z; eng; Wiley; Geoscience Data Journal; ISSN 2049-6060; 2025-01-01; vol. 12, no. 1, pp. n/a; doi:10.1002/gdj3.261; Automation of historical weather data rescue; Y. Zhang (Department of Geography, McGill University, Montreal, Quebec, Canada); R. E. Sieber (Department of Geography, McGill University, Montreal, Quebec, Canada); https://doi.org/10.1002/gdj3.261; artificial intelligence; automation; data rescue; OCR
title Automation of historical weather data rescue
topic artificial intelligence
automation
data rescue
OCR
url https://doi.org/10.1002/gdj3.261
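
The five steps named in the abstract can be pictured as a chain of small, swappable functions. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the function names, the list-of-lists image representation, the projection-profile segmentation and the `toy_ocr` stand-in are all hypothetical, chosen only so the example runs with the standard library. A real system would presumably replace each stage with an image library and a trained recognition model.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale rows; low values are ink (assumption)
Span = Tuple[int, int]           # (start, end), end-exclusive
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), end-exclusive


def preprocess(image: Image, threshold: int = 128) -> Image:
    """Step 1 (image pre-processing): binarize so ink is 1, paper is 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def _runs(flags: List[bool]) -> List[Span]:
    """Return spans of consecutive True values."""
    spans, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def segment_lines(binary: Image) -> List[Span]:
    """Step 2 (text line segmentation): runs of rows that contain ink."""
    return _runs([any(row) for row in binary])


def detect_boxes(binary: Image, lines: List[Span]) -> List[List[Box]]:
    """Step 3 (bounding box detection): runs of inked columns per line."""
    result = []
    for top, bottom in lines:
        band = binary[top:bottom]
        cols = [any(row[x] for row in band) for x in range(len(band[0]))]
        result.append([(top, bottom, left, right) for left, right in _runs(cols)])
    return result


def recognize(binary: Image, boxes: List[List[Box]],
              ocr: Callable[[Image], str]) -> List[List[str]]:
    """Step 4 (OCR): apply the injected recognizer to each cropped box."""
    return [[ocr([row[l:r] for row in binary[t:b]]) for t, b, l, r in line]
            for line in boxes]


def rearrange(cells: List[List[str]], fill: str = "") -> List[List[str]]:
    """Step 5 (layout re-arrangement): pad rows into a rectangular table."""
    width = max((len(row) for row in cells), default=0)
    return [row + [fill] * (width - len(row)) for row in cells]


def toy_ocr(crop: Image) -> str:
    """Hypothetical stand-in for a model: reports the ink-pixel count."""
    return str(sum(map(sum, crop)))


def run_pipeline(image: Image, ocr: Callable[[Image], str] = toy_ocr) -> List[List[str]]:
    """Chain the five stages; any stage can be swapped independently."""
    binary = preprocess(image)
    lines = segment_lines(binary)
    boxes = detect_boxes(binary, lines)
    return rearrange(recognize(binary, boxes, ocr))
```

Because the recognizer is passed in as an argument, a better OCR model or layout detector could be substituted without touching the other stages, which is the kind of modularity the abstract describes.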