Automation of historical weather data rescue
Abstract Data rescuers worldwide have been trying to retrieve millions of valuable historical weather records so the observations contained in those records are preserved, searchable, analysable and machine readable. The majority of the records are written by hand, in print or cursive handwriting. A...
Main Authors: | Y. Zhang, R. E. Sieber |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2025-01-01 |
Series: | Geoscience Data Journal |
Subjects: | artificial intelligence, automation, data rescue, OCR |
Online Access: | https://doi.org/10.1002/gdj3.261 |
_version_ | 1832584921671008256 |
---|---|
author | Y. Zhang R. E. Sieber |
collection | DOAJ |
description | Abstract Data rescuers worldwide have been trying to retrieve millions of valuable historical weather records so the observations contained in those records are preserved, searchable, analysable and machine readable. The majority of the records are written by hand, in print or cursive handwriting. Automatic transcriptions to date have not been reliable or sufficiently accurate on handwritten data, so most of the historical records are transcribed manually. Recent attempts have integrated artificial intelligence (AI) to automatically transcribe the historical records, but the results have not been promising. Currently there is no end‐to‐end workflow to automatically transcribe historical handwritten tabular records into digital datasets. We propose a workflow that uses AI to automate the handwriting transcription process. The workflow is tested using the historical climate records from the Data Rescue: Archives and Weather (DRAW) project. This workflow is composed of five steps: (1) image pre‐processing, (2) text line segmentation, (3) bounding box detection, (4) AI‐enabled optical character recognition (OCR) and (5) layout re‐arrangement. These steps are modular to better accommodate future advances (e.g., new image training data, better layout detectors). We hope the proposed workflow can serve as a guideline that is easily replicable and can be utilized to transcribe other historical datasets. |
format | Article |
id | doaj-art-3c4c413ac0174b84b120e0ea8d7c0de1 |
institution | Kabale University |
issn | 2049-6060 |
language | English |
publishDate | 2025-01-01 |
publisher | Wiley |
record_format | Article |
series | Geoscience Data Journal |
spelling | doaj-art-3c4c413ac0174b84b120e0ea8d7c0de1; 2025-01-27T08:26:33Z; eng; Wiley; Geoscience Data Journal; 2049-6060; 2025-01-01; 12; 1; n/a; n/a; 10.1002/gdj3.261; Automation of historical weather data rescue; Y. Zhang (Department of Geography, McGill University, Montreal, Quebec, Canada); R. E. Sieber (Department of Geography, McGill University, Montreal, Quebec, Canada); https://doi.org/10.1002/gdj3.261; artificial intelligence; automation; data rescue; OCR |
title | Automation of historical weather data rescue |
topic | artificial intelligence automation data rescue OCR |
url | https://doi.org/10.1002/gdj3.261 |
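
The abstract describes a modular five-step workflow. The Python sketch below only illustrates how such stages might be chained so that any one of them can be swapped out. It is not the authors' implementation: the `Page` class, the stage functions, and the toy pixel-level logic are all hypothetical stand-ins, and the `recognise` stage is a stub where a real OCR model would go.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int]  # toy bounding box: (row, column) of one inked pixel


@dataclass
class Page:
    """Hypothetical state object handed from one stage to the next."""
    image: List[List[int]]                               # grayscale, 0-255
    lines: List[int] = field(default_factory=list)       # rows holding ink
    boxes: List[Box] = field(default_factory=list)
    cells: Dict[Box, str] = field(default_factory=dict)  # box -> text
    table: List[List[str]] = field(default_factory=list)


Stage = Callable[[Page], Page]


def preprocess(page: Page) -> Page:
    # (1) Binarise: dark pixels (< 128) become ink (1), the rest background (0).
    page.image = [[1 if px < 128 else 0 for px in row] for row in page.image]
    return page


def segment_lines(page: Page) -> Page:
    # (2) A "text line" in this toy is any pixel row that contains ink.
    page.lines = [r for r, row in enumerate(page.image) if any(row)]
    return page


def detect_boxes(page: Page) -> Page:
    # (3) One box per inked pixel on each detected line.
    page.boxes = [(r, c) for r in page.lines
                  for c, px in enumerate(page.image[r]) if px]
    return page


def recognise(page: Page) -> Page:
    # (4) Stand-in for an OCR model: label each box with its position.
    page.cells = {(r, c): f"r{r}c{c}" for r, c in page.boxes}
    return page


def rearrange(page: Page) -> Page:
    # (5) Rebuild the table: one output row per line, cells in column order.
    page.table = [[text for (r, _c), text in sorted(page.cells.items())
                   if r == line]
                  for line in page.lines]
    return page


# The stage list is the modular part: replace any entry with a better component.
STAGES: List[Stage] = [preprocess, segment_lines, detect_boxes,
                       recognise, rearrange]


def run_pipeline(page: Page, stages: List[Stage]) -> Page:
    for stage in stages:
        page = stage(page)
    return page


if __name__ == "__main__":
    scan = [[255, 0, 255, 0],
            [255, 255, 255, 255],
            [0, 255, 255, 255]]
    print(run_pipeline(Page(image=scan), STAGES).table)
```

Because each stage has the same signature, a new layout detector or recogniser could replace one entry in `STAGES` without touching the others, which mirrors the modularity the abstract emphasises.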