On the Choice of Training Data for Machine Learning of Geostrophic Mesoscale Turbulence

Abstract Data plays a central role in data‐driven methods, but is not often the subject of focus in investigations of machine learning algorithms as applied to Earth System Modeling related problems. Here we consider the problem of eddy‐mean interaction in rotating stratified turbulence in the prese...

Bibliographic Details
Main Authors: F. E. Yan, J. Mak, Y. Wang
Format: Article
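Language: English
Published: American Geophysical Union (AGU) 2024-02-01
Series: Journal of Advances in Modeling Earth Systems
Subjects: machine learning; geostrophic turbulence; eddy‐mean interaction; eddy parameterization; data‐driven methods
Online Access: https://doi.org/10.1029/2023MS003915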
_version_ 1850136351750488064
author F. E. Yan
J. Mak
Y. Wang
collection DOAJ
description Abstract Data plays a central role in data‐driven methods, but is not often the subject of focus in investigations of machine learning algorithms as applied to Earth System Modeling related problems. Here we consider the problem of eddy‐mean interaction in rotating stratified turbulence in the presence of lateral boundaries, where it is known that rotational components of the eddy flux play no direct role in the sub‐grid forcing onto the mean state variables, and their presence is expected to affect the performance of the trained machine learning models. While an often utilized choice in the literature is to train a model from the divergence of the eddy fluxes, here we provide theoretical arguments and numerical evidence that learning from the eddy fluxes with the rotational component appropriately filtered out, achieved in this work by means of an object called the eddy force function, results in models with comparable or better skill, but substantially reduced sensitivity to the presence of small‐scale features. We argue that while the choice and/or quality of data may not be critical if we simply want a model to have predictive skill, it is highly desirable and perhaps even necessary if we want to leverage data‐driven methods to aid in discovering unknown or hidden physical processes within the data itself.
format Article
id doaj-art-d91abef2e2d84d4ea7de2ec99d38a1ef
institution OA Journals
issn 1942-2466
language English
publishDate 2024-02-01
publisher American Geophysical Union (AGU)
record_format Article
series Journal of Advances in Modeling Earth Systems
spelling doaj-art-d91abef2e2d84d4ea7de2ec99d38a1ef | 2025-08-20T02:31:09Z | eng | American Geophysical Union (AGU) | Journal of Advances in Modeling Earth Systems | 1942-2466 | 2024-02-01 | 16 | 2 | n/a | n/a | 10.1029/2023MS003915 | On the Choice of Training Data for Machine Learning of Geostrophic Mesoscale Turbulence | F. E. Yan (0), J. Mak (1), Y. Wang (2) | (0), (1), (2): Department of Ocean Science, Hong Kong University of Science and Technology, Hong Kong, Hong Kong | https://doi.org/10.1029/2023MS003915 | machine learning; geostrophic turbulence; eddy‐mean interaction; eddy parameterization; data‐driven methods
title On the Choice of Training Data for Machine Learning of Geostrophic Mesoscale Turbulence
topic machine learning
geostrophic turbulence
eddy‐mean interaction
eddy parameterization
data‐driven methods
url https://doi.org/10.1029/2023MS003915
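
Illustrative note: the abstract states that rotational components of the eddy flux play no direct role in the sub‐grid forcing onto the mean state. The sketch below is not the authors' method and does not compute the eddy force function; it only illustrates the underlying identity that the divergence of a purely rotational flux vanishes. It assumes a doubly periodic grid with centered differences (the article itself concerns domains with lateral boundaries), and the fields `phi` and `psi` and all function names are hypothetical choices made for this example.

```python
import numpy as np


def ddx(f, dx):
    """Centered difference along x (axis 1) on a periodic grid."""
    return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2.0 * dx)


def ddy(f, dy):
    """Centered difference along y (axis 0) on a periodic grid."""
    return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2.0 * dy)


def divergence(fx, fy, dx, dy):
    """Discrete divergence of the flux (fx, fy)."""
    return ddx(fx, dx) + ddy(fy, dy)


# Assumed setup: a 64 x 64 doubly periodic square of side 2*pi.
n = 64
dx = dy = 2.0 * np.pi / n
x = np.arange(n) * dx
X, Y = np.meshgrid(x, x)  # X varies along axis 1, Y along axis 0

# Hypothetical smooth fields used to build the two flux components.
phi = np.sin(X) * np.cos(2.0 * Y)  # potential for the divergent part
psi = np.cos(3.0 * X) * np.sin(Y)  # streamfunction for the rotational part

div_part = (ddx(phi, dx), ddy(phi, dy))    # gradient of phi
rot_part = (-ddy(psi, dy), ddx(psi, dx))   # perpendicular gradient of psi
full = (div_part[0] + rot_part[0], div_part[1] + rot_part[1])

div_full = divergence(*full, dx, dy)
div_divpart = divergence(*div_part, dx, dy)
div_rot = divergence(*rot_part, dx, dy)

print("max |rotational flux|       :", np.max(np.abs(rot_part[0])))
print("max |div(rotational flux)|  :", np.max(np.abs(div_rot)))
print("max |div(full) - div(div.)| :", np.max(np.abs(div_full - div_divpart)))
```

In this sketch the rotational part is comparable in magnitude to the divergent part, yet it contributes nothing to the flux divergence beyond rounding error. A model trained on the full flux therefore sees a large signal that has no bearing on the forcing, which is the motivation the abstract gives for filtering that component out.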