Towards automated recipe genre classification using semi-supervised learning.

Sharing cooking recipes is a great way to exchange culinary ideas and provide instructions for food preparation. However, categorizing raw recipes found online into appropriate food genres can be challenging due to a lack of adequate labeled data. In this study, we present a dataset named the "...

Bibliographic Details
Main Authors:	Nazmus Sakib, G M Shahariar, Md Mohsinul Kabir, Md Kamrul Hasan, Hasan Mahmud
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2025-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0317697

author	Nazmus Sakib, G M Shahariar, Md Mohsinul Kabir, Md Kamrul Hasan, Hasan Mahmud
author_facet	Nazmus Sakib, G M Shahariar, Md Mohsinul Kabir, Md Kamrul Hasan, Hasan Mahmud
author_sort	Nazmus Sakib
collection	DOAJ
description	Sharing cooking recipes is a great way to exchange culinary ideas and provide instructions for food preparation. However, categorizing raw recipes found online into appropriate food genres can be challenging due to a lack of adequate labeled data. In this study, we present a dataset named the "Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" that contains two million culinary recipes labeled with their respective categories, along with extended named entities extracted from the recipe descriptions. The dataset includes features such as title, NER, directions, and extended NER, as well as nine labels representing genres: bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions. The proposed pipeline, named 3A2M+, extends the Named Entity Recognition (NER) list to cover missing named entities such as heat, time, or process from the recipe directions using two NER extraction tools. The 3A2M+ dataset supports a range of challenging recipe-related tasks, including classification, named entity recognition, and recipe generation. Furthermore, we apply traditional machine learning, deep learning, and pre-trained language models to classify the recipes into their corresponding genres, achieving an overall accuracy of 98.6%. Our investigation indicates that the title feature played a more significant role in classifying the genre.
format	Article
id	doaj-art-594085d1f829429cba1239d98b21f2b7
institution	Kabale University
issn	1932-6203
language	English
publishDate	2025-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj-art-594085d1f829429cba1239d98b21f2b7; 2025-02-05T05:31:59Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2025-01-01; 20; 1; e0317697; 10.1371/journal.pone.0317697; Towards automated recipe genre classification using semi-supervised learning.; Nazmus Sakib; G M Shahariar; Md Mohsinul Kabir; Md Kamrul Hasan; Hasan Mahmud; https://doi.org/10.1371/journal.pone.0317697
title	Towards automated recipe genre classification using semi-supervised learning.


Similar Items