Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations

Choosing nutritious foods is essential for daily health, but finding recipes that match available ingredients and dietary preferences can be challenging. Traditional recommendation methods often lack personalization and accurate ingredient recognition. Personalized systems address this by integratin...

Bibliographic Details
Main Authors:	Yosua Setyawan Soekamto, Andreas Lim, Leonard Christopher Limanjaya, Yoshua Kaleb Purwanto, Suk-Ho Lee, Dae-Ki Kang
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Sensors
Subjects:	retrieval-augmented generation, personalized recipe recommendation, large language models, vision-language models, ingredient-based recipe retrieval
Online Access:	https://www.mdpi.com/1424-8220/25/2/449

_version_	1832587478840639488
author	Yosua Setyawan Soekamto, Andreas Lim, Leonard Christopher Limanjaya, Yoshua Kaleb Purwanto, Suk-Ho Lee, Dae-Ki Kang
author_sort	Yosua Setyawan Soekamto
collection	DOAJ
description	Choosing nutritious foods is essential for daily health, but finding recipes that match available ingredients and dietary preferences can be challenging. Traditional recommendation methods often lack personalization and accurate ingredient recognition. Personalized systems address this by integrating user preferences, dietary needs, and ingredient availability. This study presents Pic2Plate, a framework combining Vision-Language Models (VLMs) and Retrieval-Augmented Generation (RAG) to overcome these challenges. Pic2Plate uses advanced image recognition to extract ingredient lists from user images and RAG to retrieve and personalize recipe recommendations. Leveraging smartphone camera sensors ensures accessibility and portability. Pic2Plate’s performance was evaluated in two areas: ingredient detection accuracy and recipe relevance. The ingredient detection module, powered by GPT-4o, achieved strong results with precision (0.83), recall (0.91), accuracy (0.77), and F1-score (0.86), demonstrating effectiveness in recognizing diverse food items. A survey of 120 participants assessed recipe relevance, with model rankings calculated using the Bradley–Terry method. Pic2Plate’s VLM and RAG integration consistently outperformed other models. These results highlight Pic2Plate’s ability to deliver context-aware, reliable, and diverse recipe suggestions. The study underscores its potential to transform recipe recommendation systems with a scalable, user-centric approach to personalized cooking.
format	Article
id	doaj-art-1743b4b128fc4798bb9d40d7327ad6db
institution	Kabale University
issn	1424-8220
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj-art-1743b4b128fc4798bb9d40d7327ad6db | 2025-01-24T13:48:58Z | eng | MDPI AG | Sensors | 1424-8220 | 2025-01-01 | 25 | 2 | 449 | 10.3390/s25020449 | Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations | Yosua Setyawan Soekamto, Andreas Lim, Leonard Christopher Limanjaya, Yoshua Kaleb Purwanto, Suk-Ho Lee, Dae-Ki Kang (all: Department of Computer Engineering, Dongseo University, Busan 47011, Republic of Korea) | https://www.mdpi.com/1424-8220/25/2/449 | retrieval-augmented generation, personalized recipe recommendation, large language models, vision-language models, ingredient-based recipe retrieval
title	Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations
title_sort	pic2plate a vision language and retrieval augmented framework for personalized recipe recommendations
topic	retrieval-augmented generation, personalized recipe recommendation, large language models, vision-language models, ingredient-based recipe retrieval
url	https://www.mdpi.com/1424-8220/25/2/449

Similar Items