Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations

Choosing nutritious foods is essential for daily health, but finding recipes that match available ingredients and dietary preferences can be challenging. Traditional recommendation methods often lack personalization and accurate ingredient recognition. Personalized systems address this by integratin...

Bibliographic Details
Main Authors: Yosua Setyawan Soekamto, Andreas Lim, Leonard Christopher Limanjaya, Yoshua Kaleb Purwanto, Suk-Ho Lee, Dae-Ki Kang
Format: Article
Language: English
Published: MDPI AG 2025-01-01
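Series: Sensors
Subjects: retrieval-augmented generation; personalized recipe recommendation; large language models; vision-language models; ingredient-based recipe retrieval
Online Access: https://www.mdpi.com/1424-8220/25/2/449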
author Yosua Setyawan Soekamto
Andreas Lim
Leonard Christopher Limanjaya
Yoshua Kaleb Purwanto
Suk-Ho Lee
Dae-Ki Kang
collection DOAJ
description Choosing nutritious foods is essential for daily health, but finding recipes that match available ingredients and dietary preferences can be challenging. Traditional recommendation methods often lack personalization and accurate ingredient recognition. Personalized systems address this by integrating user preferences, dietary needs, and ingredient availability. This study presents Pic2Plate, a framework combining Vision-Language Models (VLMs) and Retrieval-Augmented Generation (RAG) to overcome these challenges. Pic2Plate uses advanced image recognition to extract ingredient lists from user images and RAG to retrieve and personalize recipe recommendations. Leveraging smartphone camera sensors ensures accessibility and portability. Pic2Plate’s performance was evaluated in two areas: ingredient detection accuracy and recipe relevance. The ingredient detection module, powered by GPT-4o, achieved strong results with precision (0.83), recall (0.91), accuracy (0.77), and F1-score (0.86), demonstrating effectiveness in recognizing diverse food items. A survey of 120 participants assessed recipe relevance, with model rankings calculated using the Bradley–Terry method. Pic2Plate’s VLM and RAG integration consistently outperformed other models. These results highlight Pic2Plate’s ability to deliver context-aware, reliable, and diverse recipe suggestions. The study underscores its potential to transform recipe recommendation systems with a scalable, user-centric approach to personalized cooking.
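The abstract reports precision, recall, accuracy, and F1-score for ingredient detection and a Bradley–Terry ranking for the survey. The record does not say how these were computed, so the following is only a minimal sketch under stated assumptions: detection is scored from counts of true positives, false positives, and false negatives; accuracy is taken as TP / (TP + FP + FN) because set-style detection has no natural true negatives; and the Bradley–Terry strengths are fitted with the standard minorization-maximization update. The function names, the counts, and the pairwise votes are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1, accuracy) from detection counts.

    Assumption: accuracy = TP / (TP + FP + FN), since there are no
    true negatives when comparing predicted and actual ingredient sets.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp / (tp + fp + fn)
    return precision, recall, f1, accuracy


def bradley_terry(wins, iterations=200):
    """Fit Bradley-Terry strengths with the classic MM update.

    wins[(a, b)] is how many times item a was preferred over item b.
    Returns a dict of strengths normalized to sum to 1.
    """
    items = sorted({name for pair in wins for name in pair})
    strength = {i: 1.0 for i in items}

    # Total number of comparisons each item won.
    total_wins = defaultdict(float)
    for (winner, _loser), count in wins.items():
        total_wins[winner] += count

    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                # Number of times i and j were compared, in either direction.
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins[i] / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {i: value / norm for i, value in updated.items()}
    return strength


if __name__ == "__main__":
    # Hypothetical counts: 10 correct detections, 2 spurious, 1 missed.
    print(detection_metrics(tp=10, fp=2, fn=1))

    # Hypothetical pairwise survey votes between three systems.
    votes = {
        ("A", "B"): 8, ("B", "A"): 2,
        ("A", "C"): 7, ("C", "A"): 3,
        ("B", "C"): 6, ("C", "B"): 4,
    }
    ranking = sorted(bradley_terry(votes).items(), key=lambda kv: -kv[1])
    print(ranking)
```

Sorting the fitted strengths in descending order gives a ranking of the compared systems; a survey like the one described would supply the pairwise vote counts.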
format Article
id doaj-art-1743b4b128fc4798bb9d40d7327ad6db
institution Kabale University
issn 1424-8220
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-1743b4b128fc4798bb9d40d7327ad6db
Timestamp: 2025-01-24T13:48:58Z
Language: eng
Publisher: MDPI AG
Journal: Sensors (ISSN 1424-8220)
Published: 2025-01-01, vol. 25, no. 2, article 449
DOI: 10.3390/s25020449
Title: Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations
Authors: Yosua Setyawan Soekamto, Andreas Lim, Leonard Christopher Limanjaya, Yoshua Kaleb Purwanto, Suk-Ho Lee, Dae-Ki Kang
Affiliation (all six authors): Department of Computer Engineering, Dongseo University, Busan 47011, Republic of Korea
URL: https://www.mdpi.com/1424-8220/25/2/449
Keywords: retrieval-augmented generation; personalized recipe recommendation; large language models; vision-language models; ingredient-based recipe retrieval
title Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations
topic retrieval-augmented generation
personalized recipe recommendation
large language models
vision-language models
ingredient-based recipe retrieval
url https://www.mdpi.com/1424-8220/25/2/449