Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations
Choosing nutritious foods is essential for daily health, but finding recipes that match available ingredients and dietary preferences can be challenging. Traditional recommendation methods often lack personalization and accurate ingredient recognition. Personalized systems address this by integratin...
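Main Authors: | Yosua Setyawan Soekamto, Andreas Lim, Leonard Christopher Limanjaya, Yoshua Kaleb Purwanto, Suk-Ho Lee, Dae-Ki Kang |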
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Sensors |
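Subjects: | retrieval-augmented generation; personalized recipe recommendation; large language models; vision-language models; ingredient-based recipe retrieval |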
Online Access: | https://www.mdpi.com/1424-8220/25/2/449 |
_version_ | 1832587478840639488 |
---|---|
author | Yosua Setyawan Soekamto; Andreas Lim; Leonard Christopher Limanjaya; Yoshua Kaleb Purwanto; Suk-Ho Lee; Dae-Ki Kang |
author_facet | Yosua Setyawan Soekamto; Andreas Lim; Leonard Christopher Limanjaya; Yoshua Kaleb Purwanto; Suk-Ho Lee; Dae-Ki Kang |
author_sort | Yosua Setyawan Soekamto |
collection | DOAJ |
description | Choosing nutritious foods is essential for daily health, but finding recipes that match available ingredients and dietary preferences can be challenging. Traditional recommendation methods often lack personalization and accurate ingredient recognition. Personalized systems address this by integrating user preferences, dietary needs, and ingredient availability. This study presents Pic2Plate, a framework combining Vision-Language Models (VLMs) and Retrieval-Augmented Generation (RAG) to overcome these challenges. Pic2Plate uses advanced image recognition to extract ingredient lists from user images and RAG to retrieve and personalize recipe recommendations. Leveraging smartphone camera sensors ensures accessibility and portability. Pic2Plate’s performance was evaluated in two areas: ingredient detection accuracy and recipe relevance. The ingredient detection module, powered by GPT-4o, achieved strong results with precision (0.83), recall (0.91), accuracy (0.77), and F1-score (0.86), demonstrating effectiveness in recognizing diverse food items. A survey of 120 participants assessed recipe relevance, with model rankings calculated using the Bradley–Terry method. Pic2Plate’s VLM and RAG integration consistently outperformed other models. These results highlight Pic2Plate’s ability to deliver context-aware, reliable, and diverse recipe suggestions. The study underscores its potential to transform recipe recommendation systems with a scalable, user-centric approach to personalized cooking. |
format | Article |
id | doaj-art-1743b4b128fc4798bb9d40d7327ad6db |
institution | Kabale University |
issn | 1424-8220 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj-art-1743b4b128fc4798bb9d40d7327ad6db; 2025-01-24T13:48:58Z; eng; MDPI AG; Sensors; 1424-8220; 2025-01-01; 25; 2; 449; 10.3390/s25020449; Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations; Yosua Setyawan Soekamto (0); Andreas Lim (1); Leonard Christopher Limanjaya (2); Yoshua Kaleb Purwanto (3); Suk-Ho Lee (4); Dae-Ki Kang (5); Department of Computer Engineering, Dongseo University, Busan 47011, Republic of Korea; Department of Computer Engineering, Dongseo University, Busan 47011, Republic of Korea; Department of Computer Engineering, Dongseo University, Busan 47011, Republic of Korea; Department of Computer Engineering, Dongseo University, Busan 47011, Republic of Korea; Department of Computer Engineering, Dongseo University, Busan 47011, Republic of Korea; Department of Computer Engineering, Dongseo University, Busan 47011, Republic of Korea; Choosing nutritious foods is essential for daily health, but finding recipes that match available ingredients and dietary preferences can be challenging. Traditional recommendation methods often lack personalization and accurate ingredient recognition. Personalized systems address this by integrating user preferences, dietary needs, and ingredient availability. This study presents Pic2Plate, a framework combining Vision-Language Models (VLMs) and Retrieval-Augmented Generation (RAG) to overcome these challenges. Pic2Plate uses advanced image recognition to extract ingredient lists from user images and RAG to retrieve and personalize recipe recommendations. Leveraging smartphone camera sensors ensures accessibility and portability. Pic2Plate’s performance was evaluated in two areas: ingredient detection accuracy and recipe relevance. The ingredient detection module, powered by GPT-4o, achieved strong results with precision (0.83), recall (0.91), accuracy (0.77), and F1-score (0.86), demonstrating effectiveness in recognizing diverse food items. A survey of 120 participants assessed recipe relevance, with model rankings calculated using the Bradley–Terry method. Pic2Plate’s VLM and RAG integration consistently outperformed other models. These results highlight Pic2Plate’s ability to deliver context-aware, reliable, and diverse recipe suggestions. The study underscores its potential to transform recipe recommendation systems with a scalable, user-centric approach to personalized cooking.; https://www.mdpi.com/1424-8220/25/2/449; retrieval-augmented generation; personalized recipe recommendation; large language models; vision-language models; ingredient-based recipe retrieval |
spellingShingle | Yosua Setyawan Soekamto Andreas Lim Leonard Christopher Limanjaya Yoshua Kaleb Purwanto Suk-Ho Lee Dae-Ki Kang Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations Sensors retrieval-augmented generation personalized recipe recommendation large language models vision-language models ingredient-based recipe retrieval |
title | Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations |
title_full | Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations |
title_fullStr | Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations |
title_full_unstemmed | Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations |
title_short | Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations |
title_sort | pic2plate a vision language and retrieval augmented framework for personalized recipe recommendations |
topic | retrieval-augmented generation; personalized recipe recommendation; large language models; vision-language models; ingredient-based recipe retrieval |
url | https://www.mdpi.com/1424-8220/25/2/449 |
work_keys_str_mv | AT yosuasetyawansoekamto pic2plateavisionlanguageandretrievalaugmentedframeworkforpersonalizedreciperecommendations AT andreaslim pic2plateavisionlanguageandretrievalaugmentedframeworkforpersonalizedreciperecommendations AT leonardchristopherlimanjaya pic2plateavisionlanguageandretrievalaugmentedframeworkforpersonalizedreciperecommendations AT yoshuakalebpurwanto pic2plateavisionlanguageandretrievalaugmentedframeworkforpersonalizedreciperecommendations AT sukholee pic2plateavisionlanguageandretrievalaugmentedframeworkforpersonalizedreciperecommendations AT daekikang pic2plateavisionlanguageandretrievalaugmentedframeworkforpersonalizedreciperecommendations |
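
The description above reports precision, recall, accuracy, and F1-score for ingredient detection, and a Bradley–Terry ranking of models from survey responses. The record does not say how either was computed, so the following Python is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes detection is scored per image by comparing a predicted ingredient set with a ground-truth set, and that accuracy is TP / (TP + FP + FN) because no true negatives exist for open-ended ingredient lists. It also assumes the survey yields pairwise preference counts, fitted here with the standard minorization-maximization update for Bradley–Terry strengths. The names `detection_metrics`, `bradley_terry`, and `wins` are hypothetical.

```python
from typing import Dict, Iterable, List


def detection_metrics(predicted: Iterable[str], actual: Iterable[str]) -> Dict[str, float]:
    """Set-based scores for one image (assumption: no true negatives are counted)."""
    pred, gold = set(predicted), set(actual)
    tp = len(pred & gold)   # ingredients correctly detected
    fp = len(pred - gold)   # detected but not actually present
    fn = len(gold - pred)   # present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Assumed definition of accuracy: TP / (TP + FP + FN).
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def bradley_terry(wins: List[List[int]], iterations: int = 200) -> List[float]:
    """Fit Bradley-Terry strengths; wins[i][j] counts how often i was preferred over j.

    Assumes at least one comparison was recorded. Strengths are normalized to sum to 1.
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom else strengths[i])
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


if __name__ == "__main__":
    print(detection_metrics(["tomato", "onion", "garlic", "basil"],
                            ["tomato", "onion", "garlic", "egg"]))
    print(bradley_terry([[0, 8, 9], [2, 0, 7], [1, 3, 0]]))
```

Under the assumed accuracy definition, the rounded precision (0.83) and recall (0.91) imply an accuracy of about 0.77, which matches the reported figure; this is an inference from the published numbers, not a statement of the paper's method.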