Digital chefs and intelligent cooking systems based on multimodal large language model

Bibliographic Details
Main Authors: LI Xinyuan, LI Bai, SUN Yueshuo, ZHANG Tantan, TIAN Yonglin, YIN Zhuyan, WANG Fei-Yue
Format: Article
Language: Chinese (zho)
Published: POSTS&TELECOM PRESS Co., LTD 2024-12-01
Series: 智能科学与技术学报 (Chinese Journal of Intelligent Science and Technology)
Subjects:
Online Access: http://www.cjist.com.cn/zh/article/doi/10.11959/j.issn.2096-6652.202448/
Description
Summary: A digital chef and an intelligent cooking method are proposed to achieve high-quality, precise cooking results. In the offline phase, visual, auditory, and thermal sensors record professional chefs' continuous cooking operations, and the collected frame-by-frame images and multi-round Q&A texts form a culinary expert knowledge base. A low-rank adaptation (LoRA) method is then applied to fine-tune a pretrained multimodal large language model so that it can understand cooking intentions. In the online phase, real-time sensor data are converted into image-text inputs for the fine-tuned model, which generates cooking instructions that guide users through the cooking steps. A combined hardware-software cooking system was implemented and tested on a steak pan-frying task. Experimental results show that the fine-tuned system effectively controls the steak's doneness and quality, and that its cooking instructions are significantly more accurate and reasonable than those of the model before fine-tuning.
ISSN: 2096-6652
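
Note on the low-rank adaptation step mentioned in the summary: the record gives no details of the authors' configuration, so the following is only a minimal sketch of the general low-rank adaptation idea, not the article's implementation. It assumes the standard formulation in which a frozen weight matrix W is augmented by a trainable product B·A scaled by alpha / r. The matrix sizes, the rank, the value of alpha, and the helper names `matmul` and `lora_effective_weight` are all hypothetical choices made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank."""
    r = len(a)                # A has shape (r, d_in)
    delta = matmul(b, a)      # (d_out, r) @ (r, d_in) -> (d_out, d_in)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Hypothetical toy dimensions; real model layers are far larger.
d_out, d_in, rank = 8, 8, 2
rng = random.Random(0)

# Pretrained weight: kept frozen during fine-tuning.
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Adapter factors: the only trainable parameters.
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: no change at start

W_eff = lora_effective_weight(W, A, B, alpha=4.0)

full_params = d_out * d_in            # parameters in the full matrix
lora_params = rank * (d_in + d_out)   # parameters in A and B together
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one, and only the two small factors need to be updated during fine-tuning. In this toy case that is 32 adapter parameters against 64 in the full matrix; the saving grows as the layer dimensions become large relative to the rank.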