Building a Framework for Visual Question Answering Systems

VQA (Visual Question Answering) systems are among the latest advancements in the fields of artificial intelligence and deep learning. They integrate image processing with natural language understanding to enable intelligent systems to answer questions related to image content. The significance of th...

Bibliographic Details
Main Authors:	Maya Abu Hamoud, Wasim Safi
Format:	Article
Language:	Arabic
Published:	Higher Commission for Scientific Research 2025-01-01
Series:	Syrian Journal for Science and Innovation
Subjects:	vqa; vgg19; glove; mscoco dataset
Online Access:	https://journal.hcsr.gov.sy/archives/1504

_version_	1832586181631541248
author	Maya Abu Hamoud, Wasim Safi
author_facet	Maya Abu Hamoud, Wasim Safi
author_sort	Maya Abu Hamoud
collection	DOAJ
description	Visual Question Answering (VQA) systems are among the latest advances in artificial intelligence and deep learning. They combine image processing with natural language understanding so that a system can answer questions about the content of an image. Their significance lies in their ability to interpret and analyze images in a manner similar to human comprehension, which makes them applicable to a wide range of critical fields. VQA systems are a crucial step towards advanced AI systems that bridge the gap between computer vision and human language understanding, fostering a deeper and more integrated interaction with the real world. This study explores and analyzes the methods and techniques used in visual question answering, with a focus on developing a model that can analyze and understand images while responding to related queries. We developed a VQA system that uses the VGG19 model to extract image features, while questions and answers are encoded using GloVe and Label Encoding respectively. The model was trained on the MSCOCO dataset, which contains a variety of images and related questions, and its performance was improved through multiple experiments that tuned the training parameters. The model reached an F1 score of 44.23% on the training set and 42.97% on the validation set, a slight improvement over other models that also used VGG19 on the same dataset. Additionally, a web platform was developed to test the system, enabling users to evaluate answer accuracy and to run the model on new images or on images from the dataset.
format	Article
id	doaj-art-24ae0a522a474f2c80cec7f160ba3a66
institution	Kabale University
issn	2959-8591
language	Arabic
publishDate	2025-01-01
publisher	Higher Commission for Scientific Research
record_format	Article
series	Syrian Journal for Science and Innovation
spelling	doaj-art-24ae0a522a474f2c80cec7f160ba3a66; 2025-01-26T08:27:37Z; ara; Higher Commission for Scientific Research; Syrian Journal for Science and Innovation; 2959-8591; 2025-01-01; 31; 10.5281/zenodo.14723207; Building a Framework for Visual Question Answering Systems; Maya Abu Hamoud (0), Wasim Safi (1); 0: Syrian Virtual University, Damascus; 1: Higher Institute for Applied Sciences and Technology, Damascus, Syria; https://journal.hcsr.gov.sy/archives/1504; vqa; vgg19; glove; mscoco dataset
spellingShingle	Maya Abu Hamoud, Wasim Safi; Building a Framework for Visual Question Answering Systems; Syrian Journal for Science and Innovation; vqa; vgg19; glove; mscoco dataset
title	Building a Framework for Visual Question Answering Systems
title_sort	building a framework for visual question answering systems
topic	vqa; vgg19; glove; mscoco dataset
url	https://journal.hcsr.gov.sy/archives/1504
work_keys_str_mv	AT mayaabuhamoud buildingaframeworkforvisualquestionansweringsystems AT wasimsafi buildingaframeworkforvisualquestionansweringsystems


Similar Items