Comparison of Deep-Learning-Based Segmentation Models: Using Top View Person Images

Image segmentation is a key research topic in computer vision. It is pivotal in a broad range of real-life applications. Recently, the emergence of deep learning has driven significant advances in image segmentation; the developed systems are now capable of recognizing, segm...

Bibliographic Details
Main Authors:	Imran Ahmed, Misbah Ahmad, Fakhri Alam Khan, Muhammad Asif
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Deep learning; semantic segmentation; top view person; FCN; U-Net; DeepLab
Online Access:	https://ieeexplore.ieee.org/document/9146648/

_version_	1832582407725776896
author	Imran Ahmed, Misbah Ahmad, Fakhri Alam Khan, Muhammad Asif
collection	DOAJ
description	Image segmentation is a key research topic in computer vision and is pivotal in a broad range of real-life applications. The emergence of deep learning has recently driven significant advances in image segmentation; the resulting systems can now recognize, segment, and classify objects of specific interest in images. Most of these techniques focus primarily on the asymmetric field of view or on frontal-view objects. This work explores widely used deep-learning-based models for person segmentation using a top-view data set. The first model is a Fully Convolutional Neural Network (FCN) with a ResNet-101 architecture. The network consists of a set of max-pooling and convolution layers that identify pixel-wise class labels and predict the mask. The second model, U-Net, is based on FCN and has an encoder-decoder architecture. It mainly comprises a contracting path, called the encoder, which captures the context in the image, and a symmetric expanding path, called the decoder, which enables accurate localization. The third model used for top-view person segmentation is DeepLabV3, which also has an encoder-decoder architecture. The encoder consists of a trained Convolutional Neural Network (CNN) that encodes feature maps of the input image. The decoder up-samples and reconstructs the output using the important information extracted by the encoder. All segmentation models are first tested using pre-trained models (trained on a frontal-view data set). To improve performance, the models are then further trained on a person data set captured from a top view. The output of each model is a segmented person in the top-view images. The experimental results show the effectiveness of the segmentation models, which achieve IoU of 83%, 84%, and 86% and mIoU of 80%, 82%, and 84% for FCN, U-Net, and DeepLabV3, respectively. A discussion of the output results, with possible future guidelines, is also provided.
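The IoU and mIoU figures quoted in the description can be illustrated with a minimal sketch. It assumes the common definitions (IoU as intersection over union of predicted and ground-truth pixels for one class, mIoU as the unweighted mean over classes); the record does not state exactly how the paper averages its scores, so the function names, the class labels (0 = background, 1 = person), and the toy masks below are all hypothetical.

```python
from typing import Sequence

# A mask is a 2-D grid of integer class labels (assumed: 0 = background, 1 = person).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask, cls: int) -> float:
    """Intersection over union of the pixels labelled `cls` in both masks."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            in_pred = p == cls
            in_target = t == cls
            inter += in_pred and in_target
            union += in_pred or in_target
    # Assumed convention: a class absent from both masks counts as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pred: Mask, target: Mask, classes: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-class IoU values."""
    return sum(iou(pred, target, c) for c in classes) / len(classes)


# Toy 3x3 example: the prediction misses one person pixel.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]

person_iou = iou(pred, target, cls=1)   # 3 shared pixels / 4 in the union = 0.75
miou = mean_iou(pred, target)           # mean of person and background IoU
```

Because mIoU also averages in the background class, it can differ from the person-only IoU, which is one plausible reason the two metrics are reported separately.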
format	Article
id	doaj-art-d35636f9415545129d1dfbfd15dc8a3f
institution	Kabale University
issn	2169-3536
language	English
publishDate	2020-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-d35636f9415545129d1dfbfd15dc8a3f | 2025-01-30T00:00:52Z | eng | IEEE | IEEE Access | 2169-3536 | 2020-01-01 | 8 | 136361 | 136373 | 10.1109/ACCESS.2020.3011406 | 9146648 | Comparison of Deep-Learning-Based Segmentation Models: Using Top View Person Images | Imran Ahmed (https://orcid.org/0000-0002-7751-286X; Center of Excellence in IT, Institute of Management Sciences, Peshawar, Pakistan) | Misbah Ahmad (https://orcid.org/0000-0001-7013-0159; Center of Excellence in IT, Institute of Management Sciences, Peshawar, Pakistan) | Fakhri Alam Khan (https://orcid.org/0000-0002-9130-1874; Center of Excellence in IT, Institute of Management Sciences, Peshawar, Pakistan) | Muhammad Asif (https://orcid.org/0000-0003-1839-2527; Department of Computer Science, National Textile University, Faisalabad, Pakistan) | https://ieeexplore.ieee.org/document/9146648/ | Deep learning; semantic segmentation; top view person; FCN; U-Net; DeepLab
title	Comparison of Deep-Learning-Based Segmentation Models: Using Top View Person Images
topic	Deep learning; semantic segmentation; top view person; FCN; U-Net; DeepLab
url	https://ieeexplore.ieee.org/document/9146648/

Similar Items