Please use this identifier to cite or link to this item: http://lrcdrs.bennett.edu.in:80/handle/123456789/882
Title: Inference-aware convolutional neural network pruning
Authors: Choudhary, Tejalal
Keywords: Bayesian optimization
Convolutional neural network
Efficient inference
Filter pruning
Model compression and acceleration
Resource-constrained devices
Issue Date: 2022
Publisher: Elsevier B.V.
Series/Report no.: Vol. 135
Abstract: Deep neural networks (DNNs) have become an important tool for solving problems across numerous disciplines. However, DNNs are also known for their high resource requirements, weight redundancy, and large-scale parameters. As a result, their use is restricted on devices that lack the resources required to execute them, especially resource-constrained devices such as mobile phones, wearables, and other edge devices. In recent years, pruning has emerged as an essential technique for removing insignificant parameters and accelerating model inference. However, finding the optimal number of parameters that can be pruned without significantly affecting model performance is a time-consuming, tedious task that requires extensive manual tuning. This paper formulates pruning as an optimization problem with the goal of improving DNN run-time inference performance by pruning low-impact parameters (filters) and their corresponding feature maps. To this end, we present a Bayesian optimization-based method for automatically determining the appropriate number of filters for each convolutional layer. We also propose an objective function incorporating distinct model-performance and resource-specific constraints. The proposed method is applied to two different kinds of convolutional network architectures (i.e., VGG16 and the deeper ResNet34) on the CIFAR10, CIFAR100, and ImageNet datasets. The large-scale ImageNet experiments showed that the floating-point operations of ResNet34 and VGG16 could be reduced by 35.46 percent and 84.97 percent, respectively, with negligible loss of accuracy. © 2022 Elsevier B.V.
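The abstract describes searching for per-layer filter counts that trade accuracy against FLOPs under an objective with resource constraints. The toy sketch below illustrates that search loop in miniature; the layer widths, the synthetic accuracy proxy, and the `alpha` weighting are all illustrative assumptions, and plain random search stands in for the paper's Bayesian optimizer over real model evaluations.

```python
import random

# Hypothetical conv-layer widths (number of filters per layer);
# in the paper these would come from VGG16 or ResNet34.
LAYER_FILTERS = [64, 128, 256, 512]

def flops_fraction(keep_ratios):
    """Fraction of original FLOPs retained, taken as proportional
    to the number of filters kept in each layer (an assumption)."""
    kept = sum(f * r for f, r in zip(LAYER_FILTERS, keep_ratios))
    return kept / sum(LAYER_FILTERS)

def surrogate_accuracy(keep_ratios):
    """Synthetic accuracy proxy: pruning more filters costs more
    accuracy. A real run would fine-tune and evaluate the pruned net."""
    return (1.0
            - 0.3 * (1.0 - min(keep_ratios))
            - 0.1 * (1.0 - flops_fraction(keep_ratios)))

def objective(keep_ratios, alpha=0.5):
    """Reward accuracy plus FLOP reduction, in the spirit of an
    objective combining performance and resource constraints."""
    return surrogate_accuracy(keep_ratios) + alpha * (1.0 - flops_fraction(keep_ratios))

def search(n_trials=200, seed=0):
    """Random search over per-layer keep ratios in [0.2, 1.0],
    standing in for Bayesian optimization of the same space."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        ratios = [rng.uniform(0.2, 1.0) for _ in LAYER_FILTERS]
        score = objective(ratios)
        if score > best_score:
            best, best_score = ratios, score
    return best, best_score

if __name__ == "__main__":
    best_ratios, best_score = search()
    print("keep ratios:", [round(r, 2) for r in best_ratios])
    print("objective:", round(best_score, 3))
```

Swapping the random sampler for a Gaussian-process-based optimizer (e.g. scikit-optimize's `gp_minimize` over the same ratio space) would recover the Bayesian-optimization flavor of the method, at the cost of an extra dependency.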
URI: https://doi.org/10.1016/j.future.2022.04.031
http://lrcdrs.bennett.edu.in:80/handle/123456789/882
ISSN: 0167-739X
Appears in Collections:Journal Articles_SCSET

Files in This Item:
File: 1225 Inference-aware convolutional neural network pruning.pdf
Description: Restricted Access
Size: 7.28 MB
Format: Adobe PDF

Contact admin for Full-Text

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.