
Abstract – Activation functions are important components of Convolutional Neural Networks (CNNs) that introduce non-linearity into the model so that it can compute complex functions. Different activation functions are used with CNNs in different applications, and an effective activation function yields better results and improves the performance of the model. In this study, four widely used activation functions are analyzed and evaluated in terms of the model's accuracy: sigmoid, hyperbolic tangent (tanh), rectified linear unit (ReLU) and exponential linear unit (ELU), all of which have been used in most successful models. A CNN model has been implemented on the MNIST dataset to perform the analysis. The experiments have been run on an Nvidia 940MX GPU to accelerate the training and testing of the CNN model. It has been observed that ReLU, the most popular activation function, performs better than sigmoid and tanh, and that the more recent ELU performs better than ReLU.

Keywords – CNN; Activation Function; Sigmoid; Tanh; ReLU; ELU; GPU.

I. INTRODUCTION

Convolutional neural network, commonly referred to as CNN or ConvNet, is a type of deep neural network (DNN) inspired by the biological visual cortex. A CNN takes an image as input, passes it through a series of convolutional, non-linear, pooling and dense layers, and outputs class scores that best describe the image. A CNN processes an image feature by feature: it learns small features in the first layer and combines them into progressively larger features in subsequent layers [9]. CNNs have revolutionized computer vision applications. In 2012, a CNN applied to the image classification task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) achieved an error rate of 15.4%, a significant improvement over the next best entry, which achieved an error rate of 26.2% [8]. Although CNNs were first implemented by Yann LeCun in 1998 [6], it was in 2012 that they gained momentum in the computer vision community.

The activation function is an important element of a neural network: it introduces non-linearity into the model. Neural networks are considered universal function approximators that can compute and learn arbitrarily complex functions, and it is the activation function that gives the network the power to represent such arbitrary functions. The choice of activation function therefore affects the performance of the network. The present work considers three extensively used activation functions, sigmoid, tanh and ReLU, and a relatively recent function, ELU, and studies their performance on a Graphics Processing Unit (GPU). A similar evaluation is performed in [5], where the authors investigate the performance of the ReLU activation and its variants leaky ReLU, parametric ReLU and randomized leaky ReLU; their results suggest that the variants perform better than the original ReLU. Related analysis has also been done in [7], where various activation functions such as sigmoid, tanh and the radial basis function are compared on generalized MLP architectures; the results in [7] suggest that the tanh function yields better performance than the other activation functions.
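
To make the setup concrete, the following is a minimal sketch of a small CNN for MNIST written with Keras; the framework, layer sizes and optimizer are illustrative assumptions and not taken from this paper. The hidden-layer activation can be switched between 'sigmoid', 'tanh', 'relu' and 'elu' to repeat the comparison described above.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_cnn(activation="relu"):
        # Convolutional, non-linear, pooling and dense layers, ending in class scores.
        model = models.Sequential([
            layers.Input(shape=(28, 28, 1)),
            layers.Conv2D(32, (3, 3), activation=activation),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(64, (3, 3), activation=activation),
            layers.MaxPooling2D((2, 2)),
            layers.Flatten(),
            layers.Dense(128, activation=activation),
            layers.Dense(10, activation="softmax"),  # class scores for the 10 digits
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    # Example usage: train briefly on MNIST with one of the four activations.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0
    model = build_cnn("elu")
    model.fit(x_train, y_train, epochs=1, batch_size=128,
              validation_data=(x_test, y_test))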

II. ACTIVATION FUNCTIONS

A neuron essentially consists of two parts: the first computes the dot product of the inputs and the weights, while the second is the activation function that adds non-linearity to the unit. The first neural model, McCulloch-Pitts (1943), used the step function as its activation [12]; it was based on a simple threshold technique. Several other activations were developed later, including sigmoid, tanh, ReLU and its variants such as leaky ReLU, parametric ReLU and randomized leaky ReLU, and ELU. Four of them are studied in this paper.
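
The two-part neuron and the four activation functions compared in this paper can be written compactly as follows; this is a minimal NumPy sketch, with alpha = 1.0 assumed as the usual default for ELU.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):
        return np.tanh(z)

    def relu(z):
        return np.maximum(0.0, z)

    def elu(z, alpha=1.0):
        return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

    def neuron(x, w, b, activation=relu):
        z = np.dot(w, x) + b      # first part: dot product (plus bias)
        return activation(z)      # second part: non-linear activation

    # Example: the same inputs passed through each of the four activations.
    x = np.array([0.5, -1.2, 3.0])
    w = np.array([0.1, 0.4, -0.2])
    b = 0.05
    for f in (sigmoid, tanh, relu, elu):
        print(f.__name__, neuron(x, w, b, activation=f))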