Q82: Analysis and comparison of different activation functions in LSTM networks
Thesis > Central Library of Shahrood University > Computer Engineering > MSc > 2016
Authors:
Amir Farzad [Author], Hoda Mashayekhi [Supervisor], Prof. Hamid Hassanpour [Advisor]
Abstract: Long Short-Term Memory (LSTM) is a kind of recurrent neural network (RNN) designed to prevent the vanishing gradient problem. Each LSTM unit is represented as a block; the blocks are memory cells connected recurrently, and each block contains several gates with activation functions. In recurrent neural networks such as the LSTM, the sigmoid and hyperbolic tangent functions are commonly used as activation functions in the network units. Other activation functions developed for neural networks have not been thoroughly analyzed in LSTMs, and although many researchers have adopted LSTM networks for classification tasks, no comprehensive study is available on the choice of activation functions for the gates in these networks. In this thesis, we gather and compare 23 different activation functions in a basic LSTM network with one hidden layer. We analyze the performance of the different activation functions, as well as different numbers of LSTM blocks in the hidden layer, for classification of records in the IMDB and Movie Review data sets. The quantitative results on both data sets demonstrate that the lowest average error is achieved with the modified Elliott and cloglogm activation functions; in particular, these functions outperform the sigmoid activation function that is prevalent in LSTM networks. The results also show that activation functions with a wider output range achieve better results, and that the optimal number of blocks in the hidden layer depends on the complexity and length of the data sets.
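A minimal sketch (Python/NumPy) of where the choice of activation enters an LSTM gate may clarify the comparison described above. The Elliott and cloglog formulas below are common textbook forms; the thesis's exact "modified Elliott" and "cloglogm" variants are not reproduced here, so these definitions and the lstm_gate helper are illustrative assumptions, not the thesis's implementation.

import numpy as np

def sigmoid(x):
    # Standard logistic gate activation, output range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def elliott(x):
    # One common form of the Elliott function, shifted to (0, 1);
    # the thesis studies a modified variant that may differ.
    return 0.5 * x / (1.0 + np.abs(x)) + 0.5

def cloglog(x):
    # Complementary log-log function, output range (0, 1); the thesis's
    # "cloglogm" is a modified variant whose exact form is not given here.
    return 1.0 - np.exp(-np.exp(x))

def lstm_gate(x_t, h_prev, W, U, b, act=sigmoid):
    # A single LSTM gate: an affine map of the current input and the
    # previous hidden state, squashed by the chosen activation. Swapping
    # `act` across candidate functions is the comparison the thesis performs.
    return act(W @ x_t + U @ h_prev + b)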
Keywords:
#neural network #LSTM #activation function #sigmoidal gate
Keeping place: Central Library of Shahrood University