Introduction to Recurrent Neural Networks
Data is often described as the next big currency. As the volume of data has grown over time, so has the need to process it, find patterns in it and extract further information from it. To handle audio or speech data, we need to understand how to work with sequential or time series data. This blog introduces recurrent neural networks (RNNs), which are designed to process such data. We explain why RNNs are needed in deep learning and discuss the main issues related to them.
What is ANN ?
Artificial Neural Networks (ANNs) have helped us achieve various tasks such as pattern generation, recognition and classification. They take an input of fixed size and produce an output of fixed size.
Limitations of Artificial Neural Networks :-
Feed forward networks do not have any cycles: information moves only in the forward direction, which is why they are called Feed Forward Neural Networks.
They are unable to deal with sequential data such as time series, or with tasks such as natural language processing and sentiment classification.
An ANN fails on time series or sequential data because it has no feedback connections and so cannot take information from previous states as input.
This is where the RNN comes into the picture.
What is RNN ?
An RNN, or Recurrent Neural Network, is a deep learning network in which the result computed at the previous time step is also taken as input at the next time step, thus forming a cycle in the network.
Fig 1. Recurrent Neural Network
In this network, information flows forward to the output and is also fed back into the network at the next time step, which is why it is known as a recurrent network. Fig 1 represents a basic recurrent network.
Fig 2. Unfolded Form of Recurrent Neural Network
The above figure (Fig 2) represents the unfolded form of the recurrent neural network with three time steps. In an RNN, the input (x) is provided one element per time step, and the network produces an output (y) at each time step. h0 is the initial hidden state, typically a vector of zeros. At the first time step, the input x1 and the initial hidden state h0 are combined and passed through the activation function to give the hidden state h1, from which the output y1 is computed. The hidden state h1 is then provided as input to the next time step along with the next input x2.
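The unfolded computation in Fig 2 can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name, the weight names W_hx, W_hh and W_hy, the layer sizes, the random initialisation and the choice of tanh as the activation are all assumptions made for the example.

```python
import numpy as np

def rnn_forward(xs, W_hx, W_hh, W_hy, h0=None):
    """Run a simple RNN over a sequence, one input vector per time step."""
    hidden_size = W_hh.shape[0]
    # h0: initial hidden state, a vector of zeros unless one is supplied.
    h = np.zeros(hidden_size) if h0 is None else h0
    hs, ys = [], []
    for x in xs:
        # New hidden state from the current input and the previous hidden state.
        h = np.tanh(W_hx @ x + W_hh @ h)
        hs.append(h)
        # Output for this time step (left as raw scores in this sketch).
        ys.append(W_hy @ h)
    return hs, ys

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 3, 2   # illustrative sizes only
W_hx = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

xs = [rng.normal(size=input_size) for _ in range(3)]  # x1, x2, x3
hs, ys = rnn_forward(xs, W_hx, W_hh, W_hy)            # h1..h3 and y1..y3
```

Because the same three weight matrices are reused at every step, the loop works for a sequence of any length, not only the three steps drawn in the figure.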
This repetitive process lets the network keep information about earlier inputs in memory, which was not possible in ANNs. Processing information over many time steps can, however, lead to problems such as vanishing or exploding gradients and overfitting.
Equations required for training of the recurrent neural network :-
The current hidden state is a function of the previous hidden state and the current input vector.
ht represents the value of the hidden layer at time step t. It is calculated by multiplying the previous hidden state by the hidden-state weights, multiplying the current input word vector by the input weights, adding the two results and applying the activation function. The resulting ht is then used to find the output yt.
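One common way to write this update is given below. The symbols W^{(hh)} and W^{(hx)} for the two weight matrices are names chosen for illustration, and \sigma stands for an element-wise activation such as sigmoid or tanh; a bias term is omitted for simplicity.

```latex
h_t = \sigma\left( W^{(hh)} h_{t-1} + W^{(hx)} x_t \right)
```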
To get the output yt, we calculate the predicted word vector at time step t using a softmax function applied to the product of the output weight matrix (Ws) and the current hidden state (ht).
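As a rough sketch of this step, the snippet below applies a softmax to the product of W_s and h_t. The vocabulary size of 4, the hidden size of 3 and all numeric values are made up for the example; only the names W_s and h_t follow the text.

```python
import numpy as np

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical output weights: 4 words in the vocabulary, hidden size 3.
W_s = np.array([[ 0.2, -0.1,  0.4],
                [ 0.0,  0.3, -0.2],
                [-0.5,  0.1,  0.1],
                [ 0.3,  0.3,  0.3]])
h_t = np.array([0.5, -0.2, 0.1])       # hypothetical current hidden state

y_t = softmax(W_s @ h_t)               # predicted distribution over the 4 words
predicted_word = int(np.argmax(y_t))   # index of the most likely word
```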
The output at each time step is compared with the actual output, and the resulting error is used to train the model with gradient descent.
Each time step therefore has its own error. If the predicted output is ŷt and the actual output is yt, the error function for that time step measures the mismatch between the two.
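With a softmax output, a common choice for this error is the cross-entropy between the predicted distribution and the one-hot actual output. The sketch below assumes that cross-entropy is the error function intended; the function name and the numbers are made up for illustration.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy error for one time step: -sum_j y_true[j] * log(y_pred[j])."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid taking log(0)
    return float(-np.sum(y_true * np.log(y_pred)))

y_true = np.array([0.0, 1.0, 0.0])       # actual word, one-hot encoded
y_pred = np.array([0.2, 0.7, 0.1])       # predicted distribution from the softmax

error_t = cross_entropy(y_true, y_pred)  # equals -log(0.7)
```

Over a whole sequence, the per-step errors are typically summed or averaged before the gradient descent update.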
Advantages over ANN :-
An RNN can process input of any length.
RNNs are able to store information in the form of activations using feedback connections, so the computation also takes historical information into account.
They are useful for handling time series and other sequential data. A feed forward network only supports a one to one mapping from input to output, whereas an RNN also supports one to many, many to one and many to many mappings. An example of one to many is image captioning, where one image produces a sequence of words. An example of many to one is sentiment classification, where a sequence of words produces a single label.
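To make the input-to-output patterns concrete, here is a small sketch in which the same recurrent loop is read out in two ways: once per time step (many to many) or only at the end (many to one). The function name, the mode argument, the weights and the sizes are all hypothetical.

```python
import numpy as np

def run_rnn(xs, W_hx, W_hh, W_hy, mode="many_to_many"):
    """Process a sequence and return outputs according to the chosen pattern."""
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x in xs:
        h = np.tanh(W_hx @ x + W_hh @ h)   # same update at every time step
        outputs.append(W_hy @ h)
    if mode == "many_to_one":
        return outputs[-1]                  # one output for the whole sequence
    return outputs                          # one output per time step

rng = np.random.default_rng(1)
W_hx = rng.normal(scale=0.1, size=(3, 4))
W_hh = rng.normal(scale=0.1, size=(3, 3))
W_hy = rng.normal(scale=0.1, size=(2, 3))

short_seq = [rng.normal(size=4) for _ in range(2)]
long_seq = [rng.normal(size=4) for _ in range(7)]

# The same weights handle sequences of different lengths.
per_step = run_rnn(long_seq, W_hx, W_hh, W_hy)                       # 7 outputs
summary = run_rnn(short_seq, W_hx, W_hh, W_hy, mode="many_to_one")   # 1 output
```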
Disadvantages of RNN :-
Computation is slow, and the network has difficulty accessing information from many time steps back.
A standard RNN cannot take future inputs into account when computing the current state.
Activation Functions used in RNN
Sigmoid – It maps values to the range 0 to 1 and is used when a probability is predicted as the output, though it can saturate and get stuck during training. For multiclass classification, the softmax function is used instead.
Tanh – It is a sigmoid-shaped function ranging from -1 to 1. It is mostly used to classify between two classes, and it is also a common choice for the hidden state of an RNN.
ReLU – The Rectified Linear Unit is an activation function which outputs its input directly if it is positive, and 0 otherwise.
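The three functions are short enough to write out directly. The NumPy definitions below are a minimal sketch; the function names and the sample input are chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Squash values into the range (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Pass positive values through unchanged; map everything else to 0."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # values between 0 and 1, with sigmoid(0) = 0.5
print(tanh(z))      # values between -1 and 1, with tanh(0) = 0
print(relu(z))      # [0. 0. 2.]
```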
During training, the weights are modified in order to minimise the error. As the error is propagated back through the time steps, the gradient is multiplied by a factor at each step, and over many steps this product can shrink to a negligible value. This is known as the vanishing gradient problem.
Problems related to RNN
Vanishing and exploding gradients are the main problems. Gradient clipping is a common remedy for exploding gradients.
Vanishing Gradient –
It is difficult to capture long term dependencies because the gradient is multiplicative: it can decrease or increase exponentially with the number of time steps. A decreasing gradient vanishes, while an increasing one explodes.
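A toy calculation shows how quickly such a product shrinks or grows. The per-step factors 0.5 and 1.5 and the 50 time steps are arbitrary values chosen for illustration; in a real RNN the factor depends on the weights and on the derivative of the activation function.

```python
def gradient_scale(factor, num_steps):
    """Size of a gradient after being multiplied by `factor` at every time step."""
    scale = 1.0
    for _ in range(num_steps):
        scale *= factor
    return scale

vanishing = gradient_scale(0.5, 50)   # about 8.9e-16: effectively zero
exploding = gradient_scale(1.5, 50)   # about 6.4e8: very large
```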
Gradient Clipping –
During back-propagation, when an exploding gradient is encountered, the maximum value of the gradient is capped.
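One common variant caps the L2 norm of the gradient vector rather than each component separately. The sketch below assumes that variant; the function name and the threshold of 5.0 are arbitrary choices for the example.

```python
import numpy as np

def clip_gradient(grad, max_norm):
    """Rescale `grad` so that its L2 norm does not exceed `max_norm`."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)   # keep the direction, shrink the length
    return grad

grad = np.array([30.0, 40.0])             # L2 norm is 50
clipped = clip_gradient(grad, max_norm=5.0)
# clipped is [3., 4.]: same direction, norm capped at 5
```

Gradients whose norm is already below the threshold pass through unchanged, so clipping only affects the steps where the gradient has exploded.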
What we learnt ?
This blog introduced Recurrent Neural Networks in deep learning: the concept behind them, their inner workings and the calculations used for training. Any suggestions are highly appreciated.