The multilayer perceptron (MLP) is the "Hello World" of deep learning. In this article we build an MLP with two hidden layers. It has three inputs in the input layer, 3 hidden units in the first hidden layer, 2 hidden units in the second hidden layer and one output unit. The network is a binary classifier. A demo Python implementation using sklearn is provided.
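As a minimal sketch of what an sklearn version of this 3-3-2-1 network could look like, here is one using `MLPClassifier`. The synthetic dataset from `make_classification` and the hyperparameters (`solver="sgd"`, `learning_rate_init=0.01`, `max_iter=2000`) are assumptions chosen for illustration, not the article's own demo.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical binary dataset with three input features.
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers (3 units, then 2 units), both using ReLU.
# For binary targets, MLPClassifier uses one logistic (sigmoid) output unit
# and minimizes log-loss.
clf = MLPClassifier(hidden_layer_sizes=(3, 2), activation="relu",
                    solver="sgd", learning_rate_init=0.01,
                    max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

After fitting, `clf.coefs_` holds weight matrices of shapes (3, 3), (3, 2) and (2, 1). sklearn stores one column per neuron.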
Before going further, make sure you understand the perceptron neuron model well. If you don't, review it first.
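So let's build the neural net. Before going into the mathematical details, let's first visualize the network. The three inputs feed the 3 units of the first hidden layer. Their outputs feed the 2 units of the second hidden layer, whose outputs in turn feed the single output unit.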
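So our first weight matrix is of size 3x3, and each row holds the weights of one neuron. Under the same convention, the second hidden layer's weight matrix is of size 2x3: two neurons, each receiving the three outputs of the first layer. It is 3x2 if each column holds one neuron's weights instead.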
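The weight matrix of the first hidden layer looks like this, where $w_{ij}$ is the weight from input $j$ to neuron $i$:

$$W_1 = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix}$$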
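To make the shapes concrete, here is a small NumPy sketch. It assumes the row-per-neuron convention and random initial weights. The names `b1`, `b2`, `b3` and `W3` for the output layer are illustrative; the text calls the first bias vector `W0`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Row-per-neuron convention: shape = (neurons in this layer, inputs to this layer).
W1 = rng.normal(scale=0.5, size=(3, 3))   # 1st hidden layer: 3 neurons, 3 inputs each
b1 = np.zeros((3, 1))                     # its bias vector (called W0 in the text)
W2 = rng.normal(scale=0.5, size=(2, 3))   # 2nd hidden layer: 2 neurons, 3 inputs each
b2 = np.zeros((2, 1))
W3 = rng.normal(scale=0.5, size=(1, 2))   # output layer: 1 neuron, 2 inputs
b3 = np.zeros((1, 1))

x = np.array([[0.2], [-1.0], [0.5]])      # one example with three features (a column)
z1 = W1 @ x + b1                          # row i of W1 produces neuron i's weighted sum
print(z1.shape)                           # (3, 1)

n_params = sum(a.size for a in (W1, b1, W2, b2, W3, b3))
print(n_params)                           # 23
```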
Every neuron in both hidden layers uses the ReLU activation, and the output layer uses sigmoid. The main reason for not using sigmoid in every layer is the vanishing gradient problem. The derivative of the sigmoid is at most 0.25, and it approaches zero when the input is large in magnitude because the sigmoid saturates. Backpropagation therefore multiplies many small factors together, and the gradients reaching the early layers shrink toward zero. This makes a network with sigmoid in every layer very difficult to train. ReLU, by contrast, has a derivative of exactly 1 for positive inputs.
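A quick numerical sketch shows the argument above. It compares the derivatives of sigmoid and ReLU and shows how fast the best-case sigmoid factor shrinks across layers. It looks only at the activation-derivative factor; real gradients also include the weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # peaks at 0.25 when z = 0

def relu_grad(z):
    return (z > 0).astype(float)  # 1 for positive inputs, 0 otherwise

z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
print(sigmoid_grad(z))
print(relu_grad(z))

# Best case through 10 sigmoid layers: each contributes a factor of at most 0.25.
print(0.25 ** 10)                 # 9.5367431640625e-07
```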
Training a neural network consists of three steps:
- Forward pass
- Calculating the loss
- Backpropagating the loss to update the weights
If you are unfamiliar with the backpropagation algorithm, review it before continuing.
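Forward pass:
- Compute $Z = W_1 X + W_0$, where $W_1$ is the weight matrix of the 1st hidden layer and $W_0$ is its bias vector
- Compute $X' = \mathrm{ReLU}(Z)$
- Pass the output of the 1st hidden layer, i.e. $X'$, as input to the 2nd hidden layer
- Compute $Z_2 = W_2 X' + b_2$, where $W_2$ and $b_2$ are the weight matrix and bias vector of the 2nd hidden layer
- Compute $X'' = \mathrm{ReLU}(Z_2)$
- Pass the output of the 2nd hidden layer, i.e. $X''$, as input to the output layer
- Compute $\hat{y} = \sigma(W_3 X'' + b_3)$, where $\sigma$ is the sigmoid function and $W_3$, $b_3$ are the output layer's weights and bias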
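The forward-pass steps above can be sketched in NumPy as follows. This assumes a batch `X` of shape (3, m) with one column per example, row-per-neuron weight matrices and random parameters. The function and variable names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Forward pass for a batch X of shape (3, m), one column per example."""
    W1, b1, W2, b2, W3, b3 = params
    Z1 = W1 @ X + b1        # (3, m)
    X1 = relu(Z1)           # X'  : output of the 1st hidden layer
    Z2 = W2 @ X1 + b2       # (2, m)
    X2 = relu(Z2)           # X'' : output of the 2nd hidden layer
    Z3 = W3 @ X2 + b3       # (1, m)
    return sigmoid(Z3)      # predicted probability of class 1

rng = np.random.default_rng(0)
params = (rng.normal(size=(3, 3)), np.zeros((3, 1)),
          rng.normal(size=(2, 3)), np.zeros((2, 1)),
          rng.normal(size=(1, 2)), np.zeros((1, 1)))
X = rng.normal(size=(3, 4))     # four examples, three features each
print(forward(X, params))       # shape (1, 4), values in (0, 1)
```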
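Calculating loss: we use the logistic loss (binary cross-entropy), $L = -[y \log \hat{y} + (1 - y) \log(1 - \hat{y})]$, where $y$ is the true label. Passing the output to this loss and computing it is straightforward.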
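Here is a minimal sketch of the logistic loss averaged over a batch. The small `eps` clipping is an added safeguard (an assumption, not from the article) that avoids `log(0)` when a prediction is exactly 0 or 1.

```python
import numpy as np

def logistic_loss(y_hat, y, eps=1e-12):
    """Mean binary cross-entropy; eps clips probabilities away from 0 and 1."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

y = np.array([1.0, 0.0, 1.0, 0.0])
y_hat = np.array([0.9, 0.2, 0.6, 0.4])
print(logistic_loss(y_hat, y))
```

Confident, correct predictions (0.9 for a true 1) contribute little to the loss. Hesitant ones (0.6 for a true 1) contribute more.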
Backpropagating the loss is the most involved step, because differentiating the loss with respect to every weight by hand is tedious. For this step we therefore give the derivatives directly. The derivative of the loss with respect to the output layer's pre-activation $z$ is $\hat{y} - y$, where $\hat{y} = \sigma(z)$. The W's and X's of each layer depend on the outputs of the previous layers, so this error must be propagated backward through the network with the chain rule, i.e. by backpropagation.
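The following is a sketch of backpropagation for this 3-3-2-1 network, assuming a mean logistic loss over a batch. It starts from $\hat{y} - y$ at the output and applies the chain rule layer by layer. `(Z > 0)` is the ReLU derivative, taken as 0 at exactly 0 by convention. The names `dZ3`, `dW2` and so on are illustrative. A central finite difference at the end checks one analytic gradient.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(X, y, params):
    """Mean logistic loss and its gradients; X is (3, m), y is (1, m)."""
    W1, b1, W2, b2, W3, b3 = params
    m = X.shape[1]
    # Forward pass, keeping the intermediate values that backprop needs.
    Z1 = W1 @ X + b1
    X1 = relu(Z1)                 # X'
    Z2 = W2 @ X1 + b2
    X2 = relu(Z2)                 # X''
    Z3 = W3 @ X2 + b3
    y_hat = sigmoid(Z3)
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

    # Output layer: dL/dZ3 = y_hat - y, divided by m because the loss is a mean.
    dZ3 = (y_hat - y) / m
    dW3 = dZ3 @ X2.T
    db3 = dZ3.sum(axis=1, keepdims=True)
    # 2nd hidden layer: send the error back through W3, then through ReLU'(Z2).
    dZ2 = (W3.T @ dZ3) * (Z2 > 0)
    dW2 = dZ2 @ X1.T
    db2 = dZ2.sum(axis=1, keepdims=True)
    # 1st hidden layer: the same pattern, through W2 and ReLU'(Z1).
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T
    db1 = dZ1.sum(axis=1, keepdims=True)
    return loss, [dW1, db1, dW2, db2, dW3, db3]

# Sanity check: compare one analytic gradient with a central finite difference.
rng = np.random.default_rng(1)
shapes = [(3, 3), (3, 1), (2, 3), (2, 1), (1, 2), (1, 1)]
params = [rng.normal(size=s) for s in shapes]
X = rng.normal(size=(3, 5))
y = rng.integers(0, 2, size=(1, 5)).astype(float)
loss, grads = forward_backward(X, y, params)

h = 1e-6
params[2][0, 1] += h
loss_plus, _ = forward_backward(X, y, params)
params[2][0, 1] -= 2 * h
loss_minus, _ = forward_backward(X, y, params)
params[2][0, 1] += h                      # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * h)
print(grads[2][0, 1], numeric)
```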
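After this, we update the weights using gradient descent. Each weight moves a small step against its gradient, $W \leftarrow W - \eta \, \partial L / \partial W$, where $\eta$ is the learning rate. If you are unfamiliar with how gradient descent works, review it first.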
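A minimal sketch of the update rule applied to a list of parameter arrays follows. To keep it self-contained, it is exercised on a toy quadratic loss (an assumption for illustration) instead of the network's loss.

```python
import numpy as np

def gradient_descent_step(params, grads, lr=0.1):
    """Vanilla gradient descent: W <- W - lr * dL/dW for every parameter array."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy illustration (assumption): minimize L(w) = sum((w - target)**2),
# whose gradient is 2 * (w - target).
target = np.array([1.0, -2.0])
params = [np.zeros(2)]
for _ in range(50):
    grads = [2.0 * (params[0] - target)]
    params = gradient_descent_step(params, grads, lr=0.1)
print(params[0])   # close to [1.0, -2.0]
```

With `lr=0.1`, each step shrinks the distance to the minimum by a factor of 0.8. A learning rate that is too large would overshoot instead.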
We repeat this procedure until the change between the previous and the updated weights $W_i$ is negligible.
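Putting the pieces together, here is a sketch of the full training loop with this stopping rule. Training stops once no weight or bias changes by more than `tol` in a step, with a `max_iter` cap because the change may never fall below `tol` at a fixed learning rate. The toy dataset, initialization scale, `lr`, `tol` and `max_iter` are all assumptions for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(X, y, params):
    """Forward pass plus backprop for the 3-3-2-1 network (mean logistic loss)."""
    W1, b1, W2, b2, W3, b3 = params
    m = X.shape[1]
    Z1 = W1 @ X + b1
    X1 = relu(Z1)
    Z2 = W2 @ X1 + b2
    X2 = relu(Z2)
    y_hat = sigmoid(W3 @ X2 + b3)
    dZ3 = (y_hat - y) / m
    dZ2 = (W3.T @ dZ3) * (Z2 > 0)
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    grads = [dZ1 @ X.T, dZ1.sum(axis=1, keepdims=True),
             dZ2 @ X1.T, dZ2.sum(axis=1, keepdims=True),
             dZ3 @ X2.T, dZ3.sum(axis=1, keepdims=True)]
    return y_hat, grads

def train(X, y, lr=0.1, tol=1e-6, max_iter=5000, seed=0):
    """Gradient descent until no parameter changes by more than tol (or max_iter)."""
    rng = np.random.default_rng(seed)
    shapes = [(3, 3), (3, 1), (2, 3), (2, 1), (1, 2), (1, 1)]
    params = [rng.normal(scale=0.5, size=s) for s in shapes]
    for it in range(1, max_iter + 1):
        _, grads = gradients(X, y, params)
        new_params = [p - lr * g for p, g in zip(params, grads)]
        # Largest absolute change of any single weight or bias in this step.
        change = max(np.max(np.abs(n - p)) for n, p in zip(new_params, params))
        params = new_params
        if change < tol:
            break
    return params, it

# Hypothetical toy data: label 1 when the three features sum to a positive number.
rng = np.random.default_rng(42)
X = rng.normal(size=(3, 200))
y = (X.sum(axis=0, keepdims=True) > 0).astype(float)
params, n_iter = train(X, y)
y_hat, _ = gradients(X, y, params)
accuracy = np.mean((y_hat > 0.5) == y)
print(f"stopped after {n_iter} iterations, training accuracy {accuracy:.2f}")
```

In practice, people often monitor the loss on held-out data rather than the raw weight change. The weight-change rule here simply mirrors the criterion described above.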
The MLP is not a new architecture, but it is a powerful one and a beginner-friendly introduction to neural networks. We will be covering many more articles on deep learning, so stay tuned.