Week 5 content of scribble AI

By AI Club on 10/14/2024


This week's content is probably the **most** important of all the weeks, because we will finally talk about "back-propagation": the process by which a neural network actually **learns** from a given dataset, so that it can then predict what a piece of unseen data might be.

For this week only, we are giving away the code for the back-propagation portion of the neural network. It is up to you to make sure all the neural network content up until this week is done and working; only then will you be able to integrate the code given here into your existing project.

Remember weights and biases? They are the defining features of a neural network. Recall the fundamental formula a neuron uses to combine its inputs before passing the result on to the next neuron: the dot product of the weights and the inputs, plus the bias:

*np.dot(self.weights, inputs) + self.bias*
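As a quick refresher, here is a minimal sketch of that formula for a single neuron. The weight, input, and bias values are made up purely for illustration; the point is only that changing a weight changes the result while the inputs stay the same.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.5, -1.0, 2.0])
bias = 0.1

# Weighted sum: 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0 + 0.1
z = np.dot(weights, inputs) + bias

# Same inputs, but the last weight is tweaked from 2.0 to 1.0.
tweaked_weights = np.array([0.5, -1.0, 1.0])
z_tweaked = np.dot(tweaked_weights, inputs) + bias

print(z, z_tweaked)
```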

We cannot do anything about the inputs; however, if we can **tweak** the weights and biases so that the neural network is able to **predict** what a given image is, won't that be awesome?

That is exactly what back-propagation is all about.

- Let's say we are trying to train our NN on the MNIST dataset. This is a dataset of handwritten digits (0 to 9). It contains thousands of images of **each** digit, stored in numeric form as pixel values.

- We get the dataset and "train" our NN on it. This is how backpropagation works:

- X is our MNIST dataset input data and Y is the labels (what digit the X corresponds to)

- We do one "forward pass" with the data as inputs. This just means feeding the MNIST data in as *inputs* and letting every layer compute its dot products and activations in turn, exactly like the forward function from previous weeks.

- During training we also know the Y values, so we pass them to the model along with the inputs.

- After pass 1, the model outputs a prediction of what digit the inputs might be. Since the weights and biases start out random, this first prediction is essentially a random guess.

- Then, the model compares the digit it predicted with the digit that **should have been** predicted, and it calculates a numeric loss.

- Now, the model goes inside each neuron and tries to update its weights and biases to **minimize** the loss. See the idea? By trying to minimize the loss, the model updates the weights and biases in such a way that it becomes more accurate when predicting a digit.

- How do we update the weights and biases? We go through the layers in **reverse**. We start at the output layer and work backwards all the way to the first layer, updating the weights and biases along the way.

- For pass 2, the output will be **slightly** closer to the real output. As we keep repeating this, the model gets better over time. Each full pass over the training data is called an "epoch".
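The steps above can be sketched on a toy problem. This is not the network from this project: it assumes a single linear neuron, one made-up training sample, and a squared-error loss instead of softmax with cross entropy, just to show the forward pass, loss, backward pass, and update repeating while the loss shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): one sample with three features and one target value.
x = np.array([1.0, 2.0, 3.0])
y = 1.0

# Random starting weights, so the first prediction is essentially a guess.
weights = rng.standard_normal(3)
bias = 0.0
learning_rate = 0.01

losses = []
for epoch in range(20):
    prediction = np.dot(weights, x) + bias   # forward pass
    error = prediction - y                   # compare with the correct answer
    losses.append(0.5 * error ** 2)          # numeric loss (squared error)

    d_weights = error * x                    # backward pass: dLoss/dweights
    d_bias = error                           # backward pass: dLoss/dbias

    weights -= learning_rate * d_weights     # update step
    bias -= learning_rate * d_bias

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Every pass leaves the prediction a little closer to the target, which is why the recorded loss keeps going down.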

Watch the following video:

https://www.youtube.com/watch?v=w8yWXqWQYmU

Do you remember how we applied the activation functions during the forward pass? The ReLU and Softmax activation functions. When doing back-propagation we need to account for those functions on the way back. How? Calculus. We take the derivative of each function and multiply by it (this is the chain rule). This is the code for the Neuron class. You can copy it directly.

    class Neuron:
        def __init__(self, num_inputs, activation_function):
            self.num_inputs = num_inputs
            self.activation_function = activation_function
            self.weights = np.random.randn(num_inputs) * np.sqrt(2. / num_inputs)  # He initialization
            self.bias = np.random.randn()
            self.output = 0
            self.inputs = None
            self.d_weights = None  # derivative of the loss w.r.t. the weights
            self.d_bias = None     # derivative of the loss w.r.t. the bias
            self.delta = 0

        # Your forward function from previous weeks goes here.
        # It must store self.inputs and self.output, because backward reads both.

        def backward(self, delta, learning_rate):
            # Chain rule: scale the incoming delta by the ReLU derivative
            if self.activation_function == "relu":
                delta *= relu_derivative(self.output)

            self.d_weights = delta * self.inputs
            self.d_bias = delta

            self.delta = delta
            # This neuron's contribution to the delta of the previous layer
            return np.dot(delta, self.weights)

        def update(self, learning_rate):
            self.weights -= learning_rate * self.d_weights
            self.bias -= learning_rate * self.d_bias

Analyzing:

We set *self.d_weights* and *self.d_bias*, which are the derivatives of the loss with respect to the weights and the bias. The *self.delta* variable stores the neuron's error signal: how much the loss changes when the neuron's weighted sum (the value before the activation function) changes. For *self.weights* we use He initialization to randomly initialize the weights. Make note of this change in your version of the code.

In the *backward* function, we compute those derivatives from delta. For ReLU neurons, delta is first multiplied by the ReLU derivative. The derivative for each weight is then delta multiplied by the input that weight was applied to, and the derivative for the bias is delta itself. The function returns delta multiplied by the weights, which is the error passed back to the previous layer. Why exactly? It is the chain rule from calculus, and the video I linked covers the math behind it.

Lastly, the *update* function actually updates *self.weights* and *self.bias* by subtracting the derivatives multiplied by a **learning rate**. The learning rate controls how big each update step is, and therefore how fast or slow the NN learns.
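For readers who want the short version of that math, here is a sketch of the chain rule for one neuron. The notation is my own and not from the video: $x_j$ are the neuron's inputs, $w_j$ its weights, $b$ its bias, $z$ the weighted sum, $a = f(z)$ the output after the activation function $f$, $L$ the loss, and $\delta$ the value stored in *self.delta*.

```latex
\begin{align*}
z &= \sum_j w_j x_j + b, \qquad a = f(z), \qquad \delta = \frac{\partial L}{\partial z} \\[4pt]
\delta &= \frac{\partial L}{\partial a}\, f'(z)
  && \text{(ReLU neuron: incoming delta times the ReLU derivative)} \\[4pt]
\frac{\partial L}{\partial w_j} &= \frac{\partial L}{\partial z}\,\frac{\partial z}{\partial w_j} = \delta\, x_j
  && \text{(\texttt{self.d\_weights = delta * self.inputs})} \\[4pt]
\frac{\partial L}{\partial b} &= \frac{\partial L}{\partial z}\,\frac{\partial z}{\partial b} = \delta
  && \text{(\texttt{self.d\_bias = delta})} \\[4pt]
\frac{\partial L}{\partial x_j} &= \frac{\partial L}{\partial z}\,\frac{\partial z}{\partial x_j} = \delta\, w_j
  && \text{(the value returned by \texttt{backward})}
\end{align*}
```

Each line is the same pattern: the change in the loss caused by $z$, times the change in $z$ caused by the quantity we care about.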
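To get a feel for what the learning rate does, here is a small sketch that is separate from the network code. It assumes a made-up one-variable "loss", f(w) = w², whose minimum is at w = 0, and applies the same update rule with three hypothetical learning rates.

```python
def descend(learning_rate, steps=10, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; f'(w) = 2*w."""
    for _ in range(steps):
        w -= learning_rate * 2 * w   # same rule as Neuron.update
    return w

small = descend(0.01)   # tiny steps: moves toward 0, but slowly
medium = descend(0.1)   # larger steps: gets much closer to 0 in the same time
large = descend(1.1)    # steps too big: overshoots 0 and moves further away

print(small, medium, large)
```

A learning rate that is too small wastes epochs, and one that is too large can make the loss grow instead of shrink, so it is usually worth trying a few values.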

Just like how we cascade the "forward" function, we do the same for backprop: we give the Layer class a backward method and an update method.

    class Layer:
        def __init__(self, num_inputs, num_neurons, activation_function):
            self.num_inputs = num_inputs
            self.num_neurons = num_neurons
            self.activation_function = activation_function
            self.neurons = [Neuron(num_inputs, activation_function) for _ in range(num_neurons)]

        # Your forward function goes here before backward

        def backward(self, delta, learning_rate):
            # Sum every neuron's contribution to the delta for the previous layer
            next_delta = np.zeros(self.num_inputs)
            for i, neuron in enumerate(self.neurons):
                next_delta += neuron.backward(delta[i], learning_rate)
            return next_delta

        def update(self, learning_rate):
            for neuron in self.neurons:
                neuron.update(learning_rate)

The backward function here is much simpler.

We start with a vector of zeros, one entry per layer input (*num_inputs*), and add to it the value returned by the backward function of every neuron in the layer. The result is the delta that gets passed on to the previous layer. In the update method, we call the update method of each neuron in that layer.

Now, time to update the NN class:
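The accumulation in *Layer.backward* can be checked in isolation. The sketch below is not part of the project code: it assumes the neurons' weights are stacked into a hypothetical matrix `W` (one row per neuron) and that `delta` already includes any activation derivative. It then compares the neuron-by-neuron loop with a single matrix product.

```python
import numpy as np

rng = np.random.default_rng(1)
num_inputs, num_neurons = 4, 3

# Row i holds the weights of neuron i (hypothetical random values).
W = rng.standard_normal((num_neurons, num_inputs))
delta = rng.standard_normal(num_neurons)   # one delta per neuron in the layer

# Neuron-by-neuron accumulation, mirroring the loop in Layer.backward.
next_delta = np.zeros(num_inputs)
for i in range(num_neurons):
    next_delta += delta[i] * W[i]

# The same quantity written as one matrix product.
vectorized = W.T @ delta

print(np.allclose(next_delta, vectorized))
```

The loop version is easier to follow next to the Neuron class; the matrix version is what you would typically reach for if you later want the network to run faster.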

    class NeuralNetwork:
        # No changes to your __init__ or forward functions

        def calc_loss_delta(self, predicted_outputs, actual_outputs):
            return predicted_outputs - actual_outputs

        def train(self, X, y, epochs, learning_rate):
            for epoch in range(epochs):
                total_loss = 0
                for i in range(X.shape[0]):
                    inputs = X[i]
                    expected_output = y[i]

                    # Forward pass
                    predicted_output = self.forward(inputs)

                    # Calculate loss
                    loss = categorical_cross_entropy(predicted_output, expected_output)
                    total_loss += loss

                    # Calculate loss gradient (delta for the output layer)
                    loss_delta = self.calc_loss_delta(predicted_output, expected_output)

                    # Backward pass
                    delta = loss_delta
                    for layer in reversed(self.layers):
                        delta = layer.backward(delta, learning_rate)
                        layer.update(learning_rate)

                average_loss = total_loss / X.shape[0]
                print(f'Epoch {epoch+1}/{epochs}, Loss: {average_loss:.4f}')

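The *calc_loss_delta* function calculates the delta for the output layer: the difference between the NN's predicted outputs and the actual outputs. With a softmax output layer and categorical cross-entropy loss, this difference is exactly the derivative of the loss with respect to the output layer's weighted sums. In the *train* method, we run the training for the given number of **epochs**, or passes. From there, the comments in the code are self-explanatory: we predict the outputs, calculate the loss, calculate the loss delta, and backpropagate through the layers in reverse order, updating each layer as we go. At the end of each epoch, we print the average loss.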
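If the simple subtraction in *calc_loss_delta* looks too good to be true, it can be checked numerically. The sketch below assumes a standard softmax (your own implementation from previous weeks may differ in detail) and made-up output values. It compares `predicted - actual` with a finite-difference estimate of how the cross-entropy loss changes when each raw output value is nudged.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / np.sum(e)

def cross_entropy(predicted, actual):
    return -np.sum(actual * np.log(predicted))

z = np.array([2.0, -1.0, 0.5])        # hypothetical raw output-layer values
actual = np.array([0.0, 0.0, 1.0])    # one-hot label: the correct class is index 2

analytic = softmax(z) - actual        # what calc_loss_delta returns

# Central finite differences: nudge each raw value and measure the loss change.
h = 1e-6
numeric = np.zeros_like(z)
for k in range(z.size):
    step = np.zeros_like(z)
    step[k] = h
    numeric[k] = (cross_entropy(softmax(z + step), actual)
                  - cross_entropy(softmax(z - step), actual)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-6))
```

This shortcut only holds for the softmax plus categorical cross-entropy pairing; a different loss or output activation would need a different delta.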

Now, what is the *categorical_cross_entropy* function? It is the function that calculates the loss.

You can learn about categorical cross entropy here. There are other loss functions too, but we'll use this one:

https://www.youtube.com/watch?v=dEXPMQXoiLc

At the top of your file, you can define this function as well as the ReLU derivative:

    import numpy as np  # skip this line if your file already imports NumPy

    def relu_derivative(x):
        # 1 where the neuron was active (x > 0), otherwise 0
        return np.where(x > 0, 1, 0)

    def categorical_cross_entropy(predicted, actual):
        epsilon = 1e-12  # To avoid log(0)
        predicted = np.clip(predicted, epsilon, 1. - epsilon)
        return -np.sum(actual * np.log(predicted))
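Here is a short usage sketch for the two helpers, with made-up predictions. The label is written as a one-hot vector, which is an assumption about how the labels will be encoded. The loss should be small when the network puts high probability on the correct digit and large when it is confidently wrong.

```python
import numpy as np

def relu_derivative(x):
    return np.where(x > 0, 1, 0)

def categorical_cross_entropy(predicted, actual):
    epsilon = 1e-12  # To avoid log(0)
    predicted = np.clip(predicted, epsilon, 1. - epsilon)
    return -np.sum(actual * np.log(predicted))

actual = np.array([0, 0, 1])   # one-hot label: the correct class is index 2

# Hypothetical softmax outputs for a three-class problem.
confident_right = categorical_cross_entropy(np.array([0.05, 0.05, 0.90]), actual)
confident_wrong = categorical_cross_entropy(np.array([0.90, 0.05, 0.05]), actual)
print(confident_right, confident_wrong)

# The ReLU derivative is 1 for positive values and 0 otherwise.
print(relu_derivative(np.array([-2.0, 0.0, 3.0])))
```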

I am attaching a file *week5_code.py* that contains the code for the content covered above.

Do **not** simply overwrite your project with this file. Instead, copy the new functions and code from it into the existing project that you have been working on for the past month.

While the content for this week might seem very intimidating, I promise if you read the entire article with a calm mind and watch the two videos linked, backpropagation is very understandable. Moreover, I have provided every piece of code needed for back-propagation to work.

Next week, we will cover the MNIST dataset in detail, load it with Python, and train our network on it. At the end, we will test our NN on a random digit from the dataset and you'll see how your NN is able to detect what that digit is. Before that, we will also talk briefly about:

- One-hot encoding and why we need it

- Normalizing the data before training

Once we have our model trained and running, we will have accomplished the most important part of this project. I hope to see you all there!