Week 4 content of Scribble AI

By AI Club on 10/7/2024


This week, we will learn how neurons "fire" (how a neuron decides whether to fire or not) and much more. This week is more informational than the previous ones. There are not many coding tasks, so it is crucial to use this document to understand what is being taught. To start, we'll discuss activation functions.

Files you need for this week:

**Activation Functions in Neural Networks**

Activation functions are crucial in neural networks. A neuron first calculates the weighted sum of its inputs and adds a bias; the activation function is then applied to that result and determines whether, and how strongly, the neuron is activated. This transformation is what adds non-linearity to the model, enabling it to learn and perform complex tasks. Without activation functions, the whole network, no matter how many layers it has, would simply perform a linear transformation of the input, which severely limits the model's capacity to learn from data.
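To see why a network without activation functions is "just linear", here is a small sketch. The layer sizes and random weights are made up for illustration; the point is only that two stacked layers with no activation give exactly the same result as one combined layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation function: just weights and biases.
# The shapes (3 inputs -> 4 neurons -> 2 neurons) are arbitrary choices.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Pass x through both layers, one after the other.
two_layers = W2 @ (W1 @ x + b1) + b2

# Build a single layer whose weights and bias combine the two above.
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
one_layer = W_combined @ x + b_combined

print(np.allclose(two_layers, one_layer))  # True
```

However many layers we stack this way, they always collapse into one. Putting a non-linear function between the layers is what prevents that collapse.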

Watch the video:

https://www.youtube.com/watch?v=Y9qdKsOHRjA

(You can watch the whole video, but for this project we will only care about ReLU and Softmax)

ReLU (Rectified Linear Unit) is one of the most popular activation functions for the hidden layers of a neural network. It works by converting any negative input to zero while leaving positive inputs unchanged. For example, an input of -3 becomes 0, while an input of 2 stays 2. It's super useful because it is cheap to compute and helps the neural network keep learning without getting stuck, especially when dealing with complex data.

Watch the video:

https://www.youtube.com/watch?v=68BZ5f7P94E&pp=ygUYcmVsdSBhY3RpdmF0aW9uIGZ1bmN0aW9u

Simplicity: Computationally efficient, since it is just a simple threshold at zero.

Non-linear: Though it looks linear (it is a straight line on each side of zero), the bend at zero makes it non-linear, which lets the network learn patterns that a purely linear model cannot.

Reduces the vanishing gradient problem: Unlike the sigmoid or tanh activation functions, ReLU does not squash positive inputs, so gradients shrink less as they pass back through the layers, which usually makes training faster.
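To get a feel for the vanishing gradient point, here is a rough sketch. It is a simplification: it ignores the weights and only multiplies the slope of the activation function across 10 layers (the number 10 is an arbitrary choice). The helper names `sigmoid_slope` and `relu_slope` are made up for this example.

```python
import math

def sigmoid_slope(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z)). It never exceeds 0.25.
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_slope(z):
    # Slope of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if z > 0 else 0.0

layers = 10

# Best case for sigmoid: z = 0 gives its largest possible slope, 0.25.
sigmoid_signal = sigmoid_slope(0.0) ** layers

# For a positive input, the ReLU slope is 1 at every layer.
relu_signal = relu_slope(1.0) ** layers

print(sigmoid_signal)  # about 9.5e-07
print(relu_signal)     # 1.0
```

Even in the best case, the sigmoid shrinks the learning signal to under a millionth of its size after 10 layers, while ReLU passes it through unchanged for positive inputs.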

It looks like ReLU is not that hard to implement in code.

*def relu(x):*

* # x will be the input to the ReLU function.*

* # Based on the definition of ReLU, can you complete the function? Super easy*

Softmax is typically used in the output layer of a neural network, especially for classification problems. It turns a list of raw scores from the network into probabilities that sum to one. Each score is exponentiated (e is raised to the power of that score), and each result is then divided by the sum of all the exponentiated values. Because exponentiation grows quickly, this emphasizes the largest scores and reduces the impact of the smaller ones. This is super handy for figuring out which category an input most likely belongs to, since the outputs form a clean probability distribution that adds up to 100%.

Probability distribution: Converts logits into probabilities, which makes it suitable for multi-class classification.

Clear interpretation: The output probabilities can be interpreted straightforwardly, making it clear how confident the model is about its predictions.
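Here is the softmax arithmetic worked through by hand for one made-up list of three scores. This is only a step-by-step illustration of the definition using plain Python, not the NumPy function you will be asked to write.

```python
import math

scores = [2.0, 1.0, 0.1]  # made-up raw scores (logits) for three classes

# Step 1: exponentiate each score (e raised to the power of that score).
exps = [math.exp(s) for s in scores]

# Step 2: add up all the exponentiated values.
total = sum(exps)

# Step 3: divide each exponentiated value by the total.
probs = [e / total for e in exps]

print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))          # 1.0
```

Notice how the largest score (2.0) ends up with about 66% of the probability, even though it is only twice the second score.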

To understand softmax even better, I recommend watching the first 20 minutes of this video:

https://www.youtube.com/watch?v=omz_NdFgWyU

Let's try to code up the softmax function:

*import numpy as np*

*def softmax(x):*

* # Based on the definitions, videos, and mathematical notions, can you write this function*

* # that returns the value after applying softmax to x?*

Before we proceed, make sure you run the tests once you have finished both functions.

The *activation_functions.py* file contains the skeleton for both functions. Edit the file to complete them, then run:

*python test_4.py*

Now, we need to integrate the relu and softmax functions into the Neural Network code. First, copy your working activation functions from *activation_functions.py* and paste them at the top of *week4_code.py* (right after the numpy import).

In the Neuron class, apply the ReLU activation function to the neuron's value right before you output it as self.output.

For softmax, **READ ON**. After reading and watching about softmax, you might have guessed that we apply softmax to the **output layer**, since we want the outputs of its neurons to form a valid probability distribution. Softmax needs the outputs of all the neurons at once, so instead of altering the neuron code, we will alter the *layer* forward function: right before we output *self.outputs*, we apply softmax to the entire *self.outputs* (just call your softmax function with self.outputs).

I will suggest some edits to get you started.

1) Make both the Layer class and the Neuron class take an additional parameter called "activation_function".

2) While creating the hidden layers in the Neural Network class, pass "relu" as the activation function.

3) When you create the final output layer, pass "softmax" as the activation function.

4) Edit the *forward* functions of both the Layer and the Neuron to use this parameter.
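If you get stuck on the steps above, here is one possible way they could look. This is only a rough sketch and will not match your code from previous weeks exactly: the constructor arguments (`num_inputs`, `num_neurons`) and the attributes `weights`, `bias`, and `neurons` are assumptions made for this example, and the `relu` and `softmax` at the top are stand-ins that you should replace with your own tested versions.

```python
import numpy as np

def relu(x):
    # Stand-in: swap in your own tested version.
    return np.maximum(0, x)

def softmax(x):
    # Stand-in: swap in your own tested version.
    exps = np.exp(x - np.max(x))  # subtracting the max avoids overflow
    return exps / np.sum(exps)

class Neuron:
    def __init__(self, num_inputs, activation_function=None):
        # Attribute names and initialisation here are assumptions.
        self.weights = np.random.randn(num_inputs)
        self.bias = 0.0
        self.activation_function = activation_function
        self.output = None

    def forward(self, inputs):
        z = np.dot(self.weights, inputs) + self.bias
        # ReLU is applied per neuron, right before storing the output.
        if self.activation_function == "relu":
            z = relu(z)
        self.output = z
        return self.output

class Layer:
    def __init__(self, num_neurons, num_inputs, activation_function=None):
        self.activation_function = activation_function
        self.neurons = [Neuron(num_inputs, activation_function)
                        for _ in range(num_neurons)]
        self.outputs = None

    def forward(self, inputs):
        self.outputs = np.array([n.forward(inputs) for n in self.neurons])
        # Softmax needs every neuron's output at once, so it lives in the layer.
        if self.activation_function == "softmax":
            self.outputs = softmax(self.outputs)
        return self.outputs

# A hidden layer with ReLU feeding an output layer with softmax.
hidden = Layer(4, 3, activation_function="relu")
output = Layer(2, 4, activation_function="softmax")
probs = output.forward(hidden.forward(np.array([0.5, -1.2, 3.0])))
print(probs)
```

Passing the activation as a string keeps the Neuron and Layer constructors simple. The neurons in the softmax layer skip ReLU because their activation name is not "relu", so the layer applies softmax to their raw outputs.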

The *week4_code.py* file should first be updated with your code from the previous week so that it has the necessary code for the Neuron, Layer, and Neural Network.

Hope this week's content was helpful. I urge you all to take extra care to understand the content to the best of your abilities. I will be expecting A LOT of questions in the Discord channel. Since we will not be having any beginner project sessions this week, please use Discord for questions. See y'all next week!