Welcome to my blog! I'm thrilled to have you here for my second post. Today, we're diving into the notation and syntax of the neural network world. This post, along with some upcoming ones, is essential before we jump into creating our first neural network. You'll soon see why. So, grab a cup of coffee, get comfortable, and let's embark on this exciting journey together!
Understanding Neural Network Notation
In my previous post, https://www.awakeaiminds.com/post/an-intuitive-guide-to-understanding-neural-networks, I demonstrated that for a neural network with the following structure,
The equations for the input to the gray neuron are:
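Assuming the network in that figure has two inputs, x1 and x2 (the figure itself defines them, so take this as my reconstruction), the equation works out to something like:

h = w_1 x_1 + w_2 x_2 + b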
We then applied an activation function to h, resulting in the following:
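In general terms this step is Y = f(h), where f is the chosen activation function. With the sigmoid, for example:

Y = \sigma(h) = 1 / (1 + e^{-h})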
This gives us our output Y. It's important to note that we need to determine three parameters for the model: w1, w2, and b. I will explain how to get these values in upcoming posts.
Now, imagine a slightly more robust neural network than the one shown in the previous image. Consider the following network:
As you can see, reusing the same variable names can be confusing. For example, it is hard to distinguish the w1 that connects input 1 to neuron 1 from the w1 that connects input 1 to neuron 2 in the hidden layer, because both share a name but represent different values. It also becomes difficult to tell the w1 in the first layer apart from the w1 in the second layer once we are dealing with multiple layers.
Let's use a standardized notation for neural networks, as illustrated in the following image:
This approach allows us to represent the equations more easily and clearly. Now let's examine the connections arriving at the first neuron in layer 1 (the hidden layer) to better understand the notation.
The first neuron in the hidden layer receives input x1 weighted by w11 and input x2 weighted by w12, along with the bias term b1. Therefore, the input to neuron 1 can be expressed as:
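Written out with the layer superscript, and assuming w11 carries input 1 into neuron 1 while w12 carries input 2 into neuron 1, this is:

h_1^{(1)} = w_{11}^{(1)} x_1 + w_{12}^{(1)} x_2 + b_1^{(1)}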
The superscript (1) in the previous equation indicates that all of these calculations take place in layer 1, and h1 represents the input to the first neuron of that layer. A generalized notation is then:
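One reasonable reading of this generalized notation (conventions vary, so treat this as my assumption) is that w_{ij}^{(l)} is the weight into neuron i of layer l coming from node j of the previous layer, giving:

h_i^{(1)} = \sum_j w_{ij}^{(1)} x_j + b_i^{(1)}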
Now you can derive the equations for the inputs to neurons 2 and 3 in the first layer (it's a valuable exercise, so I encourage you to give it a try!):
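If you want to check your work, under the same notation the results should look like:

h_2^{(1)} = w_{21}^{(1)} x_1 + w_{22}^{(1)} x_2 + b_2^{(1)}
h_3^{(1)} = w_{31}^{(1)} x_1 + w_{32}^{(1)} x_2 + b_3^{(1)}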
As discussed in my previous post, we need to apply activation functions to the inputs received by the neurons. For example, if we choose the Sigmoid Activation Function for the last three equations, the result will be:
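Writing \sigma for the sigmoid and a_i^{(1)} for the resulting activations (the names used in the next step), that is:

a_i^{(1)} = \sigma(h_i^{(1)}) = 1 / (1 + e^{-h_i^{(1)}}),   for i = 1, 2, 3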
Finally, let's compute the output based on the following image:
As you can see, the output y effectively acts as layer number 2. Neuron y receives a1 weighted by w11, a2 weighted by w12, and a3 weighted by w13, plus the bias b1; these weights and bias belong to the second layer, as you can appreciate in the previous image. Therefore, the input to neuron y will be:
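Marking the output layer with the superscript (2), this would read:

h_1^{(2)} = w_{11}^{(2)} a_1^{(1)} + w_{12}^{(2)} a_2^{(1)} + w_{13}^{(2)} a_3^{(1)} + b_1^{(2)}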
If we replace a1, a2, and a3 with their expressions, the input to y will be:
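Substituting the sigmoid expressions for a1, a2, and a3 gives something like:

h_1^{(2)} = w_{11}^{(2)} \sigma(h_1^{(1)}) + w_{12}^{(2)} \sigma(h_2^{(1)}) + w_{13}^{(2)} \sigma(h_3^{(1)}) + b_1^{(2)}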
We have two alternatives, depending on the problem we need to solve. Since this is the output layer, we can either apply an activation function to h1 in the output layer or use the previous equation without one. If we decide to apply, for example, a sigmoid activation function to h1, the final output will be:
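In that case the final output is simply:

y = \sigma(h_1^{(2)})

To tie all of these equations together, here is a minimal Python sketch of the full forward pass for this 2-3-1 network. The specific input values, weights, and biases below are made-up numbers chosen purely for illustration:

import math

def sigmoid(z):
    # Sigmoid activation function
    return 1.0 / (1.0 + math.exp(-z))

# Made-up example values for the two inputs
x = [0.5, -1.0]

# Layer 1 (hidden layer): w1[i][j] is the weight from input j to hidden neuron i
w1 = [[0.1, 0.4],
      [-0.2, 0.3],
      [0.5, -0.6]]
b1 = [0.01, 0.02, 0.03]

# h_i^(1): weighted sum of the inputs plus bias, for each hidden neuron
h1 = [w1[i][0] * x[0] + w1[i][1] * x[1] + b1[i] for i in range(3)]

# a_i^(1): sigmoid activation of each hidden neuron
a1 = [sigmoid(h) for h in h1]

# Layer 2 (output layer): one neuron with three incoming weights and a bias
w2 = [0.7, -0.8, 0.9]
b2 = 0.05

# h_1^(2): weighted sum of the hidden activations plus bias
h2 = w2[0] * a1[0] + w2[1] * a1[1] + w2[2] * a1[2] + b2

# y: final output after applying the sigmoid to h_1^(2)
y = sigmoid(h2)
print(y)

Running it prints a single number between 0 and 1, which is exactly what applying the sigmoid at the output gives us.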
Managing all these equations is already becoming cumbersome, even with such a small neural network! In real-world applications, the number of neurons can range from a few hundred to several thousand, depending on the complexity and type of the network. Can you imagine how challenging it would be to manage the number of equations? This is where matrix operations come to the rescue. In my next post, I’ll explore matrices and vectors and how they simplify working with neural networks. See you there!