Welcome to my blog! If you've ever been curious about neural networks, wondered what all the buzz is about, or wanted to understand why everyone is suddenly obsessed with them, you're in the right place. In this post, I'll break down the complexities of neural networks in simple, intuitive terms. Whether you're a tech enthusiast who loves diving deep into the details or someone who's just starting to explore the fascinating world of AI, this guide is for you. So grab a cup of coffee, get comfortable, and let's embark on this exciting journey of decoding neural networks together!
A brief explanation of neural networks
The term "neural networks" comes from an oversimplified model of the brain. I say oversimplified because the human brain, one of the most complex systems known, still holds countless mysteries we have yet to unravel.
However, advances in technology and science have allowed us to discover many fascinating facts about the brain. Let's explore some of the key components of this neural world. Biological neurons are the building blocks of the brain and nervous system, responsible for processing and transmitting information through electrical and chemical signals. A typical neuron consists of three primary parts:
Dendrites: These are branch-like structures that receive signals from other neurons.
Soma (Cell Body): This contains the nucleus and organelles, acting as the neuron's metabolic center.
Axon: A long, thin fiber that transmits signals away from the neuron to other neurons or muscles.
Below is a visual representation that illustrates what I'm explaining.
Now, Artificial Neural Networks (ANNs) are computational models inspired by the biological neurons I just explained. ANNs consist of artificial neurons or nodes organized into layers: the input layer, hidden layers, and the output layer.
The Input Layer
It draws inspiration from the way dendrites function in biological neural models, acting as receivers of input features or data.
The Hidden Layers
They perform computations and feature transformations, essentially simulating the functions of a cell body or soma. Inside the nucleus of a biological neuron, complex processes occur; in artificial neural networks, these are represented by activation functions.
Put simply, a neuron in a hidden layer receives inputs and applies an activation function (a mathematical transformation) to produce an output. These transformations occur within the neurons in the hidden layers. Four activation functions commonly used in ANNs are:
Sigmoid Function (Logistic): This function squashes the input values between 0 and 1, making it useful for binary classification problems.
Hyperbolic Tangent Function (tanh): Similar to the sigmoid, tanh squashes input values between -1 and 1, providing a zero-centered output that can help with training stability.
Rectified Linear Unit (ReLU): This function outputs the input directly if it is positive; otherwise, it outputs zero. ReLU is popular for its simplicity and efficiency in training deep neural networks.
Linear Function: This function returns its input unchanged. It is suitable for tasks where the relationship between input and output is expected to be linear.
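To make the four functions above concrete, here is a minimal Python sketch of each one. The function names and the sample values are my own choices for illustration, not part of any particular library.

```python
import math

def sigmoid(z: float) -> float:
    """Squash z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z: float) -> float:
    """Squash z into the range (-1, 1), centered on zero."""
    return math.tanh(z)

def relu(z: float) -> float:
    """Return z if it is positive, otherwise zero."""
    return max(0.0, z)

def linear(z: float) -> float:
    """Return z unchanged."""
    return z

# Compare the four functions on a few illustrative inputs.
for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z), linear(z))
```

Notice that sigmoid and tanh flatten out for large positive or negative inputs, while ReLU and the linear function do not.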
In my upcoming posts, I'll delve into the various activation functions available and guide you on selecting the most suitable one for your specific problem.
The Output Layer
It produces the final output or prediction, drawing inspiration from the axon's role in transmitting signals. Now that you have a bit more context about biological and artificial neural networks, let's refer to the following image to solidify these concepts further.
In the image above, you can see two inputs, x1 and x2, which form the input layer as described earlier. There are also two kinds of variables we haven't discussed yet: w (w1 and w2) and b. These variables are known as weights and bias, respectively. The gray neuron belongs to the hidden layer; in this example, we have one hidden layer (though a hidden layer can contain multiple neurons!). Finally, we have the output layer y.
Now, let's look at what happens inside a neural network. Each input is multiplied by its corresponding weight, and the results are summed together with the bias (b).
For instance, consider the first neuron in the hidden layer, which receives inputs x1 and x2. These inputs are multiplied by their respective weights, w1 and w2, and then added to the bias term. The resulting calculation can be expressed as follows:
h = (x1 × w1) + (x2 × w2) + b
Imagine that this is what the first neuron (the gray one) from the hidden layer is receiving. Inside the neuron, a transformation now takes place: the activation function. So the neuron in the hidden layer applies an activation function to h. Let's say our activation function for this example is the sigmoid function. The sigmoid function is defined by the following formula:
sigmoid(h) = 1 / (1 + e^(-h))
If we replace the term h, the result will be:
y = 1 / (1 + e^(-((x1 × w1) + (x2 × w2) + b)))
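The single-neuron calculation described above (weighted sum plus bias, then sigmoid) can be sketched in a few lines of Python. The function name neuron_output and the numeric values for the inputs, weights, and bias are made up purely for illustration.

```python
import math

def sigmoid(h: float) -> float:
    return 1.0 / (1.0 + math.exp(-h))

def neuron_output(x1: float, x2: float, w1: float, w2: float, b: float) -> float:
    # Step 1: multiply each input by its weight and add the bias.
    h = w1 * x1 + w2 * x2 + b
    # Step 2: apply the activation function to h.
    return sigmoid(h)

# Illustrative values only; they do not come from a trained network.
x1, x2 = 1.0, 2.0
w1, w2 = 0.5, -0.25
b = 0.0

y = neuron_output(x1, x2, w1, w2, b)
print(y)  # h = 0.5 - 0.5 + 0.0 = 0.0, so y = 0.5
```

Changing any weight or the bias shifts h, and therefore moves y somewhere else between 0 and 1.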
The result from the last equation in this example will be our output result y. You might think, "All of this looks like a piece of cake!" But things can get messy real quick—neural networks can get incredibly complex. For instance:
To put things into perspective, consider your input variables: they can be anything. For example, if you want to predict house prices, your inputs could be location, house area, and number of rooms, while your output (y) would be the house price.
By now, I'm sure you may have questions. Firstly, why do we need to apply activation functions at all? That's an excellent question! We could stick with inputs, weights, and bias alone, but this approach would only yield linear solutions, adhering to this structure:
y = (x1 × w1) + (x2 × w2) + b
And no matter how many neurons you have, if you only sum them up without applying activation functions, you will still end up with a linear solution. In linear solutions, changes in the inputs result in proportional changes in the output. However, most real-world problems do not follow a linear relationship—they are non-linear problems, meaning they cannot be accurately represented with a linear function.
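A small Python sketch can make this point tangible. Below, a hypothetical network with two hidden neurons and one output neuron, none of which apply an activation function, is compared with a single linear equation. All the weights and biases are arbitrary numbers I picked for the example.

```python
def two_layers_no_activation(x1: float, x2: float) -> float:
    # Two hidden neurons, each just a weighted sum plus a bias.
    h1 = 2.0 * x1 + 1.0 * x2 + 0.5
    h2 = -1.0 * x1 + 3.0 * x2 - 1.0
    # Output neuron: again only a weighted sum plus a bias.
    return 0.5 * h1 + 2.0 * h2 + 1.0

def single_linear(x1: float, x2: float) -> float:
    # The same network collapsed by hand into one linear equation:
    #   w1 = 0.5 * 2.0 + 2.0 * (-1.0) = -1.0
    #   w2 = 0.5 * 1.0 + 2.0 * 3.0    =  6.5
    #   b  = 0.5 * 0.5 + 2.0 * (-1.0) + 1.0 = -0.75
    return -1.0 * x1 + 6.5 * x2 - 0.75

# Both functions agree on every input we try.
for x1, x2 in [(0.0, 0.0), (1.0, 2.0), (-3.0, 4.0)]:
    a = two_layers_no_activation(x1, x2)
    c = single_linear(x1, x2)
    print(x1, x2, a, c)
```

The extra layer added nothing: the stacked version and the single equation describe exactly the same straight-line relationship.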
By applying activation functions, we introduce non-linearity to our outputs. This allows the neural network to capture and model the complexities of real-life problems more effectively. The non-linear transformation enables the network to learn and represent intricate patterns, making it capable of solving a wide range of complex tasks.
Secondly, why do we need weights and biases? To answer this question, let’s take a trip back to when you solved simple equations in school. Back then, you needed to find the values of x (inputs) and y (output) that satisfied the equation. In neural networks, you already have the input and output values from your data. What you need to find are the weights and biases that connect them.
Let’s say you want to predict the probability that someone has diabetes. Diabetes will be your output, while age, gender, body mass, smoking status, history of gestational diabetes, and so on will be your inputs. If you use an ANN to predict diabetes, you need data. This data should include records of patients, their diabetes status, and their respective values for all the inputs: age, gender, body mass, smoking status, and more. The more data you have, the better.
We feed this data into our neural network, and the model’s task is to determine the precise weights and biases that will predict whether a patient has diabetes as accurately as possible. This is achieved by comparing the neural network's output with the actual diabetes status in a process called training.
If the training process goes well, we will end up with weights and biases that make the network's outputs closely match the actual diabetes status. In other words, the weights, biases, and activation functions are what give the neural network its ability to learn and make accurate predictions!
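As a rough preview of what "finding the weights and biases" can look like, here is a minimal sketch with a single sigmoid neuron. It rests on several assumptions: the four records are invented toy values (two inputs scaled to the range 0 to 1, standing in for things like age and body mass), and the update rule is plain gradient descent, one common technique that this post has not explained. Treat it as an illustration of the idea, not as the training procedure itself.

```python
import math

def sigmoid(h: float) -> float:
    return 1.0 / (1.0 + math.exp(-h))

# Hypothetical toy records: ((input 1, input 2), has_diabetes).
data = [
    ((0.2, 0.1), 0),
    ((0.3, 0.4), 0),
    ((0.7, 0.8), 1),
    ((0.9, 0.6), 1),
]

def mean_squared_error(w1: float, w2: float, b: float) -> float:
    # Average gap between the neuron's output and the actual status.
    total = sum((sigmoid(w1 * x1 + w2 * x2 + b) - y) ** 2 for (x1, x2), y in data)
    return total / len(data)

w1, w2, b = 0.0, 0.0, 0.0   # start with no knowledge at all
learning_rate = 0.5         # assumed step size
error_before = mean_squared_error(w1, w2, b)

for _ in range(2000):       # assumed number of passes over the data
    for (x1, x2), y in data:
        prediction = sigmoid(w1 * x1 + w2 * x2 + b)
        diff = prediction - y           # compare output with the actual status
        w1 -= learning_rate * diff * x1 # nudge each weight to reduce the gap
        w2 -= learning_rate * diff * x2
        b -= learning_rate * diff

error_after = mean_squared_error(w1, w2, b)
print(error_before, error_after)
```

The error after the loop is smaller than the error before it, which is the whole point: the weights and bias were adjusted until the outputs moved closer to the known answers.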
In the next posts, I will explain how the training process occurs in a neural network. See you then!