Neural Networks – How AI Starts Thinking!


In today’s blog, we’re diving into one of the most exciting and core ideas behind deep learning—Neural Networks.

We’ve all heard of "neurons" in biology, right? Those brain cell things. But now imagine something similar being used inside a computer! That’s right—neural networks in deep learning are inspired by the human brain, and that’s what helps AI "think" like us. Pretty interesting, right?

🔍 What is a Neural Network?

A neural network is a type of machine learning model inspired by how the human brain works. It doesn't use real biological neurons, obviously, but it's built on the idea of how we humans process information. These networks are made up of artificial neurons (also called nodes) that serve a similar purpose in learning, even though they look nothing like the biological ones.

And what’s their main job?
To learn patterns in data.
Whether it’s an image, text, or raw sensor data—everything has its own pattern. Neural networks are designed to pick up on those patterns and learn from them.

We need to talk about the perceptron, the most basic unit in a neural network. What does it do? A perceptron takes in inputs, applies weights and a bias, passes the result through an activation function, and gives an output. That's it. Simple but powerful. When we stack multiple perceptrons into layers, we get a multi-layer perceptron (MLP), which is the basic structure in deep learning: an input layer, hidden layers (there can be many!), and an output layer. The more hidden layers, the "deeper" the learning. That's why we call it deep learning.
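Here's a tiny sketch of a single perceptron in Python (the numbers are made up, and I'm using sigmoid as the activation just for illustration):

```python
import numpy as np

# One perceptron, step by step: weighted sum + bias -> activation -> output.
def perceptron(x, w, b):
    z = np.dot(x, w) + b           # apply weights and bias
    return 1 / (1 + np.exp(-z))    # activation function (sigmoid here)

x = np.array([0.5, -1.0, 2.0])     # inputs
w = np.array([0.4, 0.7, -0.2])     # weights
print(perceptron(x, w, b=0.1))     # one output value, ~0.31
```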

Firing the Neurons

Activation functions decide whether a neuron should "fire" (i.e., send information to the next layer). They're basically math functions that help the network understand and process data in complex ways.

Here are some common ones:

  • Sigmoid – Squashes any input into an output between 0 and 1. Mostly used in binary classification.

  • Tanh (Hyperbolic Tangent) – Output between -1 and 1. It's like sigmoid but zero-centered, so it's more balanced; it still suffers from something called the vanishing gradient, though.

  • ReLU (Rectified Linear Unit) – Outputs the input if it's positive, and 0 otherwise. Used often with image data because it's cheap to compute and speeds up learning.

  • Leaky ReLU – A tweaked version of ReLU that allows a small slope for negative inputs instead of a hard zero, which helps with the "dying neuron" problem where ReLU outputs zero too often.
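To make these concrete, here's what all four look like as plain Python functions (a quick NumPy sketch):

```python
import numpy as np

# The four activations above, written out by hand.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope instead of a hard zero

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x))
```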

Ever heard of vanishing gradient? It sounds complex, but here’s the deal:

  • Gradient = the signal that tells a neuron how to adjust weights.

  • Vanishing Gradient = the signal becomes too small (almost zero), so the network stops learning.

  • Exploding Gradient = the signal becomes too big, and learning becomes unstable.

We need the gradient to be just right—not too small, not too big. This is why activation functions and good design matter.
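Here's a quick back-of-the-envelope way to see why gradients vanish. Backpropagation multiplies local derivatives layer by layer, and sigmoid's derivative is never bigger than 0.25, so stacking many sigmoid layers can shrink the signal dramatically:

```python
# Sigmoid's derivative is at most 0.25. Backprop multiplies these values
# layer by layer, so through 20 sigmoid layers the gradient can shrink
# by a factor of up to:
print(0.25 ** 20)   # ~9.1e-13 -- effectively zero, so early layers stop learning
```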

Now, let’s get to how learning happens.

There are three main steps:

  1. Forward Propagation – The input data flows through the network, layer by layer, until we get a prediction/output.

  2. Loss Calculation – We check how far the prediction is from the actual result. That difference is called the loss or error.

  3. Backward Propagation – Here’s where the magic happens. We adjust the weights to reduce the error. This step goes backward through the network to improve it.

This process keeps repeating until the network gets better and better at predicting.
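To tie those three steps together, here's a minimal training loop sketch in PyTorch (the model, data, and sizes are all just placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(32, 4)          # fake inputs
y = torch.randn(32, 1)          # fake targets

for epoch in range(100):
    pred = model(X)             # 1. forward propagation
    loss = loss_fn(pred, y)     # 2. loss calculation

    optimizer.zero_grad()
    loss.backward()             # 3. backward propagation (computes gradients)
    optimizer.step()            # adjust the weights to reduce the error
```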


So far, we've built up some good knowledge about Neural Networks. Now it's time to dive a little deeper and understand how we actually train these networks. That's where the real magic begins! To do that, we need to know a few important methods that help during training.

One of the most important things here is something called a Cost Function (also known as Loss Function). It’s nothing fancy—just a simple concept. It tells us how wrong the model's prediction is. Based on this error, we can figure out how to adjust and improve the model. That’s the whole reason we use a cost function—it helps us learn from our mistakes.
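For example, here's mean squared error, one of the most common cost functions, written out by hand (a quick NumPy sketch with made-up numbers):

```python
import numpy as np

# Mean squared error: average of the squared differences
# between what the model predicted and what was actually true.
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])
print(mse(y_true, y_pred))  # 0.07 -- the smaller, the better
```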

Then we have something called Weight Initialization. See, we already know that the model keeps updating weights after every step, right? But how we start, meaning what initial weights we give, matters a lot. If we pick starting weights carelessly, the training can go really wrong. It may lead to something called exploding gradients, where values go out of control, or the training may become painfully slow. So we have to be smart here, and that's why we use proper weight initialization methods.
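Here's what two popular initialization schemes look like in practice (a NumPy sketch; the layer sizes are made up):

```python
import numpy as np

fan_in, fan_out = 256, 128   # example layer sizes

# He initialization -- works well with ReLU: scale by sqrt(2 / fan_in)
W_he = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

# Xavier/Glorot initialization -- works well with sigmoid/tanh
W_xavier = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))
```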

Now, sometimes training takes forever or becomes unstable. To fix that, we use a technique called Batch Normalization. It normalizes the values flowing between layers, which helps the model train faster and more stably. And yeah, remember we also use Dropout and Regularization, which we already saw in our ML series. Dropout randomly switches off some neurons during training so the network can't lean too heavily on any single one. And regularization is just a way to control the model so it doesn't overfit and mess up.
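In code, these are usually just layers you drop into the model. Here's a small made-up example in PyTorch:

```python
import torch.nn as nn

# A tiny MLP with batch normalization and dropout layered in
# (the sizes here are just an example).
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),   # normalizes each batch -> faster, more stable training
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zeroes 50% of activations during training
    nn.Linear(256, 10),
)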

Okay, now comes one more concept, which sounds big but is simple: Internal Covariate Shift. Since each layer's output becomes the next layer's input, and the earlier layers keep updating during training, the distribution of inputs each layer sees keeps shifting. That shifting makes it harder for the model to learn. To reduce this problem, again, we use Batch Normalization. It keeps things steady.

And then, there's something called Early Stopping. Think of this like telling the model, "Okay, that's enough learning!" Because if we let the model keep training too long, it'll start memorizing the training data instead of learning general patterns, which we don't want. So we stop it at the right time to avoid overfitting.
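A minimal early-stopping loop could look like this (train_one_epoch and validate here are hypothetical helpers, just to show the idea):

```python
# Sketch only: assumes model, train_one_epoch, and validate exist elsewhere.
best_val_loss = float("inf")
patience, bad_epochs = 5, 0   # give up after 5 epochs with no improvement

for epoch in range(100):
    train_one_epoch(model)
    val_loss = validate(model)
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0   # still improving, keep going
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Stopping early at epoch {epoch}")
            break
```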

Lastly, we’ve got something called Checkpointing. Training these models can take hours, sometimes even days. So imagine if something goes wrong in between—boom, everything’s gone! That’s why we save the model's progress step by step using checkpoints. So even if it crashes, we don’t lose everything. We can start again from the last saved point. And yeah, that’s it!
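In PyTorch, a checkpoint is usually just a dictionary you save and load (the file name and keys below are just an example, assuming a model and optimizer from your training loop):

```python
import torch

# Save a checkpoint every so often during training.
torch.save({
    "epoch": epoch,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}, "checkpoint.pt")

# ...and if training crashes, resume from the last saved point:
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model_state"])
optimizer.load_state_dict(ckpt["optimizer_state"])
start_epoch = ckpt["epoch"] + 1
```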


Now we’re going to talk about one of the most important parts of deep learning—CNN. Why is it important? Because when we start working with images or videos, CNN becomes a major tool. Most people love visual stuff. I mean, let’s be real—raw data is kind of boring. But images and videos? That’s where things get exciting! So yeah, CNN is something you’ll definitely be interested in.

CNN stands for Convolutional Neural Network. It's just another type of neural network, but this one is specially designed for working with images and other spatial data. The cool part? CNNs automatically detect things like colors, edges, and shapes, all those patterns in the image. Then they combine that information and try to understand what the image is saying. Pretty cool, right?

To understand CNNs better, we need to know a few operations that happen inside them. First, there’s something called a kernel. You can think of it like those filters we use on Instagram. Just like we apply a filter to highlight features in a photo, CNNs use these kernels to detect important features in the image, like edges or patterns. These kernels slide over the image and capture those details.

Then we've got padding. This is simple: it just means adding extra border pixels (usually zeros) around the image. Without padding, the output shrinks a little after every convolution and the pixels at the edges get covered less by the filter. Adding that border keeps the size stable so all the operations go smoothly.

Next is stride. Stride decides how far the filter jumps each time it moves across the image. If the stride is 1, the filter moves one pixel at a time and captures a lot of small details. If the stride is 2 or more, it skips ahead faster, so the process becomes quicker and the output smaller, but you might miss some fine details. So, it's always a balance.

Now comes pooling. Pooling is used to simplify the data after feature extraction. We have two main types: max pooling and average pooling. Max pooling just picks the highest value from a region in the image, while average pooling takes the average. It’s like zooming out a bit to see the big picture while keeping the most important parts.
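Here's how kernel, padding, stride, and pooling all come together in one convolution step (a PyTorch sketch with a fake image):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)   # a fake RGB image: batch=1, 3 channels, 32x32

conv = nn.Conv2d(in_channels=3, out_channels=16,
                 kernel_size=3,   # a 3x3 kernel slides over the image
                 stride=1,        # moving one pixel at a time
                 padding=1)       # padding=1 keeps the output at 32x32
pool = nn.MaxPool2d(kernel_size=2)   # max pooling halves it to 16x16

features = pool(torch.relu(conv(x)))
print(features.shape)  # torch.Size([1, 16, 16, 16])
```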

Another cool thing is parameter sharing. This is a bit different from how regular neural networks work. In a normal fully connected network, every connection gets its own separate weight. But in CNNs, the same filter (with the same weights) is reused as it slides across the whole image. That's called parameter sharing. It cuts down the number of parameters, speeds up training, and helps the model generalize to different kinds of images.

Finally, we’ve got something called Transfer Learning. Now, if we try to build a CNN model from scratch, it’ll take a lot of time and data. Instead, we use models that are already trained on huge datasets. We just tweak them a little for our own project. This is called transfer learning. Some popular pre-trained models we use are ResNet, VGG, and a few others. They save time and still give great results.
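A typical transfer-learning recipe with torchvision might look like this (the 5-class head is just an example for a made-up task):

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet pre-trained on ImageNet.
# ("DEFAULT" picks the best available weights in recent torchvision versions.)
model = models.resnet18(weights="DEFAULT")

# Freeze the pre-trained layers so we don't retrain them.
for param in model.parameters():
    param.requires_grad = False

# Swap the final layer for our own task, e.g. 5 classes.
model.fc = nn.Linear(model.fc.in_features, 5)
```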

And yeah, these are the main things you need to know about CNN. Once you get a feel for it, working with image data becomes super powerful and interesting!


The next important concept we’re gonna talk about is RNN, which stands for Recurrent Neural Network. This one's mainly used for sequential data, like text. You’ve probably noticed it when you’re typing something in Google—like you start with “I am looking...” and before you even finish, it starts giving suggestions like “I am looking for food” or “I am looking for travel destinations.” That’s RNN in action!
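Under the hood, a recurrent layer reads the sequence one step at a time and carries a hidden state forward. Here's a tiny PyTorch sketch (the sizes are arbitrary):

```python
import torch
import torch.nn as nn

# input_size = features per step, hidden_size = memory carried between steps
rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)

x = torch.randn(1, 5, 10)   # batch=1, a sequence of 5 steps, 10 features each
output, hidden = rnn(x)
print(output.shape)         # torch.Size([1, 5, 20]) -> one output per step
print(hidden.shape)         # torch.Size([1, 1, 20]) -> the final hidden state
```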

That's all for now about RNN! But don't worry, we've got some really exciting stuff coming up in the next blog. We'll dive into how images are created using deep learning, how NLP (Natural Language Processing) works, and how GANs (Generative Adversarial Networks) can generate realistic fake images or even completely new ones. It's gonna be a super fun ride into some genius-level tech stuff!


Until then, happy learning! 😊


And hey—if you ever have any questions about anything I’ve shared here, or if there’s something different you want to learn, just DM me directly. No formalities needed! Just hit me up through my contact, and I’ll genuinely try to help with all my heart. I'm always open-minded and here to support you in your learning journey. 💬❤️
