Yeah, in this blog, we’re going to look into what deep learning is.
But before that, if you’ve been following along, we’ve already seen a lot about machine learning. We’ve talked about what machine learning is, how it works, the math involved, and a lot more in the last few blogs.
Now, we’re going a bit deeper—to understand deep learning, and why we moved to deep learning from machine learning.
So yeah, deep learning is a subset of machine learning, and machine learning is a subset of AI (Artificial Intelligence). It all falls under AI.
Deep learning is actually inspired by how our brain works—specifically the neural networks. The same concept is applied here in a technical way.
We mostly use deep learning to handle unstructured data like images, audio, text, and video. It processes these through multiple layers of neurons and gives us the output we need. That’s why deep learning is so useful.
Nowadays, we see deep learning everywhere—like in face unlock systems, self-driving cars, and many more real-life applications.
A Quick Intro to Neural Networks
Before diving deeper, we need to understand what a neural network actually is.
We all know about neural networks from biology—how the brain has neurons connected to each other. In deep learning, we follow a similar idea, but in a mathematical form.
A neural network has layers made up of nodes (which are like neurons). These nodes are all connected. Every connection has a number attached to it called a weight, and every node has its own bias. There’s also a function called the activation function that decides whether a node should "fire" or not. (You’ll see a tiny code example of this right after the list below.)
There are mainly three layers in a neural network:
Input layer – where we give the raw data
Hidden layer(s) – where all the internal calculations happen using weights, bias, and math
Output layer – the final layer where we get the result
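To make that a bit more concrete, here’s a tiny sketch of a single node doing its job, written in Python with NumPy: multiply the inputs by their weights, add the bias, and pass the result through an activation function. All the numbers are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes any number into the range 0..1,
    # which is how the node "decides" how strongly to fire.
    return 1 / (1 + np.exp(-z))

inputs  = np.array([0.5, 0.8, 0.2])    # values coming from the input layer
weights = np.array([0.4, -0.6, 0.9])   # one weight per connection
bias    = 0.1                          # the node's own bias

z = np.dot(inputs, weights) + bias     # weighted sum plus bias
output = sigmoid(z)                    # activation decides the final signal
print(output)                          # a number between 0 and 1
```

A real network just repeats this for every node in every hidden layer, and that’s where all those "internal calculations" happen.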
Can Neural Networks Understand Emotions?
Now here’s something people often ask: If neural networks don’t have feelings, how do they understand emotions?
Well, it’s simple. Neural networks don’t actually feel emotions. What they do is learn patterns from emotional data.
For example, if we train a model with a bunch of photos of people smiling, it learns what a smile looks like: mouth shape, eye position, facial muscles, and so on. It’s not feeling the smile, it’s just recognizing the pattern from the data.
What About Voice Data?
Some of you might also wonder how deep learning works with voice input.
So here’s how it goes: when we talk, our voice is raw audio. This goes into the microphone, and then something called sampling is used.
Sampling is the process where the microphone captures tiny snapshots of your voice—thousands of times per second. These snapshots are turned into numbers, because computers only understand binary numbers (0s and 1s).
After that, the model checks certain features of your voice, like pitch (high or low), tempo (fast or slow), energy (how strongly you speak), and frequency (how fast the sound wave vibrates).
Using these, the model can predict the emotion or mood behind your speech—even though it doesn’t feel anything. It just matches the voice features with patterns it has already learned.
And yeah, if you’re still wondering what sampling exactly is, it’s nothing but the process of converting sound signals into a digital format. It captures thousands of points from your sound wave every second and converts them into numerical data the machine can understand.
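Here’s a small NumPy sketch of that idea. Instead of a real microphone it generates a simple tone in code (so everything here is made up for illustration), samples it 16,000 times per second, and computes one basic feature, the energy, from those numbers.

```python
import numpy as np

sample_rate = 16000                    # snapshots per second
duration = 1.0                         # one second of "audio"
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)

# A fake voice signal: a single 220 Hz tone (220 Hz is its frequency)
signal = 0.5 * np.sin(2 * np.pi * 220 * t)

# Sampling result: just a long list of numbers the computer can store
print(signal.shape)                    # (16000,) -> 16,000 snapshots

# One simple feature: energy (how strongly you speak), via root mean square
energy = np.sqrt(np.mean(signal ** 2))
print(round(float(energy), 3))         # about 0.354 for this tone
```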
Now yeah, let’s take another note here.
We just talked about how audio uses sampling, right? So, the next question is—what about images, videos, and other kinds of data?
Yeah, we’ve got sampling methods for those too. But each one has a slightly different process.
Sampling in Images
For images, we use a process known as pixelization.
Basically, images are converted into tiny square-like units called pixels. Each pixel contains information—like its color value, brightness, and position.
For example, if a pixel has RGB values of (0, 0, 0), it means it’s black. If the values are all high, like (255, 255, 255), it’s a white or bright pixel. This way, each pixel represents a part of the image.
Even a small 100x100 picture already has 10,000 pixels. And yeah, using all those pixels, the model learns how a face looks, like where the eyes are, how the nose is shaped, what a smile looks like, and so on.
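And here’s what those pixels look like as plain numbers. This NumPy sketch builds a tiny 2x2 "image" by hand so you can see the RGB values directly; a real photo is just a much bigger version of the same grid.

```python
import numpy as np

# A tiny 2x2 RGB image: each pixel is three numbers (red, green, blue), 0..255
image = np.array([
    [[  0,   0,   0], [255, 255, 255]],   # black pixel, white pixel
    [[255,   0,   0], [  0, 255,   0]],   # red pixel, green pixel
], dtype=np.uint8)

print(image.shape)   # (2, 2, 3) -> 2x2 pixels, 3 color values each
print(image[0, 0])   # [0 0 0]   -> the black pixel in the top-left corner
print(image.size)    # 12 numbers in total for this tiny image
```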
Sampling in Videos
Now, when it comes to videos, it's actually pretty similar to images.
A video is made up of frames. Depending on the quality, it might have 24 to 60 frames per second. Each frame is basically a still image. So, the video is like a bunch of images played quickly one after another.
Each frame is processed just like we do with images—using the same pixel sampling methods we talked about above. And yeah, this is how video data is handled.
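If you want to see this in code, here’s a minimal sketch using OpenCV (the opencv-python package, which is just my choice here). It reads a video file frame by frame; the file name clip.mp4 is only a placeholder.

```python
import cv2  # from the opencv-python package

cap = cv2.VideoCapture("clip.mp4")   # "clip.mp4" is just a placeholder path
frames = []

while True:
    ok, frame = cap.read()           # grab the next frame (a still image)
    if not ok:
        break                        # no more frames, end of the video
    frames.append(frame)             # each frame is a pixel grid, like before

cap.release()
print(len(frames))                   # e.g. a 10-second clip at 30 fps gives ~300 frames
```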
For this, we mainly use models like CNN (Convolutional Neural Networks). These are especially good for understanding image and video data.
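Just to give you a feel for what a CNN looks like in code, here’s a very small sketch using PyTorch (my choice of library for this example). It stacks two convolution layers and pushes one fake 64x64 RGB frame through them; nothing here is trained, it only shows the shape of the idea.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # scan small 3x3 patches
            nn.ReLU(),
            nn.MaxPool2d(2),                             # shrink 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # shrink 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)      # find visual patterns
        x = x.flatten(1)          # flatten the pixel grid into one long vector
        return self.classifier(x)

frame = torch.rand(1, 3, 64, 64)   # one fake 64x64 RGB frame
print(TinyCNN()(frame).shape)      # torch.Size([1, 2]) -> scores for 2 classes
```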
Sampling in Text
Now let’s talk about text.
In text, we also use sampling—but in a different form. You might have seen this in machine learning or NLP (Natural Language Processing).
Some of the common methods used are one-hot encoding, word embeddings, and tokenization.
We also use sentiment analysis to understand the emotions behind text. And I think if you’ve already read my ML blog posts, you might be familiar with these terms.
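To show the idea without any special libraries, here’s a tiny sketch of tokenization and one-hot encoding in plain Python. Real NLP pipelines are much fancier, but the core trick is the same: turn words into numbers.

```python
sentence = "deep learning is fun and learning is good"

# Tokenization: split the text into small units (here, just words)
tokens = sentence.split()
print(tokens)   # ['deep', 'learning', 'is', 'fun', 'and', 'learning', 'is', 'good']

# Build a vocabulary: one index per unique word
vocab = {word: i for i, word in enumerate(sorted(set(tokens)))}
print(vocab)    # {'and': 0, 'deep': 1, 'fun': 2, 'good': 3, 'is': 4, 'learning': 5}

# One-hot encoding: each word becomes a vector of 0s with a single 1
def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

print(one_hot("learning"))   # [0, 0, 0, 0, 0, 1]
```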
Sampling in Sensor or IoT Data
Now here’s something many people aren’t aware of—sensor data, like from IoT devices. This type of data also uses sampling, but again, in a slightly different way.
Why? Because this data is time series data.
It collects values every few seconds or even milliseconds. For example, a heart rate monitor might record your heart rate every second, and an accelerometer measures movement or changes in direction. This kind of data is very common in fitness trackers and smartwatches.
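Here’s a quick sketch of what that kind of data looks like: made-up heart rate readings taken once per second, smoothed with a simple rolling average, which is a common first step before a model ever sees time series data.

```python
import numpy as np

# Made-up heart rate readings, one sample per second (beats per minute)
heart_rate = np.array([72, 73, 71, 75, 90, 95, 93, 80, 76, 74])

# A simple 3-second rolling average to smooth out noisy readings
window = 3
smoothed = np.convolve(heart_rate, np.ones(window) / window, mode="valid")
print(smoothed.round(1))   # [72.  73.  78.7 86.7 92.7 89.3 83.  76.7]
```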
We’ll talk about this in more detail later. I don’t want to bore you with too much at once. 😅
Before wrapping up this part, let me tell you—most people only started hearing about AI, ML, and DL after 2019, but the concept of neural networks actually began way back in 1943. Yeah, seriously!
If you’re curious, just Google a bit about it. Check what happened in 1956, when the term “artificial intelligence” was coined at the Dartmouth workshop, and how it all began. You’ll find a lot of interesting stuff out there.
Now yeah, next we’ll move on to something interesting—why deep learning is used so much these days. The answer is pretty simple: it has so many real-world applications. Deep learning is used in areas like computer vision, NLP, speech recognition, and many more. Let’s just go through them one by one.
In computer vision, deep learning is what allows machines to see and understand images or videos. Just think of it like giving a machine eyes and the brain to process what it sees. In NLP, it helps machines read, understand, and generate human language—both written and spoken. And in speech and audio processing, it helps machines understand spoken words, follow patterns in audio, and even generate speech like how assistants talk to us.
Apart from these, deep learning is also used in many other areas and cross-domains like robotics, healthcare, finance, architecture, education, and a lot more. So yeah, deep learning is kind of everywhere now.
Before we move forward, let’s look at a small concept—not going too deep into it—but just to give you an idea of how math is used in deep learning. We already saw this a little bit earlier, but here’s a quick summary of how it’s integrated.
First, we have linear algebra. We use this in many parts of deep learning, especially in handling input data and doing calculations for weights and biases. It’s also used in dot product, matrix multiplication, and in models like CNN. Linear algebra helps reduce dimensions of large data and is a major part of the backpropagation process too. So yeah, wherever you see grids, matrices, or big calculations—linear algebra is behind it.
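Here’s that idea in a few lines of NumPy: a whole layer of a neural network is really just one matrix multiplication plus a bias vector. The numbers are invented purely for illustration.

```python
import numpy as np

X = np.array([[0.5, 0.8],        # a batch of 2 inputs, 2 features each
              [0.1, 0.4]])
W = np.array([[0.2, -0.3, 0.5],  # weight matrix: 2 inputs -> 3 nodes
              [0.7,  0.1, -0.4]])
b = np.array([0.1, 0.0, -0.2])   # one bias per node

# The whole layer in one line: matrix multiplication plus bias
layer_output = X @ W + b
print(layer_output.shape)        # (2, 3) -> 3 node outputs for each input
```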
Then comes probability. Since AI is basically about decision-making and predicting outcomes, probability is obviously a core part of it. It helps the model make the best possible guess based on what it has learned from data.
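A quick example of where that shows up: the last layer of a classifier usually turns raw scores into probabilities with the softmax function, and the model’s "best guess" is simply the class with the highest probability.

```python
import numpy as np

def softmax(scores):
    # Turn raw scores into probabilities that add up to 1
    exp = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw model outputs for 3 classes
probs = softmax(scores)
print(probs.round(2))                # [0.66 0.24 0.1 ]
print(probs.argmax())                # 0 -> the model's best guess
```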
And yeah, if probability is there, then obviously we also need calculus. Calculus deals with how things change, and in deep learning, we use it to understand how the model's output changes when we adjust weights. It plays a big role in optimization and also shows up during backpropagation. It’s basically helping the model learn and get better with each step.
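And here’s calculus doing its job at the smallest possible scale: one weight being nudged by gradient descent. We compute how the error changes when the weight changes (the derivative), then move the weight a small step in the opposite direction. Backpropagation is this same idea repeated across the whole network.

```python
# Tiny gradient descent: fit y = w * x to a single data point (x=2, y=4)
x, y = 2.0, 4.0
w = 0.0               # start with a bad guess
lr = 0.1              # learning rate: how big each step is

for step in range(20):
    pred = w * x              # the model's prediction
    error = pred - y          # how wrong we are
    grad = 2 * error * x      # derivative of (pred - y)**2 with respect to w
    w = w - lr * grad         # step against the gradient

print(round(w, 3))    # very close to 2.0, since y = 2 * x fits this data exactly
```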
So these are the main math topics used in deep learning. We don’t need to go any deeper right now—this much is enough for now to understand the connection between DL and math.
Now, another thing people often ask is: “We hear all these terms: DL, ML, Gen AI, LLM, LLaMA, GPT, Gemini, LaMDA. What are all these, and what’s the difference?” Let’s break this down in the simplest way possible.
AI is the main thing. If something behaves like a human—makes decisions, responds intelligently, or solves problems—we call it AI. Inside AI, we have a subset called ML, which means machine learning. ML is all about learning from data. It doesn’t just follow fixed rules—it learns from past examples and improves.
Now inside ML, we have another subset called DL—deep learning. Deep learning uses more complex layers and models to find patterns in big data. So yeah, DL is not directly under AI, but under ML, which itself is under AI.
Now let’s talk about LLM, which means Large Language Model. LLM is a type of deep learning model, and it's used for tasks like chatting, writing, translating, answering questions—anything that has to do with language.
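If you’re curious what calling an LLM looks like in code, here’s a minimal sketch assuming you have Hugging Face’s transformers library installed. The gpt2 model is just a small, freely available example; it’s not tied to any particular product mentioned in this post.

```python
from transformers import pipeline  # Hugging Face's transformers library

# Load a small, freely available language model (gpt2 is just an example)
generator = pipeline("text-generation", model="gpt2")

result = generator("Deep learning is", max_new_tokens=20)
print(result[0]["generated_text"])   # the model continues the sentence
```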
Then we have Gen AI, or Generative AI. This is not a model, but more like a concept. It means using AI to generate new content, like text, images, music, etc. And yes, Gen AI uses LLMs to do that job.
Now all these names, GPT, LLaMA, LaMDA, Gemini, are just different LLMs. Just like how Asia is a continent with countries like India, China, and Japan inside it, LLM is the broader category, and these models (GPT, LLaMA, etc.) are the individual members. Each one is developed by a different company.
So yeah, I think now you’ve got a full basic idea of what DL is, where it's used, how math supports it, and how it connects with all these other fancy AI terms. Now you can easily say, "Yeah, I know what deep learning is. I know how it works and where it fits in."
This is all for now about deep learning. From the next blog, we’ll go a little deeper—we’ll look at how these models like CNN, RNN, and Transformers actually work in practice. That’ll be even more fun.
Until then, happy learning!
For study material or anything else, contact me directly (Click Here)




