#11: Data Drift vs Concept Drift - The Difference? | How to Talk About Failed ML Projects in Interviews
Reading time - 7 mins
1. ML Picks of the Week
A weekly dose of ML tools, concepts & interview prep. All in under 1 minute.
2. Technical ML Section
Learn the difference between Data and Concept Drift in ML Production
3. Career ML Section
Learn How to Talk About Failed ML Projects in Interviews
1. ML Picks of the Week
🥇ML Pipeline Framework Kedro
Kedro is a Python framework for building modular, production-ready ML pipelines. It helps you organize your code, manage data dependencies, and make your projects easier to maintain, scale, and debug.
What makes Kedro worth learning:
✅ Clean project structure out of the box
✅ Clean, modular ML pipeline code
✅ Great pipeline visualization tool - KedroViz
Kedro brings engineering discipline to ML workflows without overcomplicating things.
If you're new to ML pipelines, Kedro is a great place to start.
📈 ML Concept
Autoencoders
Autoencoders are neural networks trained to reconstruct their input, and in doing so, they learn a compressed representation of the data.
They work by encoding input into a low-dimensional space (the bottleneck), then decoding it back to the original form.
The goal is to minimize the reconstruction error between input and output.
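A minimal sketch of this encode, decode, minimize loop. It assumes a purely linear autoencoder trained with plain gradient descent in NumPy. Real autoencoders usually add nonlinear layers and are built in a framework such as PyTorch or Keras. The data, bottleneck size, and learning rate are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 500 samples in 10-D that actually live near a 3-D subspace
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))
X = X - X.mean(axis=0)  # center the features

n_features, bottleneck = X.shape[1], 3
W_enc = 0.1 * rng.normal(size=(n_features, bottleneck))  # encoder weights
W_dec = 0.1 * rng.normal(size=(bottleneck, n_features))  # decoder weights
lr = 0.01

def reconstruction_error(X, W_enc, W_dec):
    Z = X @ W_enc      # encode into the low-dimensional bottleneck
    X_hat = Z @ W_dec  # decode back to the original space
    return np.mean((X - X_hat) ** 2)

initial_error = reconstruction_error(X, W_enc, W_dec)
for _ in range(2000):
    Z = X @ W_enc
    X_hat = Z @ W_dec
    residual = X_hat - X  # gradient of the squared error w.r.t. X_hat, up to a constant factor
    grad_dec = Z.T @ residual / len(X)
    grad_enc = X.T @ (residual @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final_error = reconstruction_error(X, W_enc, W_dec)
print(f"MSE before training: {initial_error:.4f}, after: {final_error:.4f}")
```

Because the synthetic data really does live near a 3-D subspace, a 3-unit bottleneck should be able to reconstruct it almost perfectly; set `bottleneck = 1` and the final error stays noticeably higher.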
Autoencoders are especially useful for:
✅ Detecting anomalies by spotting high reconstruction errors
✅ Reducing dimensionality in high-dimensional datasets
✅ Learning compact embeddings for downstream tasks
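A sketch of the anomaly-detection use case. As a shortcut, it uses a linear autoencoder solved in closed form with an SVD instead of a trained neural network (for squared error, the best linear autoencoder spans the same subspace as PCA). The data and the 99th-percentile threshold are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Normal" training data lying near a 2-D subspace of an 8-D space
basis = rng.normal(size=(2, 8))
X_train = rng.normal(size=(1000, 2)) @ basis + 0.05 * rng.normal(size=(1000, 8))
mean = X_train.mean(axis=0)

# Closed-form linear autoencoder: top-2 right singular vectors
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
W = Vt[:2].T  # shape (8, 2): W encodes, W.T decodes

def reconstruction_errors(X):
    Z = (X - mean) @ W                        # encode
    X_hat = Z @ W.T + mean                    # decode
    return np.mean((X - X_hat) ** 2, axis=1)  # per-sample reconstruction MSE

# Assumed threshold: 99th percentile of the errors seen on normal data
threshold = np.percentile(reconstruction_errors(X_train), 99)

# New batch: 5 points that follow the learned structure, 5 random points that don't
X_normal = rng.normal(size=(5, 2)) @ basis
X_anomalous = 3.0 * rng.normal(size=(5, 8))
X_new = np.vstack([X_normal, X_anomalous])

is_anomaly = reconstruction_errors(X_new) > threshold
print(is_anomaly)
```

The five in-structure points reconstruct almost perfectly and stay below the threshold, while the five random points fall outside the learned 2-D structure and get flagged.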
They're widely used in ML pipelines where you need unsupervised learning, anomaly detection, or feature extraction without labels.
If your dataset has noise, complexity, or hidden structure, Autoencoders can help you understand what matters most.
Read a deeper breakdown of how Autoencoders work HERE.
🤔 ML Interview Question
What is the Vanishing Gradient Problem?
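The vanishing gradient problem happens when gradients shrink toward zero as they're backpropagated through a deep neural network. By the chain rule, each layer multiplies the gradient by its local derivative, and when those factors are mostly smaller than 1 (as with sigmoid or tanh activations), the product shrinks exponentially with depth.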
As a result, the weights in early layers update very slowly (or not at all), making it hard for the model to learn meaningful patterns.
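A toy illustration of the chain rule at work. It assumes a scalar "network" where each layer just applies an activation with its weight fixed to 1; real networks also multiply by weight matrices, which can shrink or amplify the product further. The sigmoid derivative never exceeds 0.25, so the gradient shrinks at least as fast as 0.25 to the power of the depth, while ReLU passes a gradient of exactly 1 for positive inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25

def relu(x):
    return max(x, 0.0)

def relu_derivative(x):
    return 1.0 if x > 0 else 0.0

def chain_gradient(x0, n_layers, activation, derivative):
    """d(h_n)/d(h_0) for the toy chain h_{k+1} = activation(h_k), all weights fixed to 1."""
    h, grad = x0, 1.0
    for _ in range(n_layers):
        grad *= derivative(h)  # chain rule: multiply by this layer's local derivative
        h = activation(h)
    return grad

for depth in (2, 10, 30):
    print(f"depth={depth:>2}  "
          f"sigmoid: {chain_gradient(0.5, depth, sigmoid, sigmoid_derivative):.1e}  "
          f"relu: {chain_gradient(0.5, depth, relu, relu_derivative):.1f}")
```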
Why is this important?
📌 It limits how deep your neural network can go without performance issues.
📌 It can cause training to stagnate, especially in RNNs.
📌 Understanding it helps you make better architecture choices (e.g., ReLU, skip connections, batch norm).
The vanishing gradient problem is a major reason behind innovations like ResNets, GRUs, and LSTMs.
Read more about the Vanishing Gradient Problem HERE.
2. Technical ML Section
Difference between Data Drift and Concept Drift
Drift events have broken many ML solutions running in real-time production systems.
There are two main types of drift in real-world production systems:
- Data Drift
- Concept Drift
These two concepts sound similar, but they're very different in practice.
In this Maistermind issue, you’ll learn the key differences using a simple real-world example.
No fluff, straight to the point.
1️⃣ System Intro
Imagine we have the following large-scale coffee roasting process.
To control the outlet beans' humidity, we create a machine learning model with a simple setup as shown below.
2️⃣ What is Data Drift?
Given this setup, what would data drift look like here?
Definition: Data drift is a change in the statistical properties of the input (feature) data.
In our case, an example of data drift is a shift in the bean size distribution.
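A minimal way to test a single feature like bean size for drift is the two-sample Kolmogorov-Smirnov (KS) statistic: the largest gap between the empirical CDFs of a reference window and a current window. The sketch below computes it in plain NumPy (in practice, `scipy.stats.ks_2samp` also returns a p-value). The bean-size values in mm and the 0.1 alert threshold are hypothetical.

```python
import numpy as np

def ks_statistic(reference, current):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    reference, current = np.sort(reference), np.sort(current)
    grid = np.concatenate([reference, current])
    cdf_ref = np.searchsorted(reference, grid, side="right") / len(reference)
    cdf_cur = np.searchsorted(current, grid, side="right") / len(current)
    return np.max(np.abs(cdf_ref - cdf_cur))

rng = np.random.default_rng(42)
# Hypothetical bean sizes (mm): training period vs. two production windows
bean_size_train = rng.normal(loc=6.5, scale=0.4, size=2000)
bean_size_stable = rng.normal(loc=6.5, scale=0.4, size=500)
bean_size_shifted = rng.normal(loc=7.2, scale=0.4, size=500)  # e.g. a batch of larger beans

DRIFT_THRESHOLD = 0.1  # assumption: tune for your window sizes and risk tolerance
for name, window in [("stable", bean_size_stable), ("shifted", bean_size_shifted)]:
    d = ks_statistic(bean_size_train, window)
    print(f"{name}: KS={d:.3f} drift={'YES' if d > DRIFT_THRESHOLD else 'no'}")
```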
3️⃣ Why is Data Drift important to monitor?
Scenario 1:
Data drift monitoring helps estimate model performance when true target values are rarely available.
Consider the case below. Continuous data drift monitoring helps identify the potential risks that the drifted data poses to the system.
Most importantly, it lets us decide in time what to do to prevent production losses.
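A sketch of how such monitoring could run without any labels: score each new batch of inputs against the training-time reference and raise an alert when the score crosses a threshold. It uses the Population Stability Index (PSI) with reference-quantile bins. The daily batches, bean sizes, and the 0.1 / 0.25 cut-offs (commonly quoted rules of thumb) are assumptions. A drift score is only an early-warning proxy, not the model error itself.

```python
import numpy as np

def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index (PSI) of `current` relative to `reference`."""
    # Interior bin edges at reference quantiles, so each bin holds a roughly equal share of reference data
    inner_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_counts = np.bincount(np.digitize(reference, inner_edges), minlength=n_bins)
    cur_counts = np.bincount(np.digitize(current, inner_edges), minlength=n_bins)
    ref_frac = ref_counts / len(reference) + eps  # eps avoids log(0) for empty bins
    cur_frac = cur_counts / len(current) + eps
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(7)
reference = rng.normal(6.5, 0.4, size=5000)  # hypothetical bean sizes (mm) at training time
daily_means = [6.5, 6.5, 6.6, 6.8, 7.0]      # hypothetical: mean bean size creeps upward

for day, mean_size in enumerate(daily_means, start=1):
    batch = rng.normal(mean_size, 0.4, size=500)
    score = psi(reference, batch)
    # Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift
    status = "ALERT" if score > 0.25 else ("watch" if score > 0.1 else "ok")
    print(f"day {day}: PSI={score:.3f} -> {status}")
```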
Scenario 2:
Understanding the root cause of poor model performance
This is a critical use case for continuous drift monitoring. Models in production often start degrading, and we don't know what the root cause is.
And if we don't know what the root cause is:
- We can't improve the system performance, at least in a sustainable way.
- We can't make the right decision on how to avoid losing product or the value the model delivers.
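For root cause analysis, a per-feature drift ranking is a cheap first step: it points at which inputs moved the most since training. The sketch uses a deliberately crude score (mean shift measured in reference standard deviations); KS or PSI scores drop in the same way. The feature names and values are hypothetical and only illustrate the coffee-roasting setting.

```python
import numpy as np

def mean_shift_score(reference, current):
    """Crude drift score: shift of the mean, in units of the reference standard deviation."""
    return abs(np.mean(current) - np.mean(reference)) / np.std(reference)

def rank_drifted_features(reference, current):
    """Return (feature, score) pairs sorted from most to least drifted."""
    scores = {name: mean_shift_score(reference[name], current[name]) for name in reference}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

rng = np.random.default_rng(3)
# Hypothetical feature names and values, for illustration only
reference = {
    "bean_size_mm": rng.normal(6.5, 0.4, 2000),
    "inlet_air_temp_c": rng.normal(220.0, 5.0, 2000),
    "roast_time_s": rng.normal(600.0, 20.0, 2000),
}
current = {
    "bean_size_mm": rng.normal(7.1, 0.4, 500),        # drifted
    "inlet_air_temp_c": rng.normal(220.5, 5.0, 500),  # roughly stable
    "roast_time_s": rng.normal(600.0, 20.0, 500),     # stable
}
for name, score in rank_drifted_features(reference, current):
    print(f"{name:<18} {score:.2f}")
```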
Scenario 3:
What if the model is correct but the true target data (the humidity lab samples) is wrong?
I have seen such cases in ML systems I deployed. The ground truth is not always as solid as it seems.
In this case, if you don't realize that the true target contains errors, you will struggle: re-training the model won't give you better performance.
4️⃣ What is Concept Drift?
Definition: Concept drift occurs when the relationship between the input features and the target changes.
So, the main difference between the concept and data drift is that:
- In Data Drift, the input features' distribution changes over time
- In Concept Drift, the relationship between the input features and the target changes over time
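The same distinction in probabilistic terms, using a standard formalization that the issue itself doesn't spell out. Here X is the input features, Y the target, "ref" the training or reference period, and t a later point in production:

```latex
\begin{aligned}
\text{Data drift:}\quad    & P_t(X) \neq P_{\text{ref}}(X) \\
\text{Concept drift:}\quad & P_t(Y \mid X) \neq P_{\text{ref}}(Y \mid X)
\end{aligned}
```

Because P(X, Y) = P(Y | X) P(X), the two can change independently. Bean sizes can shift while the size-to-humidity relationship stays the same, or the relationship can change while bean sizes look exactly as they did during training.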
Let's see what the concept drift is in our coffee roasting system.
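A minimal synthetic sketch of what this can look like, under made-up assumptions: outlet humidity depends linearly on bean size, and later the relationship flips sign while the bean-size distribution stays the same. The model trained on the reference period keeps seeing familiar inputs, yet its error jumps. All numbers are hypothetical, not from a real roasting line.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(n, slope):
    """Hypothetical process: outlet humidity (%) depends linearly on bean size (mm)."""
    bean_size = rng.normal(6.5, 0.4, n)
    humidity = 2.0 + slope * (bean_size - 6.5) + rng.normal(0.0, 0.1, n)
    return bean_size, humidity

# Reference period: larger beans -> higher outlet humidity
x_ref, y_ref = simulate(2000, slope=1.0)
# Later period: same bean-size distribution, but the relationship has flipped
x_new, y_new = simulate(2000, slope=-1.0)

# Model trained on the reference period (np.polyfit returns highest power first)
slope_hat, intercept_hat = np.polyfit(x_ref, y_ref, deg=1)

def mae(x, y):
    """Mean absolute error of the reference-trained linear model."""
    return np.mean(np.abs(y - (slope_hat * x + intercept_hat)))

print(f"input mean ref/new: {x_ref.mean():.2f} / {x_new.mean():.2f}")  # about equal: no data drift
print(f"MAE ref/new: {mae(x_ref, y_ref):.3f} / {mae(x_new, y_new):.3f}")  # error jumps: concept drift
```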
5️⃣ Why is Concept Drift important to monitor?
Main Scenario:
Concept drift monitoring helps debug poor model performance and define a re-training strategy.
Let's consider the case below.
In this case, the model starts deviating from the true target values while no data drift is detected. This raises questions like:
- Can we still use the model?
- Should we stop the operation right away?
- Do we need to re-train and build a completely new model?
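One way to turn these questions into a first triage step, as a hedged sketch: check the model's recent error against its baseline and check input drift at the same time. The `WindowStats` container, the `diagnose` function, and both thresholds are hypothetical. The rule doesn't answer the questions by itself; it tells you which investigation to start with.

```python
from dataclasses import dataclass

@dataclass
class WindowStats:
    input_drift: float  # e.g. a KS or PSI score for the key input features
    error: float        # model error on the labelled samples in this window

def diagnose(stats: WindowStats, baseline_error: float,
             drift_threshold: float = 0.1, error_ratio: float = 1.5) -> str:
    """Rough triage rule; both thresholds are assumptions to tune per system."""
    drifted = stats.input_drift > drift_threshold
    degraded = stats.error > error_ratio * baseline_error
    if degraded and not drifted:
        return "suspect concept drift (or bad labels): inputs look stable but error grew"
    if degraded and drifted:
        return "data drift likely contributes: investigate the drifted features first"
    if drifted:
        return "data drift without measured degradation: watch closely"
    return "ok"

print(diagnose(WindowStats(input_drift=0.03, error=0.40), baseline_error=0.10))
```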
Concept drift is a very confusing phenomenon in real-world ML production systems, and it does happen.
I have seen a severe local concept drift in one of the ML Systems my team and I deployed.
In that case, we observed the correlation between an input feature and the target change from 0.5 to -0.5. You can imagine how severely the model metrics degraded.
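A simple way to catch a sign flip like this on labelled data is to track the feature-target correlation over consecutive time windows. The sketch generates a synthetic series whose correlation is about +0.5 in the first half and about -0.5 in the second; the window size of 100 is an assumption. With pandas, `Series.rolling(window).corr(other)` gives a sliding-window version of the same idea.

```python
import numpy as np

def rolling_correlation(x, y, window):
    """Pearson correlation between x and y over consecutive non-overlapping windows."""
    n_windows = len(x) // window
    return np.array([
        np.corrcoef(x[i * window:(i + 1) * window], y[i * window:(i + 1) * window])[0, 1]
        for i in range(n_windows)
    ])

rng = np.random.default_rng(11)
n = 1000
x = rng.normal(size=n)
noise = rng.normal(size=n)
# Synthetic series: corr(x, y) is about +0.5 in the first half, about -0.5 in the second
sign = np.where(np.arange(n) < n // 2, 1.0, -1.0)
y = sign * 0.5 * x + np.sqrt(1 - 0.25) * noise

corrs = rolling_correlation(x, y, window=100)
print(np.round(corrs, 2))
```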
6️⃣ Summary
- Data drift is the change in statistical properties of the input data (features).
- Concept drift is the change in the relationship between input features and the target.
- Data drift can serve as a proxy for monitoring model performance when target samples are rarely available to compute the model error.
- Continuous data and concept drift monitoring can speed up root cause analysis of poor model performance.
That's it for the Technical Part!
Follow me on LinkedIn for more daily ML breakdowns.
3. Career ML Section
How to Talk About Failed ML Projects in Interviews
Every ML professional has projects that didn’t go as planned. I've failed several ML Projects, most of them at the beginning of my career.
In interviews, what separates strong candidates is how they explain these failures.
The failure itself isn’t the issue. What matters is your ability to reflect, take ownership, and turn that experience into a learning moment.
Here’s a simple structure that works:
- Briefly explain what happened
- Be honest about what went wrong
- Share what you learned — and how it changed your future work
For example:
"We deployed a churn prediction model that started performing poorly in production.
It took me a long time to investigate the issues, and in the end, the client didn't proceed with our solution.
After investigating, I realized that severe data drift had occurred, so even re-training the model did not help.
In the next project, I stepped up and built a simple data drift monitoring pipeline.
It helped us catch distribution shifts early and avoid repeating the same mistakes."
This kind of story shows more than just technical knowledge. It shows critical thinking, maturity, and a bias toward improvement — all things hiring teams look for.
That’s what makes the story powerful.
Hope this helps in your next ML interview!
Related articles
- Why is Data Drift in ML important to monitor - Newsletter Article
- Drift in ML is important to monitor - Blog Article
That is it for this week!
If you haven’t yet, follow me on LinkedIn where I share Technical and Career ML content every day!
Whenever you're ready, there are 3 ways I can help you:
1. ML Career 1:1 Session
I’ll address your personal request & create a strategic plan with the next steps to grow your ML career.
2. Full CV & LinkedIn Upgrade (all done for you)
I review your experience, clarify all the details, and create:
- Upgraded ready-to-use CV (Doc format)
- Optimized LinkedIn Profile (About, Headline, Banner, and Experience Sections)
3. CV Review Session
I review your CV, point out its major drawbacks, and provide concrete examples of how to fix them. I also give you a ready CV template to make you stand out.
Join Maistermind for 1 weekly piece with 2 ML guides:
1. Technical ML tutorial or skill learning guide
2. Tips list to grow ML career, LinkedIn, income