#12: 5 Cases When Gradient Boosting Fails | Why NOT to be an ML/DS Generalist
Reading time - 7 mins
1. ML Picks of the Week
A weekly dose of ML tools, concepts & interview prep. All in under 1 minute.
2. Technical ML Section
Learn When Gradient Boosting Fails for Tabular Data
3. Career ML Section
Learn why NOT to be an ML/DS generalist to build a successful ML career
1. ML Picks of the Week
🥇ML Library tsfresh
tsfresh is a powerful Python library for automated feature extraction from time series data.
It’s designed to help you transform raw sequences into meaningful features, without manually crafting time lags or domain-specific indicators.
What makes tsfresh worth learning:
✅ Automatically extracts hundreds of time series features
✅ Filters out irrelevant ones using built-in statistical tests
✅ Works well with both univariate and multivariate time series
So, if you’re working on time series-related problems, tsfresh helps you save time and avoid manual errors in early pipeline stages.
📈 ML Concept
p-value
The p-value is a key metric in statistical hypothesis testing.
It measures the probability of getting results at least as extreme as your observation, assuming the null hypothesis is true.
In simpler terms: a small p-value means your result is unlikely to be random, and provides evidence against the null.
Why it matters in ML:
✅ Used for feature selection in statistical models
✅ Supports A/B testing and experiment analysis
✅ Provides evidence-based support for decision-making
Just remember:
A low p-value means “statistically significant,” not necessarily important in practice.
Read a deeper breakdown of the p-value HERE.
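A minimal illustration with a two-sample t-test on synthetic data (using scipy; the numbers here are made up for the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Two samples: group B's true mean is shifted by 0.5.
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=0.5, scale=1.0, size=200)

# Two-sample t-test: null hypothesis = the two groups have equal means.
t_stat, p_value = stats.ttest_ind(a, b)

# A small p-value is evidence against equal means,
# but says nothing about whether a 0.5 shift matters in practice.
print(p_value)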
🤔 ML Interview Question
What are the 4 main limitations of K-Means clustering?
K-Means is fast and easy to implement — but it has several core weaknesses you should know before using it in production.
📌 Requires specifying K:
You must define the number of clusters up front, and choosing the right K isn’t always clear.
📌 Assumes equal-sized, spherical clusters:
K-Means struggles with elongated or unevenly sized clusters. It works best when groups are compact and similar.
📌 Sensitive to outliers:
A single outlier can pull a centroid far off, distorting results.
📌 Initialization matters:
Different random starts can lead to different outcomes. K-Means++ helps, but doesn’t guarantee optimal clusters.
Read more about K-Means clustering HERE.
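The outlier weakness is easy to reproduce with scikit-learn on synthetic data: two compact clusters plus one extreme point, and K-Means with K=2 gives the outlier its own centroid while merging the two real clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two compact, well-separated clusters...
cluster1 = rng.normal(loc=[0, 0], scale=0.3, size=(50, 2))
cluster2 = rng.normal(loc=[5, 5], scale=0.3, size=(50, 2))
# ...plus a single extreme outlier.
outlier = np.array([[50.0, 50.0]])
X = np.vstack([cluster1, cluster2, outlier])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# One centroid ends up sitting on the outlier alone,
# and the two real clusters are merged under the other centroid.
print(np.bincount(km.labels_))
```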
2. Technical ML Section
When Gradient Boosting Fails for Tabular Data
Gradient Boosting (GB) is considered the go-to algorithm for tabular datasets, and not without reason: GB has strong predictive power on tabular data.
However, as is often the case in ML, GB is not a silver bullet.
In this Maistermind issue, you will learn 5 cases when not to use Gradient Boosting even if you have a tabular dataset.
1️⃣ Don't use GB when features <-> target relationships are mostly linear
In this case, Gradient Boosting will barely beat Linear or Logistic Regression.
But linear models will:
- Train much faster
- Be more interpretable
- Be easier to tune, with far fewer hyperparameters
Below is an example of a relationship where GB is overkill.
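A quick synthetic comparison (scikit-learn, made-up linear data): on a purely linear target, Linear Regression is essentially perfect, and the much slower Gradient Boosting model cannot beat it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
# Purely linear target with mild noise.
y = X @ np.array([1.5, -2.0, 0.5, 3.0, -1.0]) + rng.normal(scale=0.1, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
lin = LinearRegression().fit(X_tr, y_tr)
gb = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

print(r2_score(y_te, lin.predict(X_te)))  # near-perfect, trains in milliseconds
print(r2_score(y_te, gb.predict(X_te)))   # no better, far more expensive
```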
2️⃣ Don't use GB when you have noisy and sparse data with low variability
In this case, Gradient Boosting will again barely beat Linear Models.
Gradient Boosting likes:
- Data with a low noise level (so it does not overfit)
- High data variability with many distinct feature values
Below is an example of target and feature distributions over time where it does not make sense to use Gradient Boosting.
You often see such data in industrial processes, where a system behaves stably over a long period of time.
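You can simulate this failure mode (synthetic "stable process" data, scikit-learn as the GB implementation): when features and target are just noise around fixed levels, GB happily fits the training noise and gains nothing on held-out data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# A "stable industrial process": features and target fluctuate
# around fixed levels, with no real signal connecting them.
X = rng.normal(loc=100.0, scale=0.5, size=(400, 5))
y = rng.normal(loc=50.0, scale=0.5, size=400)

X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]
gb = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

train_r2 = r2_score(y_tr, gb.predict(X_tr))
test_r2 = r2_score(y_te, gb.predict(X_te))
print(train_r2)  # looks good: the model has memorized the noise
print(test_r2)   # near or below zero: there was nothing to learn
```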
3️⃣ Don't use GB when model extrapolation is important
In most cases, we use Gradient Boosting with Decision Trees as weak learners. They don’t extrapolate.
Yes, you can use Gradient Boosting with linear models as weak learners. They can extrapolate outside the training range.
However, people don't do that in 99.9% of cases.
If you need extrapolation for non-linear data, use smooth non-linear models, e.g., Neural Networks or Gaussian Processes.
Performance outside the training range is not guaranteed, but with smooth functions it can still be reasonably good as long as you are not too far from the training set. Then, at some point, you retrain the model and are fine again.
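The no-extrapolation behavior is easy to see on a toy example (scikit-learn, synthetic linear trend): outside the training range, a tree-based GB model predicts a flat line.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Train on x in [0, 10] with a simple linear trend.
X_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = 2.0 * X_train.ravel() + 1.0

gb = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# In-range the fit is fine, but beyond x = 10 the trees go flat:
# x = 15 and x = 100 get the same prediction as the edge of the data.
preds = gb.predict(np.array([[10.0], [15.0], [100.0]]))
print(preds)
```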
4️⃣ Don't use GB when you want to have a baseline non-linear model
Gradient Boosting is hard to tune because it has many hyperparameters and is prone to overfitting.
If you need a quick non-linear baseline, use Random Forest instead. It is easy to tune and hard to overfit out of the box.
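For illustration, here is a Random Forest baseline with pure defaults on a standard synthetic non-linear benchmark (Friedman #1 from scikit-learn); no tuning at all, and the cross-validated score is already solid:

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Non-linear benchmark data (Friedman #1).
X, y = make_friedman1(n_samples=500, noise=0.5, random_state=0)

# Defaults only, no hyperparameter search: a one-liner baseline.
rf = RandomForestRegressor(random_state=0)
scores = cross_val_score(rf, X, y, cv=5, scoring="r2")
print(scores.mean())
```

Getting a comparable score from Gradient Boosting usually means tuning the learning rate, tree depth, and number of estimators together, which is exactly the effort a baseline should avoid.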
5️⃣ Don't use GB when you want to use the ML model for optimization
Gradient Boosting with Decision Trees produces a piecewise-constant, non-smooth function.
This makes optimization gradients noisy and unstable: the gradient is zero inside each constant segment and undefined at the jumps.
In this case, use smooth non-linear models, e.g., Neural Networks, Gaussian Processes, Splines.
Below is an example of a smooth function approximation.
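You can verify the piecewise-constant behavior directly (scikit-learn, synthetic sine data): on a dense grid, most consecutive GB predictions are exactly identical, so finite-difference "gradients" are mostly zero with occasional spikes at the steps.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Fit GB to a smooth target.
X = np.linspace(0, 2 * np.pi, 300).reshape(-1, 1)
y = np.sin(X.ravel())
gb = GradientBoostingRegressor(random_state=0).fit(X, y)

# Evaluate on a grid much denser than the training data.
grid = np.linspace(0, 2 * np.pi, 2000).reshape(-1, 1)
preds = gb.predict(grid)

# Consecutive predictions are locally constant (steps), so most
# finite differences are exactly zero: useless as gradients.
diffs = np.diff(preds)
print((diffs == 0).mean())
```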
6️⃣ Summary
- Don't use GB when features <-> target relationships are mostly linear
- Don't use GB when you have noisy and sparse data with low variability
- Don't use GB when model extrapolation is important
- Don't use GB when you want to have a baseline non-linear model
- Don't use GB when you want to use the ML model for optimization
That's it for the Technical Part!
Follow me on LinkedIn for more daily ML breakdowns.
3. ML Career Section
Why NOT to be an ML/DS generalist to build a successful ML career
Too many data scientists try to learn everything.
NLP, computer vision, reinforcement learning, fraud detection, demand forecasting, marketing analytics, MLOps, causal inference... all at once.
That’s a mistake.
Having a broad awareness is helpful.
But trying to master every area at once often leads to shallow understanding and slow progress.
💰 The highest-paid data scientists usually follow a different path:
→ Go deep in 1–2 ML domains (e.g., tabular ML and time series forecasting)
→ Focus on 1–2 business verticals (e.g., retail or insurance)
Why does it work well?
When your experience consistently maps to specific problems in a domain (e.g., demand forecasting for retail inventory), you’re no longer just “another data scientist.”
⭐ You become the top 1% candidate for those roles:
-> The one who gets shortlisted first
-> The one who gets paid more
This is because you’ve already solved their exact problems.
👉 So, my recommendation:
-> Pick one ML domain and one business domain.
-> Go deep. Build intuition. Work on real use cases.
That’s how you build a successful ML career!
That is it for this week!
If you haven’t yet, follow me on LinkedIn where I share Technical and Career ML content every day!
Whenever you're ready, there are 3 ways I can help you:
1. ML Job Landing Kit
Get everything I learned about landing ML jobs after reviewing 1000+ ML CVs, conducting 100+ interviews & hiring 25 Data Scientists. The exact system I used to help 70+ clients get more interviews and land their ML jobs.
2. ML Career 1:1 Session
I’ll address your personal request & create a strategic plan with the next steps to grow your ML career.
3. Full CV & LinkedIn Upgrade (all done for you)
I review your experience, clarify all the details, and create:
- Upgraded ready-to-use CV (Doc format)
- Optimized LinkedIn Profile (About, Headline, Banner, and Experience Sections)
Join Maistermind for 1 weekly piece with 2 ML guides:
1. Technical ML tutorial or skill learning guide
2. Tips list to grow ML career, LinkedIn, income