
#12: 5 Cases When Gradient Boosting Fails | Why NOT to be an ML/DS Generalist

by Timur Bikmukhametov
May 02, 2025
Reading time - 7 mins

 

1. ML Picks of the Week

A weekly dose of ML tools, concepts & interview prep. All in under 1 minute.

 

2. Technical ML Section

Learn When Gradient Boosting Fails for Tabular Data

 

3. Career ML Section

Learn why NOT to be an ML/DS generalist to build a successful ML career


1. ML Picks of the Week

🥇ML Library tsfresh

Tsfresh is a powerful Python library for automated feature extraction from time series data.

It’s designed to help you transform raw sequences into meaningful features, without manually crafting time lags or domain-specific indicators.

What makes tsfresh worth learning:

✅ Automatically extracts hundreds of time series features
✅ Filters out irrelevant ones using built-in statistical tests
✅ Works well with both univariate and multivariate time series

So, if you’re working on time series-related problems, tsfresh helps you save time and avoid manual errors in early pipeline stages.
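To make the idea concrete, below is a minimal hand-rolled sketch of the kind of features tsfresh automates (feature names mirror tsfresh's, but this is an illustration using only the standard library, not the tsfresh API; its real entry points are `extract_features` and `select_features`):

```python
import statistics

def extract_basic_features(series):
    """A few classic time-series features computed by hand.
    tsfresh computes hundreds of these (and statistically filters them) in one call."""
    mean = statistics.fmean(series)
    return {
        "mean": mean,
        "std": statistics.pstdev(series),
        "abs_energy": sum(x * x for x in series),          # sum of squared values
        "mean_abs_change": statistics.fmean(
            abs(b - a) for a, b in zip(series, series[1:])
        ),
        "longest_strike_above_mean": longest_run_above(series, mean),
    }

def longest_run_above(series, threshold):
    """Length of the longest consecutive run of values above the threshold."""
    best = current = 0
    for x in series:
        current = current + 1 if x > threshold else 0
        best = max(best, current)
    return best

features = extract_basic_features([1.0, 2.0, 4.0, 3.0, 5.0, 4.0])
```

Writing and maintaining dozens of these by hand is exactly the manual work tsfresh removes.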


📈 ML Concept
p-value

The p-value is a key metric in statistical hypothesis testing.

It measures the probability of getting results at least as extreme as your observation, assuming the null hypothesis is true.

In simpler terms: a small p-value means your result would be unlikely if the null hypothesis were true, which provides evidence against the null.

Why it matters in ML:

✅ Used for feature selection in statistical models
✅ Supports A/B testing and experiment analysis
✅ Provides evidence-based support for decision-making

Just remember:
A low p-value means “statistically significant,” not necessarily important in practice.
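The definition is easy to see by simulation. Below is a toy sketch (pure standard library, illustrative numbers): estimate the two-sided p-value for "65 heads in 100 flips" under the null hypothesis that the coin is fair.

```python
import random

def simulated_p_value(observed_heads, n_flips=100, n_sims=10_000, seed=0):
    """Two-sided p-value under a fair-coin null: how often a fair coin gives
    a result at least as far from n_flips/2 as the one we observed."""
    rng = random.Random(seed)
    observed_dev = abs(observed_heads - n_flips / 2)
    extreme = sum(
        abs(sum(rng.random() < 0.5 for _ in range(n_flips)) - n_flips / 2)
        >= observed_dev
        for _ in range(n_sims)
    )
    return extreme / n_sims

p = simulated_p_value(observed_heads=65)  # 65 heads in 100 flips
# p comes out well below 0.05, so we'd reject "the coin is fair" at the 5% level
```

With 52 heads instead of 65, the same function returns a large p-value: that result is entirely consistent with a fair coin.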

Read a deeper breakdown of the p-value HERE.


🤔 ML Interview Question
What are the 4 main limitations of K-Means clustering?

K-Means is fast and easy to implement — but it has several core weaknesses you should know before using it in production.

📌 Requires specifying K:
You must define the number of clusters up front, and choosing the right K isn’t always clear.

📌 Assumes equal-sized, spherical clusters:
K-Means struggles with elongated or unevenly sized clusters. It works best when groups are compact and similar.

📌 Sensitive to outliers:
A single outlier can pull a centroid far off, distorting results.

📌 Initialization matters:
Different random starts can lead to different outcomes. K-Means++ helps, but doesn’t guarantee optimal clusters.
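The outlier problem in particular is easy to reproduce. Here is a minimal 1-D Lloyd's algorithm (a sketch, not a production K-Means): a single outlier at 100 hijacks an entire centroid, and the two real clusters get merged into one.

```python
import random

def kmeans_1d(points, k, seed):
    """Minimal 1-D Lloyd's algorithm: assign each point to the nearest
    centroid, then recompute centroids as cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(50):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            sum(c) / len(c) if c else centroids[i]  # keep old centroid if empty
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)

# Two tight clusters (around 0 and around 10) plus a single outlier at 100.
data = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 100.0]
centroids = kmeans_1d(data, k=2, seed=0)
# The outlier captures its own centroid; the two real clusters collapse
# into one centroid near 5.
```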

Read more about K-Means clustering HERE.


2. Technical ML Section

When Gradient Boosting Fails for Tabular Data

Gradient Boosting (GB) is considered a go-to algorithm for tabular datasets, and not without reason: GB has strong predictive power on tabular data.

However, as is often the case in ML, GB is not a silver bullet.

In this Maistermind issue, you will learn 5 cases when not to use Gradient Boosting even if you have a tabular dataset.


1️⃣ Don't use GB when features <-> target relationships are mostly linear

In this case, Gradient Boosting will barely beat Linear or Logistic Regression.

But linear models will

  • Train much faster
  • Be more interpretable
  • Be easier to tune (fewer hyperparameters)

Below is an example of a relationship where GB is overkill.
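A quick sketch of why: when the relationship really is linear, closed-form least squares recovers it exactly in a few lines, while GB would approximate the same line with hundreds of step functions (toy data, pure standard library):

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return a, my - a * mx

xs = [float(i) for i in range(20)]
ys = [3.0 * x + 2.0 for x in xs]  # a purely linear relationship
a, b = fit_line(xs, ys)           # recovers slope 3 and intercept 2
```

One line of algebra, instant training, and the coefficients themselves are the interpretation.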


2️⃣ Don't use GB when you have noisy and sparse data with low variability 

In this case, Gradient Boosting will again barely beat Linear Models.

Gradient Boosting likes:

  • Data with a low noise level (otherwise it tends to overfit the noise)
  • High data variability with many distinct feature values

 

Below is an example of target and feature distributions over time where it does not make sense to use Gradient Boosting.

You can often see such data in industrial processes, when an industrial system shows stable behavior over a long period of time.


3️⃣ Don't use GB when model extrapolation is important

In most cases, we use Gradient Boosting with Decision Trees as weak learners. Trees don't extrapolate outside the training range.

 

Yes, you can use Gradient Boosting with linear models as weak learners. They can extrapolate outside the training range.

However, people don't do that in 99.9% of cases.

If you need extrapolation for non-linear data, use smooth non-linear models, e.g., Neural Networks or Gaussian Processes.

Yes, performance is not guaranteed, but with smooth functions it can still be relatively good as long as you are not too far from the training set. Then, at some point, you re-train the model and are OK.
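To see why trees can't extrapolate, here is a tiny regression stump, the building block of tree-based GB (a pure-Python sketch): outside the training range, the prediction is frozen at a leaf average, no matter how far you go.

```python
def fit_stump(xs, ys):
    """A depth-1 regression tree: one split, a constant prediction on each
    side, chosen to minimize squared error."""
    best = None
    for split in xs:
        left = [y for x, y in zip(xs, ys) if x <= split]
        right = [y for x, y in zip(xs, ys) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]  # y = 2x on the training range
predict = fit_stump(xs, ys)
# Inside the range the stump is a rough step function; outside it is frozen:
# predict(100.0) returns a leaf mean, nowhere near the true value 200.
```

A boosted ensemble of such trees is a sum of step functions, so the same flat-line behavior outside the training range carries over to the full GB model.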


4️⃣ Don't use GB when you want to have a baseline non-linear model

Gradient Boosting is hard to tune because it has many hyperparameters and it's prone to overfitting.

If you need a quick non-linear model baseline, use Random Forest: it's easy to tune and hard to overfit out of the box.

 


5️⃣ Don't use GB when you want to use the ML model for optimization

Gradient Boosting with Decision Trees is a piecewise-constant, non-smooth model.

This makes the gradients used by the optimizer noisy and unstable.

In this case, use smooth non-linear models, e.g., Neural Networks, Gaussian Processes, Splines.

Below, see the example of a smooth function approximation.
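The gradient problem fits in a few lines. In this toy sketch, `tree_like` stands in for a tree-based GB model of y = x² and `smooth` for a Neural Network / GP / spline fit: the central-difference gradient of the piecewise-constant model is exactly zero inside a leaf (and spikes at leaf boundaries), while the smooth fit gives a usable gradient.

```python
def numerical_gradient(f, x, h=1e-3):
    """Central-difference gradient estimate, as a numerical optimizer computes it."""
    return (f(x + h) - f(x - h)) / (2 * h)

def tree_like(x):
    """Piecewise-constant stand-in for a tree-based GB model of y = x^2."""
    return round(x) ** 2  # constant on each unit-wide "leaf"

def smooth(x):
    """Smooth stand-in for a Neural Network / Gaussian Process / spline fit."""
    return x ** 2

grads_tree = [numerical_gradient(tree_like, x) for x in (1.2, 1.3, 1.4)]  # all 0.0
grads_smooth = [numerical_gradient(smooth, x) for x in (1.2, 1.3, 1.4)]   # ~2.4, 2.6, 2.8
```

A gradient of exactly zero stalls a gradient-based optimizer; at a leaf boundary (e.g. x = 1.5) the estimate instead explodes. Either way, the optimizer gets no useful direction.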


6️⃣ Summary

  1. Don't use GB when features <-> target relationships are mostly linear
  2. Don't use GB when you have noisy and sparse data with low variability 
  3. Don't use GB when model extrapolation is important
  4. Don't use GB when you want to have a baseline non-linear model
  5. Don't use GB when you want to use the ML model for optimization

 


That's it for the Technical Part! 

Follow me on LinkedIn​ for more daily ML breakdowns.


 

3. ML Career Section

Why NOT to be an ML/DS generalist to build a successful ML career

 

Too many data scientists try to learn everything.

NLP, computer vision, reinforcement learning, fraud detection, demand forecasting, marketing analytics, MLOps, causal inference... all at once.

That’s a mistake.

Having a broad awareness is helpful.

But trying to master every area at once often leads to shallow understanding and slow progress.


💰 The highest-paid data scientists usually follow a different path:

→ Go deep in 1–2 ML domains (e.g., tabular ML and time series forecasting)
→ Focus on 1–2 business verticals (e.g., retail or insurance)

Why does it work well?

When your experience consistently maps to specific problems in a domain (e.g., demand forecasts for retail inventory), you’re no longer just “another data scientist.”

⭐ You become the top 1% candidate for those roles:

→ The one who gets shortlisted first.
→ The one who gets paid more.

This is because you’ve already solved their exact problems.


👉 So, my recommendation:
→ Pick one ML domain and one business domain.
→ Go deep. Build intuition. Work on real use cases.

That’s how you build a successful ML career!


Related articles

  1. Gradient Boosting Hyperparameters & Tuning Tips
  2. Gradient Boosting - Learning Guide

 


That is it for this week!

If you haven’t yet, follow me on LinkedIn where I share Technical and Career ML content every day!


Whenever you're ready, there are 3 ways I can help you:


1. ML Job Landing Kit

Get everything I learned about landing ML jobs after reviewing 1000+ ML CVs, conducting 100+ interviews & hiring 25 Data Scientists. The exact system I used to help 70+ clients get more interviews and land their ML jobs.

 

2. ML Career 1:1 Session 

I’ll address your personal request & create a strategic plan with the next steps to grow your ML career.

 

3. Full CV & LinkedIn Upgrade (all done for you)

I review your experience, clarify all the details, and create:
- Upgraded ready-to-use CV (Doc format)
- Optimized LinkedIn Profile (About, Headline, Banner, and Experience Sections)


Join Maistermind for 1 weekly piece with 2 ML guides:


1. Technical ML tutorial or skill learning guide
2. Tips list to grow ML career, LinkedIn, income

 

Join here! 

 

 
 