
#12: 5 Cases When Gradient Boosting Fails | Why NOT to be an ML/DS Generalist

by Timur Bikmukhametov
May 02, 2025
Reading time - 7 mins

 

1. ML Picks of the Week

A weekly dose of ML tools, concepts & interview prep. All in under 1 minute.

 

2. Technical ML Section

Learn When Gradient Boosting Fails for Tabular Data

 

3. Career ML Section

Learn why NOT to be an ML/DS generalist to build a successful ML career


1. ML Picks of the Week

🥇ML Library tsfresh

Tsfresh is a powerful Python library for automated feature extraction from time series data.

It’s designed to help you transform raw sequences into meaningful features, without manually crafting time lags or domain-specific indicators.

What makes tsfresh worth learning:

✅ Automatically extracts hundreds of time series features
✅ Filters out irrelevant ones using built-in statistical tests
✅ Works well with both univariate and multivariate time series

So, if you’re working on time series-related problems, tsfresh helps you save time and avoid manual errors in early pipeline stages.
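Here is a minimal sketch of the tsfresh workflow on toy data; the column names ("id", "time", "value") and the random series are purely illustrative:

```python
import numpy as np
import pandas as pd
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute

# Toy long-format data: one row per (series id, timestamp) pair
rng = np.random.default_rng(0)
n_series, length = 20, 50
df = pd.DataFrame({
    "id": np.repeat(np.arange(n_series), length),
    "time": np.tile(np.arange(length), n_series),
    "value": rng.normal(size=n_series * length),
})

# Extract hundreds of candidate features per series
features = extract_features(df, column_id="id", column_sort="time")
impute(features)  # replace NaN/inf left by features undefined on some series

# Keep only features that pass the built-in statistical relevance tests
y = pd.Series(rng.integers(0, 2, n_series), index=np.arange(n_series))
selected = select_features(features, y)
print(selected.shape)
```

With a random target like this, select_features may keep nothing; on real data it typically trims hundreds of candidates down to a relevant subset.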


📈 ML Concept
p-value

The p-value is a key metric in statistical hypothesis testing.

It measures the probability of getting results at least as extreme as your observation, assuming the null hypothesis is true.

In simpler terms: a small p-value means your result would be unlikely if the null hypothesis were true, which is evidence against it.

Why it matters in ML:

✅ Used for feature selection in statistical models
✅ Supports A/B testing and experiment analysis
✅ Provides evidence-based support for decision-making

Just remember:
A low p-value means “statistically significant,” not necessarily important in practice.
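For example, here is a small sketch of a two-sample t-test for an A/B experiment; the data is simulated just to show the mechanics:

```python
import numpy as np
from scipy import stats

# Simulated metric samples for control and treatment groups
rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=2.0, size=500)
treatment = rng.normal(loc=10.3, scale=2.0, size=500)

# Null hypothesis: both groups have the same mean
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# If p < 0.05, we reject the null at the 5% level. Still, check the
# effect size before calling the result practically important.
```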

Read a deeper breakdown of the p-value HERE.


🤔 ML Interview Question
What are the 4 main limitations of K-Means clustering?

K-Means is fast and easy to implement — but it has several core weaknesses you should know before using it in production.

📌 Requires specifying K:
You must define the number of clusters up front, and choosing the right K isn’t always clear.

📌 Assumes equal-sized, spherical clusters:
K-Means struggles with elongated or unevenly sized clusters. It works best when groups are compact and similar.

📌 Sensitive to outliers:
A single outlier can pull a centroid far off, distorting results.

📌 Initialization matters:
Different random starts can lead to different outcomes. K-Means++ helps, but doesn’t guarantee optimal clusters.
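A quick scikit-learn sketch illustrating two of these limitations, elongated clusters and initialization sensitivity (the stretch matrix is arbitrary, for illustration only):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Make blobs, then stretch them so the clusters become elongated
X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
X = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])

# k-means++ seeding vs. a single purely random initialization
km_pp = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
km_rand = KMeans(n_clusters=3, init="random", n_init=1, random_state=0).fit(X)

print("inertia, k-means++:", round(km_pp.inertia_, 1))
print("inertia, random   :", round(km_rand.inertia_, 1))  # often worse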

Read more about K-Means clustering HERE.


2. Technical ML Section

When Gradient Boosting Fails for Tabular Data

Gradient Boosting (GB) is considered the go-to algorithm for tabular datasets, and not without reason: GB has strong predictive power on tabular data.

However, as is often the case in ML, GB is not a silver bullet.

In this Maistermind issue, you will learn 5 cases when not to use Gradient Boosting even if you have a tabular dataset.


1️⃣ Don't use GB when features <-> target relationships are mostly linear

In this case, Gradient Boosting will barely beat Linear or Logistic Regression.

But linear models will:

  • Train much faster
  • Be more interpretable
  • Be easier to tune (fewer hyperparameters)

Below is an example of a relationship where GB is overkill.
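A small synthetic sketch of this case: on purely linear data, Ridge matches Gradient Boosting while training far faster (the data and models here are illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Purely linear target with a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=2000)

for model in (Ridge(), GradientBoostingRegressor()):
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{type(model).__name__}: R^2 = {r2:.3f}")
```

Expect Ridge to land at near-perfect R^2 in milliseconds, while GB trails slightly despite costing far more compute.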


2️⃣ Don't use GB when you have noisy and sparse data with low variability 

In this case, Gradient Boosting will again barely beat Linear Models.

Gradient Boosting likes:

  • Data with a low noise level (so it will not overfit)
  • High data variability with many distinct feature values

 

Below is an example of target and feature behavior over time where it does not make sense to use Gradient Boosting.

You can often see such data in industrial processes, where a system behaves stably over a long period of time.
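A hedged sketch of this situation: barely-varying features buried in heavy noise, where GB gains nothing over Ridge (the data is synthetic and exact scores will vary):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Low-variability features plus heavy noise, as in a stable industrial process
rng = np.random.default_rng(1)
X = rng.normal(scale=0.05, size=(1000, 5))
y = X[:, 0] + rng.normal(scale=1.0, size=1000)

for model in (Ridge(), GradientBoostingRegressor()):
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{type(model).__name__}: R^2 = {r2:.3f}")
```

Here GB tends to fit the noise and score at or below the linear model.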


3️⃣ Don't use GB when model extrapolation is important

In most cases, we use Gradient Boosting with Decision Trees as weak learners. They don’t extrapolate.

 

Yes, you can use Gradient Boosting with linear models as weak learners. They can extrapolate outside the training range.

However, people don't do that in 99.9% of cases.

If you need extrapolation for non-linear data, use smooth non-linear models, e.g., Neural Networks or Gaussian Processes.

Performance is not guaranteed, but with smooth functions it can still be reasonably good as long as you are not too far from the training set. Then, at some point, you retrain the model and are back on track.
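A toy sketch of the extrapolation failure: a tree-based GB model predicts a constant outside the training range, while a linear model continues the trend:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

# Train on x in [0, 10] with a simple linear trend y = 2x + 1
X_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = 2.0 * X_train.ravel() + 1.0

gb = GradientBoostingRegressor().fit(X_train, y_train)
lr = LinearRegression().fit(X_train, y_train)

X_test = np.array([[15.0], [20.0]])  # outside the training range
print("GB:", gb.predict(X_test))     # flat, stuck near the max train target (~21)
print("LR:", lr.predict(X_test))     # keeps the trend: ~31 and ~41
```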


4️⃣ Don't use GB when you want to have a baseline non-linear model

Gradient Boosting is hard to tune because it has many hyperparameters and it's prone to overfitting.

If you need a quick non-linear baseline, use Random Forest. It is easy to tune and hard to overfit out of the box.
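A minimal Random Forest baseline sketch (synthetic data standing in for your tabular dataset); near-default settings are usually a sane starting point:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression task for illustration
X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

# Near-default settings: no learning rate or early stopping to babysit
rf = RandomForestRegressor(n_estimators=300, random_state=0)
scores = cross_val_score(rf, X, y, cv=5, scoring="r2")
print(f"baseline R^2 = {scores.mean():.3f}")
```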

 


5️⃣ Don't use GB when you want to use the ML model for optimization

Gradient Boosting with Decision Trees is a piecewise-constant, non-smooth model.

This makes optimization gradients noisy and unstable.

In this case, use smooth non-linear models, e.g., Neural Networks, Gaussian Processes, Splines.

Below is an example comparing a smooth and a piecewise-constant function approximation.
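A toy sketch of the gradient problem: finite-difference gradients of a GB model are mostly zero (inside a constant piece) or spiky (across a split), while a Gaussian Process tracks the true derivative (the data and helper below are illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor

# Smooth 1-D ground truth
X = np.linspace(0, 10, 300).reshape(-1, 1)
y = np.sin(X).ravel()

gb = GradientBoostingRegressor().fit(X, y)
gp = GaussianProcessRegressor().fit(X, y)

def num_grad(model, x, eps=1e-3):
    # Central finite difference of the model prediction at x
    lo, hi = model.predict([[x - eps]]), model.predict([[x + eps]])
    return (hi[0] - lo[0]) / (2 * eps)

for x in (2.0, 5.0):
    print(f"x={x}: GB {num_grad(gb, x):+.2f}, "
          f"GP {num_grad(gp, x):+.2f}, true {np.cos(x):+.2f}")
```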


6️⃣ Summary

  1. Don't use GB when features <-> target relationships are mostly linear
  2. Don't use GB when you have noisy and sparse data with low variability 
  3. Don't use GB when model extrapolation is important
  4. Don't use GB when you want to have a baseline non-linear model
  5. Don't use GB when you want to use the ML model for optimization

 


That's it for the Technical Part! 

Follow me on LinkedIn for more daily ML breakdowns.


 

3. Career ML Section

Why NOT to be an ML/DS generalist to build a successful ML career

 

Too many data scientists try to learn everything.

NLP, computer vision, reinforcement learning, fraud detection, demand forecasting, marketing analytics, MLOps, causal inference... all at once.

That’s a mistake.

Having a broad awareness is helpful.

But trying to master every area at once often leads to shallow understanding and slow progress.


💰 The highest-paid data scientists usually follow a different path:

→ Go deep in 1–2 ML domains (e.g., tabular ML and time series forecasting)
→ Focus on 1–2 business verticals (e.g., retail or insurance)

Why does it work well?

When your experience consistently maps to specific problems in a domain (e.g., demand forecasts for retail inventory), you’re no longer just “another data scientist.”

⭐ You become the top 1% candidate for those roles:

→ The one who gets shortlisted first.
→ The one who gets paid more.

This is because you’ve already solved their exact problems.


👉 So, my recommendation:
→ Pick one ML domain and one business domain.
→ Go deep. Build intuition. Work on real use cases.

That’s how you build a successful ML career!


Related articles

  1. Gradient Boosting Hyperparameters & Tuning Tips
  2. Gradient Boosting - Learning Guide

 


That is it for this week!

If you haven’t yet, follow me on LinkedIn where I share Technical and Career ML content every day!


Whenever you're ready, there are 3 ways I can help you:


1. ML Job Landing Kit

Get everything I learned about landing ML jobs after reviewing 1000+ ML CVs, conducting 100+ interviews & hiring 25 Data Scientists. The exact system I used to help 70+ clients get more interviews and land their ML jobs.

 

2. ML Career 1:1 Session 

I’ll address your personal request & create a strategic plan with the next steps to grow your ML career.

 

3. Full CV & LinkedIn Upgrade (all done for you)

I review your experience, clarify all the details, and create:
- Upgraded ready-to-use CV (Doc format)
- Optimized LinkedIn Profile (About, Headline, Banner, and Experience Sections)


Join Maistermind for 1 weekly piece with 2 ML guides:


1. Technical ML tutorial or skill learning guide
2. Tips list to grow ML career, LinkedIn, income

 

Join here! 

 

 
 