
#13: 3 Simple Steps to Filter Outliers | TOP 3 ML Portfolio Mistakes

by Timur Bikmukhametov
May 09, 2025
Reading time - 7 mins

 

1. ML Picks of the Week

ML Tool, Concept, Resource Link, Post, Interview Question, and Quiz of the week. All in under 1 minute.

 

2. Technical ML Section

Learn 3 Simple Steps to Filter Outliers

 

3. Career ML Section

Learn TOP 3 ML Portfolio Mistakes


1. ML Picks of the Week

⏱ All under 60 seconds. Let’s go 👇

🥇 ML Tool

→ Pyro - a Python probabilistic programming library built on PyTorch.

→ Use it to build probabilistic models that quantify the uncertainty of their predictions.

 

📈 ML Concept

→ Type I error - rejecting the null hypothesis when it is actually true (a false positive).

→ Read more about it HERE; it's a common interview question.
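A quick simulation makes the definition concrete. As an illustrative assumption, take a two-sided z-test with known σ = 1 at α = 0.05 and generate data where the null hypothesis (mean = 0) is true by construction. Every rejection is then a Type I error, so the share of rejections should land close to α.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 0.05
z_crit = 1.959963984540054  # two-sided critical value for alpha = 0.05
n, trials = 30, 20_000

# H0 is TRUE by construction: every sample comes from N(0, 1), so the true mean is 0.
samples = rng.normal(0.0, 1.0, size=(trials, n))
z = samples.mean(axis=1) * np.sqrt(n)  # z-statistic with known sigma = 1

# Fraction of experiments where we (wrongly) reject a true H0.
type_1_rate = np.mean(np.abs(z) > z_crit)
print(f"Type I error rate ≈ {type_1_rate:.3f} (alpha = {alpha})")
```

In other words, α is the Type I error rate you accept up front when you pick the significance level.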

 

🔗 One Link Worth Saving

→ Awesome ML - A collection of ML tools and frameworks.

→ Covers everything from libraries & papers to toolkits, organized by language and category.

 

🔵 LinkedIn Post of the Week

Gradient Boosting vs Random Forest - how many trees do you need in each?

 

🤔 ML Interview Question

→ What is hierarchical clustering?

→ Watch this video to prepare and nail it!
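As a quick refresher: agglomerative (bottom-up) hierarchical clustering starts with every point as its own cluster and repeatedly merges the two closest clusters; the sequence of merges forms the dendrogram. Below is a naive single-linkage sketch in NumPy for illustration only (the function name is hypothetical). In practice you would use a library routine such as `scipy.cluster.hierarchy.linkage`.

```python
import numpy as np


def agglomerative_single_linkage(points, n_clusters):
    """Naive bottom-up hierarchical clustering with single linkage (illustration only)."""
    points = np.asarray(points, dtype=float)
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    merges = []  # (cluster_a, cluster_b, distance): the levels of the dendrogram
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (b > a, so a's index is stable)
        del clusters[b]
    return clusters, merges


pts = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
clusters, merges = agglomerative_single_linkage(pts, n_clusters=3)
print(clusters)
print(merges)
```

Stopping at a different `n_clusters` is the same as cutting the dendrogram at a different height.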

 

🧠 Quiz. Can you answer without GPT?

What does high variance in a model indicate?

A) The model is underfitting

B) The model performs consistently across datasets

C) The model is overfitting

D) The model generalizes well

✅ See the correct answer


2. Technical ML Section

3 Simple Steps to Filter Outliers


 

Dealing with outliers in time series can be a headache in real-world applications.

This is especially true when you implement filters in real-time production pipelines.

Many filters alter the original signal at every point (a moving average, for example, smooths all values, not just the outliers), which is undesirable in many applications.

In this Maistermind issue, you will learn a simple yet robust filter.

It leaves non-outlying points untouched, and you can start using it right away.


Imagine we have a signal with several change points and outliers as shown below.

A change point is an abrupt, lasting shift in the signal level. Unlike an outlier, the signal stays at the new level afterwards.

Let's see how we can clean the outliers.

It's also interesting to see how the filter handles change points, because many filters struggle with them.
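To experiment with the filter yourself, you can generate a similar test signal. This is a hypothetical generator (the function name, levels, and positions are my own choices, not the data behind the plots): a piecewise-constant level with change points, Gaussian noise, and single-sample spikes.

```python
import numpy as np


def make_signal(n=500, change_points=(150, 320), levels=(0.0, 5.0, 1.0),
                outlier_idx=(60, 200, 410), outlier_size=6.0, noise=0.2, seed=0):
    """Hypothetical test signal: level shifts (change points) plus spikes (outliers)."""
    rng = np.random.default_rng(seed)
    # Piecewise-constant level: it jumps at each change point and stays there.
    level = np.empty(n)
    bounds = [0, *change_points, n]
    for lvl, start, end in zip(levels, bounds[:-1], bounds[1:]):
        level[start:end] = lvl
    x = level + rng.normal(0.0, noise, size=n)
    # Single-sample spikes: the signal returns to its level on the next sample.
    x[list(outlier_idx)] += outlier_size
    return x


x = make_signal()
print(x.shape, x[58:63].round(2))
```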


1️⃣ Compute absolute differences between neighboring points

Why do we do that?

Well, we expect an outlier to differ much more from its neighbors (the points right before and right after it) than regular points do.

Let's see if this is true.

We can also plot the distribution of the differences.

 

We see that some of the differences are indeed much larger than most of the others.
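A minimal pandas sketch of Step 1, assuming the signal is stored in a `pd.Series`. The toy values are hypothetical (not the data from the plots): a level shift at index 6 and single-sample spikes at indices 3 and 9. `np.histogram` stands in for the distribution plot.

```python
import numpy as np
import pandas as pd

# Hypothetical toy signal: change point at index 6, spikes at indices 3 and 9.
s = pd.Series([1.0, 1.2, 0.9, 6.0, 1.1, 1.0, 4.0, 4.2, 3.9, -1.0, 4.1, 4.0])

# Step 1: absolute difference between each point and the one before it.
# The first value is NaN because index 0 has no predecessor.
abs_diff = s.diff().abs()
print(abs_diff.round(2).tolist())

# Text stand-in for the histogram of the differences.
counts, edges = np.histogram(abs_diff.dropna(), bins=5)
print(counts, edges.round(2))
```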

Now, let's take Step 2.


2️⃣ Specify a suitable threshold above which we treat points as outliers

The threshold depends on your domain knowledge and on how aggressive you want your filtering to be.

For this example, I set the threshold to 2.5.

Let's also draw the threshold on the distribution.
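On a hypothetical toy signal (a level shift at index 6, spikes at indices 3 and 9), Step 2 is a single comparison. The dictionary at the end shows how the threshold controls aggressiveness: a lower value flags more points.

```python
import pandas as pd

s = pd.Series([1.0, 1.2, 0.9, 6.0, 1.1, 1.0, 4.0, 4.2, 3.9, -1.0, 4.1, 4.0])
abs_diff = s.diff().abs()

THRESHOLD = 2.5  # value used in this example; tune it with domain knowledge
is_outlier = abs_diff > THRESHOLD  # the NaN in the first row compares as False
flagged = s.index[is_outlier].tolist()
print(flagged)

# Lower threshold -> more aggressive filtering (more points flagged).
counts = {thr: int((abs_diff > thr).sum()) for thr in (0.25, 2.5, 5.0)}
print(counts)
```

Index 6 (the change point) and indices 4 and 10 (the normal points right after each spike) are flagged too, because their difference to the previous sample is also large. This is a side effect of the one-sided difference used in this sketch.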

Now, let's go to Step 3.


3️⃣ Mark outliers as NaN values and fill the NaNs with the "forward fill" method

Here is how the resulting signal looks.
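A pandas sketch of Step 3 on the same hypothetical toy signal: `Series.mask` turns flagged points into NaN, and `Series.ffill` replaces each NaN with the last valid value before it.

```python
import pandas as pd

s = pd.Series([1.0, 1.2, 0.9, 6.0, 1.1, 1.0, 4.0, 4.2, 3.9, -1.0, 4.1, 4.0])
THRESHOLD = 2.5

# Steps 1-2: flag points whose jump from the previous point exceeds the threshold.
is_outlier = s.diff().abs() > THRESHOLD

# Step 3: flagged points -> NaN, then carry the last accepted value forward.
filtered = s.mask(is_outlier).ffill()
print(filtered.tolist())
```

In this sketch, the point right after each spike (indices 4 and 10) is also replaced, since it was flagged in Step 2; all other points keep their original values.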

 

By the way, why do we use forward filling and not, for instance, interpolation?

Imagine you deploy the solution in a real-time production setting.

In that setting, the NaN point can land at the very edge of the time series, i.e., it is the most recent sample.

Linear interpolation needs a valid point on both sides of the gap, but the next point does not exist yet, so this NaN cannot be filled by interpolating. Forward fill only needs the last valid value.
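A small illustration of the edge case, assuming a hypothetical real-time buffer whose newest sample was just flagged and set to NaN. `limit_area="inside"` tells pandas to fill only NaNs that have valid values on both sides, which is exactly what true linear interpolation requires.

```python
import numpy as np
import pandas as pd

# Hypothetical live buffer: the newest sample was flagged as an outlier.
live = pd.Series([1.0, 1.1, 0.9, np.nan])

# True interpolation: only gaps with a valid neighbor on BOTH sides are filled,
# so the NaN at the edge stays NaN.
interp = live.interpolate(method="linear", limit_area="inside")

# Forward fill only looks at the past, so it always works at the edge.
ffilled = live.ffill()
print(interp.tolist(), ffilled.tolist())
```

Without `limit_area`, pandas' linear `interpolate` copies the last valid value into trailing NaNs, which is effectively a forward fill rather than an interpolation.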

This is a simple but important lesson: always keep the production / deployment setup in mind.


🟢 Advantages of the method

  • Easy to understand and implement
  • Easy to deploy and configure in production
  • Non-outlying points are the same as the original signal
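Because the rule only looks at the previous sample, it can also run point by point in a production stream. Below is a hypothetical streaming sketch (the class `DiffOutlierFilter` and its `update` method are my own illustration, not from the post). On the toy signal it returns the same values as the batch pandas version `s.mask(s.diff().abs() > 2.5).ffill()`.

```python
from typing import Optional


class DiffOutlierFilter:
    """Hypothetical streaming version of the 3-step filter.

    Step 1: compare each new sample with the previous RAW sample.
    Step 2: flag it if the absolute difference exceeds `threshold`.
    Step 3: emit the last accepted value instead (forward fill).
    Only past samples are used, so it can run in real time.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self._prev_raw: Optional[float] = None
        self._last_out: Optional[float] = None

    def update(self, x: float) -> float:
        if self._prev_raw is not None and abs(x - self._prev_raw) > self.threshold:
            out = self._last_out  # outlier: forward-fill the last accepted value
        else:
            out = x  # first sample or normal sample: pass through unchanged
        self._prev_raw = x
        self._last_out = out
        return out


signal = [1.0, 1.2, 0.9, 6.0, 1.1, 1.0, 4.0, 4.2, 3.9, -1.0, 4.1, 4.0]
filt = DiffOutlierFilter(threshold=2.5)
cleaned = [filt.update(v) for v in signal]
print(cleaned)
```

The only state is two numbers plus the threshold, which is what makes this kind of filter easy to deploy and reconfigure.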

 

 


🔴 Disadvantages of the method

  • Change points can be detected as false-positive outliers
  • The threshold is a hyperparameter whose suitable value can drift over time, so periodic recalibration may be required
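Both disadvantages are easy to reproduce on hypothetical toy data with the same rule (flag a point when its absolute difference to the previous point exceeds 2.5, then forward-fill).

```python
import pandas as pd

THRESHOLD = 2.5

# 1) A genuine level shift at index 4 is flagged as an outlier, so the new
#    level appears one sample late in the filtered signal.
step = pd.Series([0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0])
step_filtered = step.mask(step.diff().abs() > THRESHOLD).ffill()
print(step_filtered.tolist())

# 2) If the noise level grows over time, the same fixed threshold starts
#    flagging ordinary fluctuations, which is why it may need recalibration.
noisier = pd.Series([0.0, 2.8, 0.1, 2.9, 0.2])
n_flagged = int((noisier.diff().abs() > THRESHOLD).sum())
print(n_flagged)
```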

 

 


That's it for the Technical Part! 

Follow me on LinkedIn for more daily ML breakdowns.


 

3. Career ML Section

TOP 3 ML Portfolio Mistakes

 

1️⃣ Project is not end-to-end

The vast majority of projects that I have seen include:

  1. Exploratory data analysis (EDA) in a Jupyter Notebook.
  2. ML model fitting.
  3. A GitHub repo with this notebook.

 

A project like this will NEVER EVER stand out and land you a job.

A good ML portfolio project must include a deployed model with some kind of web page or dashboard where the end user can interact with the solution.


2️⃣ No Business or Problem Context

The worst project you can present is one without the "why" behind it.

You need to answer and discuss at least these 3 questions:

 

1. What approximate $ value will your model bring?

Example: “Reducing churn by 5% could save ~$200K/year.”

 

2. What decision does your model help with?
Example: “Sales reps use these lead scores to prioritize calls.”

 

3. Why is ML even needed here?
Example: “Manual review can’t scale across 50K customer messages weekly.”


3️⃣ The GitHub repo and code are a mess

Your project might be solid, but if your GitHub looks like:

  • Untitled3.ipynb
  • No README
  • No structure or explanation
  • Messy, unrefactored code

…it screams “unfinished.”

✅ Instead:

  • Add a clear README (problem, solution, results, demo)

  • Organize your repo: /notebooks, /src, requirements.txt

  • Follow PEP8 code style, remove unused cells, and turn logic into functions

 


Related articles

  1. 5 Practical Tips for Time Series Analysis

 


That is it for this week!

If you haven’t yet, follow me on LinkedIn where I share Technical and Career ML content every day!


Whenever you're ready, there are 3 ways I can help you:


1. ML Job Landing Kit

Get everything I learned about landing ML jobs after reviewing 1000+ ML CVs, conducting 100+ interviews & hiring 25 Data Scientists. The exact system I used to help 70+ clients get more interviews and land their ML jobs.

 

2. ML Career 1:1 Session 

I’ll address your personal request & create a strategic plan with the next steps to grow your ML career.

 

3. Full CV & LinkedIn Upgrade (all done for you)

I review your experience, clarify all the details, and create:
- Upgraded ready-to-use CV (Doc format)
- Optimized LinkedIn Profile (About, Headline, Banner, and Experience Sections)


Join Maistermind for 1 weekly piece with 2 ML guides:


1. Technical ML tutorial or skill learning guide
2. Tips list to grow ML career, LinkedIn, income

 

Join here! 

 

 
 