
#13: 3 Simple Steps to Filter Outliers | TOP 3 ML Portfolio Mistakes

by Timur Bikmukhametov
May 09, 2025
Reading time - 7 mins

 

1. ML Picks of the Week

ML Tool, Concept, Resource Link, Post, Interview Question, and Quiz of the week. All in under 1 minute.

 

2. Technical ML Section

Learn 3 Simple Steps to Filter Outliers

 

3. Career ML Section

Learn TOP 3 ML Portfolio Mistakes


1. ML Picks of the Week

⏱ All under 60 seconds. Let’s go 👇

🥇 ML Tool

→ Pyro - Python probabilistic library built on PyTorch.

→ Use it to build models that quantify their own uncertainty.

 

📈 ML Concept

→ Type I error - rejecting the null hypothesis when it is actually true (a false positive).

→ Read more about it HERE; it's often asked in interviews.

 

🔗 One Link Worth Saving

→ Awesome ML - A collection of ML tools and frameworks.

→ Covers everything from libraries & papers to toolkits, organized by language and category.

 

🔵 LinkedIn Post of the Week

Gradient Boosting vs Random Forest - how many trees do you need in each?

 

🤔 ML Interview Question

→ What is hierarchical clustering?

→ Watch this video to prepare and nail it!

 

🧠 Quiz. Can you answer without GPT?

What does a high variance in a model indicate?

A) The model is underfitting

B) The model performs consistently across datasets

C) The model is overfitting

D) The model generalizes well

✅ See the correct answer


2. Technical ML Section

3 Simple Steps to Filter Outliers

There is a misconception that the more features you generate, the better your model is.

This is 100% wrong, especially from a practical perspective.

 

Dealing with outliers in time series can be a headache in real-world applications.

This is especially true when you implement filters in real-time production pipelines.

Many filters change the original time series signal, which might not be desirable in many applications.

In this Maistermind issue, you will learn a simple yet robust filter.

It does not change the original signal, and you can start using it right away.


Imagine we have a signal with several change points and outliers as shown below.

A change point is an abrupt, lasting shift in the signal's level. It is not an outlier.

Let's see how we can clean the outliers.

Also, it's interesting to see how the filter deals with change points because such points are often a challenge for many filters.


1️⃣ Compute absolute differences between neighboring points

Why do we do that?

Well, we can expect the difference between an outlier and the points right before and right after it to be larger than the differences between ordinary neighboring points.

Let's see if this is true.

We can also plot the distribution of the differences.

 

We see that some of the differences are indeed much larger than most of the others.
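
Step 1 can be sketched in a couple of lines with pandas. This is a minimal illustration, not the exact code behind the plots: the toy `signal` values are made up, and using `diff()` (difference to the previous point only) is an assumption about how the neighbors are compared.

```python
import pandas as pd

# Hypothetical toy signal: a spike at index 2 and a mild level shift at index 5.
signal = pd.Series([1.0, 1.1, 6.0, 1.0, 1.2, 3.0, 3.1, 2.9, 3.0])

# Absolute difference between each point and the one right before it.
# The first value is NaN because it has no previous neighbor.
abs_diff = signal.diff().abs()

print(abs_diff.round(2).tolist())
```

The two largest differences sit at the spike and at the point right after it, which is the pattern the distribution plot makes visible.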

Now, let's take Step 2.


2️⃣ Specify a suitable threshold above which we treat points as outliers

The threshold selection depends on your domain knowledge and how "aggressive" you want to be with your filtering.

For this example, I selected the threshold to be 2.5.

Let's also draw the threshold on the distribution.
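
In code, Step 2 is a single comparison. The sketch below reuses the same made-up toy signal and assumes a point is flagged when its difference to the previous point exceeds the threshold. Under that one-sided rule, the point right after a spike gets flagged as well, because the jump back is just as large.

```python
import pandas as pd

# Hypothetical toy signal: a spike at index 2 and a mild level shift at index 5.
signal = pd.Series([1.0, 1.1, 6.0, 1.0, 1.2, 3.0, 3.1, 2.9, 3.0])
abs_diff = signal.diff().abs()

THRESHOLD = 2.5  # the value used in this example; tune it for your own data

# Comparing NaN with a number gives False, so the first point is never flagged.
is_outlier = abs_diff > THRESHOLD

print(is_outlier[is_outlier].index.tolist())  # [2, 3]
```

The level shift at index 5 is a jump of 1.8, which stays below 2.5, so it is left alone here.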

Now, let's go to Step 3.


3️⃣ Mark outliers as NaN values and fill NaNs with the "forward fill" method

If we do this, this is how the resulting signal looks.
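
Here is a minimal sketch of Step 3, again on the made-up toy signal and with the one-sided difference rule as an assumption. `Series.mask` replaces flagged values with NaN and `ffill` copies the last valid value forward.

```python
import pandas as pd

# Hypothetical toy signal: a spike at index 2 and a mild level shift at index 5.
signal = pd.Series([1.0, 1.1, 6.0, 1.0, 1.2, 3.0, 3.1, 2.9, 3.0])
THRESHOLD = 2.5

is_outlier = signal.diff().abs() > THRESHOLD

# mask() puts NaN wherever the condition is True;
# ffill() then fills each NaN with the last valid value before it.
filtered = signal.mask(is_outlier).ffill()

print(filtered.tolist())
# [1.0, 1.1, 1.1, 1.1, 1.2, 3.0, 3.1, 2.9, 3.0]
```

Every point that was not flagged keeps its original value.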

 

By the way, why do we use forward filling and not, for instance, interpolation?

Imagine you deploy the solution in a real-time production setting.

There can be situations when the NaN point falls at the edge of the time series, that is, it is the most recent sample.

In that case, linear interpolation cannot fill the NaN because there is no later valid point to interpolate toward.

This is a simple but important lesson: always keep the production / deployment setup in mind.
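
The edge case is easy to reproduce. In this small sketch the values are made up, and `limit_area="inside"` is passed so that pandas performs strict interpolation only (fills gaps that have valid values on both sides).

```python
import numpy as np
import pandas as pd

# The newest sample was flagged as an outlier, so the NaN sits at the edge.
latest = pd.Series([1.0, 1.1, 1.2, np.nan])

# Strict linear interpolation needs a valid point on both sides of the gap.
interpolated = latest.interpolate(method="linear", limit_area="inside")

# Forward fill only needs the past.
forward_filled = latest.ffill()

print(interpolated.iloc[-1])    # nan
print(forward_filled.iloc[-1])  # 1.2
```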


🟢 Advantages of the method

  • Easy to understand and implement
  • Easy to deploy and configure in production
  • Non-outlying points are the same as the original signal
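
To give a feel for the deployment side, here is one possible streaming version that processes one sample at a time. `DiffOutlierFilter` is a hypothetical helper written for this illustration, and it assumes the same one-sided rule (compare each raw sample with the previous raw sample).

```python
from typing import Optional


class DiffOutlierFilter:
    """Streaming sketch of the three steps (hypothetical helper)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self._prev_raw: Optional[float] = None     # last raw sample seen
        self._last_output: Optional[float] = None  # last value emitted

    def update(self, x: float) -> float:
        # Steps 1 and 2: absolute difference to the previous raw sample vs. threshold.
        is_outlier = (
            self._prev_raw is not None
            and abs(x - self._prev_raw) > self.threshold
        )
        # Step 3: forward fill by repeating the last emitted value.
        output = self._last_output if is_outlier else x
        self._prev_raw = x
        self._last_output = output
        return output


stream = [1.0, 1.1, 6.0, 1.0, 1.2]
online_filter = DiffOutlierFilter(threshold=2.5)
cleaned = [online_filter.update(x) for x in stream]
print(cleaned)  # [1.0, 1.1, 1.1, 1.1, 1.2]
```

The only state to keep between calls is two numbers, and the only setting to configure is the threshold.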

 

 


🔴 Disadvantages of the method

  • Change points can be detected as false-positive outliers
  • Threshold is a hyperparameter that can change over time, so timely calibration can be required
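
The change-point issue can be seen on a tiny example. The signal below is made up, with a genuine level shift from about 1 to about 5, and the one-sided difference rule is again an assumption.

```python
import pandas as pd

# Hypothetical signal with a genuine level shift at index 3 (not an outlier).
signal = pd.Series([1.0, 1.1, 0.9, 5.0, 5.1, 4.9])
THRESHOLD = 2.5

is_outlier = signal.diff().abs() > THRESHOLD
filtered = signal.mask(is_outlier).ffill()

print(is_outlier.tolist())  # [False, False, False, True, False, False]
print(filtered.tolist())    # [1.0, 1.1, 0.9, 0.9, 5.1, 4.9]
```

The first sample at the new level is flagged and replaced with the old level, so the filtered signal picks up the shift one sample late.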

 

 


That's it for the Technical Part! 

Follow me on LinkedIn for more daily ML breakdowns.


 

3. Career ML Section

TOP 3 ML Portfolio Mistakes

 

1️⃣ Project is not end-to-end

The vast majority of projects that I have seen include:

  1. EDA in a Jupyter Notebook.
  2. ML model fitting.
  3. GitHub repo with this Notebook.

 

This project will NEVER EVER stand out and land you a job.

A good ML Portfolio project must have a deployed model with some kind of web page or dashboard where the end user can interact with the solution.


2️⃣ No Business or Problem Context

The worst project that you can present is one without the "why" behind it.

You need to answer and discuss at least these 3 questions:

 

1. What is the approximate $ value that your model will bring?

Example: “Reducing churn by 5% could save ~$200K/year.”

 

2. What decision does your model help with?
Example: “Sales reps use these lead scores to prioritize calls.”

 

3. Why is ML even needed here?
Example: “Manual review can’t scale across 50K customer messages weekly.”


3️⃣ GitHub and the code are a mess

Your project might be solid, but if your GitHub looks like:

  • Untitled3.ipynb
  • No README
  • No structure or explanation

  • Messy, unrefactored code

…it screams “unfinished.”

✅ Instead:

  • Add a clear README (problem, solution, results, demo)

  • Organize your repo: /notebooks, /src, requirements.txt

  • Follow PEP 8 code style, remove unused cells, and turn logic into functions

 


Related articles

  1. 5 Practical Tips for Time Series Analysis

 


That is it for this week!

If you haven’t yet, follow me on LinkedIn where I share Technical and Career ML content every day!


Whenever you're ready, there are 3 ways I can help you:

​

1. ML Job Landing Kit

Get everything I learned about landing ML jobs after reviewing 1000+ ML CVs, conducting 100+ interviews & hiring 25 Data Scientists. The exact system I used to help 70+ clients get more interviews and land their ML jobs.

 

2. ML Career 1:1 Session 

I’ll address your personal request & create a strategic plan with the next steps to grow your ML career.

 

​3. Full CV & LinkedIn Upgrade (all done for you)​

I review your experience, clarify all the details, and create:
- Upgraded ready-to-use CV (Doc format)
- Optimized LinkedIn Profile (About, Headline, Banner, and Experience Sections)


Join Maistermind for 1 weekly piece with 2 ML guides:


1. Technical ML tutorial or skill learning guide
2. Tips list to grow ML career, LinkedIn, income

 

Join here! 

 

 
 