#13: 3 Simple Steps to Filter Outliers | TOP 3 ML Portfolio Mistakes
Reading time - 7 mins
1. ML Picks of the Week
ML Tool, Concept, Resource Link, Post, Interview Question, and Quiz of the week. All in under 1 minute.
2. Technical ML Section
Learn 3 Simple Steps to Filter Outliers
3. Career ML Section
Learn TOP 3 ML Portfolio Mistakes
1. ML Picks of the Week
⏱ All under 60 seconds. Let’s go 👇
🥇 ML Tool
→ Pyro - a probabilistic programming library for Python, built on PyTorch.
→ Use it to build probabilistic models and estimate model uncertainty.
📈 ML Concept
→ Type I error - rejecting the null hypothesis when it is actually true (a false positive).
→ Read more about it HERE; it's often asked in interviews.
🔗 One Link Worth Saving
→ Awesome ML - A collection of ML tools and frameworks.
→ Covers everything from libraries & papers to toolkits, organized by language and category.
🔵 LinkedIn Post of the Week
Gradient Boosting vs Random Forest - how many trees do you need in each?
🤔 ML Interview Question
→ What is hierarchical clustering?
→ Watch this video to prepare and nail it!
🧠 Quiz. Can you answer without GPT?
What does a high variance in a model indicate?
A) The model is underfitting
B) The model performs consistently across datasets
C) The model is overfitting
D) The model generalizes well
2. Technical ML Section
3 Simple Steps to Filter Outliers
There is a misconception that the more features you generate, the better your model is.
This is 100% wrong, especially from a practical perspective.
Dealing with outliers in time series can be a headache in real-world applications.
This is especially true when you implement filters in real-time production pipelines.
Many filters change the original time series signal, which might not be desirable in many applications.
In this Maistermind issue, you will learn a simple yet robust filter.
It does not change the original signal, and you can start using it right away.
Imagine we have a signal with several change points and outliers as shown below.
A change point is an abrupt shift in the level of the signal. It is not an outlier.
Let's see how we can clean the outliers.
Also, it's interesting to see how the filter deals with change points because such points are often a challenge for many filters.
1️⃣ Compute absolute differences between neighboring points
Why do we do that?
The idea is that an outlier should differ from the points right before and right after it by more than ordinary points differ from their neighbors.
Let's see if this is true.
We can also plot the distribution of the differences.
We see that some of the differences are indeed much larger than most of the others.
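Step 1 is close to a one-liner in pandas. The sketch below is only an illustration: the toy `signal` values are made up for this example (one spike at index 3 and a level shift at index 6) and are not the signal from the plots in this issue.

```python
import pandas as pd

# Made-up toy signal (an assumption for illustration only):
# flat around 1.0, a spike at index 3, a change point at index 6.
signal = pd.Series([1.0, 1.1, 0.9, 6.0, 1.0, 1.1, 5.0, 5.1, 4.9])

# Absolute difference between each point and the point right before it.
# The first point has no predecessor, so its difference is NaN.
abs_diff = signal.diff().abs()

print(abs_diff.round(2))
```

Note that a single spike produces two large differences: one going into the spike (index 3) and one coming back out of it (index 4). The change point at index 6 produces one.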
Now, let's take Step 2.
2️⃣ Specify a suitable threshold above which we treat points as outliers
The threshold selection depends on your domain knowledge and on how aggressive you want the filtering to be.
For this example, I selected the threshold to be 2.5.
Let's also draw the threshold on the distribution.
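Applying the threshold can be sketched as a plain comparison. This assumes the same made-up toy signal as an illustration, and the 2.5 value from this example. The name `THRESHOLD` is hypothetical.

```python
import pandas as pd

# Made-up toy signal: spike at index 3, change point at index 6.
signal = pd.Series([1.0, 1.1, 0.9, 6.0, 1.0, 1.1, 5.0, 5.1, 4.9])
abs_diff = signal.diff().abs()

THRESHOLD = 2.5  # value used in this issue; tune it for your own data

# True where the jump from the previous point exceeds the threshold.
# Comparisons with NaN are False, so the first point is never flagged.
is_flagged = abs_diff > THRESHOLD

flagged_idx = is_flagged[is_flagged].index.tolist()
print(flagged_idx)  # [3, 4, 6]
```

With this plain comparison, the point right after the spike (index 4) and the first point of the new level (index 6) are flagged too, not only the spike itself.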
Now, let's go to Step 3.
3️⃣ Mark outliers as NaN values and fill the NaNs with the "forward fill" method
If we do this, this is how the resulting signal looks.
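All three steps together fit in a few lines. This is a minimal sketch under stated assumptions, not necessarily the exact code behind the plots: `filter_outliers` is a hypothetical helper name, and the toy signal is made up.

```python
import pandas as pd


def filter_outliers(signal: pd.Series, threshold: float) -> pd.Series:
    """Replace points that jump more than `threshold` from the previous
    point with NaN, then forward-fill them with the last kept value."""
    abs_diff = signal.diff().abs()
    is_flagged = abs_diff > threshold
    # mask() sets flagged positions to NaN; ffill() carries the last
    # non-NaN value forward into those gaps.
    return signal.mask(is_flagged).ffill()


# Made-up toy signal: spike at index 3, change point at index 6.
signal = pd.Series([1.0, 1.1, 0.9, 6.0, 1.0, 1.1, 5.0, 5.1, 4.9])
cleaned = filter_outliers(signal, threshold=2.5)
print(cleaned.tolist())
# [1.0, 1.1, 0.9, 0.9, 0.9, 1.1, 1.1, 5.1, 4.9]
```

In this sketch the spike is removed and the level shift survives, delayed by one sample. The sample right after the spike is also replaced by the fill value, because its difference to the spike is large as well.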
By the way, why do we use forward filling and not, for instance, interpolation?
Imagine you deploy the solution in a real-time production setting.
There can be situations when the NaN point sits at the edge of the time series, i.e. it is the most recent sample.
Linear interpolation needs a valid point on both sides of the gap, so it cannot fill this NaN. Forward fill only needs the previous value.
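The edge case can be reproduced with a short sketch, assuming pandas and a made-up series whose latest sample was flagged. One detail to be aware of: called without `limit_area`, pandas' `interpolate()` pads a trailing NaN with the last valid value, which is effectively a forward fill. Passing `limit_area="inside"` restricts it to true interpolation between valid points.

```python
import numpy as np
import pandas as pd

# The most recent sample was flagged and replaced by NaN.
latest = pd.Series([1.0, 1.1, 0.9, np.nan])

# Strict linear interpolation: only fill NaNs that have valid values
# on both sides. The trailing NaN stays NaN.
interpolated = latest.interpolate(method="linear", limit_area="inside")

# Forward fill only needs the previous valid value.
forward_filled = latest.ffill()

print(interpolated.iloc[-1])    # nan
print(forward_filled.iloc[-1])  # 0.9
```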
This is a simple but important lesson: always keep the production / deployment setup in mind.
🟢 Advantages of the method
- Easy to understand and implement
- Easy to deploy and configure in production
- Non-outlying points are the same as the original signal
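To illustrate the deployment point: the filter only needs the previous raw sample and the last emitted value, so it can run one sample at a time. The class below is a hypothetical sketch (`StreamingOutlierFilter` and its attributes are names invented here), written in plain Python and fed with a made-up stream.

```python
from typing import Optional


class StreamingOutlierFilter:
    """Hypothetical helper: applies the difference filter sample by sample."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.prev_raw: Optional[float] = None     # previous raw sample
        self.last_output: Optional[float] = None  # last value we emitted

    def update(self, x: float) -> float:
        is_flagged = (
            self.prev_raw is not None
            and abs(x - self.prev_raw) > self.threshold
        )
        # Flagged sample: repeat the last emitted value (forward fill).
        # Otherwise pass the raw sample through unchanged.
        output = self.last_output if is_flagged else x
        self.prev_raw = x
        self.last_output = output
        return output


# Made-up stream: spike at position 3, change point at position 6.
stream = [1.0, 1.1, 0.9, 6.0, 1.0, 1.1, 5.0, 5.1, 4.9]
flt = StreamingOutlierFilter(threshold=2.5)
print([flt.update(x) for x in stream])
# [1.0, 1.1, 0.9, 0.9, 0.9, 1.1, 1.1, 5.1, 4.9]
```

The state is two numbers and the threshold is the only setting, which is what keeps the method cheap to configure and run.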
🔴 Disadvantages of the method
- Change points can be detected as false-positive outliers
- The threshold is a hyperparameter whose suitable value can drift over time, so periodic recalibration may be required
That's it for the Technical Part!
Follow me on LinkedIn for more daily ML breakdowns.
3. Career ML Section
TOP 3 ML Portfolio Mistakes
1️⃣ Project is not end-to-end
The vast majority of projects that I have seen include:
- EDA of the data in a Jupyter Notebook
- ML Model Fitting
- Github Repo with this Notebook.
This project will NEVER EVER stand out and land you a job.
A good ML Portfolio project must have a deployed model with any sort of web page or a dashboard where the end-user can interact with the solution.
2️⃣ No Business or Problem Context
The worst project that you can present is the project without the "why" behind it.
You need to answer and discuss at least these 3 questions:
1. What is the approximate $ value that your model will bring?
Example: “Reducing churn by 5% could save ~$200K/year.”
2. What decision does your model help with?
Example: “Sales reps use these lead scores to prioritize calls.”
3. Why is ML even needed here?
Example: “Manual review can’t scale across 50K customer messages weekly.”
3️⃣ GitHub and the code are a mess
Your project might be solid, but if your GitHub looks like:
- Untitled3.ipynb
- No README
- No structure or explanation
- Messy, unrefactored code
…it screams “unfinished.”
✅ Instead:
- Add a clear README (problem, solution, results, demo)
- Organize your repo: /notebooks, /src, requirements.txt
- Follow PEP8 code style, remove unused cells, and turn logic into functions
That is it for this week!
If you haven’t yet, follow me on LinkedIn where I share Technical and Career ML content every day!
Whenever you're ready, there are 3 ways I can help you:
1. ML Job Landing Kit
Get everything I learned about landing ML jobs after reviewing 1000+ ML CVs, conducting 100+ interviews & hiring 25 Data Scientists. The exact system I used to help 70+ clients get more interviews and land their ML jobs.
2. ML Career 1:1 Session
I’ll address your personal request & create a strategic plan with the next steps to grow your ML career.
3. Full CV & LinkedIn Upgrade (all done for you)
I review your experience, clarify all the details, and create:
- Upgraded ready-to-use CV (Doc format)
- Optimized LinkedIn Profile (About, Headline, Banner, and Experience Sections)
Join Maistermind for 1 weekly piece with 2 ML guides:
1. Technical ML tutorial or skill learning guide
2. Tips list to grow ML career, LinkedIn, income