[ENH] feat: add flag_outliers function for outlier detection #1602
Psycoder0611 wants to merge 2 commits into
- Implements flag_outliers() as a pandas DataFrame method
- Supports IQR and Z-score detection methods
- Adds boolean flag column to indicate outlier rows
- Includes comprehensive unit tests (7 tests, all passing)
- Follows existing pyjanitor code style and conventions
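For reviewers who want to sanity-check the behaviour described in the commit message, here is a minimal pandas sketch. It is not the PR's code: the signature (`column`, `method`, `threshold`, `flag_column`), the defaults (k = 1.5 for IQR, t = 3 for Z-score) and the flag column name `is_outlier` are all assumptions. In pyjanitor itself the function would be registered as a DataFrame method (pyjanitor uses `pandas_flavor.register_dataframe_method` for this); the sketch is called through `DataFrame.pipe` so it runs with pandas alone.

```python
from typing import Optional

import pandas as pd


def flag_outliers(
    df: pd.DataFrame,
    column: str,
    method: str = "iqr",
    threshold: Optional[float] = None,
    flag_column: str = "is_outlier",
) -> pd.DataFrame:
    """Return a copy of ``df`` with a boolean column marking outliers in ``column``.

    Hypothetical signature for illustration; not the PR's actual API.
    """
    values = df[column]
    if method == "iqr":
        # Tukey fences: k * IQR beyond the first and third quartiles.
        k = 1.5 if threshold is None else threshold
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    elif method == "zscore":
        # Distance from the mean in units of the sample standard deviation (ddof=1).
        t = 3.0 if threshold is None else threshold
        z = (values - values.mean()) / values.std()
        mask = z.abs() > t
    else:
        raise ValueError(f"method must be 'iqr' or 'zscore', got {method!r}")
    # Comparisons against NaN are False, so missing values (and the 0/0 z-scores
    # of a constant column) are never flagged.
    return df.assign(**{flag_column: mask})


df = pd.DataFrame({"reading": [10, 12, 11, 13, 12, 95]})
flagged = df.pipe(flag_outliers, "reading", method="iqr")
print(flagged["is_outlier"].tolist())  # [False, False, False, False, False, True]
```

On the same frame, `method="zscore"` with the default t = 3 flags nothing: with the sample standard deviation, |z| can never exceed (n - 1)/√n, about 2.04 for n = 6. That bound is worth a test case, since the Z-score rule cannot flag anything on very small groups no matter how extreme a value is.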
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## dev #1602 +/- ##
==========================================
- Coverage 87.56% 85.65% -1.92%
==========================================
Files 95 126 +31
Lines 6819 9932 +3113
==========================================
+ Hits 5971 8507 +2536
- Misses 848 1425 +577
Hey @Psycoder0611, thanks for this contribution! Adding outlier detection to pyjanitor is a great fit for the library. The overall structure follows our conventions well, including the chainable API. I have a few suggestions, organized by priority.
Must fix
Should fix
Using AI to address these
If you use an AI coding agent, here's a prompt you can copy and paste to work through these review comments:
Looking forward to the next revision!
Summary
Adds a new `flag_outliers()` function to pyjanitor that detects and flags outlier values in a numeric DataFrame column.
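For reference, these are the standard forms of the IQR and Z-score rules named in the commit message. Here $Q_1$ and $Q_3$ are the 25th and 75th percentiles of the column, $\bar{x}$ its mean and $s$ its sample standard deviation; the multipliers $k = 1.5$ and $t = 3$ are common conventions, not defaults confirmed from this PR.

```math
\begin{aligned}
\text{IQR rule:}\quad & \mathrm{IQR} = Q_3 - Q_1, \qquad x_i \text{ flagged} \iff x_i < Q_1 - k\,\mathrm{IQR} \ \lor\ x_i > Q_3 + k\,\mathrm{IQR} \\
\text{Z-score rule:}\quad & z_i = \frac{x_i - \bar{x}}{s}, \qquad x_i \text{ flagged} \iff |z_i| > t
\end{aligned}
```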
Motivation
Data pipelines frequently need to identify anomalous values before
aggregation or modeling. This function provides a clean, chainable
pandas method for outlier flagging during ETL workflows.
Changes
- `janitor/functions/flag_outliers.py` with full implementation
- `flag_outliers` added to `janitor/functions/__init__.py`
- `tests/functions/test_flag_outliers.py` with 7 passing tests
Features
Tests
All 7 tests pass, covering: