[ENH] feat: add flag_outliers function for outlier detection by Psycoder0611 · Pull Request #1602 · pyjanitor-devs/pyjanitor

Psycoder0611 · 2026-04-16T04:59:01Z

Summary

Adds a new flag_outliers() function to pyjanitor that detects and flags
outlier values in a numeric DataFrame column.

Motivation

Data pipelines frequently need to identify anomalous values before
aggregation or modeling. This function provides a clean, chainable
pandas method for outlier flagging during ETL workflows.

Changes

Added janitor/functions/flag_outliers.py with full implementation
Registered flag_outliers in janitor/functions/__init__.py
Added tests/functions/test_flag_outliers.py with 7 passing tests

Features

Supports IQR method (default, threshold=1.5)
Supports Z-score method (threshold configurable)
Chainable as a DataFrame method via pandas-flavor
Custom output column naming
Does not mutate the original DataFrame
Full input validation with meaningful error messages
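The description does not include the implementation itself. The following is only a minimal sketch, in plain pandas, of how the two detection methods listed above typically work. Several details are assumptions and do not come from the PR: the `flag_column` parameter name, the `<column>_outlier` default column name, the error messages, and the use of TypeError for non-numeric input. The pandas-flavor registration is also left out so the sketch has no extra dependencies.

```python
from typing import Optional

import pandas as pd
from pandas.api.types import is_numeric_dtype


def flag_outliers(
    df: pd.DataFrame,
    column_name: str,
    method: str = "iqr",
    threshold: float = 1.5,
    flag_column: Optional[str] = None,  # hypothetical name for the custom output column
) -> pd.DataFrame:
    """Return a copy of ``df`` with a boolean column marking outliers.

    Sketch only: the real pyjanitor function is registered with
    pandas-flavor and its parameter names may differ.
    """
    if column_name not in df.columns:
        raise ValueError(f"{column_name!r} is not a column of the DataFrame.")
    if method not in ("iqr", "zscore"):
        raise ValueError(f"Invalid method {method!r}; expected 'iqr' or 'zscore'.")
    values = df[column_name]
    if not is_numeric_dtype(values):
        raise TypeError(f"Column {column_name!r} must be numeric.")

    if method == "iqr":
        # Tukey-style fences: flag values more than threshold * IQR
        # below the first quartile or above the third quartile.
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        is_outlier = (values < q1 - threshold * iqr) | (values > q3 + threshold * iqr)
    else:
        # Z-score: distance from the mean in units of the sample std (ddof=1).
        z_scores = (values - values.mean()) / values.std()
        is_outlier = z_scores.abs() > threshold

    result = df.copy()  # leave the caller's DataFrame untouched
    result[flag_column or f"{column_name}_outlier"] = is_outlier
    return result


df = pd.DataFrame({"x": [1, 2, 3, 4, 100]})
iqr_result = flag_outliers(df, "x")
zscore_result = flag_outliers(df, "x", method="zscore", threshold=1.5)
```

With the values 1, 2, 3, 4 and 100, both calls flag only 100. Calling with `method="zscore", threshold=3.0` flags nothing, because the sample standard deviation caps |z| at (n-1)/sqrt(n). That cap is about 1.79 for five values, so small samples interact strongly with the choice of threshold.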

Tests

All 7 tests pass covering:

IQR outlier detection
Z-score outlier detection
No outliers case
Custom flag column name
Immutability of original DataFrame
Invalid method error handling
Non-numeric column error handling
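The two error-handling tests listed above can check the specific error path by passing a regex to pytest's `match=` argument. Pytest applies it with `re.search` against the string form of the raised exception. The sketch below uses a hypothetical validation-only stand-in for flag_outliers. Its messages, and the choice of TypeError for non-numeric columns, are illustrative and not taken from the PR.

```python
import pandas as pd
import pytest


def flag_outliers(df, column_name, method="iqr", threshold=1.5):
    # Hypothetical stand-in covering only the two validation paths;
    # messages and exception types are illustrative.
    if method not in ("iqr", "zscore"):
        raise ValueError(f"Invalid method {method!r}; expected 'iqr' or 'zscore'.")
    if not pd.api.types.is_numeric_dtype(df[column_name]):
        raise TypeError(f"Column {column_name!r} must be numeric.")
    return df.copy()


def test_invalid_method():
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
    # match= is searched (re.search) in str(exc), so an unrelated
    # ValueError would no longer satisfy this test.
    with pytest.raises(ValueError, match="Invalid method"):
        flag_outliers(df, "x", method="mad")


def test_non_numeric_column():
    df = pd.DataFrame({"name": ["a", "b", "c"]})
    with pytest.raises(TypeError, match="must be numeric"):
        flag_outliers(df, "name")
```

Because `match=` is a regular expression, any message fragment that contains characters such as parentheses or brackets should be wrapped in `re.escape`.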

- Implements flag_outliers() as a pandas DataFrame method
- Supports IQR and Z-score detection methods
- Adds a boolean flag column to indicate outlier rows
- Includes comprehensive unit tests (7 tests, all passing)
- Follows existing pyjanitor code style and conventions

codecov · 2026-04-16T19:30:00Z

Codecov Report

❌ Patch coverage is 97.50000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 85.65%. Comparing base (901f4b3) to head (a6bac54).
⚠️ Report is 183 commits behind head on dev.

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1602      +/-   ##
==========================================
- Coverage   87.56%   85.65%   -1.92%     
==========================================
  Files          95      126      +31     
  Lines        6819     9932    +3113     
==========================================
+ Hits         5971     8507    +2536     
- Misses        848     1425     +577
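The Codecov figures are internally consistent and can be checked by hand. Project coverage is hit lines divided by tracked lines. A patch coverage of 97.5% with exactly one missed line corresponds to 40 coverable lines in the patch (39/40).

The growth between base and head (+31 files, +3,113 lines) is far larger than the three files listed under Changes. That fits the warning that the report is 183 commits behind, so the project-level drop likely reflects the stale base rather than this patch.

The small check below assumes Codecov counted no partial lines here, since none are listed.

```python
# Figures copied from the Codecov diff above (base = dev, head = this PR).
BASE = {"files": 95, "lines": 6819, "hits": 5971, "misses": 848}
HEAD = {"files": 126, "lines": 9932, "hits": 8507, "misses": 1425}


def coverage_pct(report: dict) -> float:
    """Project coverage in percent: hit lines over tracked lines."""
    return 100 * report["hits"] / report["lines"]


def patch_line_count(patch_pct: float, missed: int) -> int:
    """Coverable patch lines n such that (n - missed) / n equals patch_pct."""
    return round(missed / (1 - patch_pct / 100))


# With no partials reported, hits and misses should account for every line.
for report in (BASE, HEAD):
    assert report["hits"] + report["misses"] == report["lines"]
```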


ericmjl · 2026-05-30T12:59:20Z

Hey @Psycoder0611, thanks for this contribution! Adding outlier detection to pyjanitor is a great fit for the library. The overall structure follows our conventions well — chainable API, pandas_flavor registration, immutability, good test coverage. Nice work.

I have a few suggestions organized by priority.

Must fix

Use the existing check_column utility instead of rolling your own. The _check_column helper in flag_outliers.py is a duplicate of janitor.utils.check_column — same logic, same error messages. flag_nulls.py already imports it with from janitor.utils import check_column. Do the same here and delete the private helper.
The shared threshold default is a footgun for Z-score users. The default threshold=1.5 makes sense for IQR but is misleading for Z-score, where 3.0 is standard. A user calling df.flag_outliers("col", method="zscore") gets a much more aggressive filter than they'd expect. Consider either: (a) making the default method-specific, or (b) raising a warning when method="zscore" and threshold wasn't explicitly provided.
Missing from __future__ import annotations. Other function modules in pyjanitor include this at the top. Add it for consistency.
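Either option from the threshold item above can be sketched in a few lines. Both helpers below are hypothetical names and not pyjanitor API.

- `resolve_threshold` shows option (a): a `None` sentinel that is resolved to a per-method default.
- `warn_on_implicit_zscore_threshold` shows option (b): a private sentinel object that distinguishes "threshold omitted" from "threshold=1.5 passed explicitly". A plain `threshold=1.5` default cannot make that distinction.

```python
import warnings
from typing import Optional

# Conventional starting points mentioned in the review: 1.5 x IQR and 3 std devs.
DEFAULT_THRESHOLDS = {"iqr": 1.5, "zscore": 3.0}


def resolve_threshold(method: str, threshold: Optional[float] = None) -> float:
    """Option (a): None means 'use the default for this method'."""
    if method not in DEFAULT_THRESHOLDS:
        raise ValueError(
            f"Invalid method {method!r}; expected one of {sorted(DEFAULT_THRESHOLDS)}."
        )
    if threshold is None:
        return DEFAULT_THRESHOLDS[method]
    return float(threshold)


_UNSET = object()  # private sentinel: lets us detect an omitted argument


def warn_on_implicit_zscore_threshold(method: str, threshold=_UNSET) -> float:
    """Option (b): keep 1.5 as the default but warn when Z-score relies on it."""
    if threshold is _UNSET:
        if method == "zscore":
            warnings.warn(
                "threshold defaults to 1.5, which is designed for the IQR method; "
                "3.0 is a more typical cutoff for Z-scores.",
                UserWarning,
                stacklevel=2,
            )
        threshold = 1.5
    return float(threshold)
```

Option (a) changes the visible signature to `threshold=None` but leaves IQR results unchanged. Option (b) keeps the signature and only nudges Z-score users who never chose a threshold.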

Should fix

Tests should use match= in pytest.raises. Bare pytest.raises(ValueError) only checks that some ValueError was raised — it could be the wrong one. Use pytest.raises(ValueError, match="Invalid method") etc. to verify the correct error path.
No test for NaN handling. IQR quantiles and Z-score mean/std have specific behavior with NaN values. This should be tested and the behavior documented in the docstring.
test_non_method_functional is a misleading name. It tests calling flag_outliers(df, ...) as a standalone function. Something like test_standalone_function_call would be clearer.
Missing Z-score example in the docstring. Only the IQR method is demonstrated. Adding a Z-score example would help users.
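For the NaN item above, two pandas behaviors matter. `Series.quantile`, `mean` and `std` skip missing values by default, and any comparison with NaN evaluates to False. The sketch below uses made-up values to show the consequence: an IQR or Z-score flag built from these operations reports a missing value as "not an outlier" without raising. The nullable `"boolean"` variant at the end is one possible alternative, if the docstring should instead promise missing flags for missing inputs.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value and one obvious outlier.
s = pd.Series([1.0, 2.0, np.nan, 3.0, 4.0, 100.0])

# quantile, mean and std skip NaN by default, so the statistics come
# from the five observed values only.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
iqr_flags = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

z_flags = ((s - s.mean()) / s.std()).abs() > 1.5

# Comparisons with NaN are False, so the missing entry is reported as
# "not an outlier" in both plain-bool results above.

# One possible alternative: keep missing inputs missing in the flag.
nullable_flags = iqr_flags.astype("boolean")
nullable_flags[s.isna()] = pd.NA
```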

Using AI to address these

If you use an AI coding agent, here's a prompt you can copy-paste to work through these review comments:

I need to address code review feedback on my flag_outliers PR in pyjanitor. Please make the following changes to janitor/functions/flag_outliers.py and tests/functions/test_flag_outliers.py:

Remove the _check_column helper function and replace its usage with from janitor.utils import check_column. Follow the pattern in janitor/functions/flag_nulls.py.

Add from __future__ import annotations at the top of flag_outliers.py.

When method="zscore" and the user did not explicitly pass a threshold, raise a UserWarning suggesting that the default 1.5 is designed for IQR and that 3.0 is typical for Z-score.

In all tests that use pytest.raises, add the match= parameter to verify the correct error message is raised.

Add a test for NaN handling — when the column contains NaN values, the function should still produce correct boolean flags and not crash.

Rename test_non_method_functional to test_standalone_function_call.

Add a Z-score usage example to the docstring of flag_outliers.

Read janitor/functions/flag_nulls.py and janitor/utils.py first to match existing codebase patterns.

Looking forward to the next revision!

Psycoder0611 added 2 commits April 15, 2026 21:56

Fix boolean assertions in flag_outliers tests

a6bac54

Psycoder0611 wants to merge 2 commits into pyjanitor-devs:dev from Psycoder0611:feature/flag-outliers
