feat: add percentile-based truncated histogram option #1844

Open
matsumotominato wants to merge 2 commits into Data-Centric-AI-Community:develop from matsumotominato:feature/truncated-histogram

@matsumotominato
Summary

Add a percentile_cutoff option to the histogram configuration. When it is set
above 0, the report generates a truncated histogram alongside the standard one.

Problem

When a column contains extreme outliers (e.g., company revenues where most are
around $90K but some are in the billions), the bin range stretches to cover the
outliers, so nearly all values fall into a single bar and the histogram says
little about the shape of the distribution.

Closes #1817
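
The effect described above can be reproduced with plain NumPy. The following is a minimal sketch on synthetic data, not code from this PR; the revenue figures, the seed, and the bin count of 50 are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic revenues (assumed values): 10,000 companies near $90K
# plus three companies with revenue in the billions.
typical = rng.normal(loc=90_000, scale=15_000, size=10_000)
outliers = np.array([1.2e9, 3.5e9, 8.0e9])
revenue = np.concatenate([typical, outliers])

# The bin range spans min..max, so each of the 50 bins is roughly $160M wide.
counts, edges = np.histogram(revenue, bins=50)

# Every typical value lands in the first bin; only the outliers fall elsewhere.
share_in_first_bin = counts[0] / counts.sum()
print(f"first bin holds {share_in_first_bin:.2%} of all values")
```

With a bin width this large, the plot is effectively one bar plus a few nearly invisible ones, which is the situation the truncated histogram is meant to address.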

Changes

  • config.py: Added percentile_cutoff parameter to Histogram class (default: 0.0)
  • summary_algorithms.py: Compute truncated histogram when percentile_cutoff > 0
  • render_real.py: Display truncated histogram as a new tab in the report
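
As a rough illustration of the truncation step, here is a minimal NumPy sketch. It is not the code in summary_algorithms.py: the helper name `truncated_histogram` is hypothetical, and it assumes that the cutoff is applied symmetrically (0.05 keeps the 5th to 95th percentile) and that values outside the range are dropped rather than clamped to the bounds.

```python
import numpy as np


def truncated_histogram(values, percentile_cutoff, bins=50):
    """Hypothetical helper: histogram of values inside a percentile range.

    Assumes a symmetric cutoff: percentile_cutoff=0.05 keeps the data
    between the 5th and 95th percentile and drops everything else.
    """
    if not 0.0 < percentile_cutoff < 0.5:
        raise ValueError("percentile_cutoff must be in the open interval (0, 0.5)")

    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values)]  # ignore NaN and +/-inf

    # Lower and upper bounds of the retained range.
    lower, upper = np.quantile(values, [percentile_cutoff, 1.0 - percentile_cutoff])

    # Keep only values inside the bounds, then bin over exactly that range.
    kept = values[(values >= lower) & (values <= upper)]
    return np.histogram(kept, bins=bins, range=(lower, upper))


# Example: 100 ordinary values plus one extreme outlier.
data = np.concatenate([np.arange(1, 101), [1e9]])
counts, edges = truncated_histogram(data, percentile_cutoff=0.05, bins=10)
```

Passing `range=(lower, upper)` keeps the bin edges aligned with the percentile bounds, so the outlier no longer dictates the bin width.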

Usage

from ydata_profiling import ProfileReport

profile = ProfileReport(
    df,  # an existing pandas DataFrame
    plot={"histogram": {"percentile_cutoff": 0.05}},
)

The truncated histogram helps visualize distributions with extreme outliers by
clipping data to a specified percentile range (e.g., the 5th to 95th percentile).

Development

Successfully merging this pull request may close these issues.

histogram calculated on truncated data (e.g. 5 to 95 percentile)
