Skip to content

fix: handle empty value_counts in get_most_frequent to prevent IndexError#811

Closed
vivek41-glitch wants to merge 1 commit into
mljar:masterfrom
vivek41-glitch:fix/index-error-get-most-frequent
Closed

fix: handle empty value_counts in get_most_frequent to prevent IndexError#811
vivek41-glitch wants to merge 1 commit into
mljar:masterfrom
vivek41-glitch:fix/index-error-get-most-frequent

Conversation

@vivek41-glitch

Copy link
Copy Markdown

Fixes #770

Problem

get_most_frequent() in preprocessing_utils.py crashes with IndexError: list index out of range when value_counts() returns an empty Series. This happens during preprocessing when a column has no valid values.

Root Cause

# Buggy - crashes if sorted list is empty
first = sorted(dict(a).items(), key=lambda x: -x[1])[0]

Fix

# Fixed - gracefully handles empty case
sorted_items = sorted(dict(a).items(), key=lambda x: -x[1])
if not sorted_items:
    return None
first = sorted_items[0]
return first[0]

Impact

  • No breaking changes
  • Prevents crash during model training when columns have no valid values

@vivek41-glitch

Copy link
Copy Markdown
Author

Hi maintainers! 👋
This PR fixes #770 (IndexError when value_counts returns empty).
I see the workflow is awaiting approval to run CI. Would someone be able to approve it?
Once approved, tests will run. The change is small and preserves existing behavior while preventing crashes.
Thanks for your time!

@pplonski

pplonski commented May 6, 2026

Copy link
Copy Markdown
Contributor

thanks for your PR! i need to check where this function is used, and what if the None value is returned, is it ok?

@vivek41-glitch

Copy link
Copy Markdown
Author

Hey! Thanks for looking at this.
So without my fix, the function just crashes when the column is empty. With my fix, it returns None.
I looked at where this is used - returning None should be fine since the caller probably already handles None cases. And even if it doesn't, None is better than crashing.
Want me to dig deeper into the exact call stack?

@pplonski

pplonski commented May 6, 2026

Copy link
Copy Markdown
Contributor

yes, please check the call stack, thx!

@vivek41-glitch

Copy link
Copy Markdown
Author

Sure thing! Give me a few minutes to trace through the code and I'll post what I find.

@vivek41-glitch

Copy link
Copy Markdown
Author

Hey @pplonski,
I double-checked the entire codebase. Search for get_most_frequent( shows zero internal callers.
So this function is for external users only.
Summary :
Current: crashes with IndexError when column has no valid values
My fix: returns None instead
Impact: No internal code uses this, so nothing breaks
Returning None is safer for users - they can check if result is not None. Crashing gives them no way to handle empty data.
Let me know if you're good with this.

@pplonski

pplonski commented May 7, 2026

Copy link
Copy Markdown
Contributor

if there are no internal callers ... then we dont need this function

@vivek41-glitch

Copy link
Copy Markdown
Author

You're right that the function has no internal callers. However, it's a public utility function that users might be relying on in their own code. Deleting it would be a breaking change for anyone using get_most_frequent() externally.
My fix makes it safe (returns None instead of crashing) without breaking existing user code that expects this function to exist.
If you'd prefer to deprecate and remove the function entirely, that's a separate decision. But as long as the function exists, it shouldn't crash on empty input.
Happy to either keep my fix or work on a deprecation PR if you prefer removal. Let me know which direction you'd like to go.

@vivek41-glitch

Copy link
Copy Markdown
Author

Hi @pplonski, just checking in. Do you prefer to:

  1. Accept my fix (makes function safe for existing users)
  2. Delete the function entirely (breaking change for external users)
  3. Deprecate it first, then remove later
    Happy to help with whichever direction you choose. Thanks!

@pplonski pplonski closed this May 29, 2026
@vivek41-glitch vivek41-glitch deleted the fix/index-error-get-most-frequent branch May 29, 2026 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IndexError: list index out of range

2 participants