Refactor Apriori Algorithm: Fix Pruning Logic, Add Type Hints, and Pass Lint Checks#12698
Closed
FardinMoghaddamPour wants to merge 5 commits into
…generation
Brief: Improved pruning logic and fixed core support count in Apriori function.
Description: Rewrote prune logic and fixed key issues in candidate generation
to ensure accurate itemset frequency counting, pruning, and ordering for output.
Explanation:
1. Rewrote the `prune` function to validate (k-1)-subsets correctly.
2. The previous version misused list and count logic in the pruning process.
3. Candidate generation now uses proper set union to join k-itemsets.
4. Added a conversion through a set of frozensets to deduplicate candidates safely.
5. Fixed incorrect initial support counting by replacing flawed loop logic.
6. Output of `apriori` is now consistently sorted for testing and readability.
7. Updated doctests to match new and correct support count outputs.
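The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `prune`, `generate_candidates`, and `apriori`, their signatures, the use of an absolute `min_support` count, and the output shape are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def prune(
    candidates: set[frozenset[str]], prev_frequent: set[frozenset[str]]
) -> set[frozenset[str]]:
    """Keep only candidates whose every (k-1)-subset is itself frequent."""
    return {
        cand
        for cand in candidates
        if all(
            frozenset(sub) in prev_frequent
            for sub in combinations(cand, len(cand) - 1)
        )
    }


def generate_candidates(
    prev_frequent: set[frozenset[str]], k: int
) -> set[frozenset[str]]:
    """Join (k-1)-itemsets by set union; the set removes duplicate joins."""
    return {
        a | b for a in prev_frequent for b in prev_frequent if len(a | b) == k
    }


def apriori(
    transactions: list[list[str]], min_support: int
) -> list[tuple[list[str], int]]:
    """Return (sorted itemset, support count) pairs in a stable order.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    baskets = [frozenset(t) for t in transactions]
    # Initial support: each item counts once per transaction containing it.
    item_counts = Counter(
        frozenset([item]) for basket in baskets for item in basket
    )
    frequent = {s: c for s, c in item_counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        candidates = prune(generate_candidates(prev, k), prev)
        candidate_counts = {
            cand: sum(1 for basket in baskets if cand <= basket)
            for cand in candidates
        }
        frequent = {
            s: c for s, c in candidate_counts.items() if c >= min_support
        }
        result.update(frequent)
        k += 1
    # Sort by itemset size, then lexicographically, so output is deterministic.
    return sorted(
        ((sorted(s), c) for s, c in result.items()),
        key=lambda pair: (len(pair[0]), pair[0]),
    )
```

With four example baskets, `apriori([["milk", "bread"], ["milk", "butter"], ["milk", "bread", "butter"], ["bread"]], 2)` keeps the three single items plus `["bread", "milk"]` and `["butter", "milk"]`. The 3-itemset is pruned before counting because its subset `{"bread", "butter"}` is not frequent.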
Conclusion:
This change corrects both logic and structure of the Apriori algorithm, ensuring
reliable pruning, accurate support calculation, and stable output format. It also
resolves structural design issues in candidate creation, making the code more
maintainable and testable. The refactor is essential for correctness and scaling.
for more information, see https://pre-commit.ci
…liant)
Brief: Applied Ruff and MyPy fixes: line length, type hints, import sorting.
Description: This commit resolves all Ruff and MyPy linter errors related to style,
formatting, and type safety to ensure full pre-commit compatibility and correctness.
Explanation:
1. Reformatted import statements to match standard alphabetical order (I001).
2. Wrapped overly long lines in docstrings to comply with line length limits (E501).
3. Replaced generator expression inside `set()` with set comprehension (C401).
4. Removed redundant `list()` call inside `sorted()` during candidate generation (C414).
5. Added missing type annotations for `item_counts` and `candidate_counts` to satisfy MyPy.
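For readers unfamiliar with these rule codes, here is a small before/after sketch of the kind of change each one asks for. The variable names `item_counts` and `candidate_counts` come from the commit message; their types and the surrounding data are assumptions made for illustration, not taken from the PR.

```python
from collections import defaultdict

transactions = [["milk", "bread"], ["milk", "butter"]]

# C401: prefer a set comprehension over set(<generator expression>).
# Before: items = set(item for t in transactions for item in t)
items = {item for t in transactions for item in t}

# C414: sorted() already returns a list, so an inner list() is redundant.
# Before: ordered = sorted(list(items))
ordered = sorted(items)

# MyPy cannot infer element types for containers that start out empty,
# so they need explicit annotations (the types here are assumed).
item_counts: dict[str, int] = defaultdict(int)
for t in transactions:
    for item in t:
        item_counts[item] += 1

candidate_counts: dict[frozenset[str], int] = {}
```

None of these changes alter behavior; they only affect style and static type checking.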
Conclusion:
These changes ensure the Apriori implementation conforms to all enforced code quality
standards (Ruff and MyPy). This improves readability, maintainability, and compatibility
with the repository’s CI system and contributor guidelines.
for more information, see https://pre-commit.ci
Describe your change:
Checklist:
Commit Summary
Refactored Apriori implementation for correctness, pruning logic, and support counting.
Explanation:
Rewrote `prune()` to validate (k-1)-subsets using frozensets and combinations.
Conclusion:
This PR addresses critical flaws in the original Apriori implementation,
ensuring correct frequent itemset mining, consistent output formatting,
and future extensibility. It improves performance, testability, and clarity.