find common tags #891
Signed-off-by: Saurabh Misra <misra.saurabh1@gmail.com>
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
PR Code Suggestions ✨
Explore these optional code suggestions:
| common_tags = articles[0].get("tags", []) | ||
| for article in articles[1:]: | ||
| common_tags = [tag for tag in common_tags if tag in article.get("tags", [])] | ||
| return set(common_tags) |
⚡️Codeflash found 8,026% (80.26x) speedup for find_common_tags in codeflash/result/common_tags.py
⏱️ Runtime : 583 milliseconds → 7.18 milliseconds (best of 96 runs)
📝 Explanation and details
The optimization achieves an 8,026% (80.26x) speedup by replacing repeated list membership tests with set operations when finding the tags common to all articles.
Key Changes:
- Initial conversion to set: `common_tags = set(articles[0].get("tags", []))` instead of keeping tags as a list
- Set intersection instead of list comprehension: `common_tags.intersection_update(article.get("tags", []))` replaces `[tag for tag in common_tags if tag in article.get("tags", [])]`
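The two changes above can be sketched side by side. This is a minimal illustration rather than the file from the PR: the names `find_common_tags_list` and `find_common_tags_set` are hypothetical, and the empty-list guard is an assumption inferred from the generated tests, which expect an empty set for `[]`.

```python
from __future__ import annotations


def find_common_tags_list(articles: list[dict]) -> set:
    """List-based version, following the original lines shown in the diff."""
    if not articles:  # assumed guard, not visible in the diff excerpt
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        # Each "tag in list" check scans the article's tag list linearly.
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def find_common_tags_set(articles: list[dict]) -> set:
    """Set-based version, following the suggested replacement lines."""
    if not articles:  # assumed guard, not visible in the diff excerpt
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        # Keep only the tags that also appear in this article.
        common_tags.intersection_update(article.get("tags", []))
    return common_tags


articles = [
    {"tags": ["python", "coding"]},
    {"tags": ["python", "data"]},
    {"tags": ["python", "machine learning"]},
]
list_result = find_common_tags_list(articles)
set_result = find_common_tags_set(articles)
print(list_result, set_result)
```

Both versions use `.get("tags", [])`, so articles without a `tags` key are treated as having no tags.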
Why This Is Much Faster:
- O(1) vs O(n) lookups: The original code uses `tag in article.get("tags", [])`, which is O(n) for lists. Set membership testing is O(1) on average.
- Eliminates quadratic complexity: The original list comprehension performs O(n×m) comparisons per article, where n is the number of tags in common_tags and m is the number of tags in that article. Across k articles the total grows to O(k×n×m), which is multiplicative rather than exponential.
- In-place operations: `intersection_update` updates the existing set, so no new list is built and rebound on each iteration.
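One way to make the O(n×m) versus hash-lookup difference concrete is to count equality comparisons instead of measuring time. The sketch below is illustrative only: `CountingTag` is a hypothetical wrapper class that is not part of the PR, and the two tag lists are deliberately disjoint so that every list lookup scans the whole list.

```python
class CountingTag:
    """Hypothetical tag wrapper that counts how often __eq__ is called."""

    eq_calls = 0

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        CountingTag.eq_calls += 1
        return isinstance(other, CountingTag) and self.name == other.name

    def __hash__(self) -> int:
        return hash(self.name)


def count_eq_calls(func) -> int:
    """Run func and return how many equality comparisons it triggered."""
    CountingTag.eq_calls = 0
    func()
    return CountingTag.eq_calls


# Two disjoint tag lists of 50 distinct objects each.
first = [CountingTag(f"a{i}") for i in range(50)]
second = [CountingTag(f"b{i}") for i in range(50)]

# List version: every membership test compares against all 50 entries.
list_calls = count_eq_calls(lambda: [t for t in first if t in second])

# Set version: lookups go through hashes, so __eq__ is only needed on hash matches.
set_calls = count_eq_calls(lambda: set(first).intersection_update(second))

print(f"list comparisons: {list_calls}, set comparisons: {set_calls}")
```

The list version performs one comparison per pair of tags, while the set version only falls back to `__eq__` when two hashes match.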
Performance Impact by Test Case:
- Massive gains on large datasets: Tests with 1000+ tags show 5257-11201% speedups, demonstrating how the optimization scales
- Modest improvements on small datasets: Most simple cases show 10-50% improvements, while a few are near parity (2.62% faster in one case, 0.718% slower in another)
- Most effective when: Articles have many tags or there are many articles to process
The line profiler confirms this: the bottleneck line went from 99.6% of execution time (635ms) to just 79.3% (12.5ms), representing a ~50x improvement on the critical path. This optimization is particularly valuable for content management systems or tag analysis workflows processing large article datasets.
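To get a rough feel for the bottleneck line locally, the two statements can be timed in isolation with `timeit`. This is a sketch under assumptions: the list size of 300 and the repetition count are arbitrary choices, the set statement includes the cost of building the set, and the absolute numbers will differ from the figures reported above.

```python
import timeit

# Two identical 300-tag lists, so every tag is found (about half a scan on average).
setup = """
common = [f"tag{i}" for i in range(300)]
other = [f"tag{i}" for i in range(300)]
"""

list_stmt = "[tag for tag in common if tag in other]"
set_stmt = "set(common).intersection_update(other)"

# Total time for 200 executions of each statement.
list_seconds = timeit.timeit(list_stmt, setup=setup, number=200)
set_seconds = timeit.timeit(set_stmt, setup=setup, number=200)

print(f"list comprehension: {list_seconds:.6f}s")
print(f"set intersection:   {set_seconds:.6f}s")
```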
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | ✅ 2 Passed |
| 🌀 Generated Regression Tests | ✅ 29 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 2 Passed |
| 📊 Tests Coverage | 100.0% |
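A check of this kind can also be approximated by hand with randomized inputs. The sketch below is not how Codeflash verifies outputs; it compares a set-based implementation (with an assumed empty-list guard) against an independent brute-force oracle. The names `find_common_tags_set` and `brute_force_common` are hypothetical.

```python
import random


def find_common_tags_set(articles):
    """Set-based version following the suggested change."""
    if not articles:  # assumed guard, inferred from the empty-list tests
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        common_tags.intersection_update(article.get("tags", []))
    return common_tags


def brute_force_common(articles):
    """Independent oracle: a tag is common if every article lists it."""
    if not articles:
        return set()
    candidates = articles[0].get("tags", [])
    return {
        tag for tag in candidates
        if all(tag in article.get("tags", []) for article in articles)
    }


rng = random.Random(0)  # fixed seed so the run is repeatable
vocabulary = [f"tag{i}" for i in range(8)]

mismatches = 0
for _ in range(500):
    # Between 0 and 5 articles, each with a random subset of the vocabulary.
    articles = [
        {"tags": rng.sample(vocabulary, rng.randint(0, len(vocabulary)))}
        for _ in range(rng.randint(0, 5))
    ]
    if find_common_tags_set(articles) != brute_force_common(articles):
        mismatches += 1

print(f"mismatches: {mismatches}")
```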
⚙️ Existing Unit Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| test_common_tags.py::test_common_tags_1 | 5.05μs | 3.93μs | 28.3%✅ |
🌀 Generated Regression Tests and Runtime
# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests
def test_single_article():
    # Single article should return its tags
    articles = [{"tags": ["python", "coding", "tutorial"]}]
    codeflash_output = find_common_tags(articles)  # 1.55μs -> 1.24μs (25.5% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_with_common_tags():
    # Multiple articles with common tags should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python", "data"]},
        {"tags": ["python", "machine learning"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.90μs -> 2.32μs (25.0% faster)
    # Outputs were verified to be equal to the original implementation

def test_empty_list_of_articles():
    # Empty list of articles should return an empty set
    articles = []
    codeflash_output = find_common_tags(articles)  # 732ns -> 483ns (51.6% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_no_common_tags():
    # Articles with no common tags should return an empty set
    articles = [
        {"tags": ["python"]},
        {"tags": ["java"]},
        {"tags": ["c++"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.38μs -> 2.15μs (10.5% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_empty_tag_lists():
    # Articles with some empty tag lists should return an empty set
    articles = [
        {"tags": []},
        {"tags": ["python"]},
        {"tags": ["python", "java"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.94μs -> 1.95μs (0.718% slower)
    # Outputs were verified to be equal to the original implementation

def test_all_articles_with_empty_tag_lists():
    # All articles with empty tag lists should return an empty set
    articles = [
        {"tags": []},
        {"tags": []},
        {"tags": []}
    ]
    codeflash_output = find_common_tags(articles)  # 1.90μs -> 1.78μs (6.69% faster)
    # Outputs were verified to be equal to the original implementation

def test_tags_with_special_characters():
    # Tags with special characters should be handled correctly
    articles = [
        {"tags": ["python!", "coding"]},
        {"tags": ["python!", "data"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.16μs -> 1.78μs (21.3% faster)
    # Outputs were verified to be equal to the original implementation

def test_case_sensitivity():
    # Tags with different cases should not be considered the same
    articles = [
        {"tags": ["Python", "coding"]},
        {"tags": ["python", "data"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.91μs -> 1.74μs (9.55% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_number_of_articles():
    # Large number of articles with a common tag should return that tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(1000)]
    codeflash_output = find_common_tags(articles)  # 206μs -> 138μs (49.8% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_number_of_tags():
    # Large number of tags with some common tags should return the common tags
    articles = [
        {"tags": [f"tag{i}" for i in range(1000)]},
        {"tags": [f"tag{i}" for i in range(500, 1500)]}
    ]
    expected = {f"tag{i}" for i in range(500, 1000)}
    codeflash_output = find_common_tags(articles)  # 4.43ms -> 82.7μs (5257% faster)
    # Outputs were verified to be equal to the original implementation

def test_mixed_length_of_tag_lists():
    # Articles with mixed length of tag lists should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python"]},
        {"tags": ["python", "coding", "tutorial"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.42μs -> 2.12μs (13.9% faster)
    # Outputs were verified to be equal to the original implementation

def test_tags_with_different_data_types():
    # Tags with different data types should only consider strings
    articles = [
        {"tags": ["python", 123]},
        {"tags": ["python", "123"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.98μs -> 1.79μs (10.5% faster)
    # Outputs were verified to be equal to the original implementation

def test_performance_with_large_data():
    # Performance with large data should return the common tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(10000)]
    codeflash_output = find_common_tags(articles)  # 2.14ms -> 1.42ms (50.2% faster)
    # Outputs were verified to be equal to the original implementation

def test_scalability_with_increasing_tags():
    # Scalability with increasing tags should return the common tag
    articles = [{"tags": ["common_tag"] + [f"tag{i}" for i in range(j)]} for j in range(1, 1001)]
    codeflash_output = find_common_tags(articles)  # 458μs -> 341μs (34.2% faster)
    # Outputs were verified to be equal to the original implementation

# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests
def test_empty_input_list():
    # Test with an empty list
    codeflash_output = find_common_tags([])  # 575ns -> 484ns (18.8% faster)
    # Outputs were verified to be equal to the original implementation

def test_single_article():
    # Test with a single article with tags
    codeflash_output = find_common_tags([{"tags": ["python", "coding", "development"]}])  # 1.33μs -> 1.17μs (13.8% faster)
    # Test with a single article with no tags
    codeflash_output = find_common_tags([{"tags": []}])  # 478ns -> 404ns (18.3% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_some_common_tags():
    # Test with multiple articles having some common tags
    articles = [
        {"tags": ["python", "coding", "development"]},
        {"tags": ["python", "development", "tutorial"]},
        {"tags": ["python", "development", "guide"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.62μs -> 2.25μs (16.5% faster)
    articles = [
        {"tags": ["tech", "news"]},
        {"tags": ["tech", "gadgets"]},
        {"tags": ["tech", "reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.42μs -> 1.03μs (37.3% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_no_common_tags():
    # Test with multiple articles having no common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["development", "tutorial"]},
        {"tags": ["guide", "learning"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.08μs -> 2.02μs (2.62% faster)
    articles = [
        {"tags": ["apple", "banana"]},
        {"tags": ["orange", "grape"]},
        {"tags": ["melon", "kiwi"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.23μs -> 1.01μs (21.8% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_duplicate_tags():
    # Test with articles having duplicate tags
    articles = [
        {"tags": ["python", "python", "coding"]},
        {"tags": ["python", "development", "python"]},
        {"tags": ["python", "guide", "python"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.46μs -> 2.03μs (21.4% faster)
    articles = [
        {"tags": ["tech", "tech", "news"]},
        {"tags": ["tech", "tech", "gadgets"]},
        {"tags": ["tech", "tech", "reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.39μs -> 985ns (40.6% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_mixed_case_tags():
    # Test with articles having mixed case tags
    articles = [
        {"tags": ["Python", "Coding"]},
        {"tags": ["python", "Development"]},
        {"tags": ["PYTHON", "Guide"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.16μs -> 1.97μs (9.49% faster)
    articles = [
        {"tags": ["Tech", "News"]},
        {"tags": ["tech", "Gadgets"]},
        {"tags": ["TECH", "Reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.10μs -> 1.00μs (9.60% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_non_string_tags():
    # Test with articles having non-string tags
    articles = [
        {"tags": ["python", 123, "coding"]},
        {"tags": ["python", "development", 123]},
        {"tags": ["python", "guide", 123]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.68μs -> 2.05μs (30.5% faster)
    articles = [
        {"tags": [None, "news"]},
        {"tags": ["tech", None]},
        {"tags": [None, "reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.48μs -> 1.07μs (38.4% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_scale_test_cases():
    # Test with large scale input where all tags should be common
    articles = [
        {"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(100)
    ]
    expected_output = {"tag" + str(i) for i in range(1000)}
    codeflash_output = find_common_tags(articles)  # 385ms -> 3.41ms (11201% faster)
    # Test with large scale input where no tags should be common
    articles = [
        {"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(50)
    ] + [{"tags": ["unique_tag"]}]
    codeflash_output = find_common_tags(articles)  # 190ms -> 1.74ms (10851% faster)
    # Outputs were verified to be equal to the original implementation

from codeflash.result.common_tags import find_common_tags
def test_find_common_tags():
    find_common_tags([{}, {}])

def test_find_common_tags_2():
    find_common_tags([])

🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_2d1ideoq/tmpgwc7y34w/test_concolic_coverage.py::test_find_common_tags | 2.08μs | 1.80μs | 15.4%✅ |
| codeflash_concolic_2d1ideoq/tmpgwc7y34w/test_concolic_coverage.py::test_find_common_tags_2 | 659ns | 507ns | 30.0%✅ |
To test or edit this optimization locally: `git merge codeflash/optimize-pr891-2025-11-10T02.53.34`
| common_tags = articles[0].get("tags", []) | |
| for article in articles[1:]: | |
| common_tags = [tag for tag in common_tags if tag in article.get("tags", [])] | |
| return set(common_tags) | |
| common_tags = set(articles[0].get("tags", [])) | |
| for article in articles[1:]: | |
| common_tags.intersection_update(article.get("tags", [])) | |
| return common_tags |
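One behavioral difference worth checking before merging is input containing unhashable tags, which the generated tests do not exercise (they cover `int` and `None` tags, both hashable). The sketch below uses hypothetical function names and an assumed empty-list guard. Whether such input can occur in practice is not stated in the PR.

```python
def find_common_tags_list(articles):
    """List-based version from the original lines of the diff."""
    if not articles:  # assumed guard
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def find_common_tags_set(articles):
    """Set-based version from the suggested lines of the diff."""
    if not articles:  # assumed guard
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        common_tags.intersection_update(article.get("tags", []))
    return common_tags


# The first article carries an unhashable (list) tag that is not common.
articles = [{"tags": ["python", ["nested"]]}, {"tags": ["python"]}]

# The list version filters the unhashable tag out before calling set().
list_result = find_common_tags_list(articles)

# The set version hashes every tag of the first article up front.
try:
    set_result = find_common_tags_set(articles)
except TypeError as exc:
    set_result = f"TypeError: {exc}"

print(list_result)
print(set_result)
```

If tags are always strings, as in the PR's unit tests, the two versions return the same sets.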
PR Type
Enhancement, Tests
Description
Add common tags utility function
Implement tests for tag intersection
File Walkthrough
common_tags.py
Add tag intersection utility function
codeflash/result/common_tags.py
find_common_tags function.
test_common_tags.py
Add unit tests for common tags
tests/test_common_tags.py
find_common_tags.