Reddit multi-subreddit support + configurable targets. #297
AuraMindNest wants to merge 4 commits
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 Walkthrough
The Reddit activity tracker is extended from single-subreddit to multi-subreddit scraping.
Changes
Reddit Multi-Subreddit Support and Per-Subreddit Scoping
Sequence Diagram(s)
sequenceDiagram
participant CLI as Management Command
participant Collector as RedditActivityTrackerCollector
participant Session as RedditSession
participant Svc as services
participant State as RedditIncrementalState
CLI->>Collector: collect() with resolved subreddits list
loop for each subreddit
Collector->>Svc: get_latest_submission_created_utc(subreddit=name)
Collector->>Svc: get_latest_comment_created_utc(subreddit=name)
Collector->>Session: fetch_submissions_in_range(subreddit=name)
Session-->>Collector: submissions[]
Collector->>Session: fetch_comments_in_range(subreddit=name)
Session-->>Collector: comments[]
Collector->>Collector: _filter_submissions_by_keywords(keywords)
Collector->>Collector: _filter_comments_by_keywords(keywords)
Collector->>Collector: upsert records, write JSON, record cursor
end
Collector->>State: from_subreddit_cursors(submissions=..., comments=...)
State-->>Collector: _incremental_state_out (checkpoint_token, extras)
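The per-subreddit loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FakeSession`, `matches_keywords`, `collect`, and the `after_utc` parameter are hypothetical names, only submissions are handled (comments would follow the same pattern), and advancing the cursor over everything fetched rather than only what was kept is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSession:
    """Stand-in for RedditSession that returns canned data (hypothetical)."""

    submissions: dict[str, list[dict]] = field(default_factory=dict)

    def fetch_submissions_in_range(self, *, subreddit: str, after_utc: int) -> list[dict]:
        # `after_utc` is an assumed parameter name; the source only shows `subreddit=`.
        return [s for s in self.submissions.get(subreddit, []) if s["created_utc"] > after_utc]


def matches_keywords(text: str, keywords: list[str]) -> bool:
    # In this sketch an empty keyword list means "no filter".
    if not keywords:
        return True
    lowered = text.lower()
    return any(k.lower() in lowered for k in keywords)


def collect(session, subreddits, cursors, keyword_filters):
    """Return (kept submissions, updated per-subreddit cursors)."""
    kept = []
    new_cursors = dict(cursors)
    for name in subreddits:
        after = cursors.get(name, 0)  # per-subreddit watermark, not a global max
        fetched = session.fetch_submissions_in_range(subreddit=name, after_utc=after)
        keywords = keyword_filters.get(name, [])
        kept.extend(s for s in fetched if matches_keywords(s["title"], keywords))
        if fetched:
            # Assumption: the cursor advances past everything fetched.
            new_cursors[name] = max(s["created_utc"] for s in fetched)
    return kept, new_cursors


session = FakeSession(
    submissions={
        "cpp": [{"title": "Ranges question", "created_utc": 100}],
        "programming": [
            {"title": "Boost.Asio tips", "created_utc": 150},
            {"title": "Rust news", "created_utc": 200},
        ],
    }
)
kept, cursors = collect(
    session,
    ["cpp", "programming"],
    {"cpp": 50},
    {"programming": ["boost", "c++", "cpp"]},
)
print([s["title"] for s in kept])
print(cursors)
```

Keeping one cursor per subreddit means a newly added target starts from its own watermark instead of inheriting the latest timestamp seen in another subreddit.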
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (2)
reddit_activity_tracker/tests/test_collector_integration.py (1)
280-284: ⚡ Quick win: Assert override propagation for comment fetches too.
This test validates `--subreddits` only on submission calls; add the same check for `session.fetch_comments_in_range` so both required fetcher methods are contract-tested.
Suggested test assertion

      subreddit_args = [
          call.kwargs["subreddit"]
          for call in session.fetch_submissions_in_range.call_args_list
      ]
      assert subreddit_args == ["cpp_questions", "learnprogramming"]
    + comment_subreddit_args = [
    +     call.kwargs["subreddit"]
    +     for call in session.fetch_comments_in_range.call_args_list
    + ]
    + assert comment_subreddit_args == ["cpp_questions", "learnprogramming"]

reddit_activity_tracker/tests/test_services.py (1)
126-154: ⚡ Quick win: Strengthen global comment-max coverage with cross-subreddit data.
The current global-max comment assertion uses a single comment record, so it doesn't prove cross-subreddit max selection for comments. Add a second comment on another subreddit with a higher timestamp and assert that value.
Suggested test hardening

    - baker.make(
    + submission_programming = baker.make(
          RedditSubmission,
          reddit_submission_id="t3_b",
          subreddit="programming",
          title="B",
          url="https://example.com/b",
          permalink="/r/programming/comments/b/",
          created_utc=500,
      )
      baker.make(
          RedditComment,
          reddit_comment_id="t1_b",
          submission=submission,
          parent_id="t3_a",
          url="https://example.com/c",
          created_utc=200,
      )
    + baker.make(
    +     RedditComment,
    +     reddit_comment_id="t1_prog",
    +     submission=submission_programming,
    +     parent_id="t3_b",
    +     url="https://example.com/prog-c",
    +     created_utc=900,
    + )
      assert services.get_latest_submission_created_utc() == 500
    - assert services.get_latest_comment_created_utc() == 200
    + assert services.get_latest_comment_created_utc() == 900

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/settings.py`:
- Around line 568-575: The code assumes that _parsed_keyword_filters is a
dictionary after JSON parsing, but valid JSON like empty lists or strings will
pass json.loads() without error and then fail when calling .items() with an
AttributeError. Add a guard condition after parsing _parsed_keyword_filters to
check that it is actually a dictionary (using isinstance(dict)) before
attempting to call .items() on it in the dictionary comprehension for
REDDIT_SUBREDDIT_KEYWORD_FILTERS. If the parsed JSON is not a dictionary, either
skip setting the variable or set it to an empty dictionary.
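A minimal sketch of the guard described above, assuming the setting arrives as a raw JSON string. `parse_keyword_filters` is a hypothetical helper name, and falling back to an empty dictionary (rather than skipping the setting) is one of the two options the comment allows.

```python
import json


def parse_keyword_filters(raw: str) -> dict[str, list[str]]:
    """Hypothetical helper: parse a JSON string into {subreddit: [keywords]}."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    # Valid JSON such as [] or "abc" parses without error but has no .items().
    if not isinstance(parsed, dict):
        return {}
    return {
        str(name): [str(keyword) for keyword in keywords]
        for name, keywords in parsed.items()
        # Assumption of this sketch: entries whose value is not a list are ignored.
        if isinstance(keywords, list)
    }


print(parse_keyword_filters('{"programming": ["boost", "c++", "cpp"]}'))
print(parse_keyword_filters("[]"))
```

With this shape, a misconfigured value degrades to "no filters" instead of raising `AttributeError` while settings are loaded.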
In `@reddit_activity_tracker/management/commands/run_reddit_activity_tracker.py`:
- Around line 43-53: The _resolve_subreddit_targets function currently returns a
list of subreddit targets that may contain duplicates, leading to redundant API
calls and inflated metrics. After collecting targets from either the
command-line override (via _parse_subreddit_list) or the settings configuration
(via getattr), deduplicate the targets list before the validation check and
return statement. Convert the targets list to a set to remove duplicates, then
convert it back to a list to maintain the expected return type before returning
from the function.
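One caveat worth illustrating: a round trip through `set` does not preserve order, and the integration test in this PR asserts fetch calls in a specific order (`["cpp_questions", "learnprogramming"]`). The sketch below shows an order-preserving alternative using `dict.fromkeys`. The helper names are hypothetical stand-ins, not the PR's `_parse_subreddit_list`, and the prefix handling is an assumption based on the description that the `r/` prefix is optional.

```python
def parse_subreddit_list(raw: str) -> list[str]:
    """Hypothetical helper: split a comma-separated string into subreddit names."""
    names = []
    for part in raw.split(","):
        name = part.strip().lstrip("/")
        # The "r/" prefix is optional, so drop it when present.
        if name.lower().startswith("r/"):
            name = name[2:]
        if name:
            names.append(name)
    return names


def dedupe_preserving_order(targets: list[str]) -> list[str]:
    # dict keys keep insertion order (guaranteed since Python 3.7), unlike set().
    return list(dict.fromkeys(targets))


targets = dedupe_preserving_order(
    parse_subreddit_list("r/cpp, cpp_questions, cpp, r/programming")
)
print(targets)  # ['cpp', 'cpp_questions', 'programming']
```

This sketch compares names case-sensitively; whether `Cpp` and `cpp` should count as the same target is left open here.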
---
Nitpick comments:
In `@reddit_activity_tracker/tests/test_collector_integration.py`:
- Around line 280-284: The test currently validates that the subreddit override
propagates to submission fetches via session.fetch_submissions_in_range, but
does not verify the same behavior for comment fetches. Add a parallel assertion
block that extracts the subreddit arguments from
session.fetch_comments_in_range.call_args_list using the same pattern as the
existing submission assertion (accessing call.kwargs["subreddit"] for each
call), and assert that these subreddit arguments also match the expected list of
["cpp_questions", "learnprogramming"] to ensure both fetcher methods properly
receive the overridden subreddits.
In `@reddit_activity_tracker/tests/test_services.py`:
- Around line 126-154: The test
test_get_latest_submission_and_comment_created_utc_global_max currently only
creates a single RedditComment record with created_utc=200, which doesn't
validate cross-subreddit maximum selection for comments. Add a second
baker.make() call to create another RedditComment on the second submission (the
one in the "programming" subreddit) with a higher created_utc value than 200,
then update the assertion for services.get_latest_comment_created_utc() to
expect this new higher timestamp value to ensure the function correctly selects
the maximum comment timestamp across all subreddits.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 36add6d1-9e91-40a1-84d4-6bd70318b223
📒 Files selected for processing (12)
- `.env.example`
- `README.md`
- `config/boost_collector_schedule.yaml`
- `config/settings.py`
- `docs/service_api/reddit_activity_tracker.md`
- `reddit_activity_tracker/fetcher.py`
- `reddit_activity_tracker/management/commands/run_reddit_activity_tracker.py`
- `reddit_activity_tracker/protocol_impl.py`
- `reddit_activity_tracker/services.py`
- `reddit_activity_tracker/tests/test_collector_integration.py`
- `reddit_activity_tracker/tests/test_run_reddit_activity_tracker_command.py`
- `reddit_activity_tracker/tests/test_services.py`
Close #292.
Summary
Extend the Reddit activity tracker from a single hardcoded target (`r/cpp`) to a configurable list of subreddits. Targets are set via `REDDIT_SUBREDDITS` (comma-separated env var; `r/` prefix optional) or overridden per run with `--subreddits`. The collector iterates all configured subreddits in one scheduled run, using a single shared `RedditSession` so Reddit API rate limits apply across all targets.
Per-subreddit incremental cursors replace the previous global `Max(created_utc)` watermarks: `get_latest_submission_created_utc` and `get_latest_comment_created_utc` accept an optional `subreddit` argument, and `RedditIncrementalState` (new `protocol_impl.py`) records per-subreddit submission/comment timestamps via `load_incremental_state()` and `_incremental_state_out`. Broad subreddits can be narrowed with `REDDIT_SUBREDDIT_KEYWORD_FILTERS` (JSON env var; the default filters `r/programming` to boost/c++/cpp keywords). The fetcher no longer hardcodes `SUBREDDIT`; `fetch_submissions_in_range` and `fetch_comments_in_range` require an explicit `subreddit` parameter. Default targets: `cpp`, `cpp_questions`, `programming`. No model migration is required: `RedditSubmission.subreddit` is already indexed.
Apps touched
- `reddit_activity_tracker` (collector loop, fetcher, services, `protocol_impl.py`, tests)
- `config` (settings, schedule YAML)
- `docs/service_api` (`reddit_activity_tracker.md`)
- `.env.example`, `README.md`
Test plan
- `python -m pytest` (or scoped: `python -m pytest <app>/tests`)
- `uv run pyright` (if typed code changed)
- `lint-imports` (if imports or cross-app coupling changed)
Docs / coupling
Summary by CodeRabbit
reddit_activity_tracker/README with setup, settings, and command usage details.