fix(hf-daily-papers): default to yesterday + retry-with-fallback + earlier cron #150
Merged
Root cause (diagnosed from 2026-05-15T00:15Z scheduled-run failure with
HTTP 400):
- Cron fires at 23:59 UTC, but GitHub Actions queueing routinely delays the
start by 5-30 minutes, so the actual run starts past midnight UTC.
- _today_utc() returns the *new* UTC day after midnight crossover.
- HF's daily-papers bucket for the new day isn't populated until HF's
late-afternoon editorial pipeline runs → endpoint returns HTTP 400.
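The date arithmetic behind the failure can be reproduced in a few lines. This is a minimal sketch, assuming a 16-minute queue delay, which is what the 23:59 UTC schedule and the 00:15Z start time imply; the variable names are illustrative and do not come from the module.

```python
from datetime import datetime, timedelta, timezone

# Nominal cron slot, plus an assumed queue delay of 16 minutes
# (23:59 UTC scheduled, 00:15 UTC actual start).
scheduled = datetime(2026, 5, 14, 23, 59, tzinfo=timezone.utc)
actual_start = scheduled + timedelta(minutes=16)

# The date the job was meant to fetch vs. the date "today in UTC"
# resolves to once the run actually starts.
intended = scheduled.date().isoformat()
resolved = actual_start.date().isoformat()

print(intended)  # 2026-05-14
print(resolved)  # 2026-05-15
```

Any delay of more than one minute is enough to flip the date, which is why the job ended up requesting a bucket that had not been published yet.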
Three layered fixes:
1. **Default to yesterday-UTC** in _today_utc() (src/llmxive/hf_daily_papers.py):
HF buckets are filled by HF's editorial pipeline late in the day, so "today"
is not reliably published. Yesterday's bucket is guaranteed to be complete.
2. **Retry-with-fallback** in _fetch_daily_json(): on HTTP 400/404 for the
requested date, walk back one day and retry (configurable fallback_days,
default 1). 5xx and other errors propagate unchanged (transient HF
problems should not be silently swallowed).
3. **Move cron earlier** in .github/workflows/hf-daily-papers.yml:
23:59 UTC → 08:00 UTC. 08:00 UTC fetches yesterday's UTC bucket, which
is guaranteed-published by then regardless of schedule drift.
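Fixes 1 and 2 can be sketched together as follows. This is an illustration under stated assumptions, not the code from the PR: `yesterday_utc`, `fetch_with_fallback`, and the injected `fetch_one` callable are hypothetical names, and the sketch assumes the real fetcher raises `urllib.error.HTTPError` on a failed request.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta, timezone
from typing import Any, Callable, Optional
from urllib.error import HTTPError


def yesterday_utc(now: Optional[datetime] = None) -> str:
    """Return yesterday's UTC date as YYYY-MM-DD (stand-in for _today_utc)."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=1)).date().isoformat()


def fetch_with_fallback(
    day: str,
    fetch_one: Callable[[str], Any],
    fallback_days: int = 1,
) -> tuple[str, Any]:
    """Try `day`, then walk back one day at a time on HTTP 400/404.

    `fetch_one` is a hypothetical callable that performs a single request
    for one date and raises HTTPError on failure.
    """
    if fallback_days < 0:
        raise ValueError("fallback_days must be >= 0")
    current = date.fromisoformat(day)
    for attempt in range(fallback_days + 1):
        try:
            return current.isoformat(), fetch_one(current.isoformat())
        except HTTPError as err:
            # 5xx and other codes propagate; so does the last 400/404
            # once the fallback budget is used up.
            if err.code not in (400, 404) or attempt == fallback_days:
                raise
            current -= timedelta(days=1)
    raise AssertionError("unreachable")
```

Returning the effective date alongside the payload lets the caller tell whether the fallback fired without inspecting the payload itself.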
API change: fetch_top_papers() now returns (effective_date, list[Paper])
instead of just list[Paper]. The effective_date may differ from the
requested date when the fallback chain is triggered. submit_top_papers()
unpacks the tuple and, in that case, logs a fallback message to stderr.
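For callers, the new return shape might be consumed roughly as shown here. The helper `report_fallback`, the stub `fake_fetch_top_papers`, and the log wording are assumptions made for illustration; only the tuple shape and the stderr logging come from the description above.

```python
from __future__ import annotations

import sys


def report_fallback(requested: str, effective: str) -> bool:
    """Log to stderr when the effective date differs; return whether it did."""
    if effective != requested:
        print(
            f"[hf-daily-papers] no data for {requested}; fell back to {effective}",
            file=sys.stderr,
        )
        return True
    return False


def fake_fetch_top_papers(requested: str) -> tuple[str, list[str]]:
    """Stub with the new return shape: (effective_date, papers)."""
    return "2026-05-14", ["paper-a", "paper-b"]


requested = "2026-05-15"
effective_date, papers = fake_fetch_top_papers(requested)
fell_back = report_fallback(requested, effective_date)
```

A caller that still treats the return value as a plain list would now iterate over a two-element tuple, so every callsite has to be updated to unpack.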
Tests:
- 5 new TestDateFallback tests cover:
  - _today_utc returns yesterday
  - 400 triggers one-day fallback
  - 404 also triggers fallback
  - 5xx propagates unchanged
  - all-400 chain raises the last HTTPError
- All 17 HF tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause
The 2026-05-15T00:15Z scheduled run failed with HTTP 400. Analysis:
- _today_utc() returns the new UTC day (e.g. 2026-05-15) after midnight crossover.
- https://huggingface.co/api/daily_papers?date=2026-05-15 (queried at 00:16 UTC) returned HTTP 400.
Three layered fixes
1. Default to yesterday-UTC
In src/llmxive/hf_daily_papers.py, _today_utc() now returns (now_utc - 1 day). HF buckets are filled late in the day; yesterday is the safest default.
2. Retry-with-fallback on 400/404
_fetch_daily_json(date, fallback_days=1) now:
- Tries date first.
- On HTTP 400/404, walks back one day and retries, up to fallback_days times. 5xx and other errors propagate unchanged.
- Returns (effective_date, payload) so callers can log when the fallback fired.
fetch_top_papers() is updated to return (effective_date, list[Paper]). submit_top_papers() unpacks and logs a fallback message to stderr when triggered.
3. Move cron from 23:59 UTC → 08:00 UTC
.github/workflows/hf-daily-papers.yml schedule: "59 23 * * *" → "0 8 * * *". At 08:00 UTC, yesterday's bucket is guaranteed-published regardless of any scheduler drift.
Tests
5 new TestDateFallback tests in tests/unit/test_hf_daily_papers.py:
- test_today_utc_defaults_to_yesterday: _today_utc returns (now - 1 day)
- test_fetch_falls_back_on_400: 400 triggers exactly one fallback retry
- test_fetch_falls_back_on_404: 404 also triggers fallback
- test_fetch_does_not_swallow_5xx: 503 propagates unchanged
- test_fetch_raises_when_all_fallback_attempts_400: all-400 chain raises the last HTTPError
17/17 HF tests pass (existing 12 + 5 new).
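Tests of this kind can simulate HTTP failures without any network access by giving a mock a sequence of side effects. The sketch below is illustrative only: `fetch_over_dates` is a simplified, hypothetical stand-in for the function under test, and the test names do not match the real suite.

```python
from __future__ import annotations

import unittest
from unittest import mock
from urllib.error import HTTPError


def http_error(code: int) -> HTTPError:
    """Build an HTTPError like the one urllib raises for a failed request."""
    return HTTPError("https://example.invalid", code, "simulated", None, None)


def fetch_over_dates(fetch_one, dates):
    """Hypothetical stand-in: try each date in order, retrying only on 400/404.

    Assumes `dates` is non-empty.
    """
    last_error = None
    for day in dates:
        try:
            return day, fetch_one(day)
        except HTTPError as err:
            if err.code not in (400, 404):
                raise  # 5xx and other codes are not swallowed
            last_error = err
    raise last_error


DATES = ["2026-05-15", "2026-05-14"]


class TestDateFallbackSketch(unittest.TestCase):
    def test_400_triggers_exactly_one_retry(self):
        fetch_one = mock.Mock(side_effect=[http_error(400), ["paper"]])
        effective, payload = fetch_over_dates(fetch_one, DATES)
        self.assertEqual(effective, "2026-05-14")
        self.assertEqual(payload, ["paper"])
        self.assertEqual(fetch_one.call_count, 2)

    def test_503_propagates_without_retry(self):
        fetch_one = mock.Mock(side_effect=http_error(503))
        with self.assertRaises(HTTPError) as ctx:
            fetch_over_dates(fetch_one, DATES)
        self.assertEqual(ctx.exception.code, 503)
        self.assertEqual(fetch_one.call_count, 1)

    def test_all_400_raises_last_error(self):
        errors = [http_error(400), http_error(400)]
        fetch_one = mock.Mock(side_effect=errors)
        with self.assertRaises(HTTPError) as ctx:
            fetch_over_dates(fetch_one, DATES)
        self.assertIs(ctx.exception, errors[-1])
```

Saved to a file, the cases can be run with `python -m unittest`. Checking `call_count` is what separates "fell back exactly once" from "eventually succeeded".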
Backward compat
The only API-shape change is fetch_top_papers returning a tuple. All 5 existing test callsites were updated to unpack. No external callers exist outside this module + tests.
🤖 Generated with Claude Code