feat(dataflows): CLS / Cninfo / Eastmoney news for HK and CN A-share tickers #792

Open. chitboon wants to merge 5 commits into TauricResearch:main from chitboon:feat/news-source-vendors

@chitboon

Summary

Adds first-class news coverage for HK and CN A-share (SH / SZ) tickers
via three new vendors plus a ticker-suffix auto-router. Purely additive at the
data layer — US tickers are unchanged.

| Suffix | Vendors |
|---|---|
| .HK | Eastmoney + CLS |
| .SS / .SH / .SZ | Eastmoney + CLS + Cninfo |
| else (incl. US) | yfinance (existing behaviour) |

The router engages when data_vendors.news_data is set to "auto". This PR
flips the default from "yfinance" to "auto" so coverage is on out of the
box; setting it back to "yfinance" (or "alpha_vantage") restores legacy
single-vendor behaviour.
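For reviewers, the suffix dispatch described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function name `resolve_auto_vendors`, the `_SUFFIX_VENDORS` table, and the case-insensitive matching are assumptions; only the suffix-to-vendor mapping is taken from the table above.

```python
from typing import List

# Hypothetical routing table that mirrors the suffix table in the PR description.
_SUFFIX_VENDORS = {
    ".HK": ["eastmoney", "cls"],
    ".SS": ["eastmoney", "cls", "cninfo"],
    ".SH": ["eastmoney", "cls", "cninfo"],
    ".SZ": ["eastmoney", "cls", "cninfo"],
}


def resolve_auto_vendors(ticker: str, configured: str = "auto") -> List[str]:
    """Return the ordered vendor list for a ticker (illustrative only)."""
    # Any explicit setting ("yfinance", "alpha_vantage") keeps single-vendor behaviour.
    if configured != "auto":
        return [configured]
    upper = ticker.strip().upper()  # assumption: suffix matching ignores case
    for suffix, vendors in _SUFFIX_VENDORS.items():
        if upper.endswith(suffix):
            return list(vendors)
    # Everything else, including US tickers, stays on yfinance.
    return ["yfinance"]


print(resolve_auto_vendors("0700.HK"))
print(resolve_auto_vendors("AAPL"))
```

Under these assumptions a `.SI` ticker falls through to yfinance, since only `.HK`, `.SS`, `.SH` and `.SZ` are mapped.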

What's in this PR (3 commits)

  1. Per-vendor cache scaffolding — daily / weekly bucket cache module used
    by the new vendors. Reduces repeat-fetch cost when analysing multiple
    tickers in the same day or re-running the same ticker the same week.
  2. yfinance ticker normaliser (.SH → .SS): Yahoo only accepts .SS
    for Shanghai; users pasting .SH from other sources now route correctly.
    Also tightens get_global_news_yfinance from 4 overlapping queries to 2,
    roughly halving macro-news fetch latency.
  3. CLS / Cninfo / Eastmoney vendors + auto-router — vendor modules,
    route_to_vendor multi-source merge, default_config flip, akshare>=1.18
    dep, tests.

What's NOT in this PR (held back deliberately)

  • SGX (Singapore Business Times RSS) vendor — needs more end-to-end
    testing before shipping; will follow as a separate PR.
  • Analyst-prefetch refactor: the consuming news_analyst.py already
    routes through get_news → route_to_vendor, so the new vendors engage
    without analyst-side changes. The single-shot rewrite is a separate
    architectural concern and not part of this PR.

Failure modes

  • A per-vendor failure or [Vendor skip — ...] marker is dropped from the
    multi-source merge; the analyst sees whichever sources succeeded.
  • If all sources fail / skip, the router falls through to yfinance as a final
    safety net — never raises.
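A minimal sketch of the merge-and-fallback behaviour listed above, for discussion only. The names `merge_news`, `fetchers` and `fallback` are hypothetical, the `[Vendor skip` prefix check is inferred from the marker format quoted in this description, and returning an empty string when the fallback itself fails is an assumption (the PR only states that the router never raises).

```python
from typing import Callable, Dict, List

SKIP_PREFIX = "[Vendor skip"  # prefix of the skip marker quoted in the PR text


def merge_news(
    ticker: str,
    vendors: List[str],
    fetchers: Dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    """Call vendors in order, keep usable blocks, fall back if none succeed."""
    blocks = []
    for name in vendors:
        try:
            text = fetchers[name](ticker)
        except Exception:
            continue  # a failing vendor is dropped from the merge
        if not text or not text.strip():
            continue  # empty results are dropped
        if text.lstrip().startswith(SKIP_PREFIX):
            continue  # skip markers are dropped
        blocks.append(text.strip())
    if blocks:
        # Markdown blocks separated by horizontal rules.
        return "\n\n---\n\n".join(blocks)
    try:
        return fallback(ticker)  # final safety net (yfinance in the PR)
    except Exception:
        return ""  # assumption: swallow the error so the caller never sees a raise
```

The sequential loop matches the commit message ("Multi-vendor calls run sequentially"); running vendors concurrently would be a separate design decision.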

Tests

  • 4 new test files; all use unittest.mock at the akshare import boundary
    — no live network calls in CI.
  • Covers: routing per suffix, multi-vendor merge, skip-marker filtering,
    cache hit/miss across daily and weekly buckets, Cninfo's two known schema
    shapes (公告链接 direct vs announcementId + orgId reconstruction).
  • Full suite remains fast (~5s).
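To illustrate what "mocking at the akshare import boundary" can look like, here is a hedged sketch. `fetch_raw_news` is a hypothetical stand-in for a vendor function, and the use of `ak.stock_news_em` is an assumption about which akshare call a vendor might make; the PR's actual tests may be structured differently. Because the import happens inside the function, a fake module placed in `sys.modules` is picked up at call time and no network access occurs.

```python
import sys
import types
from unittest import mock


def fetch_raw_news(symbol: str):
    """Hypothetical vendor function that imports akshare lazily."""
    import akshare as ak  # resolved from sys.modules when the function runs

    return ak.stock_news_em(symbol=symbol)


# Build a fake "akshare" module exposing only the attribute the vendor touches.
fake_ak = types.ModuleType("akshare")
fake_ak.stock_news_em = mock.Mock(return_value=[{"title": "stub headline"}])

# patch.dict restores sys.modules on exit, so the fake does not leak into other tests.
with mock.patch.dict(sys.modules, {"akshare": fake_ak}):
    result = fetch_raw_news("600519")

fake_ak.stock_news_em.assert_called_once_with(symbol="600519")
```

The same lazy-import pattern is what keeps the import cost at zero for users who only analyse US tickers.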

New dep

  • akshare>=1.18 — wraps Eastmoney + Cninfo endpoints. Imported lazily
    inside the vendor modules; users analysing only US tickers pay no import
    cost.

Test plan

  • python -m pytest tests/ — all green locally
  • _resolve_auto_vendors smoke test covers .HK, .SS, .SH, .SZ, .SI,
    AAPL, and get_global_news
  • Reviewer to confirm akshare install lands cleanly in CI
  • Reviewer to confirm the news_data: "auto" default flip is desired —
    happy to revert to "yfinance" and document "auto" opt-in if preferred

🤖 Generated with Claude Code


gemini-code-assist (bot) left a comment

Code Review

This pull request introduces comprehensive support for Chinese-market data (HK, Shanghai, and Shenzhen) by integrating the akshare library and adding new news vendors: Eastmoney, CLS, and Cninfo. It implements an "auto" routing mechanism that selects the appropriate vendor based on ticker suffixes and merges results from multiple sources for Asian markets. Additionally, a new on-disk caching layer is introduced to optimize performance for slow-changing data like financial statements and global news. Feedback was provided regarding the caching mechanism to ensure atomic file writes, preventing potential data corruption during concurrent access.

Comment on lines +64 to +65
with open(_path_for(key), "w", encoding="utf-8") as f:
    f.write(value)

medium

Writing directly to the cache file can lead to corrupted data if the process is interrupted or if multiple processes attempt to write to the same file simultaneously. It is safer to write to a temporary file first and then use os.replace() to perform an atomic update.

Suggested change
- with open(_path_for(key), "w", encoding="utf-8") as f:
-     f.write(value)
+ temp_path = f"{_path_for(key)}.tmp"
+ with open(temp_path, "w", encoding="utf-8") as f:
+     f.write(value)
+ os.replace(temp_path, _path_for(key))

Author

Addressed in f939f5f: _write now uses tmp + os.replace for an atomic publish.

chitboon force-pushed the feat/news-source-vendors branch from 729dce7 to 8267393 on May 10, 2026 12:19
chitboon added a commit to chitboon/TradingAgents that referenced this pull request on May 10, 2026
Addresses inline review feedback on PR TauricResearch#792 from gemini-code-assist:
streaming directly to the cache file leaves the file half-written if
the process is interrupted, and races between concurrent writers can
yield a corrupted blob. Writing to ``<path>.tmp`` first and then
``os.replace`` makes the publish atomic on POSIX and Windows — a
crash mid-write leaves the prior cache file untouched, and concurrent
writers race only on the rename.
@chitboon
Author

Thanks for the catch — addressed in f939f5f. _write now streams to <path>.tmp and uses os.replace to publish atomically, so a crash mid-write leaves the prior cache file intact and concurrent writers race only on the rename.
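For reference, a self-contained sketch of the write-then-rename pattern discussed in this thread. It is a variation, not the code in f939f5f: the helper name `atomic_write_text` is hypothetical, and it uses `tempfile.mkstemp` to give each writer its own temp file, whereas the change in this PR writes to a fixed `<path>.tmp`.

```python
import os
import tempfile


def atomic_write_text(path: str, value: str) -> None:
    """Write value to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=os.path.basename(path) + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(value)
        os.replace(tmp_path, path)  # atomic publish; overwrites an existing file
    except BaseException:
        # Clean up the temp file if the write or the rename failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A unique temp name avoids two concurrent writers interleaving bytes in a shared `.tmp` file, at the cost of possibly leaving a stray temp file behind if the process is killed outright.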

chitboon and others added 5 commits May 11, 2026 14:41
Adds a small daily / weekly bucket cache used by the upcoming non-US
news vendors (CLS, Cninfo, Eastmoney). Daily buckets for breaking
news; weekly (ISO-week-Monday-keyed) buckets for Cninfo since
disclosures spike at quarter-end.

Standalone module — unused at this commit. The vendor wiring lands
in a follow-up commit on this branch.

Adds normalize_ticker_for_yfinance() and applies it in the yfinance
news fetcher. Different platforms expose Shanghai tickers with either
.SH or .SS — yfinance only accepts .SS, so user input from any source
now resolves to a fetchable symbol.
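As a rough illustration of the normalisation described above: the function name comes from this commit message, but the body below is an assumption (in particular the whitespace stripping and upper-casing), not the committed implementation.

```python
def normalize_ticker_for_yfinance(ticker: str) -> str:
    """Map a Shanghai .SH suffix to the .SS suffix that yfinance accepts."""
    cleaned = ticker.strip().upper()  # assumption: input is trimmed and upper-cased
    if cleaned.endswith(".SH"):
        return cleaned[: -len(".SH")] + ".SS"
    return cleaned  # every other ticker passes through unchanged


print(normalize_ticker_for_yfinance("600519.SH"))
```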

Also tightens get_global_news_yfinance from four broad queries to two
high-signal queries. Roughly halves the macro-news fetch latency
without losing distinct signal (the four were heavily overlapping).

Adds three news vendors covering HK and CN A-share (SH / SZ) tickers:

  * Eastmoney  — editorial Chinese-language news (HK / SH / SZ)
  * CLS        — Cailianshe flash-news stream (HK / SH / SZ)
  * Cninfo     — official CSRC corporate disclosures (SH / SZ only)

The vendors are wired into route_to_vendor via a ticker-suffix
auto-router. When data_vendors.news_data = "auto" (now the default),
the router dispatches:

  * .HK            -> eastmoney + cls
  * .SS / .SH / .SZ -> eastmoney + cls + cninfo
  * everything else -> yfinance (existing behaviour)

Multi-vendor calls run sequentially and concatenate non-empty results
as Markdown blocks separated by horizontal rules. Per-vendor failures
and "[Vendor skip — ...]" markers are dropped from the merge so the
analyst sees whichever sources succeeded; if all sources fail or skip,
we fall through to yfinance as a final safety net.

Each vendor call goes through the per-vendor cache added in the
preceding commit — daily buckets for editorial / flash news, weekly
buckets (ISO-week-Monday-keyed) for Cninfo since disclosures spike at
quarter-end.
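The bucket scheme can be pictured with the sketch below. The helper names and the `vendor:ticker:bucket` key layout are hypothetical; only the daily versus Monday-keyed weekly split comes from this commit message.

```python
from datetime import date, timedelta


def daily_bucket(day: date) -> str:
    """One bucket per calendar day."""
    return day.isoformat()


def weekly_bucket(day: date) -> str:
    """One bucket per ISO week, keyed by that week's Monday."""
    monday = day - timedelta(days=day.weekday())  # weekday() is 0 for Monday
    return monday.isoformat()


def cache_key(vendor: str, ticker: str, day: date, weekly: bool = False) -> str:
    """Hypothetical key layout: vendor, ticker, then the bucket label."""
    bucket = weekly_bucket(day) if weekly else daily_bucket(day)
    return f"{vendor}:{ticker}:{bucket}"


print(cache_key("cls", "0700.HK", date(2026, 5, 13)))
print(cache_key("cninfo", "600519.SS", date(2026, 5, 13), weekly=True))
```

With this layout, every Cninfo request made between a Monday and the following Sunday shares one key, while the daily vendors get a fresh key each day.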

Adds akshare>=1.18 as a runtime dep (Eastmoney / Cninfo backends).
Imported lazily inside the vendors so users analysing only US tickers
pay no import cost.

Tests cover routing, the multi-vendor merge, skip-marker filtering,
cache hit/miss, and Cninfo's two known schema shapes (公告链接 vs
announcementId+orgId). All vendor calls are mocked at the akshare
import boundary — no live network.

Addresses inline review feedback on PR TauricResearch#792 from gemini-code-assist:
streaming directly to the cache file leaves the file half-written if
the process is interrupted, and races between concurrent writers can
yield a corrupted blob. Writing to ``<path>.tmp`` first and then
``os.replace`` makes the publish atomic on POSIX and Windows — a
crash mid-write leaves the prior cache file untouched, and concurrent
writers race only on the rename.

These tests mutate the global config (data_vendors, tool_vendors) but
had no tearDown, so the mutations leaked into subsequent test files
that read tool-level routing — causing intermittent failures in the
new news-vendor tests after rebase onto main.

The straightforward set_config(deepcopy(DEFAULT_CONFIG)) doesn't
suffice because set_config merges nested dicts one level deep, so
tool_vendors entries set in earlier tests survive the update. The
tearDown directly replaces the global _config to defeat the merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chitboon force-pushed the feat/news-source-vendors branch from f939f5f to 499d611 on May 11, 2026 07:23