feat(dataflows): CLS / Cninfo / Eastmoney news for HK and CN A-share tickers #792
chitboon wants to merge 5 commits into
Code Review
This pull request introduces comprehensive support for Chinese-market data (HK, Shanghai, and Shenzhen) by integrating the akshare library and adding new news vendors: Eastmoney, CLS, and Cninfo. It implements an "auto" routing mechanism that selects the appropriate vendor based on ticker suffixes and merges results from multiple sources for Asian markets. Additionally, a new on-disk caching layer is introduced to optimize performance for slow-changing data like financial statements and global news. Feedback was provided regarding the caching mechanism to ensure atomic file writes, preventing potential data corruption during concurrent access.
    with open(_path_for(key), "w", encoding="utf-8") as f:
        f.write(value)
Writing directly to the cache file can lead to corrupted data if the process is interrupted or if multiple processes attempt to write to the same file simultaneously. It is safer to write to a temporary file first and then use os.replace() to perform an atomic update.
Suggested change:

    - with open(_path_for(key), "w", encoding="utf-8") as f:
    -     f.write(value)
    + temp_path = f"{_path_for(key)}.tmp"
    + with open(temp_path, "w", encoding="utf-8") as f:
    +     f.write(value)
    + os.replace(temp_path, _path_for(key))
Addressed in f939f5f — _write now uses tmp + os.replace for atomic publish.
Force-pushed from 729dce7 to 8267393.
Thanks for the catch; addressed in f939f5f.
Adds a small daily / weekly bucket cache used by the upcoming non-US news vendors (CLS, Cninfo, Eastmoney). Daily buckets for breaking news; weekly (ISO-week-Monday-keyed) buckets for Cninfo since disclosures spike at quarter-end. Standalone module — unused at this commit. The vendor wiring lands in a follow-up commit on this branch.
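As a rough illustration of the bucket scheme described in this commit, the sketch below derives a daily key and an ISO-week-Monday key from a date. The function names and the key layout are hypothetical and are not taken from the PR's cache module.

```python
from datetime import date, timedelta


def daily_bucket(day: date) -> str:
    """Key that changes once per calendar day."""
    return day.isoformat()


def weekly_bucket(day: date) -> str:
    """Key for the ISO week containing `day`, named after that week's Monday."""
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
    return monday.isoformat()


def cache_key(vendor: str, ticker: str, bucket: str) -> str:
    """Hypothetical key layout: vendor, ticker and bucket joined by underscores."""
    return f"{vendor}_{ticker}_{bucket}"
```

Under this layout, every call made between Monday and Sunday of the same ISO week maps to one weekly key, so a re-run later in the week can reuse the stored result.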
Adds normalize_ticker_for_yfinance() and applies it in the yfinance news fetcher. Different platforms expose Shanghai tickers with either .SH or .SS — yfinance only accepts .SS, so user input from any source now resolves to a fetchable symbol. Also tightens get_global_news_yfinance from four broad queries to two high-signal queries. Roughly halves the macro-news fetch latency without losing distinct signal (the four were heavily overlapping).
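A minimal sketch of the suffix normalisation this commit describes. The function name comes from the commit message, but the body is an assumption about how it might work, including the choice to upper-case and trim the input.

```python
def normalize_ticker_for_yfinance(ticker: str) -> str:
    """Map a Shanghai '.SH' suffix to the '.SS' form; leave other tickers alone."""
    symbol = ticker.strip().upper()  # assumption: input is case-insensitive
    if symbol.endswith(".SH"):
        return symbol[: -len(".SH")] + ".SS"
    return symbol
```

For example, `normalize_ticker_for_yfinance("600519.SH")` yields `"600519.SS"`, while `"0700.HK"` and `"AAPL"` pass through unchanged.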
Adds three news vendors covering HK and CN A-share (SH / SZ) tickers:

* Eastmoney: editorial Chinese-language news (HK / SH / SZ)
* CLS: Cailianshe flash-news stream (HK / SH / SZ)
* Cninfo: official CSRC corporate disclosures (SH / SZ only)

The vendors are wired into route_to_vendor via a ticker-suffix auto-router. When data_vendors.news_data = "auto" (now the default), the router dispatches:

* .HK -> eastmoney + cls
* .SS / .SH / .SZ -> eastmoney + cls + cninfo
* everything else -> yfinance (existing behaviour)

Multi-vendor calls run sequentially and concatenate non-empty results as Markdown blocks separated by horizontal rules. Per-vendor failures and "[Vendor skip — ...]" markers are dropped from the merge so the analyst sees whichever sources succeeded; if all sources fail or skip, we fall through to yfinance as a final safety net.

Each vendor call goes through the per-vendor cache added in the preceding commit: daily buckets for editorial / flash news, weekly buckets (ISO-week-Monday-keyed) for Cninfo since disclosures spike at quarter-end.

Adds akshare>=1.18 as a runtime dep (Eastmoney / Cninfo backends). Imported lazily inside the vendors so users analysing only US tickers pay no import cost.

Tests cover routing, the multi-vendor merge, skip-marker filtering, cache hit/miss, and Cninfo's two known schema shapes (公告链接 vs announcementId+orgId). All vendor calls are mocked at the akshare import boundary, so there is no live network.
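The dispatch table in this commit message can be sketched as a small suffix lookup. The function name `resolve_news_vendors` is hypothetical (the PR's own helper is not reproduced here), and the case-insensitive matching is an assumption.

```python
def resolve_news_vendors(ticker: str) -> list[str]:
    """Pick news vendors from the ticker suffix, mirroring the dispatch table."""
    symbol = ticker.strip().upper()  # assumption: suffix match ignores case
    if symbol.endswith(".HK"):
        return ["eastmoney", "cls"]
    if symbol.endswith((".SS", ".SH", ".SZ")):
        return ["eastmoney", "cls", "cninfo"]
    # Everything else keeps the existing single-vendor behaviour.
    return ["yfinance"]
```

Keeping the table in one function means a new market suffix needs only one extra branch, and the fall-through guarantees that unrecognised tickers behave as before.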
Addresses inline review feedback on PR TauricResearch#792 from gemini-code-assist: streaming directly to the cache file leaves the file half-written if the process is interrupted, and races between concurrent writers can yield a corrupted blob. Writing to ``<path>.tmp`` first and then ``os.replace`` makes the publish atomic on POSIX and Windows — a crash mid-write leaves the prior cache file untouched, and concurrent writers race only on the rename.
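A sketch of the write-then-rename pattern described above. It is a variation rather than the PR's `_write`: it assumes a uniquely named temp file from `tempfile.mkstemp` instead of a fixed `<path>.tmp`, so concurrent writers do not share a temp file. The helper name `atomic_write_text` is hypothetical.

```python
import os
import tempfile


def atomic_write_text(path: str, value: str) -> None:
    """Write `value` to `path` so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file beside the target: os.replace is only atomic
    # when source and destination are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(value)
        os.replace(tmp_path, path)  # atomic publish
    except BaseException:
        # Writing or renaming failed: remove the temp file, keep the old cache.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before `os.replace`, the previous cache file is untouched and only a stray temp file remains.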
These tests mutate the global config (data_vendors, tool_vendors) but had no tearDown, so the mutations leaked into subsequent test files that read tool-level routing, causing intermittent failures in the new news-vendor tests after rebase onto main. The straightforward set_config(deepcopy(DEFAULT_CONFIG)) doesn't suffice because set_config merges nested dicts one level deep, so tool_vendors entries set in earlier tests survive the update. The tearDown directly replaces the global _config to defeat the merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from f939f5f to 499d611.
Summary
Adds first-class news coverage for HK and CN A-share (SH / SZ) tickers
via three new vendors plus a ticker-suffix auto-router. Purely additive at the
data layer — US tickers are unchanged.
* .HK -> eastmoney + cls
* .SS / .SH / .SZ -> eastmoney + cls + cninfo

The router engages when data_vendors.news_data is set to "auto". This PR flips the default from "yfinance" to "auto" so coverage is on out of the box; setting it back to "yfinance" (or "alpha_vantage") restores legacy single-vendor behaviour.
What's in this PR (3 commits)
* [Daily / weekly bucket cache used] by the new vendors. Reduces repeat-fetch cost when analysing multiple tickers in the same day or re-running the same ticker the same week.
* [Ticker normalisation] (.SH → .SS): Yahoo only accepts .SS for Shanghai; users pasting .SH from other sources now route correctly. Also tightens get_global_news_yfinance from 4 overlapping queries to 2, roughly halving macro-news fetch latency.
* [The three news vendors,] route_to_vendor multi-source merge, default_config flip, akshare>=1.18 dep, tests.
What's NOT in this PR (held back deliberately)
* […] testing before shipping; will follow as a separate PR.
* […] news_analyst.py already routes through get_news → route_to_vendor, so the new vendors engage without analyst-side changes. The single-shot rewrite is a separate architectural concern and not part of this PR.
Failure modes
* [A per-vendor failure or] "[Vendor skip — ...]" marker is dropped from the multi-source merge; the analyst sees whichever sources succeeded.
* [If all sources fail or skip, yfinance is the final] safety net; never raises.
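The two failure modes above can be sketched as a single merge function. This is an illustration under stated assumptions, not the PR's route_to_vendor: the name `merge_news`, the callable-based interface, and returning an empty string when the fallback itself fails are all hypothetical.

```python
from typing import Callable

SKIP_PREFIX = "[Vendor skip"  # start of the skip marker quoted above


def merge_news(
    ticker: str,
    vendors: list[Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    """Run vendors in order, keep usable blocks, fall back if none remain."""
    blocks = []
    for fetch in vendors:
        try:
            text = fetch(ticker)
        except Exception:
            continue  # a failing vendor is dropped, not propagated
        if not text or not text.strip():
            continue  # empty result
        if text.lstrip().startswith(SKIP_PREFIX):
            continue  # vendor declared itself not applicable
        blocks.append(text.strip())
    if blocks:
        # Markdown horizontal rule between vendor blocks.
        return "\n\n---\n\n".join(blocks)
    try:
        return fallback(ticker)
    except Exception:
        return ""  # assumption: an empty string stands in for "never raises"
```

Running the vendors sequentially keeps the output order deterministic, which makes the merged Markdown easy to assert on in tests.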
Tests
* [All vendor calls are mocked with] unittest.mock at the akshare import boundary; no live network calls in CI.
* [Coverage includes routing, the multi-vendor merge, skip-marker filtering,] cache hit/miss across daily and weekly buckets, Cninfo's two known schema shapes (公告链接, "announcement link", direct vs announcementId + orgId reconstruction).

New dep
akshare>=1.18: wraps Eastmoney + Cninfo endpoints. Imported lazily inside the vendor modules; users analysing only US tickers pay no import cost.
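A sketch of how a lazy import and a mock at the import boundary fit together. The vendor function and the `hypothetical_news_call` attribute are placeholders invented for this example; the latter is not a real akshare function, and the PR's actual vendor code and tests may differ.

```python
import sys
import types
from unittest import mock


def fetch_vendor_news(symbol: str):
    """Hypothetical vendor entry point with a lazy akshare import."""
    import akshare as ak  # deferred: only runs when this vendor is called

    # Placeholder attribute, NOT part of the real akshare API.
    return ak.hypothetical_news_call(symbol)


# Build a fake module and install it where `import akshare` will look.
fake_akshare = types.ModuleType("akshare")
fake_akshare.hypothetical_news_call = mock.Mock(return_value=["headline"])

with mock.patch.dict(sys.modules, {"akshare": fake_akshare}):
    result = fetch_vendor_news("600519")  # uses the fake, no network
```

Because the import happens inside the function, patching `sys.modules` is enough to intercept it, and code paths that never call the vendor never import akshare at all.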
Test plan
* python -m pytest tests/ (all green locally)
* _resolve_auto_vendors smoke covers .HK, .SS, .SH, .SZ, .SI, AAPL, and get_global_news
* […] news_data: "auto" default flip is desired; happy to revert to "yfinance" and document "auto" opt-in if preferred

🤖 Generated with Claude Code