
feat(etf): improve ETF analysis with holdings, drill-down, and risk guidance #819

Open
hang666 wants to merge 6 commits into TauricResearch:main from hang666:feat/equity-etf

hang666 commented May 12, 2026

Summary

This PR improves ETF analysis by routing ETF tickers through fund-specific profile, holdings, and constituent drill-down tools instead of treating ETFs like operating companies.

ETFs do not have company-style income statements, balance sheets, cash flows, or insider transactions. This change adds ETF-aware tooling and prompt guidance so the fundamentals and risk agents analyze ETFs through fund-specific dimensions such as holdings concentration, tracking strategy, expense ratio, AUM, liquidity, structure risk, and leveraged/inverse ETF decay.

Changes

  • Added ETF detection using yfinance quoteType, with cached lookups to avoid repeated metadata calls.
  • Added ETF-specific placeholders for company-style financial tools, including:
    • get_fundamentals
    • get_balance_sheet
    • get_cashflow
    • get_income_statement
    • get_insider_transactions
  • Added yfinance-backed ETF profile and holdings support.
  • Added Alpha Vantage ETF profile and holdings support through ETF_PROFILE.
  • Added holdings concentration metrics, including:
    • largest single holding
    • top-N aggregate weight
    • top-N Herfindahl index
  • Added ETF top-holdings drill-down to fetch constituent-level fundamentals and news for major ETF holdings.
  • Wired ETF tools into the fundamentals analyst and fundamentals ToolNode.
  • Updated the fundamentals analyst prompt to use ETF-specific tools for ETF tickers.
  • Added ETF-specific risk guidance for aggressive, neutral, and conservative risk debaters.
  • Added structural warnings for leveraged and inverse ETFs, including daily reset and path-dependent decay risks.
  • Added unit coverage for ETF detection, routing, placeholders, vendor rendering, drill-down, prompt context, ToolNode wiring, concentration metrics, and leveraged/inverse ETF warnings.

Design notes

  • ETF placeholders are applied at the dispatch layer instead of inside individual vendor modules. This keeps vendor modules provider-pure while ensuring routed tool calls consistently redirect ETF tickers to ETF-specific tools.
  • News tools are intentionally not blocked for ETF tickers. ETF-related news remains useful and continues through the normal news path.
  • The top-holdings drill-down tool is intended for small top_n values (the analyst prompt recommends top_n ≤ 5) because each constituent triggers both a fundamentals call and a news call.
  • Direct vendor calls remain thin wrappers over upstream provider APIs; ETF-aware behavior is applied through normal routed tool calls.
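
The dispatch-layer placeholder from the first design note can be sketched as follows. This is an illustration under assumptions, not the PR's code: the registry contents, the placeholder wording, and the explicit vendor argument to route_to_vendor are hypothetical, and is_etf_ticker is stubbed with a fixed set.

```python
from functools import wraps
from typing import Callable, Dict

# Hypothetical registry shaped like VENDOR_METHODS: method -> vendor -> implementation.
VENDOR_METHODS: Dict[str, Dict[str, Callable[..., str]]] = {
    "get_fundamentals": {
        "yfinance": lambda t: f"yfinance fundamentals for {t}",
        "alpha_vantage": lambda t: f"alpha_vantage fundamentals for {t}",
    },
    "get_news": {"yfinance": lambda t: f"news for {t}"},
}

# Method -> human label for the placeholder (plays the role of ETF_PROTECTED_METHODS).
ETF_PROTECTED_METHODS: Dict[str, str] = {"get_fundamentals": "Company fundamentals"}


def is_etf_ticker(ticker: str) -> bool:
    # Stand-in for the cached quoteType lookup.
    return ticker.upper() in {"SPY", "QQQ"}


def etf_placeholder(label: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    def decorate(fn: Callable[..., str]) -> Callable[..., str]:
        @wraps(fn)
        def wrapper(ticker: str, *args, **kwargs) -> str:
            if is_etf_ticker(ticker):
                return (f"[{label} not applicable] {ticker} is an ETF; "
                        "use get_etf_profile / get_etf_holdings instead.")
            return fn(ticker, *args, **kwargs)
        return wrapper
    return decorate


def apply_etf_placeholders(registry, protected) -> None:
    # Wrap every registered vendor of every protected method at the dispatch
    # layer; the vendor functions themselves are left unmodified.
    for method, label in protected.items():
        vendors = registry.get(method, {})
        for vendor, impl in list(vendors.items()):
            vendors[vendor] = etf_placeholder(label)(impl)


def route_to_vendor(method: str, vendor: str, *args) -> str:
    return VENDOR_METHODS[method][vendor](*args)


apply_etf_placeholders(VENDOR_METHODS, ETF_PROTECTED_METHODS)
```

Only the registry entries are wrapped, so a direct call to a vendor function still reaches the provider, and get_news is left unwrapped because news tools are not blocked for ETFs.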

Testing

python -m pytest tests/test_etf_support.py -q

hang666 added 5 commits May 12, 2026 16:14
…lder

ETFs are not companies — they have no income statement, balance sheet,
or cash flow. Before this change, get_fundamentals(SPY) returned mostly
empty fields and the fundamentals analyst happily wrote a "company
margins" narrative about an index basket.

This change adds the full data layer for cross-vendor ETF support:

- dataflows/etf_utils.py: vendor-agnostic ETF detection via cached
  yfinance quoteType lookup, an @etf_placeholder decorator that
  short-circuits company-financial functions on ETF input, and an
  ETF_PROTECTED_METHODS registry listing the methods that need the
  decorator and the human label that goes into the placeholder.

- dataflows/yfinance_etf.py + dataflows/alpha_vantage_etf.py: two
  parallel vendor implementations of get_etf_profile and
  get_etf_holdings. yfinance covers US + HK ETFs; Alpha Vantage's
  ETF_PROFILE endpoint covers anything Alpha Vantage tracks. Both
  vendor modules degrade gracefully when their respective API
  returns no data (HK ETFs frequently expose .info without
  funds_data; Alpha Vantage may return an empty payload).

- agents/utils/etf_tools.py: @tool wrappers routed through
  interface.route_to_vendor.

- dataflows/interface.py: register the new vendor implementations in
  VENDOR_METHODS, declare the new etf_data category in
  TOOLS_CATEGORIES, and — the key architectural move — apply the ETF
  placeholder uniformly across every registered vendor of every
  company-financial method via _apply_etf_placeholders(). The
  placeholder is a tool-level semantic (ETF tickers redirect to ETF
  tools), not a vendor concern. Vendor modules stay vendor-pure;
  future vendors (akshare for A-share) inherit protection
  automatically. Direct calls into vendor modules intentionally
  bypass the placeholder — those are thin API wrappers, while the
  ETF redirect is the dispatch layer's job.

- default_config.py: etf_data default vendor.

- tests/conftest.py: autouse fixture clears the ETF quoteType LRU
  between tests so mocks don't leak across cases.

- tests/test_etf_support.py: 33 tests covering detection, caching,
  both vendor renderings (incl. funds_data / sectors / holdings
  degradation paths), routing through alpha_vantage and yfinance,
  routing fallback when the etf_data key is missing from legacy
  configs, the dispatch-layer placeholder firing uniformly across
  both vendors, and a smoke check confirming news tools are unaffected.
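
A rough sketch of the legacy-config fallback that the routing tests exercise, assuming vendors are chosen per tool category from a data_vendors mapping. DEFAULT_DATA_VENDORS, resolve_vendor, and the data_vendors key are hypothetical names, not the project's actual config schema.

```python
from typing import Any, Dict, Optional

# Hypothetical category -> vendor defaults, in the spirit of default_config.py.
DEFAULT_DATA_VENDORS: Dict[str, str] = {
    "core_stock_apis": "yfinance",
    "etf_data": "yfinance",
}


def resolve_vendor(category: str, config: Optional[Dict[str, Any]] = None) -> str:
    """Pick the configured vendor for a tool category, falling back to defaults.

    A config written before etf_data existed simply lacks that key, so the
    lookup falls through to the default instead of raising KeyError.
    """
    configured = ((config or {}).get("data_vendors") or {}).get(category)
    return configured or DEFAULT_DATA_VENDORS[category]
```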

build_instrument_context now detects ETF tickers and appends a block
that (a) tells the LLM the instrument is a fund, not a company,
(b) lists the analysis dimensions that matter for an ETF (top
holdings, tracking strategy, expense ratio, NAV / premium-discount),
and (c) redirects it from the company-financial tools to
get_etf_profile / get_etf_holdings.
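
One way the ETF branch of build_instrument_context could look, as a hedged sketch: the detector is passed in as is_etf so it runs without network access, the block wording is illustrative, and returning an empty string for non-string input is an assumed reading of "non-string input safety".

```python
from typing import Callable


def build_instrument_context(ticker: object, is_etf: Callable[[str], bool]) -> str:
    # Assumption: non-string or empty input yields an empty context, not an error.
    if not isinstance(ticker, str) or not ticker:
        return ""
    context = f"Instrument: {ticker}."
    if is_etf(ticker):
        # (a) fund, not company; (b) ETF dimensions; (c) tool redirect.
        context += (
            "\n\nNOTE: this instrument is an ETF (a fund), not an operating company.\n"
            "Focus on: top holdings, tracking strategy, expense ratio, "
            "NAV / premium-discount.\n"
            "Do not rely on company-financial tools (income statement, balance "
            "sheet, cash flow); call get_etf_profile and get_etf_holdings instead."
        )
    return context
```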

is_etf_ticker is re-exported through agent_utils so analyst modules
stay agent-namespace; they don't reach into dataflows directly.

The fundamentals analyst always binds both the company-financial
tools and the new ETF tools; its system_message branches on ETF
detection so the prompt aligns with the toolset the LLM should
actually use. Keeping the bound set fixed avoids per-call ToolNode
reconfiguration.

The fundamentals ToolNode registers all six tools because LangGraph
validates the entire bound toolset, not just the methods the analyst
chooses to call.
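
A sketch of the fixed-binding design from the two paragraphs above: the tool list is identical for ETFs and stocks, and only the system message changes. Tool names come from the PR text; representing tools as strings and the message wording are simplifications for illustration.

```python
from typing import Callable, List, Tuple

# Tool names from the PR; the real analyst binds @tool objects, not strings.
COMPANY_TOOLS = ["get_fundamentals", "get_balance_sheet", "get_cashflow",
                 "get_income_statement"]
ETF_TOOLS = ["get_etf_profile", "get_etf_holdings"]


def fundamentals_setup(ticker: str,
                       is_etf: Callable[[str], bool]) -> Tuple[List[str], str]:
    """Return (bound tools, system message) for the fundamentals analyst.

    The bound set never changes, so the ToolNode can be built once with all
    six tools; only the prompt steers the model toward the right subset.
    """
    tools = COMPANY_TOOLS + ETF_TOOLS
    if is_etf(ticker):
        message = (f"{ticker} is an ETF. Use get_etf_profile and get_etf_holdings; "
                   "do not call company-financial tools.")
    else:
        message = (f"Analyze {ticker} as an operating company using "
                   "get_fundamentals and the financial statement tools.")
    return tools, message
```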

Tests cover the ETF branch / stock branch of build_instrument_context,
non-string input safety, agent_utils re-export, and the trading_graph
ToolNode registration.

Aggregate ETF metrics tell the analyst what's in the basket but not
why those names are moving. The drill-down tool fetches the top-N
constituents and routes each through the normal fundamentals + news
vendors so the analyst can reason about underlying-name catalysts.

- yfinance_etf.get_top_holding_tickers / alpha_vantage_etf.get_top_holding_tickers:
  structured extractors returning [(ticker, name, weight_pct), ...].
  yfinance normalizes bare HK codes (00939 → 00939.HK) and handles
  the decimal-vs-percent ambiguity yfinance ships in Holding Percent.
- dataflows/etf_drilldown.py: get_etf_top_holdings_drilldown
  orchestrates one fundamentals call + one news call per constituent
  through route_to_vendor. Per-constituent failures surface inline
  rather than killing the whole report. Fundamentals are capped at
  1500 chars, news at 1200 chars, to keep the LLM's context budget
  manageable across N constituents.
- interface.VENDOR_METHODS: register get_top_holding_tickers for
  both vendors under the etf_data category. The orchestration uses
  route_to_vendor so vendor fallback and configuration work the same
  way as the other ETF methods.
- agents/utils/etf_tools: @tool wrapper exposing the drill-down to
  the LLM with a usage-cost note in the docstring.
- agents/utils/agent_utils + fundamentals_analyst + trading_graph:
  bind the new tool, register it in the fundamentals ToolNode, and
  update the ETF-branch system_message to mention it (with explicit
  top_n ≤ 5 guidance).
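
The drill-down orchestration can be sketched like this, assuming the vendor calls are injected as plain callables instead of going through route_to_vendor. The 1500 / 1200 character caps come from the commit message; the function signature, section headings, and truncation marker are hypothetical.

```python
from typing import Callable, List, Tuple

Holding = Tuple[str, str, float]  # (ticker, name, weight_pct)

FUNDAMENTALS_CAP = 1500  # chars per constituent, per the commit message
NEWS_CAP = 1200


def _truncate(text: str, limit: int) -> str:
    return text if len(text) <= limit else text[:limit] + "...[truncated]"


def etf_top_holdings_drilldown(
    etf: str,
    top_n: int,
    is_etf: Callable[[str], bool],
    top_holdings: Callable[[str, int], List[Holding]],
    fundamentals: Callable[[str], str],
    news: Callable[[str], str],
) -> str:
    if not is_etf(etf):
        return f"{etf} is not an ETF; use the regular fundamentals and news tools."
    sections = [f"# Top {top_n} holdings drill-down for {etf}"]
    for ticker, name, weight in top_holdings(etf, top_n)[:top_n]:
        parts = [f"## {ticker} ({name}), {weight:.2f}% of fund"]
        fetchers = (("Fundamentals", fundamentals, FUNDAMENTALS_CAP),
                    ("News", news, NEWS_CAP))
        for label, fetch, cap in fetchers:
            try:
                parts.append(f"### {label}\n{_truncate(fetch(ticker), cap)}")
            except Exception as exc:  # one failing constituent must not kill the report
                parts.append(f"### {label}\n[error for {ticker}: {exc}]")
        sections.append("\n".join(parts))
    return "\n\n".join(sections)
```

Each constituent costs two vendor calls, so the report size and call count both grow linearly with top_n, which is why the prompt keeps top_n small.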

Tests: 13 new cases covering the structured extractors (decimal +
HK normalization edge cases, empty payloads), drill-down
orchestration (non-ETF redirect, multi-constituent rendering,
per-constituent error containment, truncation), and routing
through both yfinance and alpha_vantage.

Two small ETF improvements that compound the existing pipeline:

1. Concentration metrics in get_etf_holdings.
   etf_utils.concentration_summary(weights_pct) renders a three-line
   block — largest single holding, top-N aggregate weight, and
   Herfindahl index across the shown top-N — so a thematic ETF (top-10
   = 95%, HHI = 0.10) and a broad index (top-10 = 30%, HHI = 0.01)
   read distinctly at a glance. Both vendor renderers (yfinance and
   alpha_vantage) call the shared helper after their CSV body, so the
   metrics appear regardless of which ETF data source is configured.

2. ETF dimensions in the risk debate.
   The three risk debaters (aggressive / conservative / neutral)
   previously consumed the analyst reports without any ETF awareness,
   so their debate framed the instrument as a company. New
   agent_utils.build_etf_risk_block(ticker) emits a markdown block
   listing the axes that matter — liquidity, concentration, tracking
   risk, structure risk (leveraged/inverse decay), premium/discount,
   underlying-name catalysts. Each debater pulls the block from
   state["company_of_interest"] and appends it to its prompt right
   after the fundamentals report. The block is empty for non-ETF
   tickers, so debaters append unconditionally without branching.
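
A minimal sketch of concentration_summary under the definitions above: weights are percentages of the fund, and the Herfindahl index is the sum of squared weight fractions over the shown top-N only, HHI = sum((w_i / 100)^2). The output wording and the rule for dropping invalid weights (non-numeric, non-finite, or non-positive) are assumptions.

```python
import math
from typing import Iterable, List


def concentration_summary(weights_pct: Iterable[object]) -> str:
    """Three-line concentration block over the shown top-N holdings."""
    weights: List[float] = []
    for w in weights_pct:
        try:
            value = float(w)
        except (TypeError, ValueError):
            continue  # drop non-numeric weights
        if math.isfinite(value) and value > 0:
            weights.append(value)
    if not weights:
        return ""
    n = len(weights)
    hhi = sum((w / 100.0) ** 2 for w in weights)  # squared fractions, not percents
    return (
        f"Largest single holding: {max(weights):.2f}%\n"
        f"Top-{n} aggregate weight: {sum(weights):.2f}%\n"
        f"Top-{n} Herfindahl index: {hhi:.3f}"
    )
```

With equal weights, a top-10 summing to 95% gives an HHI of about 0.090 and a top-10 summing to 30% gives about 0.009, the same thematic-versus-broad contrast the commit describes.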

Tests: 13 new cases covering concentration math (renders all three
metrics, drops invalid weights, contrasts thematic vs broad), the
end-to-end holdings renderer for both vendors, the risk-block helper
across ETF / stock / empty input, and the three debaters each
injecting the block into the LLM prompt for ETF tickers and omitting
it for stocks. Full unit suite: 140 passed (was 127).

SPY's report didn't surface this gap because SPY isn't leveraged, but
TQQQ / SQQQ / SDS / UPRO and friends are LLM-decision landmines: the
"holding period" narrative defaults to multi-week for any equity ETF,
yet daily-reset products decay path-dependently and a multi-day flat
underlying can still produce a negative ETF return. The risk-block's
six-axis treatment ("Structure risk: leveraged or inverse ETFs decay
daily…") buried the point in the middle of a list — easy to skim past.
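
A small worked example of the decay argument, assuming an idealized product that resets to a fixed multiple of the underlying's daily return with no fees or financing costs. compound is a hypothetical helper, not part of the PR.

```python
from typing import Iterable


def compound(daily_returns: Iterable[float], leverage: float = 1.0) -> float:
    """Total return of a product that resets to `leverage` x the underlying daily.

    leverage=1.0 is the underlying itself; 3.0 models a 3x fund and -1.0 a
    1x inverse fund (idealized: no fees, no financing costs).
    """
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + leverage * r
    return value - 1.0


# Underlying: +10% on day 1, then -1/11 (about -9.09%) on day 2, so it ends flat.
path = [0.10, -0.10 / 1.10]

underlying = compound(path)        # about 0.0
triple = compound(path, 3.0)       # about -5.45%
inverse = compound(path, -1.0)     # about -1.82%
```

Both the 3x and the inverse product lose money over a two-day window in which the underlying finishes exactly where it started.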

This change escalates leveraged / inverse ETFs to a top-of-prompt
hard warning that fires for every analyst and every risk debater.

- etf_utils._yfinance_etf_category(ticker): new LRU-cached helper
  (independent of _yfinance_quote_type so we don't have to refetch).
- etf_utils.leverage_descriptor(ticker): classifies into "Leveraged",
  "Inverse", "Leveraged Inverse", or "" (not flagged). Detection uses
  the yfinance "Trading--<flavor>" category convention, requiring the
  "Trading--" prefix so categories like "Long-Term Bond" don't false-
  positive.
- etf_utils.clear_etf_cache(): now resets both caches so test
  isolation holds for the leverage detection path.
- agent_utils._leverage_warning(descriptor): single source of truth
  for the warning text — both injection sites render identical
  wording, no risk of drift between the analyst's prompt and the
  risk debaters' prompt.
- agent_utils.build_instrument_context: appends the warning when a
  leveraged ETF is detected. Every agent that reads instrument
  context now sees it.
- agent_utils.build_etf_risk_block: prepends the warning to the
  existing six-axis block so the three risk debaters encounter it
  first.
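
A hedged sketch of the classification and warning helpers. Unlike the PR's leverage_descriptor(ticker), this version takes the category string directly so it runs offline; the flavor strings ("Trading--Leveraged Equity", "Trading--Inverse Equity") are assumed examples of the yfinance convention, and returning "Leveraged Inverse" when a flavor mentions both words is an assumption about how that value is reached. The warning text and axis wording are illustrative.

```python
PREFIX = "Trading--"


def leverage_descriptor(category: object) -> str:
    # Only "Trading--<flavor>" categories qualify, so "Long-Term Bond" cannot
    # false-positive on a substring match.
    if not isinstance(category, str) or not category.startswith(PREFIX):
        return ""
    flavor = category[len(PREFIX):].lower()
    leveraged, inverse = "leveraged" in flavor, "inverse" in flavor
    if leveraged and inverse:
        return "Leveraged Inverse"
    if leveraged:
        return "Leveraged"
    if inverse:
        return "Inverse"
    return ""


def leverage_warning(descriptor: str) -> str:
    # Single source of truth for the warning text used by every injection site.
    if not descriptor:
        return ""
    return (f"WARNING: {descriptor} ETF. Daily-reset product: returns compound "
            "path-dependently and can decay over multi-day holds even when the "
            "underlying ends flat.")


def build_etf_risk_block(is_etf: bool, category: object) -> str:
    if not is_etf:
        return ""  # empty for stocks, so debaters can append unconditionally
    axes = ["Liquidity", "Concentration", "Tracking risk",
            "Structure risk (leveraged/inverse decay)", "Premium/discount",
            "Underlying-name catalysts"]
    body = "ETF risk dimensions:\n" + "\n".join(f"- {a}" for a in axes)
    warning = leverage_warning(leverage_descriptor(category))
    # The warning leads the block so it cannot be skimmed past.
    return f"{warning}\n\n{body}" if warning else body
```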

Tests: 17 new cases — descriptor classification across all four
return values, false-positive guards (Long-Term Bond, ordinary
Software-category equity), non-string input safety, the warning
appearing in build_instrument_context for leveraged ETFs and absent
for ordinary ETFs / stocks, the warning leading the risk block (not
buried), the three risk debaters each propagating it to their LLM
prompt for TQQQ, and clear_etf_cache resetting the new cache.
Full unit suite: 177 passed (was 160).

gemini-code-assist (bot) left a comment

Code Review

This pull request introduces comprehensive ETF support by adding specialized data flows for yfinance and Alpha Vantage, including profile retrieval, holdings analysis, and a constituent drill-down tool. The system now detects ETF tickers to provide tailored analysis guidance and risk dimensions, such as structural warnings for leveraged products, while using placeholders to redirect agents away from irrelevant company-financial tools. Review feedback highlights opportunities to optimize network efficiency via consolidated caching, improve CSV generation robustness, and ensure consistent handling of varying vendor data formats for holding weights.

Review comment threads:
  • tradingagents/dataflows/etf_utils.py (outdated)
  • tradingagents/dataflows/alpha_vantage_etf.py
  • tradingagents/dataflows/alpha_vantage_etf.py (outdated)
  • tradingagents/dataflows/yfinance_etf.py (outdated)
  • tradingagents/dataflows/yfinance_etf.py (outdated)

Five fixes flagged by code review on PR TauricResearch#819:

1. Consolidate yfinance .info caching. _yfinance_quote_type and
   _yfinance_etf_category now both delegate to a single cached
   _yfinance_info(ticker) helper. Before, each field-level cache hit
   .info separately, so leverage_descriptor (which needs both) burned
   two network round-trips per ticker. clear_etf_cache() now clears
   the consolidated cache.

2. Cache alpha_vantage _fetch_etf_profile_raw. get_etf_profile,
   get_etf_holdings, and get_top_holding_tickers all derive from the
   same ETF_PROFILE payload — without caching, analyzing one ETF burns
   three separate Alpha Vantage API calls and risks the free-tier
   daily quota. Added clear_etf_profile_cache() for test isolation
   and wired it into the conftest autouse fixture.

3. Use the csv module for Alpha Vantage holdings rendering. The
   previous manual f-string join produced malformed CSV when a
   description field contained a comma ("Berkshire Hathaway Inc,
   Class B"). csv.writer properly quotes such fields.

4. Apply the weight-percent heuristic in get_etf_holdings too. yfinance
   ships Holding Percent inconsistently (sometimes 0.07, sometimes
   7.0). get_top_holding_tickers already had the heuristic (multiply by
   100 when the value is below 1.5, otherwise keep it as-is);
   get_etf_holdings did a plain *100, so a value of 7.0
   rendered as 700%. Extracted _normalize_weight_pct as a shared
   helper so both call sites apply identical scaling.

5. Look up the weight column by keyword in get_top_holding_tickers.
   Previously it hardcoded "Holding Percent" and silently returned
   weight=0 when yfinance used a variant label ("Weight"). Extracted
   _find_weight_column as a shared helper.
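
Items 4 and 5 can be sketched as two small shared helpers. The below-1.5 threshold is the heuristic described above; the function names (without the leading underscore) and the keyword list used to match weight columns are assumptions.

```python
from typing import Iterable, Optional


def normalize_weight_pct(raw: object) -> float:
    """Scale a holding weight to percent.

    Values below 1.5 are treated as decimal fractions (0.07 -> 7.0); anything
    else is assumed to already be a percent (7.0 -> 7.0). Unparseable -> 0.0.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return 0.0
    return value * 100.0 if value < 1.5 else value


def find_weight_column(columns: Iterable[str]) -> Optional[str]:
    """Return the first column whose label looks like a weight, matched by keyword."""
    for col in columns:
        label = str(col).lower()
        if "percent" in label or "weight" in label:
            return col
    return None
```

The threshold trades one ambiguity for another: a weight that is genuinely below 1.5% but already expressed in percent (say 1.2) would be read as 120%.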

Tests: +6 cases (88 ETF total, 164 unit total) covering the
consolidated-cache invariant ("two field reads → one network call"),
CSV round-trip with commas in descriptions, the weight heuristic
applied symmetrically in both renderers, and the column-lookup fix.
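
A minimal sketch of the csv-module rendering from item 3. The column names and the render_holdings_csv helper are hypothetical; the point is that csv.writer quotes any field containing the delimiter, so the Berkshire description round-trips as a single column.

```python
import csv
import io
from typing import Iterable, Tuple


def render_holdings_csv(rows: Iterable[Tuple[str, str, float]]) -> str:
    """Render (symbol, description, weight_pct) rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # default QUOTE_MINIMAL quoting
    writer.writerow(["symbol", "description", "weight_pct"])
    for symbol, description, weight in rows:
        writer.writerow([symbol, description, f"{weight:.2f}"])
    return buf.getvalue()
```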

hang666 (author) commented May 12, 2026

Fixed
