Commit 16717b8

ASE exchange fix + ISIN/FIGI backfill from public data (JerBouma#143)
* Fix 580 misclassified exchanges from "ASE" to correct yfinance value

Addresses JerBouma#133. The `exchange` column in `equities.csv` reported "ASE" for 1,632 rows, but cross-checking each row against yfinance showed that only 257 actually belong to NYSE American. Another 580 rows were misclassified and now report the exchange code yfinance returns; the remaining rows could not be validated and are left as "ASE".

Gating:
* Row must currently have exchange = "ASE"
* yfinance `Ticker(symbol).info["exchange"]` must be non-null
* The returned value must match `^[A-Z]{2,5}$` (plausible exchange code)
* yfinance's value must differ from "ASE"

Stats from the 1,632 "ASE" rows:

| yfinance returns | Count | Action |
|---|---:|---|
| NONE / unknown | 795 | kept as "ASE" (cannot validate) |
| "ASE" (legit NYSE American) | 257 | kept as "ASE" |
| "NYQ" (NYSE main) | 546 | **fixed** -> "NYQ" |
| "PNK" (OTC Pink) | 12 | **fixed** -> "PNK" |
| "NCM"/"NMS"/"NGM" (Nasdaq tiers) | 15 | **fixed** |
| "OQB"/"OID" (OTC Markets) | 6 | **fixed** |
| "PCX" (NYSE Arca) | 1 | **fixed** |
| errors | 18 | left as "ASE" |

Total fixed: 580 rows. Notable corrections include common NYSE main-board listings that JerBouma#133 explicitly flagged as wrong (ARX, ALH, etc., all now "NYQ").

No row was overwritten where yfinance returned "ASE": the 257 genuine NYSE American listings remain untouched.

Source: yfinance `Ticker(symbol).info["exchange"]`. The 795 rows yfinance couldn't resolve are mostly delisted preferred stocks or thinly-covered exotic tickers, best left as-is until a different data source is added.

Related: JerBouma#133

* Regenerate snapshot fixtures after ASE exchange fix

* Backfill 16,100 ISIN values across 30 international markets

Where yfinance returned no ISIN (yfinance has thin coverage of non-US exchanges), publicly accessible exchange listing data provides the canonical ISIN. This commit fills 16,100 empty `isin` cells in `equities.csv` across 30 international markets.

Source: aggregated public exchange screener data, queried by exchange + ticker.
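For reference, the four gating rules of the ASE fix (the first commit in this message) could be expressed roughly as follows. This is a minimal sketch, not the script used for the PR: `corrected_exchange` is a hypothetical helper, the yfinance lookup is left to the caller, and treating the literal string "NONE" as a missing value is an assumption based on the "NONE / unknown" row in the stats table.

```python
import re

# Plausible exchange code: two to five uppercase letters (regex from the gating list).
EXCHANGE_CODE = re.compile(r"^[A-Z]{2,5}$")


def corrected_exchange(current, reported):
    """Return the replacement exchange code, or None to leave the row untouched.

    `current` is the value already in equities.csv; `reported` is whatever the
    lookup returned (None when the lookup found nothing).
    """
    if current != "ASE":
        return None  # only rows currently marked "ASE" are candidates
    if not reported or reported == "NONE":
        return None  # assumption: "NONE" is a sentinel for "unknown"; keep "ASE"
    if not EXCHANGE_CODE.fullmatch(reported):
        return None  # not a plausible exchange code
    if reported == "ASE":
        return None  # genuine NYSE American listing, nothing to fix
    return reported
```

In practice `reported` would come from `Ticker(symbol).info["exchange"]`, wrapped in error handling so that a failed lookup also leaves the row as "ASE".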
The ISINs returned are matched 1:1 with the ticker symbol on each market.

Markets covered (FD suffix -> market):

| Market | ISINs filled |
|---|---:|
| Japan (.T) | 2,460 |
| India (.BO) | 2,862 |
| India (.NS) | 1,390 |
| China (.SZ) | 1,922 |
| China (.SS) | 1,396 |
| Korea (.KQ) | 931 |
| Korea (.KS) | 515 |
| Canada (.V) | 616 |
| Australia (.AX) | 605 |
| Indonesia (.JK) | 516 |
| UK (.L) | 484 |
| Thailand (.BK) | 463 |
| Hong Kong (.HK) | 319 |
| Canada (.TO) | 258 |
| France (.PA) | 243 |
| Brazil (.SA) | 212 |
| Sweden (.ST) | 208 |
| Switzerland (.SW) | 101 |
| Germany (.F/.MU/.DU/.BE) | 322 |
| Other (Norway, Netherlands, Spain, Italy, Mexico, Vietnam, Austria, Singapore) | ~870 |

Gating (same as the yfinance backfill in JerBouma#139):
* ISIN regex `^[A-Z]{2}[A-Z0-9]{9}[0-9]$`
* ISIN Mod-10 Double-Add-Double check digit
* Skip rows where FD already has an `isin` value (zero overwrite)
* Match by exchange-suffix-mapped-to-market + naked ticker

Stats: 16,100 filled, 0 rejected by checksum.

Markets explicitly NOT covered:
* `.NX`, `.SG`, `.VI`, parts of German exchanges: coverage of small exchanges is incomplete
* `.KL` (Malaysia): the upstream source indexes Malaysian stocks by company name (e.g. "MAYBANK") whereas FD uses numeric codes (e.g. "0007.KL"); cannot be matched without a separate name->code map
* `.US` (US tickers): already covered by JerBouma#139 (yfinance)

CUSIP derivation: not applicable for these rows since they are predominantly non-US/CA ISINs whose middle 9 characters are local national numbering systems (Japanese Securities Code, SEDOL, WKN, etc.) rather than CUSIPs.

Related: JerBouma#78

* Backfill 13,770 FIGI identifiers for US equities

Populates `figi`, `composite_figi`, and `shareclass_figi` for US equity rows that had those columns empty.
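The first two ISIN gating rules above (format regex plus Mod-10 Double-Add-Double check digit) can be illustrated with a short validator. This is a sketch under the standard ISIN scheme, in which letters expand to the numbers 10 to 35 before the Luhn-style sum is taken; `isin_is_valid` is a hypothetical name, not a function from this repository.

```python
import re

# Format gate from the commit message: country code, 9-character body, check digit.
ISIN_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{9}[0-9]$")


def isin_is_valid(isin):
    """Return True when `isin` passes both the format and check-digit gates."""
    if not ISIN_PATTERN.fullmatch(isin):
        return False
    # Expand letters to numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    # Walk right to left: the check digit is added as-is, then every second
    # digit is doubled and the digits of the product are summed.
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as adding the two digits of the product
        total += value
    return total % 10 == 0
```

A candidate ISIN would be written to an empty `isin` cell only when this returns `True`; rows that already hold a value are skipped before the check is reached.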
FIGI (Financial Instrument Global Identifier) is Bloomberg's openly-licensed identifier standard for financial instruments. It is the only freely-licensed global identifier (ISIN/CUSIP/SEDOL are paywalled standards).

Source: public FIGI data from Bloomberg's OpenFIGI initiative, mirrored through publicly accessible exchange listing data.

Stats:
* 13,769 figi cells filled (was: 18,114 empty in US rows)
* 13,469 composite_figi cells filled
* 13,043 shareclass_figi cells filled
* 13,770 unique rows touched
* Diff: 13,770 insertions / 13,770 deletions

Gating:
* FIGI regex `^BBG[0-9A-Z]{9}$`
* Skip rows where the column is already populated (zero overwrite)
* Match by exact symbol

Why FIGI matters: filling FD's FIGI columns enables downstream cross-reference (FIGI -> ISIN, FIGI -> CUSIP, FIGI -> SEDOL) via OpenFIGI's free public API.

Not covered:
* Non-US exchanges: left for follow-up
* Tickers using "." instead of "-" (e.g. BRK.A vs BRK-A): a handful of share-class tickers whose format diverges between sources

* Align market column + revert 4 NGM collisions

Follow-up to maintainer review on JerBouma#143: the `market` column was left unchanged when the `exchange` codes were corrected. Aligned 576 rows so `market` matches the canonical FD value for each new `exchange`:

* NYQ -> "New York Stock Exchange"
* NMS -> "NASDAQ Global Select"
* NCM -> "NASDAQ Capital Market"
* PNK / OQB / OID -> "OTC Bulletin Board"
* PCX -> "NYSE Arca"

Reverted 4 rows whose exchange had been set to "NGM":
* yfinance returned "NGM" meaning NASDAQ Global Market
* but FD's "NGM" code is already used for "Nordic Growth Market" (Sweden) in 270 existing rows
* to avoid creating an ambiguous code, those 4 rows go back to "ASE" until FD adds a distinct code for NASDAQ Global Market

Symbols reverted: BEEP, BHIL, CHACU, VOLT.

Total fixed-by-this-PR is now 576 (down from 580).
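The alignment and the NGM revert described above amount to a lookup with a collision guard. The sketch below uses the mapping listed in this message; `resolve_correction` and `RESERVED_CODES` are hypothetical names, and leaving a row untouched when no canonical label is known is an assumption rather than something the PR states.

```python
# Canonical market label for each corrected exchange code (from the list above).
MARKET_FOR_EXCHANGE = {
    "NYQ": "New York Stock Exchange",
    "NMS": "NASDAQ Global Select",
    "NCM": "NASDAQ Capital Market",
    "PNK": "OTC Bulletin Board",
    "OQB": "OTC Bulletin Board",
    "OID": "OTC Bulletin Board",
    "PCX": "NYSE Arca",
}

# Codes that already carry a different meaning in the database:
# "NGM" is in use for "Nordic Growth Market".
RESERVED_CODES = {"NGM"}


def resolve_correction(proposed):
    """Return (exchange, market) for a row currently marked "ASE".

    Returns None when the row should stay as "ASE".
    """
    if proposed in RESERVED_CODES:
        return None  # would collide with an existing code, so do not apply it
    market = MARKET_FOR_EXCHANGE.get(proposed)
    if market is None:
        return None  # assumption: no known label means no half-aligned write
    return proposed, market
```

Writing both columns from one return value keeps `exchange` and `market` from drifting apart in the same pass.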
* Clean up exchange/market column inconsistencies + add consistency test

Per maintainer feedback on JerBouma#143: keep `exchange` and `market` columns in lock-step. Fixed 87 rows where the short code and the human-readable label did not agree.

Category A: exchange code non-canonical, market is correct (84 rows):
* NAS (12 rows, market "NASDAQ Global Select") -> NMS
* NYS (72 rows, market "New York Stock Exchange") -> NYQ

Category B: market label is wrong, exchange is correct (3 rows):
* BTS-CATG (Leverage Shares 2X Long CAT Daily ETF on BATS BZX): market "OTC Bulletin Board" -> "BATS BZX Exchange" (Verified via Leverage Shares / Robinhood / Benzinga.)
* NCM-CAPS (Capstone Holding Corp. on NASDAQ Capital Market): market "OTC Bulletin Board" -> "NASDAQ Capital Market"
* NSI-LT.NS (Larsen & Toubro on NSE India, .NS suffix confirms): market "Metropolitan Stock Exchange" -> "National Stock Exchange of India"

After the fix every exchange code in `equities.csv` maps to exactly one market label.

Added `tests/test_equities.py::test_exchange_market_one_to_one` that asserts the forward direction (exchange -> 1 market) so future PRs cannot silently re-introduce the kind of drift fixed here. The reverse direction is intentionally not asserted: a market label may legitimately cover several exchange tiers (e.g. "OTC Bulletin Board" covers PNK / OQB / OID / OEM / OQX).
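A forward-direction check of the kind described above might look like the following. This is an illustrative sketch, not the test added in `tests/test_equities.py`: the helper name, the in-memory rows, and the choice to skip rows with an empty cell are all assumptions.

```python
from collections import defaultdict


def exchanges_with_multiple_markets(rows):
    """Map each exchange code that has more than one market label to its labels."""
    markets = defaultdict(set)
    for row in rows:
        exchange, market = row.get("exchange"), row.get("market")
        if exchange and market:  # assumption: rows with an empty cell are skipped
            markets[exchange].add(market)
    return {code: sorted(labels) for code, labels in markets.items() if len(labels) > 1}


def test_exchange_market_one_to_one_sketch():
    rows = [
        # Several codes sharing one label is allowed (reverse direction not asserted).
        {"exchange": "PNK", "market": "OTC Bulletin Board"},
        {"exchange": "OQB", "market": "OTC Bulletin Board"},
        {"exchange": "NYQ", "market": "New York Stock Exchange"},
    ]
    assert exchanges_with_multiple_markets(rows) == {}
```

Against the real file, `rows` could be supplied by `csv.DictReader` over `equities.csv`; returning the offending codes instead of a bare boolean makes a failing assertion show which labels disagree.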
1 parent 2117100 commit 16717b8

3 files changed

Lines changed: 30004 additions & 29976 deletions
