Fill 17,252 empty exchange + market cells in equities.csv #145
Merged
Two passes:
**Pass 1 (deterministic, 17,240 rows)**: for every row with `exchange`
empty AND a recognisable ticker suffix (`.NX`, `.F`, `.DU`, `.MI`, ...),
derive the canonical exchange code from rows that already have that
same suffix populated. No external API. The mapping is unambiguous at
>=99% per suffix (verified before applying):
| Suffix | Exchange | Rows |
|---|---|---:|
| .NX | ENX (Euronext) | 11,364 |
| .F | FRA (Frankfurt) | 3,068 |
| .DU | DUS (Dusseldorf) | 624 |
| .BE | BER (Berlin) | 370 |
| .MI | MIL (Borsa Italiana) | 349 |
| .MU | MUN (Munich) | 247 |
| .SG | STU (Stuttgart) | 218 |
| .OL | OSL (Oslo) | 115 |
| .TI | TLO (EuroTLX) | 115 |
| .ST | STO (NASDAQ OMX Stockholm) | 89 |
| .BO | BSE (BSE India) | 75 |
| .PA | PAR (Euronext Paris) | 70 |
| .VI | VIE (Vienna) | 68 |
| .MX | MEX (Mexico) | 63 |
| Other suffixes | various | ~405 |
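A minimal sketch of how the suffix-based derivation could work, using only the standard library. The PR does not show its script, so the function names, the `symbol` column name, and the `min_share` parameter are assumptions for illustration; only the `exchange` column and the >=99% threshold come from the description above.

```python
from collections import Counter, defaultdict


def suffix_of(symbol):
    """Return the ticker suffix including the dot (e.g. '.NX'), or None."""
    _, dot, tail = symbol.rpartition(".")
    return f".{tail}" if dot and tail else None


def build_suffix_map(rows, min_share=0.99):
    """Learn suffix -> exchange from rows whose exchange is already populated.

    A suffix is kept only if its most common exchange code covers at least
    `min_share` of the populated rows carrying that suffix.
    """
    counts = defaultdict(Counter)
    for row in rows:
        suffix = suffix_of(row["symbol"])  # "symbol" column name is assumed
        if suffix and row["exchange"]:
            counts[suffix][row["exchange"]] += 1

    mapping = {}
    for suffix, counter in counts.items():
        exchange, hits = counter.most_common(1)[0]
        if hits / sum(counter.values()) >= min_share:
            mapping[suffix] = exchange
    return mapping


def fill_empty_exchanges(rows, mapping):
    """Fill empty exchange cells in place; populated cells are never touched."""
    filled = 0
    for row in rows:
        if row["exchange"]:
            continue
        exchange = mapping.get(suffix_of(row["symbol"]))
        if exchange:
            row["exchange"] = exchange
            filled += 1
    return filled
```

Ambiguous suffixes drop out of the mapping rather than being guessed, and symbols without a suffix (such as `BFT-WT`) are left for the second pass.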
**Pass 2 (external resolution, 12 rows)**: for the 98 remaining rows
(US-style symbols with no suffix and every other column NaN — failed
scrapes), query yfinance and Finnhub. yfinance resolved 9; Finnhub's
US universe endpoint resolved 3 more (MIC mapping: XNYS->NYQ, XNAS->NMS,
OOTC->PNK). TradingView returned no additional matches on this set.
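The MIC translation step can be sketched as a plain lookup. The three MIC pairs come from the description above; the function name and the record shape (dicts with `symbol` and `mic` keys, as if already fetched from a symbol-universe endpoint) are assumptions, and no network call is made here.

```python
# MIC -> exchange code pairs stated in the PR description.
MIC_TO_EXCHANGE = {"XNYS": "NYQ", "XNAS": "NMS", "OOTC": "PNK"}


def resolve_from_universe(symbols, universe):
    """Map unresolved symbols to exchange codes via their MIC.

    `universe` is assumed to be a list of dicts with "symbol" and "mic" keys.
    Symbols that are missing, or whose MIC is not in the table, are skipped.
    """
    mic_by_symbol = {rec["symbol"]: rec.get("mic") for rec in universe}
    resolved = {}
    for symbol in symbols:
        code = MIC_TO_EXCHANGE.get(mic_by_symbol.get(symbol))
        if code:
            resolved[symbol] = code
    return resolved
```

Skipping unknown MICs instead of passing them through keeps any code that is not already canonical out of the `exchange` column.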
Stats:
* 17,240 deterministic suffix-based fills
* 12 external resolution fills (9 yfinance + 3 Finnhub)
* 17,252 `exchange` cells filled (was: 17,338 empty before)
* 17,252 `market` cells also filled (inherited from canonical
`exchange -> market` mapping)
* 86 rows still empty: US-style symbols not found in any free
data source (delisted warrants/units like `BFT-WT`, `CAS'U`,
`CCX'U`, etc.). Out of scope.
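The counts above can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the figures are the ones quoted in the stats.

```python
empty_before = 17_338        # empty `exchange` cells before this PR
pass1_filled = 17_240        # deterministic suffix-based fills
pass2_filled = 9 + 3         # yfinance + Finnhub

after_pass1 = empty_before - pass1_filled    # rows handed to Pass 2
still_empty = after_pass1 - pass2_filled     # rows left out of scope
total_filled = pass1_filled + pass2_filled   # cells filled overall
```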
Validation: passes `test_exchange_market_one_to_one` (introduced in
JerBouma#143) — every exchange code in `equities.csv` still maps to exactly
one market label.
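The test from #143 is not reproduced in this PR. As a rough illustration of what such an invariant checks, the sketch below reports every exchange code that appears with more than one market label; the function name and the row representation (dicts keyed by column name) are assumptions.

```python
from collections import defaultdict


def exchanges_with_multiple_markets(rows):
    """Return {exchange: [market labels]} for codes that break the invariant.

    Rows with an empty exchange are ignored; an empty result means every
    exchange code maps to exactly one market label.
    """
    markets = defaultdict(set)
    for row in rows:
        if row["exchange"]:
            markets[row["exchange"]].add(row["market"])
    return {ex: sorted(labels) for ex, labels in markets.items() if len(labels) > 1}
```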
No row with a pre-existing populated `exchange` was modified. Only
empty cells filled.
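One way to verify this claim is to compare the file before and after, row by row. This is a sketch under the assumption that row order is unchanged, which fits a PR that adds and removes no rows; the function name is hypothetical.

```python
def changed_populated_cells(before, after, column="exchange"):
    """Return indices of rows whose previously populated cell now differs.

    `before` and `after` are lists of dicts in the same row order. An empty
    result means only empty cells were written.
    """
    if len(before) != len(after):
        raise ValueError("row count changed between versions")
    return [
        index
        for index, (old, new) in enumerate(zip(before, after))
        if old[column] and old[column] != new[column]
    ]
```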
Summary
Fills 17,252 of the 17,338 empty `exchange` + `market` cells in `equities.csv`, and the invariant test introduced in #143 (`test_exchange_market_one_to_one`) still passes on the result.

* `exchange` filled: 17,252 cells
* `market` filled (1:1 with exchange): 17,252 cells

No row with a pre-existing populated `exchange` was modified. Only empty cells were filled.

1. Deterministic pass (17,240 rows, no external API)
For every row with `exchange` empty AND a recognisable ticker suffix (`.NX`, `.F`, `.DU`, `.MI`, …), the canonical exchange code is derived from rows that already have that same suffix populated in FD. For each suffix, at least 99% of the already-populated rows share a single exchange code.

| Suffix | Exchange |
|---|---|
| `.NX` | `ENX` (Euronext) |
| `.F` | `FRA` (Frankfurt Stock Exchange) |
| `.DU` | `DUS` (Dusseldorf Stock Exchange) |
| `.BE` | `BER` (Berlin Stock Exchange) |
| `.MI` | `MIL` (Borsa Italiana) |
| `.MU` | `MUN` (Munich Stock Exchange) |
| `.SG` | `STU` (Stuttgart Stock Exchange) |
| `.OL` | `OSL` (Oslo Bors) |
| `.TI` | `TLO` (EuroTLX) |
| `.ST` | `STO` (NASDAQ OMX Stockholm) |
| `.BO` | `BSE` (BSE India) |
| `.PA` | `PAR` (Euronext Paris) |
| `.VI` | `VIE` (Vienna Stock Exchange) |
| `.MX` | `MEX` (Mexico Stock Exchange) |

The `market` column is filled from FD's own canonical `exchange → market` mapping (also derived from existing populated rows).

2. External resolution pass (12 rows, US-style symbols)
After Pass 1, 98 rows still had empty `exchange`. These are US-style symbols (no ticker suffix) with every other column also `NaN`: failed scrapes from an older import. External lookups resolved 12 of them (9 via yfinance, 3 via Finnhub).

3. The 86 remaining rows (out of scope here)
After both passes, 86 rows still have empty `exchange`. All are US-style symbols representing delisted tickers (old SPAC units like `CCX'U`, warrants like `BFT-WT`, post-bankruptcy `Q`-suffixed tickers like `SVTLQ`, etc.).

Cross-checked all 86 against Morningstar's published Re-Used Ticker Symbols list:

* 9 appear on the list: `CATS`, `CEL`, `CTL`, `ENT`, `ERI`, `HDS`, `MGEN`, `RMG`, `WMGI`. These are ambiguous: the FD row represents the historical company but the live symbol now belongs to a different one. Safer left empty than written with a current exchange that would mislead consumers about which entity the row describes.
* The other 77 belong in the `delisted.csv` file proposed in #144, "[IMPROVE] Split equities/etfs/funds CSVs by exchange for maintainability (no data change)" (per-exchange shards + a single delisted file as a clean partition of "currently listed").

Validation
Passes the `test_exchange_market_one_to_one` invariant introduced in #143. Every newly-filled exchange/market pair was sourced from FD's own canonical mapping, so the invariant cannot be broken by this PR.
Diff shape
1 file changed, 17,253 insertions / 17,253 deletions on `database/equities.csv`. Two columns touched (`exchange`, `market`); no rows added or removed.

Test plan
* `pytest tests/`: all tests pass; in particular `test_exchange_market_one_to_one` passes
* A `.NX` ticker (e.g. `XYWG.NX`) should now have `exchange = ENX`, `market = "Euronext"`
* A `.F` ticker should now have `exchange = FRA`, `market = "Frankfurt Stock Exchange"`
* No row with a pre-existing `exchange` populated has been modified

Related: #143 (introduced the invariant test), #144 (proposes the `delisted.csv` home for the 77 remaining rows).