
Fill 17,252 empty exchange + market cells in equities.csv #145

Merged: JerBouma merged 1 commit into JerBouma:main from dokson:feature/fill-missing-exchange on May 19, 2026.


dokson (contributor) commented on May 18, 2026

Summary

Fills 17,252 of the 17,338 empty exchange + market cells in equities.csv. The exchange/market invariant test introduced in #143 (test_exchange_market_one_to_one) still passes on the result.

| Pass | Method | Cells filled |
|---|---|---:|
| 1 | Deterministic, from ticker suffix using FD's own data | 17,240 |
| 2 | External resolution (yfinance) for US-style symbols | 12 |
| | Total exchange filled | 17,252 |
| | Total market filled (1:1 with exchange) | 17,252 |

No row with a pre-existing populated exchange was modified; only empty cells were filled.

1. Deterministic pass (17,240 rows, no external API)

For every row with an empty exchange AND a recognisable ticker suffix (.NX, .F, .DU, .MI, …), the canonical exchange code is derived from the rows in FD that already have the same suffix and a populated exchange. For every suffix used, at least 99% of those populated rows share a single exchange code, so the mapping is unambiguous.

| Suffix | Maps to (FD canonical) | Rows filled |
|---|---|---:|
| .NX | ENX (Euronext) | 11,364 |
| .F | FRA (Frankfurt Stock Exchange) | 3,068 |
| .DU | DUS (Dusseldorf Stock Exchange) | 624 |
| .BE | BER (Berlin Stock Exchange) | 370 |
| .MI | MIL (Borsa Italiana) | 349 |
| .MU | MUN (Munich Stock Exchange) | 247 |
| .SG | STU (Stuttgart Stock Exchange) | 218 |
| .OL | OSL (Oslo Bors) | 115 |
| .TI | TLO (EuroTLX) | 115 |
| .ST | STO (NASDAQ OMX Stockholm) | 89 |
| .BO | BSE (BSE India) | 75 |
| .PA | PAR (Euronext Paris) | 70 |
| .VI | VIE (Vienna Stock Exchange) | 68 |
| .MX | MEX (Mexico Stock Exchange) | 63 |
| Other long-tail suffixes | various | ~405 |
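The suffix derivation can be sketched in a few lines of plain Python. This is a minimal illustration, not the script used for this PR. It assumes rows are dicts (as `csv.DictReader` would yield) with a hypothetical `symbol` key plus the `exchange` column, and that empty cells are `None` or `''`. The helper names are hypothetical.

```python
from collections import Counter, defaultdict


def ticker_suffix(symbol):
    """Return the part after the last '.', e.g. 'XYWG.NX' -> '.NX'; None if there is none."""
    head, dot, tail = symbol.rpartition(".")
    return "." + tail if dot and head and tail else None


def is_empty(value):
    return value is None or value == ""


def derive_suffix_map(rows, min_share=0.99):
    """Map each suffix to its dominant exchange code among already-populated rows,
    keeping only suffixes where that code covers at least `min_share` of them."""
    counts = defaultdict(Counter)
    for row in rows:
        suffix = ticker_suffix(row["symbol"])
        if suffix and not is_empty(row["exchange"]):
            counts[suffix][row["exchange"]] += 1
    mapping = {}
    for suffix, counter in counts.items():
        code, n = counter.most_common(1)[0]
        if n / sum(counter.values()) >= min_share:
            mapping[suffix] = code
    return mapping


def fill_exchange_from_suffix(rows, mapping):
    """Fill only empty exchange cells whose suffix is in `mapping`; return the count filled."""
    filled = 0
    for row in rows:
        if is_empty(row["exchange"]):
            code = mapping.get(ticker_suffix(row["symbol"]))
            if code is not None:
                row["exchange"] = code
                filled += 1
    return filled
```

Suffixes whose populated rows are split between exchanges fall below the threshold and are never used. Symbols without a dot (the US-style rows) are left untouched.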

The market column is filled from FD's own canonical exchange → market mapping (also derived from existing populated rows).
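A sketch of the market fill, under the same assumptions (rows as dicts with `exchange` and `market` keys, empty cells falsy, hypothetical helper names). It builds the exchange → market table from fully populated rows and refuses to proceed if any code already has two labels. It then writes `market` only on the rows whose exchange was just filled, so rows that were populated beforehand are never touched.

```python
from collections import defaultdict


def derive_exchange_market_map(rows):
    """Build exchange -> market from rows where both cells are populated.
    Raises ValueError if an exchange code carries more than one market label."""
    labels = defaultdict(set)
    for row in rows:
        if row.get("exchange") and row.get("market"):
            labels[row["exchange"]].add(row["market"])
    conflicts = {ex: m for ex, m in labels.items() if len(m) > 1}
    if conflicts:
        raise ValueError(f"exchange codes with several market labels: {conflicts}")
    return {ex: next(iter(m)) for ex, m in labels.items()}


def fill_market(rows, exchange_to_market, indices):
    """Set `market` only on rows[i] for i in `indices` (e.g. rows filled in the exchange pass)."""
    filled = 0
    for i in indices:
        row = rows[i]
        if row.get("exchange") and not row.get("market"):
            market = exchange_to_market.get(row["exchange"])
            if market is not None:
                row["market"] = market
                filled += 1
    return filled
```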

2. External resolution pass (12 rows, US-style symbols)

After Pass 1, 98 rows still had an empty exchange. These are US-style symbols (no ticker suffix) with every other column also NaN, i.e. failed scrapes from an older import. yfinance resolved 12 of them.
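The selection criterion for these leftover rows (no suffix, every other column empty) can be expressed as a small filter. This is a sketch only. The `symbol` key name is an assumption, and NaN handling is included because pandas reads empty CSV cells as float NaN.

```python
import math


def is_missing(value):
    """Treat None, '' and float NaN as missing."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def residual_us_style_rows(rows, symbol_key="symbol"):
    """Rows whose symbol has no '.' suffix and whose other columns are all missing."""
    out = []
    for row in rows:
        if "." in row[symbol_key]:
            continue  # suffixed symbols are handled by the deterministic pass
        if all(is_missing(v) for k, v in row.items() if k != symbol_key):
            out.append(row)
    return out
```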

3. The 86 remaining rows (out of scope here)

After both passes, 86 rows still have empty exchange. All are US-style symbols representing delisted tickers (old SPAC units like CCX'U, warrants like BFT-WT, post-bankruptcy Q-suffixed tickers like SVTLQ, etc.).

Cross-checked all 86 against Morningstar's published Re-Used Ticker Symbols list:

  • 9 tickers are listed as reused (the symbol has been reassigned to a different company by an exchange): CATS, CEL, CTL, ENT, ERI, HDS, MGEN, RMG, WMGI. These are ambiguous: the FD row represents the historical company, but the live symbol now belongs to a different one. They are safer left empty than filled with a current exchange that would mislead consumers about which entity the row describes.
  • 77 tickers are NOT in the reused list (truly delisted, never reassigned). These are good candidates for the delisted.csv file proposed in #144 ([IMPROVE] Split equities/etfs/funds CSVs by exchange for maintainability (no data change)), which splits the data into per-exchange shards plus a single delisted file, so that the shards form a clean partition of the currently listed securities.
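The cross-check amounts to a set-membership split. The sketch below assumes the Morningstar list has already been reduced to a set of symbols (its published format is not described here). The nine reused symbols are the ones named above, and the function name is hypothetical.

```python
# The nine leftover symbols found on Morningstar's Re-Used Ticker Symbols list.
REUSED_FOUND = frozenset({"CATS", "CEL", "CTL", "ENT", "ERI", "HDS", "MGEN", "RMG", "WMGI"})


def partition_remaining(symbols, reused):
    """Split leftover symbols into (reused, never_reassigned), preserving input order.
    Matching ignores case and surrounding whitespace."""
    reused_norm = {s.strip().upper() for s in reused}
    reused_hits, never_reassigned = [], []
    for symbol in symbols:
        target = reused_hits if symbol.strip().upper() in reused_norm else never_reassigned
        target.append(symbol)
    return reused_hits, never_reassigned
```

Applied to the 86 leftover symbols, the first list would hold the 9 ambiguous rows to keep empty and the second the 77 rows proposed for delisted.csv.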

Validation

Passes the test_exchange_market_one_to_one invariant introduced in #143:

> Each exchange code must map to exactly one market label.

Every newly-filled exchange/market pair was sourced from FD's own canonical mapping, so the invariant cannot be broken by this PR.
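For reviewers who want to check the invariant outside the test suite, here is a stdlib sketch of the same one-to-one condition. It is not the actual test_exchange_market_one_to_one from #143. It only assumes the CSV has `exchange` and `market` header columns, and the function name is hypothetical.

```python
import csv
from collections import defaultdict


def exchange_market_violations(csv_file):
    """Scan an equities-style CSV (a text file object) and return
    {exchange: set_of_markets} for codes with more than one market label.
    Rows with an empty exchange or market are skipped."""
    markets = defaultdict(set)
    for row in csv.DictReader(csv_file):
        exchange, market = row.get("exchange", ""), row.get("market", "")
        if exchange and market:
            markets[exchange].add(market)
    return {ex: labels for ex, labels in markets.items() if len(labels) > 1}
```

Usage: `with open("database/equities.csv", newline="", encoding="utf-8") as f: assert not exchange_market_violations(f)`.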

Diff shape

1 file changed, 17,253 insertions / 17,253 deletions on database/equities.csv. Two columns touched (exchange, market); no rows added or removed.

Test plan

  • CI categorization + markdown linters pass
  • pytest tests/ — all tests pass; in particular test_exchange_market_one_to_one passes
  • Spot-check: a .NX ticker (e.g. XYWG.NX) should now have exchange = ENX, market = "Euronext"
  • Spot-check: an .F ticker should now have exchange = FRA, market = "Frankfurt Stock Exchange"
  • Confirm no row that already had exchange populated has been modified
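The last test-plan item (plus the "no rows added or removed" claim under Diff shape) can be automated by diffing the CSV before and after the change. A sketch, assuming both versions are loaded as lists of row dicts keyed by a hypothetical unique `symbol` column. Duplicate symbols would collapse in the lookup dicts, so uniqueness is a stated assumption.

```python
def audit_fill(before, after, key="symbol", filled_cols=("exchange", "market")):
    """Return a list of problems; an empty list means the only differences are
    previously-empty cells in `filled_cols` that are now populated."""
    problems = []
    old = {r[key]: r for r in before}
    new = {r[key]: r for r in after}
    if old.keys() != new.keys():
        problems.append("row set changed")
    for sym in old.keys() & new.keys():
        a, b = old[sym], new[sym]
        for col in a.keys() | b.keys():
            va, vb = a.get(col, ""), b.get(col, "")
            if va == vb:
                continue
            if col not in filled_cols:
                problems.append(f"{sym}: unexpected change in {col}")
            elif va:
                problems.append(f"{sym}: overwrote populated {col}")
    return problems
```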

Related: #143 (introduced the invariant test), #144 (proposes the delisted.csv home for the 77 remaining rows).

Two passes:

**Pass 1 (deterministic, 17,240 rows)**: for every row with `exchange`
empty AND a recognisable ticker suffix (`.NX`, `.F`, `.DU`, `.MI`, ...),
derive the canonical exchange code from rows that already have that
same suffix populated. No external API. The mapping is unambiguous at
>=99% per suffix (verified before applying):

| Suffix | Exchange | Rows |
|---|---|---:|
| .NX | ENX (Euronext) | 11,364 |
| .F | FRA (Frankfurt) | 3,068 |
| .DU | DUS (Dusseldorf) | 624 |
| .BE | BER (Berlin) | 370 |
| .MI | MIL (Borsa Italiana) | 349 |
| .MU | MUN (Munich) | 247 |
| .SG | STU (Stuttgart) | 218 |
| .OL | OSL (Oslo) | 115 |
| .TI | TLO (EuroTLX) | 115 |
| .ST | STO (NASDAQ OMX Stockholm) | 89 |
| .BO | BSE (BSE India) | 75 |
| .PA | PAR (Euronext Paris) | 70 |
| .VI | VIE (Vienna) | 68 |
| .MX | MEX (Mexico) | 63 |
| Other suffixes | various | ~405 |

**Pass 2 (external resolution, 12 rows)**: for the 98 remaining rows
(US-style symbols with no suffix and every other column NaN — failed
scrapes), query yfinance and Finnhub. yfinance resolved 9; Finnhub's
US universe endpoint resolved 3 more (MIC mapping: XNYS->NYQ, XNAS->NMS,
OOTC->PNK). TradingView returned no additional matches on this set.
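The two-source lookup can be sketched as an ordered cascade of resolvers. Each one takes a symbol and returns an FD-style exchange code or `None`. This is an illustration, not the PR's script. The resolver functions are hypothetical stand-ins. A yfinance-backed resolver might read the `exchange` field of `yfinance.Ticker(symbol).info`, assuming that field is present. The Finnhub side is modelled as a symbol → MIC table that is translated with only the three MIC mappings quoted above.

```python
from typing import Callable, Iterable, Optional

# Only the MIC translations quoted in the commit message; other MICs resolve to None.
MIC_TO_FD = {"XNYS": "NYQ", "XNAS": "NMS", "OOTC": "PNK"}

Resolver = Callable[[str], Optional[str]]


def resolve_exchange(symbol: str, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Try each resolver in order; return the first non-empty code, else None."""
    for resolver in resolvers:
        code = resolver(symbol)
        if code:
            return code
    return None


def from_mic_lookup(mic_by_symbol: dict) -> Resolver:
    """Wrap a symbol -> MIC table as a resolver that returns FD-style codes."""
    def resolver(symbol: str) -> Optional[str]:
        return MIC_TO_FD.get(mic_by_symbol.get(symbol, ""))
    return resolver
```

Ordering the resolvers as [yfinance, Finnhub] mirrors the sequence described above. An unknown MIC yields `None` rather than a guessed code, which keeps unresolved rows empty.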

Stats:
 * 17,240 deterministic suffix-based fills
 * 12 external resolution fills (9 yfinance + 3 Finnhub)
 * 17,252 `exchange` cells filled (17,338 were empty before)
 * 17,252 `market` cells also filled (inherited from the canonical `exchange -> market` mapping)
 * 86 rows still empty: US-style symbols not found in any free data source (delisted warrants/units like `BFT-WT`, `CAS'U`, `CCX'U`, etc.). Out of scope.

Validation: passes `test_exchange_market_one_to_one` (introduced in
JerBouma#143) — every exchange code in `equities.csv` still maps to exactly
one market label.

No row with a pre-existing populated `exchange` was modified; only
empty cells were filled.
dokson force-pushed the feature/fill-missing-exchange branch from 6f3f337 to de3257d on May 18, 2026 23:14
JerBouma merged commit 82dff27 into JerBouma:main on May 19, 2026
3 checks passed
dokson deleted the feature/fill-missing-exchange branch on May 19, 2026 06:45