
Fix DuckDB bulk_insert failing when quoted CSV cells fall outside sniffer sample #64

Merged
martinv13 merged 3 commits into cre-dev:main from martinv13:fix/duckdb-bulk-insert-quote-char
Jun 17, 2026

@martinv13

Collaborator

Summary

Fixes #62.

  • DuckDB's read_csv sniffer samples only the first ~20 480 rows. If no quoted cells appear in that sample, it sets quote=(empty), so " is treated as an ordinary character.
  • When join-transformed columns produce a value containing a comma (e.g. "val,ue",other), Python's csv.writer wraps the whole cell in quotes. Any such cell beyond row ~20 480 then causes a column-count mismatch and an Invalid Input Error.
  • Fix: pass quote='"' explicitly to read_csv in DuckDBDialect.bulk_insert, bypassing auto-detection entirely.
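
The failure mode described in the summary can be reproduced on the Python side alone. The sketch below is illustrative and is not the code from this PR: the helper name to_csv_lines and the sample rows are assumptions. It shows that csv.writer quotes only the cell containing a comma, and that a reader treating " as an ordinary character sees an extra column on that line.

```python
import csv
import io


def to_csv_lines(rows):
    """Serialize rows with csv.writer defaults (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue().splitlines()


rows = [("plain", "other"), ("val,ue", "other")]
lines = to_csv_lines(rows)

# Field count when '"' is NOT treated as a quote character,
# approximated here by a plain split on the delimiter.
naive_counts = [len(line.split(",")) for line in lines]

# Field values when '"' IS treated as a quote character.
parsed = list(csv.reader(lines))

print(lines)
print(naive_counts)
print(parsed)
```

The second line splits into three fields instead of two under the naive reading, which is the column-count mismatch the summary describes.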

Test plan

  • New regression test test_duckdb_bulk_insert_quoted_csv_field_after_large_unquoted_sample inserts 25 000 plain rows followed by one row whose label contains a comma; verifies all rows are read back correctly.
  • All existing test_bulk_insert.py tests still pass.
  • Run full suite: TZ="Europe/Paris" DB_STRING="duckdb:///:memory:" python -m pytest
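
As a rough illustration of the data shape the regression test relies on, the following sketch builds 25 000 plain rows followed by one row whose label contains a comma. The column names, label values, and the SNIFFER_SAMPLE constant are assumptions taken from the approximate figure in the summary; the real test goes through DuckDBDialect.bulk_insert, which is not reproduced here.

```python
import csv
import io

PLAIN_ROWS = 25_000
SNIFFER_SAMPLE = 20_480  # approximate sample size quoted in the summary

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["id", "label"])  # hypothetical column names
for i in range(PLAIN_ROWS):
    writer.writerow([i, f"label{i}"])
# The only row that csv.writer has to quote.
writer.writerow([PLAIN_ROWS, "needs,quoting"])

lines = buf.getvalue().splitlines()

# Index of the first line that contains a quote character.
first_quoted = next(i for i, line in enumerate(lines) if '"' in line)

# A quote-aware reader recovers every row, including the last one.
round_trip = list(csv.reader(lines))

print(first_quoted, len(round_trip), round_trip[-1])
```

Because the first quoted line sits well past the sampled region, a reader that infers its quote setting from the sample alone never sees a quote character.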

🤖 Generated with Claude Code

martinv13 and others added 3 commits June 17, 2026 16:13
…ffer sample (cre-dev#62)

DuckDB's read_csv sniffer only examines the first ~20 480 rows; if none are
quoted, it sets quote=(empty) and then errors on any later quoted cell with a
column-count mismatch.  Passing quote='"' explicitly bypasses auto-detection.
Adds a regression test that inserts 25 000 plain rows followed by one row
whose value contains a comma (triggering csv.writer quoting).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Makes the RFC 4180 doubling behaviour explicit rather than relying on
DuckDB defaulting escape to the same char as quote.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
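
For context on the RFC 4180 doubling mentioned in this commit message, the sketch below shows what Python's csv.writer emits for a value that itself contains a quote character. It only covers the writer side; the assumption is that the reader must use the same character for quote and escape to decode it, which is what the commit message says is now made explicit.

```python
import csv
import io

buf = io.StringIO()
# doublequote=True is the csv module default: an embedded '"' is written as '""'.
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(['say "hi"', "other"])
line = buf.getvalue().rstrip("\n")

# A reader with the same convention restores the original value.
decoded = next(csv.reader([line]))

print(line)
print(decoded)
```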
@martinv13 martinv13 merged commit 58866e1 into cre-dev:main Jun 17, 2026
9 checks passed
@martinv13 martinv13 deleted the fix/duckdb-bulk-insert-quote-char branch June 17, 2026 16:35

Development

Successfully merging this pull request may close these issues.

DuckDBDialect.insert_into_temp_tables is missing quote='\"'