fix: Normalize Cassandra column name casing for SortedFeatureView sch… by Manisha4 · Pull Request #360 · ExpediaGroup/feast

Manisha4 · 2026-04-29T23:12:26Z

What this PR does / why we need it:

Fix Cassandra _alter_table failing with "Column already exists" on re-materialization of SortedFeatureViews with mixed-case feature names
Fix Go OnlineReadRange producing "Undefined column name" errors when reading those same features
Add registration-time validation to reject SortedFeatureView features that differ only in case, preventing silent data loss on Cassandra/Scylla

Which issue(s) this PR fixes:

Cassandra and Scylla case-fold unquoted column identifiers to lowercase at storage time. A feature named pct_filter_bookings_adult_countXdestination_geo_id_2w is stored as pct_filter_bookings_adult_countxdestination_geo_id_2w on disk.

Python materialization path: _alter_table compared desired_cols (original case from the FV definition) against existing_cols (lowercase from Cassandra metadata) using a case-sensitive set difference. On every materialization after the initial table creation, it concluded the column was "new" and attempted ALTER TABLE ADD, which Cassandra rejected because the lowercase column already existed.
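
A minimal sketch of the difference between the two comparisons. The helper names here are illustrative stand-ins, not the PR's code; in particular, it is an assumption that the canonical form is plain lowercasing, and the column names are shortened examples.

```python
def canonical_column_name(name: str) -> str:
    """Hypothetical stand-in for the PR's _canonical_column_name helper.

    Assumes canonicalization is plain lowercasing, mirroring how Cassandra
    folds unquoted identifiers.
    """
    return name.lower()


def columns_to_add(desired_cols, existing_cols):
    """Return desired columns that are missing from the table, ignoring case."""
    existing = {canonical_column_name(c) for c in existing_cols}
    return [c for c in desired_cols if canonical_column_name(c) not in existing]


# Names as declared in the feature view vs. names reported by table metadata.
desired = ["entity_key", "event_ts", "countXdestination_geo_id_2w"]
existing = ["entity_key", "event_ts", "countxdestination_geo_id_2w"]

# Case-sensitive set difference: the mixed-case name wrongly looks "new".
naive_missing = set(desired) - set(existing)

# Case-insensitive diff: nothing needs an ALTER TABLE ADD.
missing = columns_to_add(desired, existing)
```

With the case-sensitive diff, `naive_missing` contains the mixed-case name, which is what triggered the redundant ALTER TABLE ADD; with the canonical comparison, `missing` is empty.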

Go read path: buildRangeQueryCQL and rangeFilterToCQL wrapped column identifiers in double quotes ("featureX"), which makes Cassandra perform a case-sensitive lookup. Because the stored column is always lowercase, the lookup missed and the read failed with "Undefined column name".
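
The quoting behaviour can be modelled in a few lines. This is a simplified Python model of CQL identifier resolution, not the Go code from the PR; `resolve_identifier` and `column_exists` are hypothetical names, and escaped quotes inside identifiers are ignored.

```python
def resolve_identifier(ref: str) -> str:
    """Model how CQL maps a column reference to a stored column name.

    Quoted identifiers keep their case; unquoted ones fold to lowercase.
    """
    if len(ref) >= 2 and ref.startswith('"') and ref.endswith('"'):
        return ref[1:-1]
    return ref.lower()


def column_exists(ref: str, stored_columns: set) -> bool:
    """True if the reference resolves to a column that is actually stored."""
    return resolve_identifier(ref) in stored_columns


# Columns as they end up on disk when the table is created with unquoted DDL.
stored = {"entity_key", "event_ts", "featurex"}

quoted_hit = column_exists('"featureX"', stored)   # old read path: quoted, mixed case
unquoted_hit = column_exists("featureX", stored)   # unquoted reference folds to lowercase
canonical_hit = column_exists("featurex", stored)  # unquoted, already lowercased
```

Only the quoted mixed-case reference misses the stored column, which matches the "Undefined column name" failure described above.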

Checks

I've made sure the tests are passing.
My commits are signed off (git commit -s)
My PR title follows conventional commits format

Testing Strategy

Unit tests
Integration tests
Manual tests
Testing is not required for this change

Misc

…ema operations

- Cassandra/Scylla case-fold unquoted identifiers to lowercase at storage time. The _alter_table method and Go OnlineReadRange path compared feature names case-sensitively against Cassandra metadata, causing "Column already exists" errors on materialization and "Undefined column" errors on reads for any SortedFeatureView with mixed-case feature names.
- Add _canonical_column_name (Python) and canonicalColumnName (Go) helpers
- Patch _alter_table to use case-insensitive column diff
- Patch buildRangeQueryCQL, rangeFilterToCQL, and MapScan loop to use unquoted lowercase identifiers in CQL while preserving original case in API response payloads
- Add case-collision validator to SortedFeatureView.ensure_valid()
- No DDL changes to existing tables; no modifications to the tall-schema (regular FeatureView) read/write paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ureView

Replace the O(n²) generator scan with an auxiliary canonical_to_original dict for constant-time collision detection. Also improve the error message to be more directive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
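
A sketch of dict-based collision detection along the lines this commit message describes. Only the `canonical_to_original` name comes from the message; the function name, return shape, and the use of `str.lower()` as the canonical form are assumptions.

```python
def find_case_collisions(feature_names):
    """Return (first_seen, duplicate) pairs for names that differ only in case.

    One pass with a dict lookup per name, instead of comparing every pair.
    """
    canonical_to_original = {}
    collisions = []
    for name in feature_names:
        canonical = name.lower()  # assumed canonical form
        # setdefault stores the first spelling seen and returns it afterwards.
        first_seen = canonical_to_original.setdefault(canonical, name)
        if first_seen != name:
            collisions.append((first_seen, name))
    return collisions


pairs = find_case_collisions(["featureX", "other", "featurex"])
```

Returning the original spellings in each pair makes it straightforward to build an error message that tells the user which two names to rename.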

…rning in FV layer

Backend-specific constraints belong in the backend. SortedFeatureView now emits a logger.warning for case-colliding feature names (they work fine on Valkey/DynamoDB). The Cassandra plugin rejects them with a hard CassandraInvalidConfig error in _create_table and _alter_table, where the constraint actually applies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
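
The split between a soft warning and a hard backend error might look roughly like this. Everything below is illustrative: `CassandraInvalidConfig` is redefined locally as a stand-in for the error type named in the commit message, and the function names are hypothetical.

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)


class CassandraInvalidConfig(Exception):
    """Local stand-in for the error type named in the commit message."""


def colliding_names(feature_names):
    """Return the sorted distinct names that share a lowercase form with another name."""
    distinct = set(feature_names)
    counts = Counter(n.lower() for n in distinct)
    return sorted(n for n in distinct if counts[n.lower()] > 1)


def validate_feature_view(feature_names):
    """Backend-agnostic layer: warn, because some stores handle these names fine."""
    dupes = colliding_names(feature_names)
    if dupes:
        logger.warning("Feature names differ only in case: %s", dupes)
    return dupes


def validate_for_cassandra(feature_names):
    """Cassandra layer: reject, because the columns would collide after folding."""
    dupes = colliding_names(feature_names)
    if dupes:
        raise CassandraInvalidConfig(
            f"Feature names collide after lowercasing: {dupes}"
        )
```

Keeping the hard failure in the backend means a feature view that is valid for other online stores is not rejected at registration time.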

zabarn · 2026-05-01T20:58:53Z

+	// Use unquoted, lowercased identifiers to match the on-disk form written
+	// by the Python materializer. Quoting + mixed case would make Cassandra
+	// do a case-sensitive lookup that misses the (always-lowercase) stored
+	// column.
+	columnRefs := make([]string, len(featureNames))
 	for i, name := range featureNames {
-		quotedFeatures[i] = fmt.Sprintf(`"%s"`, name)
+		columnRefs[i] = canonicalColumnName(name)
 	}


This makes sense for new tables created with unquoted identifiers, but I’m a little worried about backward compatibility. If any existing deployments have quoted mixed-case column names, forcing lowercase unquoted refs here could miss those columns during reads, right?

Manisha4 · 2026-05-01

The write path has always emitted unquoted DDL (see _build_sorted_table_cql and the old _alter_table), so Cassandra case-folds every column name to lowercase on storage. Every existing table should already have lowercase columns; the only exception is a table that someone wrote to directly through the Cassandra client.

zabarn · 2026-05-01T20:59:58Z

+		`SELECT entity_key, event_ts, %s FROM %s WHERE %s%s%s%s`,
+		strings.Join(columnRefs, ", "),


Same compatibility question here on the SELECT projection: we now always emit lowercase unquoted column names. Can we confirm this is safe for all previously created Cassandra/Scylla tables, including any that may have been created with quoted identifiers?

Manisha4 · 2026-05-01

Same as the comment above: the write path has always stored column names in lowercase.

zabarn · 2026-05-01T21:04:38Z

 	assert.NotContains(t, cql, "ORDER BY", "ORDER BY should be omitted when all SortKeyFilters have Order == nil")
 }
+
+func TestBuildRangeQueryCQL_UsesLowercaseUnquotedIdentifiers(t *testing.T) {


Could we also add coverage for the OnlineReadRange read path (mixed-case feature refs vs lowercase row keys) at cassandraonlinestore.go:954, since that is the actual regression fix?
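
The behaviour such a test would need to pin down is the mapping from lowercase row keys back to the caller's mixed-case feature refs. A rough Python sketch of that mapping, assuming the driver returns rows keyed by the lowercase stored column name; `map_row_to_features` is a hypothetical name, not a function from the Go code.

```python
def map_row_to_features(row: dict, feature_names: list) -> dict:
    """Look up each requested feature by its lowercase column key, but
    key the result by the caller's original spelling."""
    result = {}
    for name in feature_names:
        # Missing columns map to None rather than raising.
        result[name] = row.get(name.lower())
    return result


# A row as it might come back from the store: keys are lowercase column names.
row = {"entity_key": b"k1", "featurex": 0.25, "plain": 7}
response = map_row_to_features(row, ["featureX", "plain", "Missing"])
```

A regression test would assert that `featureX` resolves to the value stored under `featurex` and that the response keeps the original casing.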

vanitabhagwat

LGTM

Manisha4 and others added 4 commits April 29, 2026 16:09

style: Apply ruff formatting to cassandra_online_store.py

34c2daf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zabarn reviewed May 1, 2026

update PR with test and related changes

e6909f1

vanitabhagwat approved these changes May 4, 2026

zabarn added the ok-to-test label May 5, 2026

zabarn approved these changes May 5, 2026

Manisha4 merged commit 0b4447e into master May 5, 2026
34 of 36 checks passed

Manisha4 deleted the cassandra-fv-fix branch May 5, 2026 20:55

Manisha4 merged 5 commits into master from cassandra-fv-fix
