Skip to content

fix: Normalize Cassandra column name casing for SortedFeatureView sch…#360

Merged
Manisha4 merged 5 commits into
masterfrom
cassandra-fv-fix
May 5, 2026
Merged

fix: Normalize Cassandra column name casing for SortedFeatureView sch…#360
Manisha4 merged 5 commits into
masterfrom
cassandra-fv-fix

Conversation

@Manisha4
Copy link
Copy Markdown
Collaborator

@Manisha4 Manisha4 commented Apr 29, 2026

What this PR does / why we need it:

  • Fix Cassandra _alter_table failing with "Column already exists" on re-materialization of SortedFeatureViews with mixed-case feature names
  • Fix Go OnlineReadRange producing "Undefined column name" errors when reading those same features
  • Add registration-time validation to reject SortedFeatureView features that differ only in case, preventing silent data loss on Cassandra/Scylla

Which issue(s) this PR fixes:

Cassandra and Scylla case-fold unquoted column identifiers to lowercase at storage time. A feature named pct_filter_bookings_adult_countXdestination_geo_id_2w is stored as pct_filter_bookings_adult_countxdestination_geo_id_2w on disk.

Python materialization path: _alter_table compared desired_cols (original case from FV definition) against existing_cols (lowercase from Cassandra metadata) using a case-sensitive set difference. On every materialization after the initial table creation, it concluded the column was "new" and attempted
ALTER TABLE ADD, which Cassandra rejected because the lowercase column already existed.

Go read path: buildRangeQueryCQL and rangeFilterToCQL wrapped column identifiers in double quotes ("featureX"), making Cassandra perform a case-sensitive lookup against the always-lowercase stored column — resulting in "Undefined column name" errors.

Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Unit tests
  • Integration tests
  • Manual tests
  • Testing is not required for this change

Misc

Manisha4 and others added 4 commits April 29, 2026 16:09
…ema operations

- Cassandra/Scylla case-fold unquoted identifiers to lowercase at storage time. The _alter_table method and Go OnlineReadRange path compared feature names case-sensitively against Cassandra metadata, causing "Column already exists" errors on materialization and "Undefined column"errors on reads for any SortedFeatureView with mixed-case feature names.
- Add _canonical_column_name (Python) and canonicalColumnName (Go) helpers
- Patch _alter_table to use case-insensitive column diff
- Patch buildRangeQueryCQL, rangeFilterToCQL, and MapScan loop to use unquoted lowercase identifiers in CQL while preserving original case in API response payloads
- Add case-collision validator to SortedFeatureView.ensure_valid()
- No DDL changes to existing tables; no modifications to the tall-schema (regular FeatureView) read/write paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ureView

Replace the O(n²) generator scan with an auxiliary canonical_to_original
dict for constant-time collision detection. Also improve the error
message to be more directive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rning in FV layer

Backend-specific constraints belong in the backend. SortedFeatureView
now emits a logger.warning for case-colliding feature names (they work
fine on Valkey/DynamoDB). The Cassandra plugin rejects them with a hard
CassandraInvalidConfig error in _create_table and _alter_table, where
the constraint actually applies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +782 to 789
// Use unquoted, lowercased identifiers to match the on-disk form written
// by the Python materializer. Quoting + mixed case would make Cassandra
// do a case-sensitive lookup that misses the (always-lowercase) stored
// column.
columnRefs := make([]string, len(featureNames))
for i, name := range featureNames {
quotedFeatures[i] = fmt.Sprintf(`"%s"`, name)
columnRefs[i] = canonicalColumnName(name)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense for new tables created with unquoted identifiers, but I’m a little worried about backward compatibility. If any existing deployments have quoted mixed-case column names, forcing lowercase unquoted refs here could miss those columns when doing reads right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous write function has always written unquoted DDL (see _build_sorted_table_cql and the old _alter_table), so the write path in Cassandra case-folds everything to lowercase on storage. Every existing table should already have lowercase columns, the only path where this is not true is if someone directly calls the Cassandra client and writes to it.

Comment on lines +840 to +841
`SELECT entity_key, event_ts, %s FROM %s WHERE %s%s%s%s`,
strings.Join(columnRefs, ", "),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same compatibility question here on the SELECT projection: we now always emit lowercase unquoted column names. Can we confirm this is safe for all previously created Cassandra/Scylla tables, including any that may have been created with quoted identifiers?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above comment, the write path always stored it as lower case column names

assert.NotContains(t, cql, "ORDER BY", "ORDER BY should be omitted when all SortKeyFilters have Order == nil")
}

func TestBuildRangeQueryCQL_UsesLowercaseUnquotedIdentifiers(t *testing.T) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add coverage for the OnlineReadRange read path (mixed-case feature refs vs lowercase row keys) at cassandraonlinestore.go:954, since that is the actual regression fix?

Copy link
Copy Markdown
Collaborator

@vanitabhagwat vanitabhagwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Manisha4 Manisha4 merged commit 0b4447e into master May 5, 2026
34 of 36 checks passed
@Manisha4 Manisha4 deleted the cassandra-fv-fix branch May 5, 2026 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants