Commit acf6fbc
miranov25
feat(schema): Separate Definition Schema from Record Schema
Breaking changes: none (fully backward compatible)
## Summary
Implement clear separation between Definition Schema (blueprint) and
Record Schema (runtime snapshot) for compression configuration. This
resolves cycle detection issues when loading definition schemas onto
fresh data.
## Core Changes
### Schema Export (Definition vs Record)
- Definition schema (include_state=False):
  - Compression targets exported as physical columns (no expr)
  - Compressed storage columns (dy_c) NOT exported
  - No state/original_removed fields
- Record schema (include_state=True):
  - Compression targets exported as aliases with decompress expr
  - Storage columns included
  - Full state information preserved
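To make the two export modes concrete, here is a minimal sketch on a toy dict-based schema. It is not the AliasDataFrame implementation: `CompressionSpec`, `export_schema`, the `"kind"`/`"expr"` keys and the layout of the `"compression"` section are all hypothetical, chosen only to mirror the bullet points above.

```python
# Hypothetical, simplified model of definition vs record schema export.
# None of these names come from AliasDataFrame; they only illustrate the split.
from dataclasses import dataclass


@dataclass
class CompressionSpec:
    target: str           # user-facing column, e.g. "dy"
    storage: str          # physical storage column, e.g. "dy_c"
    compress_expr: str    # expression producing the storage column
    decompress_expr: str  # expression reconstructing the target
    state: str = "COMPRESSED"
    original_removed: bool = True


def export_schema(physical, aliases, specs, include_state):
    """Return a plain-dict schema in definition (False) or record (True) mode."""
    columns = {}
    storage_cols = {s.storage for s in specs}
    for name in physical:
        if name in storage_cols and not include_state:
            continue  # definition schema: storage columns are not exported
        columns[name] = {"kind": "physical"}
    for name, expr in aliases.items():
        columns[name] = {"kind": "alias", "expr": expr}
    compression = {}
    for s in specs:
        entry = {"storage": s.storage, "compress": s.compress_expr,
                 "decompress": s.decompress_expr}
        if include_state:
            # record schema: target is an alias over the storage column
            columns[s.target] = {"kind": "alias", "expr": s.decompress_expr}
            entry.update(state=s.state, original_removed=s.original_removed)
        else:
            # definition schema: target is a plain physical column, no expr
            columns[s.target] = {"kind": "physical"}
        compression[s.target] = entry
    return {"columns": columns, "compression": compression}
```

With this split, a definition schema only says "dy should be stored compressed as dy_c", so applying it to fresh data starts from a physical dy rather than from an alias that depends on a storage column that does not exist yet.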
### New API Methods
- export_definition_schema() - convenience for blueprint export
- export_record_schema() - convenience for snapshot export
- validate_schema() - comprehensive validation with modes:
  - check_data=True/False (data vs structure-only validation)
  - strict=True (no pending aliases/columns allowed)
  - allow_missing_columns, allow_pending_aliases (fine-grained control)
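The validation modes can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the real `validate_schema()`: it works on the same hypothetical dict schema as a stand-in, the `"deps"` key, the default values of the flags, and the rule that `strict=True` overrides both `allow_*` flags are all assumptions made for illustration.

```python
def validate_schema(schema, data_columns=None, *, check_data=True, strict=False,
                    allow_missing_columns=False, allow_pending_aliases=True):
    """Return a list of problems; an empty list means the schema is valid.

    Hypothetical stand-in for the method described in the commit message.
    """
    if strict:
        # strict mode: no missing columns and no pending aliases allowed
        allow_missing_columns = False
        allow_pending_aliases = False
    columns = schema["columns"]
    errors = []
    # Structure-only checks: every alias dependency must be declared.
    for name, col in columns.items():
        for dep in col.get("deps", []):
            if dep not in columns:
                errors.append(f"{name}: unknown dependency {dep!r}")
    if not check_data:
        return errors
    # Data checks: compare declared columns with what the frame actually holds.
    available = set(data_columns or ())
    for name, col in columns.items():
        if col["kind"] == "physical":
            if name not in available and not allow_missing_columns:
                errors.append(f"{name}: physical column missing from data")
        else:
            # an alias is "pending" while a non-alias input is absent from data
            missing = [d for d in col.get("deps", [])
                       if d not in available
                       and columns.get(d, {}).get("kind") != "alias"]
            if missing and not allow_pending_aliases:
                errors.append(f"{name}: pending alias, waiting for {missing}")
    return errors
```

Treating `strict` as a shorthand that switches off both fine-grained allowances keeps the common "everything must resolve now" case to a single flag, while `check_data=False` lets a definition schema be checked before any data exists.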
### Bug Fixes
- Early physical check now validates schema changes on compressed columns
- get_compression_state() now infers SCHEMA_ONLY for definition schemas
- Subframe schema export now respects the include_state parameter
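The SCHEMA_ONLY inference amounts to "no stored state means nothing has been compressed yet". A minimal sketch of that rule follows; only the name `SCHEMA_ONLY` comes from the commit message, while the `CompressionState` enum, its `COMPRESSED` member, and the dict-shaped `entry` are hypothetical.

```python
from enum import Enum


class CompressionState(Enum):
    SCHEMA_ONLY = "schema_only"  # state name taken from the commit message
    COMPRESSED = "compressed"    # hypothetical, for illustration only


def get_compression_state(entry):
    """Infer the state of one compression entry from an exported schema."""
    state = entry.get("state")
    if state is None:
        # Definition schemas carry no state fields: the compression is only
        # declared, nothing has been compressed on this data yet.
        return CompressionState.SCHEMA_ONLY
    # Record schemas store the state explicitly; look it up by member name.
    return CompressionState[state]
```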
### Backward Compatibility
- Loading an old-format schema (compression target stored with expr) emits a DeprecationWarning
- All existing schemas load without error
- Existing tests pass unchanged
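A sketch of how an old-format schema can still load while warning, using the standard `warnings` module. The function name `normalize_definition_schema`, the dict layout, and the choice to rewrite the target as a physical column are assumptions for illustration; the commit message only states that such schemas load without error and emit a DeprecationWarning.

```python
import warnings


def normalize_definition_schema(schema):
    """Return a copy in which compression targets are physical columns.

    Hypothetical helper: old-format schemas stored targets with a decompress
    expr; they still load, but trigger a DeprecationWarning.
    """
    columns = {name: dict(col) for name, col in schema["columns"].items()}
    for target in schema.get("compression", {}):
        col = columns.get(target)
        if col is not None and "expr" in col:
            warnings.warn(
                f"compression target {target!r} is stored with an expr; "
                "this schema format is deprecated",
                DeprecationWarning,
                stacklevel=2,
            )
            columns[target] = {"kind": "physical"}
    # Shallow-copy the rest of the schema; the input dict is not modified.
    return {**schema, "columns": columns}
```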
## Test Coverage
- New: test_schema_definition_vs_record.py (31 tests)
- Extended: test_alias_dataframe.py (8 new tests for schema export)
- Total: 482+ tests passing
## Reviewers
Approved by: Gemini, GPT, Claude
Parent: e3efbf5
4 files changed: 2118 additions, 21 deletions
Changed paths:
- UTILS/dfextensions/AliasDataFrame
- tests