Enhance BetterStructureSql to handle massive database schemas (tens of thousands of tables, triggers, functions) by splitting schema output into multiple files organized in a directory structure, while maintaining single-blob storage in the database for versioning.
The integration app includes a large schema generator demonstrating the problem. Run this to see why multi-file output is essential:
cd integration
rails db:seed # Generates 50 tables, 150 indexes, 25 FKs, 20 views, 10 functions, 15 triggers
rails db:schema:dump
wc -l db/structure.sql
# Output: 4287 db/structure.sqlThe Problem:
- Opening a 4,000+ line file is slow in most editors
- Finding a specific table requires scrolling or searching
- Git diffs show the entire file for a one-line change
- Merge conflicts are extremely difficult to resolve
- Code reviews become painful
The Solution (after multi-file implementation):
$ find db/schema -type f -name "*.sql" | wc -l
72 files
$ ls db/schema/
1_extensions/ 4_tables/ 7_views/ _header.sql
2_types/ 5_indexes/ 8_functions/ _manifest.json
3_sequences/ 6_foreign_keys/ 9_triggers/
$ git diff db/schema/ # After adding 1 table
# Shows only: db/schema/4_tables/000011.sql (new file, 48 lines)Each file is ~50-500 lines, easy to navigate, and git diffs are clean and focused.
Companies with very large database schemas face challenges with monolithic structure.sql files:
- Massive schemas: Tens of thousands of tables, triggers, functions, views
- Single file limitations:
- Difficult to navigate and search
- Slow to load in editors
- Hard to review in code reviews
- Poor git diff performance
- High memory consumption during load
- Developer experience: Need to quickly locate specific tables/triggers without scrolling through massive files
When config.output_path has no file extension (e.g., 'db/schema'), it's treated as a directory and the gem splits output into multiple files organized by object type:
Detection Rule:
- No extension → Directory → Multi-file mode (e.g.,
'db/schema') - Has extension → File → Single-file mode (e.g.,
'db/structure.sql'or'db/schema.rb')
db/schema/
├── 1_extensions/
│ └── 000001.sql
├── 2_types/
│ └── 000001.sql
├── 3_sequences/
│ ├── 000001.sql
│ └── 000002.sql
├── 4_tables/
│ ├── 000001.sql (500 LOC max per file)
│ ├── 000002.sql
│ └── 000003.sql
├── 5_indexes/
│ ├── 000001.sql
│ └── 000002.sql
├── 6_foreign_keys/
│ └── 000001.sql
├── 7_views/
│ └── 000001.sql
├── 8_functions/
│ ├── 000001.sql
│ └── 000002.sql
├── 9_triggers/
│ └── 000001.sql
├── migrations/ (schema_migrations INSERTs, 500 per file)
│ ├── 000001.sql
│ ├── 000002.sql
│ └── 000003.sql
├── _header.sql (SET statements, search path)
└── _manifest.json (metadata about split files)
500 LOC Soft Limit:
- When current file + next object would exceed 550 LOC → create new file
- If current file is 450 LOC and next object is 100 LOC (total 550) → overflow, new file
Single Large Object Handling:
- If a single object > 500 LOC (e.g., 600-column table, complex trigger) → store in dedicated file
- No artificial splitting of individual objects
- Files can exceed 500 LOC if containing a single large object
Numbering:
- Zero-padded sequential:
000001.sql,000002.sql,000003.sql - Up to 999,999 files per directory (6 digits)
- Files numbered in dependency-safe order
SchemaVersions table continues to store as single blob:
- Combine all split files into one content string
- Store as single
TEXTfield inbetter_structure_sql_schema_versions - Web UI download provides ZIP file containing reconstructed directory structure
- No changes to database schema or versioning logic
Scenario: Company has 15,000 tables, 30,000 triggers, 5,000 functions
Current Pain: Single 2MB structure.sql file crashes text editors, impossible to navigate
Solution:
# config/initializers/better_structure_sql.rb
BetterStructureSql.configure do |config|
config.output_path = 'db/schema' # Directory, not file
config.max_lines_per_file = 500 # Configurable limit
endResult:
- Schema split across ~300 files in organized directories
- Each file 200-500 LOC, easily navigable
- Git diffs show only affected files
- Developers can grep within
db/schema/tables/to find specific tables
Scenario: Pull request adds 5 new tables and 10 triggers
Current Pain: Reviewer must scroll through 10,000-line file to find changes
Solution: Git diff shows only affected files:
db/schema/tables/000042.sql (new file)
db/schema/triggers/000018.sql (modified)
Benefit: Reviewer sees exactly what changed, no noise
Scenario: Developer needs to restore schema from version 2 weeks ago
Current Flow: Click "Download" on version → receives structure.sql
New Flow:
- Click "Download" on version → receives
schema_version_123.zip - Extract ZIP → gets full directory structure
- Can
unzipand explore or load directly
Web UI:
- Download button generates ZIP on-the-fly using
rubyzip - ZIP contains reconstructed directory structure from single blob
- No need to store ZIPs on disk, generated per-request
BetterStructureSql.configure do |config|
# Enable multi-file output by using directory path
config.output_path = 'db/schema' # Directory → multi-file
# config.output_path = 'db/structure.sql' # File → single file (default)
# Chunking settings
config.max_lines_per_file = 500 # Soft limit (default: 500)
config.overflow_threshold = 1.1 # 10% overflow allowed (default: 1.1)
# If current + next > 500 * 1.1 (550) → new file
# Manifest generation
config.generate_manifest = true # Create _manifest.json (default: true)
# Existing options work the same
config.enable_schema_versions = true
config.include_functions = true
endDefault behavior unchanged:
config.output_path = 'db/structure.sql'(file) → single file outputconfig.output_path = 'db/schema'(directory) → multi-file output- Detection: If path has no extension or ends with
/→ directory
Each object type gets a subdirectory:
extensions/- PostgreSQL extensionstypes/- Custom types (enums, composites, domains)sequences/- Sequence objectstables/- Table definitions with columns and constraintsindexes/- Index definitionsforeign_keys/- Foreign key constraintsviews/- Regular viewsmaterialized_views/- Materialized viewsfunctions/- User-defined functionstriggers/- Trigger definitions_header.sql- SET statements and search path_manifest.json- Metadata
Numbered files: 000001.sql, 000002.sql, etc.
- Sequential numbering within each directory
- Zero-padded to 6 digits (supports up to 999,999 files)
- Ordering matches dependency-safe order from Dumper
Special files:
_header.sql- Executed first (SET statements)_manifest.json- Optional metadata file
_manifest.json provides metadata about the split:
{
"version": "1.0",
"generated_at": "2025-11-19T10:30:00Z",
"pg_version": "14.5",
"format": "sql",
"total_files": 247,
"total_lines": 98543,
"max_lines_per_file": 500,
"directories": {
"1_extensions": {"files": 1, "lines": 12},
"2_types": {"files": 3, "lines": 145},
"3_sequences": {"files": 8, "lines": 892},
"4_tables": {"files": 156, "lines": 78432},
"5_indexes": {"files": 45, "lines": 9821},
"6_foreign_keys": {"files": 12, "lines": 3421},
"7_views": {"files": 7, "lines": 2134},
"8_functions": {"files": 11, "lines": 3012},
"9_triggers": {"files": 4, "lines": 674},
"migrations": {"files": 3, "lines": 1520}
}
}Note: Load order is implicit from numbered directory names (1_extensions, 2_types, etc.) and numeric file names (000001.sql, 000002.sql). No need to store explicit load_order array that could become huge with thousands of files.
rails db:schema:load_betterBehavior:
- Auto-detects single file vs directory
- If directory: loads files in dependency order using manifest
- If file: loads as single SQL file (current behavior)
Load Strategy:
- Load one directory at a time
- Concatenate all files within directory
- Execute directory contents as single SQL statement
- Example: 4_tables/ (150 files) → concatenate all → execute once
Load Order:
_header.sql(SET statements) - execute once1_extensions/- concatenate all files, execute once2_types/- concatenate all files, execute once3_sequences/- concatenate all files, execute once4_tables/- concatenate all files, execute once5_indexes/- concatenate all files, execute once6_foreign_keys/- concatenate all files, execute once7_views/- concatenate all files, execute once8_functions/- concatenate all files, execute once9_triggers/- concatenate all files, execute oncemigrations/- concatenate all files, execute once
Total SQL Executions: Maximum 11 (header + 9 directories + migrations)
Memory Footprint: Load one directory at a time (~1-2MB per directory), not entire schema
Developers can load manually by leveraging numbered directories:
# Load all files in dependency order (numbered directories ensure correct order)
{
cat db/schema/_header.sql
for dir in db/schema/{1..9}_*/; do
cat "$dir"*.sql 2>/dev/null
done
cat db/schema/migrations/*.sql 2>/dev/null
} | psql -d myapp_development
# Or more simply (relies on sort order)
find db/schema -name "*.sql" -not -name "_header.sql" | sort | \
xargs -I {} sh -c 'cat db/schema/_header.sql; cat {}' | psql -d myapp_developmentNote: The numbered directory prefixes (1_, 2_, 3_...) ensure correct load order without needing to read manifest.
Download Action (GET /schema_versions/:id/download):
Single-file format (format_type = 'sql' or 'rb', non-split):
- Download as
.sqlor.rbfile - Current behavior unchanged
Multi-file format (detected via manifest in content):
- Generate ZIP file on-the-fly using
rubyzip - ZIP contains reconstructed directory structure
- Filename:
schema_version_<id>_<timestamp>.zip
Implementation:
def download
version = SchemaVersion.find(params[:id])
if multi_file_format?(version.content)
send_zip_archive(version)
else
send_single_file(version)
end
end
def send_zip_archive(version)
zip_data = ZipGenerator.generate(version.content)
send_data zip_data,
filename: "schema_version_#{version.id}_#{version.created_at.to_i}.zip",
type: 'application/zip'
endZipGenerator (new class):
- Parses content to identify file boundaries
- Creates in-memory ZIP using
rubyzip - Adds all files to ZIP in correct directory structure
- Streams ZIP to client
Display:
- Detect multi-file format from content
- Show directory tree visualization instead of raw content
- Provide "Download ZIP" button
- Show manifest metadata (file count, total lines, breakdown by type)
Example:
Schema Version #123
Format: SQL (Multi-File)
PostgreSQL Version: 14.5
Created: 2025-11-19 10:30:00
Total Files: 247
Total Lines: 98,543
Directory Structure:
├── _header.sql (12 lines)
├── extensions/ (1 file, 12 lines)
├── types/ (3 files, 145 lines)
├── tables/ (156 files, 78,432 lines)
└── ...
[Download ZIP]
Current approach (single file):
- Build entire content string in memory
- Can be 1-5 MB for large schemas
- Memory spike during dump
Multi-file approach:
- Write files incrementally as generated
- Lower peak memory usage
- Can process schemas of any size
Trade-offs:
- More file operations (open/close for each chunk)
- But smaller individual writes
- Modern filesystems handle many small files well
Optimization:
- Buffer writes within each file
- Only fsync after completing each file
- Parallel file writing possible (future optimization)
Benefits:
- Smaller diffs (only changed files)
- Faster
git status,git diff - Better merge conflict resolution
- Reduced repository bloat
- Rails 6.0+: Full support
- Rails 5.2: Full support (ActiveRecord connection API unchanged)
- Rails 5.0-5.1: Compatible (not officially tested)
- PostgreSQL 10+: Full support
- PostgreSQL 9.6: Compatible (no new features required)
New dependency: rubyzip (~> 2.3)
- For generating ZIP downloads in Web UI
- Well-maintained, popular gem (50M+ downloads)
- No system dependencies (pure Ruby)
Existing projects (single file):
- Update config:
config.output_path = 'db/schema' - Run:
rails db:schema:dump_better - Commit multi-file structure
- Delete old
db/structure.sql
Rollback:
- Update config:
config.output_path = 'db/structure.sql' - Run:
rails db:schema:dump_better - Single file restored
No database migration required - versioning storage unchanged
Validation:
- Sanitize all file paths
- Prevent
../in generated filenames - Restrict output to configured directory only
Limits:
- Maximum file count in ZIP: 10,000 files
- Maximum uncompressed size: 100 MB
- Reject ZIPs with suspicious compression ratios
Output files:
- Created with mode
0644(owner write, group/other read) - Respect system umask
- No executable bits
- File chunking logic (500 LOC limit)
- Large object handling (single 600 LOC object)
- Directory structure generation
- Manifest generation and parsing
- ZIP archive creation
- Full dump with 10,000 tables split across files
- Load multi-file schema and verify database state
- Round-trip: dump → load → dump, compare outputs
- Web UI download ZIP and extract
- Empty database → minimal files
- Single huge table (1000 columns) → single file > 500 LOC
- Mixed small/large objects → correct chunking
- Special characters in object names → safe filenames
- Circular dependencies → correct ordering maintained
- 50,000 tables: dump time, memory usage, file count
- 100,000 triggers: chunking performance
- ZIP generation: 500-file archive speed
- Load time: compare single vs multi-file
Add "Multi-File Output" section with:
- Configuration example
- Use cases
- Migration guide
Add keywords:
- multi-file schema, directory output, chunking strategy, 500 LOC limit, overflow threshold, manifest generation, rubyzip, ZIP download, large database support, tens of thousands of tables, file splitting, numbered files, directory structure, load order, dependency-safe chunking
docs/features/multi-file-schema-output/(this document)docs/features/multi-file-schema-output/architecture.md(technical design)docs/features/multi-file-schema-output/plan/(implementation phases)
Parallel dumping:
- Generate multiple object types concurrently
- Requires thread-safe file writing
Compression:
- Gzip individual files:
000001.sql.gz - Reduces disk usage for large schemas
Incremental dumps:
- Track changed objects since last dump
- Only regenerate affected files
- Requires change tracking
Custom chunking strategies:
- Group related tables (by prefix, schema, etc.)
- Logical grouping instead of LOC-based
Smart loading:
- Parallel file loading for faster schema restoration
- Topological sort with parallelism
Web UI enhancements:
- Browse directory structure online
- View individual files without downloading ZIP
- Diff between versions at file level