Skip to content

Commit a5d3157

Browse files
Implement parallel entity export and update ZIP assembly process
1 parent 4078c23 commit a5d3157

7 files changed

Lines changed: 411 additions & 469 deletions

File tree

extensions/saas/sources/migration/ARCHITECTURE.md

Lines changed: 27 additions & 46 deletions
Original file line numberDiff line numberDiff line change
@@ -47,8 +47,8 @@ The Account Migration Module enables full lifecycle management of tenant (Accoun
4747
▼ ▼
4848
┌──────────────┐ ┌────────────────┐
4949
│ ExportPipeline│ │ ImportPipeline │
50-
│ (streaming │ │ (streaming │
51-
│ ZIP write) │ │ ZIP / legacy)
50+
│ (parallel │ │ (streaming │
51+
│ ZIP write) │ │ ZIP v3)
5252
└──────┬───────┘ └───────┬────────┘
5353
│ │
5454
│ uses │ uses
@@ -114,31 +114,27 @@ Account101_20260616T100500.zip
114114

115115
ZIP entries use `DEFLATE` at `BEST_SPEED` level. Typical JSON compresses 5–10×.
116116

117-
### Streaming Strategy
117+
### Parallel Temp-Directory Strategy
118+
119+
Entity files are written in parallel to a temporary directory, then assembled into a ZIP in topological order:
118120

119121
```
120-
ZipOutputStream (wraps caller's OutputStream, 256 KB buffer)
121-
122-
├── Entry: manifest.json
123-
│ JsonGenerator → {version, exportedAt, sourceAccountId,
124-
│ identityStrategy, account: AccountDTO,
125-
│ entities: [{file, entityClass}, ...]}
126-
127-
└── For each entityClass in topological order:
128-
Entry: Account{id}_{SimpleName}.json
129-
JsonGenerator → {
130-
"entityClass": "com.example.Customer",
131-
"fields": ["id", "name", "category_ref_id", ...],
132-
"rows": [
133-
[1, "John", 3],
134-
[2, "Cindy", null],
135-
...
136-
]
137-
}
138-
```
139-
140-
Each `JsonGenerator` is wrapped with a `NoCloseOutputStream` so that `gen.close()` flushes
141-
without closing the underlying `ZipOutputStream`.
122+
1. Files.createTempDirectory("saas-export-{accountId}-")
123+
2. Write manifest.json to temp dir (synchronous — entity list is known upfront)
124+
3. For each entityClass (up to exportParallelism=4 concurrent virtual threads):
125+
EntityManager localEm = emf.createEntityManager() ← one per thread, thread-safe
126+
write Account{id}_{SimpleName}.json to temp dir
127+
localEm.close()
128+
4. ZipOutputStream → output stream:
129+
addEntry(manifest.json) ← always first
130+
For each entityClass in topological order:
131+
addEntry(Account{id}_{SimpleName}.json) ← Files.copy(), no heap allocation
132+
zipOut.finish()
133+
5. deleteDirectory(tempDir) ← always, in finally
134+
```
135+
136+
The column definition cache (`Map<Class<?>, List<ColumnDef>>`) is built once per entity type
137+
and reused across all parallel tasks via a `ConcurrentHashMap`.
142138

143139
### Keyset Pagination
144140

@@ -172,14 +168,7 @@ Fields annotated with `@ExportIgnore` are skipped. Missing or inaccessible field
172168

173169
## 5. Import Pipeline
174170

175-
### Format Detection (magic bytes)
176-
177-
```
178-
BufferedInputStream.mark(4) → peek 2 bytes → reset
179-
0x50 0x4B ("PK") → ZIP archive → importFromZip()
180-
0x1F 0x8B → GZIP stream → importLegacy() with GZIPInputStream wrapper
181-
anything else → plain JSON → importLegacy()
182-
```
171+
Input must be a ZIP archive (format v3). Legacy JSON and GZIP formats are no longer supported.
183172

184173
### ZIP Import Flow (v3)
185174

@@ -208,13 +197,6 @@ ZipInputStream (sequential — entries arrive in topological order)
208197
`ZipInputStream.read()` reports EOF at the end of each entry, so Jackson's parser stops
209198
naturally without reading into the next entry.
210199

211-
### Legacy Import (v1 / v2)
212-
213-
For files produced by format v1 (per-row objects) or v2 (single JSON/GZIP):
214-
- GZIP-wrapped streams are unwrapped automatically.
215-
- The single JSON document is parsed using the original streaming algorithm.
216-
- This path is maintained for backward compatibility and will not be removed.
217-
218200
### ID Resolution (Identity Mapper)
219201

220202
```
@@ -315,9 +297,7 @@ Account42_20260614T100500.zip
315297
- `{fieldName}_ref_id` encodes a `@ManyToOne` / `@OneToOne` reference by its primary key.
316298
- Entities appear in topological order (parents before children) both in the manifest and as ZIP entries.
317299
- The `account` section in the manifest makes the archive self-describing.
318-
- Format version `"3"` uses the ZIP multi-file layout.
319-
- Format version `"2"` (legacy) uses a single columnar JSON/GZIP document.
320-
- Format version `"1"` (legacy) uses a single JSON document with per-row objects.
300+
- Format version `"3"` is the only supported format. Versions `"1"` and `"2"` (legacy JSON / GZIP) are no longer accepted by the importer.
321301

322302
---
323303

@@ -404,8 +384,9 @@ CREATE TABLE saas_migration_jobs (
404384
| Concern | Mitigation |
405385
|---------|------------|
406386
| Large tables | Keyset pagination (`id > lastId`, configurable chunk size, default 500 rows/page) |
407-
| Memory | Jackson streaming API + ZipOutputStream: never load all rows; one ZIP entry at a time |
408-
| Disk I/O | 256 KB write buffer on ZipOutputStream; DEFLATE BEST_SPEED compression |
387+
| Memory | Jackson streaming API; entity files written to temp dir, then streamed into ZIP via `Files.copy()` |
388+
| Parallel export | Up to `exportParallelism` (default 4) entity types exported concurrently via virtual threads; each uses its own `EntityManager` |
389+
| Disk I/O | 256 KB ZIP buffer, 64 KB per-entity file buffer; DEFLATE BEST_SPEED compression |
409390
| Network / disk | ZIP always produced — typically 5–10× smaller than raw JSON |
410391
| Long-running jobs | Virtual threads, cooperative cancellation via `CancellationToken` |
411392
| DB load | Read-only keyset-paginated queries; imports batched per chunk in isolated transactions |
@@ -420,7 +401,7 @@ CREATE TABLE saas_migration_jobs (
420401
|-------|-------|
421402
| **v1 (legacy)** | EXPORT, IMPORT as single JSON. `KEEP_IDS` + `REGENERATE_IDS`. |
422403
| **v2 (legacy)** | Columnar format (fields + rows arrays). Optional GZIP. |
423-
| **v3 (current)** | ZIP multi-file: one JSON per entity. Always compressed. Keyset pagination. Backward-compatible import. Clone via temp file. |
404+
| **v3 (current)** | ZIP multi-file: one JSON per entity. Always compressed. Keyset pagination. Parallel entity export (virtual threads). Clone via temp file. |
424405
| **v4** | Cross-environment MIGRATE (HTTP push to remote endpoint). Resume after failure (checkpoint in DB). |
425406
| **v5** | `UUID7` identity strategy. Partial export (subset of entities). Schema validation on import. |
426407
| **v6** | Multi-region database migration. S3/GCS file storage backend. Event-driven progress via SSE/WebSocket. |

extensions/saas/sources/migration/README.md

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ All operations run as **async background jobs** via virtual threads, so long-run
1414
| Operation | Description |
1515
|-----------|-------------|
1616
| `EXPORT` | Serialize all tenant data to a versioned ZIP archive (one JSON file per entity) |
17-
| `IMPORT` | Restore tenant data from a previously exported ZIP (or legacy JSON/GZIP) |
17+
| `IMPORT` | Restore tenant data from a previously exported ZIP archive |
1818
| `CLONE` | Duplicate a tenant in the same system (different accountId) |
1919
| `BACKUP` | Alias for EXPORT tagged as backup |
2020
| `RESTORE` | Alias for IMPORT that replaces existing data |
@@ -140,10 +140,17 @@ Content-Type: application/json
140140
dynamia.saas.migration.chunk-size=500
141141
dynamia.saas.migration.output-directory=${java.io.tmpdir}/saas-migration
142142
dynamia.saas.migration.max-concurrent-jobs=5
143+
dynamia.saas.migration.export-parallelism=4
143144
dynamia.saas.migration.fail-on-entity-error=false
144145
```
145146

146-
> `compression-enabled` has been removed. All exports now produce a ZIP archive by default.
147+
| Property | Default | Description |
148+
|----------|---------|-------------|
149+
| `chunk-size` | `500` | Records per pagination page |
150+
| `output-directory` | `${tmpdir}/saas-migration` | Where export files are stored |
151+
| `max-concurrent-jobs` | `5` | Max simultaneous running jobs |
152+
| `export-parallelism` | `4` | Entity types exported concurrently per job |
153+
| `fail-on-entity-error` | `false` | Stop on first error vs. log and continue |
147154

148155
---
149156

@@ -200,7 +207,8 @@ Relationship fields (`@ManyToOne`, `@OneToOne`) are serialized as `{fieldName}_r
200207
Collections (`@OneToMany`, `@ManyToMany`) are reconstructed naturally when child entities
201208
reference their parents.
202209

203-
The importer is backward-compatible with legacy v2 (single JSON/GZIP) and v1 (single JSON) files.
210+
Entity files are written in parallel (up to `export-parallelism` concurrent virtual threads)
211+
to a temporary directory, then assembled into the ZIP in topological order.
204212

205213
---
206214

extensions/saas/sources/migration/src/main/java/tools/dynamia/modules/saas/migration/config/AccountMigrationProperties.java

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,8 +21,8 @@
2121
* <pre>
2222
* dynamia.saas.migration.chunk-size=500
2323
* dynamia.saas.migration.output-directory=/var/data/saas-migration
24-
* dynamia.saas.migration.compression-enabled=false
2524
* dynamia.saas.migration.max-concurrent-jobs=5
25+
* dynamia.saas.migration.export-parallelism=4
2626
* dynamia.saas.migration.fail-on-entity-error=false
2727
* </pre>
2828
*
@@ -50,6 +50,12 @@ public class AccountMigrationProperties {
5050
*/
5151
private int maxConcurrentJobs = 5;
5252

53+
/**
54+
* Number of entity types exported concurrently during a single export job.
55+
* Each parallel slot opens its own {@code EntityManager}. Default: 4.
56+
*/
57+
private int exportParallelism = 4;
58+
5359
/**
5460
* If {@code true}, the import pipeline stops immediately when any entity
5561
* fails to persist. If {@code false}, errors are logged and the import
@@ -91,6 +97,14 @@ public void setMaxConcurrentJobs(int maxConcurrentJobs) {
9197
this.maxConcurrentJobs = maxConcurrentJobs;
9298
}
9399

100+
public int getExportParallelism() {
101+
return exportParallelism;
102+
}
103+
104+
public void setExportParallelism(int exportParallelism) {
105+
this.exportParallelism = exportParallelism;
106+
}
107+
94108
public boolean isFailOnEntityError() {
95109
return failOnEntityError;
96110
}

0 commit comments

Comments
 (0)