@@ -47,8 +47,8 @@ The Account Migration Module enables full lifecycle management of tenant (Accoun
4747 ▼ ▼
4848┌──────────────┐ ┌────────────────┐
4949│ ExportPipeline│ │ ImportPipeline │
50- │ (streaming │ │ (streaming │
51- │ ZIP write) │ │ ZIP / legacy) │
50+ │ (parallel │ │ (streaming │
51+ │ ZIP write) │ │ ZIP v3) │
5252└──────┬───────┘ └───────┬────────┘
5353 │ │
5454 │ uses │ uses
@@ -114,31 +114,27 @@ Account101_20260616T100500.zip
114114
115115ZIP entries use ` DEFLATE ` at ` BEST_SPEED ` level. Typical JSON compresses 5–10×.
116116
117- ### Streaming Strategy
117+ ### Parallel Temp-Directory Strategy
118+
119+ Entity files are written in parallel to a temporary directory, then assembled into a ZIP in topological order:
118120
119121```
120- ZipOutputStream (wraps caller's OutputStream, 256 KB buffer)
121- │
122- ├── Entry: manifest.json
123- │ JsonGenerator → {version, exportedAt, sourceAccountId,
124- │ identityStrategy, account: AccountDTO,
125- │ entities: [{file, entityClass}, ...]}
126- │
127- └── For each entityClass in topological order:
128- Entry: Account{id}_{SimpleName}.json
129- JsonGenerator → {
130- "entityClass": "com.example.Customer",
131- "fields": ["id", "name", "category_ref_id", ...],
132- "rows": [
133- [1, "John", 3],
134- [2, "Cindy", null],
135- ...
136- ]
137- }
138- ```
139-
140- Each ` JsonGenerator ` is wrapped with a ` NoCloseOutputStream ` so that ` gen.close() ` flushes
141- without closing the underlying ` ZipOutputStream ` .
122+ 1. Files.createTempDirectory("saas-export-{accountId}-")
123+ 2. Write manifest.json to temp dir (synchronous — entity list is known upfront)
124+ 3. For each entityClass (up to exportParallelism=4 concurrent virtual threads):
125+ EntityManager localEm = emf.createEntityManager() ← one per thread, thread-safe
126+ write Account{id}_{SimpleName}.json to temp dir
127+ localEm.close()
128+ 4. ZipOutputStream → output stream:
129+ addEntry(manifest.json) ← always first
130+ For each entityClass in topological order:
131+ addEntry(Account{id}_{SimpleName}.json) ← Files.copy(), no heap allocation
132+ zipOut.finish()
133+ 5. deleteDirectory(tempDir) ← always, in finally
134+ ```
135+
136+ The column definition cache (` Map<Class<?>, List<ColumnDef>> ` ) is built once per entity type
137+ and reused across all parallel tasks via a ` ConcurrentHashMap ` .
142138
143139### Keyset Pagination
144140
@@ -172,14 +168,7 @@ Fields annotated with `@ExportIgnore` are skipped. Missing or inaccessible field
172168
173169## 5. Import Pipeline
174170
175- ### Format Detection (magic bytes)
176-
177- ```
178- BufferedInputStream.mark(4) → peek 2 bytes → reset
179- 0x50 0x4B ("PK") → ZIP archive → importFromZip()
180- 0x1F 0x8B → GZIP stream → importLegacy() with GZIPInputStream wrapper
181- anything else → plain JSON → importLegacy()
182- ```
171+ Input must be a ZIP archive (format v3). Legacy JSON and GZIP formats are no longer supported.
183172
184173### ZIP Import Flow (v3)
185174
@@ -208,13 +197,6 @@ ZipInputStream (sequential — entries arrive in topological order)
208197` ZipInputStream.read() ` reports EOF at the end of each entry, so Jackson's parser stops
209198naturally without reading into the next entry.
210199
211- ### Legacy Import (v1 / v2)
212-
213- For files produced by format v1 (per-row objects) or v2 (single JSON/GZIP):
214- - GZIP-wrapped streams are unwrapped automatically.
215- - The single JSON document is parsed using the original streaming algorithm.
216- - This path is maintained for backward compatibility and will not be removed.
217-
218200### ID Resolution (Identity Mapper)
219201
220202```
@@ -315,9 +297,7 @@ Account42_20260614T100500.zip
315297- ` {fieldName}_ref_id ` encodes a ` @ManyToOne ` / ` @OneToOne ` reference by its primary key.
316298- Entities appear in topological order (parents before children) both in the manifest and as ZIP entries.
317299- The ` account ` section in the manifest makes the archive self-describing.
318- - Format version ` "3" ` uses the ZIP multi-file layout.
319- - Format version ` "2" ` (legacy) uses a single columnar JSON/GZIP document.
320- - Format version ` "1" ` (legacy) uses a single JSON document with per-row objects.
300+ - Format version ` "3" ` is the only supported format. Versions ` "1" ` and ` "2" ` (legacy JSON / GZIP) are no longer accepted by the importer.
321301
322302---
323303
@@ -404,8 +384,9 @@ CREATE TABLE saas_migration_jobs (
404384| Concern | Mitigation |
405385| ---------| ------------|
406386| Large tables | Keyset pagination (` id > lastId ` , configurable chunk size, default 500 rows/page) |
407- | Memory | Jackson streaming API + ZipOutputStream: never load all rows; one ZIP entry at a time |
408- | Disk I/O | 256 KB write buffer on ZipOutputStream; DEFLATE BEST_SPEED compression |
387+ | Memory | Jackson streaming API; entity files written to temp dir, then streamed into ZIP via ` Files.copy() ` |
388+ | Parallel export | Up to ` exportParallelism ` (default 4) entity types exported concurrently via virtual threads; each uses its own ` EntityManager ` |
389+ | Disk I/O | 256 KB ZIP buffer, 64 KB per-entity file buffer; DEFLATE BEST_SPEED compression |
409390| Network / disk | ZIP always produced — typically 5–10× smaller than raw JSON |
410391| Long-running jobs | Virtual threads, cooperative cancellation via ` CancellationToken ` |
411392| DB load | Read-only keyset-paginated queries; imports batched per chunk in isolated transactions |
@@ -420,7 +401,7 @@ CREATE TABLE saas_migration_jobs (
420401| -------| -------|
421402| ** v1 (legacy)** | EXPORT, IMPORT as single JSON. ` KEEP_IDS ` + ` REGENERATE_IDS ` . |
422403| ** v2 (legacy)** | Columnar format (fields + rows arrays). Optional GZIP. |
423- | ** v3 (current)** | ZIP multi-file: one JSON per entity. Always compressed. Keyset pagination. Backward-compatible import . Clone via temp file. |
404+ | ** v3 (current)** | ZIP multi-file: one JSON per entity. Always compressed. Keyset pagination. Parallel entity export (virtual threads) . Clone via temp file. |
424405| ** v4** | Cross-environment MIGRATE (HTTP push to remote endpoint). Resume after failure (checkpoint in DB). |
425406| ** v5** | ` UUID7 ` identity strategy. Partial export (subset of entities). Schema validation on import. |
426407| ** v6** | Multi-region database migration. S3/GCS file storage backend. Event-driven progress via SSE/WebSocket. |
0 commit comments