@@ -48,7 +48,7 @@ The Account Migration Module enables full lifecycle management of tenant (Accoun
4848┌──────────────┐ ┌────────────────┐
4949│ ExportPipeline│ │ ImportPipeline │
5050│ (streaming │ │ (streaming │
51- │ JSON write)│ │ JSON read) │
51+ │ ZIP write) │ │ ZIP / legacy) │
5252└──────┬───────┘ └───────┬────────┘
5353 │ │
5454 │ uses │ uses
@@ -97,26 +97,61 @@ Account → Customer → Order → OrderItem
9797
9898---
9999
100- ## 4. Export Pipeline
100+ ## 4. Export Pipeline (format v3)
101+
102+ ### Output: ZIP Archive
103+
104+ Every export produces a single ZIP file containing one JSON entry per entity type, plus a manifest:
105+
106+ ```
107+ Account101_20260616T100500.zip
108+ ├── manifest.json ← always first entry
109+ ├── Account101_Account.json ← empty rows; Account data lives in manifest
110+ ├── Account101_Customer.json
111+ ├── Account101_Product.json
112+ └── Account101_Order.json
113+ ```
114+
115+ ZIP entries use `DEFLATE` at the `BEST_SPEED` level. Typical JSON compresses 5–10×.
101116
102117### Streaming Strategy
103118
104119```
105- OutputStream (raw or GZIPOutputStream)
106- └── JsonGenerator (Jackson streaming)
107- ├── Write header: {version, exportedAt, sourceAccountId, identityStrategy}
108- ├── Write "account": AccountDTO object
109- └── Write "entities": {
110- For each entityClass in topological order:
111- Write entityClass.getName(): {
112- "fields": ["id", "name", "category_ref_id", ...],
113- "rows": [
114- [1, "John", 3],
115- [2, "Cindy", null],
116- ...
117- ]
118- }
119- }
120+ ZipOutputStream (wraps caller's OutputStream, 256 KB buffer)
121+ │
122+ ├── Entry: manifest.json
123+ │ JsonGenerator → {version, exportedAt, sourceAccountId,
124+ │ identityStrategy, account: AccountDTO,
125+ │ entities: [{file, entityClass}, ...]}
126+ │
127+ └── For each entityClass in topological order:
128+ Entry: Account{id}_{SimpleName}.json
129+ JsonGenerator → {
130+ "entityClass": "com.example.Customer",
131+ "fields": ["id", "name", "category_ref_id", ...],
132+ "rows": [
133+ [1, "John", 3],
134+ [2, "Cindy", null],
135+ ...
136+ ]
137+ }
138+ ```
139+
140+ Each `JsonGenerator` writes through a `NoCloseOutputStream` wrapper so that `gen.close()` flushes
141+ without closing the underlying `ZipOutputStream`.
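
The wrapper idea above can be sketched as follows. This is a minimal illustration, not the module's code: the `NoCloseOutputStream` shown here is a hand-rolled stand-in, a plain `Writer` takes the place of Jackson's `JsonGenerator`, the 256 KB buffer is assumed to be a `BufferedOutputStream`, and `NoCloseZipSketch` / `writeArchive` are hypothetical names.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class NoCloseZipSketch {

    /** Forwards writes and flushes, but turns close() into a flush. */
    static final class NoCloseOutputStream extends FilterOutputStream {
        NoCloseOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // bypass FilterOutputStream's byte-by-byte default
        }

        @Override
        public void close() throws IOException {
            flush(); // deliberately leave the wrapped stream open
        }
    }

    /** Writes two entries into one archive; each entry has its own short-lived writer. */
    static byte[] writeArchive() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zip =
                 new ZipOutputStream(new BufferedOutputStream(sink, 256 * 1024))) {
            zip.setLevel(Deflater.BEST_SPEED);
            for (String name : new String[] {"manifest.json", "Account101_Customer.json"}) {
                zip.putNextEntry(new ZipEntry(name));
                // Closing the writer only flushes; the ZipOutputStream stays usable.
                try (Writer w = new OutputStreamWriter(
                         new NoCloseOutputStream(zip), StandardCharsets.UTF_8)) {
                    w.write("{}");
                }
                zip.closeEntry();
            }
        }
        return sink.toByteArray();
    }
}
```

Without the wrapper, closing the first writer would close the `ZipOutputStream`, and the second `putNextEntry` call would fail with an `IOException`.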
142+
143+ ### Keyset Pagination
144+
145+ Entity rows are fetched in pages using keyset pagination (`id > lastId`) to avoid
146+ `OFFSET`-based degradation on large tables:
147+
148+ ```
149+ lastId = 0
150+ loop:
151+ SELECT ... WHERE accountId = ? AND id > lastId ORDER BY id LIMIT chunkSize
152+ → stream rows into current ZIP entry
153+ lastId = last row's id
154+ until page.size() < chunkSize
120155```
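
The loop above can be sketched in Java as follows. This is an illustration under stated assumptions: a `TreeSet` of ids stands in for the database table, rows are reduced to their ids, ids are assumed to be positive so that `lastId = 0` precedes all of them, and `KeysetPaginationSketch`, `exportAll`, and `inMemoryTable` are hypothetical names.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.stream.Collectors;

final class KeysetPaginationSketch {

    /**
     * Pulls pages of ids with "id > lastId ORDER BY id LIMIT chunkSize" semantics
     * until a short page signals the end. Returns the number of pages fetched.
     */
    static int exportAll(BiFunction<Long, Integer, List<Long>> fetchPage,
                         int chunkSize,
                         Consumer<Long> sink) {
        long lastId = 0; // assumes all ids are positive
        int pages = 0;
        List<Long> page;
        do {
            page = fetchPage.apply(lastId, chunkSize);
            pages++;
            for (Long id : page) {
                sink.accept(id); // in the real pipeline: stream the row into the ZIP entry
            }
            if (!page.isEmpty()) {
                lastId = page.get(page.size() - 1);
            }
        } while (page.size() == chunkSize);
        return pages;
    }

    /** In-memory stand-in for the SELECT: ids strictly greater than lastId, ascending. */
    static BiFunction<Long, Integer, List<Long>> inMemoryTable(TreeSet<Long> ids) {
        return (lastId, limit) -> ids.tailSet(lastId, false).stream()
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

When the row count is an exact multiple of `chunkSize`, the loop issues one extra query that returns an empty page. That is the cost of using `page.size() < chunkSize` as the only stop condition.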
121156
122157### Serialization of a Single Entity (Columnar Format)
@@ -131,32 +166,55 @@ For each SingularAttribute (excluding id):
131166```
132167
133168Fields annotated with `@ExportIgnore` are skipped. Missing or inaccessible fields are written as `null`.
169+ `Account` entity rows are intentionally empty; account data lives in `manifest.json`.
134170
135171---
136172
137173## 5. Import Pipeline
138174
139- ### Streaming Strategy
175+ ### Format Detection (magic bytes)
140176
141177```
142- InputStream (auto-detected: raw or GZIPInputStream)
143- └── JsonParser (Jackson streaming)
144- ├── Read header → validate version, note sourceAccountId
145- ├── Read "account" → AccountDTO (optionally create new Account)
146- └── Read "entities" → {
147- For each entityClassName:
148- resolve class → Class.forName(entityClassName)
149- Read "fields": [...] → ordered column names
150- For each row array (chunked):
151- reconstruct JsonNode from fields + row values
152- deserialize → entity instance
153- resolve _ref_id references via idMappings
154- set accountId = targetAccountId
155- persist entity
156- record originalId → newId in idMappings
157- }
178+ BufferedInputStream.mark(4) → peek 2 bytes → reset
179+ 0x50 0x4B ("PK") → ZIP archive → importFromZip()
180+ 0x1F 0x8B → GZIP stream → importLegacy() with GZIPInputStream wrapper
181+ anything else → plain JSON → importLegacy()
158182```
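
The detection table above maps to a short peek-and-reset routine. The sketch below assumes hypothetical names (`FormatDetectionSketch`, `Format`, `detect`) and only classifies the stream; dispatching to `importFromZip()` or `importLegacy()` is left out.

```java
import java.io.BufferedInputStream;
import java.io.IOException;

final class FormatDetectionSketch {

    enum Format { ZIP, GZIP, PLAIN_JSON }

    /** Peeks at the first two bytes and rewinds so the caller can read from the start. */
    static Format detect(BufferedInputStream in) throws IOException {
        in.mark(4);
        int b0 = in.read(); // 0..255, or -1 on an empty stream
        int b1 = in.read();
        in.reset();

        if (b0 == 0x50 && b1 == 0x4B) {
            return Format.ZIP;        // "PK"
        }
        if (b0 == 0x1F && b1 == 0x8B) {
            return Format.GZIP;
        }
        return Format.PLAIN_JSON;     // anything else, including an empty stream
    }
}
```

Because `InputStream.read()` returns an unsigned value, the comparison against `0x8B` needs no masking; masking would only be required when comparing raw `byte` values.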
159183
184+ ### ZIP Import Flow (v3)
185+
186+ ```
187+ ZipInputStream (sequential — entries arrive in topological order)
188+ │
189+ ├── Entry: manifest.json
190+ │ Read version + sourceAccountId for logging
191+ │ (entity list not required — each entity file is self-describing)
192+ │
193+ └── For each *.json entry:
194+ JsonParser (via NoCloseInputStream wrapper)
195+ → read "entityClass" → Class.forName()
196+ → read "fields" → ordered column names
197+ → read "rows" → chunked into List<JsonNode>
198+ persistChunk() [REQUIRES_NEW transaction per chunk]
199+ → deserialize entity
200+ → resolve _ref_id references via idMappings
201+ → set accountId = targetAccountId
202+ → persist + flush (REGENERATE_IDS) or set ID (KEEP_IDS)
203+ → record originalId → newId in idMappings
204+ zipIn.closeEntry() — advances stream to next entry
205+ ```
206+
207+ `NoCloseInputStream` prevents `parser.close()` from closing the `ZipInputStream` between entries.
208+ `ZipInputStream.read()` reports EOF at the end of each entry, so Jackson's parser stops
209+ naturally without reading into the next entry.
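
The per-entry EOF behaviour described above can be seen in a small sketch. It is an assumption-laden illustration: a `Reader` stands in for Jackson's `JsonParser`, the `NoCloseInputStream` here is hand-rolled rather than the module's class, and `ZipEntryReadSketch` / `readEntries` are hypothetical names.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipEntryReadSketch {

    /** Delegates reads but ignores close(), so the archive stream survives each entry. */
    static final class NoCloseInputStream extends FilterInputStream {
        NoCloseInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // intentionally empty: the ZipInputStream is closed by its owner
        }
    }

    /** Reads every entry sequentially and returns entry name to content, in archive order. */
    static Map<String, String> readEntries(InputStream raw) throws IOException {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipInputStream zipIn = new ZipInputStream(raw)) {
            ZipEntry entry;
            while ((entry = zipIn.getNextEntry()) != null) {
                StringBuilder text = new StringBuilder();
                try (Reader reader = new InputStreamReader(
                         new NoCloseInputStream(zipIn), StandardCharsets.UTF_8)) {
                    char[] buf = new char[4096];
                    int n;
                    // read() returns -1 at the end of the current entry, not of the archive
                    while ((n = reader.read(buf)) != -1) {
                        text.append(buf, 0, n);
                    }
                }
                contents.put(entry.getName(), text.toString());
                zipIn.closeEntry(); // advance to the next entry
            }
        }
        return contents;
    }
}
```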
210+
211+ ### Legacy Import (v1 / v2)
212+
213+ For files produced by format v1 (per-row objects) or v2 (single JSON/GZIP):
214+ - GZIP-wrapped streams are unwrapped automatically.
215+ - The single JSON document is parsed using the original streaming algorithm.
216+ - This path is maintained for backward compatibility and will not be removed.
217+
160218### ID Resolution (Identity Mapper)
161219
162220```
@@ -172,7 +230,20 @@ REGENERATE_IDS (default for clone):
172230
173231---
174232
175- ## 6. Worker Lifecycle
233+ ## 6. Clone Operation
234+
235+ Clone uses a temporary file (not an in-memory buffer) to safely handle tenants with
236+ hundreds of megabytes of data:
237+
238+ ```
239+ Phase 1: ExportPipeline.export(source, tempFile) → Account{src}_timestamp.zip
240+ Phase 2: ImportPipeline.importTenant(tempFile, importOptions)
241+ Phase 3: Files.deleteIfExists(tempFile) [in finally block]
242+ ```
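
The three phases above amount to a `try`/`finally` around a temporary file. The following sketch assumes hypothetical `Exporter` and `Importer` interfaces in place of `ExportPipeline` and `ImportPipeline`, whose real signatures are not shown in this document.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class CloneSketch {

    /** Hypothetical stand-in for the export side: writes an archive to the given path. */
    interface Exporter {
        void exportTo(Path target) throws IOException;
    }

    /** Hypothetical stand-in for the import side: reads an archive from the given path. */
    interface Importer {
        void importFrom(Path source) throws IOException;
    }

    /** Runs export then import through a temp file that is removed even on failure. */
    static Path cloneViaTempFile(Exporter exporter, Importer importer) throws IOException {
        Path tempFile = Files.createTempFile("Account-clone-", ".zip");
        try {
            exporter.exportTo(tempFile);   // Phase 1
            importer.importFrom(tempFile); // Phase 2
        } finally {
            Files.deleteIfExists(tempFile); // Phase 3
        }
        return tempFile; // returned only so callers or tests can confirm the cleanup
    }
}
```

Memory use stays bounded by the streaming buffers rather than the archive size, at the cost of one extra pass over disk.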
243+
244+ ---
245+
246+ ## 7. Worker Lifecycle
176247
177248```
178249POST /export/{accountId}
@@ -184,7 +255,7 @@ AccountMigrationJobService.createExportJob(accountId, options)
184255 ├── 2. SchedulerUtil.runWithResult(new ExportWorker(jobId, accountId, options))
185256 │ └── Virtual Thread starts
186257 │ ├── Update job status → RUNNING
187- │ ├── Call ExportPipeline.export(...)
258+ │ ├── Call ExportPipeline.export(...) → writes Account{id}_{ts}.zip
188259 │ │ └── MigrationProgressListener updates job.progress periodically
189260 │ ├── On success: update job status → COMPLETED, set resultPath
190261 │ └── On failure: update job status → FAILED, set errorMessage
@@ -200,58 +271,57 @@ Cancellation:
200271
201272---
202273
203- ## 7. Export File Format
274+ ## 8. Export File Format (v3)
204275
205276```
206- saas_export_42_20260614T100500.json[.gz]
277+ Account42_20260614T100500.zip
207278```
208279
280+ **manifest.json** (first ZIP entry):
209281```json
210282{
211- "version": "2",
283+ "version": "3",
212284 "exportedAt": "2026-06-14T10:05:00",
213285 "sourceAccountId": 42,
214286 "identityStrategy": "KEEP_IDS",
215287 "account": {
216288 "id": 42,
217289 "name": "Acme Corp",
218290 "subdomain": "acme",
219- "email": "admin@acme.com",
220- ...
291+ "email": "admin@acme.com"
221292 },
222- "entities": {
223- "tools.dynamia.modules.saas.jpa.AccountParameter": {
224- "fields": ["id", "accountId", "name", "value"],
225- "rows": [
226- [1, 42, "theme", "dark"]
227- ]
228- },
229- "com.example.Customer": {
230- "fields": ["id", "accountId", "name", "category_ref_id"],
231- "rows": [
232- [10, 42, "John", 3]
233- ]
234- },
235- "com.example.Order": {
236- "fields": ["id", "accountId", "total", "customer_ref_id"],
237- "rows": [
238- [100, 42, 99.99, 10]
239- ]
240- }
241- }
293+ "entities": [
294+ { "file": "Account42_AccountParameter.json", "entityClass": "tools.dynamia.modules.saas.jpa.AccountParameter" },
295+ { "file": "Account42_Customer.json", "entityClass": "com.example.Customer" },
296+ { "file": "Account42_Order.json", "entityClass": "com.example.Order" }
297+ ]
298+ }
299+ ```
300+
301+ **Account42_Customer.json** (ZIP entry per entity):
302+ ```json
303+ {
304+ "entityClass": "com.example.Customer",
305+ "fields": ["id", "accountId", "name", "category_ref_id"],
306+ "rows": [
307+ [10, 42, "John", 3],
308+ [11, 42, "Cindy", null]
309+ ]
242310}
243311```
244312
245313**Key conventions:**
246- - Each entity section is an object with `fields` (column names) and `rows` (value arrays).
314+ - Each entity file is a standalone JSON document with `entityClass`, `fields`, and `rows`.
247315- `{fieldName}_ref_id` encodes a `@ManyToOne` / `@OneToOne` reference by its primary key.
248- - The `account` section makes the package self-describing.
249- - Entities appear in topological order (parents before children).
250- - Format version `"2"` uses the columnar layout; version `"1"` (legacy) used per-row objects.
316+ - Entities appear in topological order (parents before children) both in the manifest and as ZIP entries.
317+ - The `account` section in the manifest makes the archive self-describing.
318+ - Format version `"3"` uses the ZIP multi-file layout.
319+ - Format version `"2"` (legacy) uses a single columnar JSON/GZIP document.
320+ - Format version `"1"` (legacy) uses a single JSON document with per-row objects.
251321
252322---
253323
254- ## 8. Identity Mapping SPI
324+ ## 9. Identity Mapping SPI
255325
256326``` java
257327public interface IdentityMapper {
@@ -280,7 +350,7 @@ Register a custom implementation as a Spring bean (`@Component` / `@Service`) to
280350
281351---
282352
283- ## 9. Annotaions
353+ ## 10. Annotations
284354
285355### `@AccountExportIgnore`
286356
@@ -304,7 +374,7 @@ Apply to entity fields that should be skipped during serialization (computed val
304374
305375---
306376
307- ## 10. Database Schema
377+ ## 11. Database Schema
308378
309379The module adds one table:
310380
@@ -329,25 +399,28 @@ CREATE TABLE saas_migration_jobs (
329399
330400---
331401
332- ## 11. Scalability Notes
402+ ## 12. Scalability Notes
333403
334404| Concern | Mitigation |
335405| ---------| ------------|
336- | Large tables | Chunked pagination (configurable, default 500 rows/page) |
337- | Memory | Jackson streaming API: never load all rows |
338- | Network / disk | Optional GZIP compression |
406+ | Large tables | Keyset pagination (`id > lastId`, configurable chunk size, default 500 rows/page) |
407+ | Memory | Jackson streaming API + ZipOutputStream: never load all rows; one ZIP entry at a time |
408+ | Disk I/O | 256 KB write buffer on ZipOutputStream; DEFLATE BEST_SPEED compression |
409+ | Network / disk | ZIP is always produced, typically 5–10× smaller than raw JSON |
339410| Long-running jobs | Virtual threads, cooperative cancellation via `CancellationToken` |
340- | DB load | Read-only paginated queries; imports batched per chunk |
411+ | DB load | Read-only keyset-paginated queries; imports batched per chunk in isolated transactions |
341412| Concurrent jobs | In-memory job registry + DB-backed state; configurable max concurrent |
413+ | Large clone | Export → temp file → import (avoids OOM from in-memory buffers) |
342414
343415---
344416
345- ## 12. Implementation Roadmap
417+ ## 13. Implementation Roadmap
346418
347419| Phase | Scope |
348420| -------| -------|
349- | **v1 (current)** | EXPORT, IMPORT, CLONE, BACKUP, RESTORE. `KEEP_IDS` + `REGENERATE_IDS`. REST API. Progress tracking. Cancellation. |
350- | **v2** | Cross-environment MIGRATE (HTTP push to remote endpoint). Resume after failure (checkpoint in DB). |
351- | **v3** | `UUID7` identity strategy. Partial export (subset of entities). Schema validation on import. Diff/merge strategy. |
352- | **v4** | Multi-region database migration. S3/GCS file storage backend. Event-driven progress via SSE/WebSocket. |
353-
421+ | **v1 (legacy)** | EXPORT, IMPORT as single JSON. `KEEP_IDS` + `REGENERATE_IDS`. |
422+ | **v2 (legacy)** | Columnar format (fields + rows arrays). Optional GZIP. |
423+ | **v3 (current)** | ZIP multi-file: one JSON per entity. Always compressed. Keyset pagination. Backward-compatible import. Clone via temp file. |
424+ | **v4** | Cross-environment MIGRATE (HTTP push to remote endpoint). Resume after failure (checkpoint in DB). |
425+ | **v5** | `UUID7` identity strategy. Partial export (subset of entities). Schema validation on import. |
426+ | **v6** | Multi-region database migration. S3/GCS file storage backend. Event-driven progress via SSE/WebSocket. |