Dv/cpu image bringup report#20
oh nice, how fast was it?
Pull request overview
Adds an experimental CPU image-generation script plus accompanying bring-up/report docs to document and reproduce coherent 128x128 CPU outputs (notably on the unpacked transformer path) for bonsai-image.
Changes:
- Added a standalone `scripts/generate_cpu_experimental.py` script to run Flux2 CPU diffusion with logging and optional step image dumps.
- Added two markdown reports capturing validated CPU bring-up results and suggested practical guidance.
- Documented an example command shape intended to reproduce the `128x128`, `4-step` CPU results.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| scripts/generate_cpu_experimental.py | New experimental CPU generation entrypoint (prompt encode → diffusion → VAE decode) with model loading helpers and detailed logging. |
| docs/upstream_issue_cpu_bringup.md | Focused bring-up/status note describing the strongest validated CPU results and reproduction shape. |
| docs/upstream_cpu_image_report_draft.md | Longer-form draft report consolidating the same CPU bring-up evidence and guidance. |
REPO_ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str((REPO_ROOT / "vendor" / "image-studio").resolve()))

from backend_gpu.diffusion_klein import _mflux_empirical_mu  # noqa: E402
def mem_gib() -> float:
    with open("/proc/self/status") as fh:
        for line in fh:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024 / 1024
    return 0.0
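The `mem_gib` helper above reads `VmRSS` from `/proc/self/status`, which only exists on Linux. As a minimal sketch of how the same measurement might degrade gracefully elsewhere, the variant below falls back to the standard-library `resource` module. The name `rss_gib` is hypothetical and not part of the PR, and the fallback reports peak RSS rather than current RSS, so the two code paths are not strictly equivalent.

```python
import resource
import sys


def rss_gib() -> float:
    """Return resident set size in GiB (hypothetical helper, not from the PR)."""
    try:
        # Linux: VmRSS is reported in kB, so two divisions by 1024 give GiB.
        with open("/proc/self/status") as fh:
            for line in fh:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024 / 1024
    except OSError:
        pass  # no /proc on this platform; use the fallback below

    # Fallback: peak RSS from getrusage (KiB on Linux, bytes on macOS).
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 ** 3 if sys.platform == "darwin" else 1024 ** 2
    return peak / divisor


print(f"RSS: {rss_gib():.3f} GiB")
```

The `resource` module is Unix-only, so this sketch still would not run on Windows.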
parser.add_argument("--output", required=True)
parser.add_argument("--height", type=int, default=256)
parser.add_argument("--width", type=int, default=256)
parser.add_argument("--steps", type=int, default=1)
if args.height % 32 != 0 or args.width % 32 != 0:
    raise SystemExit("height and width must be multiples of 32")
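To make the size constraint above concrete, here is a small self-contained sketch that reuses the four flags and defaults shown in the snippet and checks the multiple-of-32 rule. The helpers `validate_size` and `snap_down` are hypothetical names introduced for illustration. They are not in the PR, and the positivity check is an extra assumption on top of the original condition.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Same flags and defaults as the snippet under review.
    parser = argparse.ArgumentParser(description="CPU generation argument sketch")
    parser.add_argument("--output", required=True)
    parser.add_argument("--height", type=int, default=256)
    parser.add_argument("--width", type=int, default=256)
    parser.add_argument("--steps", type=int, default=1)
    return parser


def validate_size(height: int, width: int, multiple: int = 32) -> None:
    """Raise ValueError unless both dimensions are positive multiples of `multiple`."""
    for name, value in (("height", height), ("width", width)):
        if value <= 0 or value % multiple != 0:
            raise ValueError(f"{name} must be a positive multiple of {multiple}, got {value}")


def snap_down(value: int, multiple: int = 32) -> int:
    """Round down to the nearest multiple, never going below one multiple."""
    return max(multiple, (value // multiple) * multiple)


args = build_parser().parse_args(
    ["--output", "out.png", "--height", "128", "--width", "128", "--steps", "4"]
)
validate_size(args.height, args.width)
print(args.height, args.width, args.steps)
```

Both `96x96` and `128x128`, the sizes discussed in this thread, satisfy the rule.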
python scripts/generate_cpu_experimental.py \
  --prompt 'ostrich' \
  --height 128 \
  --width 128 \
  --steps 4 \
  --seed 7 \
  --transformer-dir models/bonsai-image-4B-ternary-unpacked/transformer
Per Codex: the fastest passing config here was 96x96, 2 steps, fp32, 4 threads: 11.4s warm render, about 49.6s total including 38.2s setup. 128x128 at 2 steps was 13.2s warm / 54.7s total, and 128x128 at 4 steps was 33.0s warm / 80.2s total. The old 128x128 4-step baseline here was about 20m42s, so the search improved it substantially. This is on a free-tier Oracle Cloud server with 4 Arm Neoverse vCPUs and 24 GB of RAM. Will push the latest changes to this branch.
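The arithmetic behind those timings can be checked with a few lines of Python. This is only a sketch over the numbers quoted in the comment: the per-run setup time is derived as total minus warm render, and the speedup assumes the old 20m42s baseline was an end-to-end total, which the comment does not state explicitly.

```python
# (warm render seconds, total seconds) as quoted in the comment above.
runs = {
    "96x96, 2 steps": (11.4, 49.6),
    "128x128, 2 steps": (13.2, 54.7),
    "128x128, 4 steps": (33.0, 80.2),
}

# Derived setup time per run: total minus warm render.
setup = {label: round(total - warm, 1) for label, (warm, total) in runs.items()}
for label, seconds in setup.items():
    print(f"{label}: setup ~{seconds}s")

# Old 128x128 4-step baseline: 20m42s, assumed here to be an end-to-end total.
baseline_s = 20 * 60 + 42
speedup = baseline_s / runs["128x128, 4 steps"][1]
print(f"128x128, 4 steps: ~{speedup:.1f}x faster than the old baseline")
```

Under that assumption, the 128x128 4-step run is roughly 15x faster end to end than the old baseline.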
@@ -0,0 +1 @@
scripts: add warm CPU server benchmark helpers
can delete this file probably
@@ -0,0 +1,38 @@
## Summary
do we need to commit this file?
should be in the PR itself
Nice that's faster than I imagined for CPU only, is this running in fp16 or using the 1-bit or ternary packing? Kinda saw mix of both when skimming through the code.
Codex is on it, will post a reply shortly
Sincerely,
Denis Valeev
On Sun, Jun 7, 2026 at 2:13 PM Pasha Khosravi wrote:
khosravipasha left a comment (PrismML-Eng/Bonsai-Image-Demo#20)
Nice that's faster than I imagined for CPU only, is this running in fp16 or using the 1-bit or ternary packing? Kinda saw mix of both when skimming through the code.
Happy to merge it after some clean up and making it blend better with rest of demo, e.g. have a script that starts a server, etc.
Pushed a cleanup pass to this branch. What changed:
On the runtime question: the path here is not fp16 on this ARM CPU box.
Heads-up on the warm timings: the faster numbers from the later server work are for cached prompt requests on a resident server. A brand-new uncached prompt still pays a large cold prompt-encode cost first on this CPU path, so warm cached latency and first-hit latency are very different. The warm-server helper now makes that split explicit. If you want, I can do one more follow-up pass after this and fold the CPU server start path more tightly into the existing demo flow.
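To illustrate the cold versus warm split described above, here is a minimal sketch of a prompt-embedding cache on a resident process. Everything in it is hypothetical: `PromptCache`, `fake_encode`, and `timed_request` are illustrative names, the encoder is a stand-in that just sleeps, and none of this is taken from the warm-server helper in the branch.

```python
import time
from typing import Callable, Dict, List, Tuple


class PromptCache:
    """Keeps prompt embeddings in memory so repeat prompts skip the encoder."""

    def __init__(self, encode: Callable[[str], List[float]]) -> None:
        self._encode = encode
        self._cache: Dict[str, List[float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, prompt: str) -> Tuple[List[float], bool]:
        """Return (embedding, was_cached)."""
        if prompt in self._cache:
            self.hits += 1
            return self._cache[prompt], True
        self.misses += 1
        embedding = self._encode(prompt)  # cold path: pays the full encode cost
        self._cache[prompt] = embedding
        return embedding, False


def fake_encode(prompt: str) -> List[float]:
    # Stand-in for the real prompt encoder; the sleep mimics a slow cold encode.
    time.sleep(0.05)
    return [float(len(prompt))]


def timed_request(cache: PromptCache, prompt: str) -> dict:
    start = time.perf_counter()
    _, was_cached = cache.get(prompt)
    return {"cached": was_cached, "encode_s": time.perf_counter() - start}


cache = PromptCache(fake_encode)
first = timed_request(cache, "ostrich")   # first hit: uncached
second = timed_request(cache, "ostrich")  # repeat: served from the cache
print(first["cached"], second["cached"])
```

Reporting the `cached` flag next to each timing is one way to keep first-hit and warm-cached latency from being mixed in a benchmark summary.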
For the above example run this:
A lone samurai warrior in ornate black lacquer armor standing in mist, katana held low at his side, crimson silk cords and weathered metal plates, rain droplets glistening on the armor, stern shadowed face under a kabuto helmet, dramatic rim lighting, foggy bamboo forest background, shallow depth of field, cinematic composition, high detail, realistic historical texture, moody atmosphere, ultra-detailed photography style. size: 1024x1024




## Summary
Adds a focused CPU image generation bring-up report for `bonsai-image`.
The report centers the strongest demonstrated result: the unpacked transformer CPU path can produce coherent `128x128` outputs end to end, including:
- `ostrich`
## Why this is useful
This makes the current CPU status easier to understand and reproduce.
It documents that:
- `128x128` is a practical validation target for CPU image generation
## Guidance captured in the report
- `128x128+` for structure/composition validation
## Notes