Dv/cpu image bringup report by DenisValeev · Pull Request #20 · PrismML-Eng/Bonsai-Image-Demo

DenisValeev · 2026-06-03T02:42:13Z

Summary

Adds a focused CPU image generation bring-up report for bonsai-image.

The report centers on the strongest demonstrated result: the unpacked transformer CPU path can produce coherent 128x128 outputs end to end, including:

plain ostrich
a coherent ostrich silhouette
a large centered red circle
a clean 4-quadrant multi-color layout

Why this is useful

This makes the current CPU status easier to understand and reproduce.

It documents that:

the unpacked CPU path can converge on globally coherent images
both geometric and object-level prompts can work on CPU
128x128 is a practical validation target for CPU image generation

Guidance captured in the report

use 128x128+ for structure/composition validation
judge the active path primarily by final outputs
use the unpacked transformer CPU path as a reference-capable CPU configuration

Notes

this is a bring-up/status report, not a claim that every CPU failure mode is solved
no host-specific, secret, or private-environment details are included

khosravipasha · 2026-06-06T01:18:17Z

oh nice, how fast was it?

Copilot

Pull request overview

Adds an experimental CPU image-generation script plus accompanying bring-up/report docs to document and reproduce coherent 128x128 CPU outputs (notably on the unpacked transformer path) for bonsai-image.

Changes:

Added a standalone scripts/generate_cpu_experimental.py script to run Flux2 CPU diffusion with logging and optional step image dumps.
Added two markdown reports capturing validated CPU bring-up results and suggested practical guidance.
Documented an example command shape intended to reproduce the 128x128, 4-step CPU results.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File	Description
scripts/generate_cpu_experimental.py	New experimental CPU generation entrypoint (prompt encode → diffusion → VAE decode) with model loading helpers and detailed logging.
docs/upstream_issue_cpu_bringup.md	Focused bring-up/status note describing the strongest validated CPU results and reproduction shape.
docs/upstream_cpu_image_report_draft.md	Longer-form draft report consolidating the same CPU bring-up evidence and guidance.


+REPO_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str((REPO_ROOT / "vendor" / "image-studio").resolve()))
+
+from backend_gpu.diffusion_klein import _mflux_empirical_mu  # noqa: E402
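The cleanup pass later in this thread says the vendor path is now prepended only if it exists. A minimal sketch of such a guard follows; add_vendor_path is a hypothetical helper name, and the pushed script may structure the check differently. The backend_gpu import still needs that directory, so a missing checkout would then fail at the import itself.

```python
import sys
from pathlib import Path


def add_vendor_path(repo_root: Path) -> bool:
    """Prepend <repo_root>/vendor/image-studio to sys.path only if it exists.

    Hypothetical helper: returns True when the directory exists and is on sys.path.
    """
    vendor_dir = (repo_root / "vendor" / "image-studio").resolve()
    if not vendor_dir.is_dir():
        return False
    vendor_str = str(vendor_dir)
    if vendor_str not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.insert(0, vendor_str)
    return True


# In the script this would be called as:
#   add_vendor_path(Path(__file__).resolve().parents[1])
```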


+def mem_gib() -> float:
+    with open("/proc/self/status") as fh:
+        for line in fh:
+            if line.startswith("VmRSS:"):
+                return int(line.split()[1]) / 1024 / 1024
+    return 0.0
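The cleanup pass describes making RSS logging fail-safe when /proc/self/status is unavailable. One way to do that, sketched here as an assumption rather than the pushed code, is to catch read and parse errors and fall back to 0.0; the status_path parameter is added only so the fallback can be exercised.

```python
def mem_gib(status_path: str = "/proc/self/status") -> float:
    """Return this process's resident set size in GiB, or 0.0 if unavailable.

    /proc/self/status exists only on Linux; on other hosts (or restricted
    sandboxes) the read fails and we return 0.0 instead of crashing.
    """
    try:
        with open(status_path) as fh:
            for line in fh:
                if line.startswith("VmRSS:"):
                    # Format: "VmRSS:    123456 kB" (the value is in KiB).
                    return int(line.split()[1]) / 1024 / 1024
    except (OSError, ValueError, IndexError):
        pass  # missing file or malformed line: treat as unknown
    return 0.0
```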


+    parser.add_argument("--output", required=True)
+    parser.add_argument("--height", type=int, default=256)
+    parser.add_argument("--width", type=int, default=256)
+    parser.add_argument("--steps", type=int, default=1)


+    if args.height % 32 != 0 or args.width % 32 != 0:
+        raise SystemExit("height and width must be multiples of 32")
+
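Taken together, the --steps nits (default to 4 to match the validated path, reject non-positive values early) and the existing multiple-of-32 check could look like the following argparse sketch. positive_int and parse_args are hypothetical names, and applying positive_int to --height and --width as well is an extra assumption, since 0 would otherwise pass the modulo check.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type: accept only integers > 0."""
    ivalue = int(value)  # a ValueError here becomes a normal argparse usage error
    if ivalue <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value!r}")
    return ivalue


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Experimental CPU image generation (sketch)")
    parser.add_argument("--output", required=True)
    parser.add_argument("--height", type=positive_int, default=256)
    parser.add_argument("--width", type=positive_int, default=256)
    # Default matches the validated 4-step report path.
    parser.add_argument("--steps", type=positive_int, default=4)
    args = parser.parse_args(argv)
    if args.height % 32 != 0 or args.width % 32 != 0:
        raise SystemExit("height and width must be multiples of 32")
    return args
```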


+python scripts/generate_cpu_experimental.py \
+  --prompt 'ostrich' \
+  --height 128 \
+  --width 128 \
+  --steps 4 \
+  --seed 7 \
+  --transformer-dir models/bonsai-image-4B-ternary-unpacked/transformer


+python scripts/generate_cpu_experimental.py \
+  --prompt 'ostrich' \
+  --height 128 \
+  --width 128 \
+  --steps 4 \
+  --seed 7 \
+  --transformer-dir models/bonsai-image-4B-ternary-unpacked/transformer


DenisValeev · 2026-06-06T01:32:06Z

oh nice, how fast was it?

Per Codex: the fastest passing configuration here was 96x96, 2 steps, fp32, 4 threads: 11.4s warm render, about 49.6s total including 38.2s of setup. 128x128 at 2 steps took 13.2s warm / 54.7s total, and 128x128 at 4 steps took 33.0s warm / 80.2s total. The old 128x128 4-step baseline here was about 20m42s, so the tuning search improved it substantially.
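A quick way to read these numbers: subtracting the warm render from the total gives the non-render overhead per configuration, and dividing the old baseline by the new timings gives the improvement. The comment does not say whether the 20m42s baseline is a warm or an end-to-end figure, so both ratios are computed.

```python
# Timings quoted above, in seconds: (warm render, total including setup).
RUNS = {
    "96x96, 2 steps": (11.4, 49.6),
    "128x128, 2 steps": (13.2, 54.7),
    "128x128, 4 steps": (33.0, 80.2),
}
OLD_BASELINE_S = 20 * 60 + 42  # old 128x128 4-step baseline, 20m42s


def overhead_s(warm: float, total: float) -> float:
    """Time included in the total that is not the warm render itself."""
    return total - warm


for name, (warm, total) in RUNS.items():
    print(f"{name}: warm {warm:.1f}s, overhead {overhead_s(warm, total):.1f}s")

warm_4, total_4 = RUNS["128x128, 4 steps"]
print(f"baseline / warm:  {OLD_BASELINE_S / warm_4:.1f}x")
print(f"baseline / total: {OLD_BASELINE_S / total_4:.1f}x")
```

With these figures the overhead grows from 38.2s to 47.2s across the three configurations, and the old baseline is about 37.6x the new 4-step warm time and about 15.5x the new 4-step total.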

This is on a free-tier Oracle Cloud server with 4 Arm Neoverse vCPUs and 24 GB of RAM. I will push the latest changes to this branch.

DenisValeev · 2026-06-06T01:39:21Z

DenisValeev · 2026-06-06T15:16:25Z

khosravipasha · 2026-06-07T18:06:23Z

@@ -0,0 +1 @@
+scripts: add warm CPU server benchmark helpers


Can probably delete this file.

khosravipasha · 2026-06-07T18:06:47Z

@@ -0,0 +1,38 @@
+## Summary


Do we need to commit this file?
This should be in the PR itself.

khosravipasha · 2026-06-07T18:13:08Z

Nice, that's faster than I imagined for CPU only. Is this running in fp16 or using the 1-bit or ternary packing? I kinda saw a mix of both when skimming through the code.
Happy to merge it after some cleanup and making it blend better with the rest of the demo, e.g. having a script that starts a server, etc.

DenisValeev · 2026-06-07T18:16:32Z

Codex is on it, will post a reply shortly.

Sincerely, Denis Valeev


DenisValeev · 2026-06-07T18:21:19Z

Pushed a cleanup pass to this branch.

What changed:

- deleted docs/pr_title.txt
- deleted docs/pr_body.md
- fixed the Copilot nits in scripts/generate_cpu_experimental.py:
  - only prepend vendor/image-studio if it exists
  - make RSS logging fail-safe when /proc/self/status is unavailable
  - default --steps to 4 to match the validated/report path
  - reject non-positive --steps early
- fixed the example commands in both docs to include the required --output
- added scripts/start_cpu_image_server.sh so the warm CPU image path has a demo-shaped server entrypoint instead of only a raw Python module

On the runtime question: this path does not run in fp16 on this Arm CPU box.

model family: bonsai-image-4B-ternary
transformer path for the stronger CPU runs: unpacked transformer sibling, not GemLite dense reconstruction
main inference dtype on this host: float32
text encoder: quantized on disk, then dequantized for prompt encoding on CPU
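To make the packed-versus-unpacked distinction in this thread concrete: a ternary weight is one of {-1, 0, +1} times a scale, so it fits in 2 bits, but a standard float matmul needs each weight expanded back to a float first. The sketch below shows that expansion for a hypothetical layout (four 2-bit codes per byte, one scale per tensor); it is not the bonsai-image on-disk format, which this thread does not describe.

```python
# Hypothetical 2-bit code table; NOT the actual bonsai-image storage layout.
# Code 0b11 is unused, so encountering it raises KeyError.
CODE_TO_TERNARY = {0b00: 0, 0b01: 1, 0b10: -1}
TERNARY_TO_CODE = {t: c for c, t in CODE_TO_TERNARY.items()}


def pack_ternary(values: list[int]) -> bytes:
    """Pack ternary values (-1, 0, +1) four per byte, lowest bits first."""
    out = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        out[i // 4] |= TERNARY_TO_CODE[v] << (2 * (i % 4))
    return bytes(out)


def unpack_to_float(packed: bytes, count: int, scale: float) -> list[float]:
    """Dequantize `count` weights as w_i = scale * t_i.

    Returns Python floats (float64); a real loader would fill float32 tensors.
    """
    weights = []
    for i in range(count):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        weights.append(scale * CODE_TO_TERNARY[code])
    return weights
```

At 2 bits per weight versus 32 for float32, the packed form is 16x smaller than the dense float32 expansion.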

Heads-up on the warm timings: the faster numbers from the later server work are for cached prompt requests on a resident server. A brand-new uncached prompt still pays a large cold prompt-encode cost first on this CPU path, so warm cached latency and first-hit latency are very different. The warm-server helper now makes that split explicit.
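The warm/cold split described above comes from caching prompt encodings on a long-lived server. A minimal sketch of that idea, assuming the encoder is a plain callable: PromptEmbeddingCache and the fake encoder in the usage are hypothetical, and keying on max_seq as well as the prompt is an assumption so that the same prompt encoded at different sequence lengths does not share an entry.

```python
import time
from typing import Any, Callable, Dict, Tuple


class PromptEmbeddingCache:
    """Memoize prompt encodings so repeat prompts skip the cold encode (hypothetical)."""

    def __init__(self, encode: Callable[[str, int], Any]) -> None:
        self._encode = encode
        self._cache: Dict[Tuple[str, int], Any] = {}

    def get(self, prompt: str, max_seq: int) -> Tuple[Any, bool, float]:
        """Return (embedding, was_cached, encode_seconds); encode time is 0.0 on a hit."""
        key = (prompt, max_seq)
        if key in self._cache:
            return self._cache[key], True, 0.0
        start = time.perf_counter()
        embedding = self._encode(prompt, max_seq)
        elapsed = time.perf_counter() - start
        self._cache[key] = embedding
        return embedding, False, elapsed
```

Returning was_cached and encode_seconds alongside the embedding lets a server log first-hit and warm latency separately. This cache is unbounded; a long-running server would likely want an eviction policy.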

If you want, I can do one more follow-up pass after this and fold the CPU server start path more tightly into the existing demo flow.

DenisValeev · 2026-06-07T20:53:06Z

For anything substantial, it's pretty slow.

But as a fire-and-forget job with a follow-up message to Telegram, it may be of interest.

size: 1024x1024
steps: 4
seed: 3995096562
prewarm: 669.0s
restart_wait: 195.0s
render: 1031.8s
total: 1928.8s
max_seq: 128

So about 2,000 seconds (roughly 32 minutes) end to end.

DenisValeev · 2026-06-07T20:58:16Z

To reproduce the example above, run:

python scripts/generate_cpu_experimental.py \
  --prompt "Macro close-up of an iridescent peacock feather eye, emerald green and teal strands radiating outward, deep blue-black oval center, shimmering metallic texture, shallow depth of field, soft dark blurred background, glossy natural fibers, cinematic macro photography, high detail." \
  --output outputs/cpu-peacock-feather.png \
  --height 1024 \
  --width 1024 \
  --steps 4 \
  --max-seq 128 \
  --transformer-dir models/bonsai-image-4B-ternary-unpacked/transformer

DenisValeev · 2026-06-07T21:34:42Z

Prompt: A lone samurai warrior in ornate black lacquer armor standing in mist, katana held low at his side, crimson silk cords and weathered metal plates, rain droplets glistening on the armor, stern shadowed face under a kabuto helmet, dramatic rim lighting, foggy bamboo forest background, shallow depth of field, cinematic composition, high detail, realistic historical texture, moody atmosphere, ultra-detailed photography style.

size: 1024x1024
steps: 4
seed: 2265673322
prewarm: 255.0s
restart_wait: 93.0s
render: 1031.6s
total: 1411.6s
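The two 1024x1024 breakdowns can be checked against their totals. This sketch parses the "name: 12.3s" lines of the second run and types in the first run's figures directly; parse_timings and unaccounted_s are hypothetical helpers, and they only sum the fields shown in the comments.

```python
RUN_2_LOG = """\
size: 1024x1024
steps: 4
seed: 2265673322
prewarm: 255.0s
restart_wait: 93.0s
render: 1031.6s
total: 1411.6s
"""


def parse_timings(text: str) -> dict[str, float]:
    """Collect 'name: <number>s' lines into {name: seconds}; other lines are ignored."""
    timings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value.endswith("s"):
            continue  # e.g. "size: 1024x1024" or "steps: 4"
        try:
            timings[name.strip()] = float(value[:-1])
        except ValueError:
            continue
    return timings


def unaccounted_s(t: dict[str, float], total: float) -> float:
    """Part of the total not covered by prewarm + restart_wait + render."""
    return total - (t["prewarm"] + t["restart_wait"] + t["render"])


run_1 = {"prewarm": 669.0, "restart_wait": 195.0, "render": 1031.8}
run_2 = parse_timings(RUN_2_LOG)
print(f"run 1 unaccounted: {unaccounted_s(run_1, 1928.8):.1f}s")
print(f"run 2 unaccounted: {unaccounted_s(run_2, run_2['total']):.1f}s")
```

With the posted numbers this leaves about 33.0s and 32.0s outside the three listed phases. Render time is nearly identical in both runs (1031.8s vs 1031.6s), so the 517.2s difference in totals comes almost entirely from prewarm and restart_wait.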

DenisValeev added 4 commits June 3, 2026 02:25

docs: add CPU image generation bring-up report

7c36d10

docs: add upstream CPU bring-up issue draft

d8bd9b3

scripts: add experimental CPU image generation tools

39c7141

scripts: drop block permutation debug helper

47734f4

khosravipasha requested a review from Copilot June 6, 2026 01:18

Copilot started reviewing on behalf of khosravipasha June 6, 2026 01:18

Copilot AI reviewed Jun 6, 2026


scripts: add CPU prompt-cache and autoresearch tuning

058a03e

scripts: add warm CPU server benchmark helpers

950e2de

khosravipasha reviewed Jun 7, 2026


cpu image: clean up bringup PR

bee8015
