Commit 1060ee0

VP8 diary: P-frame foundation series + cross-thread fix + cost note (#1594)
Updates the living diary entry at diary/2026-04-26-Claude-Opus-4.7/result.md with everything that landed since the last update:

* New "P-frame foundation series (Apr 27-28)" subsection under Followup work, listing #1586 through #1592 with one-line summaries of what each PR added, plus the two notable bugs surfaced and how they were caught (decoder cnt[CNT_INTRA] mismatch caught by per-MB pixel-dump diagnostics; cross-thread regression caught by Aaron running the example app and reported as a stack trace).
* Rewrites Capabilities of the encoder as it stands -- moves keyframe-only out of Implemented into the new key+inter capability line, lists the new ZEROMV LAST_FRAME inter mode and the cross-thread-safe FrameEncoderBuffers, and updates the Not yet implemented list with NEWMV / NEAREST / NEAR / SPLITMV / GOLDEN / ALTREF as the next-step gaps. Also fixes a pre-existing duplicate "## Capabilities of the encoder as it stands" header.
* Updates the Roadmap to push real motion estimation to position 1 (the biggest remaining compression lever now that the inter pipeline works) and demotes the now-implemented P-frame foundation item from the list.
* Adds a new "## Cost" section near the top recording that the successful encoder write cost approximately EUR 150 in Anthropic credits, spanning the seven-PR keyframe series + SRTP investigation and fix + five-PR P-frame foundation + cross-thread fix. Useful forward reference for what this scope of port-and-debug work costs in 2026 dollars.
* Updates the Headline to reflect that the stream is now keyframe + inter rather than keyframe-only.
1 parent 602dc92 commit 1060ee0

1 file changed

Lines changed: 100 additions & 35 deletions

diary/2026-04-26-Claude-Opus-4.7/result.md

@@ -7,19 +7,28 @@ This is a **living record** — kept up to date as further progress lands.
77
## Headline
88

99
The fifth AI attempt at this task is the first to clear the wall the previous
10-
four ran into. `VP8Codec.EncodeVideo` produces a fully-decodable VP8 keyframe
11-
stream that Chrome renders correctly and continuously **the
10+
four ran into. `VP8Codec.EncodeVideo` produces a fully-decodable VP8 stream
11+
(keyframe + inter) that Chrome renders correctly and continuously -- **the
1212
WebRTCGetStartedVP8Net example streams audio + video to Chrome at the
13-
default Q=32 / 30 fps for 7+ minutes with no audio loss and no video
14-
artefacts** (running on the `srtp-per-ssrc-rollover-counter` branch — see
15-
"Followup work" below for why that branch matters).
13+
default Q=32 / 30 fps with no audio loss and no video artefacts** (running
14+
on master after the SRTP fix and the P-frame foundation series merged --
15+
see "Followup work" below).
1616

1717
Encoder primitives are individually bit-exact-verified against libvpx C
1818
reference output. The end-to-end stream against Chrome is the structural
1919
signal that the foundation port worked.
2020

2121
Compared to the prior four entries in this diary, the difference is not the
22-
model alone — it's the working method. See "What worked" below.
22+
model alone -- it's the working method. See "What worked" below.
23+
24+
## Cost
25+
26+
Aaron's commission of the work cost approximately **EUR 150 in
27+
Anthropic credits** for the successful encoder write -- spanning the
28+
seven-PR keyframe foundation series, the SRTP rollover-counter
29+
investigation and fix, the five-PR P-frame foundation series, and the
30+
follow-up cross-thread fix. Recorded here for future reference on what
31+
this scope of port-and-debug work costs at 2026 prices.
2332

2433
## Reframing of the original task
2534

@@ -138,35 +147,87 @@ keyframe-only stream wraps the video sequence number every 30-50
138147
seconds; libvpx's mostly-P-frame stream wraps roughly an order of
139148
magnitude less often, so the bug is statistically much rarer to hit.
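
The wrap interval quoted above follows from the 16-bit RTP sequence number: it wraps after 65536 packets, so the interval is just 65536 divided by the packet rate. A rough sketch of that arithmetic follows; the function names are hypothetical and the packet rates are back-calculated from the 30-50 second figure, not measured.

```python
# RTP sequence numbers are 16 bits wide, so they wrap after 2**16 packets.
SEQ_SPACE = 1 << 16


def wrap_seconds(packets_per_second: float) -> float:
    """Seconds between sequence-number wraps at a given packet rate."""
    return SEQ_SPACE / packets_per_second


def packets_per_second_for_wrap(seconds: float) -> float:
    """Packet rate implied by an observed wrap interval."""
    return SEQ_SPACE / seconds


if __name__ == "__main__":
    # Packet rates implied by a wrap every 30 s and every 50 s.
    fast = packets_per_second_for_wrap(30)
    slow = packets_per_second_for_wrap(50)
    print(round(fast), round(slow))
    # A stream sending one tenth of the packets wraps ten times less often.
    print(wrap_seconds(fast / 10))
```

Under these assumptions a keyframe-only stream is sending on the order of 1300 to 2200 packets per second, and cutting the packet count tenfold stretches the wrap interval tenfold.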
140149

141-
## Capabilities of the encoder as it stands
150+
### P-frame foundation series (Apr 27-28)
151+
152+
With the SRTP fix landed and audio + video stable for arbitrarily long
153+
sessions, the next thing on the roadmap was inter (P) frames. Same
154+
foundation-series shape as the original encoder port: a sequence of
155+
small, independently-testable PRs each porting one libvpx primitive,
156+
with the orchestration ticked over to "real" inter encoding only in the
157+
final PR. Five PRs, plus a follow-up cross-thread bug fix.
158+
159+
| PR | Title | What it did |
160+
| --- | --- | --- |
161+
| [#1586](https://github.com/sipsorcery-org/sipsorcery/pull/1586) | P-frame foundation: reference frame storage + key/inter cadence (PR 1 of 5) | `FrameEncoderBuffers.LastFrameY/U/V`, `VP8Codec.KeyframeIntervalFrames`, `_framesSinceLastKeyframe` counter. Inter branch in `EncodeVideo` is wired but still falls through to `EncodeKeyframe` -- decision logic in place, behaviour unchanged. |
162+
| [#1587](https://github.com/sipsorcery-org/sipsorcery/pull/1587) | P-frame foundation: inter (P-frame) header writer (PR 2 of 5) | `bitstream.StartInterFrameHeader` + `FinishInterFrameFirstPartition`. Frame tag with `key_frame_flag = 1`, no start code, no dimensions. Compressed first-partition prefix through `refresh_last_frame`. Bit-exact round-trip tests against the existing decoder's frame-tag parser. |
163+
| [#1588](https://github.com/sipsorcery-org/sipsorcery/pull/1588) | P-frame foundation: ZEROMV inter MB encoder (PR 3 of 5) | `mb_encoder.EncodeMacroblockZeroMvLast`. Same DCT/Walsh/quantize/tokenize pipeline as DC_PRED but the prediction is the same-position 16x16 + 8x8 + 8x8 samples from the previous frame's reconstruction. |
164+
| [#1589](https://github.com/sipsorcery-org/sipsorcery/pull/1589) | P-frame foundation: per-MB inter mode + ref bits writer (PR 4 of 5) | `bitstream.WriteInterMbRefAndMode`, `WriteInterMode`, `vp8_treed_write`, `WriteInterMbZeroMvLast`. The inter-mode tree path bits, walking `vp8_mv_ref_tree` for any of ZEROMV / NEAREST / NEAR / NEW / SPLITMV. Round-trip tested for every (ref_frame, mode) combination against the decoder's `vp8_treed_read`. |
165+
| [#1591](https://github.com/sipsorcery-org/sipsorcery/pull/1591) | P-frame foundation: `EncodeInterFrame` orchestration + `EncodeVideo` wire-up (PR 5 of 5) | `frame_encoder.EncodeInterFrame`. `VP8Codec.EncodeVideo` inter branch now actually emits a P-frame instead of falling through. Round-trip tests at Q=4/16/32 vs source PSNR. |
166+
| [#1592](https://github.com/sipsorcery-org/sipsorcery/pull/1592) | VP8: fix cross-thread inter-frame encoding (regression from #1591) | Lifts `FrameEncoderBuffers` from `[ThreadStatic]` on `frame_encoder` to a per-instance field on `VP8Codec`, so the LAST_FRAME reference survives the .NET thread pool moving the work between worker threads on each Timer tick. |
167+
168+
The first notable bug surfaced during the series was caught by per-MB pixel
169+
dumping under a moving-content round-trip test. The decoder picks a
170+
row of `vp8_mode_contexts` based on `cnt[CNT_INTRA]`, which it
171+
computes by walking the above/left/aboveleft neighbours' inter state.
172+
For an all-ZEROMV LAST_FRAME stream that's deterministic by MB
173+
position: `(0, 0)` -> 0, edges -> 2, interior -> 5. The encoder
174+
initially used row 0 for every MB; the decoder used different rows;
175+
the boolean coder desynced on the third MB of the first row, and the
176+
test caught it as a flat-DC-PRED block where ZEROMV inter should
177+
have been. Fixed in PR 5 itself.
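
The position-dependent values above can be reproduced with a small sketch. It assumes the neighbour weights are 2 for above, 2 for left and 1 for above-left, and that neighbours outside the frame count as intra and add nothing; those assumptions are chosen because they give the 0 / 2 / 5 values quoted in the text. The function names are hypothetical and this is not the encoder's actual code.

```python
def cnt_intra_all_zeromv(mb_row: int, mb_col: int) -> int:
    """Mode-context row index for one MB when every MB is ZEROMV on LAST_FRAME.

    Assumed weights: above +2, left +2, above-left +1. Neighbours outside
    the frame are treated as intra and contribute nothing.
    """
    has_above = mb_row > 0
    has_left = mb_col > 0
    cnt = 0
    if has_above:
        cnt += 2
    if has_left:
        cnt += 2
    if has_above and has_left:  # the above-left neighbour needs both
        cnt += 1
    return cnt


def context_map(mb_rows: int, mb_cols: int) -> list[list[int]]:
    """Context row index for every MB in a frame of mb_rows x mb_cols MBs."""
    return [
        [cnt_intra_all_zeromv(r, c) for c in range(mb_cols)]
        for r in range(mb_rows)
    ]


if __name__ == "__main__":
    for row in context_map(3, 4):
        print(row)
```

For a 3x4 macroblock frame this gives 0 in the top-left corner, 2 along the first row and first column, and 5 everywhere else, so an encoder that always uses row 0 disagrees with the decoder on every MB except the first.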
178+
179+
The cross-thread bug found by Aaron's first test run after #1591
180+
merged is a useful illustration of the foundation-series discipline:
181+
the unit tests passed because they all ran on one thread, but the
182+
example app's `Timer`-driven dispatch path tripped a real defect
183+
the moment inter encoding required cross-call state. Fixed and
184+
regression-tested in #1592 -- a `Task.Factory.StartNew(LongRunning)`
185+
test that asserts the keyframe and inter calls land on different
186+
ManagedThreadIds *and* both encode successfully.
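
The failure mode can be shown in miniature with Python's `threading.local` standing in for `[ThreadStatic]`. This is an analogy under stated assumptions, not the C# code: the class and function names are hypothetical, and the "encoder" only reports whether a previous frame was visible to the call.

```python
import threading


class ThreadLocalEncoder:
    """Keeps the reference frame in thread-local storage (the buggy shape)."""

    _buffers = threading.local()  # one independent copy per thread

    def encode(self, frame: bytes) -> bool:
        """Store the frame; return True if a previous frame was visible."""
        had_reference = getattr(self._buffers, "last_frame", None) is not None
        self._buffers.last_frame = frame
        return had_reference


class PerInstanceEncoder:
    """Keeps the reference frame on the instance (the fixed shape)."""

    def __init__(self) -> None:
        self._last_frame = None
        self._lock = threading.Lock()

    def encode(self, frame: bytes) -> bool:
        with self._lock:
            had_reference = self._last_frame is not None
            self._last_frame = frame
            return had_reference


def encode_on_worker(encoder, frame: bytes):
    """Run one encode call on a fresh thread; return (result, thread id)."""
    out = {}

    def work() -> None:
        out["had_reference"] = encoder.encode(frame)
        out["tid"] = threading.get_ident()

    worker = threading.Thread(target=work)
    worker.start()
    worker.join()
    return out["had_reference"], out["tid"]


if __name__ == "__main__":
    for encoder in (ThreadLocalEncoder(), PerInstanceEncoder()):
        encoder.encode(b"frame0")  # first call on the main thread
        had_reference, _ = encode_on_worker(encoder, b"frame1")
        print(type(encoder).__name__, had_reference)
```

A single-threaded test never sees the difference between the two classes, which is why the regression test has to force the two calls onto different threads.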
142187

143188
## Capabilities of the encoder as it stands
144189

145190
Implemented:
146191

147-
- Keyframe-only encoding (every emitted frame is `KEY_FRAME`).
192+
- Keyframe + inter-frame (P-frame) encoding. `VP8Codec.KeyframeIntervalFrames`
193+
controls cadence (default 30 -> 1 keyframe/sec at 30 fps); intermediate
194+
frames are inter.
195+
- Inter mode: ZEROMV referencing LAST_FRAME for every macroblock. Same-position
196+
16x16 Y + 8x8 U + 8x8 V samples from the previous frame's reconstruction.
148197
- Single-partition layout (`log2_nbr_of_dct_partitions = 0`).
149-
- DC_PRED for both Y (16×16) and UV (8×8); no other intra modes.
198+
- DC_PRED for both Y (16x16) and UV (8x8); no other intra modes.
150199
- Forward DCT + Walsh, regular quantizer, full coefficient tokenizer
151200
including the `skip_eob_node` and CAT1..CAT6 paths.
152201
- Per-MB above/left entropy contexts maintained internally, combined via
153202
libvpx's `VP8_COMBINEENTROPYCONTEXTS` rule (count of non-zero
154203
neighbours, in {0, 1, 2}).
204+
- Per-MB inter mode context: `cnt[CNT_INTRA]` computed correctly across
205+
MB position so the encoder's `vp8_mode_contexts` row matches the
206+
decoder's.
207+
- Per-MB skip optimisation: skippable MBs (all 25 transformed blocks
208+
EOB-only) suppress their tokens in partition 1.
155209
- Default base quantizer of 32 (no rate control yet).
156-
- Output is pure I420 in, byte stream out — same shape as the existing
210+
- Cross-thread-safe encoding: `FrameEncoderBuffers` is per-codec-instance,
211+
so `Timer`-dispatched stream sources work correctly.
212+
- Output is pure I420 in, byte stream out -- same shape as the existing
157213
decoder.
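
The key/inter cadence in the first bullet can be sketched as a counter check per frame. This is one plausible reading of how `KeyframeIntervalFrames` and `_framesSinceLastKeyframe` interact, not the codec's actual logic; the Python names are hypothetical.

```python
def frame_types(num_frames: int, keyframe_interval: int = 30) -> list[str]:
    """Return "key" or "inter" for each frame in a stream of num_frames.

    Assumption: the first frame is always a keyframe, and a new keyframe is
    emitted once keyframe_interval frames have passed since the last one.
    """
    types = []
    frames_since_last_keyframe = 0
    have_reference = False
    for _ in range(num_frames):
        if not have_reference or frames_since_last_keyframe >= keyframe_interval:
            types.append("key")
            frames_since_last_keyframe = 1  # counts the keyframe itself
            have_reference = True
        else:
            types.append("inter")
            frames_since_last_keyframe += 1
    return types


if __name__ == "__main__":
    types = frame_types(90)
    print([i for i, t in enumerate(types) if t == "key"])
```

With the default interval of 30 and 30 fps input, this yields one keyframe per second with 29 inter frames in between.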
158214

159-
Not yet implemented (and called out explicitly so the next session has a
160-
clean starting list):
161-
162-
- Inter / P-frames. Every emitted frame is a key-frame.
163-
- Motion estimation, motion vectors, reference frame management.
164-
- Mode picking — DC_PRED is the only intra mode used.
215+
Not yet implemented:
216+
217+
- **Real motion estimation.** Every inter MB is ZEROMV. Source content
218+
with actual motion gets encoded as residuals against a stationary
219+
prediction, which works correctness-wise but loses most of the
220+
compression benefit a real motion-compensated encoder would give.
221+
- NEWMV (encoded motion vectors), NEAREST / NEAR (predicted MVs),
222+
SPLITMV (4x4 sub-MB partitions).
223+
- GOLDEN / ALTREF reference frames -- only LAST_FRAME is supported.
224+
- Mode picking -- DC_PRED is the only intra mode used. Other intra
225+
modes (V_PRED, H_PRED, TM_PRED, B_PRED) and an RD-style picker.
165226
- RD optimisation, segmentation, loop-filter level tuning.
166227
- Rate control / target bitrate.
167228
- Coefficient probability updates (1056 zero bits are written for "no
168-
update" leaves the decoder using the default tables).
169-
- `EncodeVideoFaster` / `DecodeVideoFaster` still throw.
229+
update" -- leaves the decoder using the default tables).
230+
- `EncodeVideoFaster` / `DecodeVideoFaster` -- still throw.
170231

171232
## Known issues
172233

@@ -215,25 +276,29 @@ shape that broke that streak this time:
215276

216277
## Roadmap
217278

218-
With the SRTP fix in place the encoder is functional for production
219-
streaming at default settings. Remaining items are quality / efficiency
220-
enhancements rather than correctness fixes:
221-
222-
1. **Other intra modes** (V_PRED, H_PRED, TM_PRED for Y; the 4 UV
223-
modes; B_PRED for 4×4 luma). Improves compression on detailed
279+
With keyframe encoding, P-frame foundation, and the SRTP fix all in
280+
place, the encoder is functional for production streaming at default
281+
settings. Remaining items are quality / efficiency enhancements rather
282+
than correctness fixes:
283+
284+
1. **Real motion estimation**: NEWMV with a motion-vector search, plus
285+
the NEAREST / NEAR predicted-MV modes that depend on neighbour MV
286+
accumulation. The biggest single compression-quality lever left --
287+
for typical webcam content with bounded motion this would drop the
288+
inter bitrate by another order of magnitude vs ZEROMV. Itself a
289+
foundation-series-shaped sequence of PRs (MV entropy coding, search
290+
primitive, mode picker integration).
291+
2. **Other intra modes** (V_PRED, H_PRED, TM_PRED for Y; the 4 UV
292+
modes; B_PRED for 4x4 luma). Improves compression on detailed
224293
content; the current DC_PRED-only encoder is the lowest-quality
225294
intra option.
226-
2. **A trivial mode picker** — pick the one that minimises
227-
sum-of-squared error on the residual. Pairs with (1).
228-
3. **Inter / P-frames**: motion vector entropy, `vp8_pack_mb_row` for
229-
non-keyframe partitions, `last_frame` reference frame management,
230-
ZEROMV / NEAREST / NEAR / NEWMV. (This is where libvpx is biggest;
231-
would itself span several PRs.) Would drop the steady-state
232-
bitrate by 10-100×, dramatically lowering bandwidth for typical
233-
webcam content. No longer urgent now that the encoder works at
234-
default settings.
235-
4. **Optional: rate control loop.** Useful only if production deployments
236-
want guaranteed bitrate caps.
295+
3. **A trivial mode picker** -- pick the one that minimises
296+
sum-of-squared error on the residual. Pairs with (1) and (2).
297+
4. **Loop filter on the encoder side** (currently `FilterLevel = 0`,
298+
so the bitstream signals "filter off"; turning it on would
299+
reduce blocking artefacts at lower quality settings).
300+
5. **Optional: rate control loop.** Useful only if production
301+
deployments want guaranteed bitrate caps.
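
The "trivial mode picker" item can be sketched as choosing the candidate prediction with the smallest sum of squared errors. How each mode's prediction is generated is out of scope here; the function names and the flat sample lists are hypothetical simplifications.

```python
def sse(source: list[int], prediction: list[int]) -> int:
    """Sum of squared differences between source and predicted samples."""
    return sum((s - p) ** 2 for s, p in zip(source, prediction))


def pick_mode(source: list[int], candidates: dict[str, list[int]]) -> str:
    """Return the mode whose prediction leaves the smallest residual.

    candidates maps a mode name to its predicted samples. Ties go to the
    mode listed first.
    """
    return min(candidates, key=lambda mode: sse(source, candidates[mode]))


if __name__ == "__main__":
    source = [10, 10, 20, 20]
    candidates = {
        "DC_PRED": [15, 15, 15, 15],  # flat average of the block
        "V_PRED": [10, 10, 20, 20],   # happens to match this source exactly
    }
    print(pick_mode(source, candidates))
```

An RD-style picker would also weigh the bits each mode costs, which this sketch ignores.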
237302

238303
Each of these is plausibly one PR of foundation-series scope.
239304
