-
Notifications
You must be signed in to change notification settings - Fork 513
Expand file tree
/
Copy pathdirty-flag.html
More file actions
639 lines (618 loc) · 40.6 KB
/
dirty-flag.html
File metadata and controls
639 lines (618 loc) · 40.6 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
<title>Dirty Flag · Optimization Patterns · Game Programming Patterns</title>
<!-- Tell mobile browsers we're optimized for them and they don't need to crop
the viewport. -->
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<link rel="stylesheet" type="text/css" href="style.css" />
<link href="https://fonts.googleapis.com/css?family=Merriweather:400,400italic,700,700italic|Source+Code+Pro|Source+Sans+Pro:200,300,400,600,400italic,600italic|Rock+Salt" rel="stylesheet" type="text/css">
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-42804721-1', 'gameprogrammingpatterns.com');
ga('send', 'pageview');
</script>
<script src="jquery-3.6.0.min.js"></script>
<script src="script.js"></script>
</head>
<body id="top">
<div class="page sidebar">
<span class="theme-toggler" title="dark theme" onclick="toggleTheme()"></span>
<div class="content">
<nav class="top">
<span class="prev">← <a href="data-locality.html">Previous Chapter</a></span>
<span class="next"><a href="object-pool.html">Next Chapter</a> →</span>
<span class="toc">≡ <a href="/">The Book</a></span>
</nav>
<h1>Dirty Flag</h1>
<h1 class="book"><a href="/">Game Programming Patterns</a><span class="section"><a href="optimization-patterns.html">Optimization Patterns</a></span></h1>
<h2><a href="#intent" name="intent">Intent</a></h2>
<p><em>Avoid unnecessary work by deferring it until the result is needed.</em></p>
<h2><a href="#motivation" name="motivation">Motivation</a></h2>
<p>Many games have something called a <em>scene graph</em>. This is a big data structure
that contains all of the objects in the world. The rendering engine uses this to
determine where to draw stuff on the screen.</p>
<p>At its simplest, a scene graph is just a flat list of objects. Each object has a
model, or some other graphic primitive, and a <span
name="transform"><em>transform</em></span>. The transform describes the object’s
position, rotation, and scale in the world. To move or turn an object, we simply
change its transform.</p>
<aside name="transform">
<p>The mechanics of <em>how</em> this transform is stored and manipulated are unfortunately
out of scope here. The comically abbreviated summary is that it’s a 4x4 matrix.
You can make a single transform that combines two transforms — for example,
translating and then rotating an object — by multiplying the two matrices.</p>
<p>How and why that works is left as an exercise for the reader.</p>
</aside>
<p>When the renderer draws an object, it takes the object’s model, applies the
transform to it, and then renders it there in the world. If we had a scene
<em>bag</em> and not a scene <em>graph</em>, that would be it, and life would be simple.</p>
<p>However, most scene graphs are <span name="hierarchical"><em>hierarchical</em></span>.
An object in the graph may have a parent object that it is anchored to. In that
case, its transform is relative to the <em>parent’s</em> position and isn’t an
absolute position in the world.</p>
<p>For example, imagine our game world has a pirate ship at sea. Atop the ship’s
mast is a crow’s nest. Hunched in that crow’s nest is a pirate. Clutching the
pirate’s shoulder is a parrot. The ship’s local transform positions the ship in
the sea. The crow’s nest’s transform positions the nest on the ship, and so on.</p>
<p><span name="pirate"></span>
<img src="images/dirty-flag-pirate.png" alt="A pirate ship containing a crow's nest with a pirate in it with a parrot on his shoulder." /></p>
<aside name="pirate">
<p>Programmer art!</p>
</aside>
<p>This way, when a parent object moves, its children move with it automatically.
If we change the local transform of the ship, the crow’s nest, pirate, and
parrot go along for the ride. It would be a total <span
name="slide">headache</span> if, when the ship moved, we had to manually adjust
the transforms of all the objects on it to keep them from sliding off.</p>
<aside name="slide">
<p>To be honest, when you are at sea you <em>do</em> have to keep manually adjusting your
position to keep from sliding off. Maybe I should have chosen a drier example.</p>
</aside>
<p>But to actually draw the parrot on screen, we need to know its absolute position
in the world. I’ll call the parent-relative transform the object’s <em>local
transform</em>. To render an object, we need to know its <em>world transform</em>.</p>
<h3><a href="#local-and-world-transforms" name="local-and-world-transforms">Local and world transforms</a></h3>
<p>Calculating an object’s world transform is pretty straightforward — you just walk
its parent chain starting at the root all the way down to the object, combining
transforms as you go. In other words, the parrot’s world transform is:</p>
<p><span name="degenerate"></span>
<img src="images/dirty-flag-multiply.png" alt="The parrot's world position comes from multiplying the local positions for the ship, nest, pirate, and parrot." /></p>
<aside name="degenerate">
<p>In the degenerate case where the object has no parent, its local and world
transforms are equivalent.</p>
</aside>
<p>We need the world transform for every object in the world every frame, so even
though there are only a handful of matrix multiplications per model, it’s on the hot
code path where performance is critical. Keeping them up to date is tricky
because when a parent object moves, that affects the world transform of itself
and all of its children, recursively.</p>
<p>The simplest approach is to calculate transforms on the fly while
rendering. Each frame, we recursively traverse the scene graph starting at the
top of the hierarchy. For each object, we calculate its world transform right
then and draw it.</p>
<p>But this is terribly wasteful of our precious CPU juice! Many objects in the
world are <em>not</em> moving every frame. Think of all of the static geometry that
makes up the level. Recalculating their world transforms each frame even though
they haven’t changed is a waste.</p>
<h3><a href="#cached-world-transforms" name="cached-world-transforms">Cached world transforms</a></h3>
<p>The obvious answer is to <em>cache</em> it. In each object, we store its local
transform and its derived world transform. When we render, we only use the
precalculated world transform. If the object never moves, the cached transform
is always up to date and everything’s happy.</p>
<p>When an object <em>does</em> move, the simple approach is to refresh its world
transform right then. But don’t forget the hierarchy! When a parent moves, we
have to recalculate its world transform <em>and all of its children’s,
recursively</em>.</p>
<p>Imagine some busy gameplay. In a single frame, the ship gets tossed on the
ocean, the crow’s nest rocks in the wind, the pirate leans to the edge, and the
parrot hops onto his head. We changed four local transforms. If we recalculate
world transforms eagerly whenever a local transform changes, what ends up
happening?</p>
<p><span name="stars"></span>
<img src="images/dirty-flag-update-bad.png" alt="Any time an object moves, the world coordinates are recalculated eagerly and redundantly." /></p>
<aside name="stars">
<p>You can see on the lines marked ★ that we’re recalculating the parrot’s
world transform <em>four</em> times when we only need the result of the final one.</p>
</aside>
<p>We only moved four objects, but we did <em>ten</em> world transform calculations.
That’s six pointless calculations that get thrown out before they are ever used
by the renderer. We calculated the parrot’s world transform <em>four</em> times, but it
is only rendered once.</p>
<p>The problem is that a world transform may depend on several local transforms.
Since we recalculate immediately each time <em>one</em> of the transforms changes, we end up
recalculating the same transform multiple times when more than one of the local
transforms it depends on changes in the same frame.</p>
<h3><a href="#deferred-recalculation" name="deferred-recalculation">Deferred recalculation</a></h3>
<p>We’ll solve this by <span name="decoupling">decoupling</span> changing local
transforms from updating the world transforms. This lets us change a bunch of
local transforms in a single batch and <em>then</em> recalculate the affected world
transform just once after all of those modifications are done, right before we
need it to render.</p>
<aside name="decoupling">
<p>It’s interesting how much of software architecture is intentionally
engineering a little slippage.</p>
</aside>
<p>To do this, we add a <em>flag</em> to each object in the graph. “Flag” and “bit” are
synonymous in programming — they both mean a single micron of data that can be
in one of two states. We call those “true” and “false”, or sometimes “set” and
“cleared”. I’ll use all of these interchangeably.</p>
<p>When the local transform
changes, we set it. When we need the object’s world transform, we check the
flag. If it’s set, we calculate the world transform and then clear the flag. The
flag represents, “Is the world transform out of date?” For reasons that aren’t
entirely clear, the traditional name for this “out-of-date-ness” is “dirty”.
Hence: <em>a dirty flag</em>. “Dirty bit” is an equally
<span name="specific">common</span> name for this pattern, but I figured I’d
stick with the name that didn’t seem as prurient.</p>
<aside name="specific">
<p>Wikipedia’s editors don’t have my level of self-control and went with <a href="http://en.wikipedia.org/wiki/Dirty_bit">dirty
bit</a>.</p>
</aside>
<p>If we apply this pattern and then move all of the objects in our previous
example, the game ends up doing:</p>
<p><img src="images/dirty-flag-update-good.png" alt="By deferring until all moves are done, we only recalculate once." /></p>
<p>That’s the best you could hope to do — the world transform for each affected
object is calculated exactly once. With only a single bit of data, this pattern
does a few things for us:</p>
<ul>
<li>
<p>It collapses modifications to multiple local transforms along an object’s
parent chain into a single recalculation on the object.</p>
</li>
<li>
<p>It avoids recalculation on objects that didn’t move.</p>
</li>
<li>
<p>And a minor bonus: if an object gets removed before it’s rendered, it
doesn’t calculate its world transform at all.</p>
</li>
</ul>
<h2><a href="#the-pattern" name="the-pattern">The Pattern</a></h2>
<p>A set of <strong>primary data</strong> changes over time. A set of <strong>derived data</strong> is
determined from this using some <strong>expensive process</strong>. A <strong>“dirty” flag</strong> tracks
when the derived data is out of sync with the primary data. It is <strong>set when the
primary data changes</strong>. If the flag is set when the derived data is needed, then
<strong>it is reprocessed and the flag is cleared.</strong> Otherwise, the previous <strong>cached
derived data</strong> is used.</p>
<h2><a href="#when-to-use-it" name="when-to-use-it">When to Use It</a></h2>
<p>Compared to some other patterns in this book, this one solves a pretty specific
problem. Also, like most optimizations, you should only reach for it when you
have a performance problem big enough to justify the added code complexity.</p>
<p>Dirty flags are applied to two kinds of work: <em>calculation</em> and
<em>synchronization</em>. In both cases, the process of going from the primary data to
the derived data is time-consuming or otherwise costly.</p>
<p>In our scene graph example, the process is slow because of the amount of math to
perform. When using this pattern for synchronization, on the other hand, it’s
more often that the derived data is <em>somewhere else</em> — either on disk or over
the network on another machine — and simply getting it from point A to point B
is what’s expensive.</p>
<p>There are a couple of other requirements too:</p>
<ul>
<li>
<p><strong>The primary data has to change more often than the derived data is used.</strong>
This pattern works by avoiding processing derived data when a subsequent
primary data change would invalidate it before it gets used. If you find
yourself always needing that derived data after every single modification
to the primary data, this pattern can’t help.</p>
</li>
<li>
<p><strong>It should be hard to update incrementally.</strong> Let’s say the
pirate ship in our game can only carry so much booty. We need to
know the total weight of everything in the hold. We
<em>could</em> use this pattern and have a dirty flag for the total weight. Every
time we add or remove some loot, we set the flag. When we need the
total, we add up all of the booty and clear the flag.</p>
<p>But a simpler solution is to <em>keep a running total</em>. When we add or
remove an item, just add or remove its weight from the current total. If
we can “pay as we go” like this and keep the derived data updated, then
that’s often a better choice than using this pattern and calculating the
derived data from scratch when needed.</p>
</li>
</ul>
<p>This makes it sound like dirty flags are rarely appropriate, but you’ll
find a place here or there where they help. <span name="hacks">Searching</span>
your average game codebase for the word “dirty” will often turn up uses of this
pattern.</p>
<aside name="hacks">
<p>From my research, it also turns up a lot of comments apologizing for “dirty”
hacks.</p>
</aside>
<h2><a href="#keep-in-mind" name="keep-in-mind">Keep in Mind</a></h2>
<p>Even after you’ve convinced yourself this pattern is a good fit, there are a few
wrinkles that can cause you some discomfort.</p>
<h3><a href="#there-is-a-cost-to-deferring-for-too-long" name="there-is-a-cost-to-deferring-for-too-long">There is a cost to deferring for too long</a></h3>
<p>This pattern defers some slow work until the result is actually needed, but when
it is, it’s often needed <em>right now</em>. But the reason we’re using this pattern to
begin with is because calculating that result is slow!</p>
<p>This isn’t a problem in our example because we can still calculate world
coordinates fast enough to fit within a frame, but you can imagine other cases
where the work you’re doing is a big chunk that takes noticeable time to chew
through. If the game doesn’t <em>start</em> chewing until right when the player expects
to see the result, that can cause an unpleasant visible <span
name="gc">pause</span>.</p>
<p>Another problem with deferring is that if something goes wrong, you may fail to
do the work at all. This can be particularly problematic when you’re using this
pattern to save some state to a more persistent form.</p>
<p>For example, text editors know if your document has “unsaved changes”. That
little bullet or star in your file’s title bar is literally the dirty flag
visualized. The primary data is the open document in memory, and the derived
data is the file on disk.</p>
<p><img src="images/dirty-flag-title-bar.png" alt="A window titlebar showing the little icon representing unsaved changes." /></p>
<p>Many programs don’t save to disk until either the document is closed or the
application is exited. That’s fine most of the time, but if you accidentally
kick the power cable out, there goes your masterpiece.</p>
<p>Editors that auto-save a backup in the background are compensating specifically
for this shortcoming. The auto-save frequency is a point on the continuum
between not losing too much work when a crash occurs and not thrashing the file
system too much by saving all the time.</p>
<aside name="gc">
<p>This mirrors the different garbage collection strategies in systems that
automatically manage memory. Reference counting frees memory the second it’s no
longer needed, but it burns CPU time updating ref counts eagerly every time
references are changed.</p>
<p>Simple garbage collectors defer reclaiming memory until it’s really needed, but
the cost is the dreaded “GC pause” that can freeze your entire game until the
collector is done scouring the heap.</p>
<p>In between the two are more complex systems like deferred ref-counting and
incremental GC that reclaim memory less eagerly than pure ref-counting but more
eagerly than stop-the-world collectors.</p>
</aside>
<h3><a href="#you-have-to-make-sure-to-set-the-flag-*every*-time-the-state-changes" name="you-have-to-make-sure-to-set-the-flag-*every*-time-the-state-changes">You have to make sure to set the flag <em>every</em> time the state changes</a></h3>
<p>Since the derived data is calculated from the primary data, it’s essentially a
cache. Whenever you have cached data, the trickiest aspect of it is <span
name="cache"><em>cache invalidation</em></span> — correctly noting when the cache is
out of sync with its source data. In this pattern, that means setting the dirty
flag when <em>any</em> primary data changes.</p>
<aside name="cache">
<p>Phil Karlton famously said, “There are only two hard things in Computer Science:
cache invalidation and naming things.”</p>
</aside>
<p>Miss it in one place, and your program will incorrectly use stale derived data.
This leads to confused players and bugs that are very hard to track down. When you use
this pattern, you’ll have to take care that any code that modifies the primary
state also sets the dirty flag.</p>
<p>One way to mitigate this is by encapsulating modifications to the primary data
behind some interface. If anything that can change the state goes through a
single narrow API, you can set the dirty flag there and rest assured that it
won’t be missed.</p>
<h3><a href="#you-have-to-keep-the-previous-derived-data-in-memory" name="you-have-to-keep-the-previous-derived-data-in-memory">You have to keep the previous derived data in memory</a></h3>
<p><span name="sync"></span></p>
<p>When the derived data is needed and the dirty flag <em>isn’t</em> set, it uses the
previously calculated data. This is obvious, but that does imply that you have
to keep that derived data around in memory in case you end up needing it later.</p>
<aside name="sync">
<p>This isn’t much of an issue when you’re using this pattern to synchronize the
primary state to some other place. In that case, the derived data isn’t usually
in memory at all.</p>
</aside>
<p>If you weren’t using this pattern, you could calculate the derived data on the
fly whenever you needed it, then discard it when you were done. That avoids the
expense of keeping it cached in memory at the cost of having to do that
calculation every time you need the result.</p>
<p>Like many optimizations, then, this pattern <span name="trade">trades</span>
memory for speed. In return for keeping the previously calculated data in
memory, you avoid having to recalculate it when it hasn’t changed. This
trade-off makes sense when the calculation is slow and memory is cheap. When
you’ve got more time than memory on your hands, it’s better to calculate it
as needed.</p>
<aside name="trade">
<p>Conversely, compression algorithms make the opposite trade-off: they optimize
<em>space</em> at the expense of the processing time needed to decompress.</p>
</aside>
<h2><a href="#sample-code" name="sample-code">Sample Code</a></h2>
<p>Let’s assume we’ve met the surprisingly long list of requirements and see how
the pattern looks in code. As I mentioned before, the actual math behind
transform matrices is beyond the humble aims of this book, so I’ll just
encapsulate that in a class whose implementation you can presume exists
somewhere out in the æther:</p>
<div class="codehilite"><pre><span></span><code><span class="k">class</span> <span class="nc">Transform</span>
<span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="k">static</span> <span class="n">Transform</span> <span class="n">origin</span><span class="p">();</span>
<span class="n">Transform</span> <span class="nf">combine</span><span class="p">(</span><span class="n">Transform</span><span class="o">&</span> <span class="n">other</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div>
<p>The only operation we need here is <code>combine()</code> so that we can get an object’s
world transform by combining all of the local transforms along its parent chain.
It also has a method to get an “origin” transform — basically an identity
matrix that means no translation, rotation, or scaling at all.</p>
<p>Next, we’ll sketch out the class for an object in the scene graph. This is the
bare minimum we need <em>before</em> applying this pattern:</p>
<div class="codehilite"><pre><span></span><code><span class="k">class</span> <span class="nc">GraphNode</span>
<span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="n">GraphNode</span><span class="p">(</span><span class="n">Mesh</span><span class="o">*</span> <span class="n">mesh</span><span class="p">)</span>
<span class="o">:</span> <span class="n">mesh_</span><span class="p">(</span><span class="n">mesh</span><span class="p">),</span>
<span class="n">local_</span><span class="p">(</span><span class="n">Transform</span><span class="o">::</span><span class="n">origin</span><span class="p">())</span>
<span class="p">{}</span>
<span class="k">private</span><span class="o">:</span>
<span class="n">Transform</span> <span class="n">local_</span><span class="p">;</span>
<span class="n">Mesh</span><span class="o">*</span> <span class="n">mesh_</span><span class="p">;</span>
<span class="n">GraphNode</span><span class="o">*</span> <span class="n">children_</span><span class="p">[</span><span class="n">MAX_CHILDREN</span><span class="p">];</span>
<span class="kt">int</span> <span class="n">numChildren_</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div>
<p>Each node has a local transform which describes where it is relative to its
parent. It has a mesh which is the actual graphic for the object. (We’ll allow
<code>mesh_</code> to be <code>NULL</code> too to handle non-visual nodes that are used just to group
their children.) Finally, each node has a possibly empty collection of child
nodes.</p>
<p>With this, a “scene graph” is really only a single root <code>GraphNode</code> whose
children (and grandchildren, etc.) are all of the objects in the world:</p>
<div class="codehilite"><pre><span></span><code><span class="n">GraphNode</span><span class="o">*</span> <span class="n">graph_</span> <span class="o">=</span> <span class="k">new</span> <span class="n">GraphNode</span><span class="p">(</span><span class="nb">NULL</span><span class="p">);</span>
<span class="c1">// Add children to root graph node...</span>
</code></pre></div>
<p>In order to render a scene graph, all we need to do is traverse that tree of
nodes, starting at the root, and call the following function for each node’s mesh
with the right world transform:</p>
<div class="codehilite"><pre><span></span><code><span class="kt">void</span> <span class="nf">renderMesh</span><span class="p">(</span><span class="n">Mesh</span><span class="o">*</span> <span class="n">mesh</span><span class="p">,</span> <span class="n">Transform</span> <span class="n">transform</span><span class="p">);</span>
</code></pre></div>
<p>We won’t implement this here, but if we did, it would do whatever magic the
renderer needs to draw that mesh at the given location in the world. If we can
call that correctly and efficiently on every node in the scene graph, we’re
happy.</p>
<h3><a href="#an-unoptimized-traversal" name="an-unoptimized-traversal">An unoptimized traversal</a></h3>
<p>To get our hands dirty, let’s throw together a basic traversal for rendering the
scene graph that calculates the world positions on the fly. It won’t be optimal,
but it will be simple. We’ll add a new method to <code>GraphNode</code>:</p>
<div class="codehilite"><pre><span></span><code><span class="kt">void</span> <span class="nf">GraphNode::render</span><span class="p">(</span><span class="n">Transform</span> <span class="n">parentWorld</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Transform</span> <span class="n">world</span> <span class="o">=</span> <span class="n">local_</span><span class="p">.</span><span class="n">combine</span><span class="p">(</span><span class="n">parentWorld</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">mesh_</span><span class="p">)</span> <span class="n">renderMesh</span><span class="p">(</span><span class="n">mesh_</span><span class="p">,</span> <span class="n">world</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numChildren_</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">children_</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-></span><span class="n">render</span><span class="p">(</span><span class="n">world</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>We pass the world transform of the node’s parent into this using <code>parentWorld</code>.
With that, all that’s left to get the correct world transform of <em>this</em> node is
to combine that with its own local transform. We don’t have to walk <em>up</em> the
parent chain to calculate world transforms because we calculate as we go while
walking <em>down</em> the chain.</p>
<p>We calculate the node’s world transform and store it in <code>world</code>, then we render
the mesh, if we have one. Finally, we recurse into the child nodes, passing in
<em>this</em> node’s world transform. All in all, it’s a tight, simple recursive
method.</p>
<p>To draw an entire scene graph, we kick off the process at the root node:</p>
<div class="codehilite"><pre><span></span><code><span class="n">graph_</span><span class="o">-></span><span class="n">render</span><span class="p">(</span><span class="n">Transform</span><span class="o">::</span><span class="n">origin</span><span class="p">());</span>
</code></pre></div>
<h3><a href="#let's-get-dirty" name="let's-get-dirty">Let’s get dirty</a></h3>
<p>So this code does the right thing — it renders all the meshes in the right place
— but it doesn’t do it efficiently. It’s calling <code>local_.combine(parentWorld)</code>
on every node in the graph, every frame. Let’s see how this pattern fixes that.
First, we need to add two fields to <code>GraphNode</code>:</p>
<div class="codehilite"><pre><span></span><code><span class="k">class</span> <span class="nc">GraphNode</span>
<span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="n">GraphNode</span><span class="p">(</span><span class="n">Mesh</span><span class="o">*</span> <span class="n">mesh</span><span class="p">)</span>
<span class="o">:</span> <span class="n">mesh_</span><span class="p">(</span><span class="n">mesh</span><span class="p">),</span>
<span class="n">local_</span><span class="p">(</span><span class="n">Transform</span><span class="o">::</span><span class="n">origin</span><span class="p">()),</span>
<span class="n">dirty_</span><span class="p">(</span><span class="nb">true</span><span class="p">)</span>
<span class="p">{}</span>
<span class="c1">// Other methods...</span>
<span class="k">private</span><span class="o">:</span>
<span class="n">Transform</span> <span class="n">world_</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">dirty_</span><span class="p">;</span>
<span class="c1">// Other fields...</span>
<span class="p">};</span>
</code></pre></div>
<p>The <code>world_</code> field caches the previously calculated world transform, and
<code>dirty_</code>, of course, is the dirty flag. Note that the flag starts out <code>true</code>.
When we create a new node, we haven’t calculated its world transform yet. At
birth, it’s already out of sync with the local transform.</p>
<p>The only reason we need this pattern is because objects can <em>move</em>, so let’s add
support for that:</p>
<div class="codehilite"><pre><span></span><code><span class="kt">void</span> <span class="nf">GraphNode::setTransform</span><span class="p">(</span><span class="n">Transform</span> <span class="n">local</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">local_</span> <span class="o">=</span> <span class="n">local</span><span class="p">;</span>
<span class="n">dirty_</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>The important part here is that it sets the dirty flag too. Are we forgetting
anything? Right — the child nodes!</p>
<p>When a parent node moves, all of its children’s world coordinates are
invalidated too. But here, we aren’t setting their dirty flags. We <em>could</em> do
that, but that’s recursive and slow. Instead, we’ll do something clever when we
go to render. Let’s see:</p>
<p><span name="branch"></span></p>
<div class="codehilite"><pre><span></span><code><span class="kt">void</span> <span class="nf">GraphNode::render</span><span class="p">(</span><span class="n">Transform</span> <span class="n">parentWorld</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">dirty</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">dirty</span> <span class="o">|=</span> <span class="n">dirty_</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">dirty</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">world_</span> <span class="o">=</span> <span class="n">local_</span><span class="p">.</span><span class="n">combine</span><span class="p">(</span><span class="n">parentWorld</span><span class="p">);</span>
<span class="n">dirty_</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">mesh_</span><span class="p">)</span> <span class="n">renderMesh</span><span class="p">(</span><span class="n">mesh_</span><span class="p">,</span> <span class="n">world_</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numChildren_</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">children_</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-></span><span class="n">render</span><span class="p">(</span><span class="n">world_</span><span class="p">,</span> <span class="n">dirty</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<aside name="branch">
<p>There’s a subtle assumption here that the <code>if</code> check is faster than a matrix
multiply. Intuitively, you would think it is; surely testing a single bit is
faster than a bunch of floating point arithmetic.</p>
<p>However, modern CPUs are fantastically complex. They rely heavily on
<em>pipelining</em> — queueing up a series of sequential instructions. A branch like
our <code>if</code> here can cause a <em>branch misprediction</em> and force the CPU to lose cycles
refilling the pipeline.</p>
<p>The <a href="data-locality.html" class="pattern">Data Locality</a> chapter has
more about how modern CPUs try to go faster and how you can avoid tripping them
up like this.</p>
</aside>
<p>This is similar to the original naïve implementation. The key changes are that
we check to see if the node is dirty before calculating the world transform and
we store the result in a field instead of a local variable. When the node is
clean, we skip <code>combine()</code> completely and use the old-but-still-correct <code>world_</code>
value.</p>
<p>The <span name="clever">clever</span> bit is that <code>dirty</code> parameter. That will
be <code>true</code> if any node above this node in the parent chain was dirty. In much the
same way that <code>parentWorld</code> updates the world transform incrementally as we
traverse down the hierarchy, <code>dirty</code> tracks the dirtiness of the parent chain.</p>
<p>This lets us avoid having to recursively mark each child’s <code>dirty_</code> flag
in <code>setTransform()</code>. Instead, we pass the parent’s dirty flag down to its
children when we render and look at that too to see if we need to recalculate
the world transform.</p>
<p>The end result here is exactly what we want: changing a node’s local transform
is just a couple of assignments, and rendering the world calculates the exact
minimum number of world transforms that have changed since the last frame.</p>
<aside name="clever">
<p>Note that this clever trick only works because <code>render()</code> is the <em>only</em> thing in
<code>GraphNode</code> that needs an up-to-date world transform. If other things accessed
it, we’d have to do something different.</p>
</aside>
<h2><a href="#design-decisions" name="design-decisions">Design Decisions</a></h2>
<p>This pattern is fairly specific, so there are only a couple of knobs to twiddle:</p>
<h3><a href="#when-is-the-dirty-flag-cleaned" name="when-is-the-dirty-flag-cleaned">When is the dirty flag cleaned?</a></h3>
<ul>
<li>
<p><strong>When the result is needed:</strong></p>
<ul>
<li>
<p><em>It avoids doing calculation entirely if the result is never used.</em> For
primary data that changes much more frequently than the derived data is
accessed, this can be a big win.</p>
</li>
<li>
<p><em>If the calculation is time-consuming, it can cause a noticeable pause.</em>
Postponing the work until the player is expecting to see the result can
affect their gameplay experience. It’s often fast enough that this
isn’t a problem, but if it is, you’ll have to do the work earlier.</p>
</li>
</ul>
</li>
<li>
<p><strong>At well-defined checkpoints:</strong></p>
<p>Sometimes, there is a point in time or in the progression of the game where it’s
natural to do the deferred processing. For example,
we may want to save the game only when the pirate sails into port. Or the
sync point may not be part of the game mechanics. We may just want to hide the
work behind a loading screen or a cut scene.</p>
<ul>
<li>
<p><em>Doing the work doesn’t impact the user experience.</em> Unlike the previous
option, you can often give something to
distract the player while the game is busy processing.</p>
</li>
<li>
<p><em>You lose control over when the work happens.</em> This is sort of the
opposite of the earlier point. You have micro-scale control over when you
process, and can make sure the game handles it gracefully.</p>
<p>What you <em>can’t</em> do is ensure the player actually makes it to the
checkpoint or meets whatever criteria you’ve defined. If they get lost
or the game gets in a weird state, you can end up deferring
longer than you expect.</p>
</li>
</ul>
</li>
<li>
<p><strong>In the background:</strong></p>
<p>Usually, you start a fixed <span name="hysteresis">timer</span>
on the first modification and then process all of the changes that happen
between then and when the timer fires.</p>
<aside name="hysteresis">
<p>The term in human-computer interaction for an intentional delay between
when a program receives user input and when it responds is <a href="http://en.wikipedia.org/wiki/Hysteresis"><em>hysteresis</em></a>.</p>
</aside>
<ul>
<li>
<p><em>You can tune how often the work is performed.</em> By adjusting the timer
interval, you can ensure it happens as frequently (or infrequently) as
you want.</p>
</li>
<li>
<p><em>You can do more redundant work.</em> If the primary state only changes a
tiny amount during the timer’s run, you can end up processing a large
chunk of mostly unchanged data.</p>
</li>
<li>
<p><em>You need support for doing work asynchronously.</em>
Processing the data “in the background” implies that the player can
keep doing whatever it is that they’re doing at the same time. That
means you’ll likely need threading or some other kind of concurrency
support so that the game can work on the data while it’s still
being played.</p>
<p>Since the player is likely interacting with
the same primary state that you’re processing, you’ll need to think
about making that safe for concurrent modification too.</p>
</li>
</ul>
</li>
</ul>
<h3><a href="#how-fine-grained-is-your-dirty-tracking" name="how-fine-grained-is-your-dirty-tracking">How fine-grained is your dirty tracking?</a></h3>
<p>Imagine our pirate game lets players build and customize their pirate ship.
Ships are automatically saved online so the player can resume where they left
off. We’re using dirty flags to determine which decks of the ship have been
fitted and need to be sent to the server. Each chunk of data we send to the
server contains some modified ship data and a bit of metadata describing where
on the ship this modification occurred.</p>
<ul>
<li>
<p><strong>If it’s more fine-grained:</strong></p>
<p>Say you slap a dirty flag on each tiny plank of each deck.</p>
<ul>
<li><em>You only process data that actually changed.</em> You’ll send exactly the
facets of the ship that were modified to the server.</li>
</ul>
</li>
<li>
<p><strong>If it’s more coarse-grained:</strong></p>
<p>Alternatively, we could associate a dirty bit with each deck.
Changing anything on it marks the entire deck <span name="swab">dirty</span>.</p>
<aside name="swab">
<p>I could make some terrible joke about it needing to be swabbed here, but
I’ll refrain.</p>
</aside>
<ul>
<li>
<p><em>You end up processing unchanged data.</em> Add a single barrel to a deck
and you’ll have to send the whole thing to the server.</p>
</li>
<li>
<p><em>Less memory is used for storing dirty flags.</em> Add ten barrels to a deck
and you only need a single bit to track them all.</p>
</li>
<li>
<p><em>Less time is spent on fixed overhead.</em> When processing some changed data,
there’s often a bit of fixed work you have to do on top of handling the
data itself. In the example here, that’s the metadata required to
identify where on the ship the changed data is. The bigger your
processing chunks, the fewer of them there are, which means the less
overhead you have.</p>
</li>
</ul>
</li>
</ul>
<h2><a href="#see-also" name="see-also">See Also</a></h2>
<ul>
<li>
<p>This pattern is common outside of games in browser-side web frameworks like
<a href="http://angularjs.org/">Angular</a>. They use dirty flags to track which data
has been changed in the browser and needs to be pushed up to the server.</p>
</li>
<li>
<p>Physics engines track which objects are in motion and which are resting.
Since a resting body won’t move until an impulse is applied to it, they
don’t need processing until they get touched. This “is moving” bit is a
dirty flag to note which objects have had forces applied and need to have
their physics resolved.</p>
</li>
</ul>
<nav>
<span class="prev">← <a href="data-locality.html">Previous Chapter</a></span>
<span class="next"><a href="object-pool.html">Next Chapter</a> →</span>
<span class="toc">≡ <a href="/">The Book</a></span>
</nav>
</div>
</div>
<footer>© 2009-2021 Robert Nystrom</footer>
</body>
</html>