Distributed-Algorithms/Raft/6.824Lab2_Raft.html at master · aQuaYi/Distributed-Algorithms · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
<!DOCTYPE html>
<!-- saved from url=(0051)https://pdos.csail.mit.edu/6.824/labs/lab-raft.html -->
<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<link rel="stylesheet" href="./6.824Lab2_Raft_files/style.css" type="text/css">
<title>6.824 Lab 2: Raft</title>
<style type="text/css">#htmlToothbrush, #bodyToothbrush {overflow:hidden !important;zoom:100% !important}
#htmlToothbrush #bodyToothbrush .parentToothbrush {overflow:visible !important;z-index:auto !important;transform:none !important;-webkit-transform-style:flat !important;transition:none !important;contain:none !important;}
#htmlToothbrush #bodyToothbrush .absoluteToothbrush {position:absolute !important;}
#htmlToothbrush #bodyToothbrush .playerToothbrush {position:fixed !important;top:0px !important;left:1px !important;width:calc(100vw - 2px) !important;height:100vh !important;max-width:none !important;max-height:none !important;min-width:0 !important;min-height:0 !important;margin:0 !important;padding:0 !important;z-index:2147483647 !important;border:none !important;background-color:#000 !important;transform:none !important;}
#htmlToothbrush #bodyToothbrush .playerToothbrush video {object-fit:contain !important;}
#playerControlBtn {visibility:hidden;opacity:0;display:none;transition: all 0.5s ease;cursor: pointer;font: 12px "&#24494;&#36719;&#38597;&#40657;";margin:0;width:64px;height:20px;line-height:20px;border:none;text-align: center;position: fixed;z-index:2147483647;background-color: #27A9D8;color: #FFF;} #playerControlBtn:hover {visibility:visible;opacity:1;background-color:#2774D8;}
#playerControlBtn.playerControlBtnCol {width:20px;height:64px;line-height:16px;}
#leftFullStackButton{display:none;position:fixed;width:1px;height:100vh;top:0;left:0;z-index:2147483647;background:#000;}
#rightFullStackButton{display:none;position:fixed;width:1px;height:100vh;top:0;right:0;z-index:2147483647;background:#000;}</style><style type="text/css">#htmlToothbrush #bodyToothbrush .node-video.node-full video,#htmlToothbrush #bodyToothbrush .WB_h5video video {width: inherit !important;height: inherit !important;}
#htmlToothbrush #bodyToothbrush #douyu_room_normal_flash_proxy_box {position:fixed !important;top:0px !important;left:1px !important;width:calc(100vw - 2px) !important;height:100vh !important;max-width:none !important;max-height:none !important;min-width:0 !important;min-height:0 !important;margin:0 !important;padding:0 !important;z-index:2147483645 !important;border:none !important;background-color:#000 !important;transform:none !important;}</style><style>@media print {#ghostery-purple-box {display:none !important}}</style></head>
<body style="background-color: rgb(246, 244, 236);">
<div align="center">
<h2><a href="https://pdos.csail.mit.edu/6.824/index.html">6.824</a> - Spring 2018</h2>
<h1>6.824 Lab 2: Raft</h1>
<h3>Part 2A Due: Feb 23 at 11:59pm</h3>
<h3>Part 2B Due: Mar 2 at 11:59pm</h3>
<h3>Part 2C Due: Mar 9 at 11:59pm</h3>
</div>
<hr>

<h3>Introduction</h3>

<p>
This is the first in a series of labs in which you'll build a
fault-tolerant key/value storage system. In this
lab you'll implement Raft, a replicated state machine protocol.
In the next lab you'll build a key/value service on top of
Raft. Then you will “shard” your service over
multiple replicated state machines for higher performance.

</p><p>
A replicated service achieves fault
tolerance by storing complete copies of its state (i.e., data)
on multiple replica servers.
Replication allows
the service to continue operating even if some of
its servers experience failures (crashes or a broken or flaky
network). The challenge is that failures may cause the
replicas to hold differing copies of the data.

</p><p>
Raft manages a service's state replicas, and in particular it
helps the service sort out what the correct state is after failures.
Raft implements a replicated state
machine. It organizes client requests into a sequence, called
the log, and ensures that all the replicas agree on the
contents of the log. Each replica executes the client requests
in the log in the order they appear in the log, applying those
requests to the replica's local copy of the service's state. Since all the live replicas
see the same log contents, they all execute the same requests
in the same order, and thus continue to have identical service
state. If a server fails but later recovers, Raft takes care of
bringing its log up to date. Raft will continue to operate as
long as at least a majority of the servers are alive and can
talk to each other. If there is no such majority, Raft will
make no progress, but will pick up where it left off as soon as
a majority can communicate again.

</p><p>
In this lab you'll implement Raft as a Go object type
with associated methods, meant to be used as a module in a
larger service. A set of Raft instances talk to each other with
RPC to maintain replicated logs. Your Raft interface will
support an indefinite sequence of numbered commands, also
called log entries. The entries are numbered with <em>index
numbers</em>. The log entry with a given index will eventually
be committed. At that point, your Raft should send the log
entry to the larger service for it to execute.

</p><p class="note">
Your Raft instances are only allowed to interact using RPC.
For example, different Raft instances
are not allowed to share Go variables.
Your code should not use files at all.


</p><p>
You should consult the
<a href="https://pdos.csail.mit.edu/6.824/papers/raft-extended.pdf">extended Raft paper</a>
and the Raft lecture notes.
You may find it useful to look at this
<a href="http://thesecretlivesofdata.com/raft/">illustration</a>
of the Raft protocol,
a
<a href="https://thesquareplanet.com/blog/students-guide-to-raft/">guide</a>
to Raft implementation
written for 6.824 students in 2016,
and advice about
<a href="https://pdos.csail.mit.edu/6.824/labs/raft-locking.txt">locking</a>
and
<a href="https://pdos.csail.mit.edu/6.824/labs/raft-structure.txt">structure</a>
for concurrency.
For a
wider perspective, have a look at Paxos, Chubby, Paxos Made
Live, Spanner, Zookeeper, Harp, Viewstamped Replication, and
<a href="http://static.usenix.org/event/nsdi11/tech/full_papers/Bolosky.pdf">Bolosky et al.</a>

</p><p>
In this lab you'll implement most of the Raft design
described in the extended paper, including saving
persistent state and reading it after a node fails and
then restarts. You will not implement cluster
membership changes (Section 6) or log compaction /
snapshotting (Section 7).

</p><ul class="hints">
<li>
Start early. Although the amount of code
isn't large, getting it to work correctly will be
challenging.

</li><li>
Read and understand the
<a href="https://pdos.csail.mit.edu/6.824/papers/raft-extended.pdf">extended Raft paper</a>
and the Raft lecture notes before you start. Your
implementation should follow the paper's description
closely, particularly Figure 2, since that's what the tests expect.
</li></ul>

<p>
This lab is due in three parts. You must submit each part on the
corresponding due date. This lab does not involve a lot of code,
but concurrency makes it challenging to debug;
start each part early.

</p><h3>Collaboration Policy</h3>

You must write all the code you hand in for 6.824, except for
code that we give you as part of the assignment. You are not
allowed to look at anyone else's solution, you are not allowed
to look at code from previous years, and you are not allowed to
look at other Raft implementations. You may discuss the
assignments with other students, but you may not look at or
copy anyone else's code, or allow anyone else to look at your code.

<p>
Please do not publish your code or make
it available to current or future 6.824 students.
<tt>github.com</tt> repositories are public by default, so please
don't put your code there unless you make the repository private. You
may find it convenient to use
<a href="https://github.mit.edu/">MIT's GitHub</a>,
but be sure to create a private repository.

</p><h3>Getting Started</h3>

<p>
If you have done Lab 1, you already have a copy of the lab
source code.
If not,
you can find directions for obtaining the source via git
in the <a href="https://pdos.csail.mit.edu/6.824/labs/lab-1.html">Lab 1 instructions</a>.

</p><div class="important">
<p>
Do a <tt>git pull</tt> to get the latest lab software.
</p></div>

<p>
We supply you with skeleton code and tests in <tt>src/raft</tt>, and a simple
RPC-like system in <tt>src/labrpc</tt>.

</p><p>
To get up and running, execute the following commands:
</p><pre>$ cd ~/6.824
$ git pull
...
$ cd src/raft
$ GOPATH=~/6.824
$ export GOPATH
$ go test
Test (2A): initial election ...
--- FAIL: TestInitialElection2A (5.04s)
        config.go:305: expected one leader, got none
Test (2A): election after network failure ...
--- FAIL: TestReElection2A (5.03s)
        config.go:305: expected one leader, got none
...
$
</pre>

<h3>The code</h3>

Implement Raft by adding code to
<tt>raft/raft.go</tt>. In that file you'll find a bit of
skeleton code, plus examples of how to send and receive
RPCs.

<p>
Your implementation must support the following interface, which
the tester and (eventually) your key/value server will use.
You'll find more details in comments in <tt>raft.go</tt>.

</p><pre>// create a new Raft server instance:
rf := Make(peers, me, persister, applyCh)

// start agreement on a new log entry:
rf.Start(command interface{}) (index, term, isleader)

// ask a Raft for its current term, and whether it thinks it is leader
rf.GetState() (term, isLeader)

// each time a new entry is committed to the log, each Raft peer
// should send an ApplyMsg to the service (or tester).
type ApplyMsg</pre>

<p>
A service calls <tt>Make(peers,me,…)</tt> to create a
Raft peer. The peers argument is an array of network identifiers
of the Raft peers (including this one), for use with labrpc RPC. The
<tt>me</tt> argument is the index of this peer in the peers
array. <tt>Start(command)</tt> asks Raft to start the processing
to append the command to the replicated log. <tt>Start()</tt>
should return immediately, without waiting for the log appends
to complete. The service expects your implementation to send an
<tt>ApplyMsg</tt> for each newly committed log entry to the
<tt>applyCh</tt> argument to <tt>Make()</tt>.

</p><p>
Your Raft peers should exchange RPCs using the labrpc Go
package that we provide to you. It is modeled after Go's
<a href="https://golang.org/pkg/net/rpc/">rpc library</a>, but internally uses
Go channels rather than sockets.
<tt>raft.go</tt> contains some example code that sends an RPC
(<tt>sendRequestVote()</tt>) and that handles an incoming RPC
(<tt>RequestVote()</tt>).  The reason you must use <tt>labrpc</tt> instead of
Go's RPC package is that the tester tells <tt>labrpc</tt> to delay RPCs,
re-order them, and delete them to simulate challenging network conditions under
which your code should work correctly.  Don't rely on modifications to
<tt>labrpc</tt> because we
will test your code with the <tt>labrpc</tt> as handed out.

</p><p>
Your first implementation may not be clean enough that you can easily reason
about its correctness.  Give yourself enough time to rewrite your implementation
so that you can easily reason about its correctness. Subsequent labs will build
on this lab, so it is important to do a good job on your implementation.

</p><h3>Part 2A</h3>

<p class="todo">
Implement leader election and heartbeats (<tt>AppendEntries</tt> RPCs with no
log entries). The goal for Part 2A is for a
single leader to be elected, for the leader to remain the leader
if there are no failures, and for a new leader to take over if the
old leader fails or if packets to/from the old leader are lost.
Run <tt>go test -run 2A</tt> to test your 2A code.

</p><ul class="hints">
<li>
Add any state you need to the <tt>Raft</tt>
struct in <tt>raft.go</tt>.
You'll also need to define a
struct to hold information about each log entry.
Your code should follow Figure 2 in the paper as
closely as possible.

</li><li>
Fill in the <tt>RequestVoteArgs</tt> and
<tt>RequestVoteReply</tt> structs. Modify
<tt>Make()</tt> to create a background goroutine that will kick off leader
 election periodically by sending out <tt>RequestVote</tt> RPCs when it hasn't
 heard from another peer for a while.  This way a peer will learn who is the
 leader, if there is already a leader, or become the leader itself.  Implement
 the <tt>RequestVote()</tt> RPC handler so that servers will vote for one
 another.


</li><li>
To implement heartbeats, define an
<tt>AppendEntries</tt> RPC struct (though you may not
need all the arguments yet), and have the leader send
them out periodically. Write an
<tt>AppendEntries</tt> RPC handler method that resets
the election timeout so that other servers don't step
forward as leaders when one has already been elected.

</li><li>
Make sure the election timeouts in different peers don't always fire
at the same time, or else all peers will vote only for themselves and no
one will become the leader.

</li><li>
The tester requires that the leader send heartbeat RPCs no more than
ten times per second.

</li><li>
The tester requires your Raft to elect a new leader within five seconds of the
failure of the old leader (if a majority of peers can still
communicate). Remember, however, that leader election may require multiple
rounds in case of a split vote (which can happen if packets are lost or if
candidates unluckily choose the same random backoff times). You must pick
election timeouts (and thus heartbeat intervals) that are short enough that it's
very likely that an election will complete in less than five seconds even if it
requires multiple rounds.

</li><li>
The paper's Section 5.2 mentions election timeouts in the range of 150
to 300 milliseconds. Such a range only makes sense if the leader
sends heartbeats considerably more often than once per 150
milliseconds. Because the tester limits you to 10 heartbeats per
second, you will have to use an election timeout larger
than the paper's 150 to 300 milliseconds, but not too large, because then you
may fail to elect a leader within five seconds.

</li><li>
You may find Go's
<a href="https://golang.org/pkg/math/rand/">rand</a>
useful.

</li><li>
You'll need to write code that takes actions periodically or
after delays in time. The easiest way to do this is to create
a goroutine with a loop that calls
<a href="https://golang.org/pkg/time/#Sleep">time.Sleep()</a>.
The hard way is to use Go's <tt>time.Timer</tt> or <tt>time.Ticker</tt>, which
are difficult to use correctly.

</li><li>
If you are puzzled about locking, you may find this
<a href="https://pdos.csail.mit.edu/6.824/labs/raft-locking.txt">advice</a> helpful.

</li><li>
If your code has trouble passing the tests,
read the paper's Figure 2 again; the full logic for leader
election is spread over multiple parts of the figure.

</li><li>
A good way to debug your code is to insert print statements when a peer sends or
receives a message, and collect the output in a file with
<tt>go test -run 2A &gt; out</tt>.  Then, by studying the trace of messages in
the <tt>out</tt> file, you can identify where your implementation deviates
from the desired protocol.  You might find <tt>DPrintf</tt> in <tt>util.go</tt>
useful to turn printing on and off as you debug different problems.

</li><li>Go RPC sends only struct fields whose names start with capital letters.
  Sub-structures must also have capitalized field names (e.g. fields of log records
  in an array). The <tt>labgob</tt> package will warn you about this;
  don't ignore the warnings.

</li><li>You should check your code with <tt>go test -race</tt>, and fix
any races it reports.

</li></ul>


<p>
Be sure you pass the 2A tests before submitting Part 2A, so that
you see something like this:

</p><pre>$ go test -run 2A
Test (2A): initial election ...
  ... Passed --   2.5  3   30    0
Test (2A): election after network failure ...
  ... Passed --   4.5  3   78    0
PASS
ok      raft    7.016s
$
</pre>

<p>
Each "Passed" line contains four numbers; these are the time that the
test took in seconds, the number of Raft peers (usually 3 or 5), the
number of RPCs sent during the test, and the number of log entries
that Raft reports were committed. Your numbers will differ from those
shown here. You can ignore the numbers if you like, but they may help
you sanity-check the number of RPCs that your implementation sends.
For all of labs 2, 3, and 4, the grading script will fail your
solution if it takes more than 600 seconds for all of the tests
(<tt>go test</tt>), or if any individual test takes more than 120
seconds.

</p><p>
Parts B and C will test leader election in more challenging settings
and may expose bugs in your leader election code which the 2A tests
miss.

</p><p>

</p><h3>Handin procedure for lab 2A</h3>

<div class="important">
<p>
Before submitting, please run the 2A tests
one final time.
Some bugs may not appear on
every run, so run the tests multiple times.
</p></div>

<p>
Submit your code via the class's submission website, located at
<a href="https://6824.scripts.mit.edu/2018/handin.py/">https://6824.scripts.mit.edu/2018/handin.py/</a>.

</p><p>
You may use your MIT Certificate or request an API key via
email to log in for the first time. Your API key (<tt>XXX</tt>)
is displayed once you are logged in, which can be used to upload
the lab from the console as follows.

</p><pre>$ cd "$GOPATH"
$ echo "XXX" &gt; api.key
$ make lab2a</pre>

<p class="important">
Check the submission website to make sure it sees your submission!

</p><p class="note">
You may submit multiple times. We will use the timestamp of
your <strong>last</strong> submission for the purpose of
calculating late days. Your grade is determined by the score
your solution <strong>reliably</strong> achieves when we run
the tester on our test machines.

</p><h3>Part 2B</h3>

<div class="important">
<p>
Do a <tt>git pull</tt> to get the latest lab software.
</p></div>

<p>
We want
Raft to keep a consistent, replicated log of operations.
A call to <tt>Start()</tt> at the leader starts the process
of adding a new operation to the log;
the leader sends the new operation to the other servers
in <tt>AppendEntries</tt> RPCs.

</p><p class="todo">
Implement the leader and follower code to append new log entries.
This will involve implementing <tt>Start()</tt>, completing the
<tt>AppendEntries</tt> RPC structs, sending them, fleshing out
the <tt>AppendEntry</tt> RPC handler, and advancing the <tt>commitIndex</tt> at
the leader.
Your first goal should
be to pass the <tt>TestBasicAgree2B()</tt> test (in
<tt>test_test.go</tt>). Once you have that working, you should
get all the 2B tests to pass (<tt>go test -run 2B</tt>).

</p><ul class="hints">

<li>
You will need to implement the election
restriction (section 5.4.1 in the paper).

</li><li>
One way to fail the early Lab 2B tests is to hold un-needed elections,
that is, an election even though the current leader is alive and can
talk to all peers. This can prevent agreement in situations where the
tester believes agreement is possible. Bugs in election timer
management, or not sending out heartbeats immediately after winning an
election, can cause un-needed elections.

</li><li>
You may need to write code that waits for certain events to occur.
Do not write loops that execute continuously without pausing, since that
will slow your implementation enough that it fails tests.
You can wait efficiently with Go's channels,
or Go's
<a href="https://golang.org/pkg/sync/#Cond">condition variables</a>,
or (if all else fails) by inserting a
<tt>time.Sleep(10 * time.Millisecond)</tt> in each loop iteration.

</li><li>Give yourself time to rewrite your implementation in light of
  lessons learned about structuring concurrent code. In later labs
  you'll thank yourself for having Raft code that's as clear and clean
  as possible. For ideas, you can re-visit our
<a href="https://pdos.csail.mit.edu/6.824/labs/raft-structure.txt">structure</a>,
<a href="https://pdos.csail.mit.edu/6.824/labs/raft-locking.txt">locking</a>
and
<a href="https://thesquareplanet.com/blog/students-guide-to-raft/">guide</a>,
pages.

</li></ul>

<p>
The tests for upcoming labs may fail your code if it runs too slowly.
You can check how much real time and CPU time your solution uses with
the time command. Here's some typical output for Lab 2B:

</p><pre>$ time go test -run 2B
Test (2B): basic agreement ...
  ... Passed --   0.5  5   28    3
Test (2B): agreement despite follower disconnection ...
  ... Passed --   3.9  3   69    7
Test (2B): no agreement if too many followers disconnect ...
  ... Passed --   3.5  5  144    4
Test (2B): concurrent Start()s ...
  ... Passed --   0.7  3   12    6
Test (2B): rejoin of partitioned leader ...
  ... Passed --   4.3  3  106    4
Test (2B): leader backs up quickly over incorrect follower logs ...
  ... Passed --  23.0  5 1302  102
Test (2B): RPC counts aren't too high ...
  ... Passed --   2.2  3   30   12
PASS
ok      raft    38.029s

real    0m38.511s
user    0m1.460s
sys     0m0.901s
$
</pre>

The "ok raft 38.029s" means that Go measured the time taken for the 2B
tests to be 38.029 seconds of real (wall-clock) time. The "user
0m1.460s" means that the code consumed 1.460 seconds of CPU time, or
time spent actually executing instructions (rather than waiting or
sleeping). If your solution uses much more than a minute of real time
for the 2B tests, or much more than 5 seconds of CPU time, you may run
into trouble later on. Look for time spent sleeping or waiting for RPC
timeouts, loops that run without sleeping or waiting for conditions or
channel messages, or large numbers of RPCs sent.

<p>
Be sure you pass the 2A and 2B tests before submitting Part 2B.

</p><p>

</p><h3>Handin procedure for lab 2B</h3>

<div class="important">
<p>
Before submitting, please run the 2A and 2B tests
one final time.
Some bugs may not appear on
every run, so run the tests multiple times.
</p></div>

<p>
Submit your code via the class's submission website, located at
<a href="https://6824.scripts.mit.edu/2018/handin.py/">https://6824.scripts.mit.edu/2018/handin.py/</a>.

</p><p>
You may use your MIT Certificate or request an API key via
email to log in for the first time. Your API key (<tt>XXX</tt>)
is displayed once you are logged in, which can be used to upload
the lab from the console as follows.

</p><pre>$ cd "$GOPATH"
$ echo "XXX" &gt; api.key
$ make lab2b</pre>

<p class="important">
Check the submission website to make sure it sees your submission!

</p><p class="note">
You may submit multiple times. We will use the timestamp of
your <strong>last</strong> submission for the purpose of
calculating late days. Your grade is determined by the score
your solution <strong>reliably</strong> achieves when we run
the tester on our test machines.

</p><h3>Part 2C</h3>

<div class="important">
<p>
Do a <tt>git pull</tt> to get the latest lab software.
</p></div>

<p>
If a Raft-based server reboots it should resume service
where it left off. This requires
that Raft keep persistent state that survives a reboot. The
paper's Figure 2 mentions which state should be persistent,
and <tt>raft.go</tt> contains examples of how to save and
restore persistent state.

</p><p>
A “real” implementation would do this by writing
Raft's persistent state to disk each time it changes, and reading the latest saved
state from
disk when restarting after a reboot. Your implementation won't use
the disk; instead, it will save and restore persistent state
from a <tt>Persister</tt> object (see <tt>persister.go</tt>).
Whoever calls <tt>Raft.Make()</tt> supplies a <tt>Persister</tt>
that initially holds Raft's most recently persisted state (if
any). Raft should initialize its state from that
<tt>Persister</tt>, and should use it to save its persistent
state each time the state changes. Use the <tt>Persister</tt>'s
<tt>ReadRaftState()</tt> and <tt>SaveRaftState()</tt> methods.

</p><p class="todo">
Complete the functions
<tt>persist()</tt>
and
<tt>readPersist()</tt> in <tt>raft.go</tt>
by adding code to save and restore persistent state. You will need to encode
(or "serialize") the state as an array of bytes in order to pass it to
the <tt>Persister</tt>. Use the <tt>labgob</tt> encoder we provide to do this;
see the comments in <tt>persist()</tt> and <tt>readPersist()</tt>.
<tt>labgob</tt> is derived from Go's <tt>gob</tt> encoder; the
only difference is that <tt>labgob</tt> prints error messages if
you try to encode structures with lower-case field names.

</p><p class="todo">
You now
need to determine at what points in the Raft protocol your
servers are required to persist their state, and insert calls
to <tt>persist()</tt> in those places.
There is already a call to <tt>readPersist()</tt>
in <tt>Raft.Make()</tt>.
Once you've done this,
you should pass the remaining tests.  You may want to
first try to pass the "basic persistence" test (<tt>go test
-run 'TestPersist12C'</tt>), and then tackle the remaining ones (<tt>go test -run
    2C</tt>).

</p><p class="note">
In order to avoid running out of memory, Raft must periodically
discard old log entries, but you <strong>do not</strong> have
to worry about this until the next lab.

</p><ul class="hints">
<li>
Many of the 2C tests involve
servers failing and the network losing RPC requests or replies.

</li><li>
In order to pass some of the challenging tests towards the end, such as
those marked "unreliable", you will need to implement the optimization to
allow a follower to back up the leader's nextIndex by more than one entry
at a time. See the description in the <a +href="../papers/raft-extended.pdf">extended Raft paper</a> starting at
the bottom of page 7 and top of page 8 (marked by a gray line).
The paper is vague about the details; you will need to fill in the gaps,
perhaps with the help of the 6.824 Raft lectures.

</li><li>
A reasonable amount of time to consume for the full set of
Lab 2 tests (2A+2B+2C) is 4 minutes of real time and one
minute of CPU time.
</li></ul>

<p>
Your code should pass all the 2C tests (as shown below), as well as
the 2A and 2B tests.

</p><pre>$ go test -run 2C
Test (2C): basic persistence ...
  ... Passed --   3.4  3   60    6
Test (2C): more persistence ...
  ... Passed --  17.0  5  705   16
Test (2C): partitioned leader and one follower crash, leader restarts ...
  ... Passed --   1.4  3   27    4
Test (2C): Figure 8 ...
  ... Passed --  33.2  5  852   53
Test (2C): unreliable agreement ...
  ... Passed --   2.4  5  207  246
Test (2C): Figure 8 (unreliable) ...
  ... Passed --  35.3  5 1838  216
Test (2C): churn ...
  ... Passed --  16.2  5 5138 2260
Test (2C): unreliable churn ...
  ... Passed --  16.2  5 1254  603
PASS
ok      raft    124.999s
</pre>

<h3>Handin procedure for lab 2C</h3>

<div class="important">
<p>
Before submitting, please run <em>all</em> the tests
one final time.
Some bugs may not appear on
every run, so run the tests multiple times.
</p></div>

<p>
Submit your code via the class's submission website, located at
<a href="https://6824.scripts.mit.edu/2018/handin.py/">https://6824.scripts.mit.edu/2018/handin.py/</a>.

</p><p>
You may use your MIT Certificate or request an API key via
email to log in for the first time. Your API key (<tt>XXX</tt>)
is displayed once you are logged in, which can be used to upload
the lab from the console as follows.

</p><pre>$ cd "$GOPATH"
$ echo "XXX" &gt; api.key
$ make lab2c</pre>

<p class="important">
Check the submission website to make sure it sees your submission!

</p><p class="note">
You may submit multiple times. We will use the timestamp of
your <strong>last</strong> submission for the purpose of
calculating late days. Your grade is determined by the score
your solution <strong>reliably</strong> achieves when we run
the tester on our test machines.

</p><hr>

<address>
Please post questions on <a href="http://piazza.com/">Piazza</a>.
</address>


<!--  LocalWords:  transactional RPC snapshotting Paxos Viewstamped -->
<!--  LocalWords:  Bolosky et al else's github src labrpc cd ok RPCs -->
<!--  LocalWords:  TestInitialElection TestReElection rf persister -->
<!--  LocalWords:  applyCh isleader GetState isLeader rpc -->
<!--  LocalWords:  ApplyMsg Persister's SaveRaftState ReadRaftState -->
<!--  LocalWords:  readPersist sendRequestVote RequestVote struct -->
<!--  LocalWords:  RequestVoteArgs RequestVoteReply structs goroutine -->
<!--  LocalWords:  AppendEntries AppendEntry TestBasicAgree -->
<!--  LocalWords:  InstallSnapshot -->
<tbdiv id="playerControlBtn"></tbdiv><tbdiv id="leftFullStackButton"></tbdiv><tbdiv id="rightFullStackButton"></tbdiv><span id="sbmarwbthv5"></span></body></html>