@@ -8,7 +8,14 @@ Title: Storage migration
88 - [ But we have storage\_ mux.ml] ( #but-we-have-storage_muxml )
99 - [ Thought experiments on an alternative design] ( #thought-experiments-on-an-alternative-design )
1010 - [ Design] ( #design )
11- - [ SMAPIv1 Migration] ( #smapiv1-migration )
11+ - [ SMAPIv1 migration] ( #smapiv1-migration )
12+ - [ SMAPIv3 migration] ( #smapiv3-migration )
13+ - [ Error Handling] ( #error-handling )
14+ - [ Preparation (SMAPIv1 and SMAPIv3)] ( #preparation-smapiv1-and-smapiv3 )
15+ - [ Snapshot and mirror failure (SMAPIv1)] ( #snapshot-and-mirror-failure-smapiv1 )
16+ - [ Mirror failure (SMAPIv3)] ( #mirror-failure-smapiv3 )
17+ - [ Copy failure (SMAPIv1)] ( #copy-failure-smapiv1 )
18+ - [ SMAPIv1 Migration implementation detail] ( #smapiv1-migration-implementation-detail )
1219 - [ Receiving SXM] ( #receiving-sxm )
1320 - [ Xapi code] ( #xapi-code )
1421 - [ Storage code] ( #storage-code )
@@ -109,8 +116,100 @@ Note that later on storage_smapi{v1,v3}_migrate.ml will still have the flexibili
109116to call remote SMAPIv2 functions, such as ` Remote.VDI.attach dest_sr vdi ` , and
110117it will be handled just as before.
111118
119+ ## SMAPIv1 migration
112120
113- ## SMAPIv1 Migration
121+ At a high level, mirror establishment for SMAPIv1 works as follows:
122+
123+ 1 . Take a snapshot of a VDI that is attached to VM1. This gives us an immutable
124+ copy of the current state of the VDI, with all the data until the point we took
125+ the snapshot. This is illustrated in the diagram as a VDI and its snapshot connecting
126+ to a shared parent, which stores the shared content for the snapshot and the writable
127+ VDI from which we took the snapshot (snapshot)
128+ 2 . Mirror the writable VDI to the server host: this means that all writes that go to the
129+ client VDI will also be written to the mirrored VDI on the remote host (mirror)
130+ 3 . Copy the immutable snapshot from our local host to the remote (copy)
131+ 4 . Compose the mirror and the snapshot to form a single VDI
132+ 5 . Destroy the snapshot on the local host (cleanup)
133+
134+
135+ More detail to come...
136+
137+ ## SMAPIv3 migration
138+
139+ More detail to come...
140+
141+ ## Error Handling
142+
143+ Storage migration is a long-running process, and is prone to failures in each
144+ step. Hence it is important to specify what errors could be raised at each step
145+ and their significance. This is beneficial both for the user and for triaging.
146+
147+ There are two general cleanup functions in SXM: ` MIRROR.receive_cancel ` and
148+ ` MIRROR.stop ` . The former is for cleaning up whatever has been created by ` MIRROR.receive_start `
149+ on the destination host (such as VDIs for receiving mirrored data). The latter is
150+ a more comprehensive function that attempts to "undo" all the side effects that
151+ were made during the SXM, and also calls ` receive_cancel ` as part of its operations.
152+
153+ Currently, error handling is done by building up a list of cleanup functions in
154+ the ` on_fail ` list ref as the function executes. For example, once ` receive_start `
155+ has completed successfully, ` receive_cancel ` is added to the list of cleanup functions.
156+ Whenever an exception is encountered, whatever has been added to the ` on_fail `
157+ list ref is executed. This is convenient, but it entangles the error
158+ handling logic with the core SXM logic itself, making the code rather hard
159+ to understand and maintain.
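
The cleanup-list pattern described above can be illustrated with a minimal, self-contained sketch. It is not the actual xapi code: `receive_start`, `receive_cancel` and `mirror` here are hypothetical stand-ins that only record their names in a log, and `mirror` is made to fail so that the registered cleanup runs.

```ocaml
(* Hypothetical sketch of the on_fail pattern; not the real xapi code. *)
let log : string list ref = ref []

let record msg = log := msg :: !log

(* Stand-ins for the real operations: they only record that they ran. *)
let receive_start () = record "receive_start"

let receive_cancel () = record "receive_cancel"

let mirror () = failwith "mirror failed"

let start () =
  let on_fail : (unit -> unit) list ref = ref [] in
  try
    receive_start () ;
    (* Register the undo action only once the step has succeeded. *)
    on_fail := receive_cancel :: !on_fail ;
    mirror ()
  with _ ->
    (* Run whatever cleanup has been registered, most recent first,
       ignoring failures of the cleanup functions themselves. *)
    List.iter (fun f -> try f () with _ -> ()) !on_fail ;
    record "failed"
```

Even in this small sketch the bookkeeping for `on_fail` is interleaved with the steps themselves, which is the entanglement the paragraph above refers to.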
160+
161+ The idea to fix this is to introduce explicit "stages" during the SXM and define
162+ explicitly what error handling should be done if it fails at a certain stage. This
163+ helps separate the error handling logic into the ` with ` part of a ` try with ` block,
164+ which is where it is supposed to be. Since we need to accommodate the existing
165+ SMAPIv1 migration (which has more stages than SMAPIv3), the following stages are
166+ introduced: preparation (v1, v3), snapshot (v1), mirror (v1, v3), copy (v1). Note that
167+ each stage also roughly corresponds to a helper function that is called within ` MIRROR.start ` ,
168+ which is the wrapper function that initiates storage migration. Each helper
169+ function also has its own error handling logic as
170+ needed (e.g. see ` Storage_smapiv1_migrate.receive_start ` ) to deal with exceptions
171+ that happen within that helper function.
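
A rough sketch of what stage-based handling could look like is given below. The `stage` type, the `Stage_failed` exception and `run_stage` are assumptions made for illustration and are not taken from the xapi code base; the cleanup chosen for each stage follows the SMAPIv1 behaviour described in this document (the SMAPIv3 mirror case is not yet specified here), and `receive_cancel` and `stop` are stand-ins that only record that they were called.

```ocaml
(* Hypothetical stage type mirroring the stages named in the text. *)
type stage = Preparation | Snapshot | Mirror | Copy

exception Stage_failed of stage

(* Stand-ins for MIRROR.receive_cancel and MIRROR.stop. *)
let cleanups : string list ref = ref []

let receive_cancel () = cleanups := "receive_cancel" :: !cleanups

let stop () = cleanups := "stop" :: !cleanups

(* Stage-level cleanup lives in one place, apart from the core logic. *)
let handle_failure = function
  | Preparation ->
      () (* receive_start handles its own partial failures *)
  | Snapshot | Mirror ->
      receive_cancel ()
  | Copy ->
      stop ()

(* Run [f] and tag any exception with the stage it occurred in. *)
let run_stage stage (f : unit -> unit) =
  try f () with _ -> raise (Stage_failed stage)

let start ~prepare ~snapshot ~mirror ~copy =
  try
    run_stage Preparation prepare ;
    run_stage Snapshot snapshot ;
    run_stage Mirror mirror ;
    run_stage Copy copy ;
    Ok ()
  with Stage_failed s -> handle_failure s ; Error s
```

With this shape, the body of `start` reads as the plain sequence of stages, and the decision about what to undo is confined to `handle_failure`.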
172+
173+ ### Preparation (SMAPIv1 and SMAPIv3)
174+
175+ The preparation stage generally corresponds to what is done in ` receive_start ` , and
176+ this function itself will handle exceptions when there are partial failures within
177+ the function itself, such as an exception after the receiving VDI is created.
178+ It will use the old-style ` on_fail ` function but only with a limited scope.
179+
180+ There is nothing to be done at a higher level (i.e. within ` MIRROR.start ` which
181+ calls ` receive_start ` ) if preparation has failed.
182+
183+ ### Snapshot and mirror failure (SMAPIv1)
184+
185+ For SMAPIv1, the mirror is done in a somewhat cumbersome way. The end goal is to establish
186+ connections between two tapdisk processes on the source and destination hosts.
187+ To achieve this goal, xapi will do two main jobs: 1. create a connection between two
188+ hosts and pass the connection to tapdisk; 2. create a snapshot as a starting point
189+ of the mirroring process.
190+
191+ Therefore the handling of failures at these two stages is similar: clean up what was
192+ done in the preparation stage by calling ` receive_cancel ` , and that is almost it.
193+ Again, we will leave whatever is needed for partial failure handling within those
194+ functions themselves and only clean up at the stage level in ` storage_migrate.ml ` .
195+
196+ Note that ` receive_cancel ` is a multiplexed function for SMAPIv1 and SMAPIv3, which
197+ means different cleanup logic will be executed depending on what type of SR we
198+ are migrating from.
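
As an illustration of what "multiplexed" means here, the sketch below dispatches `receive_cancel` to one of two implementations based on the SR type. The `smapi_version` type, the cut-down `MIRROR` signature and the `V1`/`V3` modules are hypothetical and much simpler than the real interfaces; the implementations only return a descriptive string.

```ocaml
(* Hypothetical, cut-down version of a multiplexed cleanup function. *)
type smapi_version = SMAPIv1 | SMAPIv3

module type MIRROR = sig
  val receive_cancel : dbg:string -> string
end

(* Stand-in implementations: they only describe what they would do. *)
module V1 : MIRROR = struct
  let receive_cancel ~dbg = dbg ^ ": SMAPIv1 cleanup"
end

module V3 : MIRROR = struct
  let receive_cancel ~dbg = dbg ^ ": SMAPIv3 cleanup"
end

(* Pick the implementation that matches the SR type. *)
let choose = function
  | SMAPIv1 ->
      (module V1 : MIRROR)
  | SMAPIv3 ->
      (module V3 : MIRROR)

(* Callers use one entry point and need not know which backend runs. *)
let receive_cancel ~dbg version =
  let module M = (val choose version : MIRROR) in
  M.receive_cancel ~dbg
```

The stage-level code can then call a single `receive_cancel` regardless of which SMAPI version backs the SR.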
199+
200+ ### Mirror failure (SMAPIv3)
201+
202+ To be filled...
203+
204+ ### Copy failure (SMAPIv1)
205+
206+ The final step of storage migration for SMAPIv1 is to copy the snapshot from the
207+ source to the destination. At this stage, most of the side-effecting work has been
208+ done, so we do need to call ` MIRROR.stop ` to clean things up if we experience a
209+ failure during copying.
210+
211+
212+ ## SMAPIv1 Migration implementation detail
114213
115214``` mermaid
116215sequenceDiagram
@@ -1873,3 +1972,4 @@ let pre_deactivate_hook ~dbg ~dp ~sr ~vdi =
18731972 s.failed <- true
18741973 )
18751974```
1975+