# Save and Sync

This guide explains how the save/sync functionality works and how it interacts
with other systems.


## Usage and functionality

Users can call `save` to upload changes from Python objects into a Gel
database. `sync` does the same, and additionally refetches data from the
database.

These functions make it easier to keep application state consistent with its
database.

A typical use of these functions might look like this:
```py
foo = default.Foo(n=1)
bar = default.Bar(foo=foo)
client.save(foo, bar)  # inserts both objects and sets their `id`s

# after foo is changed somewhere else
client.sync(foo)  # saves any local changes, then refetches foo
```

New objects can be inserted using either `save` or `sync`, and will have their
`id` set. When `sync` is called, all of their fields are refetched as well.

See: `_save.push_refetch_new` and `_save.SaveExecutor._commit`

For existing objects, only the fields that were previously set or fetched are
refetched. If a single database object is represented by multiple Python
objects, each of them is updated, but only with its own subset of fields.

See: `_save.push_refetch_existing`
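The field-selection rules above can be sketched with plain Python data. `PyObj`, `fields_to_refetch` and `apply_refetch` are hypothetical stand-ins, not part of the library, and the bookkeeping of "set or fetched" fields is deliberately reduced to a single dict.

```python
# Hypothetical, simplified model of which fields get refetched; not the library's API.
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PyObj:
    """Stand-in for a Python object that mirrors a database object."""
    id: Optional[str]
    # Fields that were previously set or fetched on this Python object.
    fields: dict[str, Any] = field(default_factory=dict)


def fields_to_refetch(obj: PyObj, all_fields: set[str]) -> set[str]:
    if obj.id is None:
        return set(all_fields)   # new object: refetch everything
    return set(obj.fields)       # existing object: only what it already holds


def apply_refetch(instances: list[PyObj], row: dict[str, Any]) -> None:
    # Every Python object for the same database object is updated,
    # but only with its own subset of fields.
    for inst in instances:
        for name in inst.fields:
            if name in row:
                inst.fields[name] = row[name]
```

For example, two objects `PyObj("x", {"n": 1})` and `PyObj("x", {"name": "old"})` that refer to the same row each receive only the field they already held.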


### Reachable objects

Both `save` and `sync` apply to the objects passed directly as arguments, as
well as to any objects reachable from them via links.

For example, here both objects are saved, even though only `bar` is passed:
```py
foo = default.Foo(n=1)
bar = default.Bar(foo=foo)
client.save(bar)  # also inserts foo, since bar links to it
```

See: `_save.make_plan` for where `existing_objects` is updated.

For `sync`, every reachable object is refetched. This is the same set of objects
that is scanned for changes to save, including objects that had no changes of
their own. Even without direct changes, an object might be affected by changes
to other objects (e.g. through backlink computeds).
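Reachability itself is an ordinary graph traversal. The following is a minimal sketch, assuming a hypothetical `Obj` class whose `links` attribute holds its link targets. It visits each object once, even when links form cycles. With `sync`, the same resulting list would be both scanned for changes and refetched.

```python
# Hypothetical, simplified model of reachability over links; not the library's API.
from collections import deque


class Obj:
    def __init__(self, name, links=None):
        self.name = name
        self.links = links or []  # linked target objects


def reachable(*roots):
    """Collect the roots plus everything reachable via links, each object once."""
    seen = {}  # keyed by Python identity; insertion order = visit order
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        queue.extend(obj.links)
    return list(seen.values())
```

With `foo = Obj("foo")` and `bar = Obj("bar", [foo])`, calling `reachable(bar)` returns both objects, mirroring the `client.save(bar)` example.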


### Refetching links

When links are refetched, the source object is updated with a target object
chosen according to the following priority:
- The existing link target
- A reachable object, either an existing one or a new one that was refetched
  - Chosen arbitrarily when multiple are available
- A new object with only its `id` set

See: `_descriptors.reconcile_link` and `_descriptors.reconcile_proxy_link`.
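The priority order can be written as a small function. `Model` and `pick_link_target` are hypothetical. The sketch assumes that "the existing link target" is kept only when it already has the refetched `id`. The real logic lives in the functions named above.

```python
# Hypothetical sketch of the link-target priority; not `_descriptors.reconcile_link`.
from typing import Optional


class Model:
    def __init__(self, id: str):
        self.id = id


def pick_link_target(
    target_id: str,
    current: Optional[Model],
    reachable_by_id: dict[str, list[Model]],
) -> Model:
    # 1. Keep the existing link target if it is the same database object.
    if current is not None and current.id == target_id:
        return current
    # 2. Otherwise reuse a reachable object (existing, or new and refetched).
    #    Any candidate will do when several are available.
    candidates = reachable_by_id.get(target_id)
    if candidates:
        return candidates[0]
    # 3. Fall back to a fresh object carrying only its id.
    return Model(target_id)
```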

Multi links are not refetched in full, to avoid performance issues. Instead,
the existing data is reconciled with the delta (the IDs of new and updated
objects) using a filter.

The refetch filter includes:

- All existing link target IDs from the Python field
- All IDs from the delta

This captures both additions to and removals from the multi link.

**Note**: For partially fetched multi links, the original filtering criteria
(filter, offset, limit) may no longer apply after reconciliation.

See: `_save._compile_refetch`, where `ptr.cardinality.is_multi()` is checked.
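The effect of the filter can be simulated with sets. In this minimal sketch, `db_link_ids` stands in for what the link actually holds in the database. Both functions are hypothetical illustrations, not the query the library compiles.

```python
# Hypothetical set-based simulation of multi link reconciliation.
def refetch_filter(existing_ids: set[str], delta_ids: set[str]) -> set[str]:
    # Existing Python-side targets plus new/updated object IDs.
    return existing_ids | delta_ids


def reconcile_multi_link(
    existing_ids: set[str], delta_ids: set[str], db_link_ids: set[str]
) -> set[str]:
    """Return the Python-side target IDs after the refetch."""
    allowed = refetch_filter(existing_ids, delta_ids)
    # The refetch only returns link targets that pass the filter.
    return {i for i in db_link_ids if i in allowed}
```

With existing `{"a", "b"}`, delta `{"c"}` and database contents `{"a", "c", "z"}`, the result is `{"a", "c"}`. `b` is dropped as a removal and `c` is picked up as an addition. `z` stays absent because it was never fetched and is not in the delta, which is the partial-fetch caveat from the note above.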

### Link properties

Link properties follow the Python object model rather than the Gel model. As a
result, reassigning a link to the same object overwrites all of its link
properties.

The following example illustrates this:
```py
foo = default.Foo()
bar = default.Bar(foo=default.Bar.foo.link(foo, a=1, b=2, c=3))
client.sync(bar)

# updates a, keeps b and c
bar.foo.__linkprops__.a = 9

# resets a, b, and c, since foo is assigned without link properties
bar.foo = foo
```


## Implementation

When either `save` or `sync` is called on a client, it delegates to an
underlying `_save_impl`, which does the actual work.

The general order of operations is:
- Make a save plan
- Compile and execute batch queries
- If syncing, compile and execute refetch queries
- Commit the changes
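The same order can be sketched as a runnable skeleton. Every step here is a hypothetical stub that only records its name. `save_impl`, `plan_step`, `run_batch`, `run_refetch` and `commit_step` are illustrative and are not the library's internals.

```python
# Hypothetical skeleton of the save/sync order of operations.
def save_impl(objs, *, refetch: bool, log: list[str]) -> None:
    plan = plan_step(objs, log)
    for batch in plan:
        run_batch(batch, log)        # insert/update queries
    if refetch:                      # only for `sync`
        run_refetch(objs, log)
    commit_step(objs, log)           # user objects are modified only here


def plan_step(objs, log):
    log.append("plan")
    return [["insert"], ["update"]]  # stand-in for a list of batches


def run_batch(batch, log):
    log.append(f"batch:{batch[0]}")


def run_refetch(objs, log):
    log.append("refetch")


def commit_step(objs, log):
    log.append("commit")
```

Calling it with `refetch=False` records `plan`, both batches, and `commit`. With `refetch=True`, `refetch` appears just before `commit`.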


### Save plan

Unlike a call to `query` or `execute`, a call to `save` or `sync` may be split
into multiple sub-queries. The save plan is the general outline of how these
are arranged.

In `make_save_executor_constructor`, the save plan is created and then stored
in a `SaveExecutor`, which tracks the various objects involved throughout the
save/sync process.

Reachable objects are traversed in graph order, and each one is checked for
whether it is new and which of its properties and links have changed. The
resulting `ModelChange`s are then bundled into a list of `QueryBatch`s.

The changed fields are determined using `__gel_get_changed_fields__`.

A `ModelChange` represents the changes to a single object. A `QueryBatch`
represents changes which can be run independently of each other. Batches are
grouped into insert batches and update batches.

See: `_save.make_plan`
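One way to picture "changes which can be run independently" is a layered topological sort. The following is a minimal sketch, assuming that a new object must be inserted after any new object it links to. `insert_batches` is hypothetical and is not `_save.make_plan`. The sketch simply rejects cycles between new objects rather than handling them.

```python
# Hypothetical layering of new objects into independent insert batches.
def insert_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """deps maps each new object to the new objects it links to.

    Every linked object must itself be a key of deps.
    """
    remaining = {k: set(v) for k, v in deps.items()}
    batches = []
    while remaining:
        # Objects whose link targets are all inserted can run together.
        ready = sorted(k for k, v in remaining.items() if not v)
        if not ready:
            raise ValueError("cycle between new objects")
        batches.append(ready)
        for k in ready:
            del remaining[k]
        for v in remaining.values():
            v.difference_update(ready)
    return batches
```

For `{"bar": {"foo"}, "foo": set(), "baz": set()}` this yields `[["baz", "foo"], ["bar"]]`. `foo` and `baz` do not depend on each other, while `bar` must wait for `foo`.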

When syncing, `_save._add_refetch_shape` tracks the fields to refetch for each
object.


### Batch queries

The `_save.SaveExecutor.__iter__` function iterates over the insert and update
batches, compiles them into queries, and groups together changes that compile
to the same query.

The `_save.SaveExecutor._compile_batch` function does the actual compiling by
generating EdgeQL for each property and link change, and assembling them into a
shape. The shape is then applied to an `insert` or `update`, and the resulting
statement is wrapped in a `select`, which differs between `save` and `sync`:
- for `save`: `select (...).id`
- for `sync`: `select (...) { * }`
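The assembly and wrapping steps can be sketched with string building. Only the insert case is shown. `compile_insert` and the `<int64>$n` expression are illustrative assumptions: the real compiler reads its values from the `__data` argument described below.

```python
# Hypothetical sketch of shape assembly and select wrapping; not `_compile_batch`.
def compile_insert(type_name: str, changes: dict[str, str], *, sync: bool) -> str:
    # One `name := expr` element per changed property or link.
    shape = ", ".join(f"{name} := {expr}" for name, expr in changes.items())
    stmt = f"insert {type_name} {{ {shape} }}"
    if sync:
        # sync: return the full objects so new instances can be kept
        return f"select ({stmt}) {{ * }}"
    # save: only the id of each inserted object is needed
    return f"select ({stmt}).id"
```

For example, `compile_insert("default::Foo", {"n": "<int64>$n"}, sync=False)` produces `select (insert default::Foo { n := <int64>$n }).id`.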

For `save`, only the `id` of a new object is updated. For `sync`, however, the
`GelModel` instances of the new objects are stored in
`_save.SaveExecutor.new_objects`. These are used when updating refetched links.

Query arguments are assembled into a `__data` argument (or `__all_data` for
multi), which is a tuple of any new data for that object, including the object
id for updates.

Since tuple elements can't be optional, `__data` represents optional values as
arrays of length 0 or 1. In `_save.SaveExecutor._compile_change`, the local
`arg_cast` function helps convert Python data and types into the appropriate
EdgeQL.
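The length-0-or-1 encoding can be shown on the Python side. `encode_optional`, `decode_optional` and `make_data` are hypothetical helpers, not `arg_cast`. The tuple layout in `make_data`, with the id first and then one array per field, is likewise an assumption for illustration.

```python
# Hypothetical helpers illustrating the length-0/1 array encoding.
from typing import Any, Optional


def encode_optional(value: Optional[Any]) -> list[Any]:
    # "No value" becomes an empty array; anything else a 1-element array.
    return [] if value is None else [value]


def decode_optional(arr: list[Any]) -> Optional[Any]:
    return arr[0] if arr else None


def make_data(obj_id: str, name: Optional[str], n: Optional[int]) -> tuple:
    # Assumed layout: object id first, then one array per optional field.
    return (obj_id, encode_optional(name), encode_optional(n))
```

Falsy values such as `0` still encode as `[0]`, so they stay distinct from a missing value. On the EdgeQL side, such an array can be turned back into an empty or single-element set, for instance with `array_unpack`. This document does not say which construct the library actually uses.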

The compiled queries are grouped by their query string into `QueryBatch`s.
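Grouping by query string amounts to a dictionary keyed on the query text. The following is a minimal sketch, assuming each compiled change is a hypothetical `(query, data)` pair. How each group is then executed is not modelled here.

```python
# Hypothetical grouping of compiled changes by query text.
from collections import defaultdict


def group_by_query(compiled: list[tuple[str, tuple]]) -> dict[str, list[tuple]]:
    batches: dict[str, list[tuple]] = defaultdict(list)
    for query, data in compiled:
        # Changes that compile to the same query text share a group.
        batches[query].append(data)
    return dict(batches)
```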

Finally, the results of the executed queries are stored using
`_save.QueryBatch.record_inserted_data`.


### Refetch queries

The `_save.SaveExecutor.get_refetch_queries` function compiles the refetch
queries and groups them by object type.

It works similarly to `_compile_batch`, in that it generates EdgeQL for each
property and link and assembles them into a shape.

A refetch query has 3 parameters:
- `__new`: ids of all new objects
- `__existing`: ids of all existing objects
- `__spec`: an array of tuples of:
  - object id
  - an array of tuples of:
    - link index
    - ids of objects previously in that link

The `__spec` parameter is used to filter multi links, as discussed above.

The refetched data is a sequence of `GelModel` instances, which are stored in
`_save.SaveExecutor.refetched_data` using
`_save.QueryRefetch.record_refetched_data`.

### Committing changes

Up to this point, no changes have been applied to user objects. Changes are
made only once all the refetch queries have been executed and their results
recorded.

In `_save.SaveExecutor._commit`:
- Ids are applied to new objects
- Refetch data is applied to existing objects
- Refetch data is applied to new objects

Existing objects will have two (or more) `GelModel` instances:
- the user instance(s)
- the refetch instance

The refetch data can simply be applied to the user instance(s) using
`_save.QueryRefetch._apply_refetched_data_shape`.

In contrast, new objects have three `GelModel` instances:
- the user instance
- the batch instance (stored in `_save.SaveExecutor.new_objects`)
- the refetch instance

Since the batch instance is used to update refetched links, both the batch and
user instances need to be updated. This is done by updating the batch instance
in `_apply_refetched_data_shape`, then later updating the user instance.

After all changes are made, `_save.SaveExecutor._commit_recursive` "locks in"
the changes to the models by resetting the changed-field flags, resetting
`_added_items` and `_removed_items` in tracked lists, etc.

Finally, a post-commit check ensures that no unsaved changes remain, i.e. that
re-running `save` would essentially be a no-op.