Commit 436af50

Add readme for _save. (#914)
1 parent 5c5d4e3 commit 436af50

2 files changed

Lines changed: 238 additions & 15 deletions

File tree

gel/_internal/_save.md

Lines changed: 228 additions & 0 deletions
@@ -0,0 +1,228 @@
1+
# Save and Sync
2+
3+
This guide helps explain some of the details of how the save/sync functionality
4+
works and interacts with other systems.
5+
6+
7+
## Usage and functionality
8+
9+
Users can use `save` to upload changes from python into a gel database. They can
10+
use `sync` to additionally refetch changes from the database.
11+
12+
These functions make it easier to keep application state consistent with its
13+
database.
14+
15+
A typical use of these functions might look like this:
16+
```py
foo = default.Foo(n=1)
bar = default.Bar(foo=foo)
client.save(foo, bar)

# after foo is changed somewhere else
client.sync(foo)
```
24+
25+
New objects can be inserted using either `save` or `sync` and will have their `id`
26+
set. When `sync` is called, all fields are refetched as well.
27+
28+
See: `_save.push_refetch_new` and `_save.SaveExecutor._commit`
29+
30+
Existing objects will only refetch fields which were previously set or fetched.
31+
If a single database object is represented by multiple python objects, they are
32+
all updated, but with the appropriate subset of fields.
33+
34+
See: `_save.push_refetch_existing`
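
The subset rule for existing objects can be pictured with plain dictionaries. This is a minimal sketch and not the gel implementation: `apply_refetch` and the dict-based instances are hypothetical stand-ins for `GelModel` objects, chosen only to show that each Python object receives just the fields it already had.

```py
# Hypothetical stand-in: each Python instance is a dict holding only the
# fields it previously set or fetched.
def apply_refetch(instances, refetched):
    """Update each instance with refetched values for fields it already has."""
    for inst in instances:
        for name in inst:
            if name in refetched:
                inst[name] = refetched[name]


# Two Python views of the same database object, with different field subsets.
view_a = {"id": 1, "n": 1}
view_b = {"id": 1, "label": "old"}

apply_refetch([view_a, view_b], {"id": 1, "n": 5, "label": "new"})
print(view_a)  # {'id': 1, 'n': 5}
print(view_b)  # {'id': 1, 'label': 'new'}
```

Neither view gains a field it did not have before, even though the refetched data contains both.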
35+
36+
37+
### Reachable objects
38+
39+
Both `save` and `sync` will apply to objects directly passed as arguments, as
40+
well as any objects which can be reached via links.
41+
42+
For example, here both objects are saved:
43+
```py
foo = default.Foo(n=1)
bar = default.Bar(foo=foo)
client.save(bar)
```
48+
49+
See: `_save.make_plan` for where `existing_objects` is updated.
50+
51+
For `sync`, all reachable objects are refetched, even though these are the same
52+
objects that get scanned for changes that need to be saved in the first place.
53+
The reasoning is that even if there were no direct changes, they might be
54+
affected by changes in other objects (e.g. because of backlink computeds).
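
Reachability through links amounts to a graph walk from the objects passed as arguments. The sketch below is an illustration under assumptions: `Node` and `reachable` are hypothetical names, and the real traversal in `_save.make_plan` works on `GelModel` instances and may visit objects in a different order.

```py
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, like distinct Python objects
class Node:
    name: str
    links: list["Node"] = field(default_factory=list)


def reachable(roots):
    """Return every object reachable from roots via links, each once."""
    seen = {}
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        stack.extend(obj.links)
    return list(seen.values())


foo = Node("foo")
bar = Node("bar", links=[foo])
foo.links.append(bar)  # a cycle is fine: each object is visited once

print(sorted(n.name for n in reachable([bar])))  # ['bar', 'foo']
```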
55+
56+
57+
### Refetching links
58+
59+
When links are refetched, the source object will be updated with a target object
60+
according to the following priority:
61+
- The existing link target
62+
- A reachable object, either existing or a refetched new
63+
  - Chosen arbitrarily when multiple are available
64+
- A new object with only `id`
65+
66+
See: `_descriptors.reconcile_link` and `_descriptors.reconcile_proxy_link`.
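
The priority order above could be sketched as a small function. This is a hedged illustration only: `pick_target` and the dict-based objects are hypothetical, and the sketch assumes that the existing target is kept only when its `id` matches the refetched one, which the text does not state explicitly.

```py
def pick_target(target_id, existing_target, reachable_by_id):
    """Choose the Python object to use as a refetched link target."""
    # 1. Keep the existing link target (assumption: only if the id matches).
    if existing_target is not None and existing_target["id"] == target_id:
        return existing_target
    # 2. Otherwise reuse a reachable object; if several Python objects
    #    share this id, which one is picked is arbitrary (here: the first).
    candidates = reachable_by_id.get(target_id, [])
    if candidates:
        return candidates[0]
    # 3. Fall back to a new object that carries only the id.
    return {"id": target_id}


existing = {"id": 1, "n": 1}
other = {"id": 2, "n": 7}

print(pick_target(1, existing, {}) is existing)        # True
print(pick_target(2, existing, {2: [other]}) is other)  # True
print(pick_target(3, existing, {}))                     # {'id': 3}
```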
67+
68+
Multi links aren't refetched entirely to avoid performance issues. Instead,
69+
existing data is reconciled with the delta (new and updated object IDs) using a
70+
filter.
71+
72+
The refetch filter includes:
73+
74+
- All existing link target IDs from the Python field
75+
- All IDs from the delta
76+
77+
This captures both additions and removals to the multi-link.
78+
79+
**Note**: For partially-fetched multi-links, original filtering criteria
80+
(filter, offset, limit) may no longer apply after reconciliation.
81+
82+
See: `_save._compile_refetch` where `ptr.cardinality.is_multi()`
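
To see why this filter captures both additions and removals, consider the sets involved. The following is a minimal sketch using integers in place of object IDs; `refetch_filter_ids` and `reconcile` are hypothetical helpers, not functions in `_save`.

```py
def refetch_filter_ids(existing_target_ids, delta_ids):
    """IDs the refetch query is restricted to for one multi link."""
    return set(existing_target_ids) | set(delta_ids)


def reconcile(existing_target_ids, delta_ids, ids_in_database):
    """Return link target IDs after refetching through the filter."""
    wanted = refetch_filter_ids(existing_target_ids, delta_ids)
    # The database only returns targets that pass the filter, so targets
    # that were never fetched and are not in the delta stay unknown.
    return wanted & set(ids_in_database)


# Python knows about 1 and 2; 3 was added by this save; meanwhile the
# database dropped 2 and also holds 99, which Python never fetched.
print(sorted(reconcile({1, 2}, {3}, {1, 3, 99})))  # [1, 3]
```

Target 2 disappears because it is in the filter but no longer in the link, target 3 appears because it is in the delta, and target 99 is never pulled in.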
83+
84+
### Link properties
85+
86+
Link properties follow the python object model, instead of the gel model.
87+
As a result, overwriting a link with the same object will overwrite all its
88+
link properties.
89+
90+
This is illustrated in the following examples:
91+
```py
foo = default.Foo()
bar = default.Bar(foo=default.Bar.foo.link(foo, a=1, b=2, c=3))
client.sync(bar)

# updates a, keeps b and c
bar.foo.__linkprops__.a = 9

# resets a, b, and c
bar.foo = foo
```
102+
103+
104+
## Implementation
105+
106+
When either `save` or `sync` is called on a client, it calls an underlying
107+
`_save_impl` which does the actual work.
108+
109+
The general order of operations is:
110+
- Make a save plan
111+
- Compile and execute batch queries
112+
- If sync, compile and execute refetch queries
113+
- Commit the changes
114+
115+
116+
### Save plan
117+
118+
Unlike a call to `query` or `execute`, a call to `save` or `sync` may be split
119+
into multiple sub-queries. The save plan forms the general outline of how these
120+
are arranged.
121+
122+
`make_save_executor_constructor` creates a save plan and passes its parts to a
123+
`SaveExecutor`, which tracks the different objects throughout the
124+
save/sync process.
125+
126+
Reachable objects are traversed in graph order and checked to see whether they
127+
are new and which properties and links were changed. The planner then creates
128+
`ModelChange`s, bundled into a list of `QueryBatch`s.
129+
130+
Which fields to change is determined using `__gel_get_changed_fields__`.
131+
132+
A `ModelChange` represents changes to a single object. A `QueryBatch` represents
133+
changes to the model which can be run independently of each other. Batches are
134+
grouped into insert and update batches.
135+
136+
See: `_save.make_plan`
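
One way to picture batches of independent changes is to layer new objects so that each batch only depends on objects inserted by earlier batches. This sketch is an assumption about the general idea and not a description of what `make_plan` actually does; `layer_inserts` is a hypothetical name, and objects are represented by plain strings.

```py
def layer_inserts(required_links):
    """Group new objects into batches; each batch only depends on earlier ones.

    required_links maps an object name to the names it must link to on
    insert. Every name used as a dependency must also appear as a key.
    """
    remaining = dict(required_links)
    done = set()
    batches = []
    while remaining:
        ready = sorted(
            name for name, deps in remaining.items() if set(deps) <= done
        )
        if not ready:
            raise ValueError("cycle among required links")
        batches.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return batches


print(layer_inserts({"foo": [], "bar": ["foo"], "baz": ["foo"]}))
# [['foo'], ['bar', 'baz']]
```

Changes inside one batch do not refer to each other, so they can be sent together.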
137+
138+
When syncing, `_save._add_refetch_shape` tracks the fields to refetch for each
139+
object.
140+
141+
142+
### Batch queries
143+
144+
The `_save.SaveExecutor.__iter__` function iterates over the insert and update
145+
batches, compiles them into queries, and groups them by similar queries.
146+
147+
The `_save.SaveExecutor._compile_batch` does the actual compiling by generating
148+
edgeql for each property and link change, and assembling them into a shape.
149+
The shape is then applied to an insert or update, and the resulting statement
150+
is wrapped in a `select` which differs between `save` and `sync`:
151+
- for `save`: `select (...).id`
152+
- for `sync`: `select (...) { * }`
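
The two wrappers can be sketched as simple string building. `wrap_statement` is a hypothetical helper for illustration, and the insert statement in the usage line is an invented example rather than output from `_compile_batch`.

```py
def wrap_statement(stmt: str, *, refetch: bool) -> str:
    """Wrap an insert/update statement the way the text describes."""
    if refetch:
        return f"select ({stmt}) {{ * }}"  # sync: fetch all fields
    return f"select ({stmt}).id"  # save: only the id is needed


print(wrap_statement("insert default::Foo { n := 1 }", refetch=False))
# select (insert default::Foo { n := 1 }).id
print(wrap_statement("insert default::Foo { n := 1 }", refetch=True))
# select (insert default::Foo { n := 1 }) { * }
```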
153+
154+
For `save`, only the `id` of a new object is updated. But for `sync`,
155+
`GelModel` instances of the new objects get stored in
156+
`_save.SaveExecutor.new_objects`. These are used when updating refetched links.
157+
158+
Query arguments are assembled into a `__data` (or `__all_data` for multi)
159+
argument which is a tuple of any new data for that object, including the object id
160+
for updates.
161+
162+
Since tuples don't allow optional arguments, `__data` uses arrays of length 0
163+
or 1. In `_save.SaveExecutor._compile_change`, the local `arg_cast` function
164+
helps convert python data and types into the appropriate edgeql.
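
The length 0 or 1 array trick can be shown in isolation. This is a minimal sketch of the encoding idea only; `to_optional_slot` and `from_optional_slot` are hypothetical names, and the real `arg_cast` additionally produces edgeql casts, which are not modelled here.

```py
def to_optional_slot(value):
    """Encode an optional value as an array of length 0 or 1."""
    return [] if value is None else [value]


def from_optional_slot(slot):
    """Decode the array form back into a value or None."""
    return slot[0] if slot else None


# A hypothetical __data tuple for one object with fields (n, label),
# where label is unset.
data = (to_optional_slot(1), to_optional_slot(None))
print(data)  # ([1], [])
```

Every position in the tuple is always present, and an empty array stands for a missing value.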
165+
166+
The compiled queries are grouped by their query string into `QueryBatch`s.
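
Grouping by query string means that changes which compile to the same text share one query and differ only in arguments. A rough sketch, assuming compiled changes are simple `(query, args)` pairs; `group_by_query` and the query strings are made up for illustration and do not reflect the real `__data` argument layout.

```py
from collections import defaultdict


def group_by_query(compiled):
    """Group (query_text, args) pairs so each distinct query runs once."""
    groups = defaultdict(list)
    for query, args in compiled:
        groups[query].append(args)
    return dict(groups)


compiled = [
    ("insert Foo { n := <int64>$n }", {"n": 1}),
    ("insert Foo { n := <int64>$n }", {"n": 2}),
    ("insert Bar { s := <str>$s }", {"s": "x"}),
]
for query, arg_list in group_by_query(compiled).items():
    print(len(arg_list), query)
```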
167+
168+
Finally, the results of executed queries are stored using
169+
`_save.QueryBatch.record_inserted_data`.
170+
171+
172+
### Refetch queries
173+
174+
The `_save.SaveExecutor.get_refetch_queries` function compiles the refetch
175+
queries and groups them by object type.
176+
177+
It works similarly to `_compile_batch` in that it generates edgeql for each
178+
property and link and assembles them into a shape.
179+
180+
A refetch query has 3 parameters:
181+
- `__new`: ids of all new objects
182+
- `__existing`: ids of all existing objects
183+
- `__spec`: an array of tuples of:
184+
  - object id
185+
  - an array of tuples of:
186+
    - link indexes
187+
    - ids of objects previously in that link
188+
189+
The `__spec` parameter is used to filter multi links as discussed above.
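
The nested shape of `__spec` is easier to read as a value. The sketch below builds such a structure from a plain mapping; `build_spec` and its input format are assumptions made for illustration, and only the resulting nesting follows the description above.

```py
import uuid


def build_spec(objects):
    """Build a __spec-like value from {obj_id: {link_index: [target_ids]}}."""
    return [
        (obj_id, [(idx, list(ids)) for idx, ids in sorted(links.items())])
        for obj_id, links in objects.items()
    ]


a, b, c = (uuid.uuid4() for _ in range(3))

# Object `a` has two multi links: link 0 previously held b and c,
# link 1 was empty.
spec = build_spec({a: {0: [b, c], 1: []}})
print(len(spec), len(spec[0][1]))  # 1 2
```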
190+
191+
The refetched data is a sequence of `GelModel` instances which are stored in
192+
`_save.SaveExecutor.refetched_data` using
193+
`_save.QueryRefetch.record_refetched_data`.
194+
195+
### Committing changes
196+
197+
Up until this point, no changes have been applied to user objects.
198+
Only once all the refetches are executed and their results recorded are any
199+
changes made.
200+
201+
In `_save.SaveExecutor._commit`:
202+
- Ids are applied to new objects
203+
- Refetch data is applied to existing objects
204+
- Refetch data is applied to new objects
205+
206+
Existing objects will have two (or more) `GelModel` instances:
207+
- the user instance(s)
208+
- the refetch instance
209+
210+
The refetch data can be simply applied to the user instance(s) using
211+
`_save.QueryRefetch._apply_refetched_data_shape`.
212+
213+
In contrast, new objects have three `GelModel` instances:
214+
- the user instance
215+
- the batch instance (stored in `_save.SaveExecutor.new_objects`)
216+
- the refetch instance
217+
218+
Since the batch instance is used to update refetched links, both the
219+
batch and user instances need to be updated. This is done by updating
220+
the batch instance in `_apply_refetched_data_shape`, then later updating
221+
the user instance.
222+
223+
After all changes are made, `_save.SaveExecutor._commit_recursive` "locks in"
224+
changes to the models by resetting the changed fields flags, resetting
225+
`_added_items` and `_removed_items` in tracked lists, etc.
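
What "locking in" means for a tracked list can be sketched with a small class. `TrackedList` here is a hypothetical stand-in: it borrows the `_added_items` and `_removed_items` names from the text, but its methods and behaviour are assumptions and not the gel tracked list API.

```py
class TrackedList:
    """Hypothetical list that remembers unsaved additions and removals."""

    def __init__(self, items=()):
        self.items = list(items)
        self._added_items = []
        self._removed_items = []

    def append(self, item):
        self.items.append(item)
        self._added_items.append(item)

    def remove(self, item):
        self.items.remove(item)
        self._removed_items.append(item)

    def has_changes(self):
        return bool(self._added_items or self._removed_items)

    def commit(self):
        # "Lock in": the current contents become the saved baseline.
        self._added_items.clear()
        self._removed_items.clear()


links = TrackedList(["a"])
links.append("b")
links.remove("a")
print(links.has_changes())  # True
links.commit()
print(links.items, links.has_changes())  # ['b'] False
```

After the commit, a second save would find nothing to send, which is the property the post-commit check verifies.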
226+
227+
Finally, there is a post-commit check step which ensures that no unsaved changes
228+
remain and that re-running save would essentially be a no-op.

gel/_internal/_save.py

Lines changed: 10 additions & 15 deletions
@@ -12,7 +12,6 @@
     TYPE_CHECKING,
     Any,
     Literal,
-    NamedTuple,
     TypeGuard,
     TypeVar,
     Generic,
@@ -243,12 +242,13 @@ def __post_init__(self) -> None:
 ChangeBatch = TypeAliasType("ChangeBatch", list[ModelChange])
 
 
-class SavePlan(NamedTuple):
+@dataclasses.dataclass(frozen=True)
+class SavePlan:
     # Lists of lists of queries to create new objects.
     # Every list of query is safe to executute in a "batch" --
     # basically send them all at once. Follow up lists of queries
     # will use objects inserted by the previous batches.
-    insert_batches: list[ChangeBatch]
+    create_batches: list[ChangeBatch]
 
     # Optional links of newly inserted objects and changes to
     # links between existing objects.
@@ -1161,22 +1161,17 @@ def make_save_executor_constructor(
     warn_on_large_sync_set: bool = False,
     save_postcheck: bool = False,
 ) -> Callable[[], SaveExecutor]:
-    (
-        create_batches,
-        updates,
-        refetch_batch,
-        existing_objects,
-    ) = make_plan(
+    plan = make_plan(
         objs,
         refetch=refetch,
         warn_on_large_sync_set=warn_on_large_sync_set,
     )
     return lambda: SaveExecutor(
         objs=objs,
-        create_batches=create_batches,
-        updates=updates,
-        refetch_batch=refetch_batch,
-        existing_objects=existing_objects,
+        create_batches=plan.create_batches,
+        updates=plan.update_batch,
+        refetch_batch=plan.refetch_batch,
+        existing_objects=plan.existing_objects,
         refetch=refetch,
         save_postcheck=save_postcheck,
         warn_on_large_sync_set=warn_on_large_sync_set,
@@ -1891,12 +1886,12 @@ def _check_recursive(obj: GelModel, path: pathlib.Path) -> None:
 
         # Final check: make sure that the save plan is empty
         # in case we've missed something in `_check_recursive()`.
-        create_batches, updates, _, _ = make_plan(
+        plan = make_plan(
             self.objs,
             refetch=self.refetch,
             warn_on_large_sync_set=False,
         )
-        if create_batches or updates:
+        if plan.create_batches or plan.update_batch:
             raise ValueError("non-empty save plan after save()")
 
     def _get_id(self, obj: GelModel) -> uuid.UUID:
