Commit b3e7358

corrections for using enable_python_native_blobs in README.
1 parent 90ed2b0 commit b3e7358

File tree

2 files changed

+36
-27
lines changed

README.md

Lines changed: 35 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -24,15 +24,29 @@ pip3 install --upgrade datajoint
2424
```
2525
## Python Native Blobs
2626

27-
For the v0.12 release, the variable `enable_python_native_blobs` can be
28-
safely enabled for improved blob support of python datatypes if the following
29-
are true:
27+
DataJoint 0.12 adds full support for all native python data types in blobs: tuples, lists, sets, dicts, strings, bytes, `None`, and all their recursive combinations.
28+
The new blobs are a superset of the old functionality and are fully backward compatible.
29+
In previous versions, only MATLAB-style numerical arrays were fully supported.
30+
Some Python datatypes such as dicts were coerced into numpy recarrays and then fetched as such.
3031

31-
* This is a new DataJoint installation / pipeline(s)
32-
* You have not used DataJoint prior to v0.12 with your pipeline(s)
33-
* You do not share blob data between Python and Matlab
32+
However, since some Python types were coerced into MATLAB types, old blobs and new blobs may now be fetched as different types of objects even if they were inserted the same way.
33+
For example, new `dict` objects will be returned as `dict` while the same types of objects inserted with `datajoint 0.11` will be recarrays.
3434

35-
Otherwise, please read the following carefully:
35+
Since this is a big change, we chose to disable full blobs support by default as a temporary precaution.
36+
37+
You can enable it by setting the `enable_python_native_blobs` flag in `dj.config`.
38+
39+
```python
40+
import datajoint as dj
41+
dj.config["enable_python_native_blobs"] = True
42+
```
43+
44+
You can safely enable this setting if both of the following are true:
45+
46+
* All blobs in your current DataJoint databases contain only numerical arrays.
47+
* You do not need to share blob data between Python and MATLAB.
48+
49+
Otherwise, read the following explanation.
3650

3751
DataJoint v0.12 expands DataJoint's blob serialization mechanism with
3852
improved support for complex native python datatypes, such as dictionaries
@@ -45,22 +59,18 @@ and Python for certain record types. However, this created a discrepancy
4559
between insert and fetch datatypes which could cause problems in other
4660
portions of users' pipelines.
4761

48-
For v0.12, it was decided to remove the type squashing behavior, instead
49-
creating a separate storage encoding which improves support for storing
50-
native python datatypes in blobs without squashing them into numpy
51-
structured arrays. However, this change creates a compatibility problem
52-
for pipelines which previously relied on the type squashing behavior
53-
since records saved via the old squashing format will continue to fetch
62+
DataJoint v0.12 removes the squashing behavior, instead encoding native python datatypes in blobs directly.
63+
However, this change creates a compatibility problem for pipelines
64+
which previously relied on the type squashing behavior since records
65+
saved via the old squashing format will continue to fetch
5466
as structured arrays, whereas new records inserted in DataJoint 0.12 with
5567
`enable_python_native_blobs` would result in records returned as the
56-
appropriate native python type (dict, etc). Read support for python
57-
native blobs also not yet implemented in DataJoint for Matlab.
58-
59-
To prevent data from being stored in mixed format within a table across
60-
upgrades from previous versions of DataJoint, the
61-
`enable_python_native_blobs` flag was added as a temporary guard measure
62-
for the 0.12 release. This flag will trigger an exception if any of the
63-
ambiguous cases are encountered during inserts in order to allow testing
68+
appropriate native python type (dict, etc).
69+
Furthermore, DataJoint for MATLAB does not yet support unpacking native Python datatypes.
70+
71+
With `dj.config["enable_python_native_blobs"]` set to `False` (default),
72+
any attempt to insert any datatype other than a numpy array will result in an exception.
73+
This is meant to get users to read this message in order to allow proper testing
6474
and migration of pre-0.12 pipelines to 0.12 in a safe manner.
6575

6676
The exact process to update a specific pipeline will vary depending on
@@ -69,13 +79,12 @@ the situation, but generally the following strategies may apply:
6979
* Altering code to directly store numpy structured arrays or plain
7080
multidimensional arrays. This strategy is likely the best one for those
7181
tables requiring compatibility with Matlab.
72-
* Adjust code to deal with both structured array and native fetched data.
82+
* Adjust code to deal with both structured array and native fetched data
83+
for those tables that were populated with `dict`s in blobs in pre-0.12 versions.
7384
In this case, insert logic is not adjusted, but downstream consumers
7485
are adjusted to handle records saved under the old and new schemes.
75-
* Manually convert data using fetch/insert into a fresh schema.
76-
In this approach, DataJoint's create_virtual_module functionality would
77-
be used in conjunction with a a fetch/convert/insert loop to update
78-
the data to the new native_blob functionality.
86+
* Migrate data into a fresh schema, fetching the old data, converting blobs to
87+
a uniform data type and re-inserting.
7988
* Drop/Recompute imported/computed tables to ensure they are in the new
8089
format.
8190

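The second strategy in the list above (adjusting downstream consumers to accept both structured arrays and native types) could be sketched roughly as follows. This is only an illustration, not part of the commit: `blob_to_dict` is a hypothetical helper name, and the sketch assumes that a legacy blob exposes one field per original `dict` key through `dtype.names`, as numpy structured arrays and recarrays do. The exact shape of legacy values may differ, so field values may still need unwrapping after inspection of a few fetched records.

```python
def blob_to_dict(value):
    """Return a plain dict for a fetched blob.

    Hypothetical helper: accepts either a native dict (new blobs) or an
    object with named fields such as a numpy structured array (old blobs).
    """
    if isinstance(value, dict):
        # Native blob, inserted with enable_python_native_blobs set to True.
        return value
    # Structured arrays and recarrays list their field names in dtype.names;
    # plain arrays have dtype.names == None, other objects have no dtype.
    names = getattr(getattr(value, "dtype", None), "names", None)
    if names is None:
        raise TypeError(
            f"expected dict or structured array, got {type(value).__name__}"
        )
    # Assumption: one field per key of the originally inserted dict.
    return {name: value[name] for name in names}
```

Downstream code would call `blob_to_dict` on each fetched value, so records saved under the old and new schemes are handled through the same `dict` interface.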
datajoint/blob.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -73,7 +73,7 @@ def __init__(self, squeeze=False):
7373

7474
def set_dj0(self):
7575
if not config.get('enable_python_native_blobs'):
76-
raise DataJointError('v0.12+ python native blobs disabled. see also: https://github.com/datajoint/datajoint-python/blob/master/README.md')
76+
raise DataJointError('v0.12+ python native blobs disabled. see also: https://github.com/datajoint/datajoint-python#python-native-blobs')
7777

7878
self.protocol = b"dj0\0" # when using new blob features
7979

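The guard touched by this change can be illustrated with a standalone sketch. It does not import DataJoint: the `DataJointError` class and the `settings` dict below are stand-ins for the library's exception and `dj.config`, and `require_native_blobs` is a hypothetical name used only for this example.

```python
class DataJointError(Exception):
    """Stand-in for DataJoint's exception class so the sketch runs alone."""


def require_native_blobs(config):
    """Raise unless the native-blob flag is set (hypothetical helper)."""
    if not config.get("enable_python_native_blobs"):
        raise DataJointError(
            "v0.12+ python native blobs disabled. see also: "
            "https://github.com/datajoint/datajoint-python#python-native-blobs"
        )


# A plain dict standing in for dj.config, with the flag off as by default.
settings = {"enable_python_native_blobs": False}
try:
    require_native_blobs(settings)
except DataJointError as err:
    message = str(err)  # the message points at the README section

# After opting in, the same check passes silently.
settings["enable_python_native_blobs"] = True
require_native_blobs(settings)
```

The point of the corrected URL is that the exception message now links directly to the "Python Native Blobs" section rather than to the top of the README.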