doc/getting_started/tutorials/14.indexing-arrays.ipynb
+13 -6 (13 additions & 6 deletions)
@@ -7,7 +7,7 @@
  "source": [
  "# Indexing Arrays\n",
  "\n",
- "Blosc2 can attach indexes to 1-D `NDArray` objects and to fields inside 1-D structured arrays. These indexes accelerate selective masks, and `full` indexes can also drive ordered access directly through `sort(order=...)`, `NDArray.argsort(order=...)`, `LazyExpr.argsort(order=...)`, and `iter_sorted(...)`.\n",
+ "Blosc2 can attach indexes to 1-D `NDArray` objects and to fields inside 1-D structured arrays. These indexes accelerate selective masks, and `full` indexes can also drive ordered access directly through `sort(order=...)`, `NDArray.argsort(order=...)`, `LazyExpr.argsort(order=...)`, and `iter_sorted(...)`. OPSI indexes are a separate tunable iterative-ordering kind: they improve the physical order used for exact filtering, but they are not intended to converge to a completely sorted `full`/CSI index.\n",
  "\n",
  "This tutorial covers:\n",
  "\n",
@@ -108,16 +108,19 @@
  "source": [
  "## Index kinds and how to create them\n",
  "\n",
- "Blosc2 currently supports four index kinds:\n",
+ "Blosc2 currently supports five index kinds:\n",
  "\n",
  "- `summary`: compact summaries only,\n",
  "- `bucket`: summary levels plus lightweight per-block payloads,\n",
  "- `partial`: richer payloads for positional filtering,\n",
+ "- `opsi`: tunable iterative ordering for exact filtering,\n",
  "- `full`: globally sorted payloads for positional filtering and ordered reuse.\n",
  "\n",
+ "`OPSI` is intentionally a separate kind, not a `full` index construction method. It performs a configurable number of ordering cycles and then keeps that iterative ordering as-is. Achieving a completely sorted index (CSI) is not a goal for OPSI; use `FULL` when you require global sorted order or direct ordered reuse. By default, `OPSI` uses `optlevel` cycles for `optlevel < 8`, and `2 * optlevel` cycles for `optlevel >= 8`. You can override this with `opsi_max_cycles=...`.\n",
+ "\n",
  "There is one active index per target field or expression. If you create another index on the same target, it replaces the previous one. The easiest way to compare kinds is to build them on separate arrays.\n",
  "\n",
- "The next cell times index creation and reports the compressed storage footprint of each index relative to the compressed base array."
+ "The next cell times index creation and reports the compressed storage footprint of each index relative to the compressed base array.\n"
  ]
  },
  {
@@ -152,6 +155,7 @@
  "    blosc2.IndexKind.SUMMARY,\n",
  "    blosc2.IndexKind.BUCKET,\n",
  "    blosc2.IndexKind.PARTIAL,\n",
+ "    blosc2.IndexKind.OPSI,\n",
  "    blosc2.IndexKind.FULL,\n",
  "):\n",
  "    arr = data.copy()\n",
@@ -238,7 +242,7 @@
  "source": [
  "### Timing the mask with and without indexes\n",
  "\n",
- "The next cell measures the same selective mask on all four index kinds and compares it with a forced full scan. On this workload, `partial`and `full` usually show the clearest benefit because they carry richer payloads for positional filtering."
+ "The next cell measures the same selective mask on all five index kinds and compares it with a forced full scan. On this workload, `partial`, `opsi`, and `full` usually show the clearest benefit because they carry richer payloads for positional filtering.\n"
  ]
  },
  {
@@ -299,7 +303,9 @@
  "source": [
  "## `full` indexes and ordered access\n",
  "\n",
- "A `full` index stores a global sorted payload. This is the required index tier for direct ordered reuse. Build it directly with `create_index(kind=blosc2.IndexKind.FULL)`."
+ "A `full` index stores a global sorted payload. This is the required index tier for direct ordered reuse. Build it directly with `create_index(kind=blosc2.IndexKind.FULL)`.\n",
+ "\n",
+ "If you only want a tunable iterative ordering index for exact filtering, use `create_index(kind=blosc2.IndexKind.OPSI)` instead. OPSI can improve cold-query locality as `optlevel` or `opsi_max_cycles` increases, but it does not replace `FULL` for globally sorted access.\n"
  ]
  },
  {
@@ -585,11 +591,12 @@
  "## Practical guidance\n",
  "\n",
  "- Use `partial` when your main goal is faster selective masks.\n",
+ "- Use `opsi` when you want exact filtering with tunable iterative ordering. Increase `optlevel` or pass `opsi_max_cycles` to spend more build time on ordering; do not expect OPSI to become a `full`/CSI index.\n",
  "- Use `full` when you also want ordered reuse through `sort(order=...)`, `NDArray.argsort(order=...)`, `LazyExpr.argsort(order=...)`, or `iter_sorted(...)`.\n",
  "- Persist the base array if you want indexes to survive reopen automatically.\n",
  "- After unsupported mutations, use `rebuild_index()`.\n",
  "- For append-heavy `full` indexes, compact explicitly at convenient maintenance boundaries instead of on every append.\n",
- "- Measure your own workload: compact indexes, predicate selectivity, and ordered access needs all affect which kind is best.\n"
+ "- Measure your own workload: compact indexes, predicate selectivity, iterative-ordering level, and ordered access needs all affect which kind is best.\n"
doc/getting_started/tutorials/15.indexing-ctables.ipynb
+3 -1 (3 additions & 1 deletion)
@@ -130,7 +130,8 @@
  "source": [
  "## Creating an index\n",
  "\n",
- "Call `create_index(col_name)` to build a bucket index on a column.\n",
+ "Call `create_index(col_name)` to build a bucket index on a column. Pass `kind=...` to choose another index kind, including `blosc2.IndexKind.OPSI` for tunable iterative ordering or `blosc2.IndexKind.FULL` for globally sorted indexes that can also support ordered reuse. OPSI is a separate exact-filtering index kind, not a slower way to build a `FULL`/CSI index; its build effort is controlled by `optlevel` or the explicit `opsi_max_cycles` keyword.\n",
+ "\n",
  "The returned `CTableIndex` handle shows the column name, kind, and whether the index is stale.\n"
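Reading note on the default OPSI cycle count stated in the 14.indexing-arrays.ipynb change above (`optlevel` cycles for `optlevel < 8`, `2 * optlevel` cycles for `optlevel >= 8`): the rule can be sketched in plain Python as follows. The helper name `default_opsi_cycles` is hypothetical and is not part of Blosc2; it only restates the documented arithmetic and assumes nothing about how the library applies `opsi_max_cycles` internally.

```python
def default_opsi_cycles(optlevel: int) -> int:
    """Hypothetical helper (not a Blosc2 function).

    Restates the default described in the tutorial text:
    optlevel cycles below 8, and 2 * optlevel cycles from 8 upward.
    """
    if optlevel < 8:
        return optlevel
    return 2 * optlevel


# Tabulate the rule around the threshold, where the default doubles.
for level in (5, 7, 8, 9):
    print(f"optlevel={level} -> {default_opsi_cycles(level)} cycles")
```

Under this rule the default jumps from 7 cycles at `optlevel=7` to 16 at `optlevel=8`, which may be worth keeping in mind when comparing build times across levels.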