Commit 03ec6b2

Add preview() user guide notebook (#986)
1 parent a0de7ed commit 03ec6b2


1 file changed: 125 additions, 0 deletions

@@ -0,0 +1,125 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "rt7pp4omcq",
"source": "# Preview: memory-safe thumbnails of large rasters\n\nWhen a raster is backed by dask (e.g. loaded lazily from Zarr or from a stack of GeoTIFFs),\ncalling `.compute()` on the full array just to visualize it can exhaust your memory.\n`xrspatial.preview()` downsamples the data to a target pixel size using block averaging,\nand the whole operation stays lazy until you ask for the result. Peak memory is governed by\nthe chunks being processed at any one time plus the small output array, not by the\nsize of the full raster.\n\nThis notebook generates a ~1.1 TB (1 TiB) dask-backed terrain raster and previews it at\n1000x1000 pixels. A `dask.distributed` LocalCluster is started so you can\nwatch the task graph and worker memory in the dashboard.",
"metadata": {}
},
{
"cell_type": "code",
"id": "ivhk3f6ui7",
"source": "import numpy as np\nimport xarray as xr\nimport dask.array as da\nimport matplotlib.pyplot as plt\n\nimport xrspatial\nfrom xrspatial import generate_terrain, preview",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"id": "lb7wkq291z",
"source": "from dask.distributed import Client, LocalCluster\n\ncluster = LocalCluster(n_workers=4, threads_per_worker=2, memory_limit=\"2GB\")\nclient = Client(cluster)\nprint(f\"Dashboard: {client.dashboard_link}\")\nclient",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "ouvgm7ttw1",
"source": "## Generate a terrain tile\n\nFirst, create a 1024x1024 terrain tile using `generate_terrain`. This is the\nbuilding block we'll replicate into a massive dask array.",
"metadata": {}
},
{
"cell_type": "code",
"id": "yts07v5mgv9",
"source": "# 1024x1024 in-memory terrain tile\ncanvas = xr.DataArray(np.zeros((1024, 1024), dtype=np.float32), dims=[\"y\", \"x\"])\ntile = generate_terrain(canvas, seed=12345)\n\nfig, ax = plt.subplots(figsize=(6, 6))\ntile.plot(ax=ax, cmap=\"terrain\")\nax.set_title(f\"Terrain tile ({tile.shape[0]}x{tile.shape[1]}, {tile.nbytes / 1e6:.1f} MB)\")\nax.set_aspect(\"equal\")\nplt.tight_layout()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "mc23kw3w94",
"source": "## Tile it into a 1 TB dask array\n\nWe repeat the tile 512 times along each axis (262,144 copies) using `dask.array.tile`\nto get a 524,288 x 524,288 raster. At 4 bytes per float32 value, that is about 1.1 TB\n(exactly 1 TiB). Nothing is actually computed here -- dask just records the tiling\nas a lazy graph.",
"metadata": {}
},
{
"cell_type": "code",
"id": "ire1hxtder",
"source": "# Tile the small terrain into a ~1 TB dask array\nreps = 512\nbig_dask = da.tile(\n da.from_array(tile.values, chunks=(1024, 1024)),\n (reps, reps),\n)\nrows, cols = big_dask.shape\nbig = xr.DataArray(\n big_dask,\n dims=[\"y\", \"x\"],\n coords={\"y\": np.arange(rows, dtype=np.float64), \"x\": np.arange(cols, dtype=np.float64)},\n)\n\nprint(f\"Shape: {big.shape[0]:,} x {big.shape[1]:,}\")\nprint(f\"Chunk size: {big_dask.chunksize}\")\nprint(f\"Num chunks: {big_dask.numblocks}\")\nprint(f\"Total size: {big_dask.nbytes / 1e12:.2f} TB\")\nprint(f\"Dtype: {big_dask.dtype}\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "3n94gc0t1tg",
"source": "## Preview at 1000x1000\n\n`preview()` builds a lazy coarsen-then-mean graph. Calling `.compute()` on the\nresult materializes only the 1000x1000 output -- about 4 MB.",
"metadata": {}
},
{
"cell_type": "code",
"id": "skqz0wfgial",
"source": "%%time\nsmall = preview(big, width=1000).compute()\n\nprint(f\"Output shape: {small.shape}\")\nprint(f\"Output size: {small.nbytes / 1e6:.1f} MB\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"id": "2jif06ajupn",
"source": "fig, ax = plt.subplots(figsize=(8, 8))\nsmall.plot(ax=ax, cmap=\"terrain\")\nax.set_title(f\"1000x1000 preview of a {big_dask.nbytes / 1e12:.1f} TB raster\")\nax.set_aspect(\"equal\")\nplt.tight_layout()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "nrbcb74q9oa",
"source": "## Different preview sizes\n\nYou can set both the output width and height. If you omit the height, as in the cell below, it is chosen to preserve the aspect ratio.",
"metadata": {}
},
{
"cell_type": "code",
"id": "mqzjqxdvj4",
"source": "fig, axes = plt.subplots(1, 3, figsize=(14, 4))\nfor ax, w in zip(axes, [100, 500, 2000]):\n p = preview(big, width=w).compute()\n p.plot(ax=ax, cmap=\"terrain\", add_colorbar=False)\n ax.set_title(f\"{p.shape[0]}x{p.shape[1]} ({p.nbytes / 1e6:.1f} MB)\")\n ax.set_aspect(\"equal\")\nplt.tight_layout()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "82h89j8n7em",
"source": "## Accessor syntax\n\nYou can also call `preview` directly on a DataArray or Dataset via the `.xrs` accessor.",
"metadata": {}
},
{
"cell_type": "code",
"id": "jastfcpb3i",
"source": "# Accessor on a DataArray\nsmall = big.xrs.preview(width=500).compute()\nprint(f\"DataArray accessor: {small.shape}\")\n\n# Accessor on a Dataset (left lazy; the output shapes are known without computing)\nds = xr.Dataset({\"elevation\": big, \"slope_proxy\": big * 0.1})\nsmall_ds = ds.xrs.preview(width=500)\nfor name, var in small_ds.data_vars.items():\n print(f\"Dataset var '{name}': {var.shape}\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"id": "f2s7vgc81u5",
"source": "client.close()\ncluster.close()",
"metadata": {},
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
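The sizes quoted in the notebook (a ~1.1 TB raster assembled from tile-sized chunks, previewed into a ~4 MB array) follow from simple arithmetic. The plain-Python sketch below recomputes them. It assumes float32 (4 bytes per value) throughout, as the notebook does, and assumes dask keeps one 1024x1024 chunk per tile copy; it does not import dask or xrspatial.

```python
# Recompute the size arithmetic quoted in the preview() notebook.
# Assumes float32 data (4 bytes per value), as in the notebook.
BYTES_PER_VALUE = 4

tile_side = 1024  # the generate_terrain tile is 1024 x 1024
reps = 512        # da.tile repeats it 512 times along each axis

side = tile_side * reps                                # pixels per axis
total_bytes = side * side * BYTES_PER_VALUE            # full raster
chunk_bytes = tile_side * tile_side * BYTES_PER_VALUE  # one tile-sized chunk
num_chunks = reps * reps  # assumption: one 1024x1024 chunk per tile copy
preview_bytes = 1000 * 1000 * BYTES_PER_VALUE  # assumes the preview stays float32

print(f"Raster side:  {side:,} px")
print(f"Total size:   {total_bytes / 1e12:.2f} TB = {total_bytes / 2**40:.0f} TiB")
print(f"Chunk size:   {chunk_bytes / 1e6:.1f} MB")
print(f"Num chunks:   {num_chunks:,}")
print(f"Preview size: {preview_bytes / 1e6:.1f} MB")
```

This prints a side of 524,288 px, a total of 1.10 TB = 1 TiB, 4.2 MB per chunk, 262,144 chunks, and a 4.0 MB preview: the output is roughly the size of a single input chunk.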

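The notebook describes `preview()` as block averaging, built lazily as a coarsen-then-mean graph. The NumPy sketch below illustrates that technique on an in-memory array; it is not xrspatial's implementation. `target_shape` and `block_mean` are hypothetical helpers, and both the aspect-ratio rule used when the height is omitted and the trimming of leftover edge pixels are assumptions made for illustration.

```python
import numpy as np


def target_shape(rows, cols, width, height=None):
    """Hypothetical helper: pick the output (height, width).

    If height is omitted, it is derived from width so the input's aspect
    ratio is preserved (an assumption about preview()'s behavior).
    """
    if height is None:
        height = max(1, round(width * rows / cols))
    return height, width


def block_mean(arr, height, width):
    """Downsample a 2-D array by averaging non-overlapping blocks.

    Block sizes are the integer factors rows // height and cols // width.
    Edge pixels that do not fill a whole block are trimmed (an assumption
    for this sketch), so the output shape is (rows // fy, cols // fx).
    """
    rows, cols = arr.shape
    fy = max(1, rows // height)  # block height in input pixels
    fx = max(1, cols // width)   # block width in input pixels
    out_rows, out_cols = rows // fy, cols // fx
    # Drop the ragged edge so every block is complete
    trimmed = arr[: out_rows * fy, : out_cols * fx]
    # Split each axis into (blocks, pixels per block), then average each block
    blocks = trimmed.reshape(out_rows, fy, out_cols, fx)
    return blocks.mean(axis=(1, 3))


# Usage: a 4x4 array reduced to 2x2 by averaging 2x2 blocks
arr = np.arange(16, dtype=np.float32).reshape(4, 4)
h, w = target_shape(*arr.shape, width=2)
small = block_mean(arr, h, w)
print(small)  # block means 2.5, 4.5 (top row) and 10.5, 12.5 (bottom row)
```

On a dask-backed `xarray.DataArray`, the same reduction can be expressed lazily with xarray's `coarsen(y=fy, x=fx, boundary="trim").mean()`, which averages each block chunk by chunk; the notebook does not say whether `preview()` calls this exact method.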