|
29 | 29 | "id": "2", |
30 | 30 | "metadata": {}, |
31 | 31 | "source": [ |
32 | | - "### A priori estimation\n", |
| 32 | + "### A priori estimation of `dt`\n", |
33 | 33 | "\n", |
34 | 34 | "In this example, we will estimate the appropriate timestep to compute advection of a virtual particle through an ocean current field. \n", |
35 | 35 | "\n", |
|
43 | 43 | "\n", |
44 | 44 | "where $\\mathbf{v}(\\mathbf{x},t) = (u(\\mathbf{x},t), v(\\mathbf{x},t))$ describes the ocean velocity field at position $\\mathbf{x}$ at time $t$.\n", |
45 | 45 | "\n", |
46 | | - "To estimate the timescale that we want to resolve, we can think about the scales at which advection varies. Here we use the daily velocity fields at 1/12th degree horizontal resolution from the Copernicus Marine Service. This means that the velocity will vary in time at scales >= 24 hours, and in space at scales >= 1/12th degree." |
| 46 | + "To estimate the timescale that we want to resolve, we can think about the scales at which advection varies. Here we use the daily velocity fields at 1/12th degree horizontal resolution from the Copernicus Marine Service. This means that the velocity field only resolves variations at timescales >= 24 hours and spatial scales >= 1/12th degree.\n",
| 47 | + "\n", |
| 48 | + "```{note}\n", |
| 49 | + "Our displacement occurs in units of longitude and latitude, but our velocity field is in m/s. Read [this guide](./tutorial_unitconverters.ipynb) to understand how Parcels converts these units under the hood.\n",
| 50 | + "```" |
47 | 51 | ] |
48 | 52 | }, |
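To make the conversion concrete, it can be sketched in a few lines (a simplified illustration only, assuming the commonly used value of 1852 m per arc minute, i.e. 111120 m per degree of latitude; see Parcels' unit converter guide above for the actual implementation):

```python
import math

# Approximate metres per degree of latitude (1852 m per arc minute * 60).
# This is an assumption for illustration; Parcels' converters are authoritative.
M_PER_DEG = 1852.0 * 60.0


def ms_to_degs(u, v, lat):
    """Convert a velocity (u, v) in m/s to degrees of lon/lat per second."""
    u_deg = u / (M_PER_DEG * math.cos(math.radians(lat)))  # zonal: shrinks with latitude
    v_deg = v / M_PER_DEG  # meridional: constant on a sphere
    return u_deg, v_deg


# 1 m/s eastward covers twice as many degrees of longitude at 60N as at the equator
print(ms_to_degs(1.0, 0.0, 0.0)[0])
print(ms_to_degs(1.0, 0.0, 60.0)[0])
```

This is why the same speed in m/s corresponds to different displacements in degrees depending on latitude.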
49 | 53 | { |
|
55 | 59 | "source": [ |
56 | 60 | "import matplotlib.pyplot as plt\n", |
57 | 61 | "import numpy as np\n", |
| 62 | + "import pandas as pd\n", |
58 | 63 | "import xarray as xr\n", |
59 | 64 | "\n", |
60 | 65 | "import parcels\n", |
|
143 | 148 | "id": "7", |
144 | 149 | "metadata": {}, |
145 | 150 | "source": [ |
| 151 | + "$$\n", |
| 152 | + "\\begin{aligned}\n", |
| 153 | + "\\text{d}t < \\frac{1}{12\\, U_{max}}\n",
| 154 | + "\\end{aligned}\n", |
| 155 | + "$$\n", |
| 156 | + "\n", |
146 | 157 | "Using `U_max_surface_deg`, we find a second estimated limit of an appropriate `dt`:\n", |
147 | 158 | "\n", |
148 | 159 | "$$\n", |
149 | 160 | "\\begin{aligned}\n", |
| 161 | + "\\text{d}t < \\frac{1}{12 \\times 1.71 \\times 10^{-5}} \\approx 4.9 \\times 10^{3} \\text{ seconds}\n",
| 162 | + "\\end{aligned}\n", |
| 163 | + "$$\n", |
| 164 | + "\n", |
| 165 | + "that is, roughly:\n",
| 166 | + "$$\n", |
| 167 | + "\\begin{aligned}\n", |
150 | 168 | "\\text{d}t < 2 \\text{ hours}\n",
151 | 169 | "\\end{aligned}\n", |
152 | 170 | "$$" |
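The bound above is simply the grid spacing divided by the maximum speed, which is easy to check numerically (the value 1.71e-5 deg/s is the `U_max_surface_deg` estimate from the text):

```python
# A priori bound on dt: time for the fastest particle to cross one grid cell.
dx_deg = 1.0 / 12.0  # horizontal grid spacing, in degrees
U_max_surface_deg = 1.71e-5  # estimated max surface speed, in degrees per second

dt_max = dx_deg / U_max_surface_deg  # seconds
print(f"dt should be < {dt_max:.0f} s (~{dt_max / 3600:.1f} hours)")
```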
|
184 | 202 | "id": "10", |
185 | 203 | "metadata": {}, |
186 | 204 | "source": [ |
187 | | - "We simulate particles using a range of timesteps that differ by a factor 2-10, starting at (dt < 24 hours). We also keep track of the time it takes to run each simulation:" |
| 205 | + "We simulate particles using a range of timesteps that differ by a factor of 2-10, starting from the upper limit `dt` < 24 hours derived above. We also keep track of the time it takes to run each simulation:\n",
| 206 | + "\n", |
| 207 | + "```{warning}\n", |
| 208 | + "`dt` must be chosen such that the `FieldSet` timestep (24 hours) is an integer multiple of `dt`.\n",
| 209 | + "```" |
188 | 210 | ] |
189 | 211 | }, |
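A quick way to check this condition is to test whether each candidate `dt` divides the 24-hour field interval without remainder (a small sketch using plain seconds; the 7-minute value is a hypothetical counterexample, not one of the timesteps used below):

```python
# The daily fields impose a 24-hour interval; dt must divide it exactly.
DAY_S = 24 * 3600  # FieldSet time interval in seconds


def fits(dt_s):
    """True if an integer number of steps of dt_s covers exactly one day."""
    return DAY_S % dt_s == 0


for dt_s in [12 * 3600, 6 * 3600, 3600, 20 * 60, 7 * 60]:
    print(f"dt = {dt_s:>5} s: {'ok' if fits(dt_s) else 'not allowed'}")
```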
190 | 212 | { |
191 | 213 | "cell_type": "code", |
192 | 214 | "execution_count": null, |
193 | 215 | "id": "11", |
194 | | - "metadata": { |
195 | | - "tags": [ |
196 | | - "hide-output" |
197 | | - ] |
198 | | - }, |
| 216 | + "metadata": {}, |
199 | 217 | "outputs": [], |
200 | 218 | "source": [ |
201 | 219 | "import time\n", |
|
235 | 253 | " # time and run simulation\n", |
236 | 254 | " start_time = time.time()\n", |
237 | 255 | " pset.execute(\n", |
238 | | - " parcels.kernels.AdvectionRK2, runtime=runtime, dt=dt, output_file=pfile\n", |
| 256 | + " parcels.kernels.AdvectionRK2,\n", |
| 257 | + " runtime=runtime,\n", |
| 258 | + " dt=dt,\n", |
| 259 | + " output_file=pfile,\n", |
| 260 | + " verbose_progress=False,\n", |
239 | 261 | " )\n", |
240 | 262 | " sim_duration_i = time.time() - start_time\n", |
241 | | - " sim_duration[i] = sim_duration_i" |
| 263 | + " sim_duration[i] = sim_duration_i\n", |
| 264 | + " print(f\"Simulation duration = {np.round(sim_duration_i, 2)} seconds\")" |
242 | 265 | ] |
243 | 266 | }, |
244 | 267 | { |
|
443 | 466 | "ax[1].set_xlabel(\"dt (minutes)\")\n", |
444 | 467 | "ax[1].grid()\n", |
445 | 468 | "ax[1].legend()\n", |
446 | | - "plt.show()\n", |
447 | | - "print(f\"dt choices = {(dt_choices / np.timedelta64(1, 'm')).astype(int)} minutes\")\n", |
448 | | - "print(f\"precision = {np.round(np.mean(dist_end, axis=1), 2)} km\")\n", |
449 | | - "print(f\"sim duration = {np.round(sim_duration, 2)} s\")" |
| 469 | + "plt.show()" |
450 | 470 | ] |
451 | 471 | }, |
452 | 472 | { |
453 | 473 | "cell_type": "markdown", |
454 | 474 | "id": "20", |
455 | 475 | "metadata": {}, |
456 | 476 | "source": [ |
457 | | - "We can see that in our simulation advecting particles for 7 days, the effect of `dt` on the precision of our simulation is approximately linear. The precision for a simulation with a timestep of 20 minutes is order of magnitude ~100 m. The effect on the time it takes to run a simulation is not linear in our case however; it increases sharply as we decrease our timestep. This may be optimized using more efficient chunking.\n", |
458 | | - "\n", |
459 | | - "| `dt` | Mean separation distance after 7 days (km) | Simulation duration (s) | \n", |
460 | | - "| ---------- | ------------------------------------------ | ------------------------ |\n", |
461 | | - "| 12 hours | 19.52 | 0.11 |\n", |
462 | | - "| 6 hours | 9.64 | 0.37 | \n", |
463 | | - "| 1 hour | 1.4 | 0.94 | \n", |
464 | | - "| 20 minutes | 0.38 | 3.19 | \n", |
465 | | - "| 5 minutes | x | 12.34 | \n" |
| 477 | + "We can see that when advecting particles for 7 days, the effect of `dt` on the precision of our simulation is approximately linear: a timestep of 20 minutes yields a mean separation on the order of ~100 m. The simulation runtime, however, does not scale linearly; it increases sharply as we decrease the timestep. This may be optimized using more efficient chunking."
466 | 478 | ] |
467 | 479 | }, |
468 | 480 | { |
469 | | - "cell_type": "markdown", |
| 481 | + "cell_type": "code", |
| 482 | + "execution_count": null, |
470 | 483 | "id": "21", |
471 | 484 | "metadata": {}, |
| 485 | + "outputs": [], |
| 486 | + "source": [ |
| 487 | + "df = pd.DataFrame(\n", |
| 488 | + " {\n", |
| 489 | + " \"dt\": dt_choices,\n", |
| 490 | + " \"Mean separation distance (km)\": np.append(\n", |
| 491 | + " np.round(np.mean(dist_end, axis=1), 2), [None]\n", |
| 492 | + " ),\n", |
| 493 | + " \"Simulation duration (s)\": np.round(sim_duration, 2),\n", |
| 494 | + " }\n", |
| 495 | + ")\n", |
| 496 | + "df" |
| 497 | + ] |
| 498 | + }, |
| 499 | + { |
| 500 | + "cell_type": "markdown", |
| 501 | + "id": "22", |
| 502 | + "metadata": {}, |
472 | 503 | "source": [ |
473 | 504 | "```{note}\n", |
474 | 505 | "The desired precision is not always best measured by the separation distance of individual trajectories. Depending on the application of your Parcels simulation and the process you are computing, other metrics may be more suitable.\n", |
|
477 | 508 | }, |
478 | 509 | { |
479 | 510 | "cell_type": "markdown", |
480 | | - "id": "22", |
| 511 | + "id": "23", |
481 | 512 | "metadata": {}, |
482 | 513 | "source": [ |
483 | 514 | "## Integration schemes\n", |
|
488 | 519 | { |
489 | 520 | "cell_type": "code", |
490 | 521 | "execution_count": null, |
491 | | - "id": "23", |
| 522 | + "id": "24", |
492 | 523 | "metadata": {}, |
493 | 524 | "outputs": [], |
494 | 525 | "source": [ |
|
501 | 532 | { |
502 | 533 | "cell_type": "code", |
503 | 534 | "execution_count": null, |
504 | | - "id": "24", |
| 535 | + "id": "25", |
505 | 536 | "metadata": {}, |
506 | 537 | "outputs": [], |
507 | 538 | "source": [ |
|
512 | 543 | { |
513 | 544 | "cell_type": "code", |
514 | 545 | "execution_count": null, |
515 | | - "id": "25", |
| 546 | + "id": "26", |
516 | 547 | "metadata": {}, |
517 | 548 | "outputs": [], |
518 | 549 | "source": [ |
|
522 | 553 | }, |
523 | 554 | { |
524 | 555 | "cell_type": "markdown", |
525 | | - "id": "26", |
| 556 | + "id": "27", |
526 | 557 | "metadata": {}, |
527 | 558 | "source": [ |
528 | 559 | "The higher-order methods use weighted intermediate steps in time and space to obtain a more accurate estimate of `dlat` and `dlon` for a given timestep.\n", |
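The difference between a first-order and a higher-order step can be illustrated on a toy solid-body rotation field, where the exact trajectory is a circle (a sketch of the midpoint variant of a second-order step; Parcels' own kernels are the reference implementation):

```python
import math


# Toy velocity field: solid-body rotation v = (-y, x); the exact
# trajectory is a circle, so we can measure each scheme's error.
def velocity(x, y):
    return -y, x


def euler_step(x, y, dt):
    # Explicit Euler: one field evaluation at the start point
    u, v = velocity(x, y)
    return x + dt * u, y + dt * v


def rk2_step(x, y, dt):
    # Midpoint rule: re-evaluate the field halfway along the Euler step
    u1, v1 = velocity(x, y)
    u2, v2 = velocity(x + 0.5 * dt * u1, y + 0.5 * dt * v1)
    return x + dt * u2, y + dt * v2


dt = 0.1
exact = (math.cos(dt), math.sin(dt))
for name, step in [("Explicit Euler", euler_step), ("RK2 (midpoint)", rk2_step)]:
    x1, y1 = step(1.0, 0.0, dt)
    err = math.hypot(x1 - exact[0], y1 - exact[1])
    print(f"{name}: one-step error = {err:.1e}")
```

The intermediate evaluation lets the second-order step follow the curvature of the flow that the Euler step misses.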
|
533 | 564 | { |
534 | 565 | "cell_type": "code", |
535 | 566 | "execution_count": null, |
536 | | - "id": "27", |
| 567 | + "id": "28", |
537 | 568 | "metadata": {}, |
538 | 569 | "outputs": [], |
539 | 570 | "source": [ |
|
547 | 578 | { |
548 | 579 | "cell_type": "code", |
549 | 580 | "execution_count": null, |
550 | | - "id": "28", |
| 581 | + "id": "29", |
551 | 582 | "metadata": { |
552 | 583 | "tags": [ |
553 | 584 | "hide-output" |
|
582 | 613 | " f\"Begin simulation for (scheme: {advection_scheme.__name__}, dt={int(dt / np.timedelta64(1, 's'))} s)\"\n", |
583 | 614 | " )\n", |
584 | 615 | " start_time = time.time()\n", |
585 | | - " pset.execute(advection_scheme, runtime=runtime, dt=dt, output_file=pfile)\n", |
| 616 | + " pset.execute(\n", |
| 617 | + " advection_scheme,\n", |
| 618 | + " runtime=runtime,\n", |
| 619 | + " dt=dt,\n", |
| 620 | + " output_file=pfile,\n", |
| 621 | + " verbose_progress=False,\n", |
| 622 | + " )\n", |
586 | 623 | " sim_duration_ij = time.time() - start_time\n", |
587 | | - " sim_duration[i, j] = sim_duration_ij" |
| 624 | + " sim_duration[i, j] = sim_duration_ij\n", |
| 625 | + " print(f\"Simulation duration = {np.round(sim_duration_ij, 2)} seconds\")" |
588 | 626 | ] |
589 | 627 | }, |
590 | 628 | { |
591 | 629 | "cell_type": "code", |
592 | 630 | "execution_count": null, |
593 | | - "id": "29", |
| 631 | + "id": "30", |
594 | 632 | "metadata": {}, |
595 | 633 | "outputs": [], |
596 | 634 | "source": [ |
597 | 635 | "scheme_colours = np.linspace(0, 1, len(advection_schemes), endpoint=True)\n", |
598 | 636 | "# Now let's compare different advection schemes with the same timestep\n", |
599 | | - "fig, axs = plt.subplots(nrows=1, ncols=len(dt_choices), figsize=(20, 5))\n", |
| 637 | + "fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(15, 10))\n", |
600 | 638 | "for i, dt in enumerate(dt_choices):\n", |
601 | | - " axs[i].set_title(f\"{str(dt)}\")\n", |
602 | | - " axs[i].set_xlabel(\"Longitude\")\n", |
| 639 | + " m = i // 3\n", |
| 640 | + " n = i % 3\n", |
| 641 | + " axs[m, n].set_title(f\"dt = {str(dt)}\")\n", |
| 642 | + " axs[m, n].set_xlabel(\"Longitude\")\n", |
603 | 643 | " for j, advection_scheme in enumerate(advection_schemes):\n", |
604 | 644 | " ds = xr.open_zarr(\n", |
605 | 645 | " f\"output/{advection_scheme.__name__}_dt_{int(dt / np.timedelta64(1, 's'))}s.zarr\"\n", |
606 | 646 | " )\n", |
607 | 647 | " labels = [f\"{advection_scheme.__name__}\"] + [None] * (ds.lon.shape[0] - 1)\n", |
608 | | - " axs[i].plot(\n", |
| 648 | + " axs[m, n].plot(\n", |
609 | 649 | " ds.lon.T,\n", |
610 | 650 | " ds.lat.T,\n", |
611 | 651 | " alpha=0.75,\n", |
612 | 652 | " color=plt.cm.viridis(scheme_colours[j]),\n", |
613 | 653 | " label=labels,\n", |
614 | 654 | " )\n", |
615 | | - " axs[i].scatter(\n", |
| 655 | + " axs[m, n].scatter(\n", |
616 | 656 | " ds.lon[:, 0], ds.lat[:, 0], c=\"r\", marker=\"s\", label=\"starting locations\"\n", |
617 | 657 | " )\n", |
618 | | - " axs[i].grid()\n", |
619 | | - " axs[0].legend()\n", |
620 | | - " axs[0].set_ylabel(\"Latitude\")\n", |
| 658 | + " axs[m, n].grid()\n", |
| 659 | + "axs[-1, -1].axis(\"off\")\n", |
| 660 | + "axs[0, 0].legend()\n", |
| 661 | + "axs[0, 0].set_ylabel(\"Latitude\")\n", |
| 662 | + "axs[1, 0].set_ylabel(\"Latitude\")\n", |
621 | 663 | "plt.show()" |
622 | 664 | ] |
623 | 665 | }, |
624 | 666 | { |
625 | 667 | "cell_type": "markdown", |
626 | | - "id": "30", |
| 668 | + "id": "31", |
627 | 669 | "metadata": {}, |
628 | 670 | "source": [ |
629 | 671 | "Clearly, for longer timesteps, the RK2 and RK4 schemes perform better. However, if the timestep is appropriate, as we have determined in the previous section, then the Explicit Euler scheme does not perform notably different." |
|
632 | 674 | { |
633 | 675 | "cell_type": "code", |
634 | 676 | "execution_count": null, |
635 | | - "id": "31", |
| 677 | + "id": "32", |
636 | 678 | "metadata": {}, |
637 | 679 | "outputs": [], |
638 | 680 | "source": [ |
639 | 681 | "dist_end = np.zeros((len(advection_schemes) - 1, len(dt_choices), npart))\n", |
640 | 682 | "\n", |
641 | | - "fig, axs = plt.subplots(nrows=1, ncols=len(dt_choices), figsize=(20, 5))\n", |
| 683 | + "fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(15, 10))\n", |
| 684 | + "\n", |
642 | 685 | "for i, dt in enumerate(dt_choices):\n", |
643 | | - " axs[i].set_title(f\"dt = {str(dt)}\")\n", |
644 | | - " axs[i].set_xlabel(\"Time\")\n", |
645 | | - " axs[i].tick_params(\"x\", rotation=45)\n", |
646 | | - " axs[i].set_yscale(\"log\")\n", |
647 | | - " axs[i].set_ylim(1e-4, 1e1)\n", |
| 686 | + " m = i // 3\n", |
| 687 | + " n = i % 3\n", |
| 688 | + " axs[m, n].set_title(f\"dt = {str(dt)}\")\n", |
| 689 | + " axs[m, n].set_xlabel(\"Time\")\n", |
| 690 | + " axs[m, n].tick_params(\"x\", rotation=45)\n", |
| 691 | + " axs[m, n].set_yscale(\"log\")\n", |
| 692 | + " axs[m, n].set_ylim(1e-4, 1e1)\n", |
648 | 693 | " ds_RK4 = xr.open_zarr(\n", |
649 | 694 | " f\"output/AdvectionRK4_dt_{int(dt / np.timedelta64(1, 's'))}s.zarr\"\n", |
650 | 695 | " )\n", |
|
667 | 712 | " lat_valid = ds.lat.where(~np.isnan(ds.lat).compute(), drop=True).values\n", |
668 | 713 | " dist = dist_km(lon_valid, lon_valid_RK4, lat_valid, lat_valid_RK4)\n", |
669 | 714 | " time_valid = ds.time.where(~np.isnan(ds.time).compute(), drop=True).values\n", |
670 | | - " axs[i].plot(\n", |
| 715 | + " axs[m, n].plot(\n", |
671 | 716 | " time_valid.T,\n", |
672 | 717 | " dist.T,\n", |
673 | 718 | " alpha=0.75,\n", |
674 | 719 | " color=plt.cm.viridis(scheme_colours[j]),\n", |
675 | 720 | " label=labels,\n", |
676 | 721 | " )\n", |
677 | 722 | " dist_end[j, i] = dist[:, -1]\n", |
678 | | - " axs[i].grid()\n", |
679 | | - " axs[0].legend()\n", |
680 | | - " axs[0].set_ylabel(\"Distance (km)\")\n", |
| 723 | + " axs[m, n].grid()\n", |
| 724 | + "axs[-1, -1].axis(\"off\")\n", |
| 725 | + "axs[0, 0].legend()\n", |
 |  726 | + "axs[0, 0].set_ylabel(\"Distance (km)\")\n",
 |  727 | + "axs[1, 0].set_ylabel(\"Distance (km)\")\n",
| 728 | + "plt.tight_layout()\n", |
681 | 729 | "plt.show()" |
682 | 730 | ] |
683 | 731 | }, |
684 | 732 | { |
685 | 733 | "cell_type": "markdown", |
686 | | - "id": "32", |
| 734 | + "id": "33", |
687 | 735 | "metadata": {}, |
688 | 736 | "source": [ |
689 | 737 | "By quantifying the precision of the integration methods, we can see that for a given timestep the Runge-Kutta methods perform orders of magnitude better than the Explicit Euler method. In this example, the error associated with the selected integration methods is smaller than that of the range of timesteps." |
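For reference, a great-circle separation helper like the notebook's `dist_km` could be implemented with the haversine formula (a sketch only; the notebook defines its own helper in an earlier cell, which may differ in detail):

```python
import numpy as np


def dist_km(lon1, lon2, lat1, lat2):
    """Great-circle distance in km between (lon1, lat1) and (lon2, lat2),
    in degrees, via the haversine formula on a sphere of radius 6371 km."""
    R = 6371.0
    lon1, lon2, lat1, lat2 = map(np.radians, (lon1, lon2, lat1, lat2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * R * np.arcsin(np.sqrt(a))


# One degree of latitude is roughly 111 km
print(dist_km(0.0, 0.0, 0.0, 1.0))
```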
|
692 | 740 | { |
693 | 741 | "cell_type": "code", |
694 | 742 | "execution_count": null, |
695 | | - "id": "33", |
| 743 | + "id": "34", |
696 | 744 | "metadata": {}, |
697 | 745 | "outputs": [], |
698 | 746 | "source": [ |
|
734 | 782 | }, |
735 | 783 | { |
736 | 784 | "cell_type": "markdown", |
737 | | - "id": "34", |
| 785 | + "id": "35", |
738 | 786 | "metadata": {}, |
739 | 787 | "source": [ |
740 | 788 | "In this last figure, we see that the improvement of RK2 by orders of magnitude with respect to EE comes at a small computational cost. Since the RK2 and RK4 methods are practically indistinguishable, using the RK2 method with a timestep of 1 hour or 20 minutes, depending on the application, would be an appropriate choice for this simulation." |
741 | 789 | ] |
742 | 790 | }, |
743 | 791 | { |
744 | 792 | "cell_type": "markdown", |
745 | | - "id": "35", |
| 793 | + "id": "36", |
746 | 794 | "metadata": {}, |
747 | 795 | "source": [ |
748 | 796 | "### Flow conditions\n", |
|
762 | 810 | "Check out [this notebook](https://github.com/Parcels-code/10year-anniversary-session2/blob/main/solutions/lorenz_and_lotka_volterra_solutions.ipynb) to learn how to model the Lorenz equations with Parcels!\n", |
763 | 811 | "```" |
764 | 812 | ] |
| 813 | + }, |
| 814 | + { |
| 815 | + "cell_type": "markdown", |
| 816 | + "id": "37", |
| 817 | + "metadata": {}, |
| 818 | + "source": [ |
| 819 | + "## Summary\n", |
| 820 | + "If you want to test the accuracy and efficiency of your own simulation, here is a brief list of things to consider:\n", |
 |  821 | + "- **Use the temporal and spatial resolution of the input to estimate the timescales you need to resolve.** The `FieldSet` timestep must be an integer multiple of `dt`.\n",
 |  822 | + "- **Run the simulation for the full runtime.** Many inaccuracies grow over time, so errors measured over a short test run can underestimate those of the full simulation.\n",
 |  823 | + "- **Consider which step introduces the largest error.** If we compute both advection and diffusion, increasing the accuracy of one process far beyond the other will not improve the results."
| 824 | + ] |
765 | 825 | } |
766 | 826 | ], |
767 | 827 | "metadata": { |
|