expand debugging

fabridamicelli · fabridamicelli · commit 2d65e3c7a212 · 2025-11-05T12:22:10.000+01:00
diff --git a/book/111_debugging.ipynb b/book/111_debugging.ipynb
@@ -17,9 +17,8 @@
    "metadata": {},
    "source": [
     "We are humans, we make mistakes – that's very much true for coding.\n",
-    "Finding errors in our code is fundamental skill to train.\n",
     "\n",
-    "We will cover here a few tools that help us find and fix errors more quickly."
+    "Finding errors in our code is a fundamental skill to train, so here we will cover a few tools that help us find and fix them more quickly."
    ]
   },
   {
@@ -32,7 +31,7 @@
     "A debugger is a device that allows us to interrupt code execution and jump into the execution context in a interactive mode, so that we can inspect and run code to find out what's going on.  \n",
     "We called that to \"set a breakpoint\" or \"set a trace\".\n",
     "\n",
-    "There are a few options to do that natively in python."
+    "There are a few ways to do that in Python."
    ]
   },
   {
@@ -42,35 +41,274 @@
    "source": [
     "## Python debugger, breakpoints\n",
     "The python standard library includes `pdb` module.\n",
+    "If you call its `set_trace` function (or the built-in `breakpoint()`) in your code, you will be put into an interactive session exactly at that point.\n",
+    "\n",
+    "Any of the following options would work:\n",
     "\n",
     "```python\n",
-    "import pdb; pdb.set_trace()\n",
+    "# your code\n",
+    "# your code\n",
+    "import pdb\n",
+    "pdb.set_trace()  # option 1: call the standard library debugger explicitly\n",
+    "# your code\n",
+    "# your code\n",
+    "breakpoint()  # option 2: built-in since Python 3.7, calls pdb.set_trace() by default\n",
+    "# your code\n",
+    "# your code\n",
+    "# your code\n",
     "```"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b40ea608-89a9-4cbd-b07f-4f9a33699d89",
+   "cell_type": "markdown",
+   "id": "1a6d1185-f23d-4fd1-80ee-925836018ef8",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "Once in the debugger, we have several commands that give us granular control over how execution proceeds.\n",
+    "\n",
+    "The most commonly used ones are:\n",
+    "\n",
+    "- `n` (next): Execute the current line and stop at the next line of the current function (step over)\n",
+    "- `s` (step): Execute the current line and stop at the first opportunity, e.g. inside a function called on that line (step into)\n",
+    "- `c` (continue): Resume execution until the next breakpoint is reached"
+   ]
   },
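+  {
+   "cell_type": "markdown",
+   "id": "pdb-scripted-session-sketch",
+   "metadata": {},
+   "source": [
+    "To see these commands in action without typing at a prompt, here is a minimal sketch that feeds them to the debugger from an in-memory buffer. The names `add`, `main`, `commands` and `transcript` are made up for this illustration, and passing `stdin`/`stdout` to `pdb.Pdb` only serves to make the session reproducible; in everyday use you would type the commands interactively.\n",
+    "\n",
+    "```python\n",
+    "import io\n",
+    "import pdb\n",
+    "\n",
+    "\n",
+    "def add(a, b):\n",
+    "    return a + b\n",
+    "\n",
+    "\n",
+    "def main():\n",
+    "    total = add(1, 2)\n",
+    "    return total\n",
+    "\n",
+    "\n",
+    "# Commands we would otherwise type at the (Pdb) prompt, one per line:\n",
+    "# s -> step into add(), n -> move to the next line inside add(),\n",
+    "# p a -> print the value of a, c -> continue until the program ends.\n",
+    "commands = io.StringIO()\n",
+    "for line in ['s', 'n', 'p a', 'c']:\n",
+    "    print(line, file=commands)\n",
+    "commands.seek(0)\n",
+    "\n",
+    "transcript = io.StringIO()  # collects everything the debugger prints\n",
+    "\n",
+    "debugger = pdb.Pdb(stdin=commands, stdout=transcript, readrc=False)\n",
+    "# runcall() calls main() under the debugger and stops at its first line\n",
+    "result = debugger.runcall(main)\n",
+    "\n",
+    "print(transcript.getvalue())\n",
+    "print('result:', result)\n",
+    "```\n",
+    "\n",
+    "The printed transcript should show the debugger stopping at the first line of `main`, entering `add` after `s`, and the value of `a` after `p a`. Replacing the first `s` with `n` would step over the call to `add` instead of entering it."
+   ]
+  },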
   {
    "cell_type": "markdown",
    "id": "4ae14f36-84e6-4def-9240-7a423c587e9c",
    "metadata": {},
    "source": [
-    "## IPython debugger"
+    "## IPython debugger\n",
+    "The standard Python debugger is fine but a bit basic, so the IPython debugger is sometimes a friendlier option.\n",
+    "\n",
+    "It is available through the `ipdb` package, which we need to install first:\n",
+    "\n",
+    "```bash\n",
+    "uv add ipdb\n",
+    "```\n",
+    "Then we can do:\n",
+    "\n",
+    "```python\n",
+    "import ipdb\n",
+    "ipdb.set_trace()\n",
+    "```"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "886ba261-d3a3-43e0-b7e1-3353d0029622",
    "metadata": {},
    "source": [
-    "## Notebook %%debug"
+    "## Notebook %debug\n",
+    "Inside a Jupyter notebook we can jump directly into a debugger after an error has occurred.\n",
+    "\n",
+    "If a cell raises an error, you can run this IPython \"magic command\" in the next cell to inspect the state at the point of failure (post-mortem debugging):\n",
+    "\n",
+    "```python\n",
+    "%debug\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "76c6445c-3cc0-4bc6-b602-97cd5707feee",
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "AssertionError",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
+      "\u001b[31mAssertionError\u001b[39m                            Traceback (most recent call last)",
+      "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[1]\u001b[39m\u001b[32m, line 5\u001b[39m\n\u001b[32m      2\u001b[39m     a = \u001b[32m1\u001b[39m\n\u001b[32m      3\u001b[39m     \u001b[38;5;28;01massert\u001b[39;00m a == \u001b[32m0\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m5\u001b[39m \u001b[43mwrong_func\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n",
+      "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[1]\u001b[39m\u001b[32m, line 3\u001b[39m, in \u001b[36mwrong_func\u001b[39m\u001b[34m()\u001b[39m\n\u001b[32m      1\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mwrong_func\u001b[39m():\n\u001b[32m      2\u001b[39m     a = \u001b[32m1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m3\u001b[39m     \u001b[38;5;28;01massert\u001b[39;00m a == \u001b[32m0\u001b[39m\n",
+      "\u001b[31mAssertionError\u001b[39m: "
+     ]
+    }
+   ],
+   "source": [
+    "def wrong_func():\n",
+    "    a = 1\n",
+    "    assert a == 0\n",
+    "\n",
+    "wrong_func()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "9225666f-7cce-4bec-96ac-c5fc1f167968",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "> \u001b[32m/tmp/ipykernel_81865/3667203143.py\u001b[39m(\u001b[92m3\u001b[39m)\u001b[36mwrong_func\u001b[39m\u001b[34m()\u001b[39m\n",
+      "\u001b[32m      1\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m wrong_func():\n",
+      "\u001b[32m      2\u001b[39m     a = \u001b[32m1\u001b[39m\n",
+      "\u001b[32m----> 3\u001b[39m     \u001b[38;5;28;01massert\u001b[39;00m a == \u001b[32m0\u001b[39m\n",
+      "\u001b[32m      4\u001b[39m \n",
+      "\u001b[32m      5\u001b[39m wrong_func()\n",
+      "\n"
+     ]
+    },
+    {
+     "name": "stdin",
+     "output_type": "stream",
+     "text": [
+      "ipdb>  print(a)\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1\n"
+     ]
+    },
+    {
+     "name": "stdin",
+     "output_type": "stream",
+     "text": [
+      "ipdb>  exit\n"
+     ]
+    }
+   ],
+   "source": [
+    "%debug"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fac0dd96-22c9-40a8-b279-e5043f415fa9",
+   "metadata": {},
+   "source": [
+    "Notice that we typed \"exit\" to get out of the debugger."
+   ]
+  },
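+  {
+   "cell_type": "markdown",
+   "id": "pdb-post-mortem-sketch",
+   "metadata": {},
+   "source": [
+    "Outside a notebook, the same post-mortem idea is available in the standard library. The following is a minimal sketch, assuming we reuse `wrong_func` from above and script the debugger commands (`p a`, then `q`) instead of typing them; `commands` and `transcript` are names made up for this example. In a script or terminal you would more commonly just call `pdb.post_mortem()` inside the `except` block and type at the prompt.\n",
+    "\n",
+    "```python\n",
+    "import io\n",
+    "import pdb\n",
+    "\n",
+    "\n",
+    "def wrong_func():\n",
+    "    a = 1\n",
+    "    assert a == 0\n",
+    "\n",
+    "\n",
+    "# Scripted stand-in for what we would type at the prompt\n",
+    "commands = io.StringIO()\n",
+    "for line in ['p a', 'q']:\n",
+    "    print(line, file=commands)\n",
+    "commands.seek(0)\n",
+    "\n",
+    "transcript = io.StringIO()  # collects everything the debugger prints\n",
+    "\n",
+    "try:\n",
+    "    wrong_func()\n",
+    "except AssertionError as error:\n",
+    "    debugger = pdb.Pdb(stdin=commands, stdout=transcript, readrc=False)\n",
+    "    debugger.reset()\n",
+    "    # Open the debugger in the frame where the exception was raised\n",
+    "    debugger.interaction(None, error.__traceback__)\n",
+    "\n",
+    "print(transcript.getvalue())\n",
+    "```\n",
+    "\n",
+    "As in the `%debug` session above, the debugger lands inside `wrong_func`, so `p a` can show the value that made the assertion fail."
+   ]
+  },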
+  {
+   "cell_type": "markdown",
+   "id": "7995bd99-3ba9-49d5-8c18-2b289f999133",
+   "metadata": {},
+   "source": [
+    "## Exercises\n",
+    "\n",
+    "Here's a piece of code that will fail at run-time:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "abb2830e-8f39-4bad-9f1c-cc0dd697f09c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def f(p):\n",
+    "    assert p == 0\n",
+    "\n",
+    "def main():\n",
+    "    a = 0\n",
+    "    f(a)\n",
+    "    b = 1\n",
+    "    f(b)\n",
+    "    c = 0\n",
+    "    f(c)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8275e207-c5b3-4242-832e-4f98d30f1145",
+   "metadata": {},
+   "source": [
+    "1) Call `main()` to see the error\n",
+    "2) Set a breakpoint inside `main` to use the debugger\n",
+    "3) Step through the code once using `n (next)` and once using `s (step)`\n",
+    "4) Set a second breakpoint inside `main` and run the code again, but this time use `c (continue)`\n",
+    "5) Download [this public dataset](https://github.com/fabridamicelli/ds005588/archive/refs/heads/broken-data.zip) into the folder `/pycourse/data/` (create it if you don't have it yet).\n",
+    "\n",
+    "This dataset was modified and apparently has some problems.\n",
+    "Here's a bit of code to unzip it and read through the files."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "bbe54d2f-f5a4-44f1-a015-8a4d19e1097a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "path = \"/home/fdamicel/projects/pycourse/data/ds005588-broken-data.zip\"\n",
+    "target = \"/home/fdamicel/projects/pycourse/data\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "e7ad9b30-c041-4edc-83d4-2cfdf984dac7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import zipfile"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "123149ab-4410-4444-9348-099cbf067f91",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with zipfile.ZipFile(path) as file:\n",
+    "    file.extractall(target)"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "8d95f7c4-977f-4cc5-8f8e-80e87ec7d40a",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "3f764de2-552c-49ab-ae8c-8d8455d254c3",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "4fbd3efe-bd52-4414-99a5-42058984a3c8",
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "UnicodeDecodeError",
+     "evalue": "'utf-8' codec can't decode byte 0xb9 in position 10: invalid start byte",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
+      "\u001b[31mUnicodeDecodeError\u001b[39m                        Traceback (most recent call last)",
+      "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[15]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m lines = \u001b[43mf\u001b[49m\u001b[43m.\u001b[49m\u001b[43mreadlines\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m<frozen codecs>:325\u001b[39m, in \u001b[36mdecode\u001b[39m\u001b[34m(self, input, final)\u001b[39m\n",
+      "\u001b[31mUnicodeDecodeError\u001b[39m: 'utf-8' codec can't decode byte 0xb9 in position 10: invalid start byte"
+     ]
+    }
+   ],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4877305d-66bb-47ab-95c9-ae9377323bef",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {