Final changes vignette

julianwuth · julianwuth · commit e12d9c91fe70 · 2026-05-31T22:12:29.000+02:00
diff --git a/.gitignore b/.gitignore
@@ -7,6 +7,9 @@
 .pytest_cache/
 .coverage
 
+# Jupyter notebook checkpoints
+.ipynb_checkpoints/
+
 # CSV track lists: ignore all by default so a real personal Spotify/Exportify
 # export can't be committed by accident. The synthetic test fixture is the
 # only CSV that belongs in the repo, so it is explicitly re-included.
diff --git a/docs/vignette.ipynb b/docs/vignette.ipynb
@@ -2,6 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
+   "id": "07fbefe9",
    "metadata": {},
    "source": [
     "# `playlistsmith`"
@@ -44,7 +45,7 @@
    "id": "c718ee1c",
    "metadata": {},
    "source": [
-    "## 1. Obtain a .csv of you playlist\n",
+    "## 1. Obtain a .csv of your playlist\n",
     "The package expects a .csv in the format that [Exportify](https://exportify.app/) provides. You can log in with your Spotify account to see a list of your playlists. Simply press the \"Export\" button on the right, and you are ready to go. "
    ]
   },
@@ -77,14 +78,14 @@
     "## 3. Upload the .csv\n",
     "After starting up the GUI, you'll find yourself looking at the screen below. Note that you can ignore the sidebar as only one feature extraction method is implemented at the moment.\n",
     "\n",
-    "![csv_upload](./gui_screenshots/csv_upload.png){width=800px}\n",
+    "<img src=\"./gui_screenshots/csv_upload.png\" alt=\"csv_upload\" width=\"800\">\n",
     "\n",
-    "If you press the \"Upload\" button (indicated by the red box in the screenshot), you can select a .csv from you local drive to upload. Choose the Exportify generated file here.\n",
+    "If you press the \"Upload\" button (indicated by the red box in the screenshot), you can select a .csv from your local drive to upload. Choose the Exportify-generated file here.\n",
     "\n",
     "### 3.1 Running example\n",
     "For illustration, I created a mix of classical and rock music. After uploading the data, you can inspect the first 20 songs using the \"Preview tracklist\" expander. The first songs of the running example are shown below.\n",
     "\n",
-    "![tracklist_preview](./gui_screenshots/tracklist_preview.png){width=800px}"
+    "<img src=\"./gui_screenshots/tracklist_preview.png\" alt=\"tracklist_preview\" width=\"800\">"
    ]
   },
   {
@@ -95,12 +96,12 @@
     "## 4. Extract features\n",
     "Scrolling down will reveal an expander with a list of features and an \"Extract features\" button. This will query the [ReccoBeats](https://reccobeats.com/) API for precomputed features.\n",
     "\n",
-    "![feature_extraction](./gui_screenshots/feature_extraction.png){width=800px}\n",
+    "<img src=\"./gui_screenshots/feature_extraction.png\" alt=\"feature_extraction\" width=\"800\">\n",
     "\n",
     "### 4.1 Running example\n",
     "Extracting features from the example playlist produces the following coverage report:\n",
     "\n",
-    "![coverage_report](./gui_screenshots/coverage_report.png){width=800px}\n",
+    "<img src=\"./gui_screenshots/coverage_report.png\" alt=\"coverage_report\" width=\"800\">\n",
     "\n",
     "Songs included in the table are not covered by ReccoBeats and therefore excluded from the analysis."
    ]
@@ -113,19 +114,21 @@
     "## 5. Clustering\n",
     "Now, this is where the magic happens: running a clustering algorithm on the playlist to discover sub-collections of songs. `playlistsmith` offers three algorithms: a Gaussian Mixture Model (GMM), K-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). The algorithms are described in more detail in the info strings accessible by hovering over the \"?\" icon. In short, GMM is suitable for most applications, K-means can be run on small samples or as a sanity check alongside the GMM, and HDBSCAN is useful if you suspect that your playlist includes songs so unique that they would not fit into any sub-playlist. \n",
     "\n",
-    "![algorithms](./gui_screenshots/algorithms.png){width=800px}\n",
+    "<img src=\"./gui_screenshots/algorithms.png\" alt=\"algorithms\" width=\"800\">\n",
     "\n",
     "Before we can start clustering, we should take a minute to think about hyperparameters, the settings that influence the outcome of each algorithm (more information accessible by hovering over the \"?\"). As we are going to run the GMM below, let's quickly discuss the choices made:\n",
     "\n",
     "1. `min_playlist_size`: What is the minimum number of songs you would like to have in each sub-playlist? Smaller clusters will be assigned to an unclassified class. We stick with 5 (the default).\n",
     "2. `max_playlist_share`: The maximum proportion of uploaded tracks a sub-playlist may contain before a warning is returned. As we know that two playlists underlie our running example and that they are unequal in size, the default (0.5) seems too low. We opt for 0.65.\n",
-    "3. `k range`: The range of cluster numbers to test and compare. The algorithm will be fit to discover each number of clusters within this range and the best fit will be retained. As we have a clear idea about how many clusters we would like to find, we select 2 by just dragging the knobs on top of each other[^1].\n",
+    "3. `k range`: The range of cluster numbers to test and compare. The algorithm is fit once for each number of clusters in this range, and the best fit is retained. As we have a clear idea of how many clusters we would like to find, we select 2 by dragging the two knobs on top of each other<sup id=\"ref1-c7\"><a href=\"#fn1-c7\">1</a></sup>.\n",
     "\n",
-    "![hyperparameters_cluster](./gui_screenshots/hyperparameters_cluster.png){width=800px}\n",
+    "<img src=\"./gui_screenshots/hyperparameters_cluster.png\" alt=\"hyperparameters_cluster\" width=\"800\">\n",
     "\n",
     "Press \"Cluster\" to run the algorithm.\n",
     "\n",
-    "[^1]: You can also try to fit a range of clusters, but the GMM will still discover that two clusters fit the data best."
+    "\n",
+    "---\n",
+    "<p id=\"fn1-c7\" style=\"font-size:0.85em\"><sup>1</sup> You can also try to fit a range of clusters, but the GMM will still discover that two clusters fit the data best. <a href=\"#ref1-c7\">↩</a></p>\n"
    ]
   },
   {
@@ -137,11 +140,11 @@
     "### 6.1 Running example\n",
     "If you select a range for k, the number of found clusters is determined based on fit indices. The \"Per-cluster z-profile heatmap\" indicates how songs in each cluster score on the extracted features on average.\n",
     "\n",
-    "![heatmap](./gui_screenshots/heatmap.png){width=800px}\n",
+    "<img src=\"./gui_screenshots/heatmap.png\" alt=\"heatmap\" width=\"800\">\n",
     "\n",
     "In the present example, Cluster 1 (bottom row) seems to score high on energy and loudness while Cluster 0 scores low on both of these features. This provides some evidence that Cluster 1 contains the rock songs whereas Cluster 0 is made up of classical songs. This is supported by the visualisation output in the GUI. Hovering over the data points shows the song title as well as artist names.\n",
     "\n",
-    "![cluster_visualisation](./gui_screenshots/cluster_visualisation.png){width=800px}\n",
+    "<img src=\"./gui_screenshots/cluster_visualisation.png\" alt=\"cluster_visualisation\" width=\"800\">\n",
     "\n",
     "A quick scan through both clusters confirms that Cluster 0 consists of classical songs and Cluster 1 of rock songs."
    ]
@@ -155,12 +158,14 @@
     "The final stage in the GUI is exporting the created playlists as .csv files. \n",
     "\n",
     "### 7.1 Running example\n",
-    "![export](./gui_screenshots/export.png){width=800px}\n",
+    "<img src=\"./gui_screenshots/export.png\" alt=\"export\" width=\"800\">\n",
+    "\n",
+    "You can specify the output directory<sup id=\"ref1-c9\"><a href=\"#fn1-c9\">1</a></sup>, choose to also export the unclassified songs<sup id=\"ref2-c9\"><a href=\"#fn2-c9\">2</a></sup>, or write a combined file that includes the cluster assignment in a separate column. Additionally, you can rename the .csv files (last column in the table). Ready to export? Simply click \"Write CSVs\".\n",
     "\n",
-    "You can specify the output directory[^1], choose to also export the unclassified songs[^2], or write a combined file that includes the cluster assignment in a separate column. Additionally, you can rename the .csv files (last column in the table). Ready to export? Simply click \"Write CSVs\".\n",
     "\n",
-    "[^1]: If the directory does not exist, it will be created.\n",
-    "[^2]: This is common when using HDBSCAN or when a playlist smaller than `min_playlist_size` is discovered.\n"
+    "---\n",
+    "<p id=\"fn1-c9\" style=\"font-size:0.85em\"><sup>1</sup> If the directory does not exist, it will be created. <a href=\"#ref1-c9\">↩</a></p>\n",
+    "<p id=\"fn2-c9\" style=\"font-size:0.85em\"><sup>2</sup> This is common when using HDBSCAN or when a playlist smaller than <code>min_playlist_size</code> is discovered. <a href=\"#ref2-c9\">↩</a></p>\n"
    ]
   },
   {
@@ -175,7 +180,7 @@
     "3. Open the Spotify desktop app and create a new playlist. \n",
     "4. Press CMD + V/CTRL + V to paste the songs into the playlist. \n",
     "\n",
-    "And voilà, you have split up your tediously long playlist into smaller chunks. Hopefully, you'll find the right song for your mood faster next time!\n",
+    "And voilà, you have split up your tediously long playlist into smaller chunks. Hopefully, you'll find the right song for your mood faster next time! Want to split up another playlist? Press \"Reset session\" in the sidebar, and you can start fresh.\n",
     "\n",
     "If you are only interested in using the GUI, you should be ready to go now. For users who prefer the pure Python experience and who are interested in the internal workings of `playlistsmith`, I'll quickly mirror the above-mentioned steps using the underlying package."
    ]
@@ -189,7 +194,7 @@
     "\n",
     "Before turning to the Python usage, here is how the GUI steps you just saw map onto the underlying functions. The right column (*User*) is exactly the click-path from sections 1–8. The left column (*Software*) is the code each click triggers. The dashed box marks everything that a single `ps.cluster(...)` call wraps: preprocessing, the fit, ordering, the small-cluster collapse and interpretation all happen inside that one function. The pure-Python usage in the next section walks the same software path, just by calling `playlistsmith`'s publicly exported functions.\n",
     "\n",
-    "![mermaid_diagram](./gui_screenshots/mermaid_diagram.png){width=1000px}"
+    "<img src=\"./gui_screenshots/mermaid_diagram.png\" alt=\"mermaid_diagram\" width=\"1000\">"
    ]
   },
   {