diff --git a/tutorials/intermediate/ndcg-metric-tutorial-updated.ipynb b/tutorials/intermediate/ndcg-metric-tutorial-updated.ipynb new file mode 100644 index 0000000..db6b7f6 --- /dev/null +++ b/tutorials/intermediate/ndcg-metric-tutorial-updated.ipynb @@ -0,0 +1,359 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Understanding NDCG (Normalized Discounted Cumulative Gain)\n", + "\n", + "This tutorial walks through how NDCG is computed from scratch, then verifies the result using PyTorch Ignite's `NDCG` metric.\n", + "\n", + "NDCG is a ranking metric commonly used in information retrieval and recommender systems. Unlike metrics that only check if the right item was retrieved, NDCG rewards models that rank more relevant items **higher** in the list.\n", + "\n", + "By the end of this notebook you will:\n", + "- Understand what ground truth and predictions look like for a ranking problem\n", + "- Compute DCG and IDCG step by step by hand\n", + "- Calculate NDCG manually\n", + "- Verify every number matches the Ignite `NDCG` implementation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# Install dependencies if needed\n", + "# !pip install pytorch-ignite torch\n", + "import torch\n", + "import math" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. The Problem Setup\n", + "\n", + "Imagine a search engine returning 5 documents for a query. 
Each document has a **relevance score** (ground truth) assigned by a human — higher means more relevant:\n", + "\n", + "| Document | Relevance (ground truth) |\n", + "|----------|-------------------------|\n", + "| Doc A | 3 (highly relevant) |\n", + "| Doc B | 2 (relevant) |\n", + "| Doc C | 3 (highly relevant) |\n", + "| Doc D | 0 (not relevant) |\n", + "| Doc E | 1 (slightly relevant) |\n", + "\n", + "The model predicts a **score** for each document. The model then ranks documents by these scores (highest score = rank 1):" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Ground truth relevance: tensor([[3., 2., 3., 0., 1.]])\n", + "Model prediction scores: tensor([[0.1000, 0.4000, 0.3500, 0.8000, 0.1000]])\n" + ] + } + ], + "source": [ + "# Ground truth relevance scores (one query, 5 documents)\n", + "# Shape: (1, 5) — batch of 1 query\n", + "y_true = torch.tensor([[3.0, 2.0, 3.0, 0.0, 1.0]])\n", + "\n", + "# Model prediction scores for each document\n", + "# Higher score = model thinks this doc is more relevant\n", + "y_pred = torch.tensor([[0.1, 0.4, 0.35, 0.8, 0.1]])\n", + "\n", + "print(\"Ground truth relevance:\", y_true)\n", + "print(\"Model prediction scores:\", y_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Step 1 — The DCG Helper Function\n", + "\n", + "DCG measures the quality of a ranking. 
It rewards relevant documents but **discounts** them based on their position — finding a relevant document at rank 1 is worth more than finding it at rank 5.\n", + "\n", + "The formula is:\n", + "\n", + "$$DCG@K = \\sum_{i=1}^{K} \\frac{2^{rel_i} - 1}{\\log_2(i + 1)}$$\n", + "\n", + "Where:\n", + "- $rel_i$ is the relevance of the document at rank $i$\n", + "- The numerator $2^{rel_i} - 1$ is the **gain** (higher relevance = exponentially higher gain)\n", + "- The denominator $\\log_2(i+1)$ is the **discount** (lower rank position = larger discount)\n", + "\n", + "We define a single `compute_dcg` function that accepts both `y_true` (relevance scores) and `scores` (the signal used to rank documents). This lets us reuse the same function for both DCG and IDCG:\n", + "- **DCG**: pass `scores=y_pred` → ranks by model predictions\n", + "- **IDCG**: pass `scores=y_true` → ranks by ground truth (ideal order)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "def compute_dcg(y_true, scores, k):\n", + " \"\"\"Compute DCG@K by ranking y_true according to scores.\n", + "\n", + " Args:\n", + " y_true: 1D tensor of ground-truth relevance values\n", + " scores: 1D tensor used to rank the documents (descending)\n", + " Pass y_pred to get DCG; pass y_true to get IDCG.\n", + " k: number of top positions to consider\n", + "\n", + " Returns:\n", + " DCG@K score (float)\n", + " \"\"\"\n", + " # Rank documents by scores (descending) and reorder relevance accordingly\n", + " ranked_indices = torch.argsort(scores, descending=True)\n", + " ranked_relevance = y_true[ranked_indices]\n", + "\n", + " dcg = 0.0\n", + " print(f\"{'Rank':<6} {'Relevance':<12} {'Gain (2^rel-1)':<18} {'Discount log2(i+1)':<22} {'Contribution'}\")\n", + " print(\"-\" * 72)\n", + " for i in range(k):\n", + " rank = i + 1\n", + " rel = ranked_relevance[i].item()\n", + " gain = (2 ** rel) - 1\n", + " discount = math.log2(rank + 1)\n", + " contribution = 
gain / discount\n", + " dcg += contribution\n", + " print(f\"{rank:<6} {rel:<12.0f} {gain:<18.4f} {discount:<22.4f} {contribution:.4f}\")\n", + " print(\"-\" * 72)\n", + " return dcg" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Step 2 — Compute DCG and IDCG\n", + "\n", + "We call `compute_dcg` twice:\n", + "- `compute_dcg(y_true, y_pred, k)` → ranks by model predictions → **DCG**\n", + "- `compute_dcg(y_true, y_true, k)` → ranks by ground truth → **IDCG**" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "DCG@5 — model's ranking (scores = y_pred):\n", + "\n", + "Rank Relevance Gain (2^rel-1) Discount log2(i+1) Contribution\n", + "------------------------------------------------------------------------\n", + "1 0 0.0000 1.0000 0.0000\n", + "2 2 3.0000 1.5850 1.8928\n", + "3 3 7.0000 2.0000 3.5000\n", + "4 3 7.0000 2.3219 3.0147\n", + "5 1 1.0000 2.5850 0.3869\n", + "------------------------------------------------------------------------\n", + "DCG@5 = 8.7944\n", + "\n", + "IDCG@5 — ideal ranking (scores = y_true):\n", + "\n", + "Rank Relevance Gain (2^rel-1) Discount log2(i+1) Contribution\n", + "------------------------------------------------------------------------\n", + "1 3 7.0000 1.0000 7.0000\n", + "2 3 7.0000 1.5850 4.4165\n", + "3 2 3.0000 2.0000 1.5000\n", + "4 1 1.0000 2.3219 0.4307\n", + "5 0 0.0000 2.5850 0.0000\n", + "------------------------------------------------------------------------\n", + "IDCG@5 = 13.3472\n" + ] + } + ], + "source": [ + "K = 5 # Evaluate top 5 results\n", + "\n", + "# --- DCG: rank by model predictions ---\n", + "print(f\"DCG@{K} — model's ranking (scores = y_pred):\\n\")\n", + "dcg = compute_dcg(y_true[0], y_pred[0], K)\n", + "print(f\"DCG@{K} = {dcg:.4f}\")\n", + "\n", + "print()\n", + "\n", + "# --- IDCG: rank by ground truth (ideal order) ---\n", + "print(f\"IDCG@{K} — ideal 
ranking (scores = y_true):\\n\")\n", + "idcg = compute_dcg(y_true[0], y_true[0], K)\n", + "print(f\"IDCG@{K} = {idcg:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Step 3 — Compute NDCG\n", + "\n", + "NDCG normalizes DCG by IDCG, giving a score between 0 and 1:\n", + "\n", + "$$NDCG@K = \\frac{DCG@K}{IDCG@K}$$\n", + "\n", + "A score of 1.0 means the model ranked everything perfectly. A score close to 0 means the ranking was very poor." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "DCG@5 = 8.7944\n", + "IDCG@5 = 13.3472\n", + "NDCG@5 = DCG / IDCG = 8.7944 / 13.3472 = 0.6589\n", + "\n", + "The model achieved 65.9% of the ideal ranking quality.\n" + ] + } + ], + "source": [ + "ndcg_manual = dcg / idcg\n", + "\n", + "print(f\"DCG@{K} = {dcg:.4f}\")\n", + "print(f\"IDCG@{K} = {idcg:.4f}\")\n", + "print(f\"NDCG@{K} = DCG / IDCG = {dcg:.4f} / {idcg:.4f} = {ndcg_manual:.4f}\")\n", + "print(f\"\\nThe model achieved {ndcg_manual*100:.1f}% of the ideal ranking quality.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Verify with PyTorch Ignite\n", + "\n", + "Now let's confirm our manual calculation matches the Ignite `NDCG` metric exactly." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Manual NDCG@5: 0.6589\n", + "Ignite NDCG@5: 0.6589\n", + "\n", + "✓ Manual calculation matches Ignite implementation perfectly!\n" + ] + } + ], + "source": [ + "# The NDCG metric lives in Ignite's recsys metrics module\n", + "from ignite.metrics.recsys.ndcg import NDCG\n", + "\n", + "# Initialize the NDCG metric with k=5.\n", + "# exponential=True applies the same 2^rel - 1 gain we used by hand;\n", + "# the default (exponential=False) uses the raw relevance as the gain.\n", + "ndcg_metric = NDCG(k=K, exponential=True)\n", + "\n", + "# Reset and update with our data (predictions first, then ground truth)\n", + "ndcg_metric.reset()\n", + "ndcg_metric.update((y_pred, y_true))\n", + "\n", + "# Compute the result (a single float averaged over the batch)\n", + "ignite_result = ndcg_metric.compute()\n", + "\n", + "print(f\"Manual NDCG@{K}: {ndcg_manual:.4f}\")\n", + "print(f\"Ignite NDCG@{K}: {ignite_result:.4f}\")\n", + "print()\n", + "\n", + "# Verify they match\n", + "assert abs(ndcg_manual - ignite_result) < 1e-4, \"Mismatch!\"\n", + "print(\"✓ Manual calculation matches Ignite implementation perfectly!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Understanding the Score\n", + "\n", + "Let's build some intuition by looking at two extreme cases." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Perfect ranking NDCG@5: 1.0000 (should be 1.0)\n", + "Worst ranking NDCG@5: 0.5884 (worst possible for this data)\n", + "\n", + "Our model's ranking NDCG@5: 0.6589 (somewhere in between)\n" + ] + } + ], + "source": [ + "# Case 1: Perfect ranking (model scores follow the true relevance order exactly)\n", + "y_pred_perfect = torch.tensor([[0.9, 0.6, 0.8, 0.1, 0.3]])\n", + "\n", + "ndcg_metric.reset()\n", + "ndcg_metric.update((y_pred_perfect, y_true))\n", + "perfect_score = ndcg_metric.compute()\n", + "print(f\"Perfect ranking NDCG@{K}: {perfect_score:.4f} (should be 1.0)\")\n", + "\n", + "# Case 2: Worst ranking (model orders documents in exactly reversed relevance order).\n", + "# Note: the score stays well above 0 because all 5 documents still fall\n", + "# within the top K=5; the relevant ones are merely pushed down and discounted.\n", + "# NDCG only approaches 0 when relevant items land outside the top-K cutoff.\n", + "y_pred_worst = torch.tensor([[0.1, 0.3, 0.2, 0.9, 0.6]])\n", + "\n", + "ndcg_metric.reset()\n", + "ndcg_metric.update((y_pred_worst, y_true))\n", + "worst_score = ndcg_metric.compute()\n", + "print(f\"Worst ranking NDCG@{K}: {worst_score:.4f} (worst possible for this data)\")\n", + "\n", + "print(f\"\\nOur model's ranking NDCG@{K}: {ndcg_manual:.4f} (somewhere in between)\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}