
Commit 81bc537

Built site for gh-pages
1 parent e10a1e2 commit 81bc537

9 files changed: 31,503 additions & 11 deletions


.nojekyll

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-faae111c
+f607bafe

about.html

Lines changed: 768 additions & 0 deletions
Large diffs are not rendered by default.

index.html

Lines changed: 1797 additions & 8 deletions
Large diffs are not rendered by default.

robots.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Sitemap: https://AXLMRIN.github.io/bertopic-tutorial/sitemap.xml

search.json

Lines changed: 7 additions & 0 deletions
@@ -83,6 +83,13 @@
     "section": "Discussion on evaluating a topic model",
     "text": "Discussion on evaluating a topic model\nTopic model evaluation is an active domain of research that goes beyond the scope of this tutorial. We give an overview of the existing methods and of how to quickly tell whether your topic model can be used or needs to be refined.\nIn short: quantitative methods are impractical, and one should focus on the qualitative evaluation.\n\nQualitative evaluation\nThere is no single way to qualitatively evaluate your BERTopic model; however, the point is \n\n\nQuantitative evaluation\nIn this section we introduce different metrics that can be used to evaluate your topic model. We include them mainly to warn you about the complexity of evaluating a topic model: there is no one-size-fits-all solution.\n\nFirst, choosing the coherence score by itself can have a large influence on the difference in performance you will find between models. For example, NPMI and UCI may each lead to quite different values. Second, the coherence score only tells a part of the story. Perhaps your purpose is more classification than having the most coherent words or perhaps you want as diverse topics as possible. These use cases require vastly different evaluation metrics to be used.\nResponse to “How to evaluate the performance of the model?” by Maarten Grootendorst source\n\nThere are two types of metrics that you could use:\n\nCluster metrics, i.e. metrics that focus on how the groups are formed. Many metrics exist, but few fit our situation: unsupervised learning with density-based algorithms. In our experience, optimising these metrics results in sub-optimal solutions, as illustrated below. Read more\nTopic representation metrics, i.e. metrics that focus on how relevant the keywords are. Although some metrics exist, their utility is limited: a good score does not necessarily align with what experts consider good topic models, and they are not good scores to optimise (Stammbach et al., 2023). Read more"
   },
+  {
+    "objectID": "tutorial.html#precompute-your-embeddings",
+    "href": "tutorial.html#precompute-your-embeddings",
+    "title": "BERTopic Tutorial",
+    "section": "Precompute your embeddings",
+    "text": "Precompute your embeddings\nPre-computing the embeddings is good practice: it prevents you from computing them at each run, and it also lets you use a broader spectrum of embedding models that could not necessarily be used within BERTopic15. We retrieve the embeddings and the documents:\nds = load_from_disk(\"path/to/file\")\ndocs = np.array(ds[f\"resumes.en\"]) # Number of documents: 6500\nembeddings = np.array(ds[\"embedding\"]) # shape: (6500, 768)\nThe columns in the dataset are the same as before, with the addition of an embedding column containing the embeddings of the resumes."
+  },
   {
     "objectID": "tutorial.html#save-your-instance-locally",
     "href": "tutorial.html#save-your-instance-locally",

sitemap.xml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+  <url>
+    <loc>https://AXLMRIN.github.io/bertopic-tutorial/techy-notes.html</loc>
+    <lastmod>2025-11-11T13:36:43.412Z</lastmod>
+  </url>
+  <url>
+    <loc>https://AXLMRIN.github.io/bertopic-tutorial/tutorial.html</loc>
+    <lastmod>2025-11-11T13:36:43.477Z</lastmod>
+  </url>
+</urlset>

techy-notes.html

Lines changed: 1 addition & 1 deletion
@@ -433,7 +433,7 @@ <h1>Coherence metrics</h1>
 }
 var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//);
 var mailtoRegex = new RegExp(/^mailto:/);
-var filterRegex = new RegExp('/' + window.location.host + '/');
+var filterRegex = new RegExp("https:\/\/AXLMRIN\.github\.io\/bertopic-tutorial\/");
 var isInternal = (href) => {
   return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href);
 }

tutorial.html

Lines changed: 5 additions & 1 deletion
@@ -209,6 +209,7 @@ <h2 id="toc-title">On this page</h2>
 </ul></li>
 <li><a href="#good-practices" id="toc-good-practices" class="nav-link" data-scroll-target="#good-practices">Good practices</a>
 <ul class="collapse">
+<li><a href="#precompute-your-embeddings" id="toc-precompute-your-embeddings" class="nav-link" data-scroll-target="#precompute-your-embeddings">Precompute your embeddings</a></li>
 <li><a href="#save-your-instance-locally" id="toc-save-your-instance-locally" class="nav-link" data-scroll-target="#save-your-instance-locally">Save your instance locally</a></li>
 </ul></li>
 <li><a href="#bibliography" id="toc-bibliography" class="nav-link" data-scroll-target="#bibliography">Bibliography</a></li>
@@ -1309,11 +1310,14 @@ <h3 class="anchored" data-anchor-id="quantitative-evaluation">Quantitative evalu
 </section>
 <section id="good-practices" class="level1">
 <h1>Good practices</h1>
+<section id="precompute-your-embeddings" class="level2">
+<h2 class="anchored" data-anchor-id="precompute-your-embeddings">Precompute your embeddings</h2>
 <p>Pre-computing the embeddings is good practice: it prevents you from computing them at each run, and it also lets you use a broader spectrum of embedding models that could not necessarily be used within BERTopic<a href="#fn15" class="footnote-ref" id="fnref15" role="doc-noteref"><sup>15</sup></a>. We retrieve the embeddings and the documents:</p>
 <div class="sourceCode" id="cb23"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_from_disk(<span class="st">"path/to/file"</span>)</span>
 <span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a>docs <span class="op">=</span> np.array(ds[<span class="ss">f"resumes.en"</span>]) <span class="co"># Number of documents: 6500</span></span>
 <span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a>embeddings <span class="op">=</span> np.array(ds[<span class="st">"embedding"</span>]) <span class="co"># shape: (6500, 768)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <p>The columns in the dataset are the same as before, with the addition of an <code>embedding</code> column containing the embeddings of the resumes.</p>
+</section>
 <section id="save-your-instance-locally" class="level2">
 <h2 class="anchored" data-anchor-id="save-your-instance-locally">Save your instance locally</h2>
 <p>For reproducibility purposes, and more generally to save your work, BERTopic provides the <code>save</code> method. Two parameters of importance:</p>
@@ -1458,7 +1462,7 @@ <h1>Bibliography</h1>
 }
 var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//);
 var mailtoRegex = new RegExp(/^mailto:/);
-var filterRegex = new RegExp('/' + window.location.host + '/');
+var filterRegex = new RegExp("https:\/\/AXLMRIN\.github\.io\/bertopic-tutorial\/");
 var isInternal = (href) => {
 return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href);
 }
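The "Precompute your embeddings" section added in this commit loads documents and embeddings from disk with `load_from_disk`. A minimal sketch of the precompute-and-persist step could look like the following — the `encode` function is a hash-based stand-in so the sketch runs without downloading model weights, and the file path and shapes are illustrative; in practice the vectors would come from a real embedding model (e.g. a sentence-transformers one):

```python
import os
import tempfile

import numpy as np

def encode(texts, dim=768):
    # Hypothetical stand-in for a real embedding model's .encode():
    # hashes each text into a seed and draws a fixed-size vector,
    # so the sketch is self-contained and deterministic within a run.
    vectors = []
    for text in texts:
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        vectors.append(rng.standard_normal(dim))
    return np.stack(vectors)

docs = ["machine learning engineer", "data scientist", "chef de cuisine"]
embeddings = encode(docs)  # shape: (3, 768)

# Persist the embeddings once...
path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
np.save(path, embeddings)

# ...and on every later run, load instead of re-encoding:
loaded = np.load(path)
```

BERTopic can then reuse the precomputed vectors via `topic_model.fit_transform(docs, embeddings=loaded)`, which is what makes models that cannot run inside BERTopic itself usable for topic modelling.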
