Comprehensive documentation upgrade: improved docstrings, guides, and syntax highlighting

maskedsyntax · maskedsyntax · commit 8c6ea900a941 · 2026-03-15T16:57:20.000+05:30
diff --git a/docs/guide/advanced.md b/docs/guide/advanced.md
@@ -0,0 +1,67 @@
+# Advanced Scoring and Performance
+
+FuzzyBunny provides several advanced tools for performance and custom matching needs.
+
+## WRatio (Weighted Similarity Ratio)
+
+`WRatio` is the recommended general-purpose matcher. It combines Levenshtein, partial, and token-based ratios using heuristics to produce a more "intuitive" similarity score.
+
+```python
+import fuzzybunny
+
+# Matches well even with different word orders and lengths
+score = fuzzybunny.wratio("fuzzy bunny", "bunny fuzzy!!!")
+# High score: the token-based ratios ignore word order, and WRatio keeps the best of its component scores
+```
+
+## High-Performance Batch Matching
+
+When comparing many queries against a common candidate set, `batch_match` is the most efficient choice.
+
+It provides two major optimizations over calling `rank` in a loop:
+
+1. **Multi-threading (OpenMP)**: when OpenMP support is available, the work is distributed across the available CPU cores.
+2. **Normalization caching**: the candidate set is normalized once per batch instead of once per query.
+
+```python
+import fuzzybunny
+
+queries = ["apple", "banana", "cherry"]
+candidates = ["apple pie", "banana bread", "cherry tart", "apple turnover"]
+
+# Parallel matching
+results = fuzzybunny.batch_match(queries, candidates, top_n=2)
+
+# `results` has one entry per query, in the same order as `queries`
+for query, matches in zip(queries, results):
+    print(f"Results for {query}: {matches}")
+```
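+
+The normalization-caching idea can be illustrated with a sequential, pure-Python sketch. Everything in it is hypothetical: `normalize` assumes "lowercase and strip ASCII punctuation", the scorer is a plain Levenshtein ratio, and there is no OpenMP parallelism. It is not FuzzyBunny's C++ implementation.
+
+```python
+import string
+from typing import List, Tuple
+
+# Hypothetical normalizer: lowercase, then delete ASCII punctuation
+_PUNCT_TABLE = str.maketrans("", "", string.punctuation)
+
+
+def normalize(text: str) -> str:
+    return text.lower().translate(_PUNCT_TABLE)
+
+
+def levenshtein_ratio(a: str, b: str) -> float:
+    # 1 - distance / max_length, computed with two-row dynamic programming
+    if not a and not b:
+        return 1.0  # assumption: two empty strings count as identical
+    prev = list(range(len(b) + 1))
+    for i, ca in enumerate(a, start=1):
+        curr = [i]
+        for j, cb in enumerate(b, start=1):
+            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
+        prev = curr
+    return 1.0 - prev[-1] / max(len(a), len(b))
+
+
+def batch_match_sketch(queries: List[str], candidates: List[str],
+                       top_n: int) -> List[List[Tuple[str, float]]]:
+    # Normalize the shared candidate set once per batch, not once per query
+    normalized = [normalize(c) for c in candidates]
+    results = []
+    for query in queries:
+        q = normalize(query)
+        scored = [(orig, levenshtein_ratio(q, norm))
+                  for orig, norm in zip(candidates, normalized)]
+        scored.sort(key=lambda pair: pair[1], reverse=True)
+        results.append(scored[:top_n])
+    return results
+
+
+queries = ["Apple", "BANANA!"]
+candidates = ["apple pie", "banana bread", "cherry tart"]
+for query, matches in zip(queries, batch_match_sketch(queries, candidates, top_n=1)):
+    print(query, "->", matches[0][0])
+# Apple -> apple pie
+# BANANA! -> banana bread
+```
+
+In FuzzyBunny the matching work is additionally spread across threads with OpenMP; a pure-Python version would gain little from threading because of the GIL.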
+
+## Custom Python Scorers
+
+Any Python callable that takes two strings and returns a float between 0.0 and 1.0 can be passed as the `scorer` argument.
+
+!!! warning "Performance"
+    Custom Python scorers are significantly slower than C++ scorers because they must acquire the Python Global Interpreter Lock (GIL) for every comparison.
+
+```python
+import fuzzybunny
+
+def my_custom_scorer(s1, s2):
+    # Must return a float in [0.0, 1.0]; this toy scorer compares first characters
+    return 1.0 if s1 and s2 and s1[0] == s2[0] else 0.0
+
+results = fuzzybunny.rank("apple", ["apricot", "banana"], scorer=my_custom_scorer)
+```
+
+## Integration with Pandas and NumPy
+
+FuzzyBunny integrates directly with common data science tools:
+
+```python
+import pandas as pd
+import fuzzybunny
+
+df = pd.DataFrame({"names": ["apple pie", "banana bread", "cherry tart"]})
+
+# Use the pandas accessor
+results = df["names"].fuzzy.match("apple")
+```
diff --git a/docs/guide/basic_usage.md b/docs/guide/basic_usage.md
@@ -0,0 +1,63 @@
+# Basic Usage
+
+FuzzyBunny provides a simple and intuitive API for fuzzy string matching.
+
+## Individual Scorers
+
+The library offers several algorithms to compare strings:
+
+```python
+import fuzzybunny
+
+# Levenshtein Distance
+score = fuzzybunny.levenshtein("kitten", "sitting")
+# 0.5714...
+
+# Token Sort Ratio
+# Good for strings with the same words but in different orders
+score = fuzzybunny.token_sort("apple banana", "banana apple")
+# 1.0
+
+# Jaccard Similarity
+# Good for comparing sets of tokens
+score = fuzzybunny.jaccard("apple banana cherry", "banana apple")
+# 0.666...
+```
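+
+To see where these numbers come from, here is a minimal pure-Python sketch of the three formulas. It is an illustration, not FuzzyBunny's C++ code: it assumes whitespace tokenization, skips normalization, and treats two empty inputs as identical. The function names are hypothetical.
+
+```python
+def levenshtein_distance(a: str, b: str) -> int:
+    # Two-row dynamic programming over the edit-distance table
+    prev = list(range(len(b) + 1))
+    for i, ca in enumerate(a, start=1):
+        curr = [i]
+        for j, cb in enumerate(b, start=1):
+            curr.append(min(
+                prev[j] + 1,               # delete ca
+                curr[j - 1] + 1,           # insert cb
+                prev[j - 1] + (ca != cb),  # substitute (free when equal)
+            ))
+        prev = curr
+    return prev[-1]
+
+
+def levenshtein_ratio(a: str, b: str) -> float:
+    # 1 - distance / max_length
+    longest = max(len(a), len(b))
+    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest
+
+
+def token_sort_ratio(a: str, b: str) -> float:
+    # Sort the tokens, re-join them with single spaces, then compare
+    return levenshtein_ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))
+
+
+def jaccard_similarity(a: str, b: str) -> float:
+    # Intersection over union of the unique tokens
+    ta, tb = set(a.split()), set(b.split())
+    return 1.0 if not (ta or tb) else len(ta & tb) / len(ta | tb)
+
+
+print(round(levenshtein_ratio("kitten", "sitting"), 4))                     # 0.5714
+print(token_sort_ratio("apple banana", "banana apple"))                     # 1.0
+print(round(jaccard_similarity("apple banana cherry", "banana apple"), 4))  # 0.6667
+```
+
+For "kitten" and "sitting" the edit distance is 3 and the longer string has 7 characters, so the ratio is 1 - 3/7.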
+
+## Ranking Candidates
+
+To find the best matches from a list of strings, use the `rank` function:
+
+```python
+candidates = ["apple pie", "banana bread", "cherry tart", "apple turnover"]
+
+# Find top 2 matches for "apple"
+results = fuzzybunny.rank("apple", candidates, top_n=2)
+# [('apple pie', 0.5555...), ('apple turnover', 0.3571...)]
+```
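+
+Conceptually, `rank` scores every candidate and keeps the `top_n` best. The sketch below assumes the default scorer is a plain Levenshtein ratio (1 - distance / max_length), which is consistent with the scores in the example; `rank_sketch` is a hypothetical name, not part of FuzzyBunny.
+
+```python
+import heapq
+from typing import Callable, List, Tuple
+
+
+def levenshtein_ratio(a: str, b: str) -> float:
+    # 1 - distance / max_length, via two-row dynamic programming
+    if not a and not b:
+        return 1.0  # assumption: two empty strings count as identical
+    prev = list(range(len(b) + 1))
+    for i, ca in enumerate(a, start=1):
+        curr = [i]
+        for j, cb in enumerate(b, start=1):
+            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
+        prev = curr
+    return 1.0 - prev[-1] / max(len(a), len(b))
+
+
+def rank_sketch(query: str, candidates: List[str], top_n: int,
+                scorer: Callable[[str, str], float] = levenshtein_ratio
+                ) -> List[Tuple[str, float]]:
+    # Score every candidate, then keep the top_n highest-scoring pairs
+    scored = ((candidate, scorer(query, candidate)) for candidate in candidates)
+    return heapq.nlargest(top_n, scored, key=lambda pair: pair[1])
+
+
+candidates = ["apple pie", "banana bread", "cherry tart", "apple turnover"]
+for name, score in rank_sketch("apple", candidates, top_n=2):
+    print(name, round(score, 3))
+# apple pie 0.556
+# apple turnover 0.357
+```
+
+`heapq.nlargest` avoids sorting the whole candidate list when `top_n` is small; `sorted(..., reverse=True)[:top_n]` gives the same result.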
+
+### Partial Matching
+
+To score how well a query matches the best-aligned substring of a candidate, use `mode="partial"`:
+
+```python
+# Standard rank (full match)
+res_full = fuzzybunny.rank("apple", ["apple pie"], mode="full")
+# Score will be ~0.556 (1 - 4/9: four insertions turn "apple" into "apple pie")
+
+# Partial rank (substring match)
+res_partial = fuzzybunny.rank("apple", ["apple pie"], mode="partial")
+# Score will be 1.0 because "apple" is exactly in "apple pie"
+```
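+
+Assuming `mode="partial"` follows the same definition as the `partial_ratio` scorer (if the shorter string has length k, take the best Levenshtein ratio against every length-k substring of the longer string), a brute-force pure-Python sketch looks like this. The helper names are hypothetical, and FuzzyBunny's C++ version may be optimized differently.
+
+```python
+def levenshtein_ratio(a: str, b: str) -> float:
+    # 1 - distance / max_length, via two-row dynamic programming
+    if not a and not b:
+        return 1.0  # assumption: two empty strings count as identical
+    prev = list(range(len(b) + 1))
+    for i, ca in enumerate(a, start=1):
+        curr = [i]
+        for j, cb in enumerate(b, start=1):
+            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
+        prev = curr
+    return 1.0 - prev[-1] / max(len(a), len(b))
+
+
+def partial_ratio_sketch(a: str, b: str) -> float:
+    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
+    k = len(shorter)
+    if k == 0:
+        return 1.0 if not longer else 0.0  # assumption for empty input
+    # Slide a length-k window across the longer string and keep the best ratio
+    return max(levenshtein_ratio(shorter, longer[i:i + k])
+               for i in range(len(longer) - k + 1))
+
+
+print(partial_ratio_sketch("apple", "apple pie"))         # 1.0
+print(round(levenshtein_ratio("apple", "apple pie"), 3))  # 0.556
+```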
+
+## Normalization
+
+By default, FuzzyBunny normalizes strings by lowercasing and removing punctuation. You can disable this by passing `process=False`:
+
+```python
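+# process=True (the default): case and punctuation are normalized away
+fuzzybunny.levenshtein("APPLE", "apple", process=True) # 1.0
+
+# process=False: the raw strings are compared, so case matters
+fuzzybunny.levenshtein("APPLE", "apple", process=False) # 0.0 (every character differs in case)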
+```
diff --git a/docs/guide/installation.md b/docs/guide/installation.md
@@ -0,0 +1,32 @@
+# Installation
+
+FuzzyBunny can be installed from PyPI using `pip`.
+
+```bash
+pip install fuzzybunny
+```
+
+## System Requirements
+
+- **Python**: 3.8 or higher.
+- **Compiler**: C++17 compatible compiler (only if building from source).
+
+## Platform Specifics
+
+### macOS
+
+For OpenMP-based parallel processing, it is recommended to install `libomp` via Homebrew:
+
+```bash
+brew install libomp
+```
+
+FuzzyBunny will automatically detect `libomp` and enable multi-threading for `batch_match`.
+
+### Linux
+
+`libgomp`, GCC's OpenMP runtime, is installed alongside `gcc` on most Linux distributions, so no extra steps are typically required.
+
+### Windows
+
+OpenMP is supported through MSVC's built-in OpenMP implementation (the `/openmp` compiler flag).
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -4,17 +4,60 @@ repo_url: https://github.com/cachevector/fuzzybunny
 theme:
   name: material
   palette:
-    primary: deep purple
-    accent: pink
+    - media: "(prefers-color-scheme: light)"
+      scheme: default
+      primary: deep purple
+      accent: pink
+      toggle:
+        icon: material/brightness-7
+        name: Switch to dark mode
+    - media: "(prefers-color-scheme: dark)"
+      scheme: slate
+      primary: deep purple
+      accent: pink
+      toggle:
+        icon: material/brightness-4
+        name: Switch to light mode
+  features:
+    - navigation.tabs
+    - navigation.sections
+    - toc.follow
+    - content.code.annotate
+    - content.code.copy
 
 plugins:
   - search
   - mkdocstrings:
       handlers:
         python:
           paths: [src]
+          options:
+            show_source: true
+            show_root_heading: true
+            show_category_heading: true
+
+markdown_extensions:
+  - pymdownx.highlight:
+      anchor_linenums: true
+      pygments_lang_class: true
+  - pymdownx.inlinehilite
+  - pymdownx.snippets
+  - pymdownx.superfences:
+      custom_fences:
+        - name: mermaid
+          class: mermaid
+          format: !!python/name:pymdownx.superfences.fence_code_format
+  - admonition
+  - pymdownx.details
+  - pymdownx.emoji:
+      emoji_index: !!python/name:pymdownx.emoji.twemoji
+      emoji_generator: !!python/name:pymdownx.emoji.to_svg
 
 nav:
   - Home: index.md
+  - User Guide:
+    - Installation: guide/installation.md
+    - Basic Usage: guide/basic_usage.md
+    - Advanced Scoring: guide/advanced.md
   - API Reference: api.md
   - Performance: performance.md
diff --git a/src/bindings.cpp b/src/bindings.cpp
@@ -20,31 +20,64 @@ PYBIND11_MODULE(_fuzzybunny, m) {
 
     m.def("levenshtein", [](const std::string& s1, const std::string& s2) {
         return levenshtein_ratio(utf8_to_u32(s1), utf8_to_u32(s2));
-    }, py::arg("s1"), py::arg("s2"), "Calculate Levenshtein ratio (0.0 - 1.0)");
+    }, py::arg("s1"), py::arg("s2"), R"pbdoc(
+        Calculate the Levenshtein similarity ratio between two strings.
+
+        Returns a score between 0.0 and 1.0, where 1.0 is an exact match.
+        The ratio is calculated as: 1 - (distance / max_length).
+    )pbdoc");
 
     m.def("partial_ratio", [](const std::string& s1, const std::string& s2) {
         return partial_ratio(utf8_to_u32(s1), utf8_to_u32(s2));
-    }, py::arg("s1"), py::arg("s2"), "Calculate Partial Levenshtein ratio (0.0 - 1.0)");
+    }, py::arg("s1"), py::arg("s2"), R"pbdoc(
+        Calculate the best substring similarity ratio.
+
+        If the shorter string has length k, this finds the best Levenshtein
+        ratio between the shorter string and any substring of length k
+        in the longer string.
+    )pbdoc");
 
     m.def("jaccard", [](const std::string& s1, const std::string& s2) {
         return jaccard_similarity(utf8_to_u32(s1), utf8_to_u32(s2));
-    }, py::arg("s1"), py::arg("s2"), "Calculate Jaccard similarity (0.0 - 1.0)");
+    }, py::arg("s1"), py::arg("s2"), R"pbdoc(
+        Calculate Jaccard similarity between token sets.
+
+        Tokenizes both strings and calculates the intersection over union
+        of the unique tokens.
+    )pbdoc");
 
     m.def("token_sort", [](const std::string& s1, const std::string& s2) {
         return token_sort_ratio(utf8_to_u32(s1), utf8_to_u32(s2));
-    }, py::arg("s1"), py::arg("s2"), "Calculate Token Sort ratio (0.0 - 1.0)");
+    }, py::arg("s1"), py::arg("s2"), R"pbdoc(
+        Calculate similarity ratio after sorting tokens.
+
+        Tokenizes both strings, sorts the tokens alphabetically, joins them
+        back with spaces, and then calculates the Levenshtein ratio.
+    )pbdoc");
 
     m.def("token_set", [](const std::string& s1, const std::string& s2) {
         return token_set_ratio(utf8_to_u32(s1), utf8_to_u32(s2));
-    }, py::arg("s1"), py::arg("s2"), "Calculate Token Set ratio (0.0 - 1.0)");
+    }, py::arg("s1"), py::arg("s2"), R"pbdoc(
+        Calculate similarity ratio while ignoring duplicates and token order.
+
+        Finds the intersection and differences between token sets and
+        compares them to find the best possible match.
+    )pbdoc");
 
     m.def("qratio", [](const std::string& s1, const std::string& s2) {
         return qratio(utf8_to_u32(s1), utf8_to_u32(s2));
-    }, py::arg("s1"), py::arg("s2"), "Calculate QRatio (0.0 - 1.0)");
+    }, py::arg("s1"), py::arg("s2"), R"pbdoc(
+        Quick ratio: a plain Levenshtein ratio (0.0 - 1.0), mirroring QRatio in other fuzzy-matching libraries.
+    )pbdoc");
 
     m.def("wratio", [](const std::string& s1, const std::string& s2) {
         return wratio(utf8_to_u32(s1), utf8_to_u32(s2));
-    }, py::arg("s1"), py::arg("s2"), "Calculate WRatio (0.0 - 1.0)");
+    }, py::arg("s1"), py::arg("s2"), R"pbdoc(
+        Weighted similarity ratio (recommended for general use).
+        
+        Combines Levenshtein, partial ratio, and token-based ratios using
+        heuristics to provide the most 'intuitive' similarity score.
+    )pbdoc");
 
     m.def("rank", &rank,
           py::arg("query"),
diff --git a/src/fuzzybunny/__init__.py b/src/fuzzybunny/__init__.py