Sync documentation with current API and features

maskedsyntax · maskedsyntax · commit a1b0efd1f08d · 2026-03-16T18:08:07.000+05:30
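
Reviewer note: the Levenshtein example scores in docs/guide/basic_usage.md (0.5714... for "kitten"/"sitting", 0.555... for "apple"/"apple pie", 0.0 for "APPLE!"/"apple" with `process=False`) are consistent with a length-normalized edit distance, 1 - d / max(len(a), len(b)). The sketch below is a pure-Python cross-check under that assumption. It is not fuzzybunny's implementation, and `levenshtein_distance` and `levenshtein_ratio` are hypothetical helper names used only for this check.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    # Keep the inner row as short as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


print(round(levenshtein_ratio("kitten", "sitting"), 4))   # 0.5714
print(round(levenshtein_ratio("apple", "apple pie"), 4))  # 0.5556
print(levenshtein_ratio("APPLE!", "apple"))               # 0.0
```

If the library normalizes differently (for example by the sum of both lengths), the documented values would not match this check.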
diff --git a/README.md b/README.md
@@ -17,14 +17,14 @@
 ## Features
 
 - **Blazing Fast**: C++ core for 2-5x speed improvement over pure Python alternatives.
-- **Multiple Scorers**: Support for Levenshtein, Jaccard, and Token Sort ratios.
-- **Partial Matching**: Find the best substring matches.
-- **Hybrid Scoring**: Combine multiple scorers with custom weights.
-- **Pandas & NumPy Integration**: Native support for Series and Arrays.
+- **Multiple Scorers**: Support for Levenshtein, Jaccard, Token Sort, Token Set, QRatio, WRatio, and Partial Ratio.
+- **Partial Matching**: Find the best substring matches using `mode="partial"`.
+- **Hybrid Scoring**: Combine multiple scorers with custom weights for complex matching tasks.
+- **Pandas & NumPy Integration**: Native support for Series and Arrays via a dedicated accessor.
 - **Batch Processing**: Parallelized matching for large datasets using OpenMP.
-- **Unicode Support**: Handles international characters and normalization.
-- **Benchmarking Tools**: Built-in utilities to measure performance.
-- **Thread Safe**: Releases the GIL in C++ for better multi-threading performance.
+- **Unicode Support**: Handles international characters and applies basic normalization.
+- **Benchmarking Tools**: Built-in utilities to measure and compare performance.
+- **Thread Safe**: Releases the GIL inside the C++ core so other Python threads can run during matching.
 - **Type Safe**: Includes PEP 561 type stubs for full IDE and MyPy support.
 
 ## Installation
@@ -51,7 +51,7 @@ results = fuzzybunny.rank("app", candidates, top_n=2)
 ## Advanced Usage
 
 ### Hybrid Scorer
-Combine different algorithms to get better results:
+Combine different algorithms using custom weights:
 
 ```python
 results = fuzzybunny.rank(
@@ -62,8 +62,19 @@ results = fuzzybunny.rank(
 )
 ```
 
+### Partial Matching
+Find the best substring match:
+
+```python
+score = fuzzybunny.partial_ratio("apple", "apple pie") # 1.0
+
+# Using rank with partial mode
+results = fuzzybunny.rank("apple", ["apple pie", "banana"], mode="partial")
+# [('apple pie', 1.0), ('banana', 0.18)]
+```
+
 ### Pandas Integration
-Use the specialized accessor for clean code:
+Use the specialized `fuzzy` accessor:
 
 ```python
 import pandas as pd
diff --git a/docs/guide/advanced.md b/docs/guide/advanced.md
@@ -14,13 +14,36 @@ score = fuzzybunny.wratio("fuzzy bunny", "bunny fuzzy!!!")
 # 1.0 (Token sort/set will match and WRatio will pick the best)
 ```
 
+## Hybrid Scorer
+
+The `hybrid` scorer lets you define a custom weighted average of multiple built-in algorithms. This is useful when no single algorithm captures your data well, for example when both word order and character-level typos matter.
+
+To use it, set `scorer="hybrid"` and provide a `weights` dictionary in `rank` or `batch_match`.
+
+```python
+import fuzzybunny
+
+results = fuzzybunny.rank(
+    "fuzzy bunny", 
+    ["bunny fuzzy", "the fuzzy bunny", "rabbit"],
+    scorer="hybrid",
+    weights={
+        "levenshtein": 0.2,
+        "token_sort": 0.5,
+        "token_set": 0.3
+    }
+)
+```
+
+**Supported weight keys:** `levenshtein`, `jaccard`, `token_sort`, `token_set`, `qratio`, `wratio`.
+
 ## High-Performance Batch Matching
 
 When comparing many queries against a common candidate set, `batch_match` is the most efficient choice.
 
-It provides two major optimizations over calling `rank` in a loop:
-1.  **Multi-threading (OpenMP)**: Automatically distributes work across all CPU cores.
-2.  **Normalization Caching**: Normalizes the candidate set only once per batch.
+It provides two major optimizations:
+1.  **Normalization Caching**: In a standard loop, each candidate is normalized once per query. `batch_match` normalizes each candidate only once for the entire batch.
+2.  **Multi-threading (OpenMP)**: The C++ core uses OpenMP to parallelize the comparison loops across all available CPU cores.
 
 ```python
 import fuzzybunny
@@ -31,11 +54,14 @@ candidates = ["apple pie", "banana bread", "cherry tart", "apple turnover"]
 # Parallel matching
 results = fuzzybunny.batch_match(queries, candidates, top_n=2)
 
-# Results is a list where each element matches the corresponding query
-for i, res in enumerate(results):
-    print(f"Results for {queries[i]}: {res}")
+# results is a list of result lists
+# results[0] contains matches for "apple"
+# results[1] contains matches for "banana"
 ```
 
+!!! tip "Performance Hint"
+    Parallel execution is triggered automatically when there are more than 5 queries. The C++ core releases the Python GIL during the matching loops, so the work can run on multiple cores.
+
 ## Custom Python Scorers
 
 You can pass a custom Python function as the `scorer` argument. 
diff --git a/docs/guide/basic_usage.md b/docs/guide/basic_usage.md
@@ -4,60 +4,77 @@ FuzzyBunny provides a simple and intuitive API for fuzzy string matching.
 
 ## Individual Scorers
 
-The library offers several algorithms to compare strings:
+The library offers several algorithms to compare two strings directly. These functions expect strings as input and return a score between 0.0 and 1.0.
 
 ```python
 import fuzzybunny
 
-# Levenshtein Distance
-score = fuzzybunny.levenshtein("kitten", "sitting")
+# Levenshtein Ratio (similarity derived from edit distance)
+fuzzybunny.levenshtein("kitten", "sitting")
 # 0.5714...
 
-# Token Sort Ratio
-# Good for strings with the same words but in different orders
-score = fuzzybunny.token_sort("apple banana", "banana apple")
+# Partial Ratio (best substring match)
+fuzzybunny.partial_ratio("apple", "apple pie")
 # 1.0
 
-# Jaccard Similarity
-# Good for comparing sets of tokens
-score = fuzzybunny.jaccard("apple banana cherry", "banana apple")
+# Token Sort Ratio (compares words after sorting them alphabetically)
+fuzzybunny.token_sort("apple banana", "banana apple")
+# 1.0
+
+# Token Set Ratio (set intersection/difference)
+# Good for strings with extra words or duplicates
+fuzzybunny.token_set("apple banana", "apple banana banana")
+# 1.0
+
+# Jaccard Similarity (intersection over union)
+fuzzybunny.jaccard("apple banana cherry", "banana apple")
 # 0.666...
+
+# WRatio (Weighted Ratio - Recommended for general use)
+fuzzybunny.wratio("fuzzy bunny", "bunny fuzzy!!!")
+# 1.0
 ```
 
+!!! info "Direct vs. Ranked Matching"
+    Individual scorer functions (like `levenshtein`, `jaccard`, etc.) do **not** automatically normalize your strings. They perform a direct comparison. If you need automatic lowercasing or punctuation removal, use `rank` or `batch_match`, or preprocess your strings manually.
+
 ## Ranking Candidates
 
-To find the best matches from a list of strings, use the `rank` function:
+To find the best matches from a list of strings, use the `rank` function. Unlike the individual scorers, `rank` normalizes its inputs by default.
 
 ```python
 candidates = ["apple pie", "banana bread", "cherry tart", "apple turnover"]
 
 # Find top 2 matches for "apple"
+# By default, rank uses scorer='levenshtein' and process=True
 results = fuzzybunny.rank("apple", candidates, top_n=2)
 # [('apple pie', 0.55), ('apple turnover', 0.35)]
 ```
 
 ### Partial Matching
 
-If you want to find if a query exists as a substring of a candidate, use `mode="partial"`:
+To check whether a query appears as a substring of a candidate, use `mode="partial"`. In `rank`, this mode applies the same logic as `partial_ratio`.
 
 ```python
 # Standard rank (full match)
 res_full = fuzzybunny.rank("apple", ["apple pie"], mode="full")
-# Score will be ~0.55
+# Score: 0.555...
 
 # Partial rank (substring match)
 res_partial = fuzzybunny.rank("apple", ["apple pie"], mode="partial")
-# Score will be 1.0 because "apple" is exactly in "apple pie"
+# Score: 1.0
 ```
 
 ## Normalization
 
-By default, FuzzyBunny normalizes strings by lowercasing and removing punctuation. You can disable this by passing `process=False`:
+By default, `rank` and `batch_match` normalize strings by lowercasing and removing punctuation. You can disable this by passing `process=False`:
 
 ```python
-# Default (case-insensitive)
-fuzzybunny.levenshtein("APPLE", "apple", process=True) # 1.0
+# Default (case-insensitive & punctuation-agnostic)
+fuzzybunny.rank("APPLE!", ["apple"], process=True) 
+# [('apple', 1.0)]
 
-# Case-sensitive
-fuzzybunny.levenshtein("APPLE", "apple", process=False) # < 1.0
+# Case-sensitive, punctuation preserved
+fuzzybunny.rank("APPLE!", ["apple"], process=False) 
+# [('apple', 0.0)]
 ```
diff --git a/docs/index.md b/docs/index.md
@@ -5,12 +5,13 @@ A high-performance, lightweight Python library for fuzzy string matching and ran
 ## Features
 
 - **Blazing Fast**: Optimized C++ core (Myers' Bit-Parallel algorithm) for superior performance.
-- **Multiple Scorers**: Support for Levenshtein, Jaccard, Token Sort, Token Set, QRatio, and WRatio.
-- **Partial Matching**: Find the best substring matches.
+- **Multiple Scorers**: Support for Levenshtein, Jaccard, Token Sort, Token Set, QRatio, WRatio, and Partial Ratio.
+- **Partial Matching**: Find the best substring matches using `mode="partial"`.
 - **Hybrid Scoring**: Combine multiple scorers with custom weights.
 - **Python Callbacks**: Use your own Python functions as scorers.
-- **Pandas & NumPy Integration**: Native support for Series and Arrays.
+- **Pandas & NumPy Integration**: Native support for Series and Arrays via a dedicated accessor.
 - **Parallelized**: Parallel matching for large datasets using OpenMP.
+- **Unicode Support**: Handles international characters and applies basic normalization.
 
 ## Quick Start