# 1.0 (Token sort/set will match and WRatio will pick the best)
```

+## Hybrid Scorer
+
+The `hybrid` scorer allows you to define a custom weighted average of multiple built-in algorithms. This is useful when you have specific data requirements that a single algorithm can't fully capture.
+
+To use it, set `scorer="hybrid"` and provide a `weights` dictionary in `rank` or `batch_match`.
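
The weighted-average idea behind a hybrid scorer can be sketched in plain Python. This is only an illustration and does not call the library: `hybrid_score`, the `SCORERS` registry, and the weight keys `"ratio"` and `"jaccard"` are hypothetical names, and dividing by the sum of the weights is an assumption about how such an average might be normalized.

```python
from difflib import SequenceMatcher
from typing import Mapping


def ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical registry: these names are illustrative, not the library's keys.
SCORERS = {"ratio": ratio, "jaccard": jaccard}


def hybrid_score(a: str, b: str, weights: Mapping[str, float]) -> float:
    """Weighted average of the named scorers (weights need not sum to 1)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(SCORERS[name](a, b) * w for name, w in weights.items()) / total


# Token overlap dominates here, so word order matters less than shared words.
score = hybrid_score(
    "new york city", "city of new york", {"ratio": 0.3, "jaccard": 0.7}
)
```

Raising the weight of a token-based scorer favors strings that share words regardless of order, while raising the character-level weight favors strings that are close character by character.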
When comparing many queries against a common candidate set, `batch_match` is the most efficient choice.

-It provides two major optimizations over calling `rank` in a loop:
-1. **Multi-threading (OpenMP)**: Automatically distributes work across all CPU cores.
-2. **Normalization Caching**: Normalizes the candidate set only once per batch.
+It provides two major optimizations:
+1. **Normalization Caching**: In a standard loop, each candidate is normalized once per query. `batch_match` normalizes each candidate only once for the entire batch.
+2. **Multi-threading (OpenMP)**: The C++ core uses OpenMP to parallelize the comparison loops across all available CPU cores.
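
The normalization-caching point can be illustrated with a small pure-Python sketch. It does not use the library: `normalize`, `naive_loop`, and `cached_batch` are hypothetical helpers, the normalization rules are assumed, and `difflib.SequenceMatcher` stands in for the real scorer. The multi-threading optimization is not modeled here.

```python
import string
from difflib import SequenceMatcher

_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop ASCII punctuation, collapse spaces."""
    return " ".join(text.lower().translate(_PUNCT).split())


def _score(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def naive_loop(queries, candidates):
    """Normalizes every candidate again for every query."""
    results = []
    for query in queries:
        nq = normalize(query)
        scored = [(c, _score(nq, normalize(c))) for c in candidates]
        results.append(sorted(scored, key=lambda item: item[1], reverse=True))
    return results


def cached_batch(queries, candidates):
    """Normalizes each candidate once and reuses the result for all queries."""
    normalized = [normalize(c) for c in candidates]
    results = []
    for query in queries:
        nq = normalize(query)
        scored = [(c, _score(nq, nc)) for c, nc in zip(candidates, normalized)]
        results.append(sorted(scored, key=lambda item: item[1], reverse=True))
    return results


queries = ["apple", "Banana"]
candidates = ["Apple!", "banana", "grape"]
results = cached_batch(queries, candidates)
```

Both helpers return the same rankings. With `Q` queries and `C` candidates, the naive loop performs `Q * C` candidate normalizations, while the cached version performs `C`.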
-# Results is a list where each element matches the corresponding query
-for i, res in enumerate(results):
-    print(f"Results for {queries[i]}: {res}")
+# results is a list of result lists
+# results[0] contains matches for "apple"
+# results[1] contains matches for "banana"
```

+!!! tip "Performance Hint"
+    Parallel execution is automatically triggered when the number of queries is greater than 5. It releases the Python GIL during the intensive matching loops, allowing for true multi-core utilization.
+

## Custom Python Scorers
You can pass a custom Python function as the `scorer` argument.
Individual scorer functions (like `levenshtein`, `jaccard`, etc.) do **not** automatically normalize your strings. They perform a direct comparison. If you need automatic lowercasing or punctuation removal, use `rank` or `batch_match`, or preprocess your strings manually.
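
A minimal sketch of manual preprocessing before a direct comparison follows. The `preprocess` helper is hypothetical, and `levenshtein_distance` is a plain textbook edit distance used as a stand-in; the library's own `levenshtein` may return a different kind of value, such as a similarity score.

```python
import re


def preprocess(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def levenshtein_distance(a: str, b: str) -> int:
    """Dynamic-programming edit distance (stand-in for a library scorer)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Direct comparison: case and punctuation count as differences.
raw = levenshtein_distance("Hello, World!", "hello world")

# After manual preprocessing, the two strings compare as equal.
clean = levenshtein_distance(preprocess("Hello, World!"), preprocess("hello world"))
```

Here `raw` counts two case changes and two punctuation deletions, while `clean` is zero because both inputs reduce to the same string.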
+

## Ranking Candidates

-To find the best matches from a list of strings, use the `rank` function:
+To find the best matches from a list of strings, use the `rank` function. This function *does* provide integrated normalization.