Commit f3182cb

Set up documentation site using MkDocs and mkdocstrings
1 parent 0313e3e commit f3182cb

4 files changed

Lines changed: 105 additions & 0 deletions

docs/api.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# API Reference

## Basic Matching

::: fuzzybunny.levenshtein
::: fuzzybunny.partial_ratio
::: fuzzybunny.jaccard
::: fuzzybunny.token_sort
::: fuzzybunny.token_set
::: fuzzybunny.qratio
::: fuzzybunny.wratio

## Ranking

::: fuzzybunny.rank
::: fuzzybunny.batch_match

## Utilities

::: fuzzybunny.benchmark.benchmark
::: fuzzybunny.benchmark.benchmark_batch

docs/index.md

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
# FuzzyBunny

A high-performance, lightweight Python library for fuzzy string matching and ranking, implemented in C++ with Pybind11.

## Features

- **Blazing Fast**: Optimized C++ core (Myers' Bit-Parallel algorithm) for superior performance.
- **Multiple Scorers**: Support for Levenshtein, Jaccard, Token Sort, Token Set, QRatio, and WRatio.
- **Partial Matching**: Find the best substring matches.
- **Hybrid Scoring**: Combine multiple scorers with custom weights.
- **Python Callbacks**: Use your own Python functions as scorers.
- **Pandas & NumPy Integration**: Native support for Series and Arrays.
- **Parallelized**: Parallel matching for large datasets using OpenMP.
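
To make the token-based scorers and hybrid scoring named in the feature list more concrete, here is a minimal pure-Python sketch. It is not FuzzyBunny's implementation: the helpers `jaccard`, `token_sort`, and `hybrid` below are hypothetical stand-ins written for illustration, and `token_sort` uses the standard library's `difflib.SequenceMatcher` ratio rather than a Levenshtein-based one.

```python
from difflib import SequenceMatcher


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase, whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


def token_sort(a: str, b: str) -> float:
    """Compare the strings after sorting their tokens, so word order is ignored."""
    sa = " ".join(sorted(a.lower().split()))
    sb = " ".join(sorted(b.lower().split()))
    # Stand-in similarity ratio from the standard library (assumption).
    return SequenceMatcher(None, sa, sb).ratio()


def hybrid(a: str, b: str, weighted_scorers) -> float:
    """Weighted average of several scorers; weights need not sum to 1."""
    total = sum(weight for _, weight in weighted_scorers)
    return sum(weight * scorer(a, b) for scorer, weight in weighted_scorers) / total


# Same words in a different order: both scorers ignore token order.
score = hybrid("new york mets", "mets new york", [(jaccard, 0.25), (token_sort, 0.75)])
print(f"Hybrid similarity: {score:.2f}")
```

Dividing by the total weight keeps the combined score in the same 0 to 1 range as the individual scorers, whatever weights are passed.
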
## Quick Start

```python
import fuzzybunny

# Basic matching
score = fuzzybunny.levenshtein("kitten", "sitting")
print(f"Similarity: {score:.2f}")

# Ranking candidates
candidates = ["apple", "apricot", "banana", "cherry"]
results = fuzzybunny.rank("app", candidates, top_n=2)
# [('apple', 0.6), ('apricot', 0.42)]
```

## Installation

```bash
pip install fuzzybunny
```

*Note: On macOS, it is recommended to have `libomp` installed via Homebrew for full parallel processing support.*

docs/performance.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
# Performance Benchmarks

FuzzyBunny is designed for speed, utilizing an optimized C++ core with bit-parallel algorithms and multi-threading support.

## Core Matching Performance

The core Levenshtein implementation uses **Myers' Bit-Parallel algorithm** for strings up to 64 characters. Because such a string fits in a single 64-bit machine word, each character of the other string is processed with a constant number of bitwise operations, giving $O(N)$ matching with very low constant overhead.

For longer strings, it falls back to an optimized $O(NM)$ dynamic programming approach, where $N$ and $M$ are the lengths of the two strings.
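
The bit-parallel idea can be sketched in pure Python. This is an illustrative version of the Myers algorithm in the edit-distance form described by Hyyrö, not FuzzyBunny's C++ code. The function names and the `word_size` parameter are assumptions made for this example. Python integers are unbounded, so the 64-bit word is emulated with an explicit mask, and the sketch returns a raw edit distance rather than a normalized similarity.

```python
def levenshtein_dp(a: str, b: str) -> int:
    """Classic O(N*M) dynamic-programming edit distance (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def levenshtein_bitparallel(pattern: str, text: str, word_size: int = 64) -> int:
    """Edit distance using one bit-vector column update per text character."""
    m = len(pattern)
    if m == 0:
        return len(text)
    if m > word_size:
        # Pattern does not fit in one emulated machine word: use the DP fallback.
        return levenshtein_dp(pattern, text)

    mask = (1 << m) - 1   # keeps every vector to m bits
    last = 1 << (m - 1)   # bit for the bottom row of the DP column

    # peq[ch] has bit i set where pattern[i] == ch.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m  # vertical +1 deltas, vertical -1 deltas, D[m][0]
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)  # horizontal +1 deltas
        mh = pv & xh                   # horizontal -1 deltas
        if ph & last:
            score += 1
        elif mh & last:
            score -= 1
        # Shifting in a 1 encodes the top boundary row D[0][j] = j.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
    return score


print(levenshtein_bitparallel("kitten", "sitting"))
```

The loop body contains no inner loop over the pattern, which is where the linear running time comes from once the pattern fits in a single word.
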
## Multi-threading with OpenMP

`batch_match` is parallelized using **OpenMP**. This allows you to process large numbers of queries across all available CPU cores without being bottlenecked by the Python Global Interpreter Lock (GIL).

## In-place Normalization

`batch_match` normalizes candidates and queries once, up front, rather than on every comparison. This avoids redundant computation and makes bulk operations significantly faster than repeated calls to the individual matchers.
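
The saving from normalizing once can be illustrated with a small pure-Python sketch. Everything here is an assumption made for the example: `normalize`, `match_naive`, and `match_prenormalized` are hypothetical helpers, the normalization rule (lowercasing and collapsing whitespace) is a guess at a typical one, and `difflib.SequenceMatcher` stands in for a real scorer. The call counter only shows how the amount of normalization work differs between the two approaches.

```python
from difflib import SequenceMatcher

calls = {"normalize": 0}  # counts how often normalization runs


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace (illustrative normalization rule)."""
    calls["normalize"] += 1
    return " ".join(s.lower().split())


def score(a: str, b: str) -> float:
    """Stand-in scorer from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def match_naive(queries, candidates):
    # Normalizes both strings again for every (query, candidate) pair.
    return [[score(normalize(q), normalize(c)) for c in candidates] for q in queries]


def match_prenormalized(queries, candidates):
    # Normalizes each string exactly once, then only scores.
    norm_c = [normalize(c) for c in candidates]
    norm_q = [normalize(q) for q in queries]
    return [[score(q, c) for c in norm_c] for q in norm_q]


queries = ["  Apple ", "BANANA", "cherry  pie"]
candidates = ["apple", "Banana ", "Cherry Pie", "apricot"]

calls["normalize"] = 0
naive = match_naive(queries, candidates)
naive_calls = calls["normalize"]  # 2 calls per pair: 2 * 3 * 4

calls["normalize"] = 0
fast = match_prenormalized(queries, candidates)
fast_calls = calls["normalize"]  # one call per string: 3 + 4

print(naive_calls, fast_calls)
```

Both functions return the same score matrix. Only the number of normalization calls changes, from twice the number of pairs to the number of distinct strings.
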
## Benchmarking on Your Machine

You can run the built-in benchmarking tool to see how it performs on your specific hardware and data:

```python
import fuzzybunny

perf = fuzzybunny.benchmark("query", ["candidate"] * 10000)
print(perf)
```

mkdocs.yml

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
site_name: FuzzyBunny
site_url: https://cachevector.github.io/fuzzybunny/
repo_url: https://github.com/cachevector/fuzzybunny
theme:
  name: material
  palette:
    primary: deep purple
    accent: pink

plugins:
  - search
  - mkdocstrings:
      handlers:
        python:
          paths: [src]

nav:
  - Home: index.md
  - API Reference: api.md
  - Performance: performance.md
