
Commit e757790

Enable support Ollama/gemma4 for local work
2 parents ec22532 + 3e8243e commit e757790

9 files changed

Lines changed: 2552 additions & 1840 deletions


README.md

Lines changed: 82 additions & 40 deletions
@@ -1,7 +1,7 @@
11
<div align="center">
2-
2+
33
<img src="assets/logo.png">
4-
4+
55
</div>
66

77
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/oidlabs-com/Lexoid/blob/main/examples/example_notebook_colab.ipynb)
@@ -35,6 +35,21 @@ OPENAI_API_KEY=""
3535
GOOGLE_API_KEY=""
3636
```
3737

38+
For local inference with Ollama, no API key is required. Install Ollama, pull the target model, and keep the local server running:
39+
40+
```bash
41+
ollama pull gemma4
42+
export OLLAMA_BASE_URL=http://127.0.0.1:11434
43+
ollama list
44+
ollama serve
45+
46+
# Docker
47+
# Reference: https://docs.ollama.com/docker#run-model-locally
48+
# CPU example (will most likely be slower; adjust `OLLAMA_TIMEOUT` as needed)
49+
docker run -d -v ollama:/root/.ollama -p 11434:11434 -e OLLAMA_BASE_URL=0.0.0.0 -e OLLAMA_TIMEOUT=240 --name ollama ollama/ollama
50+
docker exec -it ollama ollama pull gemma4:latest
51+
```
52+
3853
Optionally, to use `Playwright` for retrieving web content (instead of the `requests` library):
3954

4055
```
@@ -43,9 +58,9 @@ playwright install --with-deps --only-shell chromium
4358

4459
### Building `.whl` from source
4560

46-
>[!NOTE]
47-
>Installing the package from within the virtual environment could cause unexpected behavior,
48-
>as Lexoid creates and activates its own environment in order to build the wheel.
61+
> [!NOTE]
62+
> Installing the package from within the virtual environment could cause unexpected behavior,
63+
> as Lexoid creates and activates its own environment in order to build the wheel.
4964
5065
```
5166
make build
@@ -103,48 +118,75 @@ print(parsed_md)
103118
- \*\*kwargs: Additional arguments for the parser.
104119

105120
## Supported API Providers
106-
* Google
107-
* OpenAI
108-
* Hugging Face
109-
* Together AI
110-
* OpenRouter
111-
* Fireworks
121+
122+
- Google
123+
- OpenAI
124+
- Hugging Face
125+
- Together AI
126+
- OpenRouter
127+
- Fireworks
128+
- Ollama
129+
130+
## Ollama Local Parsing
131+
132+
Lexoid supports local `LLM_PARSE` inference through Ollama. The recommended model for this first version is `gemma4:latest`.
133+
134+
```python
135+
from lexoid.api import parse
136+
137+
result = parse(
138+
"path/to/document.pdf",
139+
parser_type="LLM_PARSE",
140+
api_provider="ollama",
141+
model="gemma4:latest",
142+
max_processes=1,
143+
)
144+
145+
print(result["raw"])
146+
```
147+
148+
Notes:
149+
150+
- Ollama uses the default local endpoint `http://localhost:11434` unless `OLLAMA_BASE_URL` is set.
151+
- Lexoid forces `max_processes=1` for Ollama-backed parsing to avoid local multiprocess contention.
152+
- `AUTO` routing does not select Ollama in this first version; choose it explicitly with `api_provider="ollama"`.
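
The endpoint rule in the notes above can be checked before a long parse is started. The following is a minimal sketch, not Lexoid's own code: `resolve_ollama_base_url` and `list_local_models` are hypothetical helper names, and prepending `http://` to a bare `host:port` value is an assumption about how such a value should be read. The only server call used is Ollama's `GET /api/tags`, which lists the models already pulled.

```python
import json
import os
import urllib.request

DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434"


def resolve_ollama_base_url(env=None):
    """Return OLLAMA_BASE_URL if it is set and non-empty, else the default."""
    env = os.environ if env is None else env
    raw = (env.get("OLLAMA_BASE_URL") or "").strip()
    base_url = raw or DEFAULT_OLLAMA_BASE_URL
    # Assumption: a bare host:port value is meant as plain HTTP.
    if "://" not in base_url:
        base_url = "http://" + base_url
    return base_url.rstrip("/")


def list_local_models(base_url, timeout=5.0):
    """Ask the Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]
```

Calling `list_local_models(resolve_ollama_base_url())` before `parse(...)` raises a `urllib.error.URLError` if the server is not running, and the returned list shows whether `gemma4:latest` has been pulled.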
112153

113154
## Benchmark
114155

115156
Results aggregated across 14 documents.
116157

117158
_Note:_ Benchmarks are currently done in the zero-shot setting.
118159

119-
| Rank | Model | SequenceMatcher Similarity | TFIDF Similarity | Time (s) | Cost ($) |
120-
| --- | --- | --- | --- | --- | --- |
121-
| 1 | gemini-3-pro-preview | 0.917 (±0.127) | 0.943 (±0.159) | 46.92 | 0.06288 |
122-
| 2 | AUTO (with auto-selected model) | 0.899 (±0.131) | 0.960 (±0.066) | 21.17 | 0.00066 |
123-
| 3 | AUTO | 0.895 (±0.112) | 0.973 (±0.046) | 9.29 | 0.00063 |
124-
| 4 | gpt-5.2 | 0.890 (±0.193) | 0.975 (±0.036) | 33.32 | 0.03959 |
125-
| 5 | gemini-2.5-flash | 0.886 (±0.164) | 0.986 (±0.027) | 52.55 | 0.01226 |
126-
| 6 | mistral-ocr-latest | 0.882 (±0.106) | 0.932 (±0.091) | 5.75 | 0.00121 |
127-
| 7 | gemini-2.5-pro | 0.876 (±0.195) | 0.976 (±0.049) | 22.65 | 0.02408 |
128-
| 8 | gemini-2.0-flash | 0.875 (±0.148) | 0.977 (±0.037) | 11.96 | 0.00079 |
129-
| 9 | claude-3-5-sonnet-20241022 | 0.858 (±0.184) | 0.930 (±0.098) | 17.32 | 0.01804 |
130-
| 10 | gemini-1.5-flash | 0.842 (±0.214) | 0.969 (±0.037) | 15.58 | 0.00043 |
131-
| 11 | gpt-5-mini | 0.819 (±0.201) | 0.917 (±0.104) | 52.84 | 0.00811 |
132-
| 12 | gpt-5 | 0.807 (±0.215) | 0.919 (±0.088) | 98.12 | 0.05505 |
133-
| 13 | claude-sonnet-4-20250514 | 0.801 (±0.188) | 0.905 (±0.136) | 22.02 | 0.02056 |
134-
| 14 | claude-opus-4-20250514 | 0.789 (±0.220) | 0.886 (±0.148) | 29.55 | 0.09513 |
135-
| 15 | accounts/fireworks/models/llama4-maverick-instruct-basic | 0.772 (±0.203) | 0.930 (±0.117) | 16.02 | 0.00147 |
136-
| 16 | gemini-1.5-pro | 0.767 (±0.309) | 0.865 (±0.230) | 24.77 | 0.01139 |
137-
| 17 | gemini-3-flash-preview | 0.766 (±0.293) | 0.858 (±0.210) | 39.38 | 0.00969 |
138-
| 18 | gpt-4.1-mini | 0.754 (±0.249) | 0.803 (±0.193) | 23.28 | 0.00347 |
139-
| 19 | accounts/fireworks/models/llama4-scout-instruct-basic | 0.754 (±0.243) | 0.942 (±0.063) | 13.36 | 0.00087 |
140-
| 20 | gpt-4o | 0.752 (±0.269) | 0.896 (±0.123) | 28.87 | 0.01469 |
141-
| 21 | gpt-4o-mini | 0.728 (±0.241) | 0.850 (±0.128) | 18.96 | 0.00609 |
142-
| 22 | claude-3-7-sonnet-20250219 | 0.646 (±0.397) | 0.758 (±0.297) | 57.96 | 0.01730 |
143-
| 23 | gpt-4.1 | 0.637 (±0.301) | 0.787 (±0.185) | 35.37 | 0.01498 |
144-
| 24 | google/gemma-3-27b-it | 0.604 (±0.342) | 0.788 (±0.297) | 23.16 | 0.00020 |
145-
| 25 | ds4sd/SmolDocling-256M-preview | 0.603 (±0.292) | 0.705 (±0.262) | 507.74 | 0.00000 |
146-
| 26 | microsoft/phi-4-multimodal-instruct | 0.589 (±0.273) | 0.820 (±0.197) | 14.00 | 0.00045 |
147-
| 27 | qwen/qwen-2.5-vl-7b-instruct | 0.498 (±0.378) | 0.630 (±0.445) | 14.73 | 0.00056 |
160+
| Rank | Model | SequenceMatcher Similarity | TFIDF Similarity | Time (s) | Cost ($) |
161+
| ---- | -------------------------------------------------------- | -------------------------- | ---------------- | -------- | -------- |
162+
| 1 | gemini-3-pro-preview | 0.917 (±0.127) | 0.943 (±0.159) | 46.92 | 0.06288 |
163+
| 2 | AUTO (with auto-selected model) | 0.899 (±0.131) | 0.960 (±0.066) | 21.17 | 0.00066 |
164+
| 3 | AUTO | 0.895 (±0.112) | 0.973 (±0.046) | 9.29 | 0.00063 |
165+
| 4 | gpt-5.2 | 0.890 (±0.193) | 0.975 (±0.036) | 33.32 | 0.03959 |
166+
| 5 | gemini-2.5-flash | 0.886 (±0.164) | 0.986 (±0.027) | 52.55 | 0.01226 |
167+
| 6 | mistral-ocr-latest | 0.882 (±0.106) | 0.932 (±0.091) | 5.75 | 0.00121 |
168+
| 7 | gemini-2.5-pro | 0.876 (±0.195) | 0.976 (±0.049) | 22.65 | 0.02408 |
169+
| 8 | gemini-2.0-flash | 0.875 (±0.148) | 0.977 (±0.037) | 11.96 | 0.00079 |
170+
| 9 | claude-3-5-sonnet-20241022 | 0.858 (±0.184) | 0.930 (±0.098) | 17.32 | 0.01804 |
171+
| 10 | gemini-1.5-flash | 0.842 (±0.214) | 0.969 (±0.037) | 15.58 | 0.00043 |
172+
| 11 | gpt-5-mini | 0.819 (±0.201) | 0.917 (±0.104) | 52.84 | 0.00811 |
173+
| 12 | gpt-5 | 0.807 (±0.215) | 0.919 (±0.088) | 98.12 | 0.05505 |
174+
| 13 | claude-sonnet-4-20250514 | 0.801 (±0.188) | 0.905 (±0.136) | 22.02 | 0.02056 |
175+
| 14 | claude-opus-4-20250514 | 0.789 (±0.220) | 0.886 (±0.148) | 29.55 | 0.09513 |
176+
| 15 | accounts/fireworks/models/llama4-maverick-instruct-basic | 0.772 (±0.203) | 0.930 (±0.117) | 16.02 | 0.00147 |
177+
| 16 | gemini-1.5-pro | 0.767 (±0.309) | 0.865 (±0.230) | 24.77 | 0.01139 |
178+
| 17 | gemini-3-flash-preview | 0.766 (±0.293) | 0.858 (±0.210) | 39.38 | 0.00969 |
179+
| 18 | gpt-4.1-mini | 0.754 (±0.249) | 0.803 (±0.193) | 23.28 | 0.00347 |
180+
| 19 | accounts/fireworks/models/llama4-scout-instruct-basic | 0.754 (±0.243) | 0.942 (±0.063) | 13.36 | 0.00087 |
181+
| 20 | gpt-4o | 0.752 (±0.269) | 0.896 (±0.123) | 28.87 | 0.01469 |
182+
| 21 | gpt-4o-mini | 0.728 (±0.241) | 0.850 (±0.128) | 18.96 | 0.00609 |
183+
| 22 | claude-3-7-sonnet-20250219 | 0.646 (±0.397) | 0.758 (±0.297) | 57.96 | 0.01730 |
184+
| 23 | gpt-4.1 | 0.637 (±0.301) | 0.787 (±0.185) | 35.37 | 0.01498 |
185+
| 24 | google/gemma-3-27b-it | 0.604 (±0.342) | 0.788 (±0.297) | 23.16 | 0.00020 |
186+
| 25 | ds4sd/SmolDocling-256M-preview | 0.603 (±0.292) | 0.705 (±0.262) | 507.74 | 0.00000 |
187+
| 26 | microsoft/phi-4-multimodal-instruct | 0.589 (±0.273) | 0.820 (±0.197) | 14.00 | 0.00045 |
188+
| 27 | qwen/qwen-2.5-vl-7b-instruct | 0.498 (±0.378) | 0.630 (±0.445) | 14.73 | 0.00056 |
148189

149190
## Citation
191+
150192
If you use Lexoid in production or publications, please cite accordingly and acknowledge usage. We appreciate the support 🙏

docs/api.rst

Lines changed: 13 additions & 4 deletions
@@ -33,12 +33,12 @@ parse
3333
* ``api_cost_mapping`` (Union[dict, str]): Dictionary containing API cost details or the string path to a JSON file containing
3434
the cost details. Sample file available at ``tests/api_cost_mapping.json``
3535
* ``router_priority`` (str): What the routing strategy should prioritize. Options are ``"speed"`` and ``"accuracy"``. The router directs a file to either ``"STATIC_PARSE"`` or ``"LLM_PARSE"`` based on its type and the selected priority. If priority is "accuracy", it prefers LLM_PARSE unless the PDF has no images but contains embedded/hidden hyperlinks, in which case it uses ``STATIC_PARSE`` (because LLMs currently fail to parse hidden hyperlinks). If priority is "speed", it uses ``STATIC_PARSE`` for documents without images and ``LLM_PARSE`` for documents with images.
36-
* ``api_provider`` (str): The API provider to use for LLM parsing. Options are ``gemini``, ``openai``, ``claude``, ``huggingface``, ``together``, ``openrouter``, and ``fireworks``. This parameter is only relevant when using LLM parsing.
36+
* ``api_provider`` (str): The API provider to use for LLM parsing. Options are ``gemini``, ``openai``, ``claude``, ``huggingface``, ``together``, ``openrouter``, ``fireworks``, and ``ollama``. This parameter is only relevant when using LLM parsing. For Ollama, select the provider explicitly with ``api_provider="ollama"`` and pass a local model such as ``gemma4:latest``.
3737
* ``return_bboxes`` (bool): Whether to return bounding box information for each text segment. Default is ``False``.
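
The ``router_priority`` rules described above reduce to a small decision function. The sketch below only restates those rules for illustration: ``choose_parser`` and its two boolean inputs are hypothetical names, not part of the Lexoid API, and the real router may consider more than these two document properties.

```python
def choose_parser(priority, has_images, has_hidden_hyperlinks):
    """Pick a parser type from the documented routing rules (illustrative only)."""
    if priority == "accuracy":
        # LLMs currently miss hidden hyperlinks, so image-free documents
        # that contain them go to the static parser.
        if not has_images and has_hidden_hyperlinks:
            return "STATIC_PARSE"
        return "LLM_PARSE"
    if priority == "speed":
        # Only documents with images pay the cost of an LLM call.
        return "LLM_PARSE" if has_images else "STATIC_PARSE"
    raise ValueError(f"unknown router_priority: {priority!r}")
```

Under these rules, hidden hyperlinks only change the outcome when the priority is ``"accuracy"`` and the document has no images.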
3838

3939
Return value format:
4040
A dictionary containing a subset or all of the following keys:
41-
41+
4242
* ``raw``: Full markdown content as string
4343
* ``segments``: List of dictionaries with metadata and content of each segment. For PDFs, a segment denotes a page. For webpages, a segment denotes a section (a heading and its content).
4444
* ``title``: Title of the document
@@ -58,7 +58,7 @@ parse_with_schema
5858

5959
:param path: Path to the PDF file.
6060
:param schema: JSON schema to which the parsed output should conform.
61-
:param api: LLM API provider to use (``"gemini"``, ``"openai"``, ``"claude"``, ``"huggingface"``, ``"together"``, ``"openrouter"``, or ``"fireworks"``).
61+
:param api: LLM API provider to use (``"gemini"``, ``"openai"``, ``"claude"``, ``"huggingface"``, ``"together"``, ``"openrouter"``, ``"fireworks"``, or ``"ollama"``).
6262
:param model: LLM model name.
6363
:param kwargs: Additional keyword arguments passed to the LLM (e.g., ``temperature``, ``max_tokens``).
6464
:return: A list where each element represents a page, which in turn contains a list of dictionaries conforming to the provided schema.
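
Because the return value is nested by page, a caller will often want to flatten it. The sketch below assumes only the shape stated above (a list of pages, each a list of dictionaries). ``flatten_pages`` is a hypothetical helper, and ``example_result`` is made-up data standing in for real output, with invented field names.

```python
def flatten_pages(result):
    """Yield (page_number, record) pairs from a page-wise result list."""
    for page_number, records in enumerate(result, start=1):
        for record in records:
            yield page_number, record


# Hypothetical output for a two-page PDF; the field names are illustrative.
example_result = [
    [{"invoice_id": "A-1", "total": 10.0}],
    [{"invoice_id": "A-2", "total": 12.5}, {"invoice_id": "A-3", "total": 7.0}],
]

rows = list(flatten_pages(example_result))
for page_number, record in rows:
    print(page_number, record)
```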
@@ -105,6 +105,15 @@ LLM-Based Parsing
105105
# Parse using Gemini 1.5 Pro
106106
result = parse("document.pdf", parser_type="LLM_PARSE", model="gemini-1.5-pro")
107107
108+
# Parse using a local Ollama model
109+
result = parse(
110+
"document.pdf",
111+
parser_type="LLM_PARSE",
112+
api_provider="ollama",
113+
model="gemma4:latest",
114+
max_processes=1,
115+
)
116+
108117
109118
Static Parsing
110119
^^^^^^^^^^^^^^
@@ -137,7 +146,7 @@ Parse with Schema
137146
]
138147
139148
pdf_path = "inputs/test_1.pdf"
140-
result = parse_with_schema(path=pdf_path, schema=sample_schema, model="gpt-4o")
149+
result = parse_with_schema(path=pdf_path, schema=sample_schema, model="gpt-4o")
141150
142151
Web Content
143152
^^^^^^^^^^^
