@@ -46,10 +46,10 @@ from `asyncio`.
46 46 ```python
47 47 from gliner.serve import GLiNERClient
48 48
49- client = GLiNERClient()
49+ client = GLiNERClient()  # defaults to http://localhost:8000/gliner
50 50 result = client.predict(
51 51     "John works at Google in Mountain View",
52-     labels=["person", "organization", "location"]
52+     labels=["person", "organization", "location"],
53 53 )
54 54 print(result)
55 55 # {'entities': [
@@ -59,7 +59,35 @@ print(result)
59 59 # ]}
60 60 ```
61 61
62- **HTTP request:**
62+ `GLiNERClient` is a pure HTTP client built on the Python standard library.
63+ It does **not** import `ray` and does **not** join the Ray cluster, so it
64+ runs from any Python process (including environments where `ray` is not
65+ installed). Construct it with a custom URL/prefix or timeout as needed:
66+
67+ ```python
68+ client = GLiNERClient(
69+     base_url="http://gliner.internal:8000",
70+     route_prefix="/gliner",
71+     timeout=30.0,
72+     max_concurrency=32,  # bound on concurrent in-flight HTTP requests
73+ )
74+ ```
75+
76+ Passing a list of texts preserves server-side dynamic batching: each text
77+ is dispatched concurrently as its own HTTP request (threads for `predict`,
78+ `asyncio.gather` for `predict_async`), so Ray Serve's `@serve.batch` can
79+ coalesce them into a single forward pass:
80+
81+ ```python
82+ outputs = client.predict(
83+     ["John works at Google", "Paris is in France"],
84+     labels=["person", "organization", "location"],
85+ )  # → list[dict], one per input text
86+ ```
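
The per-text fan-out described above can be sketched with the standard library alone. This is an illustration, not the client's actual code: `fan_out`, `send_one`, and `fake_send` are hypothetical names, and the real `GLiNERClient` internals may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fan_out(
    texts: List[str],
    send_one: Callable[[str], Dict],
    max_concurrency: int = 32,
) -> List[Dict]:
    """Send each text as its own request and return results in input order."""
    if not texts:
        return []
    workers = min(max_concurrency, len(texts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which request finishes first.
        return list(pool.map(send_one, texts))


def fake_send(text: str) -> Dict:
    # Stand-in for one HTTP POST; a real sender would call the server here.
    return {"text": text, "entities": []}


outputs = fan_out(["John works at Google", "Paris is in France"], fake_send)
```

Because the requests are in flight at the same time, a server that batches dynamically gets the chance to group them; a sequential loop would hand it one request at a time.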
87+
88+ Network or server errors surface as `gliner.serve.client.GLiNERClientError`.
89+
90+ **HTTP request (no client library):**
63 91 ```bash
64 92 curl -X POST http://localhost:8000/gliner \
65 93   -H "Content-Type: application/json" \
@@ -128,16 +156,100 @@ result = ref.result()
128 156
129 157 ## Relation Extraction
130 158
131- For models that support relation extraction:
159+ GLiNER-RelEx models (e.g. `knowledgator/gliner-relex-large-v0.5`,
160+ `knowledgator/gliner-token-relex-v1.0`) jointly extract entities and the
161+ relations between them in a single forward pass. The server auto-detects
162+ relation support by inspecting `model.config.model_type` and enables the
163+ relex code path when it contains `"relex"`, so no extra flag is needed.
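
As a rough sketch of the detection rule just described (the function name and the example type strings are made up for illustration, and the server's real check may normalize case or differ in other ways):

```python
def supports_relations(model_type: str) -> bool:
    """Return True when the model type string contains 'relex'."""
    # Plain substring test, as the text describes; case handling is an assumption.
    return "relex" in model_type


# Hypothetical model_type values, not taken from any real config.
relex_enabled = supports_relations("example-relex-type")
ner_only = supports_relations("example-ner-type")
```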
164+
165+ ### Start a RelEx server
166+
167+ ```bash
168+ python -m gliner.serve \
169+     --model knowledgator/gliner-relex-large-v1.0 \
170+     --dtype bfloat16 \
171+     --max-batch-size 16
172+ ```
173+
174+ ### Predict via the client
132 175
133 176 ```python
177+ from gliner.serve import GLiNERClient
178+
179+ client = GLiNERClient()  # http://localhost:8000/gliner
180+
181+ text = "Bill Gates founded Microsoft in 1975. The company is headquartered in Redmond."
182+
134 183 result = client.predict(
135-     "John works at Google",
136-     labels=["person", "organization"],
137-     relations=["works_at", "founded_by"]
184+     text,
185+     labels=["person", "organization", "date", "location"],
186+     relations=["founded", "founded_in", "headquartered_in"],
187+     threshold=0.5,
188+     relation_threshold=0.5,
138 189 )
139- # {'entities': [...], 'relations': [...]}
190+
191+ for ent in result["entities"]:
192+     print(f"{ent['text']} ({ent['label']})")
193+
194+ for rel in result["relations"]:
195+     head = result["entities"][rel["head"]["entity_idx"]]
196+     tail = result["entities"][rel["tail"]["entity_idx"]]
197+     print(f"{head['text']} --[{rel['relation']}]--> {tail['text']}")
198+ ```
199+
200+ For a batched call, pass a list of texts. Each one is dispatched as its own
201+ request so the server can coalesce them into a single relex forward pass:
202+
203+ ```python
204+ results = client.predict(
205+     [
206+         "Bill Gates founded Microsoft in 1975.",
207+         "Apple is headquartered in Cupertino.",
208+     ],
209+     labels=["person", "organization", "location", "date"],
210+     relations=["founded", "founded_in", "headquartered_in"],
211+ )
212+ # results == [{"entities": [...], "relations": [...]}, {...}]
213+ ```
214+
215+ ### In-process (GLiNERFactory)
216+
217+ ```python
218+ from gliner.serve import GLiNERFactory
219+
220+ with GLiNERFactory(model="knowledgator/gliner-relex-large-v0.5") as llm:
221+     out = llm.predict(
222+         "Bill Gates founded Microsoft in 1975.",
223+         labels=["person", "organization", "date"],
224+         relations=["founded", "founded_in"],
225+     )
226+ ```
227+
228+ ### HTTP (curl)
229+
230+ ```bash
231+ curl -X POST http://localhost:8000/gliner \
232+   -H "Content-Type: application/json" \
233+   -d '{
234+     "text": "Bill Gates founded Microsoft in 1975.",
235+     "labels": ["person", "organization", "date"],
236+     "relations": ["founded", "founded_in"],
237+     "threshold": 0.5,
238+     "relation_threshold": 0.5
239+   }'
240+ ```
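
For callers who want the same request from Python without the client library, a minimal standard-library sketch follows. `build_request` is a hypothetical helper, not part of `gliner.serve`; the payload fields mirror the curl example above, and the send step is left commented out because it needs a running server.

```python
import json
import urllib.request
from typing import Dict


def build_request(base_url: str, route_prefix: str, payload: Dict) -> urllib.request.Request:
    """Build a JSON POST request for the given base URL and route prefix."""
    url = base_url.rstrip("/") + "/" + route_prefix.strip("/")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "text": "Bill Gates founded Microsoft in 1975.",
    "labels": ["person", "organization", "date"],
    "relations": ["founded", "founded_in"],
    "threshold": 0.5,
    "relation_threshold": 0.5,
}
req = build_request("http://localhost:8000", "/gliner", payload)

# To send it against a running server:
# with urllib.request.urlopen(req, timeout=30.0) as resp:
#     result = json.loads(resp.read())
```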
241+
242+ **Response shape for RelEx models:**
243+ ```python
244+ {
245+     "entities": [{"start", "end", "text", "label", "score"}, ...],
246+     "relations": [{"relation", "score",
247+                    "head": {"entity_idx": int, ...},
248+                    "tail": {"entity_idx": int, ...}}, ...],
249+ }
140 250 ```
251+ For NER-only models the `"relations"` key is omitted; passing `relations=`
252+ to such a model is a no-op.
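
Given that shape, relations can be flattened into (head, relation, tail) triples. The helper below is a sketch with a hypothetical name, and the sample dict is hand-written in the documented shape rather than real model output. Using `.get` keeps it working for NER-only responses where `"relations"` is absent.

```python
from typing import Dict, List, Tuple


def relation_triples(result: Dict) -> List[Tuple[str, str, str]]:
    """Resolve each relation's entity_idx pointers into entity texts."""
    entities = result.get("entities", [])
    triples = []
    for rel in result.get("relations", []):  # key is absent for NER-only models
        head = entities[rel["head"]["entity_idx"]]
        tail = entities[rel["tail"]["entity_idx"]]
        triples.append((head["text"], rel["relation"], tail["text"]))
    return triples


# Hand-written sample in the documented shape; scores are placeholders.
sample = {
    "entities": [
        {"start": 0, "end": 10, "text": "Bill Gates", "label": "person", "score": 0.9},
        {"start": 19, "end": 28, "text": "Microsoft", "label": "organization", "score": 0.9},
    ],
    "relations": [
        {"relation": "founded", "score": 0.8,
         "head": {"entity_idx": 0}, "tail": {"entity_idx": 1}},
    ],
}
triples = relation_triples(sample)
```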
141 253
142 254 ## All CLI Options
143 255
@@ -152,7 +264,7 @@ Model Configuration:
152 264
153 265 Batching:
154 266   --max-batch-size            Max batch size (default: 32)
155-   --batch-wait-timeout-ms     Batch wait timeout (default: 50)
267+   --batch-wait-timeout-ms     Batch wait timeout (default: 10)
156 268   --precompiled-batch-sizes   Comma-separated sizes (default: 1,2,4,8,16,32)
157 269
158 270 Replicas:
@@ -165,7 +277,8 @@ Performance:
165 277   --no-compile                Disable torch.compile
166 278
167 279 Memory:
168-   --target-memory-fraction    GPU memory fraction (default: 0.8)
280+   --target-memory-fraction    GPU memory fraction (default: 0.9)
281+   --memory-overhead-factor    Safety margin on memory estimates (default: 1.3)
169 282
170 283 Server:
171 284   --route-prefix              HTTP route (default: /gliner)