
Commit 9538f2a

πŸ“ ✨ update learners and documentations (PR #310)
2 parents 25397a4 + 16ed733 commit 9538f2a

13 files changed

Lines changed: 616 additions & 57 deletions

CHANGELOG.md

Lines changed: 4 additions & 1 deletion
@@ -1,8 +1,11 @@
 ## Changelog
 
-### v1.5.1 (February 23, 2026)
+### v1.5.1 (March 30, 2026)
 - Fix challenge learner
 - Update requirements.
+- Updated documentation website.
+- Add `rag` argument to LearnerPipeline and its documentation with examples.
+- Minor bug fixes in LLM-Augmenter.
 
 ### v1.5.0 (February 5, 2026)
 - Fix challenge learners

README.md

Lines changed: 2 additions & 0 deletions
@@ -134,7 +134,9 @@ print(metrics)
 
 Other available learners:
 - [LLM-Based Learner](https://ontolearner.readthedocs.io/learners/llm.html)
+- [Retriever-Based Learner](https://ontolearner.readthedocs.io/learners/retrieval.html)
 - [RAG-Based Learner](https://ontolearner.readthedocs.io/learners/rag.html)
+- [LLMs4OL Challenge Learners](https://ontolearner.readthedocs.io/learners/llms4ol.html)
 
 ---
 
docs/source/index.rst

Lines changed: 2 additions & 4 deletions
@@ -1,5 +1,3 @@
-
-
 .. raw:: html
 
     <div align="center">
@@ -109,8 +107,8 @@ Working with OntoLearner is straightforward:
         random_state=42
     )
 
-    # Initialize a multi-component learning pipeline (retriever + LLM)
-    # This configuration enables a Retrieval-Augmented Generation (RAG) setup
+    # RAG can be configured either by passing both IDs (shown here),
+    # or by passing a prebuilt `rag=` learner object.
     pipeline = LearnerPipeline(
         retriever_id='sentence-transformers/all-MiniLM-L6-v2',
         llm_id='Qwen/Qwen2.5-0.5B-Instruct',

docs/source/learners/llm.rst

Lines changed: 2 additions & 2 deletions
@@ -93,7 +93,7 @@ You will see a evaluations results.
 
 Pipeline Usage
 -----------------------
-The OntoLearner package also offers a streamlined ``LearnerPipeline`` class that simplifies the entire process of initializing, training, predicting, and evaluating a RAG setup into a single call. This is particularly useful for rapid experimentation and deployment.
+The OntoLearner package also offers a streamlined ``LearnerPipeline`` class that combines initialization, training, prediction, and evaluation into a single call. In this section, we run the pipeline in **LLM-only** mode by setting only ``llm_id``.
 
 .. code-block:: python
 
@@ -113,7 +113,7 @@ The OntoLearner package also offers a streamlined ``LearnerPipeline`` class that
 
     # Set up the learner pipeline using a lightweight instruction-tuned LLM
     pipeline = LearnerPipeline(
-        llm_id='Qwen/Qwen2.5-0.5B-Instruct', # Small-scale LLM for reasoning over term-type assignments
+        llm_id='Qwen/Qwen2.5-0.5B-Instruct', # LLM-only mode
         hf_token='...', # Hugging Face access token for loading gated models
         batch_size=32 # Batch size for parallel inference (if applicable)
     )

docs/source/learners/rag.rst

Lines changed: 15 additions & 7 deletions
@@ -25,8 +25,8 @@ We start by importing necessary components from the ontolearner package, loading
         AgrO, # Example agricultural ontology
         train_test_split, # Helper function for data splitting
         LabelMapper, # Maps ontology labels to/from textual representations
-        StandardizedPrompting # Standard prompting strategy across tasks
-        evaluation_report
+        StandardizedPrompting, # Standard prompting strategy across tasks
+        evaluation_report,
     )
 
     # Load the AgrO ontology (an agricultural domain ontology)
@@ -99,16 +99,24 @@ To build a RAG model, you first initialize its constituent parts: an LLM learner
 
 Pipeline Usage
 ---------------------
-Similar to LLM and Retrieval learner, RAG Learner is also callable via streamlined ``LearnerPipeline`` class that simplifies the entire learning process.
+Similar to the LLM and Retrieval learners, RAG is callable via ``LearnerPipeline``. You can run RAG in either of two ways:
 
-You initialize the ``LearnerPipeline`` by directly providing the ``retriever_id``, ``llm_id``, and other parameters like ``hf_token``, ``batch_size``, and ``top_k`` (number of top retrievals to include in RAG prompting). Then, you simply call the ``pipeline`` instance with your ``train_data``, ``test_data``, specify ``evaluate=True`` to compute metrics, and define the ``task`` (e.g., `'term-typing'`).
+1. Provide both ``retriever_id`` and ``llm_id`` (the pipeline auto-composes an ``AutoRAGLearner``).
+2. Provide a prebuilt ``rag`` learner object for custom configurations.
 
 .. code-block:: python
 
-    # Import core modules from the OntoLearner library
-    from ontolearner import LearnerPipeline, AgrO, train_test_split
+    from ontolearner import (
+        LearnerPipeline,
+        AutoLLMLearner,
+        AutoRetrieverLearner,
+        AutoRAGLearner,
+        LabelMapper,
+        StandardizedPrompting,
+        AgrO,
+        train_test_split,
+    )
 
-    # Load the AgrO ontology, which contains concepts related to wines, their properties, and categories
     ontology = AgrO()
     ontology.load() # Load entities, types, and structured term annotations from the ontology
     ontological_data = ontology.extract()

docs/source/learners/retrieval.rst

Lines changed: 9 additions & 2 deletions
@@ -81,7 +81,7 @@ When working with large contexts, the retriever model may encounter memory issue
 Pipeline Usage
 -----------------------
 
-Similar to LLM learner, Retrieval Learner is also callable via streamlined ``LearnerPipeline`` class that simplifies the entire process learning.
+Similar to the LLM learner, the Retrieval learner is also callable via the streamlined ``LearnerPipeline`` class. In this section, we use **retriever-only** mode by providing only ``retriever_id``.
 
 .. code-block:: python
 
@@ -100,7 +100,7 @@ Similar to LLM learner, Retrieval Learner is also callable via streamlined ``Lea
     )
 
     # Initialize the learning pipeline using a dense retriever
-    # This configuration uses sentence embeddings to match similar relational contexts
+    # This is retriever-only mode (no LLM component)
     pipeline = LearnerPipeline(
         retriever_id='sentence-transformers/all-MiniLM-L6-v2', # Hugging Face model ID for retrieval
         batch_size=10, # Number of samples to process per batch (if batching is enabled internally)
@@ -125,6 +125,10 @@ Similar to LLM learner, Retrieval Learner is also callable via streamlined ``Lea
     # Print the full output dictionary (includes predictions)
     print(outputs)
 
+.. note::
+
+    For RAG with ``LearnerPipeline``, see `https://ontolearner.readthedocs.io/learners/rag.html <https://ontolearner.readthedocs.io/learners/rag.html>`_.
+
 .. hint::
     See `Learning Tasks <https://ontolearner.readthedocs.io/learning_tasks/llms4ol.html>`_ for possible tasks within Learners.
 
@@ -372,6 +376,9 @@ Here the ``LLMAugmentedRetrieverLearner`` is the high-level wrapper that orchest
     augments = {"config": llm_augmenter_generator.get_config()}
     augments[task] = llm_augmenter_generator.augment(ontological_data, task=task)
 
+    base_retriever = LLMAugmentedRetriever()
+    learner = LLMAugmentedRetrieverLearner(base_retriever=base_retriever)
+
     learner.set_augmenter(augments)
     learner.load(model_id="Qwen/Qwen3-Embedding-8B")
 
docs/source/package_reference/pipeline.rst

Lines changed: 6 additions & 0 deletions
@@ -1,6 +1,12 @@
 Learner Pipeline
 ====================
 
+``LearnerPipeline`` supports:
+
+- retriever-only mode (set ``retriever_id``)
+- LLM-only mode (set ``llm_id``)
+- RAG mode (set both ``retriever_id`` and ``llm_id``, or provide a prebuilt ``rag`` learner)
+
 LearnerPipeline
 ---------------------
 .. autoclass:: ontolearner._learner.LearnerPipeline
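The three modes listed in pipeline.rst come down to which of `retriever_id`, `llm_id`, and `rag` are set. As a minimal sketch, the rule can be written as a small decision function so the combinations are easy to check at a glance. `resolve_pipeline_mode` is hypothetical and is not part of OntoLearner; in particular, letting a prebuilt `rag` learner take precedence when IDs are also passed (as the quickstart example does) is an assumption, not documented behavior.

```python
from typing import Optional


def resolve_pipeline_mode(
    retriever_id: Optional[str] = None,
    llm_id: Optional[str] = None,
    rag: Optional[object] = None,
) -> str:
    """Hypothetical helper (not OntoLearner's API) encoding the documented
    LearnerPipeline modes. Giving a prebuilt `rag` learner precedence over
    the model IDs is an assumption made for this sketch."""
    if rag is not None:
        # A prebuilt RAG learner supplied by the caller (custom configuration).
        return "rag (prebuilt)"
    if retriever_id and llm_id:
        # Both IDs set: the docs say the pipeline auto-composes an AutoRAGLearner.
        return "rag (auto-composed)"
    if retriever_id:
        return "retriever-only"
    if llm_id:
        return "llm-only"
    raise ValueError("Set retriever_id, llm_id, both, or pass a prebuilt `rag` learner.")


if __name__ == "__main__":
    print(resolve_pipeline_mode(retriever_id="sentence-transformers/all-MiniLM-L6-v2"))  # retriever-only
    print(resolve_pipeline_mode(llm_id="Qwen/Qwen2.5-0.5B-Instruct"))  # llm-only
    print(resolve_pipeline_mode(
        retriever_id="sentence-transformers/all-MiniLM-L6-v2",
        llm_id="Qwen/Qwen2.5-0.5B-Instruct",
    ))  # rag (auto-composed)
```

Raising on an empty configuration reflects that none of the documented modes applies when nothing is set; the real pipeline may handle that case differently.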

docs/source/quickstart.rst

Lines changed: 33 additions & 1 deletion
@@ -137,7 +137,11 @@ To alighn with machine learning follow, once the ontology is loaded, and ontolog
     )
 
 
-Once the data is split into training and testing sets, you can apply learning models to the ontology learning tasks. OntoLearner supports multiple modeling approaches, including retrieval-based methods, Large Language Model (LLM)-based techniques, and Retrieval-Augmented Generation (RAG) strategies. The ``LearnerPipeline`` within OntoLearner is designed for ease of use, abstracting away the complexities of loading models and preparing datasets or data loaders. You can configure the pipeline with your choice of LLMs, retrievers, or RAG components.
+Once the data is split into training and testing sets, you can apply learning models to the ontology learning tasks. OntoLearner supports multiple modeling approaches, including retrieval-based methods, Large Language Model (LLM)-based techniques, and Retrieval-Augmented Generation (RAG) strategies. The ``LearnerPipeline`` supports all three modes:
+
+- Retriever-only: set ``retriever_id``
+- LLM-only: set ``llm_id``
+- RAG: set both ``retriever_id`` and ``llm_id`` to auto-compose an ``AutoRAGLearner``, or pass a prebuilt learner via ``rag``.
 
 In the example below, we configure a RAG-based learner by specifying the Qwen LLM (`Qwen/Qwen2.5-0.5B-Instruct <https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct>`_) and a retriever based on a sentence-transformer model (`all-MiniLM-L6-v2 <https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2>`_):
 
@@ -165,6 +169,34 @@ In the example below, we configure a RAG-based learner by specifying the Qwen LL
 - ``llm_id``: The instruction-following language model used to generate candidate outputs.
 - ``top_k``: Number of retrieved examples passed to the LLM (used in RAG setup).
 - ``hf_token``: Required for loading gated models from Hugging Face.
+- ``rag``: Optional prebuilt ``AutoRAGLearner`` (or compatible) object for custom RAG setups.
+
+If you have already created a RAG learner object, you can pass it directly:
+
+.. code-block:: python
+
+    from ontolearner import (
+        LearnerPipeline,
+        AutoLLMLearner,
+        AutoRetrieverLearner,
+        AutoRAGLearner,
+        LabelMapper,
+        StandardizedPrompting,
+    )
+
+    retriever = AutoRetrieverLearner(top_k=3)
+    llm = AutoLLMLearner(
+        prompting=StandardizedPrompting,
+        label_mapper=LabelMapper(),
+        token='<YOUR_HF_TOKEN>'
+    )
+    rag = AutoRAGLearner(retriever=retriever, llm=llm)
+
+    pipeline = LearnerPipeline(
+        rag=rag,
+        retriever_id='sentence-transformers/all-MiniLM-L6-v2',
+        llm_id='Qwen/Qwen2.5-0.5B-Instruct'
+    )
 
 Once configured, the pipeline is executed on the training and test data:
 
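The `top_k` parameter (number of retrieved examples passed to the LLM in a RAG setup) is easiest to picture with a toy example. The following is a conceptual sketch only and does not reflect how `AutoRAGLearner` is implemented: Jaccard token overlap stands in for sentence-transformer embeddings, and the `build_rag_prompt` helper and its `Term:`/`Type:` prompt layout are hypothetical. It keeps the `top_k` most similar labelled training terms and places them as few-shot examples ahead of the query term.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for embedding cosine similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def build_rag_prompt(query: str, train: List[Tuple[str, str]], top_k: int = 3) -> str:
    """Hypothetical helper: keep the top_k most similar (term, type) training
    pairs and place them as few-shot examples before the query term."""
    # sorted() is stable (also with reverse=True), so ties keep training order.
    ranked = sorted(train, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    blocks = [f"Term: {term}\nType: {label}" for term, label in ranked[:top_k]]
    # The query goes last with an empty type slot for the LLM to fill in.
    blocks.append(f"Term: {query}\nType:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    train = [
        ("winter wheat", "crop"),
        ("soil moisture", "property"),
        ("spring wheat", "crop"),
        ("tractor", "equipment"),
    ]
    print(build_rag_prompt("durum wheat", train, top_k=2))
```

With `top_k=2`, only the two wheat examples are retrieved. Raising `top_k` gives the LLM more context at the cost of a longer prompt.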
examples/llm_learner_alexbek_rag_term_typing.py

Lines changed: 3 additions & 3 deletions
@@ -27,11 +27,11 @@
     output_dir="./results/",
 )
 
-# Build the pipeline and pass raw structured objects end-to-end.
-# We place the RAG learner in the llm slot and set llm_id accordingly.
+# Build the pipeline and pass the dedicated RAG learner explicitly.
 pipe = LearnerPipeline(
-    llm=rag_learner,
+    rag=rag_learner,
     llm_id="Qwen/Qwen2.5-0.5B-Instruct",
+    retriever_id="sentence-transformers/all-MiniLM-L6-v2",
     ontologizer_data=True,
 )
 