📝 ✨ update learners and documentations (PR #310)

HamedBabaei · web-flow · commit 9538f2afe91a · 2026-03-30T15:48:42.000+02:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,8 +1,11 @@
 ## Changelog
 
-### v1.5.1 (February 23, 2026)
+### v1.5.1 (March 30, 2026)
 - Fix challenge learner
 - Update requirements.
+- Updated documentations website.
+- Add RAG var to LearnerPipeline and its documentation with examples.
+- Minor bug fixing in LLM-Augmenter.
 
 ### v1.5.0 (February 5, 2026)
 - Fix challenge learners
diff --git a/README.md b/README.md
@@ -134,7 +134,9 @@ print(metrics)
 
 Other available learners:
 - [LLM-Based Learner](https://ontolearner.readthedocs.io/learners/llm.html)
+- [Retriever-Based Learner](https://ontolearner.readthedocs.io/learners/retrieval.html)
 - [RAG-Based Learner](https://ontolearner.readthedocs.io/learners/rag.html)
+- [LLMs4OL Challenge Learners](https://ontolearner.readthedocs.io/learners/llms4ol.html)
 
 ---
 
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -1,5 +1,3 @@
-
-
 .. raw:: html
 
    <div align="center">
@@ -109,8 +107,8 @@ Working with OntoLearner is straightforward:
             random_state=42
         )
 
-        # Initialize a multi-component learning pipeline (retriever + LLM)
-        # This configuration enables a Retrieval-Augmented Generation (RAG) setup
+        # RAG can be configured either by passing both IDs (shown here),
+        # or by passing a prebuilt `rag=` learner object.
         pipeline = LearnerPipeline(
             retriever_id='sentence-transformers/all-MiniLM-L6-v2',
             llm_id='Qwen/Qwen2.5-0.5B-Instruct',
diff --git a/docs/source/learners/llm.rst b/docs/source/learners/llm.rst
@@ -93,7 +93,7 @@ You will see a evaluations results.
 
 Pipeline Usage
 -----------------------
-The OntoLearner package also offers a streamlined ``LearnerPipeline`` class that simplifies the entire process of initializing, training, predicting, and evaluating a RAG setup into a single call. This is particularly useful for rapid experimentation and deployment.
+The OntoLearner package also offers a streamlined ``LearnerPipeline`` class that simplifies initialization, training, prediction, and evaluation into a single call. In this section, we run the pipeline in **LLM-only** mode by setting ``llm_id`` only.
 
 .. code-block:: python
 
@@ -113,7 +113,7 @@ The OntoLearner package also offers a streamlined ``LearnerPipeline`` class that
 
     # Set up the learner pipeline using a lightweight instruction-tuned LLM
     pipeline = LearnerPipeline(
-        llm_id='Qwen/Qwen2.5-0.5B-Instruct',   # Small-scale LLM for reasoning over term-type assignments
+        llm_id='Qwen/Qwen2.5-0.5B-Instruct',   # LLM-only mode
         hf_token='...',                        # Hugging Face access token for loading gated models
         batch_size=32                          # Batch size for parallel inference (if applicable)
     )
diff --git a/docs/source/learners/rag.rst b/docs/source/learners/rag.rst
@@ -25,8 +25,8 @@ We start by importing necessary components from the ontolearner package, loading
         AgrO,                   # Example agricultural ontology
         train_test_split,       # Helper function for data splitting
         LabelMapper,            # Maps ontology labels to/from textual representations
-        StandardizedPrompting   # Standard prompting strategy across tasks
-        evaluation_report
+        StandardizedPrompting,  # Standard prompting strategy across tasks
+        evaluation_report,
     )
 
     # Load the AgrO ontology (an agricultural domain ontology)
@@ -99,16 +99,24 @@ To build a RAG model, you first initialize its constituent parts: an LLM learner
 
 Pipeline Usage
 ---------------------
-Similar to LLM and Retrieval learner, RAG Learner is also callable via streamlined ``LearnerPipeline`` class that simplifies the entire learning process.
+Similar to LLM and Retrieval learners, RAG is callable via ``LearnerPipeline``, you can run RAG in two equivalent ways:
 
-You initialize the ``LearnerPipeline`` by directly providing the ``retriever_id``, ``llm_id``, and other parameters like ``hf_token``, ``batch_size``, and ``top_k`` (number of top retrievals to include in RAG prompting). Then, you simply call the ``pipeline`` instance with your ``train_data``, ``test_data``, specify ``evaluate=True`` to compute metrics, and define the ``task`` (e.g., `'term-typing'`).
+1. Provide both ``retriever_id`` and ``llm_id`` (pipeline auto-composes an ``AutoRAGLearner``).
+2. Provide a prebuilt ``rag`` learner object for custom configurations.
 
 .. code-block:: python
 
-    # Import core modules from the OntoLearner library
-    from ontolearner import LearnerPipeline, AgrO, train_test_split
+    from ontolearner import (
+        LearnerPipeline,
+        AutoLLMLearner,
+        AutoRetrieverLearner,
+        AutoRAGLearner,
+        LabelMapper,
+        StandardizedPrompting,
+        AgrO,
+        train_test_split,
+    )
 
-    # Load the AgrO ontology, which contains concepts related to wines, their properties, and categories
     ontology = AgrO()
     ontology.load()  # Load entities, types, and structured term annotations from the ontology
     ontological_data = ontology.extract()
diff --git a/docs/source/learners/retrieval.rst b/docs/source/learners/retrieval.rst
@@ -81,7 +81,7 @@ When working with large contexts, the retriever model may encounter memory issue
 Pipeline Usage
 -----------------------
 
-Similar to LLM learner, Retrieval Learner is also callable via streamlined ``LearnerPipeline`` class that simplifies the entire process learning.
+Similar to the LLM learner, Retrieval learner is also callable via the streamlined ``LearnerPipeline`` class. In this section we use **retriever-only** mode by providing ``retriever_id`` only.
 
 .. code-block:: python
 
@@ -100,7 +100,7 @@ Similar to LLM learner, Retrieval Learner is also callable via streamlined ``Lea
     )
 
     # Initialize the learning pipeline using a dense retriever
-    # This configuration uses sentence embeddings to match similar relational contexts
+    # This is retriever-only mode (no LLM component)
     pipeline = LearnerPipeline(
         retriever_id='sentence-transformers/all-MiniLM-L6-v2',  # Hugging Face model ID for retrieval
         batch_size=10,       # Number of samples to process per batch (if batching is enabled internally)
@@ -125,6 +125,10 @@ Similar to LLM learner, Retrieval Learner is also callable via streamlined ``Lea
     # Print the full output dictionary (includes predictions)
     print(outputs)
 
+.. note::
+
+    For RAG with ``LearnerPipeline`` see: `https://ontolearner.readthedocs.io/learners/rag.html <https://ontolearner.readthedocs.io/learners/rag.html>`_.
+
 .. hint::
     See `Learning Tasks <https://ontolearner.readthedocs.io/learning_tasks/llms4ol.html>`_ for possible tasks within Learners.
 
@@ -372,6 +376,9 @@ Here the ``LLMAugmentedRetrieverLearner`` is the high-level wrapper that orchest
 	augments = {"config": llm_augmenter_generator.get_config()}
 	augments[task] = llm_augmenter_generator.augment(ontological_data, task=task)
 
+	base_retriever = LLMAugmentedRetriever()
+	learner = LLMAugmentedRetrieverLearner(base_retriever=base_retriever)
+
 	learner.set_augmenter(augments)
 	learner.load(model_id="Qwen/Qwen3-Embedding-8B")
 
diff --git a/docs/source/package_reference/pipeline.rst b/docs/source/package_reference/pipeline.rst
@@ -1,6 +1,12 @@
 Learner Pipeline
 ====================
 
+``LearnerPipeline`` supports:
+
+- retriever-only mode (set ``retriever_id``)
+- llm-only mode (set ``llm_id``)
+- rag mode (set both ``retriever_id`` and ``llm_id``), or provide a prebuilt ``rag`` learner
+
 LearnerPipeline
 ---------------------
 .. autoclass:: ontolearner._learner.LearnerPipeline
diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
@@ -137,7 +137,11 @@ To alighn with machine learning follow, once the ontology is loaded, and ontolog
    )
 
 
-Once the data is split into training and testing sets, you can apply learning models to the ontology learning tasks. OntoLearner supports multiple modeling approaches, including retrieval-based methods, Large Language Model (LLM)-based techniques, and Retrieval-Augmented Generation (RAG) strategies. The ``LearnerPipeline`` within OntoLearner is designed for ease of use, abstracting away the complexities of loading models and preparing datasets or data loaders. You can configure the pipeline with your choice of LLMs, retrievers, or RAG components.
+Once the data is split into training and testing sets, you can apply learning models to the ontology learning tasks. OntoLearner supports multiple modeling approaches, including retrieval-based methods, Large Language Model (LLM)-based techniques, and Retrieval-Augmented Generation (RAG) strategies. The ``LearnerPipeline`` supports all three modes:
+
+- Retriever-only: set ``retriever_id``
+- LLM-only: set ``llm_id``
+- RAG: set both ``retriever_id`` + ``llm_id`` for AutoRAGLearner. For prebuild RAG pass ``rag`` learner.
 
 In the example below, we configure a RAG-based learner by specifying the Qwen LLM (`Qwen/Qwen2.5-0.5B-Instruct <https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct>`_) and a retriever based on a sentence-transformer model (`all-MiniLM-L6-v2 <https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2>`_):
 
@@ -165,6 +169,34 @@ In the example below, we configure a RAG-based learner by specifying the Qwen LL
     - ``llm_id``: The instruction-following language model used to generate candidate outputs.
     - ``top_k``: Number of retrieved examples passed to the LLM (used in RAG setup).
     - ``hf_token``: Required for loading gated models from Hugging Face.
+    - ``rag``: Optional prebuilt ``AutoRAGLearner`` (or compatible) object for custom RAG setups.
+
+If you already created a RAG learner object, you can pass it directly:
+
+.. code-block:: python
+
+   from ontolearner import (
+       LearnerPipeline,
+       AutoLLMLearner,
+       AutoRetrieverLearner,
+       AutoRAGLearner,
+       LabelMapper,
+       StandardizedPrompting,
+   )
+
+   retriever = AutoRetrieverLearner(top_k=3)
+   llm = AutoLLMLearner(
+       prompting=StandardizedPrompting,
+       label_mapper=LabelMapper(),
+       token='<YOUR_HF_TOKEN>'
+   )
+   rag = AutoRAGLearner(retriever=retriever, llm=llm)
+
+   pipeline = LearnerPipeline(
+       rag=rag,
+       retriever_id='sentence-transformers/all-MiniLM-L6-v2',
+       llm_id='Qwen/Qwen2.5-0.5B-Instruct'
+   )
 
 Once configured, the pipeline is executed on the training and test data:
 
diff --git a/examples/llm_learner_alexbek_rag_term_typing.py b/examples/llm_learner_alexbek_rag_term_typing.py
@@ -27,11 +27,11 @@
     output_dir="./results/",
 )
 
-# Build the pipeline and pass raw structured objects end-to-end.
-# We place the RAG learner in the llm slot and set llm_id accordingly.
+# Build the pipeline and pass the dedicated RAG learner explicitly.
 pipe = LearnerPipeline(
-    llm=rag_learner,
+    rag=rag_learner,
     llm_id="Qwen/Qwen2.5-0.5B-Instruct",
+    retriever_id="sentence-transformers/all-MiniLM-L6-v2",
     ontologizer_data=True,
 )
 
diff --git a/examples/pipeline.ipynb b/examples/pipeline.ipynb
diff --git a/ontolearner/_learner.py b/ontolearner/_learner.py
diff --git a/ontolearner/learner/retriever/augmented_retriever.py b/ontolearner/learner/retriever/augmented_retriever.py
diff --git a/ontolearner/learner/term_typing/rwthdbis.py b/ontolearner/learner/term_typing/rwthdbis.py
