parameterlab
diff --git a/‎CHANGELOG.md‎
Lines changed: 1 addition & 0 deletions b/‎CHANGELOG.md‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎docs/examples/example1.md‎
Lines changed: 0 additions & 8 deletions b/‎docs/examples/example1.md‎
Lines changed: 0 additions & 8 deletions
diff --git a/‎docs/examples/example2.md‎
Lines changed: 0 additions & 8 deletions b/‎docs/examples/example2.md‎
Lines changed: 0 additions & 8 deletions
diff --git a/‎docs/examples/five_a_day_benchmark.ipynb‎
Lines changed: 1 addition & 0 deletions b/‎docs/examples/five_a_day_benchmark.ipynb‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎docs/examples/tutorial.ipynb‎
Lines changed: 1 addition & 0 deletions b/‎docs/examples/tutorial.ipynb‎
Lines changed: 1 addition & 0 deletions
diff --git a/‎docs/getting-started/quickstart.md‎
Lines changed: 167 additions & 25 deletions b/‎docs/getting-started/quickstart.md‎
Lines changed: 167 additions & 25 deletions
@@ -19,6 +19,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `FileResultLogger` now accepts `pathlib.Path` for argument `output_dir` and has an `overwrite` argument to prevent overwriting of existing logs files.
 - `Benchmark` class now has a `fail_on_setup_error` flag that raises errors observed during setup of task (PR: #10)
 - The `Evaluator` class now has a `filter_traces` base method to conveniently adapt the same evaluator to different entities in the traces (PR: #10).
+- Improved Quick Start Guide in `docs/getting-started/quickstart.md`. (PR: #10)
 
 ### Fixed
 
 
@@ -0,0 +1 @@
+../../examples/five_a_day_benchmark/five_a_day_benchmark.ipynb
@@ -0,0 +1 @@
+../../examples/introduction/tutorial.ipynb
@@ -1,6 +1,6 @@
 # Getting Started
 
-This guide will help you get started with MASEval.
+This guide introduces the core concepts of MASEval and helps you get started quickly.
 
 ## Installation
 
@@ -23,48 +23,190 @@ This includes all core functionality for defining benchmarks, tasks, and evaluat
 
 ### Optional Dependencies
 
-Install additional integrations based on your agent framework and tooling. These can also be installed separately, but are offered here for convenience.
-
-**Agent Frameworks:**
+Install additional integrations based on your agent framework. For example,
 
 ```bash
+# Agent framework integrations
 pip install "maseval[smolagents]"  # SmolAgents integration
-```
 
-**LLM Providers:**
-
-```bash
+# LLM providers
 pip install "maseval[openai]"      # OpenAI models
-pip install "maseval[google]"      # Google GenAI models
+
+# Combine multiple extras
+pip install "maseval[smolagents,openai]"
+
+# Install everything (for examples or development)
+pip install "maseval[all]"
+pip install "maseval[examples]"
 ```
 
-**Observability & Tracing:**
+---
 
-```bash
-# TODO
+## Using the Library
+
+### Philosophy
+
+MASEval follows a clear separation of concerns:
+
+1. **You implement your agents** using any framework (LangChain, AutoGen, smolagents, custom code, etc.)
+2. **MASEval provides the evaluation infrastructure** — benchmarks, tasks, environments, and metrics
+3. **Adapters bridge the gap** — thin wrappers that connect your agent to MASEval's interface
+
+Think of MASEval like pytest for agents: you bring the code, MASEval runs the tests.
+
+### Key Concepts
+
+| Term             | Description                                                                                            |
+| ---------------- | ------------------------------------------------------------------------------------------------------ |
+| **Benchmark**    | Orchestrates the evaluation lifecycle: setup, execution, and measurement across a collection of tasks. |
+| **Task**         | A single evaluation unit with a query, expected outcome, and evaluation criteria.                      |
+| **Environment**  | The context in which agents operate (e.g., simulated tools, databases, file systems).                  |
+| **AgentAdapter** | Wraps your agent to provide a unified interface for MASEval.                                           |
+| **Evaluator**    | Measures agent performance by comparing outputs or states to expected results.                         |
+| **Callback**     | Hooks into the evaluation lifecycle for logging, tracing, or custom metrics.                           |
+
+### Implementing a Benchmark
+
+To create your own benchmark, subclass `Benchmark` and implement the required abstract methods. Here's the typical workflow:
+
+1. **Agents and environment** Define your agents and environment using any tool you prefer to use. Wrap them in `Environment` and `AgentAdapter`.
+2. **Create your tasks** as `Task` objects with queries and evaluation data
+3. **Subclass `Benchmark`** and implement the abstract setup/run/evaluate methods
+4. **Call `Benchmark.run(tasks)`** to execute the complete benchmark
+
+```python
+from maseval import Benchmark, AgentAdapter, Environment, Evaluator, Task
+
+class MyBenchmark(Benchmark):
+    """Custom benchmark for evaluating agents on my tasks."""
+
+    def setup_environment(self, agent_data, task) -> Environment:
+        # Initialize the environment for this task
+        # e.g., set up tools, databases, or simulated systems
+        ...
+
+    def setup_user(self, agent_data, environment, task):
+        # Optional: create a user simulator for interactive tasks
+        # Return None if not needed
+        return None
+
+    def setup_agents(self, agent_data, environment, task, user):
+        # Create your agent(s) and wrap them in AgentAdapter
+        # Returns a tuple: (agents_to_run, agents_dict)
+        #   - agents_to_run: list of agents to invoke in run_agents()
+        #   - agents_dict: dict mapping names to all agents for tracing
+        ...
+
+    def setup_evaluators(self, environment, task, agents, user):
+        # Define how success is measured
+        # Return: list of Evaluator instances
+        ...
+
+    def run_agents(self, agents, task, environment):
+        # Execute your agent system to solve the task
+        # Return the final answer (message traces are captured automatically)
+        ...
+
+    def evaluate(self, evaluators, agents, final_answer, traces):
+        # Run each evaluator with the execution data
+        # Return: list of evaluation result dicts
+        ...
 ```
 
-**Combine Multiple Extras:**
+Once implemented, run your benchmark:
 
-```bash
-pip install "maseval[smolagents,openai,wandb]"
+```python
+# Define your tasks
+tasks = TaskCollection([Task(query="...", expected="..."), ...])
+
+# Configure your agents (e.g., model parameters, tool settings)
+agent_config = {"model": "gpt-4", "temperature": 0.7}
+
+# Instantiate and run the evaluation
+benchmark = MyBenchmark(agent_data=agent_config)
+reports = benchmark.run(tasks)
 ```
 
-**Install Everything:**
+For the complete interface and lifecycle details, see the [Benchmark reference](../reference/benchmark.md).
 
-```bash
-pip install "maseval[all]"         # All integrations
-pip install "maseval[examples]"    # All dependencies needed for examples
+### Adapters
+
+Adapters are lightweight wrappers that connect your agent implementation to MASEval. They provide:
+
+- A unified `run()` method for executing agents
+- Message history tracking for tracing
+- Callback hooks for monitoring
+
+**Creating an adapter:**
+
+```python
+from maseval import AgentAdapter
+
+class MyAgentAdapter(AgentAdapter):
+    """Adapter for my custom agent framework."""
+
+    def _run_agent(self, query: str):
+        # Call your agent's execution method
+        result = self.agent.execute(query)
+
+        # Return the final answer (message history is tracked separately)
+        return result
+
+    def get_messages(self):
+        # Return the conversation history from your agent
+        return self.agent.get_conversation_history()
 ```
 
-## Use the Library
+MASEval provides built-in adapters for popular frameworks in `maseval.interface.agents`. For example:
+
+- `SmolAgentsAdapter` — for HuggingFace smolagents
+- `LangGraphAdapter` — for LangGraph agents
+
+See the [Agent Adapters](../interface/agents/smolagents.md) documentation for the full list.
+
+### Existing Benchmarks
+
+Pre-built benchmarks for established evaluation suites are coming soon.
+
+---
+
+## Using the Documentation
+
+This documentation is organized to help you find what you need quickly:
+
+### Examples
+
+End-to-end walkthroughs demonstrating complete evaluation pipelines. Start here to see MASEval in action with real agent implementations.
+
+- [Tiny Tutorial](../examples/tutorial.ipynb)
+- [5-A-Day Benchmark](../examples/five_a_day_benchmark.ipynb)
+
+### Guides
+
+Topic-based discussions covering specific features and best practices:
+
+- [Message Tracing](../guides/message-tracing.md) — Capture and analyze agent conversations
+- [Configuration Gathering](../guides/config-gathering.md) — Collect reproducible experiment configurations
+
+### Reference
+
+Formal API documentation for all MASEval components. The reference is split into two sections:
+
+**Core** — The fundamental building blocks (no optional dependencies):
 
-Start with examples.
+- [Benchmark](../reference/benchmark.md) — Evaluation orchestration
+- [Task](../reference/task.md) — Individual evaluation units
+- [Environment](../reference/environment.md) — Agent execution context
+- [AgentAdapter](../reference/agent.md) — Agent interface wrappers
+- [Evaluator](../reference/evaluator.md) — Performance measurement
+- [Callback](../reference/callback.md) — Lifecycle hooks
+- [MessageHistory](../reference/history.md) — Conversation tracking
 
-TODO: Insert examples
+**Interface** — Optional integrations for specific frameworks:
 
-## Use the Docs
+- [Agent Adapters](../interface/agents/smolagents.md) — Pre-built adapters (smolagents, langgraph, etc.)
+- [Inference Providers](../interface/inference/openai.md) — LLM provider integrations
 
-The docs are hosted here: TODO
+## Next Steps
 
-For comprehensive documentation on how to piece together the library's components—including detailed explanations of the execution lifecycle, setup methods, and best practices — see the [`Benchmark`](../reference/benchmark.md) class documentation.
+Work through the examples listed above. 3. **Explore the [`examples/five_a_day_benchmark/`](https://github.com/parameterlab/MASEval/tree/main/examples/five_a_day_benchmark) folder** for tool implementations, evaluators, and the CLI script (`five_a_day_benchmark.py`) 4. **Build your own benchmark** using the patterns you've learned
Original file line number	Diff line number	Diff line change
`@@ -0,0 +1 @@`
	`1`	`+../../examples/five_a_day_benchmark/five_a_day_benchmark.ipynb`
Original file line number	Diff line number	Diff line change
`@@ -0,0 +1 @@`
	`1`	`+../../examples/introduction/tutorial.ipynb`