Commit aa511a7

docs: add implementation examples for AgentScope, OpenAI, raw HTTP, and langchain in math agent and learn2ask tutorials

1 parent d6e47b5

File tree

2 files changed: +267 −42 lines changed

docs/en/example_learning_to_ask.md

Lines changed: 62 additions & 0 deletions
@@ -99,6 +99,68 @@ At the code level, everything is implemented in `tutorial/example_learn2ask/lear
* `ExampleLearn2Ask` defines the workflow: how the dialogue context is converted into the agent’s prompt/input, and what output format is expected (one follow-up question, optionally with choices).
* `reward_fn` defines how to convert the judge’s feedback into a scalar reward used for training.

We provide two implementations of the agent, one based on AgentScope and one based on langchain:
=== "AgentScope"

    ```python
    # create the agent
    self.agent = ReActAgent(
        name="math_react_agent",
        sys_prompt=system_prompt,
        model=tuner.as_agentscope_model(),
        formatter=DashScopeChatFormatter(),
        toolkit=None,
        memory=InMemoryMemory(),
        max_iters=1,
    )
    self.agent.set_console_output_enabled(False)

    # convert the messages to AgentScope format and send them to the agent
    msg = [
        # Msg("system", system_prompt, role="system"),
        *[Msg(name=x["role"], content=x["content"], role=x["role"]) for x in messages]
    ]
    result = await self.agent.reply(msg)
    if isinstance(result.content, str):
        response = result.content
    elif isinstance(result.content, list):
        response = result.content[0]["text"]  # type: ignore
    else:
        raise NotImplementedError(f"do not know how to handle {type(result.content)}")
    reward = await reward_fn_with_semaphore(msg, response, truth_action, truth_info)
    return WorkflowOutput(reward=reward)
    ```
=== "Langchain"

    ```python
    # get the trainable LLM's endpoint and key
    llm_info = tuner.as_oai_baseurl_apikey()

    # create the langchain agent
    llm = ChatOpenAI(
        base_url=llm_info.base_url,
        api_key=lambda: llm_info.api_key,
    )
    agent = create_agent(
        model=llm,
        system_prompt=system_prompt,
    )

    # build messages and send them to the agent
    msg = [
        {"role": x["role"], "content": x["content"]} for x in messages
    ]
    result = agent.invoke({
        "messages": msg,  # type: ignore
    })

    response = result["messages"][-1].content
    reward = await reward_fn_with_semaphore(msg, response, truth_action, truth_info)
    return WorkflowOutput(reward=reward)
    ```
#### 3.4 Reward

`llm_reward` is the LLM-as-a-judge called inside `reward_fn` to score the model output. The evaluation follows these rules:

docs/en/example_math_agent.md

Lines changed: 205 additions & 42 deletions
@@ -112,6 +112,85 @@ Compare `final_answer` with reference, compute `raw_reward` and `is_success`.
</div>
</div>

### YAML Configuration

Most wiring happens in `tutorial/example_math_agent/math_agent.yaml`:

=== "AgentScope"

    ```yaml title="math_agent.yaml"
    ajet:
      task_reader:
        type: huggingface_dat_repo  # also supports: dataset_file / env_service

      rollout:
        user_workflow: tutorial.example_math_agent.math_agent->ExampleMathLearn

      task_judge:
        judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge

      model:
        path: YOUR_MODEL_PATH
    ```

=== "OpenAI"

    ```yaml title="math_agent.yaml"
    ajet:
      task_reader:
        type: huggingface_dat_repo  # also supports: dataset_file / env_service

      rollout:
        user_workflow: tutorial.example_math_agent.math_agent_oai_sdk->ExampleMathLearn

      task_judge:
        judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge

      model:
        path: YOUR_MODEL_PATH
    ```

=== "Raw HTTP"

    ```yaml title="math_agent.yaml"
    ajet:
      task_reader:
        type: huggingface_dat_repo  # also supports: dataset_file / env_service

      rollout:
        user_workflow: tutorial.example_math_agent.math_agent_raw_http->ExampleMathLearn

      task_judge:
        judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge

      model:
        path: YOUR_MODEL_PATH
    ```

=== "Langchain"

    ```yaml title="math_agent.yaml"
    ajet:
      task_reader:
        type: huggingface_dat_repo  # also supports: dataset_file / env_service

      rollout:
        user_workflow: tutorial.example_math_agent.math_agent_langchain->ExampleMathLearn

      task_judge:
        judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge

      model:
        path: YOUR_MODEL_PATH
    ```

| Field | Description |
|-------|-------------|
| `task_reader` | Where tasks come from |
| `user_workflow` | Which workflow runs per sample |
| `judge_protocol` | Which judge computes rewards |
| `model.path` | Pretrained model to fine-tune |
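Both `user_workflow` and `judge_protocol` use a `module.path->ClassName` arrow notation. The framework's own loader is not shown in this tutorial; a minimal sketch of how such a string could be resolved, assuming standard `importlib` machinery:

```python
import importlib

def resolve_arrow_path(spec: str):
    """Resolve a 'pkg.module->ClassName' spec to the class object.

    Hypothetical helper illustrating the notation; the framework's real
    loader may differ.
    """
    module_path, _, class_name = spec.partition("->")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

# usage: resolve a stdlib class the same way the config strings would be
cls = resolve_arrow_path("collections->OrderedDict")
print(cls.__name__)  # OrderedDict
```

Separating the importable module from the attribute with `->` keeps the YAML unambiguous, since a plain dotted path cannot distinguish `pkg.module.Class` from `pkg.module_class`.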
### Code Walkthrough

@@ -140,50 +219,134 @@ Compare `final_answer` with reference, compute `raw_reward` and `is_success`.
        return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})
    ```

=== "OpenAI"

    ```python title="Workflow Sketch"
    client = tuner.as_raw_openai_sdk_client()

    # call 1: get a response that may contain a tool call
    messages = [
        {"role": "system", "content": self.system_prompt},
        {"role": "user", "content": query},
    ]
    reply_message: ChatCompletion = await client.chat.completions.create(
        messages=messages, tools=self.available_functions
    )
    if reply_message.choices[0].message.content:
        messages.append({
            "role": "assistant",
            "content": reply_message.choices[0].message.content,
        })

    # if the model called a tool, execute it and feed the result back
    if reply_message.choices[0].message and reply_message.choices[0].message.tool_calls:
        tool_calls: list[ChatCompletionMessageToolCall] = reply_message.choices[0].message.tool_calls
        for tool_call in tool_calls:
            if tool_call.function.name == "execute_python_code":
                arguments = json.loads(tool_call.function.arguments)

                def sync_wrapper():
                    import subprocess
                    import sys
                    process = subprocess.run(
                        [sys.executable, "-c", arguments["code"]],
                        timeout=arguments.get("timeout", 300),
                        capture_output=True,
                        text=True,
                    )
                    return process.stdout

                # run the blocking subprocess call off the event loop
                result = await asyncio.to_thread(sync_wrapper)
                tool_result_message = {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call.function.name,
                    "content": json.dumps({
                        "return_code": str(result),
                    }),
                }
                messages.append(tool_result_message)

        # call 2: follow-up request with the tool result appended
        final_response: ChatCompletion = await client.chat.completions.create(
            messages=messages,
        )
        final_stage_response = final_response.choices[0].message.content
    else:
        final_stage_response = reply_message.choices[0].message.content

    return WorkflowOutput(reward=None, metadata={"final_answer": final_stage_response})
    ```
=== "Raw HTTP"

    ```python title="raw http"
    url_and_apikey = tuner.as_oai_baseurl_apikey()
    base_url = url_and_apikey.base_url
    api_key = url_and_apikey.api_key

    # take out the query
    query = workflow_task.task.main_query

    messages = [
        {"role": "system", "content": self.system_prompt},
        {"role": "user", "content": query},
    ]

    # use a raw (non-streaming) HTTP request to get the response
    response = requests.post(
        f"{base_url}/chat/completions",
        json={
            "model": "fill_whatever_model",  # this `model` field is ignored by the endpoint
            "messages": messages,
        },
        headers={
            "Authorization": f"Bearer {api_key}",
        },
    )
    final_answer = response.json()["choices"][0]["message"]["content"]
    return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})
    ```
=== "Langchain"

    Removed (the previous, full-workflow example):

    ```python
    class ExampleMathLearn(Workflow):

        name: str = "math_agent_workflow"
        system_prompt: str = dedent("""
            You are an agent specialized in solving math problems.
            Please solve the math problem given to you.
            You can write and execute Python code to perform calculation or verify your answer.
            You should return your final answer within \\boxed{{}}.
        """)

        async def execute(self, workflow_task: WorkflowTask, tuner: AjetTuner) -> WorkflowOutput:  # type: ignore
            # turn the tuner handle into an OpenAI-compatible base URL and API key
            url_and_apikey = tuner.as_oai_baseurl_apikey()
            base_url = url_and_apikey.base_url
            api_key = url_and_apikey.api_key

            from langchain_openai import ChatOpenAI
            llm = ChatOpenAI(
                base_url=base_url,
                api_key=lambda: api_key,
            )
            agent = create_agent(
                model=llm,
                system_prompt=self.system_prompt,
            )

            # take out the query
            query = workflow_task.task.main_query

            response = agent.invoke({
                "messages": [
                    {"role": "user", "content": query},
                ],
            })

            final_answer = response["messages"][-1].content
            return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})
    ```

    Added (the current snippet-style example):

    ```python title="langchain"
    # turn the tuner handle into an OpenAI-compatible base URL and API key
    url_and_apikey = tuner.as_oai_baseurl_apikey()
    base_url = url_and_apikey.base_url
    api_key = url_and_apikey.api_key

    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(
        base_url=base_url,
        api_key=lambda: api_key,
    )
    agent = create_agent(
        model=llm,
        system_prompt=self.system_prompt,
    )

    # take out the query
    query = workflow_task.task.main_query

    response = agent.invoke({
        "messages": [
            {"role": "user", "content": query},
        ],
    })

    final_answer = response["messages"][-1].content
    return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})
    ```
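The OpenAI tab passes `tools=self.available_functions`, but the schema itself is not part of this commit. A plausible OpenAI-style function definition for the `execute_python_code` tool used above (field values here are assumptions, not the tutorial's actual schema):

```python
# Assumed tool schema for the `execute_python_code` tool; the tutorial's
# real `self.available_functions` may differ in wording and defaults.
available_functions = [
    {
        "type": "function",
        "function": {
            "name": "execute_python_code",
            "description": "Run a Python snippet and return its stdout.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python source to execute"},
                    "timeout": {"type": "integer", "description": "Seconds before the run is killed"},
                },
                "required": ["code"],
            },
        },
    }
]

print(available_functions[0]["function"]["name"])  # execute_python_code
```

The `code` and `timeout` keys match the `arguments["code"]` and `arguments.get("timeout", 300)` accesses in the workflow sketch, which is what pins down the shape a schema like this must have.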

!!! warning "Important"
