Commit aa511a7

docs: add implementation examples for AgentScope, OpenAI, raw HTTP, and langchain in math agent and learn2ask tutorials

1 parent d6e47b5

File tree

2 files changed: +267 −42 lines changed

docs/en/example_learning_to_ask.md

Lines changed: 62 additions & 0 deletions
@@ -99,6 +99,68 @@ At the code level, everything is implemented in `tutorial/example_learn2ask/lear
* `ExampleLearn2Ask` defines the workflow: how the dialogue context is converted into the agent’s prompt/input, and what output format is expected (one follow-up question, optionally with choices).
* `reward_fn` defines how to convert the judge’s feedback into a scalar reward used for training.

We provide two implementations of the agent, one based on AgentScope and one based on langchain:
=== "AgentScope"

    ```python
    # create the agent
    self.agent = ReActAgent(
        name="math_react_agent",
        sys_prompt=system_prompt,
        model=tuner.as_agentscope_model(),
        formatter=DashScopeChatFormatter(),
        toolkit=None,
        memory=InMemoryMemory(),
        max_iters=1,
    )
    self.agent.set_console_output_enabled(False)

    # convert the messages to AgentScope format and send them to the agent
    msg = [
        # Msg("system", system_prompt, role="system"),
        *[Msg(name=x["role"], content=x["content"], role=x["role"]) for x in messages]
    ]
    result = await self.agent.reply(msg)
    if isinstance(result.content, str):
        response = result.content
    elif isinstance(result.content, list):
        response = result.content[0]["text"]  # type: ignore
    else:
        raise NotImplementedError(f"do not know how to handle {type(result.content)}")
    reward = await reward_fn_with_semaphore(msg, response, truth_action, truth_info)
    return WorkflowOutput(reward=reward)
    ```
=== "Langchain"

    ```python
    # get the trainable LLM's endpoint and key
    llm_info = tuner.as_oai_baseurl_apikey()

    # create the langchain agent
    llm = ChatOpenAI(
        base_url=llm_info.base_url,
        api_key=lambda: llm_info.api_key,
    )
    agent = create_agent(
        model=llm,
        system_prompt=system_prompt,
    )

    # build messages and send them to the agent
    msg = [
        {"role": x["role"], "content": x["content"]} for x in messages
    ]
    result = agent.invoke({
        "messages": msg,  # type: ignore
    })

    response = result["messages"][-1].content
    reward = await reward_fn_with_semaphore(msg, response, truth_action, truth_info)
    return WorkflowOutput(reward=reward)
    ```
#### 3.4 Reward

`llm_reward` is the LLM-as-a-judge called inside `reward_fn` to score the model output. The evaluation follows these rules:

docs/en/example_math_agent.md

Lines changed: 205 additions & 42 deletions
@@ -112,6 +112,85 @@ Compare `final_answer` with reference, compute `raw_reward` and `is_success`.
</div>
</div>

### YAML Configuration

Most wiring happens in `tutorial/example_math_agent/math_agent.yaml`:

=== "AgentScope"

    ```yaml title="math_agent.yaml"
    ajet:
      task_reader:
        type: huggingface_dat_repo  # also supports: dataset_file / env_service

      rollout:
        user_workflow: tutorial.example_math_agent.math_agent->ExampleMathLearn

      task_judge:
        judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge

      model:
        path: YOUR_MODEL_PATH
    ```

=== "OpenAI"

    ```yaml title="math_agent.yaml"
    ajet:
      task_reader:
        type: huggingface_dat_repo  # also supports: dataset_file / env_service

      rollout:
        user_workflow: tutorial.example_math_agent.math_agent_oai_sdk->ExampleMathLearn

      task_judge:
        judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge

      model:
        path: YOUR_MODEL_PATH
    ```

=== "Raw HTTP"

    ```yaml title="math_agent.yaml"
    ajet:
      task_reader:
        type: huggingface_dat_repo  # also supports: dataset_file / env_service

      rollout:
        user_workflow: tutorial.example_math_agent.math_agent_raw_http->ExampleMathLearn

      task_judge:
        judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge

      model:
        path: YOUR_MODEL_PATH
    ```

=== "Langchain"

    ```yaml title="math_agent.yaml"
    ajet:
      task_reader:
        type: huggingface_dat_repo  # also supports: dataset_file / env_service

      rollout:
        user_workflow: tutorial.example_math_agent.math_agent_langchain->ExampleMathLearn

      task_judge:
        judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge

      model:
        path: YOUR_MODEL_PATH
    ```

| Field | Description |
|-------|-------------|
| `task_reader` | Where tasks come from |
| `user_workflow` | Which workflow runs per sample |
| `judge_protocol` | Which judge computes rewards |
| `model.path` | Pretrained model to fine-tune |
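Both `user_workflow` and `judge_protocol` use a `module.path->ClassName` arrow notation. The framework's own loader is not shown in this tutorial; a minimal sketch of how such a string could be resolved, assuming standard `importlib` machinery:

```python
import importlib

def resolve_arrow_path(spec: str):
    """Resolve a 'pkg.module->ClassName' spec to the class object.

    Hypothetical helper illustrating the notation; the framework's real
    loader may differ.
    """
    module_path, _, class_name = spec.partition("->")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

# usage: resolve a stdlib class the same way the config strings would be
cls = resolve_arrow_path("collections->OrderedDict")
print(cls.__name__)  # OrderedDict
```

Separating the importable module from the attribute with `->` keeps the YAML unambiguous, since a plain dotted path cannot distinguish `pkg.module.Class` from `pkg.module_class`.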
### Code Walkthrough

@@ -140,50 +219,134 @@ Compare `final_answer` with reference, compute `raw_reward` and `is_success`.
        return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})
    ```

=== "OpenAI"

    ```python title="Workflow Sketch"
    client = tuner.as_raw_openai_sdk_client()

    # call 1: get a response that may contain a tool call
    messages = [
        {"role": "system", "content": self.system_prompt},
        {"role": "user", "content": query},
    ]
    reply_message: ChatCompletion = await client.chat.completions.create(
        messages=messages, tools=self.available_functions
    )
    if reply_message.choices[0].message.content:
        messages.append({
            "role": "assistant",
            "content": reply_message.choices[0].message.content,
        })

    # if the model called a tool, execute it and feed the result back
    if reply_message.choices[0].message and reply_message.choices[0].message.tool_calls:
        tool_calls: list[ChatCompletionMessageToolCall] = reply_message.choices[0].message.tool_calls
        for tool_call in tool_calls:
            if tool_call.function.name == "execute_python_code":
                arguments = json.loads(tool_call.function.arguments)

                def sync_wrapper():
                    import subprocess
                    import sys
                    process = subprocess.run(
                        [sys.executable, "-c", arguments["code"]],
                        timeout=arguments.get("timeout", 300),
                        capture_output=True,
                        text=True,
                    )
                    return process.stdout

                # run the blocking subprocess call off the event loop
                result = await asyncio.to_thread(sync_wrapper)
                tool_result_message = {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call.function.name,
                    "content": json.dumps({
                        "return_code": str(result),
                    }),
                }
                messages.append(tool_result_message)

        # call 2: follow-up request with the tool result appended
        final_response: ChatCompletion = await client.chat.completions.create(
            messages=messages,
        )
        final_stage_response = final_response.choices[0].message.content
    else:
        final_stage_response = reply_message.choices[0].message.content

    return WorkflowOutput(reward=None, metadata={"final_answer": final_stage_response})
    ```
=== "Raw HTTP"

    ```python title="raw http"
    url_and_apikey = tuner.as_oai_baseurl_apikey()
    base_url = url_and_apikey.base_url
    api_key = url_and_apikey.api_key

    # take out the query
    query = workflow_task.task.main_query

    messages = [
        {"role": "system", "content": self.system_prompt},
        {"role": "user", "content": query},
    ]

    # use a raw (non-streaming) HTTP request to get the response
    response = requests.post(
        f"{base_url}/chat/completions",
        json={
            "model": "fill_whatever_model",  # this `model` field is ignored by the endpoint
            "messages": messages,
        },
        headers={
            "Authorization": f"Bearer {api_key}",
        },
    )
    final_answer = response.json()["choices"][0]["message"]["content"]
    return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})
    ```
=== "Langchain"

    Removed (the previous, full-workflow example):

    ```python
    class ExampleMathLearn(Workflow):

        name: str = "math_agent_workflow"
        system_prompt: str = dedent("""
            You are an agent specialized in solving math problems.
            Please solve the math problem given to you.
            You can write and execute Python code to perform calculation or verify your answer.
            You should return your final answer within \\boxed{{}}.
        """)

        async def execute(self, workflow_task: WorkflowTask, tuner: AjetTuner) -> WorkflowOutput:  # type: ignore
            # turn the tuner handle into an OpenAI-compatible base URL and API key
            url_and_apikey = tuner.as_oai_baseurl_apikey()
            base_url = url_and_apikey.base_url
            api_key = url_and_apikey.api_key

            from langchain_openai import ChatOpenAI
            llm = ChatOpenAI(
                base_url=base_url,
                api_key=lambda: api_key,
            )
            agent = create_agent(
                model=llm,
                system_prompt=self.system_prompt,
            )

            # take out the query
            query = workflow_task.task.main_query

            response = agent.invoke({
                "messages": [
                    {"role": "user", "content": query},
                ],
            })

            final_answer = response["messages"][-1].content
            return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})
    ```

    Added (the current snippet-style example):

    ```python title="langchain"
    # turn the tuner handle into an OpenAI-compatible base URL and API key
    url_and_apikey = tuner.as_oai_baseurl_apikey()
    base_url = url_and_apikey.base_url
    api_key = url_and_apikey.api_key

    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(
        base_url=base_url,
        api_key=lambda: api_key,
    )
    agent = create_agent(
        model=llm,
        system_prompt=self.system_prompt,
    )

    # take out the query
    query = workflow_task.task.main_query

    response = agent.invoke({
        "messages": [
            {"role": "user", "content": query},
        ],
    })

    final_answer = response["messages"][-1].content
    return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})
    ```
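The OpenAI tab passes `tools=self.available_functions`, but the schema itself is not part of this commit. A plausible OpenAI-style function definition for the `execute_python_code` tool used above (field values here are assumptions, not the tutorial's actual schema):

```python
# Assumed tool schema for the `execute_python_code` tool; the tutorial's
# real `self.available_functions` may differ in wording and defaults.
available_functions = [
    {
        "type": "function",
        "function": {
            "name": "execute_python_code",
            "description": "Run a Python snippet and return its stdout.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python source to execute"},
                    "timeout": {"type": "integer", "description": "Seconds before the run is killed"},
                },
                "required": ["code"],
            },
        },
    }
]

print(available_functions[0]["function"]["name"])  # execute_python_code
```

The `code` and `timeout` keys match the `arguments["code"]` and `arguments.get("timeout", 300)` accesses in the workflow sketch, which is what pins down the shape a schema like this must have.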

!!! warning "Important"
