You are an expert in graphical user interfaces and Python code. You are responsible for executing the current subtask: `SUBTASK_DESCRIPTION` of the larger goal: `TASK_DESCRIPTION`.
IMPORTANT: ** The subtasks: ['DONE_TASKS'] have already been done. The future subtasks ['FUTURE_TASKS'] will be done in the future by me. You must only perform the current subtask: `SUBTASK_DESCRIPTION`. Do not try to do future subtasks. **
You are working in CURRENT_OS. You must only complete the subtask provided and not the larger goal.
3. The history of your previous interactions with the UI.
4. Access to the following class and methods to interact with the UI:
class Agent:
-"""
-)
+""")

for attr_name in dir(agent_class):
attr = getattr(agent_class, attr_name)
@@ -29,8 +27,7 @@ def {attr_name}{signature}:
'''{attr.__doc__}'''
"""

-procedural_memory += textwrap.dedent(
-"""
+procedural_memory += textwrap.dedent("""
Your response should be formatted like this:
(Previous action verification)
Carefully analyze based on the screenshot and the accessibility tree if the previous action was successful. If the previous action was not successful, provide a reason for the failure.
@@ -57,8 +54,7 @@ def {attr_name}{signature}:
7. Do not do anything other than the exact specified task. Return with `agent.done()` immediately after the task is completed or `agent.fail()` if it cannot be completed.
8. Whenever possible use hot-keys or typing rather than mouse clicks.
9. My computer's password is 'password', feel free to use it when you need sudo rights
-"""
-)
+""")
return procedural_memory.strip()

# MANAGER_PROMPT = """You are a planning agent for solving GUI navigation tasks. You will be provided the initial configuration of a system including accessibility, screenshot and other information. You need to solve the following task: TASK_DESCRIPTION. You will describe in as much detail as possible the steps required to complete the task by a GUI agent. Please do not include any verification steps in your plan that is not your responsibility. IMPORTANT: Your plan should be as concize as possible and should not include any unnecessary steps. Do not fine-tune, or embellish anything or cause any side effects. Generate the plan that can be accomplished in the shortest time. Please take the current state into account when generating the plan. Please provide the plan in a step-by-step format and make sure you do not include anything that's already done in the GUI in your plan."""
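For readers following the hunks above, the loop over `dir(agent_class)` appears to assemble the worker prompt by listing each agent method with its signature and docstring. A minimal sketch of that pattern follows. `DemoAgent`, `build_procedural_memory`, the header wording, and the rule that skips underscore-prefixed names are assumptions made for illustration; they are not taken from this diff.

```python
import inspect
import textwrap


class DemoAgent:
    """Hypothetical stand-in for the real agent class."""

    def click(self, element_description: str, num_clicks: int = 1):
        """Click on the described element."""

    def done(self):
        """End the current task with a success."""

    def _private_helper(self):
        """Internal helper that should not be shown to the model."""


def build_procedural_memory(agent_class, skipped_actions=()):
    # Header text is dedented so the prompt starts at column 0.
    procedural_memory = textwrap.dedent("""
        You have access to the following class and methods to interact with the UI:
        class Agent:
        """)

    for attr_name in dir(agent_class):
        # Assumption: private/dunder names and explicitly skipped actions are hidden.
        if attr_name.startswith("_") or attr_name in skipped_actions:
            continue
        attr = getattr(agent_class, attr_name)
        if callable(attr):
            signature = inspect.signature(attr)
            # Same f-string shape as the hunk context: signature, then docstring.
            procedural_memory += f"""
    def {attr_name}{signature}:
    '''{attr.__doc__}'''
        """

    return procedural_memory.strip()


memory = build_procedural_memory(DemoAgent, skipped_actions=("done",))
print(memory)
```

Passing `skipped_actions` lets one prompt builder serve several agent variants that expose different subsets of actions.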
You are an expert in graphical user interfaces and Python code. You are responsible for executing the current subtask: `SUBTASK_DESCRIPTION` of the larger goal: `TASK_DESCRIPTION`.
IMPORTANT: ** The subtasks: ['DONE_TASKS'] have already been done. The future subtasks ['FUTURE_TASKS'] will be done in the future by me. You must only perform the current subtask: `SUBTASK_DESCRIPTION`. Do not try to do future subtasks. **
You are working in CURRENT_OS. You must only complete the subtask provided and not the larger goal.
2. The history of your previous interactions with the UI.
3. Access to the following class and methods to interact with the UI:
class Agent:
-"""
-)
+""")

for attr_name in dir(agent_class):
if attr_name in skipped_actions:
@@ -32,8 +30,7 @@ def {attr_name}{signature}:
'''{attr.__doc__}'''
"""

-procedural_memory += textwrap.dedent(
-"""
+procedural_memory += textwrap.dedent("""
Your response should be formatted like this:
(Previous action verification)
Carefully analyze based on the screenshot if the previous action was successful. If the previous action was not successful, provide a reason for the failure.
@@ -60,14 +57,12 @@ def {attr_name}{signature}:
8. Whenever possible, your grounded action should use hot-keys with the agent.hotkey() action instead of clicking or dragging.
9. My computer's password is 'password', feel free to use it when you need sudo rights.
10. Do not use the "command" + "tab" hotkey on MacOS.
-"""
-)
+""")

return procedural_memory.strip()

# Manager prompt that generalizes to initial planning, re-planning after subtask completion, and re-planning after failure
-COMBINED_MANAGER_PROMPT = textwrap.dedent(
-"""
+COMBINED_MANAGER_PROMPT = textwrap.dedent("""
You are an expert planning agent for solving GUI navigation tasks. You need to generate a plan for solving the following task: TASK_DESCRIPTION.

You are provided with:
@@ -91,8 +86,7 @@ def {attr_name}{signature}:
- If you feel the trajectory and future subtasks seem correct based on the current state of the desktop, you may re-use future subtasks.
- If you feel some future subtasks are not detailed enough, use your observations from the desktop screenshot to update these subtasks to be more detailed.
- If you feel some future subtasks are incorrect or unnecessary, feel free to modify or even remove them.
-"""
-)
+""")

# USED IN OSWORLD EXPERIMENTS
RAG_AGENT_OSWORLD = """
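The change repeated across these hunks moves the opening `"""` onto the same line as `textwrap.dedent(` and the closing `"""` onto the same line as `)`. In both layouts the string literal starts immediately after the opening quotes and ends immediately before the closing ones, so the resulting prompt text should be identical. A small check, using made-up prompt text rather than any prompt from this diff:

```python
import textwrap

# Layout before the change: the quotes sit on their own lines.
old_style = textwrap.dedent(
    """
    You are a reflection agent.
    Your task is to generate a reflection.
    """
)

# Layout after the change: the quotes hug the call parentheses.
new_style = textwrap.dedent("""
    You are a reflection agent.
    Your task is to generate a reflection.
    """)

# Only the source layout differs; the dedented strings are the same.
assert old_style == new_style
print(repr(new_style))
```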
@@ -107,8 +101,7 @@ def {attr_name}{signature}:
"""

# For reflection agent, post-action verification mainly for cycle detection
-REFLECTION_ON_TRAJECTORY = textwrap.dedent(
-"""
+REFLECTION_ON_TRAJECTORY = textwrap.dedent("""
You are a reflection agent designed to assist in subtask execution by reflecting on the trajectory of a subtask and providing feedback for what the next step should be.
You have access to the Subtask Description and the Current Trajectory of another computer agent. The Current Trajectory is a sequence of a desktop image, chain-of-thought reasoning, and a desktop action for each time step. The last image is the screen's display after the last action.
Your task is to generate a reflection. Your generated reflection must fall under one of the two cases listed below:
@@ -120,8 +113,7 @@ def {attr_name}{signature}:
- DO NOT suggest any specific future plans or actions. Your only goal is to provide a reflection, not an actual plan or action.
- Any response that falls under Case 1 should explain why the trajectory is not going according to plan. You should especially lookout for cycles of actions that are continually repeated with no progress.
- Any response that falls under Case 2 should be concise, since you just need to affirm the agent to continue with the current trajectory.
-"""
-)
+""")

TASK_SUMMARIZATION_PROMPT = """
You are a summarization agent designed to analyze a trajectory of desktop task execution.
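The reflection prompt asks the model to watch for cycles of actions that repeat with no progress. The prompt leaves that judgment to the model; purely as an illustration of what such a cycle looks like, here is a hypothetical programmatic check over a list of action strings. The function name, the thresholds, and the action strings are all assumptions and are not part of this codebase.

```python
def find_repeating_cycle(actions, max_period=3, min_repeats=3):
    """Return the shortest action pattern repeated at the end of `actions`, or None."""
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if len(actions) < needed:
            continue
        tail = actions[-needed:]
        pattern = tail[:period]
        # The tail is a cycle if every element matches the pattern at its offset.
        if all(tail[i] == pattern[i % period] for i in range(needed)):
            return pattern
    return None


trajectory = [
    "agent.open('Files')",
    "agent.click('Save')",
    "agent.click('Cancel')",
    "agent.click('Save')",
    "agent.click('Cancel')",
    "agent.click('Save')",
    "agent.click('Cancel')",
]
print(find_repeating_cycle(trajectory))
```

Exact string matching is deliberately strict here; two actions that differ only in coordinates would not count as a repeat.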
@@ -178,8 +170,7 @@ def {attr_name}{signature}:
Analyze the given plan and provide the output in this JSON format within the <json></json> tags. Ensure the JSON is valid and properly escaped.
"""

-SUBTASK_SUMMARIZATION_PROMPT = textwrap.dedent(
-"""
+SUBTASK_SUMMARIZATION_PROMPT = textwrap.dedent("""
You are a summarization agent designed to analyze a trajectory of desktop task execution.
You will summarize the correct plan and grounded actions based on the whole trajectory of a subtask, ensuring the summarized plan contains only correct and necessary steps.

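One prompt in this hunk asks for output as JSON inside `<json></json>` tags. How the calling code consumes that output is not shown in this diff; a minimal sketch of one way to do it, with a hypothetical `extract_json` helper and a made-up response, might look like this:

```python
import json
import re


def extract_json(response: str):
    """Return the parsed JSON found between <json> tags, or None if absent or invalid."""
    match = re.search(r"<json>(.*?)</json>", response, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        # The prompt asks for valid JSON, but model output can still be malformed.
        return None


reply = 'Here is the analysis.\n<json>{"steps": ["open editor", "save file"]}</json>'
print(extract_json(reply))
```

Returning `None` for both missing tags and invalid JSON keeps the caller's retry logic simple, at the cost of not distinguishing the two failure modes.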
@@ -195,8 +186,7 @@ def {attr_name}{signature}:
Action: [Description of the correct action]
Grounded Action: [Grounded actions with the \"element1_description\" replacement when needed]
5. Exclude any other details that are not necessary for completing the task.
-"""
-)
+""")

STATE_EVALUATOR_SYSTEM_PROMPT = """
You are an impartial evaluator to evaluate the completeness of the given desktop computer task, you are also an expert of accessibility tree, os environment and python programming.
@@ -242,8 +232,7 @@ def {attr_name}{signature}:
Only say Yes or No in the Judgment section. Do not provide any other information in the Judgment section.
"""

-PHRASE_TO_WORD_COORDS_PROMPT = textwrap.dedent(
-"""
+PHRASE_TO_WORD_COORDS_PROMPT = textwrap.dedent("""
You are an expert in graphical user interfaces. Your task is to process a phrase of text, and identify the most relevant word on the computer screen.
You are provided with a phrase, a table with all the text on the screen, and a screenshot of the computer screen. You will identify the single word id that is best associated with the provided phrase.
This single word must be displayed on the computer screenshot, and its location on the screen should align with the provided phrase.
@@ -254,5 +243,4 @@ def {attr_name}{signature}:
2. Then, output the unique word id. Remember, the word id is the 1st number in each row of the text table.
3. If there are multiple occurrences of the same word, use the surrounding context in the phrase to choose the correct one. Pay very close attention to punctuation and capitalization.
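The phrase-to-word prompt refers to a text table whose first column is a word id, which the model returns so the caller can map it back to a screen location. The table layout and the lookup code are not part of this diff; the sketch below assumes a hypothetical `OcrWord` record with a bounding box and a tab-separated table, only to show how an id could round-trip to coordinates.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """Hypothetical OCR result: the word text plus its bounding box in pixels."""
    text: str
    left: int
    top: int
    width: int
    height: int


def build_text_table(words):
    # The word id is the first column, matching the prompt's description.
    rows = ["Word id\tText"]
    for word_id, word in enumerate(words):
        rows.append(f"{word_id}\t{word.text}")
    return "\n".join(rows)


def word_center(words, word_id):
    # Map the id chosen by the model back to a clickable point.
    word = words[word_id]
    return (word.left + word.width // 2, word.top + word.height // 2)


words = [OcrWord("File", 10, 5, 40, 20), OcrWord("Save", 60, 5, 44, 20)]
print(build_text_table(words))
print(word_center(words, 1))
```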
You are an expert in graphical user interfaces and Python code. You are responsible for executing the task: `TASK_DESCRIPTION`.
You are working in CURRENT_OS.
You are provided with:
1. A screenshot of the current time step.
2. The history of your previous interactions with the UI.
3. Access to the following class and methods to interact with the UI:
class Agent:
-"""
-)
+""")

for attr_name in dir(agent_class):
if attr_name in skipped_actions:
@@ -30,8 +28,7 @@ def {attr_name}{signature}:
'''{attr.__doc__}'''
"""

-procedural_memory += textwrap.dedent(
-"""
+procedural_memory += textwrap.dedent("""
Your response should be formatted like this:
(Previous action verification)
Carefully analyze based on the screenshot if the previous action was successful. If the previous action was not successful, provide a reason for the failure.
@@ -58,14 +55,12 @@ def {attr_name}{signature}:
8. Generate agent.fail() as your grounded action if you get exhaustively stuck on the task and believe it is impossible.
9. Generate agent.done() as your grounded action when your believe the task is fully complete.
10. Do not use the "command" + "tab" hotkey on MacOS.
-"""
-)
+""")

return procedural_memory.strip()

# For reflection agent, post-action verification mainly for cycle detection
-REFLECTION_ON_TRAJECTORY = textwrap.dedent(
-"""
+REFLECTION_ON_TRAJECTORY = textwrap.dedent("""
You are an expert computer use agent designed to reflect on the trajectory of a task and provide feedback on what has happened so far.
You have access to the Task Description and the Current Trajectory of another computer agent. The Current Trajectory is a sequence of a desktop image, chain-of-thought reasoning, and a desktop action for each time step. The last image is the screen's display after the last action.
Your task is to generate a reflection. Your generated reflection must fall under one of the cases listed below:
@@ -79,11 +74,9 @@ def {attr_name}{signature}:
- DO NOT suggest any specific future plans or actions. Your only goal is to provide a reflection, not an actual plan or action.
- Any response that falls under Case 1 should explain why the trajectory is not going according to plan. You should especially lookout for cycles of actions that are continually repeated with no progress.
- Any response that falls under Case 2 should be concise, since you just need to affirm the agent to continue with the current trajectory.
-"""
-)
+""")

-PHRASE_TO_WORD_COORDS_PROMPT = textwrap.dedent(
-"""
+PHRASE_TO_WORD_COORDS_PROMPT = textwrap.dedent("""
You are an expert in graphical user interfaces. Your task is to process a phrase of text, and identify the most relevant word on the computer screen.
You are provided with a phrase, a table with all the text on the screen, and a screenshot of the computer screen. You will identify the single word id that is best associated with the provided phrase.
This single word must be displayed on the computer screenshot, and its location on the screen should align with the provided phrase.
@@ -94,5 +87,4 @@ def {attr_name}{signature}:
2. Then, output the unique word id. Remember, the word id is the 1st number in each row of the text table.
3. If there are multiple occurrences of the same word, use the surrounding context in the phrase to choose the correct one. Pay very close attention to punctuation and capitalization.