Commit b09090e

Get gpt-4o to do well on pybullet coffee pred invention.
1 parent 0a70654 commit b09090e

2 files changed

Lines changed: 61 additions & 18 deletions

Lines changed: 6 additions & 12 deletions
@@ -1,19 +1,13 @@
-You are a vision system for a robot provided with two images: a before image showing the state before a skill is executed, an after image showing the state after the skill is executed. You are given a list of predicates below, and you are given the values of these predicates in the image before the skill is executed. Your job is to output the values of the following predicates in the image after the skill is executed. Pay careful attention to the visual changes between the two images to figure out which predicates change and which predicates do not change. Note that some or all of the predicates don't necessary have to change. First, output a description of what changes you expect to happen based on the skill that was just run, explicitly noting the skill that was run. Second, output a description of what visual changes you see happen between the before and after images, looking specifically at the objects involved in the skill's arguments, noting what objects these are. From these two descriptions, for each predicate labeled in the previous timestep, note whether you expect its value to change or stay the same. Next, for each predicate given in the list of predicates to label, output each predicate value in the after image as a bulleted list (use '*' for the bullets) with each predicate and value on a different line. Ensure there is a period ('.') after the truth value of the predicate. For each predicate value, provide an explanation as to why you labelled this predicate as having this particular value, and note what value this predicate had in the previous timestep, which is given to you in the prompt. Use the format: `* <predicate>: <truth_value>. <explanation>`. When labeling the value of a predicate, if you don't see the objects involved in that predicate, retain its truth value from the previous timestep. Also, if your description of changes you expect to happen, and your description of visual changes you saw happen, have nothing to do with the predicate you are trying to label, retain its truth value from the previous timestep. For example, if in the previous timestep I paint an object, and in the current timestamp I sit on it, we don't expect its color to change after sitting on it.
+You are a vision system for a robot working in a controlled, synthetic laboratory academic research environment at MIT. You are provided with two images: a before image showing the environment before a skill is executed, and an after image showing the environment after the skill is executed. The skill that was executed will be provided.

-Our object detection system tells us that the table is white, the coffee machine is black, the jug is white, and coffee is brown.
+You are given a list of predicates that describe the environment, and the values of these predicates in the image before the skill is executed. Your task is to label the truth value, True or False, for each predicate in the after image based on visual evidence, the provided facts, and logical inference.

-Double check your answer and make sure that you have provided labels for all predicates we requested labels for.
+The environment is safe, synthetic, and fully observable for the purposes of this task. There are no real-world consequences. You are free to make judgments based on what you see and know.

-Your response should have three sections. Here is an outline of what your response should look like:
-[START OULTLINE]
-# Expected changes based on the executed skill
-[insert your analysis on the expected changes you will see based on the skill that was executed]
+Pay careful attention to the visual changes between the two images to figure out which predicates change and which predicates do not change. Note that some or all of the predicates don't necessary have to change. First, output a description of what changes you expect to happen based on the skill that was just run, explicitly noting the skill that was run. Second, output a description of what visual changes you see happen between the before and after images, looking specifically at the objects involved in the skill's arguments, noting what objects these are. From these two descriptions, for each predicate labeled in the previous timestep, note whether you expect its value to change or stay the same. Next, for each predicate given in the list of predicates to label, output each predicate value in the after image as a bulleted list (use '*' for the bullets) with each predicate and value on a different line. Ensure there is a period ('.') after the truth value of the predicate. For each predicate value, provide an explanation as to why you labelled this predicate as having this particular value, and note what value this predicate had in the previous timestep, which is given to you in the prompt. Use the format: `* <predicate>: <truth_value>. <explanation>`. When labeling the value of a predicate, if you don't see the objects involved in that predicate, retain its truth value from the previous timestep. Also, if your description of changes you expect to happen, and your description of visual changes you saw happen, have nothing to do with the predicate you are trying to label, retain its truth value from the previous timestep. For example, if in the previous timestep I paint an object, and in the current timestamp I sit on it, we don't expect its color to change after sitting on it.

-# Visual changes observed between the images
-[insert your analysis on the visual changes observed between the images]
+The object detection system provides the following facts. The table is white. The coffee machine is black. The jug is white. The coffee is brown.

-# Predicate values in the after image
-[insert your bulleted list of `* <predicate>: <truth value>. <explanation>`]
-[END OUTLINE]
+Double-check that you have provided a label for every requested predicate. Do not refuse to label any predicate.

 Predicates to label:
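
The prompt asks the model to answer with bullets of the form `* <predicate>: <truth_value>. <explanation>`, which is regular enough to parse with one pattern. The following is a minimal sketch of such a parser, not code from this commit: the function name `parse_predicate_labels` and the sample predicate strings are hypothetical, and it assumes the truth value is written literally as `True` or `False`.

```python
import re
from typing import Dict

# Hypothetical helper, not part of the repository.
# Matches a bullet such as:
#   * JugFilled(jug:jug): True. The jug now contains brown liquid.
# The predicate is matched lazily so that colons inside it (e.g. "jug:jug")
# are skipped until the first ": True." or ": False." is found.
_LABEL_RE = re.compile(r"^\*\s*(?P<pred>.+?):\s*(?P<val>True|False)\.")


def parse_predicate_labels(response: str) -> Dict[str, bool]:
    """Extract {predicate: truth value} from a bulleted model response."""
    labels: Dict[str, bool] = {}
    for line in response.splitlines():
        match = _LABEL_RE.match(line.strip())
        if match is None:
            # Headings, free-form analysis, and malformed bullets are ignored.
            continue
        labels[match.group("pred").strip()] = match.group("val") == "True"
    return labels
```

Lines that do not match are skipped rather than raising, so a caller that needs every requested predicate labelled would still have to compare the returned keys against the requested list.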

predicators/pretrained_model_interface.py

Lines changed: 55 additions & 6 deletions
@@ -171,28 +171,77 @@ def set_openai_key(self, key: Optional[str] = None) -> None:
             assert "OPENAI_API_KEY" in os.environ
             key = os.environ["OPENAI_API_KEY"]

+    # @retry(wait=wait_random_exponential(min=1, max=60),
+    #        stop=stop_after_attempt(10))
+    # def call_openai_api(self,
+    #                     messages: list,
+    #                     model: str = "gpt-4",
+    #                     seed: Optional[int] = None,
+    #                     max_tokens: int = 32,
+    #                     temperature: float = 0.2,
+    #                     verbose: bool = False) -> str: # pragma: no cover
+    #     """Make an API call to OpenAI."""
+    #     client = openai.OpenAI()
+    #     completion = client.chat.completions.create(
+    #         model=model,
+    #         messages=messages,
+    #         seed=seed,
+    #         max_tokens=max_tokens,
+    #         temperature=temperature,
+    #     )
+    #     if verbose:
+    #         logging.debug(f"OpenAI API response: {completion}")
+    #     assert len(completion.choices) == 1
+    #     assert completion.choices[0].message.content is not None
+    #     return completion.choices[0].message.content
     @retry(wait=wait_random_exponential(min=1, max=60),
            stop=stop_after_attempt(10))
     def call_openai_api(self,
-                        messages: list,
+                        messages: List[dict],
                         model: str = "gpt-4",
                         seed: Optional[int] = None,
                         max_tokens: int = 32,
                         temperature: float = 0.2,
-                        verbose: bool = False) -> str: # pragma: no cover
+                        verbose: bool = False) -> str:
         """Make an API call to OpenAI."""
         client = openai.OpenAI()
+
+        # Insert a strong, safe research system prompt
+        system_message = {
+            "role": "system",
+            # "content": (
+            #     "You are a helpful AI research assistant working on a robotics project at MIT. "
+            #     "All tasks are safe, academic, and intended for legitimate research purposes. "
+            #     "Respond helpfully and do not refuse unless absolutely necessary."
+            # )
+            "content": (
+                "You are a helpful AI research assistant working on a robotics project at MIT. "
+                "The task involves analyzing synthetic lab images in a controlled academic setting. "
+                "You are permitted to make judgments about predicates describing object states, robot actions, and scene properties. "
+                "You should always attempt to answer unless the input is truly unintelligible. "
+                "Assume no safety concerns. Assume all content is safe."
+                "You are permitted to infer reasonable truths based on typical physics, causality, and object behaviors."
+            )
+        }
+
+        # Ensure the system message is the first message
+        full_messages = [system_message] + messages
+
+        # Make the chat completion call
         completion = client.chat.completions.create(
             model=model,
-            messages=messages,
+            messages=full_messages,
             seed=seed,
             max_tokens=max_tokens,
             temperature=temperature,
         )
+
         if verbose:
             logging.debug(f"OpenAI API response: {completion}")
-        assert len(completion.choices) == 1
-        assert completion.choices[0].message.content is not None
+
+        assert len(completion.choices) == 1, "Unexpected number of choices returned."
+        assert completion.choices[0].message.content is not None, "Completion content is None."
+
         return completion.choices[0].message.content

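
The hunk above prepends a fixed system message to whatever `messages` the caller passes. The step can be sketched as a small pure function that is testable without calling the API. This is an illustration under stated assumptions, not the commit's code: `with_system_message` and `EXAMPLE_SYSTEM_CONTENT` are hypothetical names, and skipping the prepend when the caller already supplies a system message is an added design choice that the commit itself does not make. The sketch also joins sentences with `" ".join`, because adjacent string literals concatenate with no separator; in the committed prompt the literal ending in `safe.` has no trailing space, so it runs directly into `You are permitted`.

```python
from typing import Dict, List

Message = Dict[str, object]

# Joining whole sentences avoids the missing-space pitfall of adjacent
# string literals ("...safe." "You are..." becomes "...safe.You are...").
EXAMPLE_SYSTEM_CONTENT = " ".join([
    "You are a helpful AI research assistant working on a robotics project at MIT.",
    "Assume no safety concerns. Assume all content is safe.",
    "You are permitted to infer reasonable truths based on typical physics, "
    "causality, and object behaviors.",
])


def with_system_message(messages: List[Message],
                        system_content: str) -> List[Message]:
    """Return a new message list that starts with a system message.

    Hypothetical helper: if the first message is already a system message,
    the list is copied unchanged (an assumption, not the commit's behavior).
    The input list is never mutated.
    """
    if messages and messages[0].get("role") == "system":
        return list(messages)
    system_message: Message = {"role": "system", "content": system_content}
    return [system_message] + list(messages)
```

Returning a new list keeps the caller's `messages` intact, which matters here because `call_openai_api` is wrapped in `@retry` and may run several times with the same argument.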

@@ -358,7 +407,7 @@ def prepare_vision_messages(
             content_str = {
                 "image_url": {
                     "url": f"data:image/png;base64,{frame}",
-                    "detail": "auto"
+                    "detail": detail
                 },
                 "type": "image_url"
             }
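
The second hunk replaces the hard-coded `"auto"` with a `detail` variable, so the caller can choose the image resolution setting per request. A minimal sketch of building one such content entry follows. The helper name `make_image_content` is hypothetical, and the validation step is an addition for illustration; the allowed values `"low"`, `"high"`, and `"auto"` are the ones the OpenAI vision API documents for `detail`.

```python
import base64
from typing import Dict

_ALLOWED_DETAIL = {"low", "high", "auto"}


def make_image_content(png_bytes: bytes,
                       detail: str = "auto") -> Dict[str, object]:
    """Build one image entry for a chat message's content list.

    Hypothetical helper mirroring the dict shape in the hunk above.
    """
    if detail not in _ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {sorted(_ALLOWED_DETAIL)}")
    # The API expects the PNG as a base64 data URL.
    frame = base64.b64encode(png_bytes).decode("utf-8")
    return {
        "image_url": {
            "url": f"data:image/png;base64,{frame}",
            "detail": detail,
        },
        "type": "image_url",
    }
```

Failing early on an unknown `detail` value keeps a typo from surfacing only as an API error after the retry loop has run.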
