Get gpt-4o to do well on PyBullet coffee predicate invention.

ashayathalye · ashayathalye · commit b09090ea1bf6 · 2025-04-28T01:12:51.000-07:00
diff --git a/predicators/datasets/vlm_input_data_prompts/atom_labelling/img_option_diffs_label_history.txt b/predicators/datasets/vlm_input_data_prompts/atom_labelling/img_option_diffs_label_history.txt
@@ -1,19 +1,13 @@
-You are a vision system for a robot provided with two images: a before image showing the state before a skill is executed, an after image showing the state after the skill is executed. You are given a list of predicates below, and you are given the values of these predicates in the image before the skill is executed. Your job is to output the values of the following predicates in the image after the skill is executed. Pay careful attention to the visual changes between the two images to figure out which predicates change and which predicates do not change. Note that some or all of the predicates don't necessary have to change. First, output a description of what changes you expect to happen based on the skill that was just run, explicitly noting the skill that was run. Second, output a description of what visual changes you see happen between the before and after images, looking specifically at the objects involved in the skill's arguments, noting what objects these are. From these two descriptions, for each predicate labeled in the previous timestep, note whether you expect its value to change or stay the same. Next, for each predicate given in the list of predicates to label, output each predicate value in the after image as a bulleted list (use '*' for the bullets) with each predicate and value on a different line. Ensure there is a period ('.') after the truth value of the predicate. For each predicate value, provide an explanation as to why you labelled this predicate as having this particular value, and note what value this predicate had in the previous timestep, which is given to you in the prompt. Use the format: `* <predicate>: <truth_value>. <explanation>`. When labeling the value of a predicate, if you don't see the objects involved in that predicate, retain its truth value from the previous timestep. Also, if your description of changes you expect to happen, and your description of visual changes you saw happen, have nothing to do with the predicate you are trying to label, retain its truth value from the previous timestep. For example, if in the previous timestep I paint an object, and in the current timestamp I sit on it, we don't expect its color to change after sitting on it. 
+You are a vision system for a robot working in a controlled, synthetic laboratory academic research environment at MIT. You are provided with two images: a before image showing the environment before a skill is executed, and an after image showing the environment after the skill is executed. The skill that was executed will be provided.
 
-Our object detection system tells us that the table is white, the coffee machine is black, the jug is white, and coffee is brown.
+You are given a list of predicates that describe the environment, and the values of these predicates in the image before the skill is executed. Your task is to label the truth value, True or False, for each predicate in the after image based on visual evidence, the provided facts, and logical inference.
 
-Double check your answer and make sure that you have provided labels for all predicates we requested labels for.
+The environment is safe, synthetic, and fully observable for the purposes of this task. There are no real-world consequences. You are free to make judgments based on what you see and know.
 
-Your response should have three sections. Here is an outline of what your response should look like:
-[START OULTLINE]
-# Expected changes based on the executed skill
-[insert your analysis on the expected changes you will see based on the skill that was executed]
+Pay careful attention to the visual changes between the two images to figure out which predicates change and which predicates do not change. Note that some or all of the predicates don't necessarily have to change. First, output a description of what changes you expect to happen based on the skill that was just run, explicitly noting the skill that was run. Second, output a description of what visual changes you see happen between the before and after images, looking specifically at the objects involved in the skill's arguments, noting what objects these are. From these two descriptions, for each predicate labeled in the previous timestep, note whether you expect its value to change or stay the same. Next, for each predicate given in the list of predicates to label, output each predicate value in the after image as a bulleted list (use '*' for the bullets) with each predicate and value on a different line. Ensure there is a period ('.') after the truth value of the predicate. For each predicate value, provide an explanation as to why you labelled this predicate as having this particular value, and note what value this predicate had in the previous timestep, which is given to you in the prompt. Use the format: `* <predicate>: <truth_value>. <explanation>`. When labeling the value of a predicate, if you don't see the objects involved in that predicate, retain its truth value from the previous timestep. Also, if your description of changes you expect to happen, and your description of visual changes you saw happen, have nothing to do with the predicate you are trying to label, retain its truth value from the previous timestep. For example, if in the previous timestep I paint an object, and in the current timestep I sit on it, we don't expect its color to change after sitting on it. 
 
-# Visual changes observed between the images
-[insert your analysis on the visual changes observed between the images]
+The object detection system provides the following facts. The table is white. The coffee machine is black. The jug is white. The coffee is brown.
 
-# Predicate values in the after image
-[insert your bulleted list of `* <predicate>: <truth value>. <explanation>`]
-[END OUTLINE]
+Double-check that you have provided a label for every requested predicate. Do not refuse to label any predicate.
 
 Predicates to label:
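Review note: the prompt asks for bullets of the form `* <predicate>: <truth_value>. <explanation>` and for a label on every requested predicate. A minimal sketch of how a caller might check a response against that contract is below. `parse_labels` and `missing_predicates` are hypothetical helpers, not functions in this repository, and the predicate names in the usage note are made up for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical helpers (not part of predicators). The pattern assumes the
# bullet format requested in the prompt:
#   * <predicate>: <truth_value>. <explanation>
_BULLET_RE = re.compile(
    r"^\*\s*(?P<pred>.+?):\s*(?P<value>True|False)\.\s*(?P<why>.*)$")


def parse_labels(response: str) -> Dict[str, Tuple[bool, str]]:
    """Map each labelled predicate to (truth value, explanation).

    Lines that do not match the bullet format (headings, free-form
    analysis) are skipped rather than treated as errors.
    """
    labels: Dict[str, Tuple[bool, str]] = {}
    for line in response.splitlines():
        match = _BULLET_RE.match(line.strip())
        if match is None:
            continue
        pred = match.group("pred").strip()
        labels[pred] = (match.group("value") == "True",
                        match.group("why").strip())
    return labels


def missing_predicates(requested: List[str],
                       labels: Dict[str, Tuple[bool, str]]) -> List[str]:
    """Return the requested predicates that received no label."""
    return [pred for pred in requested if pred not in labels]
```

For example, calling `missing_predicates(["JugFilled(jug)"], parse_labels(text))` on a response that never mentions `JugFilled(jug)` returns that predicate, which a caller could use to decide whether to re-query the model.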
diff --git a/predicators/pretrained_model_interface.py b/predicators/pretrained_model_interface.py
@@ -171,28 +171,77 @@ def set_openai_key(self, key: Optional[str] = None) -> None:
             assert "OPENAI_API_KEY" in os.environ
             key = os.environ["OPENAI_API_KEY"]
 
+    # @retry(wait=wait_random_exponential(min=1, max=60),
+    #        stop=stop_after_attempt(10))
+    # def call_openai_api(self,
+    #                     messages: list,
+    #                     model: str = "gpt-4",
+    #                     seed: Optional[int] = None,
+    #                     max_tokens: int = 32,
+    #                     temperature: float = 0.2,
+    #                     verbose: bool = False) -> str:  # pragma: no cover
+    #     """Make an API call to OpenAI."""
+    #     client = openai.OpenAI()
+    #     completion = client.chat.completions.create(
+    #         model=model,
+    #         messages=messages,
+    #         seed=seed,
+    #         max_tokens=max_tokens,
+    #         temperature=temperature,
+    #     )
+    #     if verbose:
+    #         logging.debug(f"OpenAI API response: {completion}")
+    #     assert len(completion.choices) == 1
+    #     assert completion.choices[0].message.content is not None
+    #     return completion.choices[0].message.content
     @retry(wait=wait_random_exponential(min=1, max=60),
            stop=stop_after_attempt(10))
     def call_openai_api(self,
-                        messages: list,
+                        messages: List[dict],
                         model: str = "gpt-4",
                         seed: Optional[int] = None,
                         max_tokens: int = 32,
                         temperature: float = 0.2,
-                        verbose: bool = False) -> str:  # pragma: no cover
+                        verbose: bool = False) -> str:
         """Make an API call to OpenAI."""
         client = openai.OpenAI()
+
+        # Insert a strong, safe research system prompt
+        system_message = {
+            "role": "system",
+            # "content": (
+            #     "You are a helpful AI research assistant working on a robotics project at MIT. "
+            #     "All tasks are safe, academic, and intended for legitimate research purposes. "
+            #     "Respond helpfully and do not refuse unless absolutely necessary."
+            # )
+            "content": (
+                "You are a helpful AI research assistant working on a robotics project at MIT. "
+                "The task involves analyzing synthetic lab images in a controlled academic setting. "
+                "You are permitted to make judgments about predicates describing object states, robot actions, and scene properties. "
+                "You should always attempt to answer unless the input is truly unintelligible. "
+                "Assume no safety concerns. Assume all content is safe. "
+                "You are permitted to infer reasonable truths based on typical physics, causality, and object behaviors."
+            )
+        }
+
+        # Ensure the system message is the first message
+        full_messages = [system_message] + messages
+
+        # Make the chat completion call
         completion = client.chat.completions.create(
             model=model,
-            messages=messages,
+            messages=full_messages,
             seed=seed,
             max_tokens=max_tokens,
             temperature=temperature,
         )
+
         if verbose:
             logging.debug(f"OpenAI API response: {completion}")
-        assert len(completion.choices) == 1
-        assert completion.choices[0].message.content is not None
+
+        assert len(completion.choices) == 1, "Unexpected number of choices returned."
+        assert completion.choices[0].message.content is not None, "Completion content is None."
+
         return completion.choices[0].message.content
 
 
@@ -358,7 +407,7 @@ def prepare_vision_messages(
             content_str = {
                 "image_url": {
                     "url": f"data:image/png;base64,{frame}",
-                    "detail": "auto"
+                    "detail": detail
                 },
                 "type": "image_url"
             }
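Review note: `call_openai_api` now prepends a system message built from adjacent string literals. A minimal sketch of one way to assemble and prepend such a message is below. `build_system_prompt` and `with_system_message` are hypothetical helpers, not part of this repository, and leaving a caller-supplied leading system message untouched is an assumption of the sketch, not behavior taken from the diff.

```python
from typing import Dict, List, Sequence

Message = Dict[str, object]


def build_system_prompt(sentences: Sequence[str]) -> str:
    """Join sentences with single spaces.

    Joining explicitly avoids relying on a trailing space inside each
    adjacent string literal.
    """
    return " ".join(sentence.strip() for sentence in sentences)


def with_system_message(messages: List[Message],
                        system_prompt: str) -> List[Message]:
    """Return a new list with a system message first.

    The input list is never mutated. Assumption of this sketch: if the
    caller already leads with a system message, it is kept as-is.
    """
    if messages and messages[0].get("role") == "system":
        return list(messages)
    system_message: Message = {"role": "system", "content": system_prompt}
    return [system_message] + list(messages)
```

The returned list could then be passed as the `messages` argument of `client.chat.completions.create`, in place of building `full_messages` inline.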