Clarification on rule against synthetic data generation #39

@wise-east

Section 5.4 of the paper states that when agents have access to an OpenAI API key for evaluation, they are told: "IMPORTANT: You are NOT allowed to use the OpenAI API for anything but this evaluation script." However, this restriction is not stated explicitly in the instructions given to the agents or to the judge.

Are there other general rules that should apply to synthetic data generation? For example, is using any closed-source LLM provider, or any teacher model larger than the model being post-trained, off the table?
