Clarification on rule against synthetic data generation

Section 5.4 of the paper mentions that, when agents have access to an OpenAI API key for evaluation, they are told: "IMPORTANT: You are NOT allowed to use the OpenAI API for anything but this evaluation script." However, this rule is not explicitly stated in the instructions given to the agents or to the judge.

Are there other general rules that should be applied to synthetic data generation? For example, are all closed-source LLM providers, or teacher models larger than the model being post-trained, off the table?

Clarification on rule against synthetic data generation #39