Output Formats

we support generating datasets in alpaca, sharegpt and chatml format.

Alpaca Format

Supervised Fine-Tuning Dataset

Example In supervised fine-tuning, the instruction column will be concatenated with the input column and used as the user prompt, then the user prompt would be instruction\ninput. The output column represents the model response.

[
  {
    "instruction": "user instruction (required)",
    "input": "user input (optional)",
    "output": "model response (required)"
  }
]

Sharegpt Format

Supervised Fine-Tuning Dataset

Example Compared to the alpaca format, the sharegpt format allows the datasets have more roles, such as human, gpt, observation and function. They are presented in a list of objects in the conversations column.

Note that the human and observation should appear in odd positions, while gpt and function should appear in even positions. The gpt and function will be learned by the model.

In our implementation, only human and gpt will be used.

[
  {
    "conversations": [
      {
        "from": "human",
        "value": "user instruction (required)"
      },
      {
        "from": "gpt",
        "value": "model response (required)"
      }
    ]
    }
]

ChatML Format

Supervised Fine-Tuning Dataset

Example Like the sharegpt format, the chatml format also allows the datasets have more roles, such as user, assistant, system and tool. They are presented in a list of objects in the messages column.

In our implementation, only user and assistant will be used.

[
  {
    "messages": [
      {
        "role": "user",
        "content": "user instruction (required)"
      },
      {
        "role": "assistant",
        "content": "model response (required)"
      }
    ]
    }
]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Output Formats

Alpaca Format

Supervised Fine-Tuning Dataset

Sharegpt Format

Supervised Fine-Tuning Dataset

ChatML Format

Supervised Fine-Tuning Dataset

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Output Formats

Alpaca Format

Supervised Fine-Tuning Dataset

Sharegpt Format

Supervised Fine-Tuning Dataset

ChatML Format

Supervised Fine-Tuning Dataset