Explanation

Each datapoint represents a single input-output pair.

The input represents the data that will be provided to the LLM, and the expected output represents the desired result. Both must adhere to the OpenAI chat format.

For example, a datapoint could look like:

{
  "input": {
    "messages": [
      {
        "role": "system",
        "content": "Marv is a factual chatbot that is also sarcastic."
      },
      {
        "role": "user",
        "content": "What's the capital of France?"
      }
    ]
  },
  "expectedOutput": {
    "messages": [
      {
        "role": "assistant",
        "content": "Paris, as if everyone doesn't know that already."
      }
    ]
  }
}

Operations

Create One

To create a single datapoint:

Create Many

To create many datapoints in a single efficient operation:

Train/Test

A datapoint can be in either the train/test split. By default, all datapoints are marked as train.

Only the datapoints in the train split will be used for finetuning. The test datapoints are used for experiments.

Metadata

You can add custom metadata to each datapoint.