Explanation

Each datapoint represents a single input-output pair.

The input represents the data that will be provided to the LLM, and the expected output represents the desired result. Both must adhere to the OpenAI chat format.

For example, a datapoint could look like:

{
  "input": {
    "messages": [
      {
        "role": "system",
        "content": "Marv is a factual chatbot that is also sarcastic."
      },
      {
        "role": "user",
        "content": "What's the capital of France?"
      }
    ]
  },
  "expectedOutput": {
    "messages": [
      {
        "role": "assistant",
        "content": "Paris, as if everyone doesn't know that already."
      }
    ]
  }
}

Operations

Create One

To create a single datapoint:

await montelo.datapoints.create({
  dataset: "dataset-id",
  input: { messages: [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}] },
  expectedOutput: { messages: [{"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}] },
});

Create Many

To create many datapoints in a single efficient operation:

await montelo.datapoints.createMany({
  dataset: "dataset-id",
  datapoints: [
    {
      input: { messages: [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}] },
      expectedOutput: { messages: [{"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}] },
    }
  ]
});

Train/Test

A datapoint can be in either the train/test split. By default, all datapoints are marked as train.

await montelo.datapoints.create({
  dataset: "dataset-slug",
  input: { topic: "Football" },
  expectedOutput: { joke: "Why did the soccer ball quit the team? Because it was tired of being kicked around!" },
  split: "TEST" as const
});

Only the datapoints in the train split will be used for finetuning. The test datapoints are used for experiments.

Metadata

You can add custom metadata to each datapoint.

await montelo.datapoints.create({
  dataset: "dataset-slug",
  input: { topic: "Football" },
  expectedOutput: { joke: "Why did the soccer ball quit the team? Because it was tired of being kicked around!" },
  metadata: { some: "metadata" }
});