Experiments & Evals
Run an experiment to evaluate your fine-tuned models.
Explanation
Experiments allow you to evaluate the performance of your fine-tunes on specific datasets. By running experiments, you can gain insights into how well your models perform relative to other models.
Prerequisites
You’ll need a dataset uploaded to Montelo to get started. Remember, an experiment is always associated with a dataset.
You’ll also need a runner and an evaluator function. Here’s a simple example of each.
Runner
The runner function runs the datapoint’s input and returns the output. Here’s the most basic example of this:
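A minimal sketch of a runner, assuming Montelo passes the datapoint’s input as an object whose fields match your dataset (here, a `question` string); the exact signature may differ, so check the SDK reference.

```typescript
// A runner sketch. We assume the datapoint's input arrives as an object
// with the fields you defined when uploading the dataset.
const runner = async (input: { question: string }) => {
  // This could call an external service or internal API; the simplest
  // possible version just computes an answer locally.
  return { answer: input.question.includes("France") ? "Paris" : "Unknown" };
};
```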
The runner can be any function. You could call an external service, an internal API, or even a really simple function like the one above.
Evaluator
The evaluator function takes the datapoint’s expected output and the output that the runner returned, and runs an evaluation on them. Again, you could run any evaluation you want!
Note that you must return an object, not a primitive, for evaluators. Here’s an example:
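Here’s a sketch of an evaluator that does a simple string comparison. The parameter shape (`expectedOutput` / `actualOutput`) is an assumption based on the description above; check the SDK reference for the exact field names.

```typescript
// An evaluator sketch using a string comparison. The parameter shape
// is assumed for illustration.
const evaluator = async (params: {
  expectedOutput: { answer: string };
  actualOutput: { answer: string };
}) => {
  // Return an object, not a primitive.
  return {
    exactMatch: params.expectedOutput.answer === params.actualOutput.answer,
  };
};
```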
Your evaluator can be any function! For example, you could:
- Have an LLM act as your evaluator
- Use string comparisons (as above)
- Compute the Levenshtein distance (sketched below)
- Compute cosine similarity
- Anything else!
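For instance, here’s a hedged sketch of a Levenshtein-distance evaluator, reusing the assumed parameter shape from the string-comparison example above:

```typescript
// Standard dynamic-programming edit distance between two strings.
const levenshtein = (a: string, b: string): number => {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0,
    ),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
};

const distanceEvaluator = async (params: {
  expectedOutput: { answer: string };
  actualOutput: { answer: string };
}) => {
  // Still an object, not a primitive.
  return {
    levenshteinDistance: levenshtein(
      params.expectedOutput.answer,
      params.actualOutput.answer,
    ),
  };
};
```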
We’ll take the object that you return, run statistics on it, and create charts for you.
Running an Experiment
Now that you have a runner and an evaluator, use the createAndRun function to create an experiment and run it.
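A minimal sketch of what this might look like. The import path, client shape (the `experiments` namespace), and the `name`/`dataset` fields are assumptions for illustration; check the SDK reference for the exact API.

```typescript
import { Montelo } from "montelo"; // assumed package name

const montelo = new Montelo();

// `createAndRun` is the documented entry point; the surrounding shape
// here is an assumption.
await montelo.experiments.createAndRun({
  name: "capitals-eval",
  dataset: "my-dataset",
  runner,
  evaluator,
});
```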
And that’s all you need! Behind the scenes, Montelo will pull all the datapoints of the dataset and run the runner and evaluator on each datapoint. We’ll inject each datapoint’s input, metadata, and expected output into your runner and evaluator, and track the data that you return.
Options
You can pass in options to the createAndRun function:
- How many datapoints to run concurrently, to reduce the time it takes to finish the experiment.
- Whether to run the experiment only on the test datapoints, skipping the train datapoints.
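A sketch of how these options might be passed. The option names `concurrency` and `onlyTestSplit` are hypothetical stand-ins for whatever the SDK actually calls them; consult the SDK reference for the real names.

```typescript
// Hypothetical option names, for illustration only.
await montelo.experiments.createAndRun({
  name: "capitals-eval",
  dataset: "my-dataset",
  runner,
  evaluator,
  options: {
    concurrency: 10,     // hypothetical: datapoints run in parallel
    onlyTestSplit: true, // hypothetical: skip the train datapoints
  },
});
```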