Test assertions
Assertions test the output of a large language model (LLM) against expected values or conditions. They are not required, but they are a useful way to automate prompt engineering analysis.
Different types of assertions can be used to validate the output in various ways, such as checking for equality, JSON structure, similarity, or custom functions.
Using assertions
To use assertions in your test cases, add an assert property to the test case with an array of assertion objects. Each assertion object should have a type property indicating the assertion type and any additional properties required for that assertion type.
Example:
tests:
  - description: "Test if output is equal to the expected value"
    vars:
      example: "Hello, World!"
    assert:
      - type: equals
        value: "Hello, World!"
Assertion properties
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Type of assertion |
| value | string or list of strings | No | The expected value, if applicable |
| threshold | number | No | The threshold value, only applicable to the similar assertion type |
| provider | string | No | Some assertions (similar, llm-rubric) require an LLM provider |
Assertion Types
| Assertion Type | Returns true if... |
|---|---|
| equals | output matches exactly |
| contains | output contains substring |
| icontains | output contains substring, case insensitive |
| regex | output matches regex |
| contains-some | output contains at least one of a list of substrings |
| contains-all | output contains every substring in a list |
| is-json | output is valid JSON |
| contains-json | output contains valid JSON |
| javascript | provided JavaScript function validates the output |
| webhook | provided webhook returns `{pass: true}` |
| similar | embeddings and cosine similarity are above a threshold |
| llm-rubric | LLM output matches a given rubric, using a Language Model to grade output |
Every assertion type can be negated by prepending not-. For example, not-equals or not-regex.
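To make the relationship between the string-based types and the not- prefix concrete, here is a minimal sketch. The function name checkAssertion is hypothetical and this is not promptfoo's implementation; it also assumes that regex means "the pattern is found somewhere in the output", which the table above does not spell out.

```javascript
// Hypothetical sketch: evaluate a few assertion types, including `not-` negation.
// It mirrors the table above and is not promptfoo's actual code.
function checkAssertion(assertion, output) {
  // Strip a leading "not-" and remember that the result must be inverted.
  const negated = assertion.type.startsWith("not-");
  const baseType = negated ? assertion.type.slice(4) : assertion.type;

  let pass;
  switch (baseType) {
    case "equals":
      pass = output === assertion.value;
      break;
    case "contains":
      pass = output.includes(assertion.value);
      break;
    case "icontains":
      pass = output.toLowerCase().includes(assertion.value.toLowerCase());
      break;
    case "regex":
      // Assumption: a match anywhere in the output counts.
      pass = new RegExp(assertion.value).test(output);
      break;
    case "contains-some":
      pass = assertion.value.some((v) => output.includes(v));
      break;
    case "contains-all":
      pass = assertion.value.every((v) => output.includes(v));
      break;
    default:
      throw new Error(`Unsupported assertion type in this sketch: ${baseType}`);
  }
  return negated ? !pass : pass;
}
```

For instance, checkAssertion({ type: "not-equals", value: "a" }, "a") evaluates the equals check first and then inverts it, so the negated forms need no separate logic.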
Equality
The equals assertion checks if the LLM output is equal to the expected value.
Example:
assert:
  - type: equals
    value: "The expected output"
Contains
The contains assertion checks if the LLM output contains the expected value.
Example:
assert:
  - type: contains
    value: "The expected substring"
The icontains assertion is the same, except that it ignores case:
assert:
  - type: icontains
    value: "The expected substring"
Regex
The regex assertion checks if the LLM output matches the provided regular expression.
Example:
assert:
  - type: regex
    value: "\\d{4}" # Matches a 4-digit number
Contains-Some
The contains-some assertion checks if the LLM output contains at least one of the specified values.
Example:
assert:
  - type: contains-some
    value:
      - "Value 1"
      - "Value 2"
      - "Value 3"
Contains-All
The contains-all assertion checks if the LLM output contains all of the specified values.
Example:
assert:
  - type: contains-all
    value:
      - "Value 1"
      - "Value 2"
      - "Value 3"
Is-JSON
The is-json assertion checks if the LLM output is a valid JSON string.
Example:
assert:
  - type: is-json
Contains-JSON
The contains-json assertion checks if the LLM output contains a valid JSON structure.
Example:
assert:
  - type: contains-json
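The difference between is-json and contains-json can be sketched as follows. Both helper names are hypothetical, and the definition of "contains" used here (some substring that starts at { or [ and parses as JSON) is an assumption made for illustration, not promptfoo's documented behavior.

```javascript
// Hypothetical helpers; promptfoo's real checks may differ.

// True when the whole output parses as JSON.
function isJson(output) {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}

// Assumption: "contains JSON" means some substring that starts at "{" or "["
// and ends at the matching kind of closing bracket parses as JSON.
// This brute-force scan is slow on long inputs and is for illustration only.
function containsJson(output) {
  for (let start = 0; start < output.length; start++) {
    const open = output[start];
    if (open !== "{" && open !== "[") continue;
    const close = open === "{" ? "}" : "]";
    // Try candidate end positions from the last closing bracket backwards.
    let end = output.lastIndexOf(close);
    while (end > start) {
      if (isJson(output.slice(start, end + 1))) return true;
      end = output.lastIndexOf(close, end - 1);
    }
  }
  return false;
}
```

With these definitions, an output such as `Here you go: {"a": [1, 2]}` fails the strict check but passes the "contains" check, which is the case contains-json is meant for when a model wraps JSON in extra prose.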
Javascript
The javascript assertion allows you to provide a custom JavaScript function to validate the LLM output. The function should return true if the output passes the assertion, and false otherwise.
Example:
assert:
  - type: javascript
    value: "output.includes('Hello, World!')"
You may also return a number, which will be treated as a score:
assert:
  - type: javascript
    value: Math.log(output.length) * 10
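One way to picture how such an expression could be evaluated is to wrap it in a function that receives the LLM output as a parameter named output. The sketch below is an assumption about the mechanism, not promptfoo's code; the name runJavascriptAssertion and the shape of the returned object are hypothetical.

```javascript
// Hypothetical sketch: evaluate an assertion expression with `output` in scope.
function runJavascriptAssertion(expression, output) {
  // Drop trailing semicolons so the expression can be wrapped in `return (...)`.
  const body = expression.trim().replace(/;+$/, "");
  // new Function creates a function whose single parameter is named "output".
  const fn = new Function("output", `return (${body});`);
  const result = fn(output);

  if (typeof result === "boolean") {
    return { pass: result };
  }
  if (typeof result === "number") {
    // How a numeric score maps to pass/fail is not specified in the text
    // above, so this sketch only reports the score.
    return { score: result };
  }
  throw new Error("Expression must return a boolean or a number");
}
```

Because the expression runs as ordinary JavaScript, it should only ever come from a configuration file you trust.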
Webhook
The webhook assertion sends the LLM output to a specified webhook URL for custom validation. The webhook should return a JSON object with a pass property set to true or false.
Example:
assert:
  - type: webhook
    value: "https://example.com/webhook"
The webhook will receive a POST request with a JSON payload containing the LLM output and the context (test case variables). For example, if the LLM output is "Hello, World!" and the test case has a variable example set to "Example text", the payload will look like:
{
  "output": "Hello, World!",
  "context": {
    "vars": {
      "example": "Example text"
    }
  }
}
The webhook should process the request and return a JSON response with a pass property set to true or false, indicating whether the LLM output meets the custom validation criteria. Optionally, the webhook can also provide a reason property to describe why the output passed or failed the assertion.
Example response:
{
  "pass": true,
  "reason": "The output meets the custom validation criteria"
}
If the webhook returns a pass value of true, the assertion will be considered successful. If it returns false, the assertion will fail, and the provided reason will be used to describe the failure.
You may also return a score:
{
  "pass": true,
  "score": 0.5,
  "reason": "The output meets the custom validation criteria"
}
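The grading logic behind such a webhook might look like the sketch below. It is deliberately written as a plain function that takes the parsed request body and returns the response body, so it can sit behind any HTTP framework. The function name gradeWebhookPayload and the validation rule itself (reject empty output and output that merely echoes a test variable) are hypothetical choices for illustration.

```javascript
// Hypothetical grading logic for a webhook endpoint.
// Input: the parsed JSON payload ({ output, context: { vars } }).
// Output: a JSON-serializable body with pass, score, and reason.
function gradeWebhookPayload(payload) {
  const output = String(payload.output ?? "");
  const vars = (payload.context && payload.context.vars) || {};

  // Illustrative rule 1 (an assumption): empty output always fails.
  if (output.trim().length === 0) {
    return { pass: false, score: 0, reason: "Output is empty" };
  }

  // Illustrative rule 2 (an assumption): output must not just echo a variable.
  for (const [name, value] of Object.entries(vars)) {
    if (String(value) === output) {
      return {
        pass: false,
        score: 0,
        reason: `Output only repeats the "${name}" variable`,
      };
    }
  }

  return {
    pass: true,
    score: 1,
    reason: "The output meets the custom validation criteria",
  };
}

// In an HTTP handler you would parse the POST body as JSON, call
// gradeWebhookPayload(body), and send the result back as JSON.
```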
Similarity
The similar assertion checks if the LLM output is semantically similar to the expected value, by comparing the cosine similarity of their embeddings against a threshold.
Example:
assert:
  - type: similar
    value: "The expected output"
    threshold: 0.8
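The comparison itself is plain cosine similarity between two embedding vectors. The sketch below assumes the embeddings have already been obtained from a provider and are plain arrays of numbers; the function names are hypothetical, and treating a score equal to the threshold as a pass is an assumption.

```javascript
// Cosine similarity of two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; report zero similarity in that case.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical wrapper: pass when similarity reaches the threshold.
// Assumption: a score exactly equal to the threshold counts as a pass.
function passesSimilarity(outputEmbedding, expectedEmbedding, threshold) {
  return cosineSimilarity(outputEmbedding, expectedEmbedding) >= threshold;
}
```

Since cosine similarity depends only on the angle between the vectors, scaling an embedding does not change the score.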
LLM-Rubric
The llm-rubric assertion checks if the LLM output matches a given rubric, using a Language Model to grade the output based on the rubric.
Example:
assert:
  - type: llm-rubric
    value: "The expected output"
Here's an example output that indicates PASS/FAIL based on LLM assessment (see example setup and outputs):
Load an external tests file
The Tests file is an optional format that lets you specify test cases outside of the main config file.
To add an assertion to a test case in a vars file, use the special __expected column.
Here's an example tests.csv:
text,__expected
"Hello, world!","Bonjour le monde"
"Goodbye, everyone!","fn:output.includes('Au revoir');"
"I am a pineapple","grade:doesn't reference any fruits besides pineapple"
All assertion types can be used in __expected. The column supports exactly one assertion.
- is-json and contains-json are supported directly, and do not require any value
- fn indicates javascript type. For example: fn:output.includes('foo')
- similar takes a threshold value. For example: similar(0.8):hello world
- grade indicates llm-rubric. For example: grade: does not mention being an AI
- By default, __expected will use type equals
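The prefix rules above amount to a small parser that turns one __expected cell into an assertion object. The sketch below covers only the forms listed here; the function name parseExpected is hypothetical, and trimming whitespace after the prefix is an assumption.

```javascript
// Hypothetical sketch: convert an __expected cell into an assertion object.
// Only the shorthand forms listed above are handled.
function parseExpected(cell) {
  const text = cell.trim();

  // is-json and contains-json need no value.
  if (text === "is-json" || text === "contains-json") {
    return { type: text };
  }
  // fn:<expression> maps to a javascript assertion.
  if (text.startsWith("fn:")) {
    return { type: "javascript", value: text.slice("fn:".length).trim() };
  }
  // grade:<rubric> maps to an llm-rubric assertion.
  if (text.startsWith("grade:")) {
    return { type: "llm-rubric", value: text.slice("grade:".length).trim() };
  }
  // similar(<threshold>):<expected text>
  const similar = text.match(/^similar\(([\d.]+)\):([\s\S]*)$/);
  if (similar) {
    return {
      type: "similar",
      threshold: parseFloat(similar[1]),
      value: similar[2].trim(),
    };
  }
  // Anything else falls back to an exact match on the original cell.
  return { type: "equals", value: cell };
}
```

Applied to the tests.csv example, the three rows become an equals, a javascript, and an llm-rubric assertion respectively.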
When the __expected field is provided, the success and failure statistics in the evaluation summary will be based on whether the expected criteria are met.
For more advanced test cases, we recommend using a testing framework like Jest or Mocha and using promptfoo as a library.
Reusing assertions with templates
If you have a set of common assertions that you want to apply to multiple test cases, you can create assertion templates and reuse them across your configuration.
assertionTemplates:
  containsMentalHealth:
    type: javascript
    value: output.toLowerCase().includes('mental health')
prompts: [prompt1.txt, prompt2.txt]
providers: [openai:gpt-3.5-turbo, localai:chat:vicuna]
tests:
  - vars:
      input: Tell me about the benefits of exercise.
    assert:
      - $ref: "#/assertionTemplates/containsMentalHealth"
  - vars:
      input: How can I improve my well-being?
    assert:
      - $ref: "#/assertionTemplates/containsMentalHealth"
In this example, the containsMentalHealth assertion template is defined at the top of the configuration file and then reused in two test cases. This approach helps maintain consistency and reduces duplication in your configuration.
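Conceptually, each $ref is swapped for a copy of the template it points to before the tests run. The sketch below shows one way that substitution could work on an already-parsed configuration object. The function name resolveAssertionRefs is hypothetical, and it assumes that only references of the form #/assertionTemplates/<name> are used.

```javascript
// Hypothetical sketch: replace $ref entries with copies of their templates.
// `config` is the configuration after YAML parsing (a plain object).
function resolveAssertionRefs(config) {
  const templates = config.assertionTemplates || {};
  const prefix = "#/assertionTemplates/";

  return (config.tests || []).map((test) => ({
    ...test,
    assert: (test.assert || []).map((assertion) => {
      // Inline assertions pass through unchanged.
      if (typeof assertion.$ref !== "string") return assertion;

      if (!assertion.$ref.startsWith(prefix)) {
        throw new Error(`Unsupported reference: ${assertion.$ref}`);
      }
      const name = assertion.$ref.slice(prefix.length);
      if (!Object.prototype.hasOwnProperty.call(templates, name)) {
        throw new Error(`Unknown assertion template: ${name}`);
      }
      // Copy so that later edits to one test do not affect the template.
      return { ...templates[name] };
    }),
  }));
}
```

Failing loudly on an unknown template name is a design choice in this sketch: a misspelled reference would otherwise silently drop the assertion.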
