Test assertions

Assertions are used to test the output of a language model (LLM) against expected values or conditions. While they are not required, they are a useful way to automate prompt engineering analysis.

Different types of assertions can be used to validate the output in various ways, such as checking for equality, JSON structure, similarity, or custom functions.

Using assertions

To use assertions in your test cases, add an assert property to the test case with an array of assertion objects. Each assertion object should have a type property indicating the assertion type and any additional properties required for that assertion type.

Example:

tests:
  - description: "Test if output is equal to the expected value"
    vars:
      example: "Hello, World!"
    assert:
      - type: equals
        value: "Hello, World!"

Assertion properties

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Type of assertion |
| value | string | No | The expected value, if applicable |
| threshold | number | No | The threshold value, only applicable for similarity |
| provider | string | No | Some assertions (similarity, llm-rubric) require an LLM provider |

Assertion Types

| Assertion Type | Returns true if... |
| --- | --- |
| equals | output matches exactly |
| contains | output contains substring |
| icontains | output contains substring, case insensitive |
| regex | output matches regex |
| contains-some | output contains at least one of a list of substrings |
| contains-all | output contains all of a list of substrings |
| is-json | output is valid JSON |
| contains-json | output contains valid JSON |
| javascript | provided JavaScript function validates the output |
| webhook | provided webhook returns `{pass: true}` |
| similar | cosine similarity between embeddings is above a threshold |
| llm-rubric | LLM output matches a given rubric, using a Language Model to grade output |
Tip: Every test type can be negated by prepending not-. For example, not-equals or not-regex.
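
The negation rule can be pictured as a small wrapper around any check. The following is a minimal sketch, not the tool's actual implementation: `parseAssertionType`, `basicChecks`, and `runAssertion` are hypothetical names, and only two base types are covered for illustration.

```javascript
// Hypothetical helper: split an assertion type into its base type and a negation flag.
function parseAssertionType(type) {
  const negated = type.startsWith('not-');
  return { baseType: negated ? type.slice('not-'.length) : type, negated };
}

// Hypothetical checks for two base types, for illustration only.
const basicChecks = {
  equals: (output, value) => output === value,
  contains: (output, value) => output.includes(value),
};

// Run the base check, then invert the result if the type was prefixed with "not-".
function runAssertion(assertion, output) {
  const { baseType, negated } = parseAssertionType(assertion.type);
  const check = basicChecks[baseType];
  if (!check) throw new Error(`Unsupported assertion type: ${baseType}`);
  const result = check(output, assertion.value);
  return negated ? !result : result;
}

console.log(runAssertion({ type: 'not-equals', value: 'hi' }, 'hello')); // true
```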

Equality

The equals assertion checks if the LLM output is equal to the expected value.

Example:

assert:
  - type: equals
    value: "The expected output"

Contains

The contains assertion checks if the LLM output contains the expected value.

Example:

assert:
  - type: contains
    value: "The expected substring"

The icontains is the same, except it ignores case:

assert:
  - type: icontains
    value: "The expected substring"

Regex

The regex assertion checks if the LLM output matches the provided regular expression.

Example:

assert:
  - type: regex
    value: "\\d{4}" # Matches a 4-digit number
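
To see what the pattern above matches, here is a rough sketch in JavaScript. It assumes the pattern is searched for anywhere in the output (unanchored, as with `RegExp.prototype.test`); `matchesRegex` is a hypothetical helper, not part of the tool. Note that the doubled backslash in the double-quoted YAML string yields the single-backslash pattern `\d{4}`, just as it does in a JavaScript string literal.

```javascript
// Hypothetical helper: true if the pattern is found anywhere in the output.
function matchesRegex(output, pattern) {
  return new RegExp(pattern).test(output);
}

console.log(matchesRegex('Founded in 1998.', '\\d{4}')); // true
console.log(matchesRegex('Call 555 now', '\\d{4}'));     // false
```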

Contains-Some

The contains-some assertion checks if the LLM output contains at least one of the specified values.

Example:

assert:
  - type: contains-some
    value:
      - "Value 1"
      - "Value 2"
      - "Value 3"

Contains-All

The contains-all assertion checks if the LLM output contains all of the specified values.

Example:

assert:
  - type: contains-all
    value:
      - "Value 1"
      - "Value 2"
      - "Value 3"
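
The difference between contains-some and contains-all comes down to "any" versus "every". A minimal sketch, with `containsSome` and `containsAll` as hypothetical helper names:

```javascript
// Hypothetical helpers mirroring the two list-based checks.
function containsSome(output, values) {
  return values.some((v) => output.includes(v));
}

function containsAll(output, values) {
  return values.every((v) => output.includes(v));
}

const sampleOutput = 'Value 1 and Value 3 are present';
const expectedValues = ['Value 1', 'Value 2', 'Value 3'];

console.log(containsSome(sampleOutput, expectedValues)); // true
console.log(containsAll(sampleOutput, expectedValues));  // false ("Value 2" is missing)
```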

Is-JSON

The is-json assertion checks if the LLM output is a valid JSON string.

Example:

assert:
  - type: is-json

Contains-JSON

The contains-json assertion checks if the LLM output contains a valid JSON structure.

Example:

assert:
  - type: contains-json
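
The distinction between is-json and contains-json can be sketched as follows. This is an illustrative approach under stated assumptions, not necessarily how the tool does it: `isJson` parses the whole output, while the hypothetical `containsJson` brute-forces candidate substrings that start with `{` or `[`.

```javascript
// True if the entire output parses as JSON.
function isJson(output) {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical approach: try every span from an opening bracket to a matching
// closing character, longest first, and accept the first one that parses.
function containsJson(output) {
  for (let start = 0; start < output.length; start++) {
    const open = output[start];
    if (open !== '{' && open !== '[') continue;
    const close = open === '{' ? '}' : ']';
    for (
      let end = output.lastIndexOf(close);
      end > start;
      end = output.lastIndexOf(close, end - 1)
    ) {
      if (isJson(output.slice(start, end + 1))) return true;
    }
  }
  return false;
}

const chatty = 'Here you go: {"a": 1} Hope that helps!';
console.log(isJson(chatty));       // false
console.log(containsJson(chatty)); // true
```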

Javascript

The javascript assertion allows you to provide a custom JavaScript function to validate the LLM output. The function should return true if the output passes the assertion, and false otherwise.

Example:

assert:
  - type: javascript
    value: "output.includes('Hello, World!')"

You may also return a number, which will be treated as a score:

assert:
  - type: javascript
    value: Math.log(output.length) * 10
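
One way to picture how such an expression string is evaluated is to wrap it in a function that receives `output`. The sketch below is an assumption about the mechanics, not the tool's implementation; `evaluateExpression` and `interpretResult` are hypothetical names. Because `new Function` executes arbitrary code, only use it with expressions you trust.

```javascript
// Hypothetical evaluator: wraps the expression so that `output` is in scope.
function evaluateExpression(expression, output) {
  const fn = new Function('output', `return ${expression}`);
  return fn(output);
}

// Classify the result: a boolean is a pass/fail verdict, a number is a score.
function interpretResult(result) {
  if (typeof result === 'boolean') return { kind: 'pass', value: result };
  if (typeof result === 'number') return { kind: 'score', value: result };
  throw new TypeError('Expected the expression to return a boolean or a number');
}

const verdict = evaluateExpression("output.includes('Hello, World!')", 'Hello, World!');
console.log(interpretResult(verdict)); // { kind: 'pass', value: true }

const score = evaluateExpression('Math.log(output.length) * 10', 'Hello, World!');
console.log(interpretResult(score).kind); // score
```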

Webhook

The webhook assertion sends the LLM output to a specified webhook URL for custom validation. The webhook should return a JSON object with a pass property set to true or false.

Example:

assert:
  - type: webhook
    value: "https://example.com/webhook"

The webhook will receive a POST request with a JSON payload containing the LLM output and the context (test case variables). For example, if the LLM output is "Hello, World!" and the test case has a variable example set to "Example text", the payload will look like:

{
  "output": "Hello, World!",
  "context": {
    "vars": {
      "example": "Example text"
    }
  }
}

The webhook should process the request and return a JSON response with a pass property set to true or false, indicating whether the LLM output meets the custom validation criteria. Optionally, the webhook can also provide a reason property to describe why the output passed or failed the assertion.

Example response:

{
  "pass": true,
  "reason": "The output meets the custom validation criteria"
}

If the webhook returns a pass value of true, the assertion will be considered successful. If it returns false, the assertion will fail, and the provided reason will be used to describe the failure.

You may also return a score:

{
  "pass": true,
  "score": 0.5,
  "reason": "The output meets the custom validation criteria"
}
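
On the receiving side, the webhook's decision logic might look like the sketch below. It covers only the request-to-response mapping; wiring it to an HTTP server is omitted. The function name `handleAssertionWebhook` and the validation rule (the output must contain the `example` variable) are invented for illustration.

```javascript
// Hypothetical handler: takes the parsed POST body and returns the response body.
function handleAssertionWebhook(payload) {
  const { output, context } = payload;
  const expected = context.vars.example;

  // Illustrative rule only: pass when the output contains the `example` variable.
  const pass = output.includes(expected);
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass
      ? `Output contains "${expected}"`
      : `Output does not contain "${expected}"`,
  };
}

const webhookResponse = handleAssertionWebhook({
  output: 'Hello, World!',
  context: { vars: { example: 'Example text' } },
});
console.log(webhookResponse.pass); // false
```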

Similarity

The similar assertion checks whether the LLM output is semantically similar to the expected value by comparing their embeddings with cosine similarity against a threshold.

Example:

assert:
  - type: similar
    value: "The expected output"
    threshold: 0.8
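
For reference, the cosine similarity comparison can be sketched as below. The vectors here are made-up stand-ins for embeddings, which in practice come from an embedding provider; `passesSimilarity` is a hypothetical helper, and treating a score equal to the threshold as a pass is an assumption.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: assumes a score equal to the threshold counts as a pass.
function passesSimilarity(outputEmbedding, expectedEmbedding, threshold) {
  return cosineSimilarity(outputEmbedding, expectedEmbedding) >= threshold;
}

console.log(passesSimilarity([1, 2, 3], [1, 2, 3.5], 0.8)); // true
console.log(passesSimilarity([1, 0], [0, 1], 0.8));         // false
```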

LLM-Rubric

The llm-rubric assertion checks if the LLM output matches a given rubric, using a Language Model to grade the output based on the rubric.

Example:

assert:
  - type: llm-rubric
    value: "The expected output"

Here's an example output that indicates PASS/FAIL based on LLM assessment (see example setup and outputs):

[Image: LLM prompt quality evaluation with PASS/FAIL expectations]

Load an external tests file

The Tests file is an optional format that lets you specify test cases outside of the main config file.

To add an assertion to a test case in a tests file, use the special __expected column.

Here's an example tests.csv:

text,__expected
"Hello, world!","Bonjour le monde"
"Goodbye, everyone!","fn:output.includes('Au revoir');"
"I am a pineapple","grade:doesn't reference any fruits besides pineapple"

All assertion types can be used in __expected. The column supports exactly one assertion.

  • is-json and contains-json are supported directly, and do not require any value
  • fn indicates javascript type. For example: fn:output.includes('foo')
  • similar takes a threshold value. For example: similar(0.8):hello world
  • grade indicates llm-rubric. For example: grade: does not mention being an AI
  • By default, __expected will use type equals
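
The prefix conventions listed above can be read as a small parser from an __expected cell to an assertion object. The sketch below covers only the forms in the list; `parseExpected` is a hypothetical name, and details such as trimming whitespace after the prefix are assumptions.

```javascript
// Hypothetical parser: maps an __expected cell to an assertion object.
function parseExpected(expected) {
  if (expected === 'is-json' || expected === 'contains-json') {
    return { type: expected };
  }
  if (expected.startsWith('fn:')) {
    return { type: 'javascript', value: expected.slice('fn:'.length) };
  }
  if (expected.startsWith('grade:')) {
    return { type: 'llm-rubric', value: expected.slice('grade:'.length).trim() };
  }
  const similar = expected.match(/^similar\(([\d.]+)\):(.*)$/s);
  if (similar) {
    return {
      type: 'similar',
      threshold: parseFloat(similar[1]),
      value: similar[2].trim(),
    };
  }
  // No recognized prefix: fall back to an exact-match assertion.
  return { type: 'equals', value: expected };
}

console.log(parseExpected('similar(0.8):hello world'));
// { type: 'similar', threshold: 0.8, value: 'hello world' }
```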

When the __expected field is provided, the success and failure statistics in the evaluation summary will be based on whether the expected criteria are met.

For more advanced test cases, we recommend using a testing framework like Jest or Mocha and using promptfoo as a library.

Reusing assertions with templates

If you have a set of common assertions that you want to apply to multiple test cases, you can create assertion templates and reuse them across your configuration.

assertionTemplates:
  containsMentalHealth:
    type: javascript
    value: output.toLowerCase().includes('mental health')

prompts: [prompt1.txt, prompt2.txt]
providers: [openai:gpt-3.5-turbo, localai:chat:vicuna]
tests:
  - vars:
      input: Tell me about the benefits of exercise.
    assert:
      - $ref: "#/assertionTemplates/containsMentalHealth"
  - vars:
      input: How can I improve my well-being?
    assert:
      - $ref: "#/assertionTemplates/containsMentalHealth"

In this example, the containsMentalHealth assertion template is defined at the top of the configuration file and then reused in two test cases. This approach helps maintain consistency and reduces duplication in your configuration.
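
Conceptually, each `$ref` is a pointer that gets swapped for the template it names. The following sketch shows one way such a reference could be resolved against a parsed configuration object; `resolveAssertion` is a hypothetical helper and only handles references of the `#/a/b` form used above.

```javascript
// Hypothetical resolver: replaces { $ref: "#/path/to/template" } with its target.
function resolveAssertion(config, assertion) {
  if (!assertion.$ref) return assertion;
  const path = assertion.$ref.replace(/^#\//, '').split('/');
  const resolved = path.reduce(
    (node, key) => (node == null ? undefined : node[key]),
    config
  );
  if (resolved === undefined) {
    throw new Error(`Unresolved reference: ${assertion.$ref}`);
  }
  return resolved;
}

const parsedConfig = {
  assertionTemplates: {
    containsMentalHealth: {
      type: 'javascript',
      value: "output.toLowerCase().includes('mental health')",
    },
  },
};

const resolvedAssertion = resolveAssertion(parsedConfig, {
  $ref: '#/assertionTemplates/containsMentalHealth',
});
console.log(resolvedAssertion.type); // javascript
```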