# AI Data Validations
Version Requirement: This feature requires Elementary dbt package version 0.18.0 or above.
Elementary's `elementary.ai_data_validation` test lets you validate any data column using AI and large language models (LLMs). It is more flexible than traditional tests: it can be applied to any column type, and its validation rules are written in natural language.
With `ai_data_validation`, you describe what you expect from your data in plain English, and Elementary checks whether your data meets those expectations. This is particularly useful for complex rules that would be difficult to express with traditional SQL or dbt tests.
Elementary uses the AI and LLM functions built directly into your data warehouse. When you run a validation test:
- Your data stays within your data warehouse
- The warehouse's built-in AI and LLM functions analyze the data
- Elementary reports whether each value meets your expectations based on the prompt
Before you can use Elementary's AI data validations, you need to set up AI and LLM capabilities in your data warehouse:
- Snowflake
  - Prerequisite: Enable Snowflake Cortex AI LLM functions.
  - Recommended model: `claude-3-5-sonnet`. View Snowflake's guide.
- Databricks
  - Prerequisite: Ensure Databricks AI Functions are available.
  - Recommended model: `databricks-meta-llama-3-3-70b-instruct`. View Databricks' setup guide.
- BigQuery
  - Prerequisite: Configure BigQuery to use Vertex AI models.
  - Recommended model: `gemini-1.5-pro`. View BigQuery's setup guide.
- Other warehouses
  - Support coming soon.
- Data lakes
  - Currently supported through Snowflake, Databricks, or BigQuery external object tables. View the data lakes information.
The test requires one parameter:
- `expectation_prompt`: Describe what you expect from the data in plain English.
Optionally, you can also specify:
- `llm_model_name`: The AI model to use (see the recommended model for each warehouse above).
A generic configuration looks like this:

```yaml
version: 2

models:
  - name: < model name >
    columns:
      - name: < column name >
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "Description of what the data should satisfy"
              llm_model_name: "model_name"  # Optional
```

Example: check that no contract date is in the future.

```yaml
version: 2

models:
  - name: crm
    description: "A table containing contract details."
    columns:
      - name: contract_date
        description: "The date when the contract was signed."
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "There should be no contract date in the future"
```

Example: check a numeric range and format, using a specific model and `warn` severity.

```yaml
version: 2

models:
  - name: sales
    description: "A table containing sales data."
    columns:
      - name: discount_percentage
        description: "The discount percentage applied to the sale."
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "The discount percentage should be between 0 and 50, and should only be a whole number."
              llm_model_name: "claude-3-5-sonnet"
              config:
                severity: warn
```

Example: check a list of allowed values together with a conditional rule that refers to another column (`suspension_reason`).

```yaml
version: 2

models:
  - name: customer_accounts
    description: "A table containing customer account information."
    columns:
      - name: account_status
        description: "The current status of the customer account."
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "The account status should be one of: 'active', 'inactive', 'suspended', or 'pending'. If the account is 'suspended', there should be a reason code in the suspension_reason column."
              llm_model_name: "gemini-1.5-pro"
```
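
The examples above use the recommended Snowflake and BigQuery models. As a sketch for Databricks, the configuration below points `llm_model_name` at the recommended `databricks-meta-llama-3-3-70b-instruct` model. The `orders` model, its `order_status` and `shipping_country` columns, and both prompts are hypothetical and only for illustration; the test name and its parameters come from this page. It also shows that, as with any dbt test, you can attach the validation to several columns of the same model.

```yaml
version: 2

models:
  - name: orders  # hypothetical model, for illustration only
    description: "A table containing order data."
    columns:
      - name: order_status  # hypothetical column
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "The order status should be one of: 'placed', 'shipped', 'delivered', or 'cancelled'."
              # Recommended Databricks model from the setup section
              llm_model_name: "databricks-meta-llama-3-3-70b-instruct"
      - name: shipping_country  # hypothetical column
        tests:
          - elementary.ai_data_validation:
              expectation_prompt: "The value should be a full country name, not an abbreviation or a country code."
              llm_model_name: "databricks-meta-llama-3-3-70b-instruct"
```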