Commit 1b346ed

Revamp unstructured data tests quickstart documentation with comprehensive guide
1 parent da0bb9f commit 1b346ed

2 files changed

Lines changed: 54 additions & 14 deletions

docs/data-tests/unstructured-data-tests/bigquery-setup.mdx

Lines changed: 2 additions & 2 deletions
@@ -70,7 +70,7 @@ OPTIONS (
 
 ```sql
 CREATE OR REPLACE MODEL
-`my-project.my-dataset.gemini_model`
+`my-project.my-dataset.gemini-1.5-pro`
 REMOTE WITH CONNECTION
 `projects/my-project/locations/us/connections/my-remote-connection-model-name`
 OPTIONS (
@@ -99,7 +99,7 @@ models:
 tests:
 - elementary.validate_unstructured_data:
 expectation_prompt: "The text data should represent an example of unstructured data."
-llm_model_name: "gemini_model"
+llm_model_name: "gemini-1.5-pro"
 ```
 
 

docs/data-tests/unstructured-data-tests/quickstart.mdx

Lines changed: 52 additions & 12 deletions
@@ -2,18 +2,57 @@
 title: "Quickstart"
 ---
 
-`elementary.validate_unstructured_data`
+# Validating Unstructured Data with Elementary
 
-Executes validation on unstructured data using LLM capabilities. This test allows you to validate unstructured data against specific expectations by leveraging language models to analyze the content.
+## What is Unstructured Data Validation?
 
-The test requires:
-- `expectation_prompt`: A prompt that describes your expectation on the unstructured data.
-- `llm_model_name`: The name of the language model to use for validation. This parameter depends on your data warehouse type:
-- For Snowflake: Use the model name configured in your Snowflake account (see [Snowflake Setup](/data-tests/unstructured-data-tests/snowflake-setup.mdx))
-- For other warehouses: Refer to your specific warehouse's LLM integration documentation
+Elementary's `elementary.validate_unstructured_data` test allows you to validate unstructured data using AI and LLM language models. Instead of writing complex code, you can simply describe what you expect from your data in plain English, and Elementary will check if your data meets those expectations.
+
+For example, you can verify that customer feedback comments are in English, product descriptions contain required information, or support tickets follow a specific format or a sentiment.
+
+## How It Works
+
+Elementary leverages the AI and LLM capabilities built directly into your data warehouse. When you run a validation test:
+
+1. Your unstructured data stays within your data warehouse
+2. The warehouse's built-in AI and LLM functions analyze the data
+3. Elementary reports whether each text value meets your expectations
+
+## Required Setup for Each Data Warehouse
+
+Before you can use Elementary's unstructured data validations, you need to set up AI and LLM capabilities in your data warehouse:
+
+### Snowflake
+- **Prerequisite**: Enable Snowflake Cortex AI LLM functions
+- **Recommended Model**: `claude-3-5-sonnet`
+- [View Detailed Snowflake Setup Guide](/data-tests/unstructured-data-tests/snowflake-setup.mdx)
+
+### Databricks
+- **Prerequisite**: Ensure Databricks AI Functions are available
+- **Recommended Model**: `databricks-meta-llama-3-3-70b-instruct`
+- [View Detailed Databricks Setup Guide](/data-tests/unstructured-data-tests/databricks-setup.mdx)
+
+### BigQuery
+- **Prerequisite**: Configure BigQuery to use Vertex AI models
+- **Recommended Model**: `gemini-1.5-pro`
+- [View Detailed BigQuery Setup Guide](/data-tests/unstructured-data-tests/bigquery-setup.mdx)
+
+### Redshift
+- Support coming soon
+
+### Data Lakes
+- Currently supported through Snowflake, Databricks, or BigQuery external object tables
+- [View Data Lakes Information](/data-tests/unstructured-data-tests/data-lakes-setup.mdx)
+
+
+## Using the Validation Test
+
+The test requires two main parameters:
+- `expectation_prompt`: Describe what you expect from the text in plain English
+- `llm_model_name`: Specify which AI model to use (see recommendations above for each warehouse)
 
 <Info>
-This test is designed for columns containing unstructured text data such as descriptions, comments, or other free-form text fields. It can also be applied to any structured column that can be converted to a string. This enables writing data validations in native language.
+This test works with any column containing unstructured text data such as descriptions, comments, or other free-form text fields. It can also be applied to structured columns that can be converted to strings, enabling natural language data validations.
 </Info>
 
 <RequestExample>
@@ -31,7 +70,7 @@ models:
 llm_model_name: "model_name"
 ```
 
-```yml Models example
+```yml Example
 version: 2
 
 models:
@@ -46,7 +85,7 @@ models:
 llm_model_name: "test_model"
 ```
 
-```yml Snowflake example
+```yml Example - Validating Customer Feedback
 version: 2
 
 models:
@@ -57,10 +96,11 @@ models:
 description: "Customer feedback in free text format."
 tests:
 - elementary.validate_unstructured_data:
-expectation_prompt: "The text should be a customer feedback comment in English, containing sentiment about a product or service."
-llm_model_name: "snowflake.models.llama2_70b_chat"
+expectation_prompt: "The text should be a customer feedback comment in English, it should describe only a bug or a feature request."
+llm_model_name: "claude-3-5-sonnet"
 config:
 severity: warn
 ```
 
+
 </RequestExample>
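
The revised quickstart recommends `databricks-meta-llama-3-3-70b-instruct` for Databricks, but the examples touched by this commit only show a Snowflake-style model name. As a minimal sketch of how the same test might be configured on Databricks, the fragment below reuses the test name, the two parameters, and the recommended model from the diff. The model name `support_tickets`, the column `ticket_body`, and the prompt text are hypothetical placeholders, and the indentation assumes the test is attached at column level as in a standard dbt schema file.

```yml
version: 2

models:
  - name: support_tickets          # hypothetical dbt model name
    columns:
      - name: ticket_body          # hypothetical free-text column
        description: "Support ticket text in free text format."
        tests:
          - elementary.validate_unstructured_data:
              # Plain-English expectation, evaluated by the warehouse LLM
              expectation_prompt: "The text should be a support ticket written in English."
              # Recommended Databricks model from the quickstart
              llm_model_name: "databricks-meta-llama-3-3-70b-instruct"
              config:
                severity: warn
```

Keeping `severity: warn` mirrors the customer feedback example in the diff, so a failed expectation is reported without failing the dbt run.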