docs/data-tests/unstructured-data-tests/quickstart.mdx
Lines changed: 52 additions & 12 deletions
@@ -2,18 +2,57 @@
 title: "Quickstart"
 ---
 
-`elementary.validate_unstructured_data`
+# Validating Unstructured Data with Elementary
 
-Executes validation on unstructured data using LLM capabilities. This test allows you to validate unstructured data against specific expectations by leveraging language models to analyze the content.
+## What is Unstructured Data Validation?
 
-The test requires:
-- `expectation_prompt`: A prompt that describes your expectation on the unstructured data.
-- `llm_model_name`: The name of the language model to use for validation. This parameter depends on your data warehouse type:
-  - For Snowflake: Use the model name configured in your Snowflake account (see [Snowflake Setup](/data-tests/unstructured-data-tests/snowflake-setup.mdx))
-  - For other warehouses: Refer to your specific warehouse's LLM integration documentation
+Elementary's `elementary.validate_unstructured_data` test lets you validate unstructured data using AI and large language models (LLMs). Instead of writing complex code, you describe what you expect from your data in plain English, and Elementary checks whether your data meets those expectations.
+
+For example, you can verify that customer feedback comments are in English, that product descriptions contain required information, or that support tickets follow a specific format or express a specific sentiment.
+
+## How It Works
+
+Elementary leverages the AI and LLM capabilities built directly into your data warehouse. When you run a validation test:
+
+1. Your unstructured data stays within your data warehouse
+2. The warehouse's built-in AI and LLM functions analyze the data
+3. Elementary reports whether each text value meets your expectations
+
+## Required Setup for Each Data Warehouse
+
+Before you can use Elementary's unstructured data validations, you need to set up AI and LLM capabilities in your data warehouse:
+
+### Snowflake
+- **Prerequisite**: Enable Snowflake Cortex AI LLM functions
+  - Currently supported through Snowflake, Databricks, or BigQuery external object tables
+- [View Data Lakes Information](/data-tests/unstructured-data-tests/data-lakes-setup.mdx)
+
+
+## Using the Validation Test
+
+The test requires two main parameters:
+- `expectation_prompt`: Describe what you expect from the text in plain English
+- `llm_model_name`: Specify which AI model to use (see recommendations above for each warehouse)
 
 <Info>
-This test is designed for columns containing unstructured text data such as descriptions, comments, or other free-form text fields. It can also be applied to any structured column that can be converted to a string. This enables writing data validations in native language.
+This test works with any column containing unstructured text data such as descriptions, comments, or other free-form text fields. It can also be applied to structured columns that can be converted to strings, enabling natural language data validations.
 </Info>
 
 <RequestExample>
<RequestExample>
@@ -31,7 +70,7 @@ models:
 llm_model_name: "model_name"
 ```
 
-```yml Models example
+```yml Example
 version: 2
 
 models:
@@ -46,7 +85,7 @@ models:
 llm_model_name: "test_model"
 ```
 
-```yml Snowflake example
+```yml Example - Validating Customer Feedback
 version: 2
 
 models:
@@ -57,10 +96,11 @@ models:
 description: "Customer feedback in free text format."
 tests:
 - elementary.validate_unstructured_data:
-expectation_prompt: "The text should be a customer feedback comment in English, containing sentiment about a product or service."
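
For reference, the two parameters this change documents (`expectation_prompt` and `llm_model_name`) would sit together in a dbt schema file roughly as sketched below. The diff view strips YAML indentation, so the nesting here follows the usual dbt layout for column-level tests and is an assumption. The model name `customer_feedback`, the column name `feedback_text`, and the `llm_model_name` value are hypothetical placeholders, not values taken from the diff.

```yml
version: 2

models:
  - name: customer_feedback          # hypothetical model name
    columns:
      - name: feedback_text          # hypothetical column name
        description: "Customer feedback in free text format."
        tests:
          - elementary.validate_unstructured_data:
              # Plain-English description of what each text value should look like
              expectation_prompt: "The text should be a customer feedback comment in English, containing sentiment about a product or service."
              # Placeholder: use a model name available in your warehouse
              llm_model_name: "model_name"
```

Keeping the expectation as a single quoted string avoids YAML parsing surprises when the prompt contains colons or commas.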