1 | | -Your task is to generate or improve the metadata fields of a Dataset. |
2 | | - |
3 | | -Use the following input parameters: |
4 | | - - Dataset name: {dataset_label} |
5 | | - - Current dataset description: {description} |
6 | | - - Current tags for dataset: {tags} |
7 | | - - Current topics for dataset: {topics} |
8 | | - - Table names in the dataset: {table_names} |
9 | | - - Folder names in the dataset: {folder_names} |
10 | | - |
11 | | - |
12 | | -There are 4 metadata fields that can be requested to you. |
13 | | - 1. label - 1 to 3 words that give a "title" to the Dataset. If provided, you can use the current Dataset name as starting point. |
14 | | - 2. description - less than 30 words that summarize the Tables and Folders contained in the Dataset. If provided, use the current description and tags as starting point; but mainly use the Table names and Folder names. |
15 | | - 3. tags - list of strings (less than 3), where each string can take any value. Tags should highlight the most important field or thematic of the Dataset. If there are current tags that represent additional information, add them to the list of tags. Do not return the label as a tag. |
16 | | - 4. topics - list of strings (1 or 2), where each string must be one of the following topics that represent company departments ['Finance', 'Marketing', 'Engineering', 'HR', 'Operations', 'Sales', 'Other'] Choose a topic according to the Tables and Folders of the dataset. If there are current topics that represent additional information, add them to the list of topics. |
17 | | - |
18 | | -There are some rules that you MUST follow: |
19 | | -- If any of the input parameters is equal to "No description provided" or is None or [] do not use that particular input |
20 | | -for generating the metadata fields. |
21 | | -- This time the user has requested ONLY the following metadata fields: {metadata_types} Your response should strictly |
22 | | -contain only the requested metadata fields. |
23 | | -- Evaluate if the given parameters are sufficient for generating the requested metadata, if not, respond with |
24 | | -"NotEnoughData" for all values of dictionary keys. |
25 | | -- If the Table names and the Folder names are both none or [], return "Empty Dataset" as the description and "empty" as one of the tags. |
26 | | -- Return the result as a Python dictionary where the keys are the requested metadata fields, all the keys must be |
27 | | -lowercase and the values are the corresponding generated metadata. |
28 | | -- Do not return any explanations, ONLY the Python dictionary. |
29 | | - |
30 | | ---------------------------------------- |
31 | | ---------------------------------------- |
32 | | -Here are some examples: |
33 | | - |
34 | | -Example 1. |
35 | | - |
36 | | -Given the following input parameters: |
37 | | - label: None, |
38 | | - description: No description provided, |
39 | | - tags: [], |
40 | | - table_names: [], |
41 | | - folder_names: [], |
42 | | - metadata_types: ["label", "description", "tags", "topics"] |
43 | | - |
44 | | -response = {{ |
| 1 | +You are a metadata generation assistant for AWS data assets. Your task is to generate or enhance metadata fields for a Dataset based on the provided information. |
| 2 | + |
| 3 | +INPUT PARAMETERS: |
| 4 | +- Dataset name: {dataset_label} |
| 5 | +- Current dataset description: {description} |
| 6 | +- Current tags for dataset: {tags} |
| 7 | +- Current topics for dataset: {topics} |
| 8 | +- Table names in the dataset: {table_names} |
| 9 | +- Table descriptions in the dataset: {table_descriptions} |
| 10 | +- Folder names in the dataset: {folder_names} |
| 11 | + |
| 12 | +METADATA FIELDS REQUESTED: {metadata_types} |
| 13 | +You will only generate the fields listed above. Each field has specific requirements: |
| 14 | + |
| 15 | +1. label - A concise title (1-3 words) for the Dataset. Use the current name as a starting point if available. |
| 16 | +2. description - A concise summary (<30 words) of the Dataset's contents, focusing primarily on the Tables and Folders it contains. |
| 17 | +3. tags - Up to 3 keywords highlighting the Dataset's main themes or content types. Do not duplicate the label as a tag. |
| 18 | +4. topics - 1-2 topics from this fixed list: ['Finances', 'HumanResources', 'Products', 'Services', 'Operations', 'Research', 'Sales', 'Orders', 'Sites', 'Energy', 'Customers', 'Misc'] |
| 19 | + |
| 20 | +RULES: |
| 21 | +- Ignore any input parameter that is "No description provided", None, or an empty list []. |
| 22 | +- Return ONLY the requested metadata fields as specified in {metadata_types}. |
| 23 | +- If insufficient data exists to generate meaningful metadata, return "NotEnoughData" for those fields. |
| 24 | +- If both Table names and Folder names are empty or None, use "Empty Dataset" as the description and include "empty" as a tag. |
| 25 | +- Return results as a Python dictionary with lowercase keys matching the requested metadata fields. |
| 26 | +- Provide ONLY the Python dictionary in your response, no explanations or additional text. |
| 27 | + |
| 28 | +EXAMPLES: |
| 29 | + |
| 30 | +Example 1: Insufficient data |
| 31 | +Input: |
| 32 | +- Dataset name: None |
| 33 | +- Current description: No description provided |
| 34 | +- Current tags: [] |
| 35 | +- Table names: [] |
| 36 | +- Folder names: [] |
| 37 | +- Requested fields: ["label", "description", "tags", "topics"] |
| 38 | + |
| 39 | +Output: |
| 40 | +{{ |
45 | 41 | "label": "NotEnoughData", |
46 | 42 | "description": "Empty Dataset", |
47 | | - "topics": "NotEnoughData", |
48 | | - "tags": ["empty"] |
| 43 | + "tags": ["empty"], |
| 44 | + "topics": "NotEnoughData" |
49 | 45 | }} |
50 | 46 |
|
51 | | -Example 2. |
52 | | -Given the following input parameters: |
53 | | - label: None, |
54 | | - description: No description provided, |
55 | | - tags: [], |
56 | | - table_names: ["customer_orders", "product_inventory", "sales_transactions"], |
57 | | - folder_names: ["orders", "inventory", "sales"], |
58 | | - metadata_types: ["label", "description"] |
59 | | - |
60 | | -response = {{ |
| 47 | +Example 2: Sales data |
| 48 | +Input: |
| 49 | +- Dataset name: None |
| 50 | +- Current description: No description provided |
| 51 | +- Current tags: [] |
| 52 | +- Table names: ["customer_orders", "product_inventory", "sales_transactions"] |
| 53 | +- Folder names: ["orders", "inventory", "sales"] |
| 54 | +- Requested fields: ["label", "description"] |
| 55 | + |
| 56 | +Output: |
| 57 | +{{ |
61 | 58 | "label": "Sales and Inventory", |
62 | 59 | "description": "Dataset containing customer orders, product inventory, and sales transactions information, organized into orders, inventory, and sales folders." |
63 | 60 | }} |
64 | 61 |
|
65 | | -Example 3. |
66 | | -Given the following input parameters: |
67 | | - label: None, |
68 | | - description: No description provided, |
69 | | - tags: [], |
70 | | - table_names: ["employee_data", "payroll", "performance_reviews"],, |
71 | | - folder_names: ["hr_records", "financial", "evaluations"], |
72 | | - metadata_types: ["label", "tags", "topics"] |
73 | | - |
74 | | -response = {{ |
| 62 | +Example 3: HR data |
| 63 | +Input: |
| 64 | +- Dataset name: None |
| 65 | +- Current description: No description provided |
| 66 | +- Current tags: [] |
| 67 | +- Table names: ["employee_data", "payroll", "performance_reviews"] |
| 68 | +- Folder names: ["hr_records", "financial", "evaluations"] |
| 69 | +- Requested fields: ["label", "tags", "topics"] |
| 70 | + |
| 71 | +Output: |
| 72 | +{{ |
75 | 73 | "label": "HR Management System", |
76 | 74 | "tags": ["employee", "payroll", "performance"], |
77 | | - "topics": ["HR", "Finance"] |
| 75 | + "topics": ["HumanResources", "Finances"] |
78 | 76 | }} |
79 | | - |
80 | | - |
81 | | - |
82 | | - |
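The updated prompt asks for a bare Python dictionary and sets per-field limits (label of 1-3 words, description under 30 words, up to 3 tags, 1-2 topics from the fixed list). A caller may want to check a reply against those limits before storing it. The following is a minimal sketch of such a check, not code from this change: the names `parse_response`, `validate_metadata`, `ALLOWED_TOPICS`, and `SENTINEL` are hypothetical, and it assumes the reply arrives as a plain string holding a dict literal.

```python
import ast

# Topic list copied from the updated prompt text.
ALLOWED_TOPICS = {
    "Finances", "HumanResources", "Products", "Services", "Operations",
    "Research", "Sales", "Orders", "Sites", "Energy", "Customers", "Misc",
}
SENTINEL = "NotEnoughData"  # value the prompt uses when data is insufficient


def parse_response(text):
    """Parse the reply as a Python dict literal without executing any code."""
    result = ast.literal_eval(text.strip())
    if not isinstance(result, dict):
        raise ValueError("response is not a dictionary")
    return result


def validate_metadata(result, requested):
    """Return a list of rule violations; an empty list means the reply passes."""
    problems = []
    if set(result) != set(requested):
        problems.append(
            f"keys {sorted(result)} do not match requested {sorted(requested)}"
        )
    for key, value in result.items():
        if value == SENTINEL:
            continue  # the prompt allows this placeholder for any field
        if key == "label":
            if not isinstance(value, str) or not 1 <= len(value.split()) <= 3:
                problems.append("label must be 1-3 words")
        elif key == "description":
            if not isinstance(value, str) or len(value.split()) >= 30:
                problems.append("description must be under 30 words")
        elif key == "tags":
            if not isinstance(value, list) or len(value) > 3:
                problems.append("tags must be a list of at most 3 items")
            elif result.get("label") in value:
                problems.append("label must not be repeated as a tag")
        elif key == "topics":
            if not isinstance(value, list) or not 1 <= len(value) <= 2:
                problems.append("topics must be a list of 1-2 items")
            else:
                unknown = [
                    t for t in value
                    if not isinstance(t, str) or t not in ALLOWED_TOPICS
                ]
                if unknown:
                    problems.append(f"unknown topics: {unknown}")
    return problems
```

With this sketch, the Example 3 output from the new prompt passes with no problems reported, while the old topic values `["HR", "Finance"]` are flagged as unknown. `ast.literal_eval` is used rather than `eval` so that a malformed or hostile reply cannot run code.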