Skip to content

Latest commit

 

History

History
367 lines (278 loc) · 18.7 KB

File metadata and controls

367 lines (278 loc) · 18.7 KB

Your primary objective is to help discover, organize and manage metadata related to data assets.


Setup

Required Gemini CLI Version

To install this extension, the Gemini CLI version must be v0.6.0 or above. The version can be found by running: gemini --version.

Knowledge Catalog (formerly known as Dataplex) Agent Skills (Data Plane: Connecting and Querying)

This section covers connecting to a Knowledge Catalog instance.

  1. Verify Environment Variables: The extension requires the following environment variables to be set before the agent is started:

    • DATAPLEX_PROJECT: The GCP project ID.
  2. Handle Missing Variables: If a command fails with an error message containing a placeholder like ${DATAPLEX_PROJECT}, it signifies a missing environment variable. Inform the user which variable is missing and instruct them to set it.

  3. Handle Permission Errors: If you encounter permission errors, ensure the user has the correct Knowledge Catalog permissions:

    • Dataplex Reader (roles/dataplex.viewer) to search and look up entries.
    • Dataplex Editor (roles/dataplex.editor) to modify entries.

Usage Guidelines

Reusing Project Values

Users may have set project environment variables:

  • DATAPLEX_PROJECT: The GCP project ID.

Instead of prompting the user for these values for specific skill calls, prompt the user to verify the reuse of a specific setting value. Make sure to not use the environment variable name like DATAPLEX_PROJECT, ${DATAPLEX_PROJECT}, or $DATAPLEX_PROJECT. The value can be verified by the user using the gemini extensions config knowledge-catalog command or by checking their local settings.


Data Model

Entries

Entry represents a specific data asset. Entry acts as a metadata record for something that is managed by Catalog, such as:

  • A BigQuery table or dataset
  • A Cloud Storage bucket or folder
  • An on-premises SQL table

Aspects

While the Entry itself is a container, the rich descriptive information about the asset (e.g., schema, data types, business descriptions, classifications) is stored in associated components called Aspects. Aspects are created based on pre-defined blueprints known as Aspect Types.

Aspect Types

Aspect Type is a reusable template that defines the schema for a set of metadata fields. Think of an Aspect Type as a structure for the kind of metadata that is organized in the catalog within the Entry.

Examples:

  • projects/dataplex-types/locations/global/aspectTypes/analytics-hub-exchange
  • projects/dataplex-types/locations/global/aspectTypes/analytics-hub
  • projects/dataplex-types/locations/global/aspectTypes/analytics-hub-listing
  • projects/dataplex-types/locations/global/aspectTypes/bigquery-connection
  • projects/dataplex-types/locations/global/aspectTypes/bigquery-data-policy
  • projects/dataplex-types/locations/global/aspectTypes/bigquery-dataset
  • projects/dataplex-types/locations/global/aspectTypes/bigquery-model
  • projects/dataplex-types/locations/global/aspectTypes/bigquery-policy
  • projects/dataplex-types/locations/global/aspectTypes/bigquery-routine
  • projects/dataplex-types/locations/global/aspectTypes/bigquery-row-access-policy
  • projects/dataplex-types/locations/global/aspectTypes/bigquery-table
  • projects/dataplex-types/locations/global/aspectTypes/bigquery-view
  • projects/dataplex-types/locations/global/aspectTypes/cloud-bigtable-instance
  • projects/dataplex-types/locations/global/aspectTypes/cloud-bigtable-table
  • projects/dataplex-types/locations/global/aspectTypes/cloud-spanner-database
  • projects/dataplex-types/locations/global/aspectTypes/cloud-spanner-instance
  • projects/dataplex-types/locations/global/aspectTypes/cloud-spanner-table
  • projects/dataplex-types/locations/global/aspectTypes/cloud-spanner-view
  • projects/dataplex-types/locations/global/aspectTypes/cloudsql-database
  • projects/dataplex-types/locations/global/aspectTypes/cloudsql-instance
  • projects/dataplex-types/locations/global/aspectTypes/cloudsql-schema
  • projects/dataplex-types/locations/global/aspectTypes/cloudsql-table
  • projects/dataplex-types/locations/global/aspectTypes/cloudsql-view
  • projects/dataplex-types/locations/global/aspectTypes/contacts
  • projects/dataplex-types/locations/global/aspectTypes/dataform-code-asset
  • projects/dataplex-types/locations/global/aspectTypes/dataform-repository
  • projects/dataplex-types/locations/global/aspectTypes/dataform-workspace
  • projects/dataplex-types/locations/global/aspectTypes/dataproc-metastore-database
  • projects/dataplex-types/locations/global/aspectTypes/dataproc-metastore-service
  • projects/dataplex-types/locations/global/aspectTypes/dataproc-metastore-table
  • projects/dataplex-types/locations/global/aspectTypes/data-product
  • projects/dataplex-types/locations/global/aspectTypes/data-quality-scorecard
  • projects/dataplex-types/locations/global/aspectTypes/external-connection
  • projects/dataplex-types/locations/global/aspectTypes/overview
  • projects/dataplex-types/locations/global/aspectTypes/pubsub-topic
  • projects/dataplex-types/locations/global/aspectTypes/schema
  • projects/dataplex-types/locations/global/aspectTypes/sensitive-data-protection-job-result
  • projects/dataplex-types/locations/global/aspectTypes/sensitive-data-protection-profile
  • projects/dataplex-types/locations/global/aspectTypes/sql-access
  • projects/dataplex-types/locations/global/aspectTypes/storage-bucket
  • projects/dataplex-types/locations/global/aspectTypes/storage-folder
  • projects/dataplex-types/locations/global/aspectTypes/storage
  • projects/dataplex-types/locations/global/aspectTypes/usage

Entry Types

Every Entry must conform to an Entry Type. The Entry Type acts as a template, defining the structure, required aspects, and constraints for Entries of that type.

Examples:

  • projects/dataplex-types/locations/global/entryTypes/analytics-hub-exchange
  • projects/dataplex-types/locations/global/entryTypes/analytics-hub-listing
  • projects/dataplex-types/locations/global/entryTypes/bigquery-connection
  • projects/dataplex-types/locations/global/entryTypes/bigquery-data-policy
  • projects/dataplex-types/locations/global/entryTypes/bigquery-dataset
  • projects/dataplex-types/locations/global/entryTypes/bigquery-model
  • projects/dataplex-types/locations/global/entryTypes/bigquery-routine
  • projects/dataplex-types/locations/global/entryTypes/bigquery-row-access-policy
  • projects/dataplex-types/locations/global/entryTypes/bigquery-table
  • projects/dataplex-types/locations/global/entryTypes/bigquery-view
  • projects/dataplex-types/locations/global/entryTypes/cloud-bigtable-instance
  • projects/dataplex-types/locations/global/entryTypes/cloud-bigtable-table
  • projects/dataplex-types/locations/global/entryTypes/cloud-spanner-database
  • projects/dataplex-types/locations/global/entryTypes/cloud-spanner-instance
  • projects/dataplex-types/locations/global/entryTypes/cloud-spanner-table
  • projects/dataplex-types/locations/global/entryTypes/cloud-spanner-view
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-mysql-database
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-mysql-instance
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-mysql-table
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-mysql-view
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-postgresql-database
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-postgresql-instance
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-postgresql-schema
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-postgresql-table
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-postgresql-view
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-sqlserver-database
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-sqlserver-instance
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-sqlserver-schema
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-sqlserver-table
  • projects/dataplex-types/locations/global/entryTypes/cloudsql-sqlserver-view
  • projects/dataplex-types/locations/global/entryTypes/dataform-code-asset
  • projects/dataplex-types/locations/global/entryTypes/dataform-repository
  • projects/dataplex-types/locations/global/entryTypes/dataform-workspace
  • projects/dataplex-types/locations/global/entryTypes/dataproc-metastore-database
  • projects/dataplex-types/locations/global/entryTypes/dataproc-metastore-service
  • projects/dataplex-types/locations/global/entryTypes/dataproc-metastore-table
  • projects/dataplex-types/locations/global/entryTypes/pubsub-topic
  • projects/dataplex-types/locations/global/entryTypes/storage-bucket
  • projects/dataplex-types/locations/global/entryTypes/storage-folder
  • projects/dataplex-types/locations/global/entryTypes/vertexai-dataset
  • projects/dataplex-types/locations/global/entryTypes/vertexai-feature-group
  • projects/dataplex-types/locations/global/entryTypes/vertexai-feature-online-store

Entry Groups

Entries are organized within Entry Groups, which are logical groupings of Entries. An Entry Group acts as a namespace for its Entries.

Entry Links

Entries can be linked together using EntryLinks to represent relationships between data assets (e.g. foreign keys).

Skill instructions

Skill: search_entries

General

  • Do not try to search within search results on your own.
  • Do not fetch multiple pages of results unless explicitly asked.

Search syntax

Simple search

In its simplest form, a search query consists of a single predicate. Such a predicate can match several pieces of metadata:

  • A substring of a name, display name, or description of a resource
  • A substring of the type of a resource
  • A substring of a column name (or nested column name) in the schema of a resource
  • A substring of a project ID
  • A string from an overview description

For example, the predicate foo matches the following resources:

  • Resource with the name foo.bar
  • Resource with the display name Foo Bar
  • Resource with the description This is the foo script
  • Resource with the exact type foo
  • Column foo_bar in the schema of a resource
  • Nested column foo_bar in the schema of a resource
  • Project prod-foo-bar
  • Resource with an overview containing the word foo

Qualified predicates

You can qualify a predicate by prefixing it with a key that restricts the matching to a specific piece of metadata:

  • An equal sign (=) restricts the search to an exact match.
  • A colon (:) after the key matches the predicate to either a substring or a token within the value in the search results.

Tokenization splits the stream of text into a series of tokens, with each token usually corresponding to a single word. For example:

  • name:foo selects resources with names that contain the foo substring, like foo1 and barfoo.
  • description:foo selects resources with the foo token in the description, like bar and foo.
  • location=foo matches resources in a specified location with foo as the location name.

The predicate keys type, system, location, and orgid support only the exact match (=) qualifier, not the substring qualifier (:). For example, type=foo or orgid=number.

Search syntax supports the following qualifiers:

  • "name:x" - Matches x as a substring of the resource ID.
  • "displayname:x" - Match x as a substring of the resource display name.
  • "column:x" - Matches x as a substring of the column name (or nested column name) in the schema of the resource.
  • "description:x" - Matches x as a token in the resource description.
  • "label:bar" - Matches BigQuery resources that have a label (with some value) and the label key has bar as a substring.
  • "label=bar" - Matches BigQuery resources that have a label (with some value) and the label key equals bar as a string.
  • "label:bar:x" - Matches x as a substring in the value of a label with a key bar attached to a BigQuery resource.
  • "label=foo:bar" - Matches BigQuery resources where the key equals foo and the key value equals bar.
  • "label.foo=bar" - Matches BigQuery resources where the key equals foo and the key value equals bar.
  • "label.foo" - Matches BigQuery resources that have a label whose key equals foo as a string.
  • "type=TYPE" - Matches resources of a specific entry type or its type alias.
  • "projectid:bar" - Matches resources within Google Cloud projects that match bar as a substring in the ID.
  • "parent:x" - Matches x as a substring of the hierarchical path of a resource. It supports same syntax as name predicate.
  • "orgid=number" - Matches resources within a Google Cloud organization with the exact ID value of the number.
  • "system=SYSTEM" - Matches resources from a specified system. For example, system=bigquery matches BigQuery resources.
  • "location=LOCATION" - Matches resources in a specified location with an exact name. For example, location=us-central1 matches assets hosted in Iowa. BigQuery Omni assets support this qualifier by using the BigQuery Omni location name. For example, location=aws-us-east-1 matches BigQuery Omni assets in Northern Virginia.
  • "createtime" - Finds resources that were created within, before, or after a given date or time. For example "createtime:2019-01-01" matches resources created on 2019-01-01.
  • "updatetime" - Finds resources that were updated within, before, or after a given date or time. For example "updatetime>2019-01-01" matches resources updated after 2019-01-01.

Aspect Search

To search for entries based on their attached aspects, use the following query syntax.

has:x Matches x as a substring of the full path to the aspect type of an aspect that is attached to the entry, in the format projectid.location.ASPECT_TYPE_ID

has=x Matches x as the full path to the aspect type of an aspect that is attached to the entry, in the format projectid.location.ASPECT_TYPE_ID

xOPERATORvalue Searches for aspect field values. Matches x as a substring of the full path to the aspect type and field name of an aspect that is attached to the entry, in the format projectid.location.ASPECT_TYPE_ID.FIELD_NAME

The list of supported operators depends on the type of field in the aspect, as follows:

  • String: = (exact match)
  • All number types: =, :, <, >, <=, >=, =>, =<
  • Enum: = (exact match only)
  • Datetime: same as for numbers, but the values to compare are treated as datetimes instead of numbers
  • Boolean: =

Only top-level fields of the aspect are searchable.

  • Syntax for system aspect types:
    • ASPECT_TYPE_ID.FIELD_NAME
    • dataplex-types.ASPECT_TYPE_ID.FIELD_NAME
    • dataplex-types.LOCATION.ASPECT_TYPE_ID.FIELD_NAME For example, the following queries match entries where the value of the type field in the bigquery-dataset aspect is default:
    • bigquery-dataset.type=default
    • dataplex-types.bigquery-dataset.type=default
    • dataplex-types.global.bigquery-dataset.type=default
  • Syntax for custom aspect types:
    • If the aspect is created in the global region: PROJECT_ID.ASPECT_TYPE_ID.FIELD_NAME
    • If the aspect is created in a specific region: PROJECT_ID.REGION.ASPECT_TYPE_ID.FIELD_NAME For example, the following queries match entries where the value of the is-enrolled field in the employee-info aspect is true.
    • example-project.us-central1.employee-info.is-enrolled=true
    • example-project.employee-info.is-enrolled=true

Example:- You can use following filters

  • dataplex-types.global.bigquery-table.type={BIGLAKE_TABLE, BIGLAKE_OBJECT_TABLE, EXTERNAL_TABLE, TABLE}
  • dataplex-types.global.storage.type={STRUCTURED, UNSTRUCTURED}

Logical operators

A query can consist of several predicates with logical operators. If you don't specify an operator, logical AND is implied. For example, foo bar returns resources that match both predicate foo and predicate bar. Logical AND and logical OR are supported. For example, foo OR bar.

You can negate a predicate with a - (hyphen) or NOT prefix. For example, -name:foo returns resources with names that don't match the predicate foo. Logical operators are case-sensitive. OR and AND are acceptable whereas or and and are not.

Abbreviated syntax

An abbreviated search syntax is also available, using | (vertical bar) for OR operators and , (comma) for AND operators.

For example, to search for entries inside one of many projects using the OR operator, you can use the following abbreviated syntax:

projectid:(id1|id2|id3|id4)

The same search without using abbreviated syntax looks like the following:

projectid:id1 OR projectid:id2 OR projectid:id3 OR projectid:id4

To search for entries with matching column names, use the following:

  • AND: column:(name1,name2,name3)
  • OR: column:(name1|name2|name3)

This abbreviated syntax works for the qualified predicates except for label in keyword search.

Request

  1. Always try to rewrite the prompt using search syntax.

Response

  1. If there are multiple search results found

    1. Present the list of search results
    2. Format the output in nested ordered list, for example: Given
    {
        results: [
            {
                name: "projects/test-project/locations/us/entryGroups/@bigquery-aws-us-east-1/entries/users"
                entrySource: {
                displayName: "Users"
                description: "Table contains list of users."
                location: "aws-us-east-1"
                system: "BigQuery"
                }
            },
            {
                name: "projects/another_project/locations/us-central1/entryGroups/@bigquery/entries/top_customers"
                entrySource: {
                displayName: "Top customers",
                description: "Table contains list of best customers."
                location: "us-central1"
                system: "BigQuery"
                }
            },
        ]
    }
    

    Return output formatted as markdown nested list:

    * Users:
        - projectId: test_project
        - location: aws-us-east-1
        - description: Table contains list of users.
    * Top customers:
        - projectId: another_project
        - location: us-central1
        - description: Table contains list of best customers.
    
    1. Ask to select one of the presented search results
  2. If there is only one search result found

    1. Present the search result immediately.
  3. If there are no search result found

    1. Explain that no search result was found
    2. Suggest to provide a more specific search query.

Skill: lookup_entry

Request

  1. Always try to limit the size of the response by specifying aspect_types parameter. Make sure to include to select view=CUSTOM when using aspect_types parameter. If you do not know the name of the aspect type, use the search_aspect_types skill.
  2. If you do not know the name of the entry, use search_entries skill

Response

  1. Unless asked for a specific aspect, respond with all aspects attached to the entry.