Redshift Profiler PR4: docs #2408
ysmx-github wants to merge 3 commits into main from feature/redshift-profiler-docs
From the commits, one entry is added to an ignore file (the filename is not shown in this extract):

```diff
@@ -22,3 +22,4 @@ remorph_transpile/
 .databricks-login.json
 .mypy_cache
 .env
+.cursor/rules/profiler-fetchresult-connectors.mdc
```
---
sidebar_position: 2
title: Amazon Redshift Profiler Details
---

import Admonition from '@theme/Admonition';

# Amazon Redshift Profiler Details

- [Prerequisites](#prerequisites)
- [Configure Connection to Redshift](#configure-connection-to-redshift)
- [Run the profiler](#run-the-profiler)
- [Profiler output and dashboards](#profiler-output-and-dashboards)
## Prerequisites

### 1. Environment

- **Lakebridge CLI** installed and configured for your Databricks workspace (the same setup as for other profiler sources).

### 2. Choose the Redshift deployment variant

The profiler ships **three extract pipelines**; pick the one that matches your Redshift instance:

| Variant | Use when |
|---------|----------|
| **serverless** | Amazon Redshift Serverless |
| **provisioned** | Single-AZ provisioned cluster |
| **provisioned_multi_az** | Multi-AZ provisioned cluster |

When you run `execute-database-profiler`, the CLI prompts you to select this variant so that Lakebridge loads the correct SQL pipeline from `resources/assessments/redshift/<variant>/`.
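The variant-to-pipeline mapping above can be sketched as a small helper. The function name and error handling here are illustrative only, not part of the Lakebridge API:

```python
# Illustrative helper: maps a chosen Redshift variant to the pipeline
# directory layout described above. Not part of Lakebridge itself.
REDSHIFT_VARIANTS = ("serverless", "provisioned", "provisioned_multi_az")

def pipeline_dir(variant: str) -> str:
    """Return the resource path for a Redshift profiler variant."""
    if variant not in REDSHIFT_VARIANTS:
        raise ValueError(f"unknown Redshift variant: {variant!r}")
    return f"resources/assessments/redshift/{variant}/"
```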
### 3. Network connectivity

The machine running the profiler must be able to reach the Redshift cluster **endpoint** (hostname) on the cluster port (default **5439**), subject to your security groups, VPC configuration, and routing rules.
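Before running the profiler, a quick TCP probe can confirm the endpoint is reachable. This is a generic connectivity check using only the standard library; the hostname in the comment is a placeholder:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder endpoint; substitute your cluster's hostname):
# can_reach("mycluster.abc123.us-east-1.redshift.amazonaws.com", 5439)
```

A `False` result usually points at security-group, VPC, or routing rules rather than the profiler configuration itself.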
### 4. Authentication

During configuration you choose an **authentication method** and where secrets are read from (**local**, **env**, or **file**):

| Authentication method | Typical use |
|-----------------------|-------------|
| **database_password** | Native database user and password |
| **temporary_credentials_db_user** | Temporary credentials via AWS (`GetClusterCredentials`-style flows); the wizard collects the DB user for the credential exchange (often `awsuser` for the master DB user path) |
| **temporary_credentials_iam** | IAM-authenticated temporary credentials |
| **federated_user** | Federated identity mapped through AWS to Redshift |

For the IAM-oriented methods you typically need:

- **AWS credentials** available to the process (for example `AWS_PROFILE`, environment variables for access keys, or instance-profile credentials).
- **IAM permissions** allowing the Amazon Redshift credential APIs appropriate for your setup (for example `redshift:GetClusterCredentials` where applicable).

Use **`local`** to store plaintext values in `~/.databricks/labs/lakebridge/.credentials.yml`, **`env`** to substitute values from environment variables (with fallback), or **`file`** to reuse an existing credential file that already contains valid Redshift entries.
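The `env` source substitutes values from environment variables with a fallback. The exact placeholder syntax Lakebridge uses is not shown here, so the `${VAR}` / `${VAR:default}` format below is an assumption for illustration:

```python
import os
import re

# Assumed placeholder syntax, for illustration only: "${VAR}" or "${VAR:default}".
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")

def resolve(value: str) -> str:
    """Replace ${VAR} placeholders with environment values, falling back to
    the inline default (after ':') when the variable is unset."""
    def _sub(m: re.Match) -> str:
        var, default = m.group(1), m.group(2)
        return os.environ.get(var, default if default is not None else m.group(0))
    return _PLACEHOLDER.sub(_sub, value)
```

The practical point is that secrets never land on disk with `env`; only the placeholder does.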
### 5. Database privileges

The profiler connects as your configured user and runs read-only extracts. The pipeline also includes **prepare** steps that create a helper view, **`query_view`**, in the database (via `CREATE OR REPLACE VIEW`). The connecting user therefore needs permission to:

- **Create (and replace) views** in the target database used for profiling.
- **Select** from the Amazon Redshift system relations used by the extracts (see below).

For **provisioned clusters** (`provisioned` / `provisioned_multi_az`), the objects referenced by the bundled SQL include, among others:

- `stl_query`, `stl_query_metrics`
- `stv_node_storage_capacity`, `stv_partitions`
- `sys_external_query_detail`

For **serverless** (`serverless`), examples include:

- `sys_query_history`, `sys_query_detail`
- `sys_external_query_detail`, `sys_serverless_usage`

The exact object access required is described in the Amazon Redshift documentation for your edition; grant the **minimum read** access consistent with those views and tables.

:::tip
If you cannot grant broad catalog access, narrow the grants to the relations used in the YAML pipeline for your variant under `resources/assessments/redshift/<variant>/` in the Lakebridge package.
:::
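Narrowing access can be as simple as granting `SELECT` on the listed relations to the profiling user. The sketch below builds those statements from the relation lists above; `profiler_reader` is a hypothetical user name, the lists are the examples from this page rather than the authoritative set in the YAML pipeline, and note that many `stl_`/`stv_`/`sys_` relations are visible by default but row-filtered for non-superusers:

```python
# Hypothetical helper: build minimal SELECT grants for the example relations
# listed above. Check the YAML pipeline for your variant for the full list.
RELATIONS = {
    "provisioned": [
        "stl_query", "stl_query_metrics",
        "stv_node_storage_capacity", "stv_partitions",
        "sys_external_query_detail",
    ],
    "serverless": [
        "sys_query_history", "sys_query_detail",
        "sys_external_query_detail", "sys_serverless_usage",
    ],
}

def select_grants(variant: str, user: str) -> list[str]:
    """Return GRANT SELECT statements for one variant's example relations."""
    return [f"GRANT SELECT ON {rel} TO {user};" for rel in RELATIONS[variant]]
```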
## Configure Connection to Redshift

```bash
databricks labs lakebridge configure-database-profiler
```

Select **redshift** when prompted for the source system. The wizard asks for the authentication method, the credential source (`local` | `env` | `file`), and connection details. For password authentication, for example, it collects:

- Redshift cluster **endpoint** (host)
- **Port** (default 5439)
- **Database** name
- **User** and **password**

For the temporary-credential and federated IAM-style paths, expect prompts for the **DB user** used with `GetClusterCredentials` (suggested default: `awsuser`) and, optionally, an **AWS profile** name.

An example transcript (values are illustrative):
```console
databricks labs lakebridge configure-database-profiler

Please select the source system you want to configure
[0] synapse
[1] redshift
Enter a number between 0 and 1: 1

Redshift authentication: database_password, temporary_credentials_db_user,
temporary_credentials_iam, or federated_user.

Authentication method
[0] database_password
[1] temporary_credentials_db_user
...
Credential source (local | env | file)
[0] local
[1] env
[2] file
...

Enter the Redshift cluster endpoint (host): mycluster.abc123.us-east-1.redshift.amazonaws.com
Enter the port details (default: 5439): 5439
Enter the database name: dev
Enter the user details: profiler_reader
Enter the password details: ********

Do you want to test the connection to redshift? [y/n]: y
```
## Run the profiler

After configuration and an (optional) successful connection test, review the available options:

```bash
databricks labs lakebridge execute-database-profiler --help
```

Run the profiler with interactive source selection:

```bash
databricks labs lakebridge execute-database-profiler
```

When **redshift** is selected, the CLI prompts for the **Redshift variant**: `serverless`, `provisioned`, or `provisioned_multi_az`.

You can also pass the source explicitly where supported:

```bash
databricks labs lakebridge execute-database-profiler --source-tech redshift
```

Execution will:

1. Load `pipeline_config.yml` for the chosen variant.
2. Run the **prepare** steps on the cluster (including creating or updating **`query_view`**).
3. Run the SQL extracts and persist the results into **`profiler_extract.db`** (DuckDB) under the configured extract folder.
## Profiler output and dashboards

:::warning Attention
For **Redshift**, `create-profiler-dashboard` uploads the **`profiler_extract.db`** extract to Unity Catalog Volume storage **only**. It does **not** deploy the Synapse-style Lakeview profiler summary dashboard for Redshift (any of `serverless`, `provisioned`, or `provisioned_multi_az`). Plan to analyze the DuckDB extract locally or with your own tooling unless or until dashboard support is added.
:::

[Back to Configure Profiler](../#configure-profiler)