
perf: use Glue BatchDeleteTable for Athena schema cleanup#961

Closed
devin-ai-integration[bot] wants to merge 5 commits into master from
devin/1772583468-athena-batch-delete-cleanup

Conversation

Contributor

@devin-ai-integration devin-ai-integration Bot commented Mar 4, 2026

Summary

Replaces the serial one-by-one table drops in Athena CI cleanup with Glue's BatchDeleteTable API (up to 100 tables per API call). Adds a standalone Python script (integration_tests/ci_tools/athena_drop_schemas.py) that:

  1. Collects all tables across all target schemas up-front
  2. Batch-deletes tables per schema in chunks of 100
  3. Deletes the now-empty schemas at the end
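The three steps can be sketched against a boto3 Glue client. This is a simplified illustration, not the script's actual API; the helper names (`chunked`, `drop_schema`) are hypothetical:

```python
GLUE_BATCH_LIMIT = 100  # BatchDeleteTable accepts at most 100 table names per call

def chunked(items, size=GLUE_BATCH_LIMIT):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i : i + size] for i in range(0, len(items), size)]

def drop_schema(glue, database: str) -> None:
    """Drop every table in a Glue database, then the database itself.

    `glue` is a boto3 Glue client, i.e. boto3.client("glue").
    """
    # 1. Collect all tables up-front (get_tables is paginated).
    paginator = glue.get_paginator("get_tables")
    tables = [
        t["Name"]
        for page in paginator.paginate(DatabaseName=database)
        for t in page["TableList"]
    ]
    # 2. Batch-delete in chunks of 100 instead of one drop call per table.
    for batch in chunked(tables):
        glue.batch_delete_table(DatabaseName=database, TablesToDelete=batch)
    # 3. Delete the now-empty database.
    glue.delete_database(Name=database)
```

One BatchDeleteTable call per 100 tables replaces up to 100 serial drop statements, which is where the performance win comes from.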

The script auto-resolves AWS credentials from the existing CI_WAREHOUSE_SECRETS blob (via _resolve_credentials), so no new secrets are needed and no inline credential-extraction logic is required in the workflow YAML.
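A minimal sketch of that resolution step, assuming the blob is base64-encoded JSON and using the key names from the review checklist below (both the encoding and the key names are assumptions about CI_WAREHOUSE_SECRETS, not verified against the script):

```python
import base64
import json
import os

def resolve_credentials(blob_env: str = "CI_WAREHOUSE_SECRETS"):
    """Return (region, access_key_id, secret_access_key) from a base64 JSON blob.

    Illustrative only; the script's real logic lives in _resolve_credentials.
    """
    blob = os.environ.get(blob_env, "")
    if not blob:
        return None, None, None
    secrets = json.loads(base64.b64decode(blob))
    return (
        secrets.get("athena_region"),
        secrets.get("athena_aws_access_key_id"),
        secrets.get("athena_aws_secret_access_key"),
    )
```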

Workflow changes:

  • test-warehouse.yml: Athena now uses the Python script in a dedicated "Drop test schemas (Athena)" step; other cloud targets still use the dbt macro
  • cleanup-stale-schemas.yml: Athena is removed from the dbt matrix and gets its own cleanup-athena job (skips dbt installation overhead entirely)

Script modes:

  • --ci-test-schemas BASE_NAME: Expands the base schema to all xdist worker variants (_gw0 through _gw7) plus their _elementary variants
  • --stale --prefixes dbt_: Scans all Glue databases and drops schemas that match the CI naming convention and are older than --max-age-hours
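Both modes can be illustrated with small sketches. The exact expansion naming is an assumption about the script; the timestamp regex follows the `_CI_SCHEMA_RE` pattern quoted in the review discussion further down:

```python
import re
from datetime import datetime
from typing import Optional

def expand_ci_test_schemas(base: str, num_workers: int = 8) -> list:
    """Expand a base schema into per-xdist-worker and _elementary variants."""
    schemas = [base, f"{base}_elementary"]
    for i in range(num_workers):
        schemas.append(f"{base}_gw{i}")
        schemas.append(f"{base}_gw{i}_elementary")
    return schemas

# Pattern from the review discussion: <prefix>_<yymmdd>_<hhmmss>_...
_CI_SCHEMA_RE = re.compile(r"^(?P<prefix>[a-z]+_)(?P<ts>\d{6}_\d{6})_")

def parse_ci_schema_timestamp(schema_name: str, prefixes) -> Optional[datetime]:
    """Extract the creation timestamp from a CI schema name, or None."""
    m = _CI_SCHEMA_RE.match(schema_name)
    if not m or m.group("prefix") not in prefixes:
        return None
    try:
        return datetime.strptime(m.group("ts"), "%y%m%d_%H%M%S")
    except ValueError:
        return None
```

In stale mode, any schema whose parsed timestamp is older than the --max-age-hours cutoff becomes a deletion candidate.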

Review & Testing Checklist for Human

  • S3 data orphaning: batch_delete_table only removes Glue catalog entries. Unlike the previous adapter.drop_relation() path, it does not clean up underlying S3 data for Iceberg tables. Verify this is acceptable or whether S3 cleanup should be added (could lead to growing storage costs over time).
  • Credential key names: _resolve_credentials looks for athena_region, athena_aws_access_key_id, athena_aws_secret_access_key in the decoded secrets blob. Confirm these match the actual keys in CI_WAREHOUSE_SECRETS.
  • $SCHEMA_NAME propagation: The schema name is exported via $GITHUB_ENV for the Athena step. If the "Write dbt profiles" step is skipped or the job is cancelled before it runs, $SCHEMA_NAME will be empty. The script will expand an empty string to harmless non-existent schema names, but verify this is acceptable.
  • Recommended test plan: Manually dispatch the cleanup-stale-schemas workflow from this branch, and verify the cleanup-athena job successfully lists and drops stale schemas. Also run a test-warehouse Athena job and confirm the "Drop test schemas (Athena)" step runs and reports sensible output.

Notes

Summary by CodeRabbit

  • Chores

    • Moved Athena schema cleanup into its own CI job and removed Athena from the general cleanup matrix for clearer workflow isolation.
    • Test workflows now write schema names to the environment and run Athena-specific cleanup separately.
  • New Features

    • Added a tool to batch-delete Athena/Glue tables and remove empty schemas, with CI-test schema expansion and stale-schema cleanup support.

Replace the serial one-by-one table drops with Glue's BatchDeleteTable
API (up to 100 tables per call) for Athena cleanup operations.

- Add standalone Python script (athena_drop_schemas.py) that:
  - Collects all tables across all target schemas up-front
  - Batch-deletes tables per schema in chunks of 100
  - Deletes the now-empty schemas at the end
  - Auto-resolves AWS credentials from CI_WAREHOUSE_SECRETS blob
  - Supports both CI test schema cleanup and stale schema discovery

- Update test-warehouse.yml to use the script for Athena (other cloud
  targets still use the dbt macro path)

- Update cleanup-stale-schemas.yml to use a dedicated cleanup-athena
  job that skips the dbt installation overhead entirely

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Contributor

github-actions Bot commented Mar 4, 2026

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.


coderabbitai Bot commented Mar 4, 2026

📝 Walkthrough

Walkthrough

Removed Athena from shared cleanup matrix and shared drop steps in workflows; added dedicated Athena cleanup jobs/steps that invoke a new CLI Python utility to discover and batch-delete AWS Glue/Athena tables and drop empty schemas, with CI-style expansion and stale-schema support.

Changes

  • Workflow: cleanup-stale-schemas (.github/workflows/cleanup-stale-schemas.yml): Removed athena from the cleanup job matrix, simplified the dbt package install expression (removed the athena-specific branch), and added a new cleanup-athena job that runs integration_tests/ci_tools/athena_drop_schemas.py after setting up Python and deps.
  • Workflow: test-warehouse (.github/workflows/test-warehouse.yml): Writes the derived SCHEMA_NAME to the GitHub environment, excludes athena from the shared drop step, and adds a conditional step for warehouse-type == athena that installs boto3 and runs integration_tests/ci_tools/athena_drop_schemas.py.
  • Athena schema utility (integration_tests/ci_tools/athena_drop_schemas.py): New CLI script that resolves credentials/region (args or base64 secrets blob), expands CI test schema patterns, discovers stale schemas, lists tables, batch-deletes tables (<=100 per BatchDeleteTable), deletes empty databases, logs results, and exposes main, drop_schemas, and expand_ci_test_schemas.

Sequence Diagram

sequenceDiagram
    participant GH as GitHub Actions
    participant Py as athena_drop_schemas.py
    participant Glue as AWS Glue API

    GH->>Py: Start job/step with schema names and flags
    activate Py
    Py->>Py: Resolve region & credentials (args or secrets blob)
    Py->>Glue: List tables for a schema
    activate Glue
    Glue-->>Py: Return table list
    deactivate Glue

    loop per schema (batch groups)
        Py->>Glue: BatchDeleteTable (<=100 tables)
        activate Glue
        Glue-->>Py: Deletion results (success/fail per table)
        deactivate Glue
        Py->>Py: Log progress/errors
    end

    Py->>Glue: DeleteDatabase (if empty)
    activate Glue
    Glue-->>Py: Success/failure
    deactivate Glue

    Py-->>GH: Exit with summary (counts, errors)
    deactivate Py

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through YAML, flags, and traces,

Batching tables in tidy little places,
Secrets unwrapped, regions set clear,
Empty databases vanish — cheer!
A small clean hop, and the paths are dear.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped since CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'perf: use Glue BatchDeleteTable for Athena schema cleanup' accurately describes the main change: replacing serial table drops with AWS Glue's BatchDeleteTable API for better performance in Athena cleanup workflows.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 90.00%, which meets the required threshold of 80.00%.


Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Around line 293-296: The code uses max_age_hours directly to compute cutoff =
timedelta(hours=max_age_hours) which makes negative values produce a negative
cutoff and can mark nearly every schema as stale; before computing cutoff (and
similarly before the other calculation block around the code at lines ~304-306)
validate that max_age_hours is non-negative and fail-fast: if max_age_hours < 0
raise a ValueError or log an error and exit, otherwise proceed to compute cutoff
and perform deletions; reference the variables cutoff and max_age_hours and
place the check immediately before cutoff = timedelta(hours=max_age_hours) and
the other related stale-deletion math.
- Around line 201-223: The current flow returns early when region is provided,
which prevents filling missing aws_access_key_id/aws_secret_access_key from
CI_WAREHOUSE_SECRETS and also decodes/loads the secrets blob without error
handling; update the logic so that if region is set you still attempt to read
and parse the secrets blob (secrets_json_env -> blob) to fill any missing
aws_access_key_id/aws_secret_access_key (use resolved_key/resolved_secret)
instead of returning immediately, and wrap the base64.b64decode/json.loads steps
in a try/except to catch and log decode/JSON errors (use click.echo with a clear
message and sys.exit(1)) so malformed blobs fail fast and clearly while still
allowing region-only success to keep cleanup robust.
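The fail-fast validation the first comment asks for can be isolated into a small helper (hypothetical name, for illustration only; the script computes the cutoff inline):

```python
from datetime import datetime, timedelta

def is_stale(created_at: datetime, max_age_hours: float, now: datetime) -> bool:
    """Return True if a schema is older than the cutoff; reject negative ages."""
    if max_age_hours < 0:
        raise ValueError("max_age_hours must be non-negative")
    return now - created_at > timedelta(hours=max_age_hours)
```

Rejecting negative values up front prevents a negative cutoff from marking nearly every schema as stale.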

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d88dc70 and fbf6af5.

📒 Files selected for processing (3)
  • .github/workflows/cleanup-stale-schemas.yml
  • .github/workflows/test-warehouse.yml
  • integration_tests/ci_tools/athena_drop_schemas.py

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Around line 63-87: The current _batch_delete_tables swallows
batch_delete_table errors and only logs them; update _batch_delete_tables to
surface failures (either by returning a second boolean/flag like had_errors or
by raising an exception) so callers can detect incomplete deletions, then in
drop_schemas consume that signal (compare expected vs deleted or check
had_errors) and call sys.exit(1) when any table or database deletion failed;
apply the same pattern for the other deletion blocks referenced (the sections
around lines 132-173 and 346-347) so all deletion failures cause a non-zero
exit.
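A sketch of the requested pattern, using the Errors list in Glue's BatchDeleteTable response shape (the helper name is illustrative):

```python
import sys

def batch_delete_tables(glue, database: str, batches) -> bool:
    """Delete table batches; return True if any deletion failed.

    `glue` is a boto3 Glue client. BatchDeleteTable reports per-table
    failures in the response's "Errors" list rather than raising.
    """
    had_errors = False
    for batch in batches:
        resp = glue.batch_delete_table(DatabaseName=database, TablesToDelete=batch)
        for err in resp.get("Errors", []):
            had_errors = True
            detail = err.get("ErrorDetail", {})
            print(
                f"failed to delete {err.get('TableName')}: "
                f"{detail.get('ErrorCode')}: {detail.get('ErrorMessage')}",
                file=sys.stderr,
            )
    return had_errors
```

The caller can then sys.exit(1) when the flag is set, so CI surfaces incomplete cleanups instead of reporting success.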


📥 Commits

Reviewing files that changed from the base of the PR and between fbf6af5 and 7e12ed2.

📒 Files selected for processing (1)
  • integration_tests/ci_tools/athena_drop_schemas.py

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
integration_tests/ci_tools/athena_drop_schemas.py (1)

307-307: Validate --num-workers is non-negative before schema expansion.

A negative value silently changes behavior and skips worker schema generation.

💡 Proposed fix
 def main(
@@
 ) -> None:
     """Drop Athena/Glue schemas efficiently using batch API calls."""
+    if num_workers < 0:
+        click.echo("error: --num-workers must be >= 0", err=True)
+        sys.exit(1)
+
     region, aws_access_key_id, aws_secret_access_key = _resolve_credentials(
         region, aws_access_key_id, aws_secret_access_key
     )

Also applies to: 321-322

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/ci_tools/athena_drop_schemas.py` at line 307, Validate that
the num_workers argument is >= 0 at the start of the function that receives the
num_workers parameter (the routine handling schema expansion/worker generation)
and fail fast if it is negative: check num_workers, raise a ValueError or exit
with a clear error message and non-zero status so schema expansion/worker
generation is not silently skipped (this validation should be added where
num_workers is accepted and before any worker schema generation logic referenced
around the existing num_workers usage on the schema expansion block and the code
near lines 321-322).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Line 318: The target_schemas assignment currently copies schemas including
possible empty strings which then get passed to Glue and cause validation
errors; update the code that builds target_schemas (and the similar logic around
the other occurrence at the block handling lines ~368-371) to filter out empty
schema names (e.g., keep only truthy/non-empty strings from schemas using a list
comprehension or filter) so subsequent Glue calls receive only valid schema
names and empty args are treated as a no-op.

---

Nitpick comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Line 307: Validate that the num_workers argument is >= 0 at the start of the
function that receives the num_workers parameter (the routine handling schema
expansion/worker generation) and fail fast if it is negative: check num_workers,
raise a ValueError or exit with a clear error message and non-zero status so
schema expansion/worker generation is not silently skipped (this validation
should be added where num_workers is accepted and before any worker schema
generation logic referenced around the existing num_workers usage on the schema
expansion block and the code near lines 321-322).


📥 Commits

Reviewing files that changed from the base of the PR and between 7e12ed2 and 51d7d0c.

📒 Files selected for processing (1)
  • integration_tests/ci_tools/athena_drop_schemas.py

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
integration_tests/ci_tools/athena_drop_schemas.py (1)

124-125: Make CI prefix parsing honor full user-provided prefixes.

The current regex only accepts [a-z]+_ prefixes. Prefixes containing additional underscores/digits (while still valid for your convention) are skipped during stale detection.

Proposed refactor
-_CI_SCHEMA_RE = re.compile(r"^(?P<prefix>[a-z]+_)" r"(?P<ts>\d{6}_\d{6})" r"_")
+_CI_SCHEMA_TS_RE = re.compile(r"^(?P<ts>\d{6}_\d{6})_")
@@
 def _parse_ci_schema_timestamp(
     schema_name: str, prefixes: list[str]
 ) -> Optional[datetime]:
     """Extract the timestamp from a CI schema name, or None if it doesn't match."""
-    m = _CI_SCHEMA_RE.match(schema_name)
-    if not m:
+    prefix = next((p for p in prefixes if schema_name.startswith(p)), None)
+    if not prefix:
         return None
-    if m.group("prefix") not in prefixes:
+    remainder = schema_name[len(prefix) :]
+    m = _CI_SCHEMA_TS_RE.match(remainder)
+    if not m:
         return None
     try:
         return datetime.strptime(m.group("ts"), "%y%m%d_%H%M%S")
     except ValueError:
         return None

Also applies to: 127-135

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/ci_tools/athena_drop_schemas.py` around lines 124 - 125,
Update the _CI_SCHEMA_RE regex to accept prefixes that include letters, digits
and underscores (not just letters) so user-provided prefixes with additional
underscores/digits are recognized; replace the pattern ^(?P<prefix>[a-z]+_)
(?P<ts>\d{6}_\d{6})_ with something like ^(?P<prefix>[a-z0-9_]+_)
(?P<ts>\d{6}_\d{6})_ (or add re.I if you need case-insensitivity), and make the
identical change to the other occurrences referenced around the block that uses
_CI_SCHEMA_RE (lines 127–135) so stale-detection logic uses the updated regex
everywhere. Ensure tests or parsing code that relies on the prefix group
continue to work with the new characters allowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Around line 306-307: Validate and normalize inputs before calling
expand_ci_test_schemas: ensure num_workers is a positive integer (raise/exit if
<= 0) to avoid silent under-expansion, and strip ci_test_schemas (and reject if
the stripped string is empty) to prevent generating invalid schema names;
perform these checks where ci_test_schemas and num_workers are accepted/used
(including the other call site around expand_ci_test_schemas) and only call
expand_ci_test_schemas(...) after validation/normalization succeeds.

---

Nitpick comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Around line 124-125: Update the _CI_SCHEMA_RE regex to accept prefixes that
include letters, digits and underscores (not just letters) so user-provided
prefixes with additional underscores/digits are recognized; replace the pattern
^(?P<prefix>[a-z]+_) (?P<ts>\d{6}_\d{6})_ with something like
^(?P<prefix>[a-z0-9_]+_) (?P<ts>\d{6}_\d{6})_ (or add re.I if you need
case-insensitivity), and make the identical change to the other occurrences
referenced around the block that uses _CI_SCHEMA_RE (lines 127–135) so
stale-detection logic uses the updated regex everywhere. Ensure tests or parsing
code that relies on the prefix group continue to work with the new characters
allowed.


📥 Commits

Reviewing files that changed from the base of the PR and between 51d7d0c and ce5f765.

📒 Files selected for processing (1)
  • integration_tests/ci_tools/athena_drop_schemas.py

Comment on lines +306 to +307
ci_test_schemas: Optional[str],
num_workers: int,


⚠️ Potential issue | 🟡 Minor

Validate --num-workers and normalize --ci-test-schemas before expansion.

A negative worker count silently under-expands targets, and a whitespace-only base schema can produce invalid schema names. Fail fast before calling expand_ci_test_schemas(...).

Proposed fix
     # Expand CI test schemas (base + workers + elementary)
     if ci_test_schemas:
+        if num_workers < 0:
+            click.echo("error: --num-workers must be >= 0", err=True)
+            sys.exit(1)
+        ci_test_schemas = ci_test_schemas.strip()
+        if not ci_test_schemas:
+            click.echo("error: --ci-test-schemas cannot be blank", err=True)
+            sys.exit(1)
         expanded = expand_ci_test_schemas(ci_test_schemas, num_workers)
         click.echo(
             f"CI test schemas expanded to {len(expanded)} schema(s) from base '{ci_test_schemas}'."
         )

Also applies to: 321-323

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/ci_tools/athena_drop_schemas.py` around lines 306 - 307,
Validate and normalize inputs before calling expand_ci_test_schemas: ensure
num_workers is a positive integer (raise/exit if <= 0) to avoid silent
under-expansion, and strip ci_test_schemas (and reject if the stripped string is
empty) to prevent generating invalid schema names; perform these checks where
ci_test_schemas and num_workers are accepted/used (including the other call site
around expand_ci_test_schemas) and only call expand_ci_test_schemas(...) after
validation/normalization succeeds.

@haritamar haritamar closed this Mar 8, 2026