perf: use Glue BatchDeleteTable for Athena schema cleanup #961
devin-ai-integration[bot] wants to merge 5 commits into master from
Conversation
Replace the serial one-by-one table drops with Glue's BatchDeleteTable API (up to 100 tables per call) for Athena cleanup operations.

- Add standalone Python script (athena_drop_schemas.py) that:
  - Collects all tables across all target schemas up-front
  - Batch-deletes tables per schema in chunks of 100
  - Deletes the now-empty schemas at the end
  - Auto-resolves AWS credentials from the CI_WAREHOUSE_SECRETS blob
  - Supports both CI test schema cleanup and stale schema discovery
- Update test-warehouse.yml to use the script for Athena (other cloud targets still use the dbt macro path)
- Update cleanup-stale-schemas.yml to use a dedicated cleanup-athena job that skips the dbt installation overhead entirely

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
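The collect-then-batch flow described above can be sketched in plain Python (an illustrative sketch, not the PR's actual script: `drop_schema_tables` and `chunked` are hypothetical names, and the Glue client is passed in, e.g. from `boto3.client("glue")`):

```python
GLUE_BATCH_LIMIT = 100  # BatchDeleteTable accepts at most 100 table names per call


def chunked(items, size=GLUE_BATCH_LIMIT):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def drop_schema_tables(glue, database):
    """Delete every table in a Glue database using batch calls.

    `glue` is a boto3 Glue client (or anything with the same interface).
    Returns the table names that Glue reported as failed deletions.
    """
    # Collect all table names up-front via the get_tables paginator.
    tables = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        tables.extend(t["Name"] for t in page["TableList"])

    # Batch-delete in chunks of up to 100, collecting per-table errors.
    failed = []
    for chunk in chunked(tables):
        resp = glue.batch_delete_table(DatabaseName=database, TablesToDelete=chunk)
        failed.extend(err["TableName"] for err in resp.get("Errors", []))
    return failed
```

For 205 tables this issues three `BatchDeleteTable` calls (100 + 100 + 5) instead of 205 individual drops.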
🤖 Devin AI Engineer: I'll be helping with this pull request! Here's what you should know:
Note: I can only respond to comments from users who have write access to this repository.
📝 Walkthrough

Removed Athena from the shared cleanup matrix and shared drop steps in workflows; added dedicated Athena cleanup jobs/steps that invoke a new CLI Python utility to discover and batch-delete AWS Glue/Athena tables and drop empty schemas, with CI-style expansion and stale-schema support.
Sequence Diagram

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant Py as athena_drop_schemas.py
    participant Glue as AWS Glue API
    GH->>Py: Start job/step with schema names and flags
    activate Py
    Py->>Py: Resolve region & credentials (args or secrets blob)
    Py->>Glue: List tables for a schema
    activate Glue
    Glue-->>Py: Return table list
    deactivate Glue
    loop per schema (batch groups)
        Py->>Glue: BatchDeleteTable (<=100 tables)
        activate Glue
        Glue-->>Py: Deletion results (success/fail per table)
        deactivate Glue
        Py->>Py: Log progress/errors
    end
    Py->>Glue: DeleteDatabase (if empty)
    activate Glue
    Glue-->>Py: Success/failure
    deactivate Glue
    Py-->>GH: Exit with summary (counts, errors)
    deactivate Py
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Around line 293-296: The code uses max_age_hours directly to compute cutoff =
timedelta(hours=max_age_hours) which makes negative values produce a negative
cutoff and can mark nearly every schema as stale; before computing cutoff (and
similarly before the other calculation block around the code at lines ~304-306)
validate that max_age_hours is non-negative and fail-fast: if max_age_hours < 0
raise a ValueError or log an error and exit, otherwise proceed to compute cutoff
and perform deletions; reference the variables cutoff and max_age_hours and
place the check immediately before cutoff = timedelta(hours=max_age_hours) and
the other related stale-deletion math.
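The fail-fast validation the prompt asks for could be as simple as the following (a minimal sketch; `compute_cutoff` is a hypothetical helper name mirroring the prompt's `cutoff`/`max_age_hours` variables):

```python
import sys
from datetime import timedelta


def compute_cutoff(max_age_hours: float) -> timedelta:
    """Reject negative ages before they silently mark nearly every schema stale."""
    if max_age_hours < 0:
        print(f"error: --max-age-hours must be >= 0, got {max_age_hours}", file=sys.stderr)
        sys.exit(1)
    return timedelta(hours=max_age_hours)
```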
- Around line 201-223: The current flow returns early when region is provided,
which prevents filling missing aws_access_key_id/aws_secret_access_key from
CI_WAREHOUSE_SECRETS and also decodes/loads the secrets blob without error
handling; update the logic so that if region is set you still attempt to read
and parse the secrets blob (secrets_json_env -> blob) to fill any missing
aws_access_key_id/aws_secret_access_key (use resolved_key/resolved_secret)
instead of returning immediately, and wrap the base64.b64decode/json.loads steps
in a try/except to catch and log decode/JSON errors (use click.echo with a clear
message and sys.exit(1)) so malformed blobs fail fast and clearly while still
allowing region-only success to keep cleanup robust.
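A sketch of the hardened resolution flow the prompt describes (the function shape and the `athena_*` secret key names are assumptions drawn from the PR summary, not the script's actual code):

```python
import base64
import json
import sys


def resolve_credentials(region, key_id, secret, secrets_blob_b64):
    """Fill any missing credential from a base64-encoded JSON secrets blob.

    Even when `region` is already set, still try the blob for missing
    key/secret values; malformed blobs fail fast with a clear message
    instead of an opaque traceback.
    """
    if region and key_id and secret:
        return region, key_id, secret
    if not secrets_blob_b64:
        return region, key_id, secret
    try:
        # binascii.Error and json.JSONDecodeError are both ValueError subclasses.
        blob = json.loads(base64.b64decode(secrets_blob_b64))
    except ValueError as exc:
        print(f"error: could not decode secrets blob: {exc}", file=sys.stderr)
        sys.exit(1)
    region = region or blob.get("athena_region")
    key_id = key_id or blob.get("athena_aws_access_key_id")
    secret = secret or blob.get("athena_aws_secret_access_key")
    return region, key_id, secret
```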
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- .github/workflows/cleanup-stale-schemas.yml
- .github/workflows/test-warehouse.yml
- integration_tests/ci_tools/athena_drop_schemas.py
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Around line 63-87: The current _batch_delete_tables swallows
batch_delete_table errors and only logs them; update _batch_delete_tables to
surface failures (either by returning a second boolean/flag like had_errors or
by raising an exception) so callers can detect incomplete deletions, then in
drop_schemas consume that signal (compare expected vs deleted or check
had_errors) and call sys.exit(1) when any table or database deletion failed;
apply the same pattern for the other deletion blocks referenced (the sections
around lines 132-173 and 346-347) so all deletion failures cause a non-zero
exit.
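One possible shape for the `had_errors` signal the reviewer suggests (hypothetical helper names; the real script may differ):

```python
import sys


def batch_delete_tables(glue, database, table_chunks):
    """Delete table chunks and report whether any deletion failed."""
    had_errors = False
    for chunk in table_chunks:
        resp = glue.batch_delete_table(DatabaseName=database, TablesToDelete=chunk)
        for err in resp.get("Errors", []):
            had_errors = True
            print(f"failed to delete {database}.{err['TableName']}", file=sys.stderr)
    return had_errors


def drop_schemas(glue, schema_to_chunks):
    """Exit non-zero if any table deletion failed in any schema."""
    any_errors = False
    for database, chunks in schema_to_chunks.items():
        any_errors |= batch_delete_tables(glue, database, chunks)
    if any_errors:
        sys.exit(1)
```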
📒 Files selected for processing (1)
integration_tests/ci_tools/athena_drop_schemas.py
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
integration_tests/ci_tools/athena_drop_schemas.py (1)
307-307: Validate `--num-workers` is non-negative before schema expansion. A negative value silently changes behavior and skips worker schema generation.
💡 Proposed fix
```diff
 def main(
     @@
 ) -> None:
     """Drop Athena/Glue schemas efficiently using batch API calls."""
+    if num_workers < 0:
+        click.echo("error: --num-workers must be >= 0", err=True)
+        sys.exit(1)
+
     region, aws_access_key_id, aws_secret_access_key = _resolve_credentials(
         region, aws_access_key_id, aws_secret_access_key
     )
```

Also applies to: 321-322
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/ci_tools/athena_drop_schemas.py` at line 307, Validate that the num_workers argument is >= 0 at the start of the function that receives the num_workers parameter (the routine handling schema expansion/worker generation) and fail fast if it is negative: check num_workers, raise a ValueError or exit with a clear error message and non-zero status so schema expansion/worker generation is not silently skipped (this validation should be added where num_workers is accepted and before any worker schema generation logic referenced around the existing num_workers usage on the schema expansion block and the code near lines 321-322).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Line 318: The target_schemas assignment currently copies schemas including
possible empty strings which then get passed to Glue and cause validation
errors; update the code that builds target_schemas (and the similar logic around
the other occurrence at the block handling lines ~368-371) to filter out empty
schema names (e.g., keep only truthy/non-empty strings from schemas using a list
comprehension or filter) so subsequent Glue calls receive only valid schema
names and empty args are treated as a no-op.
---
Nitpick comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Line 307: Validate that the num_workers argument is >= 0 at the start of the
function that receives the num_workers parameter (the routine handling schema
expansion/worker generation) and fail fast if it is negative: check num_workers,
raise a ValueError or exit with a clear error message and non-zero status so
schema expansion/worker generation is not silently skipped (this validation
should be added where num_workers is accepted and before any worker schema
generation logic referenced around the existing num_workers usage on the schema
expansion block and the code near lines 321-322).
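The empty-name filtering the inline comment asks for can be a one-liner (illustrative; `normalize_schemas` is a hypothetical name):

```python
def normalize_schemas(schemas):
    """Drop empty/whitespace-only names so Glue never receives an invalid argument."""
    return [s for s in (name.strip() for name in schemas) if s]
```

With this in place, an empty `$SCHEMA_NAME` simply produces an empty target list and the cleanup becomes a no-op.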
📒 Files selected for processing (1)
integration_tests/ci_tools/athena_drop_schemas.py
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
integration_tests/ci_tools/athena_drop_schemas.py (1)
124-125: Make CI prefix parsing honor full user-provided prefixes. The current regex only accepts `[a-z]+_` prefixes. Prefixes containing additional underscores/digits (while still valid for your convention) are skipped during stale detection.

Proposed refactor
```diff
-_CI_SCHEMA_RE = re.compile(r"^(?P<prefix>[a-z]+_)" r"(?P<ts>\d{6}_\d{6})" r"_")
+_CI_SCHEMA_TS_RE = re.compile(r"^(?P<ts>\d{6}_\d{6})_")
 @@
 def _parse_ci_schema_timestamp(
     schema_name: str, prefixes: list[str]
 ) -> Optional[datetime]:
     """Extract the timestamp from a CI schema name, or None if it doesn't match."""
-    m = _CI_SCHEMA_RE.match(schema_name)
-    if not m:
+    prefix = next((p for p in prefixes if schema_name.startswith(p)), None)
+    if not prefix:
         return None
-    if m.group("prefix") not in prefixes:
+    remainder = schema_name[len(prefix) :]
+    m = _CI_SCHEMA_TS_RE.match(remainder)
+    if not m:
         return None
     try:
         return datetime.strptime(m.group("ts"), "%y%m%d_%H%M%S")
     except ValueError:
         return None
```

Also applies to: 127-135
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/ci_tools/athena_drop_schemas.py` around lines 124 - 125, Update the _CI_SCHEMA_RE regex to accept prefixes that include letters, digits and underscores (not just letters) so user-provided prefixes with additional underscores/digits are recognized; replace the pattern ^(?P<prefix>[a-z]+_) (?P<ts>\d{6}_\d{6})_ with something like ^(?P<prefix>[a-z0-9_]+_) (?P<ts>\d{6}_\d{6})_ (or add re.I if you need case-insensitivity), and make the identical change to the other occurrences referenced around the block that uses _CI_SCHEMA_RE (lines 127–135) so stale-detection logic uses the updated regex everywhere. Ensure tests or parsing code that relies on the prefix group continue to work with the new characters allowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Around line 306-307: Validate and normalize inputs before calling
expand_ci_test_schemas: ensure num_workers is a positive integer (raise/exit if
<= 0) to avoid silent under-expansion, and strip ci_test_schemas (and reject if
the stripped string is empty) to prevent generating invalid schema names;
perform these checks where ci_test_schemas and num_workers are accepted/used
(including the other call site around expand_ci_test_schemas) and only call
expand_ci_test_schemas(...) after validation/normalization succeeds.
---
Nitpick comments:
In `@integration_tests/ci_tools/athena_drop_schemas.py`:
- Around line 124-125: Update the _CI_SCHEMA_RE regex to accept prefixes that
include letters, digits and underscores (not just letters) so user-provided
prefixes with additional underscores/digits are recognized; replace the pattern
^(?P<prefix>[a-z]+_) (?P<ts>\d{6}_\d{6})_ with something like
^(?P<prefix>[a-z0-9_]+_) (?P<ts>\d{6}_\d{6})_ (or add re.I if you need
case-insensitivity), and make the identical change to the other occurrences
referenced around the block that uses _CI_SCHEMA_RE (lines 127–135) so
stale-detection logic uses the updated regex everywhere. Ensure tests or parsing
code that relies on the prefix group continue to work with the new characters
allowed.
📒 Files selected for processing (1)
integration_tests/ci_tools/athena_drop_schemas.py
```python
    ci_test_schemas: Optional[str],
    num_workers: int,
```
Validate --num-workers and normalize --ci-test-schemas before expansion.
A negative worker count silently under-expands targets, and a whitespace-only base schema can produce invalid schema names. Fail fast before calling expand_ci_test_schemas(...).
Proposed fix

```diff
     # Expand CI test schemas (base + workers + elementary)
     if ci_test_schemas:
+        if num_workers < 0:
+            click.echo("error: --num-workers must be >= 0", err=True)
+            sys.exit(1)
+        ci_test_schemas = ci_test_schemas.strip()
+        if not ci_test_schemas:
+            click.echo("error: --ci-test-schemas cannot be blank", err=True)
+            sys.exit(1)
         expanded = expand_ci_test_schemas(ci_test_schemas, num_workers)
         click.echo(
             f"CI test schemas expanded to {len(expanded)} schema(s) from base '{ci_test_schemas}'."
         )
```

Also applies to: 321-323
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@integration_tests/ci_tools/athena_drop_schemas.py` around lines 306 - 307,
Validate and normalize inputs before calling expand_ci_test_schemas: ensure
num_workers is a positive integer (raise/exit if <= 0) to avoid silent
under-expansion, and strip ci_test_schemas (and reject if the stripped string is
empty) to prevent generating invalid schema names; perform these checks where
ci_test_schemas and num_workers are accepted/used (including the other call site
around expand_ci_test_schemas) and only call expand_ci_test_schemas(...) after
validation/normalization succeeds.
Summary
Replaces the serial one-by-one table drops in Athena CI cleanup with Glue's `BatchDeleteTable` API (up to 100 tables per API call). Adds a standalone Python script (`integration_tests/ci_tools/athena_drop_schemas.py`).

The script auto-resolves AWS credentials from the existing `CI_WAREHOUSE_SECRETS` blob (via `_resolve_credentials`), so no new secrets are needed and no inline credential-extraction logic is required in the workflow YAML.

Workflow changes:
- `test-warehouse.yml`: Athena now uses the Python script in a dedicated "Drop test schemas (Athena)" step; other cloud targets still use the dbt macro
- `cleanup-stale-schemas.yml`: Athena is removed from the dbt matrix and gets its own `cleanup-athena` job (skips dbt installation overhead entirely)

Script modes:
- `--ci-test-schemas BASE_NAME`: Expands the base schema to all xdist workers (`_gw0`–`_gw7`) + `_elementary` variants
- `--stale --prefixes dbt_`: Scans all Glue databases and drops those matching the CI naming convention older than `--max-age-hours`

Review & Testing Checklist for Human
- `batch_delete_table` only removes Glue catalog entries. Unlike the previous `adapter.drop_relation()` path, it does not clean up underlying S3 data for Iceberg tables. Verify this is acceptable or whether S3 cleanup should be added (could lead to growing storage costs over time).
- `_resolve_credentials` looks for `athena_region`, `athena_aws_access_key_id`, `athena_aws_secret_access_key` in the decoded secrets blob. Confirm these match the actual keys in `CI_WAREHOUSE_SECRETS`.
- `$SCHEMA_NAME` propagation: The schema name is exported via `$GITHUB_ENV` for the Athena step. If the "Write dbt profiles" step is skipped or the job is cancelled before it runs, `$SCHEMA_NAME` will be empty. The script will expand an empty string to harmless non-existent schema names, but verify this is acceptable.
- Run the `cleanup-stale-schemas` workflow from this branch, and verify the `cleanup-athena` job successfully lists and drops stale schemas. Also run a `test-warehouse` Athena job and confirm the "Drop test schemas (Athena)" step runs and reports sensible output.
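Based on the `--ci-test-schemas` mode described in the summary, the xdist expansion plausibly looks like the following (a hypothetical reconstruction under the assumption of `_gw{i}` and `_elementary` suffixes, not the script's actual code):

```python
def expand_ci_test_schemas(base: str, num_workers: int = 8):
    """Expand a base schema into per-worker and _elementary variants.

    Mirrors the PR description: the base schema, its _elementary variant,
    and _gw0.._gw{num_workers-1} workers each with their own _elementary
    variant. The exact naming convention is an assumption.
    """
    schemas = [base, f"{base}_elementary"]
    for i in range(num_workers):
        worker = f"{base}_gw{i}"
        schemas.append(worker)
        schemas.append(f"{worker}_elementary")
    return schemas
```

With the default of 8 workers this turns one base name into 18 candidate schemas, all of which can then be fed to the batch-deletion path; non-existent ones are harmless.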
Summary by CodeRabbit
Chores
New Features