Conversation
…isions Replace the old truncation-based schema naming with a hash-based approach that prevents cross-branch collisions when concurrent CI jobs share the same warehouse. Uses py_ prefix to identify the Python package CI (matching dbt_ prefix in dbt-data-reliability). Format: py_<YYMMDD>_<branch≤29>_<8-char-hash> The hash is derived from the concurrency group key. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
👋 @devin-ai-integration[bot] |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces prior UNDERSCORED_REF_NAME logic in the CI workflow with an explicit SCHEMA_NAME composed of DATE_STAMP (YYMMDD_HHMMSS), SAFE_BRANCH (sanitized, lowercased, alphanumeric/underscore, length-capped), and SHORT_HASH (first 8 chars of SHA‑256 of CONCURRENCY_GROUP). Emits an echo and updates the Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/workflows/test-warehouse.yml (1)
120-131: Schema naming logic is well-designed with correct budget calculation.The implementation correctly:
- Derives the hash from the same components as the concurrency group
- Uses
echo -nto avoid hashing a trailing newline- Applies lowercase before the character filter
- Respects the 63-character PostgreSQL limit with the calculated budget
One optional enhancement: multiple consecutive special characters in branch names become multiple underscores (e.g.,
feature/v2--hotfix→feature_v2__hotfix). If consistent formatting is desired, you could collapse them:Optional: collapse consecutive underscores
- SAFE_BRANCH=$(echo "${BRANCH_NAME}" | awk '{print tolower($0)}' | sed "s/[^a-z0-9]/_/g" | head -c 29) + SAFE_BRANCH=$(echo "${BRANCH_NAME}" | awk '{print tolower($0)}' | sed "s/[^a-z0-9]/_/g; s/__*/_/g" | head -c 29)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 120 - 131, The SAFE_BRANCH transformation currently lowercases and replaces non-alphanumerics with underscores but leaves consecutive underscores; update the SAFE_BRANCH pipeline (the command that produces SAFE_BRANCH used by CONCURRENCY_GROUP, SHORT_HASH and SCHEMA_NAME) to collapse repeated underscores into a single underscore (e.g., by adding a step such as running the output through a squeeze/dedup of underscores) and then apply head -c 29 as before so the final SCHEMA_NAME still fits the 63-char PostgreSQL budget.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.github/workflows/test-warehouse.yml:
- Around line 120-131: The SAFE_BRANCH transformation currently lowercases and
replaces non-alphanumerics with underscores but leaves consecutive underscores;
update the SAFE_BRANCH pipeline (the command that produces SAFE_BRANCH used by
CONCURRENCY_GROUP, SHORT_HASH and SCHEMA_NAME) to collapse repeated underscores
into a single underscore (e.g., by adding a step such as running the output
through a squeeze/dedup of underscores) and then apply head -c 29 as before so
the final SCHEMA_NAME still fits the 63-char PostgreSQL budget.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
.github/workflows/test-warehouse.yml
…tpick) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/workflows/test-warehouse.yml (1)
120-134: Consider using explicit UTC for consistent timestamps across runners.The schema naming logic is well-designed with accurate budget calculations. However,
date +%y%m%d_%H%Muses the system timezone. While GitHub-hosted runners default to UTC, explicitly specifying UTC ensures consistency if runners are ever reconfigured or if self-hosted runners are used.🔧 Suggested change for explicit UTC
- DATE_STAMP=$(date +%y%m%d_%H%M) + DATE_STAMP=$(date -u +%y%m%d_%H%M)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 120 - 134, The DATE_STAMP generation uses the local system timezone; change it to explicit UTC so schema timestamps are consistent across runners by updating how DATE_STAMP is created (the DATE_STAMP variable used to build SCHEMA_NAME). Replace the current date invocation with an explicit UTC invocation (e.g., use date -u or set TZ=UTC for the command) so DATE_STAMP and consequently SCHEMA_NAME are always based on UTC.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.github/workflows/test-warehouse.yml:
- Around line 120-134: The DATE_STAMP generation uses the local system timezone;
change it to explicit UTC so schema timestamps are consistent across runners by
updating how DATE_STAMP is created (the DATE_STAMP variable used to build
SCHEMA_NAME). Replace the current date invocation with an explicit UTC
invocation (e.g., use date -u or set TZ=UTC for the command) so DATE_STAMP and
consequently SCHEMA_NAME are always based on UTC.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
.github/workflows/test-warehouse.yml
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
fix: use
py_<YYMMDD_HHMMSS>_<branch>_<hash>schema naming to prevent CI collisionsSummary
Replaces the old truncation-based CI schema naming (
py_<warehouse>_dbt_<version>_<branch>truncated to 40 chars) with a hash-based approach that prevents cross-branch collisions when concurrent CI jobs target the same warehouse.Old format:
py_+<warehouse>_dbt_<dbt_version>_<branch>truncated to 40 chars viahead -c 40New format:
py_<YYMMDD_HHMMSS>_<branch≤19>_<8-char SHA-256 hash of concurrency group>The hash is derived from the full concurrency group key (
warehouse × dbt-version × branch), so jobs that can run concurrently always get different schema names. The timestamp (second-level, explicit UTC) ensures each CI run gets a unique schema and enables time-based cleanup of stale schemas.This is a companion PR to dbt-data-reliability#940 which applies the same pattern with a
dbt_prefix.Updates since last revision
YYMMDD_HHMM, 11 chars) for per-run uniqueness — motivated by Athena investigation showing staledata_monitoring_metricsfrom prior runs contaminating test results when schemas collide across runs on the same day%S) to timestamp (YYMMDD_HHMMSS, 13 chars) per maintainer request for completenessdate -u) for timestamp consistency across runnerss/__*/_/g)Budget (PostgreSQL 63-char limit):
py_(3) + timestamp(13) +_(1) + branch(≤19) +_(1) + hash(8) = 45, plus_elementary(11) +_gw7(4) = 60Review & Testing Checklist for Human
py_(3) +YYMMDD_HHMMSS(13) +_(1) + branch(≤19) +_(1) + hash(8) = 45, plus_elementary(11) +_gw7(4) = 60. Confirm this holds for your longest xdist worker suffix and schema postfixes. The margin is only 3 chars — any longer suffixes will silently truncate on PostgreSQL.py_YYMMDD_HHMMSS_branch_hashformat (with seconds).py_<warehouse>_dbt_...will no longer be used or cleaned up by CI. Plan manual cleanup if needed.Notes
cancel-in-progress: true.Summary by CodeRabbit