fix: use py_<YYMMDD_HHMMSS>_<branch>_<hash> schema naming to prevent CI collisions #2126

Merged: haritamar merged 5 commits into master from devin/1772219716-fix-ci-schema-naming on Feb 28, 2026
@devin-ai-integration devin-ai-integration Bot commented Feb 27, 2026

fix: use py_<YYMMDD_HHMMSS>_<branch>_<hash> schema naming to prevent CI collisions

Summary

Replaces the old truncation-based CI schema naming (py_<warehouse>_dbt_<version>_<branch> truncated to 40 chars) with a hash-based approach that prevents cross-branch collisions when concurrent CI jobs target the same warehouse.

Old format: py_ + <warehouse>_dbt_<dbt_version>_<branch> truncated to 40 chars via head -c 40
New format: py_<YYMMDD_HHMMSS>_<branch≤19>_<8-char SHA-256 hash of concurrency group>
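A minimal sketch of why the old truncation collides, following the "py_ plus 40 characters" reading of the old format above. The function name old_schema_name, the warehouse and version strings, and both branch names are hypothetical illustrations, not values taken from the workflow.

```shell
# Sketch of the old naming: "py_" + first 40 chars of <warehouse>_dbt_<version>_<branch>.
# All inputs are hypothetical examples.
old_schema_name() {
  local warehouse="$1" dbt_version="$2" branch="$3"
  printf 'py_%s' "$(printf '%s_dbt_%s_%s' "$warehouse" "$dbt_version" "$branch" | head -c 40)"
}

# Two different branches that share a long common prefix.
A=$(old_schema_name postgres 1_8_0 feature_add_anomaly_detection_for_volume)
B=$(old_schema_name postgres 1_8_0 feature_add_anomaly_detection_for_freshness)

echo "$A"
echo "$B"
```

Both calls print the same name, because the part of the branch that distinguishes them falls after the 40-character cut.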

The hash is derived from the full concurrency group key (warehouse × dbt-version × branch), so jobs that can run concurrently get different schema names unless two keys collide on the 8-character hash prefix, which is unlikely. The timestamp (second-level precision, explicit UTC) gives each CI run its own schema and enables time-based cleanup of stale schemas.
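The naming scheme described above can be sketched roughly as follows. This is an illustration under assumptions, not the workflow's actual step: the input values and the exact layout of the CONCURRENCY_GROUP string are hypothetical, and sha256sum is assumed to be available (GNU coreutils, as on Linux runners).

```shell
# Hypothetical inputs; in CI these would come from the job matrix and the git ref.
WAREHOUSE_TYPE="postgres"
DBT_VERSION="1.8.0"
BRANCH_NAME="feature/v2--hotfix"

# Assumed key layout; the real concurrency group string may differ.
CONCURRENCY_GROUP="tests_${WAREHOUSE_TYPE}_dbt_${DBT_VERSION}_${BRANCH_NAME}"

# First 8 hex chars of SHA-256 of the key (printf avoids hashing a trailing newline).
SHORT_HASH=$(printf '%s' "$CONCURRENCY_GROUP" | sha256sum | head -c 8)

# Lowercase, replace non-alphanumerics, collapse repeated underscores, cap at 19 chars.
SAFE_BRANCH=$(echo "${BRANCH_NAME}" | awk '{print tolower($0)}' | sed "s/[^a-z0-9]/_/g; s/__*/_/g" | head -c 19)

# Second-level timestamp in explicit UTC.
DATE_STAMP=$(date -u +%y%m%d_%H%M%S)

SCHEMA_NAME="py_${DATE_STAMP}_${SAFE_BRANCH}_${SHORT_HASH}"
echo "Schema name: ${SCHEMA_NAME} (${#SCHEMA_NAME} chars)"
```

Because the hash covers the whole key rather than the truncated branch, two branches that sanitize to the same 19-character label still end up with different suffixes in all but rare hash-prefix collisions.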

This is a companion PR to dbt-data-reliability#940 which applies the same pattern with a dbt_ prefix.

Updates since last revision

  • Added HHMM to timestamp (YYMMDD_HHMM, 11 chars) for per-run uniqueness — motivated by Athena investigation showing stale data_monitoring_metrics from prior runs contaminating test results when schemas collide across runs on the same day
  • Added seconds (%S) to timestamp (YYMMDD_HHMMSS, 13 chars) per maintainer request for completeness
  • Branch truncated to 19 chars (down from 29) to accommodate the longer timestamp while staying within budget
  • Explicit UTC (date -u) for timestamp consistency across runners
  • Collapsed consecutive underscores in sanitized branch names (s/__*/_/g)

Budget (PostgreSQL 63-char limit):
py_(3) + timestamp(13) + _(1) + branch(≤19) + _(1) + hash(8) = 45, plus _elementary(11) + _gw7(4) = 60
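The budget arithmetic can be checked mechanically. The sketch below builds a worst-case name from placeholder characters (the zero-filled branch and hash are stand-ins, not real values) and measures it against the 63-character limit.

```shell
# Worst-case components under the stated caps, using placeholder characters.
STAMP="YYMMDD_HHMMSS"            # 13 chars
BRANCH=$(printf '%019d' 0)       # 19-char stand-in for the longest branch label
HASH=$(printf '%08d' 0)          # 8-char stand-in for the hash

BASE="py_${STAMP}_${BRANCH}_${HASH}"
FULL="${BASE}_elementary_gw7"    # schema postfix plus xdist worker suffix
MARGIN=$((63 - ${#FULL}))

echo "base=${#BASE} full=${#FULL} margin=${MARGIN}"
```

Note that pytest-xdist worker IDs with two digits (gw10 and up) add one more character, which still fits within the remaining margin.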

Review & Testing Checklist for Human

  • Verify 60-char budget math. py_(3) + YYMMDD_HHMMSS(13) + _(1) + branch(≤19) + _(1) + hash(8) = 45, plus _elementary(11) + _gw7(4) = 60. Confirm this holds for your longest xdist worker suffix and schema postfixes. The margin is only 3 chars — any longer suffixes will silently truncate on PostgreSQL.
  • Run a CI job on a branch and verify the schema name in the "Write dbt profiles" step output matches the expected py_YYMMDD_HHMMSS_branch_hash format (with seconds).
  • Old schemas are now orphaned. Existing schemas named py_<warehouse>_dbt_... will no longer be used or cleaned up by CI. Plan manual cleanup if needed.
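For the cleanup side, one possible way to decide whether a schema is stale is to read the timestamp back out of its name. This is only a sketch: the helper name is_stale and the sample schema names are hypothetical, and actually listing and dropping schemas is warehouse-specific and not shown.

```shell
# Succeeds (exit 0) if the schema's embedded UTC stamp is older than the cutoff stamp.
# Exits 2 for names that do not follow py_<YYMMDD_HHMMSS>_..., such as old-format schemas.
# YYMMDD_HHMMSS stamps sort chronologically as plain text (within one century).
is_stale() {
  local schema="$1" cutoff="$2" stamp
  stamp=$(printf '%s\n' "$schema" | sed -n 's/^py_\([0-9]\{6\}_[0-9]\{6\}\)_.*/\1/p')
  [ -n "$stamp" ] || return 2
  [ "$stamp" \< "$cutoff" ]
}

# Fixed cutoff for illustration. With GNU date it could be computed as:
#   date -u -d '7 days ago' +%y%m%d_%H%M%S
CUTOFF="260220_000000"

for name in py_260101_120000_old_branch_deadbeef py_260227_192000_fix_ci_0a1b2c3d; do
  if is_stale "$name" "$CUTOFF"; then
    echo "$name: stale"
  else
    echo "$name: keep"
  fi
done
```

Old-format names such as py_<warehouse>_dbt_... carry no timestamp, so a helper like this skips them, which is why they need the manual cleanup mentioned above.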

Notes

  • Second-level timestamp precision provides extra safety beyond the minute-level uniqueness already enforced by concurrency groups + cancel-in-progress: true.
  • Link to Devin run
  • Requested by: @haritamar

Summary by CodeRabbit

  • Chores
    • Updated warehouse testing workflow to use an explicit multi-part schema naming scheme.
    • New schema names include a UTC timestamp (YYMMDD_HHMMSS), a sanitized lowercased branch label (max 19 chars), and a short hash to reduce collisions.
    • Runs are grouped and serialized by warehouse type, tooling version, and branch to avoid conflicts.
    • Workflow now logs the generated schema identifier for visibility.
    • All other workflow behavior remains unchanged.

…isions

Replace the old truncation-based schema naming with a hash-based approach
that prevents cross-branch collisions when concurrent CI jobs share the
same warehouse. Uses py_ prefix to identify the Python package CI
(matching dbt_ prefix in dbt-data-reliability).

Format: py_<YYMMDD>_<branch≤29>_<8-char-hash>
The hash is derived from the concurrency group key.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Contributor

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

@coderabbitai

coderabbitai Bot commented Feb 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Replaces prior UNDERSCORED_REF_NAME logic in the CI workflow with an explicit SCHEMA_NAME composed of DATE_STAMP (YYMMDD_HHMMSS), SAFE_BRANCH (sanitized, lowercased, alphanumeric/underscore, length-capped), and SHORT_HASH (first 8 chars of SHA‑256 of CONCURRENCY_GROUP). Emits an echo and updates the --schema-name argument; introduces CONCURRENCY_GROUP for run serialization.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Workflow: schema naming (.github/workflows/test-warehouse.yml) | Reworks schema-name derivation: introduces CONCURRENCY_GROUP (warehouse type + dbt version + branch), computes SHORT_HASH as first 8 chars of SHA‑256(CONCURRENCY_GROUP), creates SAFE_BRANCH (lowercased, alphanumeric/underscore, max 19 chars), adds DATE_STAMP (YYMMDD_HHMMSS), and builds SCHEMA_NAME as py_<DATE_STAMP>_<SAFE_BRANCH>_<SHORT_HASH>. Replaces prior DBT_VERSION/UNDERSCORED_REF_NAME approach, updates --schema-name usage, echoes derived components, and clarifies concurrency-group comment about serialization by warehouse type, dbt version, and branch. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇
I dig a name with time and hash tonight,
A tidy burrow stamped in moonlit light,
Branches scrubbed and shortened, hashes neat,
Tests hop in order, no two burrows meet. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: replacing the schema naming approach from truncation-based to hash-and-timestamp-based format to prevent CI collisions. |


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
.github/workflows/test-warehouse.yml (1)

120-131: Schema naming logic is well-designed with correct budget calculation.

The implementation correctly:

  • Derives the hash from the same components as the concurrency group
  • Uses echo -n to avoid hashing a trailing newline
  • Applies lowercase before the character filter
  • Respects the 63-character PostgreSQL limit with the calculated budget

One optional enhancement: multiple consecutive special characters in branch names become multiple underscores (e.g., feature/v2--hotfix → feature_v2__hotfix). If consistent formatting is desired, you could collapse them:

Optional: collapse consecutive underscores
-          SAFE_BRANCH=$(echo "${BRANCH_NAME}" | awk '{print tolower($0)}' | sed "s/[^a-z0-9]/_/g" | head -c 29)
+          SAFE_BRANCH=$(echo "${BRANCH_NAME}" | awk '{print tolower($0)}' | sed "s/[^a-z0-9]/_/g; s/__*/_/g" | head -c 29)
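To illustrate the effect of the suggested change, here is a small sketch that wraps the pipeline from the "+" line in a function. The function name sanitize_branch and the sample branch names are hypothetical; the 29-character cap matches the revision under review at this point.

```shell
# Pipeline from the suggestion above, wrapped in a helper for experimentation.
sanitize_branch() {
  echo "$1" | awk '{print tolower($0)}' | sed "s/[^a-z0-9]/_/g; s/__*/_/g" | head -c 29
}

# Command substitution strips the trailing newline that short inputs carry through head.
FIRST=$(sanitize_branch "feature/v2--hotfix")
SECOND=$(sanitize_branch "Fix//CI__Schema--Naming")

echo "$FIRST"     # feature_v2_hotfix
echo "$SECOND"    # fix_ci_schema_naming
```

The second sed expression runs after the replacement, so runs of underscores produced by adjacent special characters, as well as underscores already present in the branch name, collapse to one.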
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test-warehouse.yml around lines 120 - 131, The SAFE_BRANCH
transformation currently lowercases and replaces non-alphanumerics with
underscores but leaves consecutive underscores; update the SAFE_BRANCH pipeline
(the command that produces SAFE_BRANCH used by CONCURRENCY_GROUP, SHORT_HASH and
SCHEMA_NAME) to collapse repeated underscores into a single underscore (e.g., by
adding a step such as running the output through a squeeze/dedup of underscores)
and then apply head -c 29 as before so the final SCHEMA_NAME still fits the
63-char PostgreSQL budget.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c65cd98 and 41c5055.

📒 Files selected for processing (1)
  • .github/workflows/test-warehouse.yml

devin-ai-integration Bot and others added 2 commits February 27, 2026 19:20
…tpick)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
.github/workflows/test-warehouse.yml (1)

120-134: Consider using explicit UTC for consistent timestamps across runners.

The schema naming logic is well-designed with accurate budget calculations. However, date +%y%m%d_%H%M uses the system timezone. While GitHub-hosted runners default to UTC, explicitly specifying UTC ensures consistency if runners are ever reconfigured or if self-hosted runners are used.

🔧 Suggested change for explicit UTC
-          DATE_STAMP=$(date +%y%m%d_%H%M)
+          DATE_STAMP=$(date -u +%y%m%d_%H%M)
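A quick way to see the difference is to run date under two runner timezones. The sketch below uses POSIX-style TZ strings (PST8 and JST-9) as example zones, chosen only because they work without a timezone database, and it uses the seconds-level format that was eventually merged rather than the %H%M format shown in the diff.

```shell
# With -u the TZ setting is ignored, so both runners describe the same UTC clock.
STAMP_A=$(TZ=PST8 date -u +%y%m%d_%H%M%S)
STAMP_B=$(TZ=JST-9 date -u +%y%m%d_%H%M%S)

# Without -u the same moment renders with different local hours (17 hours apart).
LOCAL_A=$(TZ=PST8 date +%H)
LOCAL_B=$(TZ=JST-9 date +%H)

echo "UTC stamps:  ${STAMP_A} / ${STAMP_B}"
echo "Local hours: ${LOCAL_A} / ${LOCAL_B}"
```

Setting TZ=UTC for the command would have the same effect as -u; the flag just keeps the intent visible at the call site.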
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test-warehouse.yml around lines 120 - 134, The DATE_STAMP
generation uses the local system timezone; change it to explicit UTC so schema
timestamps are consistent across runners by updating how DATE_STAMP is created
(the DATE_STAMP variable used to build SCHEMA_NAME). Replace the current date
invocation with an explicit UTC invocation (e.g., use date -u or set TZ=UTC for
the command) so DATE_STAMP and consequently SCHEMA_NAME are always based on UTC.

📥 Commits

Reviewing files that changed from the base of the PR and between 3f540f6 and 415b35b.

📒 Files selected for processing (1)
  • .github/workflows/test-warehouse.yml

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
coderabbitai[bot]

This comment was marked as resolved.

@devin-ai-integration devin-ai-integration Bot changed the title from "fix: use py_<yymmdd>_<branch>_<hash> schema naming to prevent CI collisions" to "fix: use py_<YYMMDD_HHMM>_<branch>_<hash> schema naming to prevent CI collisions" on Feb 27, 2026
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@devin-ai-integration devin-ai-integration Bot changed the title from "fix: use py_<YYMMDD_HHMM>_<branch>_<hash> schema naming to prevent CI collisions" to "fix: use py_<YYMMDD_HHMMSS>_<branch>_<hash> schema naming to prevent CI collisions" on Feb 27, 2026
@haritamar haritamar merged commit c0a9602 into master Feb 28, 2026
16 checks passed
@haritamar haritamar deleted the devin/1772219716-fix-ci-schema-naming branch February 28, 2026 08:44
Contributor Author

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.
