Skip to content

Commit 82a62ee

Browse files
Fix #359: configuration-aware cquery hashing (#363)
Under --useCquery, BazelRule.ruleInputList() previously folded each ConfiguredRuleInput to its bare label via `.map { it.label }`, dropping `configurationChecksum`. Two configured graphs that shared dep labels but differed in per-edge configuration -- a `cfg = "exec"` transition flip, a --platforms swap, a --config=... change, or a `--define` toggling a select() inside a dep -- collapsed to identical ruleInputList() output, so RuleHasher's transitive walk produced identical digests and bazel-diff returned an empty impacted set under cquery mode. This change folds each dep edge's `configurationChecksum` into the rule-input string using `|` -- a character that is invalid in a Bazel label, so the encoding can never collide with a real input and survives the existing `List<String>` return type. The full encoded string is what gets mixed into RuleHasher's transitive hash, so two configured graphs differing only by per-edge configuration now produce distinct parent digests. RuleHasher splits on the separator to recover the bare label for `allRulesMap` / `sourceDigests` lookups and dep tracking, so the user-visible deps in the generated JSON stay clean. The empty-checksum fallback (a known Bazel quirk where cquery's `transitions=lite` output sometimes omits the checksum for an edge) returns the bare label unchanged, preserving pre-#359 behaviour for that path. This is a hash-format change: hashes generated by this version are not comparable to hashes from earlier versions under --useCquery, which is the intended behaviour -- the old hashes were missing the configuration signal. Tests: * Unignore the PR #361 reproducer (`testCqueryRuleInputListDistinguishesConfigurationChecksum_issue359`) and drop the two `_currentBehaviourBuggy_*` companions. * Add `testCqueryRuleInputListEncodesLabelAndChecksum_issue359` to pin the on-the-wire encoding. * Add `testCqueryRuleInputListEmptyChecksumFallsBackToBareLabel_issue359` to cover the Bazel-quirk fallback. * Add `testNonCqueryRuleInputListIgnoresConfiguredRuleInput_issue359` to confirm the encoding does not leak into the non-cquery path. * Add `testDecodeConfiguredRuleInputLabelRoundTrip_issue359` to pin the round-trip behaviour RuleHasher relies on. This is the minimal symptom-fixing slice of #359 -- the parent target's hash now reflects per-edge configuration changes, so the empty-impacted set under cquery is gone. Full target-determinator parity (per-(label, configuration) digest entries, `bazel config --output=json` enumeration, empty-checksum inheritance from the depending target) is intentionally out of scope here; those would change the on-disk JSON schema and warrant a separate design discussion on the issue. Refs #359, #361 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 117a3b1 commit 82a62ee

3 files changed

Lines changed: 115 additions & 51 deletions

File tree

cli/src/main/kotlin/com/bazel_diff/bazel/BazelRule.kt

Lines changed: 28 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,33 @@ import com.google.devtools.build.lib.query2.proto.proto2api.Build
1111
// https://github.com/bazelbuild/bazel/blob/6971b016f1e258e3bb567a0f9fe7a88ad565d8f2/src/main/java/com/google/devtools/build/lib/query2/query/output/SyntheticAttributeHashCalculator.java#L78-L81
1212
private val DEFAULT_IGNORED_ATTRS = arrayOf("generator_location")
1313

14+
// Separator used to fold a cquery dep-edge's `configurationChecksum` into a rule-input string.
15+
// Bazel target/package names are restricted to `[A-Za-z0-9/._+=,@~-]*` plus `:` between package
16+
// and target, so `|` is invalid in a label and is a safe in-band encoding that survives the
17+
// existing `List<String>` return type of `ruleInputList()`. `RuleHasher` splits on this character
18+
// to recover the bare label for `allRulesMap` / `sourceDigests` lookups and dep tracking; the
19+
// full encoded string is what gets mixed into the transitive hash, so two configured graphs that
20+
// share dep labels but differ in per-edge configuration produce distinct rule digests. See #359.
21+
const val CONFIGURED_RULE_INPUT_SEPARATOR: Char = '|'
22+
23+
fun encodeConfiguredRuleInput(input: Build.ConfiguredRuleInput): String {
24+
// Fall back to a bare label when the checksum is empty -- this matches the pre-#359 behaviour
25+
// for the known Bazel quirk where cquery's `transitions=lite` output sometimes omits the
26+
// configurationChecksum for an edge, and avoids appending a dangling separator.
27+
val checksum = input.configurationChecksum
28+
return if (checksum.isNullOrEmpty()) {
29+
input.label
30+
} else {
31+
"${input.label}${CONFIGURED_RULE_INPUT_SEPARATOR}${checksum}"
32+
}
33+
}
34+
35+
fun decodeConfiguredRuleInputLabel(encoded: String): String {
36+
// No separator present -> the input is a bare label (non-cquery callers, empty-checksum
37+
// fallback, or //external:* synthetic inputs), so this is a no-op.
38+
return encoded.substringBefore(CONFIGURED_RULE_INPUT_SEPARATOR)
39+
}
40+
1441
class BazelRule(private val rule: Build.Rule) {
1542
fun digest(ignoredAttrs: Set<String>): ByteArray {
1643
return sha256 {
@@ -27,7 +54,7 @@ class BazelRule(private val rule: Build.Rule) {
2754

2855
fun ruleInputList(useCquery: Boolean, fineGrainedHashExternalRepos: Set<String>): List<String> {
2956
return if (useCquery) {
30-
rule.configuredRuleInputList.map { it.label } +
57+
rule.configuredRuleInputList.map { encodeConfiguredRuleInput(it) } +
3158
rule.ruleInputList
3259
.map { ruleInput: String ->
3360
transformRuleInput(fineGrainedHashExternalRepos, ruleInput)

cli/src/main/kotlin/com/bazel_diff/hash/RuleHasher.kt

Lines changed: 15 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@ package com.bazel_diff.hash
22

33
import com.bazel_diff.bazel.BazelRule
44
import com.bazel_diff.bazel.BazelSourceFileTarget
5+
import com.bazel_diff.bazel.decodeConfiguredRuleInputLabel
56
import com.bazel_diff.log.Logger
67
import com.google.common.annotations.VisibleForTesting
78
import java.nio.file.Path
@@ -56,12 +57,18 @@ class RuleHasher(
5657
putDirectBytes(seedHash)
5758

5859
for (ruleInput in rule.ruleInputList(useCquery, fineGrainedHashExternalRepos)) {
60+
// Under --useCquery, `ruleInput` may carry an embedded configurationChecksum (see
61+
// BazelRule.CONFIGURED_RULE_INPUT_SEPARATOR / #359). The full encoded string is what
62+
// we mix into the hash so two configured graphs differing only by per-edge
63+
// configuration produce distinct digests; the bare label is what we use to look up
64+
// the input in `allRulesMap` / `sourceDigests` and to track in `deps`.
5965
putDirectBytes(ruleInput.toByteArray())
66+
val inputLabel = decodeConfiguredRuleInputLabel(ruleInput)
6067

61-
val inputRule = allRulesMap[ruleInput]
68+
val inputRule = allRulesMap[inputLabel]
6269
when {
63-
inputRule == null && sourceDigests.containsKey(ruleInput) -> {
64-
putDirectBytes(sourceDigests[ruleInput])
70+
inputRule == null && sourceDigests.containsKey(inputLabel) -> {
71+
putDirectBytes(sourceDigests[inputLabel])
6572
}
6673
inputRule?.name != null && inputRule.name != rule.name -> {
6774
val ruleInputHash =
@@ -74,23 +81,23 @@ class RuleHasher(
7481
depPathClone,
7582
ignoredAttrs,
7683
modifiedFilepaths)
77-
putTransitiveBytes(ruleInput, ruleInputHash.overallDigest)
84+
putTransitiveBytes(inputLabel, ruleInputHash.overallDigest)
7885
}
7986
else -> {
8087
val heuristicDigest =
8188
sourceFileHasher.softDigest(
82-
BazelSourceFileTarget(ruleInput, ByteArray(0)), modifiedFilepaths)
89+
BazelSourceFileTarget(inputLabel, ByteArray(0)), modifiedFilepaths)
8390
when {
8491
heuristicDigest != null -> {
8592
logger.i {
86-
"Source file $ruleInput picked up as an input for rule ${rule.name}"
93+
"Source file $inputLabel picked up as an input for rule ${rule.name}"
8794
}
88-
sourceDigests[ruleInput] = heuristicDigest
95+
sourceDigests[inputLabel] = heuristicDigest
8996
putDirectBytes(heuristicDigest)
9097
}
9198
else ->
9299
logger.w {
93-
"Unable to calculate digest for input $ruleInput for rule ${rule.name}"
100+
"Unable to calculate digest for input $inputLabel for rule ${rule.name}"
94101
}
95102
}
96103
}

cli/src/test/kotlin/com/bazel_diff/bazel/BazelRuleTest.kt

Lines changed: 72 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,6 @@ import assertk.assertions.isNotEqualTo
66
import com.google.devtools.build.lib.query2.proto.proto2api.Build
77
import com.google.devtools.build.lib.query2.proto.proto2api.Build.Attribute
88
import com.google.devtools.build.lib.query2.proto.proto2api.Build.Rule
9-
import org.junit.Ignore
109
import org.junit.Test
1110

1211
class BazelRuleTest {
@@ -51,26 +50,18 @@ class BazelRuleTest {
5150
.isEqualTo(BazelRule(rule2Pb).digest(emptySet()))
5251
}
5352

54-
// Reproducer for https://github.com/Tinder/bazel-diff/issues/359
53+
// Fix for https://github.com/Tinder/bazel-diff/issues/359
5554
//
56-
// Under --useCquery, BazelRule.ruleInputList() reads each entry from configured_rule_input but
57-
// only takes `.label`, discarding `.configurationChecksum`. Two configured graphs that share
58-
// dep labels but differ in per-edge configuration -- a `cfg = "exec"` transition flip, a
59-
// --platforms swap, a --config=... change, or a `--define` toggling a select() inside a dep --
60-
// produce identical ruleInputList() output, so RuleHasher's transitive walk visits the same
61-
// labels and yields identical digests. The user-visible symptom is that bazel-diff under
62-
// cquery returns an empty impacted-targets set despite a real change in the configured graph,
63-
// diverging from bazel-contrib/target-determinator which keys hashes by
64-
// (label, configurationChecksum) and mixes the dep-edge configuration into the rule digest.
65-
//
66-
// @Ignore on the assertion below pins the current (buggy) behaviour as a reproducer: the
67-
// companion `_currentBehaviourBuggy` test asserts the lists are equal, so the suite stays
68-
// green; the `@Ignore`d test documents the desired post-fix invariant and can be flipped on
69-
// (or merged with `_currentBehaviourBuggy`) when the fix lands.
55+
// Under --useCquery, BazelRule.ruleInputList() now folds each ConfiguredRuleInput's
56+
// configurationChecksum into the rule-input string via CONFIGURED_RULE_INPUT_SEPARATOR (`|`).
57+
// Two configured graphs that share dep labels but differ in per-edge configuration -- a
58+
// `cfg = "exec"` transition flip, a --platforms swap, a --config=... change, or a `--define`
59+
// toggling a select() inside a dep -- produce distinct ruleInputList() output, so RuleHasher's
60+
// transitive walk mixes those bytes into the digest and the parent target's hash changes.
61+
// RuleHasher decodes back to the bare label when looking the input up in `allRulesMap` /
62+
// `sourceDigests` and when tracking deps, so the user-visible deps output stays clean.
7063
@Test
71-
@Ignore(
72-
"Reproducer for #359 -- expected to pass once cquery rule inputs are keyed by (label, configurationChecksum). Today the lists collapse and this assertion fails. The companion *_currentBehaviourBuggy_issue359 tests pin down the current behaviour so CI stays green; flipping this @Ignore off should be the second-to-last step of the fix PR.")
73-
fun testCqueryRuleInputListDistinguishesConfigurationChecksum_reproducerForIssue359() {
64+
fun testCqueryRuleInputListDistinguishesConfigurationChecksum_issue359() {
7465
val ruleA = configuredGenrule(depLabel = "//:dep", configurationChecksum = "cfg-checksum-A")
7566
val ruleB = configuredGenrule(depLabel = "//:dep", configurationChecksum = "cfg-checksum-B")
7667

@@ -79,39 +70,78 @@ class BazelRuleTest {
7970
val inputsB =
8071
BazelRule(ruleB).ruleInputList(useCquery = true, fineGrainedHashExternalRepos = emptySet())
8172

82-
// Today: both `[//:dep]` -- the test fails. After the fix it should pass.
8373
assertThat(inputsA).isNotEqualTo(inputsB)
8474
}
8575

86-
// Companion test that locks in the *current* buggy behaviour so we have a regression signal:
87-
// if this test ever starts failing, it means the collapse has been (partially) fixed and the
88-
// `@Ignore`d assertion above should be revisited. This keeps the reproducer visible in source
89-
// without breaking CI.
76+
// Pins the on-the-wire encoding so the format is observable and future changes are intentional.
77+
// The `|` separator is invalid in a Bazel label, so the encoded string cannot collide with any
78+
// real label and can be safely round-tripped by RuleHasher's `decodeConfiguredRuleInputLabel`.
9079
@Test
91-
fun testCqueryRuleInputListCollapsesConfigurationChecksum_currentBehaviourBuggy_issue359() {
92-
val ruleA = configuredGenrule(depLabel = "//:dep", configurationChecksum = "cfg-checksum-A")
93-
val ruleB = configuredGenrule(depLabel = "//:dep", configurationChecksum = "cfg-checksum-B")
80+
fun testCqueryRuleInputListEncodesLabelAndChecksum_issue359() {
81+
val rule = configuredGenrule(depLabel = "//:dep", configurationChecksum = "cfg-checksum-A")
9482

95-
val inputsA =
96-
BazelRule(ruleA).ruleInputList(useCquery = true, fineGrainedHashExternalRepos = emptySet())
97-
val inputsB =
98-
BazelRule(ruleB).ruleInputList(useCquery = true, fineGrainedHashExternalRepos = emptySet())
83+
val inputs =
84+
BazelRule(rule).ruleInputList(useCquery = true, fineGrainedHashExternalRepos = emptySet())
9985

100-
// Current (buggy) behaviour: both lists are exactly `[//:dep]`.
101-
assertThat(inputsA).isEqualTo(listOf("//:dep"))
102-
assertThat(inputsB).isEqualTo(listOf("//:dep"))
103-
assertThat(inputsA).isEqualTo(inputsB)
86+
assertThat(inputs).isEqualTo(listOf("//:dep|cfg-checksum-A"))
10487
}
10588

106-
// Also pins the buggy behaviour at the digest level: two BazelRules that differ ONLY in the
107-
// dep-edge configuration checksum currently produce identical `digest(...)` output, because
108-
// configured_rule_input is not part of BazelRule.digest() at all (see BazelRule.digest()).
89+
// The empty-checksum fallback (a known Bazel cquery quirk) must NOT append a dangling separator
90+
// -- we drop back to a bare label so RuleHasher's lookup keeps working unchanged.
10991
@Test
110-
fun testBazelRuleDigestIgnoresConfigurationChecksum_currentBehaviourBuggy_issue359() {
111-
val ruleA = configuredGenrule(depLabel = "//:dep", configurationChecksum = "cfg-checksum-A")
112-
val ruleB = configuredGenrule(depLabel = "//:dep", configurationChecksum = "cfg-checksum-B")
92+
fun testCqueryRuleInputListEmptyChecksumFallsBackToBareLabel_issue359() {
93+
val rule = configuredGenrule(depLabel = "//:dep", configurationChecksum = "")
94+
95+
val inputs =
96+
BazelRule(rule).ruleInputList(useCquery = true, fineGrainedHashExternalRepos = emptySet())
97+
98+
assertThat(inputs).isEqualTo(listOf("//:dep"))
99+
}
100+
101+
// Without --useCquery, configured_rule_input is irrelevant: the legacy ruleInputList field is
102+
// the source of truth, so the encoding must not leak into the non-cquery path.
103+
@Test
104+
fun testNonCqueryRuleInputListIgnoresConfiguredRuleInput_issue359() {
105+
val rule =
106+
Rule.newBuilder()
107+
.setRuleClass("genrule")
108+
.setName("//:gen")
109+
.addRuleInput("//:legacy_dep")
110+
.addConfiguredRuleInput(
111+
Build.ConfiguredRuleInput.newBuilder()
112+
.setLabel("//:dep")
113+
.setConfigurationChecksum("cfg-checksum-A")
114+
.build())
115+
.build()
116+
117+
val inputs =
118+
BazelRule(rule).ruleInputList(useCquery = false, fineGrainedHashExternalRepos = emptySet())
119+
120+
assertThat(inputs).isEqualTo(listOf("//:legacy_dep"))
121+
}
122+
123+
// Pins the round-trip behaviour `RuleHasher` relies on: the full encoded string lives in the
124+
// hash, and the bare label is what gets looked up in `allRulesMap` / `sourceDigests` and
125+
// tracked in `deps`. If the round-trip ever drifts the user-facing JSON would start emitting
126+
// dep labels with `|<checksum>` suffixes.
127+
@Test
128+
fun testDecodeConfiguredRuleInputLabelRoundTrip_issue359() {
129+
val withChecksum =
130+
encodeConfiguredRuleInput(
131+
Build.ConfiguredRuleInput.newBuilder()
132+
.setLabel("//:dep")
133+
.setConfigurationChecksum("cfg-A")
134+
.build())
135+
assertThat(decodeConfiguredRuleInputLabel(withChecksum)).isEqualTo("//:dep")
136+
137+
val withoutChecksum =
138+
encodeConfiguredRuleInput(Build.ConfiguredRuleInput.newBuilder().setLabel("//:dep").build())
139+
assertThat(decodeConfiguredRuleInputLabel(withoutChecksum)).isEqualTo("//:dep")
113140

114-
assertThat(BazelRule(ruleA).digest(emptySet())).isEqualTo(BazelRule(ruleB).digest(emptySet()))
141+
// Non-encoded labels (the non-cquery path or //external:* synthetic inputs) must round-trip
142+
// unchanged so the decode is safe to call unconditionally in RuleHasher.
143+
assertThat(decodeConfiguredRuleInputLabel("//:bare")).isEqualTo("//:bare")
144+
assertThat(decodeConfiguredRuleInputLabel("//external:foo")).isEqualTo("//external:foo")
115145
}
116146

117147
private fun configuredGenrule(depLabel: String, configurationChecksum: String): Rule {

0 commit comments

Comments
 (0)