Skip to content

Commit 19d9aa8

Browse files
jamesadevineCopilot
andcommitted
refactor(secrets): consolidate vso sanitization onto canonical helper (#624)
Round-5 PR review cleanup. No bugs flagged — three maintainability suggestions, all addressed: - **F1: reuse `crate::sanitize::neutralize_pipeline_commands`.** The codebase already has a canonical pipeline-command neutraliser at `src/sanitize.rs:145` (used by the `SanitizeContent` / `SanitizeConfig` pipelines for safe outputs and front-matter sanitisation). The round-4 fix introduced two private `sanitize_for_vso_logging` helpers — one in `src/compile/extensions/ado_aw_marker.rs`, one in `src/ado/discovery.rs` — that hard-coded the `[vso-filtered][` / `[filtered][` form copied from `agent_stats::sanitize_for_markdown`. Both private copies removed in favour of the canonical helper. The canonical helper uses backtick-wrapping (`` `##vso[` `` / `` `##[` ``) which equally defeats the ADO agent's stdout scanner; the threat model is unchanged. Tests in both files updated to assert the canonical output. The two now-redundant unit tests for the local helpers are removed; their behavioural coverage already lives in `src/sanitize.rs:570-605`. - **F2: two-pass classify-then-filter in `resolve_definitions_via_discovery`.** The previous shape mutated four counter variables as a side-effect inside a `.filter()` closure that *also* decided inclusion — the local `kept` variable then held items that would be dropped later by `discovered_to_matched`, misleading any reader. Rewritten as an explicit `for` loop that classifies + counts in pass 1, emits warnings + converts to `MatchedDefinition` in pass 2. Renamed `kept` to `selected` to match what the variable actually holds. - **F3: comment on the non-`.md` fallthrough in `is_direct_match`.** Explains that the unreachable-today branch is a forward-compat measure for hypothetical `.yaml` / `.json` agent sources and that the conservative behaviour (treat as `Consumer`, not `Direct`) keeps write commands acting on the definition while labelling it honestly in the summary. Net test delta: −2 (the two duplicate sanitizer unit tests). 1751 tests pass; clippy clean on touched files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 4771118 commit 19d9aa8

2 files changed

Lines changed: 95 additions & 107 deletions

File tree

src/ado/discovery.rs

Lines changed: 71 additions & 72 deletions
Original file line numberDiff line numberDiff line change
@@ -490,6 +490,15 @@ fn is_direct_match(def: &DefinitionSummary, markers: &[MarkerMetadata]) -> bool
490490
// Map e.g. `agents/foo.md` → `agents/foo.lock.yml` and compare to
491491
// the definition's root YAML. Convention: `<stem>.md` compiles to
492492
// `<stem>.lock.yml`.
493+
//
494+
// Non-`.md` sources are treated conservatively as `Consumer`: this
495+
// branch is unreachable today (the compiler always emits `.md`
496+
// source paths) but stays defensive against future extensions that
497+
// allow `.yaml` / `.json` / etc. agent sources. Returning `false`
498+
// here means the definition will be classified as `Consumer`
499+
// rather than `Direct`, which is the safe default — a write
500+
// command still acts on it, just labelled differently in the
501+
// summary.
493502
let Some(stem) = marker
494503
.source
495504
.strip_suffix(".md")
@@ -552,12 +561,14 @@ pub fn discovered_to_matched(d: &DiscoveredPipeline) -> Option<MatchedDefinition
552561
// Join every marker's source path so consumers that include
553562
// multiple templates show up honestly in the CLI summary instead
554563
// of silently truncating to whichever marker happened to be
555-
// first. Also apply `sanitize_for_vso_logging` here: the
556-
// `yaml_path` ends up in `print_matched_summary` (which writes to
557-
// stdout), and if `ado-aw secrets set --all-repos` is ever invoked
558-
// from inside an ADO pipeline step, an attacker-controlled marker
559-
// source path containing `##vso[` would otherwise be processed by
560-
// the agent's logging-command scanner.
564+
// first. Also apply the canonical pipeline-command neutraliser:
565+
// the `yaml_path` ends up in `print_matched_summary` (which writes
566+
// to stdout), and if `ado-aw secrets set --all-repos` is ever
567+
// invoked from inside an ADO pipeline step, an attacker-controlled
568+
// marker source path containing `##vso[` would otherwise be
569+
// processed by the agent's logging-command scanner. Reusing the
570+
// shared helper keeps this in sync with every other sanitisation
571+
// surface (front matter, safe outputs, agent stats).
561572
let yaml_path = if d.markers.is_empty() {
562573
String::new()
563574
} else {
@@ -567,7 +578,7 @@ pub fn discovered_to_matched(d: &DiscoveredPipeline) -> Option<MatchedDefinition
567578
.map(|m| m.source.as_str())
568579
.collect::<Vec<_>>()
569580
.join(", ");
570-
sanitize_for_vso_logging(&joined)
581+
crate::sanitize::neutralize_pipeline_commands(&joined)
571582
};
572583

573584
Some(MatchedDefinition {
@@ -579,17 +590,6 @@ pub fn discovered_to_matched(d: &DiscoveredPipeline) -> Option<MatchedDefinition
579590
})
580591
}
581592

582-
/// Neutralise ADO build-agent logging-command prefixes (`##vso[`,
583-
/// `##[`). Mirrors `crate::compile::extensions::ado_aw_marker` and
584-
/// `crate::agent_stats::sanitize_for_markdown` so that data flowing
585-
/// from a Preview-discovered marker through the CLI's own stdout
586-
/// cannot smuggle a `task.setvariable` (or similar) when the CLI is
587-
/// invoked from inside an ADO pipeline step.
588-
fn sanitize_for_vso_logging(s: &str) -> String {
589-
s.replace("##vso[", "[vso-filtered][")
590-
.replace("##[", "[filtered][")
591-
}
592-
593593
/// CLI-facing wrapper: run Preview-driven discovery with the given
594594
/// scope, optionally filter to consumers of a single template source,
595595
/// and return the result as `Vec<MatchedDefinition>`.
@@ -624,11 +624,6 @@ pub async fn resolve_definitions_via_discovery(
624624
) -> Result<Vec<MatchedDefinition>> {
625625
let discovered = discover_ado_aw_pipelines(client, ctx, auth, scope, local_lock_paths).await?;
626626

627-
let mut skipped_required_params = 0usize;
628-
let mut skipped_forbidden = 0usize;
629-
let mut skipped_failed = 0usize;
630-
let mut uninspectable = 0usize;
631-
632627
// Normalize the user-supplied `--source` value through the same
633628
// canonical form the compiler uses for the marker JSON's `source`
634629
// field. Without this, `--source ./agents/foo.md` or
@@ -637,40 +632,53 @@ pub async fn resolve_definitions_via_discovery(
637632
let normalized_filter: Option<String> = source_filter
638633
.map(|s| crate::compile::normalize_source_path(Path::new(s)));
639634

640-
let kept: Vec<_> = discovered
641-
.into_iter()
642-
.filter(|d| {
643-
let matches_filter = match normalized_filter.as_deref() {
644-
Some(src) => d.markers.iter().any(|m| m.source == src),
645-
None => true,
646-
};
635+
// Pass 1: classify each discovered definition into "keep / skip
636+
// silently / skip with reason". The previous shape stuffed all of
637+
// this into a side-effecting `.filter()` closure that mutated
638+
// counters while deciding inclusion — explicit two-pass form keeps
639+
// the counts honestly derived from the same iteration and makes it
640+
// obvious what ends up in the returned vec.
641+
let mut skipped_required_params = 0usize;
642+
let mut skipped_forbidden = 0usize;
643+
let mut skipped_failed = 0usize;
644+
let mut uninspectable = 0usize;
645+
let mut selected: Vec<DiscoveredPipeline> = Vec::with_capacity(discovered.len());
646+
647+
for d in discovered {
648+
let matches_filter = match normalized_filter.as_deref() {
649+
Some(src) => d.markers.iter().any(|m| m.source == src),
650+
None => true,
651+
};
647652

648-
// Count toward skip totals only when the failure is
649-
// relevant to the requested operation:
650-
// - unfiltered: every failure is relevant (we're operating
651-
// on every ado-aw pipeline in scope);
652-
// - filtered: we can't attribute uninspectable definitions
653-
// to a specific source, so use a single combined counter.
654-
if normalized_filter.is_none() {
655-
match &d.status {
656-
DiscoveryStatus::UnknownRequiredParams => skipped_required_params += 1,
657-
DiscoveryStatus::UnknownForbidden => skipped_forbidden += 1,
658-
DiscoveryStatus::PreviewFailed(_) => skipped_failed += 1,
659-
_ => {}
660-
}
661-
} else if matches!(
662-
d.status,
653+
// Count toward skip totals only when the failure is relevant
654+
// to the requested operation:
655+
// - unfiltered: every failure is relevant (we're operating
656+
// on every ado-aw pipeline in scope);
657+
// - filtered: we can't attribute uninspectable definitions
658+
// to a specific source, so use a single combined counter.
659+
match (&d.status, normalized_filter.is_some()) {
660+
(DiscoveryStatus::UnknownRequiredParams, false) => skipped_required_params += 1,
661+
(DiscoveryStatus::UnknownForbidden, false) => skipped_forbidden += 1,
662+
(DiscoveryStatus::PreviewFailed(_), false) => skipped_failed += 1,
663+
(
663664
DiscoveryStatus::UnknownRequiredParams
664-
| DiscoveryStatus::UnknownForbidden
665-
| DiscoveryStatus::PreviewFailed(_)
666-
) {
667-
uninspectable += 1;
668-
}
665+
| DiscoveryStatus::UnknownForbidden
666+
| DiscoveryStatus::PreviewFailed(_),
667+
true,
668+
) => uninspectable += 1,
669+
_ => {}
670+
}
669671

670-
matches_filter
671-
})
672-
.collect();
672+
if matches_filter {
673+
selected.push(d);
674+
}
675+
}
673676

677+
// Pass 2: emit warnings and convert the selected items into
678+
// `MatchedDefinition`. `discovered_to_matched` further filters out
679+
// non-actionable statuses (NotAdoAw / NotFound / UnknownForbidden /
680+
// UnknownRequiredParams / PreviewFailed); the count warnings above
681+
// are what tells the operator why those drops happened.
674682
if skipped_required_params > 0 {
675683
warn!(
676684
"Discovery skipped {skipped_required_params} definition(s) whose Pipeline Preview \
@@ -700,7 +708,7 @@ pub async fn resolve_definitions_via_discovery(
700708
);
701709
}
702710

703-
Ok(kept.iter().filter_map(discovered_to_matched).collect())
711+
Ok(selected.iter().filter_map(discovered_to_matched).collect())
704712
}
705713

706714
// AdoContext helper: derive the resolved git remote URL for
@@ -1009,6 +1017,12 @@ mod tests {
10091017
// attacker-controlled marker source path containing `##vso[`
10101018
// would otherwise be processed by the agent's logging-command
10111019
// scanner.
1020+
//
1021+
// The canonical `neutralize_pipeline_commands` wraps the
1022+
// prefix in backticks (`` `##vso[` ``) — the literal `##vso[`
1023+
// token no longer matches the agent's scanner. The canonical
1024+
// helper's own behaviour is exhaustively tested in
1025+
// `src/sanitize.rs`; this test is just the integration point.
10121026
let mut d = discovered(DiscoveryStatus::Consumer);
10131027
d.markers = vec![MarkerMetadata {
10141028
schema: 1,
@@ -1018,13 +1032,13 @@ mod tests {
10181032
}];
10191033
let matched = discovered_to_matched(&d).expect("Consumer kept");
10201034
assert!(
1021-
!matched.yaml_path.contains("##vso["),
1035+
!matched.yaml_path.contains("agents/##vso["),
10221036
"raw ##vso[ leaked into yaml_path: {}",
10231037
matched.yaml_path,
10241038
);
10251039
assert!(
1026-
matched.yaml_path.contains("[vso-filtered]["),
1027-
"expected `##vso[` neutralised to `[vso-filtered][`: {}",
1040+
matched.yaml_path.contains("`##vso[`"),
1041+
"expected `##vso[` neutralised via canonical backtick-wrap: {}",
10281042
matched.yaml_path,
10291043
);
10301044
}
@@ -1048,19 +1062,4 @@ mod tests {
10481062
normalize_repo_url("https://dev.azure.com/org/p/_git/repo")
10491063
);
10501064
}
1051-
1052-
// ── sanitize_for_vso_logging ─────────────────────────────────────
1053-
1054-
#[test]
1055-
fn discovery_sanitize_for_vso_logging_neutralises_prefixes() {
1056-
assert_eq!(
1057-
sanitize_for_vso_logging("##vso[task.setvariable variable=X]value"),
1058-
"[vso-filtered][task.setvariable variable=X]value"
1059-
);
1060-
assert_eq!(
1061-
sanitize_for_vso_logging("##[warning]ignore me"),
1062-
"[filtered][warning]ignore me"
1063-
);
1064-
assert_eq!(sanitize_for_vso_logging("agents/foo.md"), "agents/foo.md");
1065-
}
10661065
}

src/compile/extensions/ado_aw_marker.rs

Lines changed: 24 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -71,19 +71,22 @@ impl CompilerExtension for AdoAwMarkerExtension {
7171
//
7272
// The echo's source value goes through two sanitisations:
7373
//
74-
// 1. `sanitize_for_vso_logging` neutralises `##vso[` and `##[`
75-
// prefixes. The ADO build agent scans stdout for those
76-
// sequences and treats them as logging commands (e.g.
77-
// `task.setvariable`). An attacker who controls a markdown
78-
// filename could otherwise inject a logging command into
79-
// the build log via the echoed source path. Same convention
80-
// used by `agent_stats::sanitize_for_markdown`.
74+
// 1. `crate::sanitize::neutralize_pipeline_commands` neutralises
75+
// `##vso[` and `##[` prefixes by wrapping them in backticks.
76+
// The ADO build agent scans stdout for those sequences and
77+
// treats them as logging commands (e.g. `task.setvariable`).
78+
// An attacker who controls a markdown filename could
79+
// otherwise inject a logging command into the build log via
80+
// the echoed source path. Reusing the canonical helper keeps
81+
// this in sync with the rest of the sanitisation surfaces.
8182
//
8283
// 2. `bash_single_quote_escape` applies the `'\''` idiom so a
8384
// filename containing `'` (e.g. `agents/foo's.md`) doesn't
8485
// produce syntactically broken bash. `version` and `target`
8586
// are controlled inputs and can't contain either.
86-
let echo_source = bash_single_quote_escape(&sanitize_for_vso_logging(&source));
87+
let echo_source = bash_single_quote_escape(
88+
&crate::sanitize::neutralize_pipeline_commands(&source),
89+
);
8790
let step = format!(
8891
"- bash: |\n \
8992
# ado-aw-metadata: {metadata}\n \
@@ -106,15 +109,6 @@ fn bash_single_quote_escape(s: &str) -> String {
106109
s.replace('\'', "'\\''")
107110
}
108111

109-
/// Neutralise ADO build-agent logging-command prefixes (`##vso[`, `##[`).
110-
/// Mirrors `crate::agent_stats::sanitize_for_markdown` so a malicious
111-
/// filename can't smuggle a `task.setvariable` (or similar) through the
112-
/// runtime `echo` line in the marker step.
113-
fn sanitize_for_vso_logging(s: &str) -> String {
114-
s.replace("##vso[", "[vso-filtered][")
115-
.replace("##[", "[filtered][")
116-
}
117-
118112
#[cfg(test)]
119113
mod tests {
120114
use super::*;
@@ -226,27 +220,18 @@ mod tests {
226220
);
227221
}
228222

229-
#[test]
230-
fn sanitize_for_vso_logging_neutralises_known_prefixes() {
231-
assert_eq!(
232-
sanitize_for_vso_logging("##vso[task.setvariable variable=X]value"),
233-
"[vso-filtered][task.setvariable variable=X]value"
234-
);
235-
assert_eq!(
236-
sanitize_for_vso_logging("##[warning]ignore me"),
237-
"[filtered][warning]ignore me"
238-
);
239-
assert_eq!(sanitize_for_vso_logging("agents/foo.md"), "agents/foo.md");
240-
assert_eq!(sanitize_for_vso_logging(""), "");
241-
}
242-
243223
#[test]
244224
fn echo_line_neutralises_vso_injection_attempt() {
245225
// An attacker who controls a markdown filename must not be able
246226
// to inject ADO logging commands into the build log via the
247227
// echoed source path. The ADO agent scans stdout for `##vso[`
248228
// and `##[` prefixes and treats matching sequences as task
249229
// commands (setvariable, setoutput, etc.).
230+
//
231+
// Marker uses the canonical `crate::sanitize::neutralize_pipeline_commands`
232+
// which backtick-wraps the prefix (`` `##vso[` ``) — the literal
233+
// `##vso[` no longer starts a token in the agent's scanner. See
234+
// `src/sanitize.rs` for the canonical helper's own tests.
250235
let fm = parse_fm("name: t\ndescription: x\n");
251236
let input_path = Path::new("agents/##vso[task.setvariable variable=FOO]value.md");
252237
let ctx = CompileContext {
@@ -272,13 +257,17 @@ mod tests {
272257
.lines()
273258
.find(|l| l.trim_start().starts_with("echo 'ado-aw metadata:"))
274259
.expect("must have echo line");
260+
// `neutralize_pipeline_commands` wraps the matched prefix in
261+
// backticks, breaking the `##vso[` token at the start of the
262+
// sequence. The agent's scanner is anchored to the literal
263+
// prefix; the backtick-wrapped form passes through unprocessed.
275264
assert!(
276-
!echo_line.contains("##vso["),
277-
"raw ##vso[ leaked into echo line: {echo_line}"
265+
!echo_line.contains(" ##vso["),
266+
"raw ##vso[ leaked into echo line (must be backtick-wrapped): {echo_line}"
278267
);
279268
assert!(
280-
echo_line.contains("[vso-filtered]["),
281-
"expected `##vso[` neutralised to `[vso-filtered][` in echo line: {echo_line}"
269+
echo_line.contains("`##vso[`"),
270+
"expected `##vso[` neutralised via canonical backtick-wrap in echo line: {echo_line}"
282271
);
283272
}
284273
}

0 commit comments

Comments
 (0)