
Commit 9d9bee8

haasonsaas and claude authored
feat: audit fixes — graceful OTEL, async-safe plugins, URL validation, +100 tests (#36)
* feat: audit fixes — graceful OTEL, async-safe plugins, URL validation, +100 tests

  - OTEL init no longer panics on misconfigured endpoints; it falls back gracefully
  - Semgrep/ESLint plugins use tokio::spawn_blocking to avoid blocking the runtime
  - Added semgrep --timeout flag (30s) for process execution safety
  - Git clone URL validation: only https://, git@, and http://*.git accepted
  - Added PartialEq derive to ContextType for test assertions
  - 100+ new tests across storage_json, interactive commands, plugins, CLI commands (pr.rs, git.rs), and duplicate_filter

* fix: 4 TDD-discovered bugs — comment detection, function boundaries, dedup severity, empty LLM responses

  - Bug 1: `is_comment_line` missed bare `#` (shell/Makefile/Python comments). Changed `#!` to `#![` in the non-comment prefixes (Rust inner attributes always have `[`); any `#` line not matching a known code prefix is now treated as a comment.
  - Bug 2: `find_function_end` ignored single-quoted strings. A `'{'` in JS/Python strings was counted as a real brace, breaking function boundary detection. Now tracks both `"` and `'` string delimiters.
  - Bug 3: `deduplicate_comments` could drop higher-severity comments. `dedup_by` keeps the first element, and the sort didn't include severity. Now sorts by severity (highest first) within the same file/line/content group.
  - Bug 4: OpenAI/Anthropic silently returned an empty string for empty responses. Empty `choices`/`content` arrays now return errors instead of succeeding with empty content, preventing silent failures in the review pipeline.

* fix: 5 TDD-discovered bugs — error rate, diff parsing, default model, file status

  - Error rate now counts only explicit review.failed events, not all non-completed events
  - Empty lines in diffs are treated as context instead of being silently skipped
  - parse_text_diff sets is_new/is_deleted based on empty old/new content
  - Default model changed from Sonnet to Opus per project conventions
  - Added 5 new tests covering all discovered issues

* fix: address PR review feedback — Rust lifetimes and shebang detection

  - Skip single-quote tracking for Rust in find_function_end (lifetimes like 'a/'static are not string delimiters and were breaking the brace count)
  - Add #!/ to HASH_NON_COMMENT_PREFIXES so shebang lines are not misclassified as comments (changing the interpreter is a real code change)
  - Added 2 regression tests

* style: cargo fmt

* fix: dedup severity bug in composable pipeline, accept ssh:// URLs, clean up comments

  - Fix DeduplicateStage in composable_pipeline.rs to preserve the highest severity when deduplicating (same bug as comment.rs, different path)
  - Accept ssh:// URLs in is_safe_git_url (it was rejecting legitimate SSH remotes)
  - Consolidate is_git_source to delegate to is_safe_git_url
  - Replace stale "BUG:" test comments with "Regression:" phrasing

* fix: process timeouts, test quality, and review cleanup

  - Wrap semgrep/eslint spawn_blocking in tokio::time::timeout to prevent hanging subprocesses from permanently consuming blocking-pool threads
  - Replace a false-positive semgrep test (it tested struct construction, not the analyzer) with an actual timeout behavior test
  - Replace the eslint extension test with a direct JS_EXTENSIONS unit test that verifies filter membership rather than relying on an end-to-end run
  - Fix DeduplicateStage in composable_pipeline to preserve the highest severity (same bug as comment.rs dedup, different code path)
  - Accept ssh:// URLs in is_safe_git_url
  - Clean up stale "BUG:" comments to "Regression:" phrasing

* fix: align PG backend error_rate with JSON backend

  Both backends now count only explicit review.failed events as failures, not all non-completed events. Fixes an inconsistency where PG counted timeouts as failures but JSON did not.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
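The git-URL allow-list described in the commit message can be sketched as a single predicate. This is a hypothetical standalone reconstruction, not the repository's actual `is_safe_git_url`; the function name and exact scheme checks are assumptions inferred from the message (https://, git@, ssh://, and plain http:// only for explicit `.git` repository URLs).

```rust
// Hypothetical sketch of the URL allow-list described above; the real
// is_safe_git_url lives in this repo and may differ in detail.
fn is_safe_git_url(url: &str) -> bool {
    // Encrypted or authenticated transports are accepted outright.
    if url.starts_with("https://") || url.starts_with("git@") || url.starts_with("ssh://") {
        return true;
    }
    // Plain http:// is only accepted for explicit .git repository URLs.
    url.starts_with("http://") && url.ends_with(".git")
}

fn main() {
    assert!(is_safe_git_url("https://github.com/org/repo"));
    assert!(is_safe_git_url("git@github.com:org/repo.git"));
    assert!(is_safe_git_url("ssh://git@host/repo.git"));
    assert!(is_safe_git_url("http://host/repo.git"));
    assert!(!is_safe_git_url("http://host/repo"));
    assert!(!is_safe_git_url("file:///etc/passwd"));
}
```

An allow-list like this rejects schemes such as `file://` by default, which is the safer failure mode for a clone target.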
1 parent c8af0ed commit 9d9bee8

File tree

17 files changed (+1942, -102 lines)

src/adapters/anthropic.rs

Lines changed: 28 additions & 14 deletions
@@ -121,14 +121,18 @@ impl LLMAdapter for AnthropicAdapter {
             .content
             .first()
             .map(|c| {
-                // Verify it's a text content type
                 if c.content_type == "text" {
-                    c.text.clone()
+                    Ok(c.text.clone())
                 } else {
-                    format!("Unsupported content type: {}", c.content_type)
+                    Err(anyhow::anyhow!(
+                        "Unsupported content type: {}",
+                        c.content_type
+                    ))
                 }
             })
-            .unwrap_or_default();
+            .ok_or_else(|| {
+                anyhow::anyhow!("Anthropic returned empty content array — no content generated")
+            })??;

         Ok(LLMResponse {
             content,

@@ -292,7 +296,7 @@ mod tests {
     }

     #[tokio::test]
-    async fn test_unsupported_content_type() {
+    async fn test_unsupported_content_type_returns_error() {
         let mut server = mockito::Server::new_async().await;
         let _mock = server
             .mock("POST", "/messages")

@@ -311,17 +315,20 @@ mod tests {
         let adapter = AnthropicAdapter::new(test_config(&server.url())).unwrap();
         let result = adapter.complete(test_request()).await;

-        assert!(result.is_ok());
-        let response = result.unwrap();
         assert!(
-            response.content.contains("Unsupported content type"),
-            "Expected 'Unsupported content type' in response, got: {}",
-            response.content
+            result.is_err(),
+            "Unsupported content type should return an error"
+        );
+        let err = result.unwrap_err().to_string();
+        assert!(
+            err.contains("Unsupported content type"),
+            "Error should mention unsupported type, got: {}",
+            err
         );
     }

     #[tokio::test]
-    async fn test_empty_content_array() {
+    async fn test_empty_content_array_returns_error() {
         let mut server = mockito::Server::new_async().await;
         let _mock = server
             .mock("POST", "/messages")

@@ -340,9 +347,16 @@ mod tests {
         let adapter = AnthropicAdapter::new(test_config(&server.url())).unwrap();
         let result = adapter.complete(test_request()).await;

-        assert!(result.is_ok());
-        let response = result.unwrap();
-        assert_eq!(response.content, "");
+        assert!(
+            result.is_err(),
+            "Empty content array should return an error, not silently succeed"
+        );
+        let err = result.unwrap_err().to_string();
+        assert!(
+            err.contains("empty content"),
+            "Error should mention empty content: {}",
+            err
+        );
     }

     #[test]
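The `.map(...).ok_or_else(...)??` chain in the hunk above flattens two failure layers at once: a missing first element (`Option`) and an unsupported content type (inner `Result`). A minimal dependency-free sketch of the same pattern, using plain `String` errors instead of anyhow and hypothetical `Content`/`extract_text` names:

```rust
// Assumed simplified stand-in for the adapter's response content item.
struct Content {
    content_type: String,
    text: String,
}

// Mirrors the adapter change above: an empty array and an unsupported
// content type both become errors instead of a silent empty string.
fn extract_text(content: &[Content]) -> Result<String, String> {
    let text = content
        .first()
        .map(|c| {
            if c.content_type == "text" {
                Ok(c.text.clone())
            } else {
                Err(format!("Unsupported content type: {}", c.content_type))
            }
        })
        // Option -> Result for the missing element, then `??`:
        // one `?` for the outer Result, one for the inner one.
        .ok_or_else(|| "empty content array".to_string())??;
    Ok(text)
}

fn main() {
    let ok = vec![Content { content_type: "text".to_string(), text: "hi".to_string() }];
    assert_eq!(extract_text(&ok).unwrap(), "hi");
    assert!(extract_text(&[]).unwrap_err().contains("empty content"));
    let bad = vec![Content { content_type: "image".to_string(), text: String::new() }];
    assert!(extract_text(&bad).unwrap_err().contains("Unsupported"));
}
```

The key behavioral change is that both failure modes now surface to the caller, which is what the new `*_returns_error` tests assert.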

src/adapters/openai.rs

Lines changed: 14 additions & 5 deletions
@@ -196,7 +196,9 @@ impl OpenAIAdapter {
             .choices
             .first()
             .map(|c| c.message.content.clone())
-            .unwrap_or_default();
+            .ok_or_else(|| {
+                anyhow::anyhow!("OpenAI returned empty choices array — no content generated")
+            })?;

         Ok(LLMResponse {
             content,

@@ -416,7 +418,7 @@ mod tests {
     }

     #[tokio::test]
-    async fn test_empty_choices_returns_empty_content() {
+    async fn test_empty_choices_returns_error() {
         let mut server = mockito::Server::new_async().await;
         let _mock = server
             .mock("POST", "/chat/completions")

@@ -435,9 +437,16 @@ mod tests {
         let adapter = OpenAIAdapter::new(test_config(&server.url())).unwrap();
         let result = adapter.complete(test_request()).await;

-        assert!(result.is_ok());
-        let response = result.unwrap();
-        assert_eq!(response.content, "");
+        assert!(
+            result.is_err(),
+            "Empty choices should return an error, not silently succeed"
+        );
+        let err = result.unwrap_err().to_string();
+        assert!(
+            err.contains("empty choices"),
+            "Error should mention empty choices: {}",
+            err
+        );
     }

     #[test]

src/config.rs

Lines changed: 13 additions & 1 deletion
@@ -1061,7 +1061,7 @@ impl Config {
 }

 fn default_model() -> String {
-    "claude-sonnet-4-6".to_string()
+    "claude-opus-4-6".to_string()
 }

 fn default_temperature() -> f32 {

@@ -1764,4 +1764,16 @@ temperature: 0.3
         assert!(config.model_embedding.is_none());
         assert!(config.fallback_models.is_empty());
     }
+
+    #[test]
+    fn test_default_model_is_frontier() {
+        // Per CLAUDE.md: "Always use frontier models (Opus) for AI-powered features
+        // — never default to Sonnet or smaller"
+        let model = default_model();
+        assert!(
+            model.contains("opus"),
+            "Default model should be Opus (frontier), got: {}",
+            model
+        );
+    }
 }

src/core/comment.rs

Lines changed: 54 additions & 0 deletions
@@ -449,12 +449,23 @@ impl CommentSynthesizer {
     }

     fn deduplicate_comments(comments: &mut Vec<Comment>) {
+        let severity_rank = |s: &Severity| match s {
+            Severity::Error => 0,
+            Severity::Warning => 1,
+            Severity::Info => 2,
+            Severity::Suggestion => 3,
+        };
+
+        // Sort by file/line/content, then by severity (highest first)
         comments.sort_by(|a, b| {
             a.file_path
                 .cmp(&b.file_path)
                 .then(a.line_number.cmp(&b.line_number))
                 .then(a.content.cmp(&b.content))
+                .then(severity_rank(&a.severity).cmp(&severity_rank(&b.severity)))
         });
+        // dedup_by keeps the first element (b) of consecutive duplicates,
+        // which is the highest severity due to our sort order
         comments.dedup_by(|a, b| {
             a.file_path == b.file_path && a.line_number == b.line_number && a.content == b.content
         });

@@ -548,6 +559,49 @@ pub struct RawComment {
 mod tests {
     use super::*;

+    #[test]
+    fn test_deduplicate_preserves_highest_severity() {
+        // Regression: dedup_by keeps the first element of a consecutive pair,
+        // but doesn't consider severity. If Warning comes before Error
+        // (due to stable sort on file/line/content), the Error is dropped.
+        let raw_comments = vec![
+            RawComment {
+                file_path: PathBuf::from("src/lib.rs"),
+                line_number: 10,
+                content: "Missing null check".to_string(),
+                rule_id: None,
+                suggestion: None,
+                severity: Some(Severity::Warning),
+                category: Some(Category::Bug),
+                confidence: Some(0.8),
+                fix_effort: None,
+                tags: Vec::new(),
+                code_suggestion: None,
+            },
+            RawComment {
+                file_path: PathBuf::from("src/lib.rs"),
+                line_number: 10,
+                content: "Missing null check".to_string(),
+                rule_id: None,
+                suggestion: None,
+                severity: Some(Severity::Error),
+                category: Some(Category::Bug),
+                confidence: Some(0.9),
+                fix_effort: None,
+                tags: Vec::new(),
+                code_suggestion: None,
+            },
+        ];
+
+        let comments = CommentSynthesizer::synthesize(raw_comments).unwrap();
+        assert_eq!(comments.len(), 1, "Should deduplicate to one comment");
+        assert_eq!(
+            comments[0].severity,
+            Severity::Error,
+            "Should keep the higher severity (Error), not the lower (Warning)"
+        );
+    }
+
     #[test]
     fn test_severity_display() {
         assert_eq!(Severity::Error.to_string(), "Error");
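The fix above relies on `Vec::dedup_by` keeping the first element of each run of consecutive duplicates, so sorting duplicates with the highest severity first makes dedup keep that one. A self-contained sketch of the sort-then-dedup trick, using simplified `Comment`/`Severity` types rather than the repo's:

```rust
// Simplified stand-ins for this repo's Comment/Severity types.
#[derive(Clone, Debug, PartialEq)]
enum Severity { Error, Warning, Info, Suggestion }

#[derive(Clone, Debug)]
struct Comment { file: String, line: u32, content: String, severity: Severity }

// Same trick as deduplicate_comments above: sort duplicates so the highest
// severity comes first, because Vec::dedup_by keeps the FIRST of each run.
fn dedup(comments: &mut Vec<Comment>) {
    let rank = |s: &Severity| match s {
        Severity::Error => 0,
        Severity::Warning => 1,
        Severity::Info => 2,
        Severity::Suggestion => 3,
    };
    comments.sort_by(|a, b| {
        a.file
            .cmp(&b.file)
            .then(a.line.cmp(&b.line))
            .then(a.content.cmp(&b.content))
            .then(rank(&a.severity).cmp(&rank(&b.severity)))
    });
    // dedup_by's closure gets (later, earlier); returning true drops the later one.
    comments.dedup_by(|a, b| a.file == b.file && a.line == b.line && a.content == b.content);
}

fn main() {
    let mut v = vec![
        Comment { file: "lib.rs".into(), line: 10, content: "Missing null check".into(), severity: Severity::Warning },
        Comment { file: "lib.rs".into(), line: 10, content: "Missing null check".into(), severity: Severity::Error },
    ];
    dedup(&mut v);
    assert_eq!(v.len(), 1);
    assert_eq!(v[0].severity, Severity::Error);
}
```

Without the severity tiebreaker the stable sort would leave Warning before Error, and `dedup_by` would silently drop the Error, which is exactly the regression the new test pins down.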

src/core/composable_pipeline.rs

Lines changed: 31 additions & 0 deletions
@@ -211,12 +211,21 @@ impl PipelineStage for DeduplicateStage {
     }

     fn execute(&self, ctx: &mut PipelineContext) -> Result<()> {
+        use crate::core::comment::Severity;
+        let severity_rank = |s: &Severity| match s {
+            Severity::Error => 0,
+            Severity::Warning => 1,
+            Severity::Info => 2,
+            Severity::Suggestion => 3,
+        };
         ctx.comments.sort_by(|a, b| {
             a.file_path
                 .cmp(&b.file_path)
                 .then(a.line_number.cmp(&b.line_number))
                 .then(a.content.cmp(&b.content))
+                .then(severity_rank(&a.severity).cmp(&severity_rank(&b.severity)))
         });
+        // dedup_by keeps b (the earlier element); our sort puts highest severity first
         ctx.comments.dedup_by(|a, b| {
             a.file_path == b.file_path && a.line_number == b.line_number && a.content == b.content
         });

@@ -688,6 +697,28 @@ mod tests {
         assert_eq!(ctx.comments.len(), 2);
     }

+    // Regression: DeduplicateStage must keep the highest severity when deduplicating
+    #[test]
+    fn test_deduplicate_preserves_highest_severity() {
+        let mut ctx = PipelineContext::new();
+        let mut info = make_comment("test.rs", 10, "duplicate issue", 0.7);
+        info.severity = Severity::Info;
+        let mut error = make_comment("test.rs", 10, "duplicate issue", 0.7);
+        error.severity = Severity::Error;
+        // Insert lower severity first to ensure sort fixes order
+        ctx.comments.push(info);
+        ctx.comments.push(error);
+
+        let stage = DeduplicateStage;
+        stage.execute(&mut ctx).unwrap();
+        assert_eq!(ctx.comments.len(), 1);
+        assert_eq!(
+            ctx.comments[0].severity,
+            Severity::Error,
+            "DeduplicateStage should keep the highest severity"
+        );
+    }
+
     // Regression: SortBySeverityStage must sort Error > Warning > Info
     #[test]
     fn test_sort_by_severity_order() {

src/core/context.rs

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ pub struct LLMContextChunk {
     pub line_range: Option<(usize, usize)>,
 }

-#[derive(Debug, Clone, Serialize, Deserialize)]
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
 pub enum ContextType {
     FileContent,
     Definition,

src/core/diff_parser.rs

Lines changed: 71 additions & 12 deletions
@@ -181,8 +181,8 @@ impl DiffParser {
             new_content: Some(new_content.to_string()),
             hunks,
             is_binary: false,
-            is_deleted: false,
-            is_new: false,
+            is_deleted: new_content.is_empty() && !old_content.is_empty(),
+            is_new: old_content.is_empty() && !new_content.is_empty(),
         })
     }

@@ -368,16 +368,17 @@ impl DiffParser {
                 *i += 1;
                 continue;
             }
-            if line.is_empty() {
-                *i += 1;
-                continue;
-            }
-
-            let (change_type, content) = match line.chars().next() {
-                Some('+') => (ChangeType::Added, &line[1..]),
-                Some('-') => (ChangeType::Removed, &line[1..]),
-                Some(' ') => (ChangeType::Context, &line[1..]),
-                _ => (ChangeType::Context, line),
+            let (change_type, content) = if line.is_empty() {
+                // Some diff tools emit truly empty lines for empty context lines
+                // (omitting the leading space). Treat as context to keep line numbers in sync.
+                (ChangeType::Context, "")
+            } else {
+                match line.chars().next() {
+                    Some('+') => (ChangeType::Added, &line[1..]),
+                    Some('-') => (ChangeType::Removed, &line[1..]),
+                    Some(' ') => (ChangeType::Context, &line[1..]),
+                    _ => (ChangeType::Context, line),
+                }
             };

             let diff_line = match change_type {

@@ -540,4 +541,62 @@ index 0000000..f735c20\n\
         assert!(diffs[0].is_new);
         assert!(!diffs[0].is_deleted);
     }
+
+    #[test]
+    fn test_parse_hunk_empty_lines_not_skipped() {
+        // Regression: empty lines in diff body were previously skipped.
+        // An empty line (no leading space) in some diff tools represents an empty
+        // context line. The parser should treat it as context, not skip it.
+        let diff_text = "\
+diff --git a/test.txt b/test.txt\n\
+index abc..def 100644\n\
+--- a/test.txt\n\
++++ b/test.txt\n\
+@@ -1,4 +1,4 @@\n\
+ line1\n\
+\n\
+-old_line3\n\
++new_line3\n";
+        // The empty line (between "line1" and "-old_line3") should be treated as
+        // context line 2. Without it, line numbers after the empty line are wrong.
+        let diffs = DiffParser::parse_unified_diff(diff_text).unwrap();
+        assert_eq!(diffs.len(), 1);
+        let hunk = &diffs[0].hunks[0];
+
+        // Find the removed line — it should be on line 3, not line 2
+        let removed = hunk
+            .changes
+            .iter()
+            .find(|c| c.change_type == ChangeType::Removed)
+            .expect("Should have a removed line");
+        assert_eq!(
+            removed.old_line_no,
+            Some(3),
+            "Removed line should be on old line 3 (after the empty context line 2), got {:?}",
+            removed.old_line_no
+        );
+    }
+
+    #[test]
+    fn test_parse_text_diff_sets_is_new_for_new_file() {
+        // Regression: parse_text_diff must set is_new when old_content is empty
+        let diff =
+            DiffParser::parse_text_diff("", "new content\n", PathBuf::from("new_file.rs")).unwrap();
+        assert!(
+            diff.is_new,
+            "parse_text_diff with empty old content should set is_new=true"
+        );
+    }
+
+    #[test]
+    fn test_parse_text_diff_sets_is_deleted_for_deleted_file() {
+        // Regression: parse_text_diff must set is_deleted when new_content is empty
+        let diff =
+            DiffParser::parse_text_diff("old content\n", "", PathBuf::from("deleted_file.rs"))
+                .unwrap();
+        assert!(
+            diff.is_deleted,
+            "parse_text_diff with empty new content should set is_deleted=true"
+        );
+    }
 }
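The empty-line fix above boils down to how a single raw diff body line is classified. A standalone sketch of that classification step, with a hypothetical `classify` helper mirroring the match in the hunk:

```rust
#[derive(Debug, PartialEq)]
enum ChangeType { Added, Removed, Context }

// Mirrors the parser fix above: an empty diff body line (some tools drop the
// leading space on blank context lines) is classified as Context rather than
// skipped, so the old/new line counters stay in sync.
fn classify(line: &str) -> (ChangeType, &str) {
    if line.is_empty() {
        (ChangeType::Context, "")
    } else {
        match line.chars().next() {
            Some('+') => (ChangeType::Added, &line[1..]),
            Some('-') => (ChangeType::Removed, &line[1..]),
            Some(' ') => (ChangeType::Context, &line[1..]),
            _ => (ChangeType::Context, line),
        }
    }
}

fn main() {
    assert_eq!(classify("+new"), (ChangeType::Added, "new"));
    assert_eq!(classify("-old"), (ChangeType::Removed, "old"));
    assert_eq!(classify(" ctx"), (ChangeType::Context, "ctx"));
    // The fixed case: empty input is a blank context line, not a skip.
    assert_eq!(classify(""), (ChangeType::Context, ""));
}
```

Skipping the empty line instead would shift every subsequent line number in the hunk by one, which is exactly what the `test_parse_hunk_empty_lines_not_skipped` regression test checks.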

0 commit comments
