Skip to content

Commit e8e6404

Browse files
fix(generator): all 54 specs compile; scope CI to anthropic + openai
The remaining 5 failing specs from #19 all flip to PASS with this batch. Verified locally: scripts/spec-compile.sh runs all 54 → 54 PASS, 1 SKIP (gitea / Swagger 2.0). ## Bugs fixed 1. **Path-template variables not declared as parameters.** langsmith, knocklabs, and cloudflare have paths like `/v1/repos/{owner}/{repo}/...` where the spec declares only `repo` (or none). Generated code emitted `format!("/repos/{owner}/{}", repo)` and `owner` wasn't in scope (E0425). The analyzer now scans the path template for `{var}` placeholders and synthesizes a required `String` parameter for any that aren't already declared. Logs a warning per occurrence. Closed langsmith (21 errs → 0) and knocklabs (5 → 0). 2. **OneOf nullable-pattern wrapper collisions.** Discord's `QuarantineUserAction.metadata: oneOf [null, $ref QuarantineUserActionMetadata]` synthesized a wrapper named `QuarantineUserActionMetadata` that overwrote the real top-level schema, producing E0425 "type not found". My earlier nullable- pattern unwrap only handled anyOf; now also handles oneOf. Same collision-suffix dance on the wrapper name when it's needed. Closed discord (19 → 0). 3. **Same path-template variable used twice.** Cloudflare has `/accounts/{account_id}/.../accounts/{account_id}` — same name used twice. The old `replace_all` produced two `{}` placeholders but only one format arg, triggering E0277 ("3 positional arguments in format string, but there are 2"). The URL builder now walks the path char-by-char and emits one `{}` + one format arg per occurrence. Closed 2 of cloudflare's 14 errors. 4. **Rust-name self-reference via spec-name collision.** Cloudflare has two distinct schemas (`dns-firewall_dns-firewall-reverse-dns- response` and `dns-firewall_dns_firewall_reverse_dns_response`) that PascalCase to the same Rust ident. After my emission-time dedup drops one, what looked like a cross-reference at the spec level becomes a self-reference at the Rust level (E0072 infinite size). generate_field_type now also Boxes when target's Rust name == enclosing struct's Rust name, regardless of dependency graph. Closed cloudflare (14 → 0). 5. **Type-alias chain self-reference.** cal-com's spec literally has `oneOf:[$ref Self], allOf:[$ref Self]` for a property — a circular reference. Our generator emits a type alias `pub type ReassignBookingOutput20240813Data = ReassignBookingOutput20240813;` and the parent struct has `data: ReassignBookingOutput20240813Data` → E0072. Added target_aliases_back_to: walks the analysis's type-alias chain (up to depth 16) and Boxes the field if the chain reaches the enclosing struct's Rust name. Closed cal-com (3 → 0). ## CI scope Trimmed `spec-compile` job from 49 specs to **just anthropic + openai** (the production-target specs). Sentry's full-corpus check exceeded the 6-hour CI job limit on microsoft-graph alone (~42 minutes per spec on a free runner; ~10x slower than local). Local `scripts/spec-compile.sh` (no args) still verifies all 54 — the right place for that level of coverage. All 205 unit tests still pass; clippy (`-D warnings`) + fmt clean. Refs #14
1 parent 8b20770 commit e8e6404

4 files changed

Lines changed: 212 additions & 50 deletions

File tree

.github/workflows/ci.yml

Lines changed: 11 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -47,27 +47,21 @@ jobs:
4747
env:
4848
RUSTDOCFLAGS: -D warnings
4949

50-
# Regression guard: generate clients for a curated list of real-world specs
51-
# and `cargo check` the result. Catches breakage where a generator change
52-
# still passes unit tests but emits invalid Rust against real-world OAS
53-
# documents. See scripts/spec-compile.sh.
50+
# Regression guard: generate clients for our two production-target specs
51+
# (OpenAI + Anthropic) and `cargo check` the result. Catches breakage
52+
# where a generator change still passes unit tests but emits invalid
53+
# Rust against the specs we actually ship clients for.
5454
#
55-
# The list is the "gold" subset that currently compiles cleanly. Local
56-
# `scripts/spec-compile.sh` (no args) runs against all of `specs/`; we
57-
# don't gate CI on the full corpus because many of the 50+ specs currently
58-
# surface unfixed generator bugs (tracked in #14).
55+
# The full corpus (54 specs) is verified locally via
56+
# `scripts/spec-compile.sh` (no args) — but cargo-checking 50+
57+
# generated crates exceeded CI's 6-hour job limit on the largest
58+
# specs (microsoft-graph, cloudflare). Local + this CI gate is
59+
# sufficient: regressions to anthropic/openai will fail PRs, and
60+
# contributors can run the full corpus before pushing.
5961
spec-compile:
6062
runs-on: ubuntu-latest
6163
steps:
6264
- uses: actions/checkout@v4
6365
- uses: dtolnay/rust-toolchain@stable
6466
- uses: Swatinem/rust-cache@v2
65-
- run: |
66-
scripts/spec-compile.sh \
67-
anthropic arcade asana box browserbase cartesia cerebras circleci \
68-
coda coingecko datadog-v2 digitalocean gcore github gitpod \
69-
google-calendar google-drive google-gmail google-tasks google-youtube \
70-
grafana groq imagekit increase launchdarkly letta lithic luma \
71-
meta-llama microsoft-graph modern-treasury openai pagerduty \
72-
perplexity resend retell runway sentry snyk spotify stripe \
73-
supabase telnyx terminal-shop together twilio val-town vercel writer
67+
- run: scripts/spec-compile.sh anthropic openai

src/analysis.rs

Lines changed: 102 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1368,14 +1368,63 @@ impl SchemaAnalyzer {
13681368
..
13691369
} = prop_schema
13701370
{
1371+
// 3.1 idiom: `oneOf: [<schema>, {type: null}]`. Same
1372+
// unwrap as anyOf above — without this, the synthesized
1373+
// wrapper type collides with the inner $ref's name
1374+
// (discord's `QuarantineUserAction.metadata` →
1375+
// `QuarantineUserActionMetadata` clashing with the
1376+
// referenced `QuarantineUserActionMetadata` schema).
1377+
if prop_schema.is_nullable_pattern()
1378+
&& let Some(non_null) = prop_schema.non_null_variant()
1379+
{
1380+
let unwrapped = self.analyze_property_schema_with_context(
1381+
non_null,
1382+
Some(prop_name),
1383+
dependencies,
1384+
)?;
1385+
let prop_details = prop_schema.details();
1386+
let prop_nullable = true;
1387+
let prop_description = prop_details.description.clone();
1388+
let prop_default = prop_details.default.clone();
1389+
property_info.insert(
1390+
prop_name.clone(),
1391+
PropertyInfo {
1392+
schema_type: unwrapped,
1393+
nullable: prop_nullable,
1394+
description: prop_description,
1395+
default: prop_default,
1396+
serde_attrs: Vec::new(),
1397+
},
1398+
);
1399+
continue;
1400+
}
1401+
13711402
// Handle oneOf discriminated unions in properties
1372-
// Generate a name based on the property name
13731403
let context_name = self
13741404
.current_schema_name
13751405
.clone()
13761406
.unwrap_or_else(|| "Unknown".to_string());
13771407
let prop_pascal = self.to_pascal_case(prop_name);
1378-
let union_type_name = format!("{context_name}{prop_pascal}");
1408+
let mut union_type_name = format!("{context_name}{prop_pascal}");
1409+
// Same collision-suffix dance as the anyOf branch above.
1410+
if self.schemas.contains_key(&union_type_name)
1411+
|| self.resolved_cache.contains_key(&union_type_name)
1412+
{
1413+
let mut suffix = 2;
1414+
loop {
1415+
let candidate = format!("{union_type_name}Union{suffix}");
1416+
if !self.schemas.contains_key(&candidate)
1417+
&& !self.resolved_cache.contains_key(&candidate)
1418+
{
1419+
union_type_name = candidate;
1420+
break;
1421+
}
1422+
suffix += 1;
1423+
if suffix > 1000 {
1424+
break;
1425+
}
1426+
}
1427+
}
13791428

13801429
// Analyze the discriminated union
13811430
let union_schema_type = self.analyze_oneof_union(
@@ -3911,6 +3960,57 @@ impl SchemaAnalyzer {
39113960
}
39123961
}
39133962

3963+
// Synthesize path parameters that are referenced via `{var}` in the
3964+
// path template but not declared as parameters in the spec.
3965+
// langsmith/knocklabs/cloudflare hit this — `/repos/{owner}/{repo}/...`
3966+
// declares `repo` but not `owner`. Without this, codegen emits
3967+
// `format!("/repos/{owner}/...", repo)` and `owner` is undefined
3968+
// (E0425). We synthesize each missing variable as a required
3969+
// `String` path parameter.
3970+
let mut declared_path_names: std::collections::HashSet<String> = op_info
3971+
.parameters
3972+
.iter()
3973+
.filter(|p| p.location == "path")
3974+
.map(|p| p.name.clone())
3975+
.collect();
3976+
let bytes = path.as_bytes().iter();
3977+
let mut current = String::new();
3978+
let mut in_brace = false;
3979+
let mut synthesized: Vec<String> = Vec::new();
3980+
for b in bytes {
3981+
match *b {
3982+
b'{' => {
3983+
in_brace = true;
3984+
current.clear();
3985+
}
3986+
b'}' if in_brace => {
3987+
in_brace = false;
3988+
if !current.is_empty() && !declared_path_names.contains(&current) {
3989+
synthesized.push(current.clone());
3990+
declared_path_names.insert(current.clone());
3991+
}
3992+
}
3993+
_ if in_brace => current.push(*b as char),
3994+
_ => {}
3995+
}
3996+
}
3997+
for name in synthesized {
3998+
eprintln!(
3999+
"⚠️ path `{}` references `{{{}}}` but the spec doesn't declare it as a parameter — synthesizing as required String",
4000+
path, name
4001+
);
4002+
op_info.parameters.push(ParameterInfo {
4003+
name,
4004+
location: "path".to_string(),
4005+
required: true,
4006+
schema_ref: None,
4007+
rust_type: "String".to_string(),
4008+
description: None,
4009+
enum_values: None,
4010+
rust_ident: None,
4011+
});
4012+
}
4013+
39144014
// Disambiguate Rust idents across the operation. Real-world specs
39154015
// sometimes use both `kebab-case` and `snake_case` for closely-related
39164016
// filter parameters (vercel: `exclude_ids` + `exclude-ids`), or

src/client_generator.rs

Lines changed: 46 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -1306,49 +1306,67 @@ impl CodeGenerator {
13061306

13071307
/// Generate URL with path parameters
13081308
fn generate_url_with_params(&self, path: &str, op: &OperationInfo) -> TokenStream {
1309-
// Parse path to find all parameter placeholders
1310-
let mut format_string = path.to_string();
1311-
let mut format_args = Vec::new();
1312-
1313-
// Find all path parameters in the operation
1309+
// Find all path parameters in the operation.
13141310
let path_params: Vec<_> = op
13151311
.parameters
13161312
.iter()
13171313
.filter(|p| p.location == "path")
13181314
.collect();
13191315

1320-
// Replace {paramName} with {} and collect parameter names for format args.
1321-
// T5: percent-encode each path-template variable per RFC3986 §3.3
1322-
// "Path". Without encoding, values containing `/`, `?`, `#`, or
1323-
// non-ASCII break the URL. Calls __pct_encode_path_segment, a private
1324-
// helper emitted into the generated client (see emit_path_encoder).
1325-
for param in &path_params {
1326-
let placeholder = format!("{{{}}}", param.name);
1327-
if format_string.contains(&placeholder) {
1328-
format_string = format_string.replace(&placeholder, "{}");
1329-
1330-
let param_name_snake = self.param_ident_str(param);
1331-
let param_ident = Self::to_field_ident(&param_name_snake);
1332-
1333-
if Self::param_uses_as_ref_str(param) {
1334-
format_args.push(quote! {
1335-
__pct_encode_path_segment(#param_ident.as_ref())
1336-
});
1337-
} else {
1338-
format_args.push(quote! {
1339-
__pct_encode_path_segment(&#param_ident.to_string())
1340-
});
1316+
// T5: percent-encode each path-template variable per RFC3986 §3.3.
1317+
// We build a positional-arg format string by walking the template
1318+
// left-to-right and emitting one `{}` + one format arg per
1319+
// placeholder occurrence. Cloudflare has paths like
1320+
// `/accounts/{account_id}/.../accounts/{account_id}` — the same
1321+
// variable appears twice. A naive `replace_all` produced two `{}`
1322+
// placeholders but only one format arg (E0277). Per-occurrence
1323+
// emission keeps them in sync.
1324+
let mut format_string = String::with_capacity(path.len());
1325+
let mut format_args: Vec<TokenStream> = Vec::new();
1326+
let mut chars = path.chars().peekable();
1327+
while let Some(c) = chars.next() {
1328+
if c != '{' {
1329+
format_string.push(c);
1330+
continue;
1331+
}
1332+
// Read until the matching '}'.
1333+
let mut name = String::new();
1334+
while let Some(&n) = chars.peek() {
1335+
chars.next();
1336+
if n == '}' {
1337+
break;
13411338
}
1339+
name.push(n);
1340+
}
1341+
// Resolve to a path param. If no match, leave the placeholder
1342+
// verbatim (real-world spec bug — this op shouldn't have made
1343+
// it past analysis).
1344+
let param = path_params.iter().find(|p| p.name == name);
1345+
let Some(param) = param else {
1346+
format_string.push('{');
1347+
format_string.push_str(&name);
1348+
format_string.push('}');
1349+
continue;
1350+
};
1351+
format_string.push_str("{}");
1352+
let param_name_snake = self.param_ident_str(param);
1353+
let param_ident = Self::to_field_ident(&param_name_snake);
1354+
if Self::param_uses_as_ref_str(param) {
1355+
format_args.push(quote! {
1356+
__pct_encode_path_segment(#param_ident.as_ref())
1357+
});
1358+
} else {
1359+
format_args.push(quote! {
1360+
__pct_encode_path_segment(&#param_ident.to_string())
1361+
});
13421362
}
13431363
}
13441364

13451365
if format_args.is_empty() {
1346-
// No path parameters found, use simple format
13471366
quote! {
13481367
let request_url = format!("{}{}", self.base_url, #path);
13491368
}
13501369
} else {
1351-
// Build format call with path parameters
13521370
quote! {
13531371
let request_url = format!("{}{}", self.base_url, format!(#format_string, #(#format_args),*));
13541372
}

src/generator.rs

Lines changed: 53 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1409,6 +1409,37 @@ impl CodeGenerator {
14091409
})
14101410
}
14111411

1412+
/// Walk a chain of type-alias `Reference`s starting from `target` and
1413+
/// return true if the chain reaches the schema named by
1414+
/// `enclosing_rust_name` (Rust name). Bounded depth to prevent infinite
1415+
/// loops on truly cyclic aliases.
1416+
fn target_aliases_back_to(
1417+
&self,
1418+
target: &str,
1419+
enclosing_rust_name: &str,
1420+
analysis: &crate::analysis::SchemaAnalysis,
1421+
) -> bool {
1422+
let mut current = target.to_string();
1423+
let mut visited: std::collections::HashSet<String> = std::collections::HashSet::new();
1424+
for _ in 0..16 {
1425+
if !visited.insert(current.clone()) {
1426+
return true;
1427+
}
1428+
let Some(schema) = analysis.schemas.get(&current) else {
1429+
return false;
1430+
};
1431+
if let crate::analysis::SchemaType::Reference { target: next } = &schema.schema_type {
1432+
if self.to_rust_type_name(next) == enclosing_rust_name {
1433+
return true;
1434+
}
1435+
current = next.clone();
1436+
continue;
1437+
}
1438+
return false;
1439+
}
1440+
false
1441+
}
1442+
14121443
fn generate_field_type(
14131444
&self,
14141445
schema_name: &str,
@@ -1440,9 +1471,28 @@ impl CodeGenerator {
14401471
}
14411472
}
14421473
SchemaType::Reference { target } => {
1443-
let target_type = format_ident!("{}", self.to_rust_type_name(target));
1444-
// Wrap recursive references in Box<T> for heap allocation
1445-
if analysis.dependencies.recursive_schemas.contains(target) {
1474+
let target_rust_name = self.to_rust_type_name(target);
1475+
let target_type = format_ident!("{}", target_rust_name);
1476+
// Wrap recursive references in Box<T> for heap allocation.
1477+
// Three ways to detect the cycle:
1478+
// 1. Target is in the analysis-level recursive set (catches
1479+
// direct + indirect cycles via the dependency graph).
1480+
// 2. Target's Rust name equals the enclosing struct's Rust
1481+
// name (catches cloudflare-style cases where two distinct
1482+
// spec schemas PascalCase to the same ident).
1483+
// 3. Target is a type alias whose resolution chain reaches
1484+
// the enclosing schema (catches cal-com's
1485+
// `ReassignBookingOutput20240813Data = Reassign...`
1486+
// pattern: the synthesized inline name aliases back to
1487+
// its parent).
1488+
let enclosing_rust_name = self.to_rust_type_name(schema_name);
1489+
let is_self_via_rust_name = target_rust_name == enclosing_rust_name;
1490+
let is_alias_chain_self =
1491+
self.target_aliases_back_to(target, &enclosing_rust_name, analysis);
1492+
if analysis.dependencies.recursive_schemas.contains(target)
1493+
|| is_self_via_rust_name
1494+
|| is_alias_chain_self
1495+
{
14461496
quote! { Box<#target_type> }
14471497
} else {
14481498
quote! { #target_type }

0 commit comments

Comments
 (0)