feat(oq): add GCF as --format gcf output option #216

Merged
TristanSpeakEasy merged 3 commits into speakeasy-api:main from blackwell-systems:gcf-format on Jun 22, 2026

@blackwell-systems blackwell-systems commented Jun 19, 2026

Contributor

Summary

Add --format gcf as a new output format option for the oq query command, alongside existing table, json, markdown, and toon formats. Also available as an inline pipeline stage: format(gcf).

Why

Your TOON encoder has a silent data corruption bug

The hand-rolled toonValue function in format.go encodes arrays by joining elements with semicolons:

case expr.KindArray:
    return toonEscape(strings.Join(v.Arr, ";"))

If any array element contains a semicolon, the data is silently corrupted on decode. A 2-element array comes back as 4 elements:

Input:   ["v1;deprecated", "v2;current"]  (2 elements)
Encoded: v1;deprecated;v2;current
Decoded: ["v1", "deprecated", "v2", "current"]  (4 elements)

OpenAPI schemas have array fields (scopes, tags, enum values) where this can occur. GCF handles arrays natively with proper quoting, no lossy joins.
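
The ambiguity described above can be reproduced with the standard library alone. This is a minimal sketch, assuming a decoder that splits on every semicolon; joinEncode and splitDecode are hypothetical names, and the toonEscape step from the quoted snippet is omitted:

```go
package main

import (
	"fmt"
	"strings"
)

// joinEncode mimics the delimiter-join approach quoted above: elements are
// concatenated with ";" and no per-element escaping is applied.
func joinEncode(elems []string) string {
	return strings.Join(elems, ";")
}

// splitDecode is a hypothetical naive decoder that splits on every ";".
func splitDecode(s string) []string {
	return strings.Split(s, ";")
}

func main() {
	in := []string{"v1;deprecated", "v2;current"}
	enc := joinEncode(in)
	out := splitDecode(enc)

	// The element count changes across the round trip.
	fmt.Printf("in=%d encoded=%q out=%d\n", len(in), enc, len(out))
	// prints: in=2 encoded="v1;deprecated;v2;current" out=4
}
```

Once the elements are joined, nothing in the encoded string distinguishes a separator from a semicolon that was part of an element, so no decoder can recover the original boundaries.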

LLM comprehension

When oq output is consumed by LLMs (agent workflows, AI-assisted API exploration), format comprehension accuracy matters:

  • GCF: 100% on general structured data across every frontier model (Claude, GPT-5.5, Gemini)
  • GCF: 90.7% on adversarial/complex payloads (500 symbols, 200 edges)
  • TOON: 68.5% on the same adversarial data
  • JSON: 53.6%

1,700+ evaluations across 10 models from 3 providers. No model has been trained on GCF.

Full eval data: GCF benchmarks

Data integrity

GCF is verified lossless across 43 billion+ round-trips in 5 formats (JSON, YAML, TOML, CSV, MessagePack) and 6 language implementations. Zero failures.

Unlike the hand-rolled TOON encoder, GCF has a formal spec, a decoder, conformance fixtures, and fuzz testing backing every claimed number.

Full verification data: Lossless verification

Zero dependencies

gcf-go has zero runtime dependencies beyond the Go standard library, same as the rest of the oq package.

Changes

| File | Change |
| --- | --- |
| oq/format.go | Add FormatGCF function, import gcf-go |
| oq/parse.go | Accept "gcf" in inline format() stage |
| oq/oq.go | Update FormatHint comment |
| cmd/openapi/commands/openapi/query.go | Add case "gcf", update --format help text |
| go.mod / go.sum | Add github.com/blackwell-systems/gcf-go |

Usage

openapi query petstore.yaml "schemas | where(isComponent) | take(5)" --format gcf
openapi query petstore.yaml "schemas | where(isComponent) | take(5) | format(gcf)"

Summary by cubic

Adds GCF as an output format to the oq query command and pipeline for lossless arrays and clearer LLM consumption. Use --format gcf or format(gcf).

  • New Features
    • Add --format gcf and pipeline stage format(gcf) to openapi query.
    • Implement FormatGCF using github.com/blackwell-systems/gcf-go v1.2.2; preserves types and emits native arrays.
    • Update help text, parser validation, and docs to include gcf.
    • Add tests for GCF output (count, groups, explain, special chars, inline stage).

Written for commit b882efa. Summary will update on new commits.

@blackwell-systems blackwell-systems requested a review from a team as a code owner June 19, 2026 05:18

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 6 files

Comment thread go.mod Outdated

@TristanSpeakEasy TristanSpeakEasy left a comment

Member

Requesting changes. I don’t think this is ready to merge as-is.

Main blockers:

  • The Go version bump breaks the workspace locally: go test ./oq and go list -m all both fail with "module . listed in go.work file requires go >= 1.26.1, but go.work lists go 1.26.0". Please drop the bump unless it is truly required; if gcf-go requires it, we should either refactor/choose a dependency that supports the repo’s existing Go version or make a deliberate workspace-wide toolchain change separately.
  • This adds a new third-party dependency for an optional output format. I audited the downloaded module and didn’t see obvious runtime network/process/unsafe behavior, and go list -deps only showed the package itself as a non-stdlib runtime import, but this repo is intentionally dependency-light and the existing table/json/markdown/toon formatters do not add format-specific third-party deps. The cost/benefit needs maintainer buy-in, especially since the dependency is new and from the PR author’s org.
  • No tests were added for FormatGCF, format(gcf), CLI wiring, count/empty output, or array/special-character behavior. Existing formats have coverage in oq/oq_test.go; this should match that bar.
  • The TOON semicolon issue cited in the PR body is not fixed by this PR. If TOON array elements containing ; are ambiguous, we should add a regression test and fix TOON directly rather than use it only as rationale for a new format.
  • CLI docs/help are incomplete: the long command help and cmd/openapi/commands/openapi/README.md still only list table, json, markdown, and toon.

Validation run:

  • go test ./oq — failed before compile due to the go.work / go.mod version mismatch.
  • go list -m all — same failure.
  • GOWORK=off go test ./oq — passed, after downloading github.com/blackwell-systems/gcf-go v1.2.1.
  • GOWORK=off go list -deps -f '{{if not .Standard}}{{.ImportPath}}{{end}}' github.com/blackwell-systems/gcf-go — only returned github.com/blackwell-systems/gcf-go.

Comment thread go.mod Outdated
Comment thread go.mod Outdated
Comment thread oq/format.go Outdated
@blackwell-systems

Contributor Author

Thanks for the thorough review. Addressed everything:

  • Go version: reverted to 1.25.0.
  • Tests: added 7 tests matching existing coverage.
  • Code comment: stripped benchmark numbers, kept factual.
  • Docs: updated both READMEs and CLI help text.
  • Dependency: marked as direct.

On the TOON semicolon issue: after investigating, a proper fix would require changing the output contract of the TOON format (the ; delimiter is ambiguous when elements contain ;). Leaving this to your discretion.

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 7 files (changes from recent commits).

Comment thread oq/format.go Outdated
Comment thread oq/oq_test.go Outdated
@TristanSpeakEasy

Member

Dependency audit follow-up for github.com/blackwell-systems/gcf-go@v1.2.2, specifically checking whether OpenAPI data could be sent to a third party or processed remotely:

  • Verified the PR only imports the root package as github.com/blackwell-systems/gcf-go and calls gcf.EncodeGeneric(...) from oq.FormatGCF.
  • Audited the v1.2.2 tag in a temporary clone at commit 009f87b947ef9c0d9f8a1efb20554e2adb887d21.
  • The imported root package direct imports are local-only stdlib packages: crypto/sha256, encoding/json, fmt, io, math, reflect, regexp, sort, strconv, strings, sync, unicode, unicode/utf8.
  • The EncodeGeneric path builds output in memory with strings.Builder and returns a string. It does not accept a network/file/process sink and does not perform HTTP, DNS/socket, subprocess, filesystem, or environment access.
  • Repo-wide network/API/process references exist in eval/*_test.go benchmark/evaluation code, not in the imported runtime package. The only other compiled package is cmd/gcf, which imports os for CLI stdin/stdout/file handling and is not imported by this PR.
  • go.mod still lists gopkg.in/yaml.v3 as an indirect dependency in gcf-go, but rg 'gopkg.in/yaml|yaml\.' found no source usage at v1.2.2.
  • go test ./... in the dependency passed in the temporary clone.

Conclusion: I do not see a code path in the dependency used by this PR that can send OpenAPI/query data to a third party; the imported encoder appears to process locally in memory only.
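
A source-level import check of the kind described in this audit can also be scripted. The following is a rough sketch, not the procedure actually used in the review: nonStdImports is a hypothetical helper, and it relies on the heuristic that an import path whose first element contains a dot is not part of the standard library.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// nonStdImports parses Go source and returns the import paths whose first
// path element contains a dot (heuristic for "not in the standard library").
func nonStdImports(src string) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	f, err := parser.ParseFile(fset, "src.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, imp := range f.Imports {
		// Path.Value is the quoted literal as written in the source.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		first, _, _ := strings.Cut(path, "/")
		if strings.Contains(first, ".") {
			out = append(out, path)
		}
	}
	return out, nil
}

func main() {
	// Illustrative source text only; not taken from gcf-go.
	src := `package demo

import (
	"crypto/sha256"
	"encoding/json"

	"gopkg.in/yaml.v3"
)
`
	paths, err := nonStdImports(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths)
}
```

A check like this only sees direct imports in the files it is given, so it complements rather than replaces go list -deps, which resolves the transitive import graph.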

@TristanSpeakEasy TristanSpeakEasy left a comment

Member

Follow-up after re-reviewing the latest branch, validating the current cubic findings, and auditing github.com/blackwell-systems/gcf-go@v1.2.2:

I think this is close and I’m no longer concerned about the dependency sending OpenAPI/query data to a third party. The imported gcf.EncodeGeneric(...) path processes in memory and returns a string; I did not find HTTP/DNS/socket, subprocess, filesystem, environment, or callback-sink behavior in the imported runtime package.

Recommendation before approval:

  1. Please rebase/resolve the current conflict with main. The TOON finding is valid against this PR branch’s raw JSON fallback, but main already has the safer semicolon-array fix from #217 (toonArrayValue/toonQuote). After rebasing, this PR should keep the main implementation rather than reintroducing the JSON fallback.
  2. Please address the remaining GCF test cleanup: TestFormatGCF_SpecialChars currently does not exercise special characters despite its name. Either rename it to reflect what it checks or add a real special-character case.
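
The internals of toonArrayValue/toonQuote from #217 are not shown in this thread. As a generic illustration of why quoting each element removes the semicolon ambiguity, here is a sketch using Go string quoting; quoteJoin and quoteSplit are hypothetical names and do not reflect the TOON output contract:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// quoteJoin is a hypothetical encoder: every element becomes a Go-quoted
// string, so a ";" inside an element always sits inside quotes.
func quoteJoin(elems []string) string {
	quoted := make([]string, len(elems))
	for i, e := range elems {
		quoted[i] = strconv.Quote(e)
	}
	return strings.Join(quoted, ";")
}

// quoteSplit reverses quoteJoin by consuming one quoted string at a time
// and requiring a ";" between consecutive elements.
func quoteSplit(s string) ([]string, error) {
	if s == "" {
		return nil, nil
	}
	var out []string
	for {
		prefix, err := strconv.QuotedPrefix(s)
		if err != nil {
			return nil, err
		}
		elem, err := strconv.Unquote(prefix)
		if err != nil {
			return nil, err
		}
		out = append(out, elem)
		s = s[len(prefix):]
		if s == "" {
			return out, nil
		}
		if s[0] != ';' {
			return nil, fmt.Errorf("expected ';' separator, got %q", s[0])
		}
		s = s[1:]
	}
}

func main() {
	in := []string{"v1;deprecated", "v2;current"}
	enc := quoteJoin(in)
	out, err := quoteSplit(enc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("encoded=%s elements=%d\n", enc, len(out))
}
```

With the same input as the PR description, the round trip returns the original two elements, which is the property a TOON regression test for this case would need to assert.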

Targeted validation run locally:

  • mise test -run 'TestExecute_ToonEscape_SpecialChars|TestFormatToon_SpecialChars|TestFormatGCF_SpecialChars|TestFormatGCF_Success|TestFormatGCF_Count_Success|TestFormatGCF_Groups_Success|TestFormatGCF_Empty_Success|TestFormatGCF_Explain|TestFormatGCF_InlinePipeline' ./oq — passed.

Once the branch is rebased and the misleading GCF test is cleaned up, I expect this should be ready to approve.

@blackwell-systems

Contributor Author

Rebased on main (picks up #217's toonArrayValue fix). Also renamed TestFormatGCF_SpecialChars to TestFormatGCF_BoolAndIntFields and added a real special-characters test using hash/location fields.

@TristanSpeakEasy TristanSpeakEasy left a comment

Member

Final re-review looks good to me.

The previous blockers have been addressed:

  • PR is now mergeable against main.
  • Go directive is back to the repo-supported version.
  • gcf-go is a direct dependency at v1.2.2.
  • The comment is neutral/factual.
  • GCF docs/help/tests were added.
  • The prior TOON conflict is resolved by keeping the main semicolon-array implementation.
  • The GCF special-character test feedback is addressed with a real hash/location case.

Dependency audit note still stands: I do not see a path in the imported gcf.EncodeGeneric(...) runtime code that sends OpenAPI/query data to a third party; it processes in memory and returns a string.

Validation:

  • mise test -run 'TestFormatGCF|TestExecute_Format_Success|TestParse_NewSyntax_Success|TestExecute_ToonEscape_SpecialChars|TestFormatToon_SpecialChars' ./oq — passed.
  • PR checks currently report no failing or pending checks.

@TristanSpeakEasy TristanSpeakEasy enabled auto-merge (squash) June 22, 2026 00:12
@TristanSpeakEasy TristanSpeakEasy merged commit 3256ef6 into speakeasy-api:main Jun 22, 2026
12 checks passed

codecov Bot commented Jun 22, 2026

Codecov Report

❌ Patch coverage is 93.54839% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| oq/parse.go | 0.00% | 1 Missing and 1 partial ⚠️ |
