feat: add JSON output format via --format flag #4
Add `--format json` flag to output structured JSON instead of plain text. The JSON output includes:

- `summary`: source path, total files/size, patterns, max file size
- `tree`: full directory tree as nested objects with name, path, type, size
- `files`: flat array of all processed files with path, size, type, and content (when available)
- `git_info`: repository metadata when processing a Git URL

This enables programmatic consumption of pathdigest output by tools, CI pipelines, and LLM integrations that need structured data.

Usage:

```
pathdigest ./my-project --format json
pathdigest ./my-project -f json -o digest.json
```

The default format remains "text" for backward compatibility.

Made-with: Cursor
Pull request overview
Adds a machine-readable JSON output mode to pathdigest so external tools (CI, editor integrations, MCP/LLM tooling) can consume digests programmatically.
Changes:
- Introduces JSON schema/types and `(*Result).FormatJSON()` to serialize summary/tree/files (+ optional git info).
- Adds a `--format`/`-f` flag (default `text`) and routes CLI output to either JSON or the existing text formatting.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| internal/digest/json.go | Adds JSON output structs + traversal/serialization helpers to emit summary/tree/files/git_info. |
| cmd/root.go | Adds --format flag and switches CLI output logic between text and JSON. |
```go
TotalSizeHuman  string   `json:"total_size_human"`
ExcludePatterns []string `json:"exclude_patterns,omitempty"`
IncludePatterns []string `json:"include_patterns,omitempty"`
MaxFileSize     int64    `json:"max_file_size,omitempty"`
```
`include_patterns` / `exclude_patterns` are tagged with `omitempty`, so the fields disappear when empty rather than being encoded as empty arrays. The PR description's schema shows these as arrays (possibly empty), so consider removing `omitempty` to keep the JSON schema stable for consumers.
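One subtlety when acting on this: in Go's `encoding/json`, removing `omitempty` alone does not guarantee an array. A nil slice encodes as `null`; only a non-nil empty slice encodes as `[]`. The standalone sketch below shows the three cases using hypothetical struct names; it assumes nothing about pathdigest's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withOmitEmpty mirrors the tag used on the pattern fields in the diff.
type withOmitEmpty struct {
	IncludePatterns []string `json:"include_patterns,omitempty"`
}

// withoutOmitEmpty is the same field with omitempty removed.
type withoutOmitEmpty struct {
	IncludePatterns []string `json:"include_patterns"`
}

// encode marshals v and returns the JSON text, panicking on error
// (acceptable for a demo; real code should return the error).
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(withOmitEmpty{}))                               // {}
	fmt.Println(encode(withoutOmitEmpty{}))                            // {"include_patterns":null}
	fmt.Println(encode(withoutOmitEmpty{IncludePatterns: []string{}})) // {"include_patterns":[]}
}
```

So a stable `[]` likely also requires initializing the slices (for example, replacing a nil slice with `[]string{}`) before marshaling.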
```go
if node.Content != "" {
	f.Content = node.Content
}
*files = append(*files, f)
```
For `NodeTypeFile`, `content` is only set when `node.Content != ""`, and the field is also tagged `omitempty`. This means empty files (and read failures that leave `Content` empty) produce file entries without a `content` field, making it ambiguous for JSON consumers; consider always emitting `content` for processed text files (or using a `*string` / explicit error indicator).
```go
func (r *Result) FormatJSON(opts IngestionOptions) ([]byte, error) {
	output := JSONOutput{
		Summary: JSONSummary{
			Source:         opts.Source,
			TotalFiles:     r.TotalFiles,
			TotalSize:      r.TotalSize,
			TotalSizeHuman: formatBytes(r.TotalSize),
```
`FormatJSON` introduces a new output contract but there are no unit tests covering the JSON structure/fields. Since `internal/digest` already has tests, add tests that marshal/unmarshal the output and assert key fields (summary counts/sizes, tree paths, files content omission rules, `git_info` presence) to prevent breaking changes.
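As a rough illustration of the assertions this comment asks for, here is a standalone checker that decodes digest JSON and verifies a few contract points. The `digestShape` type and `checkDigest` helper are hypothetical, and the `total_files == len(files)` invariant is an assumption about the schema. In a real `internal/digest` test, the input would come from `(*Result).FormatJSON` and failures would be reported through `t.Fatal`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// digestShape is a hypothetical subset of the schema shown in the PR
// description; decoding into it checks field names and types.
type digestShape struct {
	Summary struct {
		Source     string `json:"source"`
		TotalFiles int    `json:"total_files"`
		TotalSize  int64  `json:"total_size"`
	} `json:"summary"`
	Tree  []json.RawMessage `json:"tree"`
	Files []struct {
		Path    string  `json:"path"`
		Content *string `json:"content"` // nil if the key is missing or null
	} `json:"files"`
}

// checkDigest returns the first contract violation it finds, or nil.
// wantGit says whether a git_info object is expected (Git URL input).
func checkDigest(data []byte, wantGit bool) error {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(data, &top); err != nil {
		return fmt.Errorf("invalid JSON: %w", err)
	}
	for _, key := range []string{"summary", "tree", "files"} {
		if _, ok := top[key]; !ok {
			return fmt.Errorf("missing top-level key %q", key)
		}
	}
	if _, ok := top["git_info"]; ok != wantGit {
		return fmt.Errorf("git_info present=%v, want %v", ok, wantGit)
	}
	var d digestShape
	if err := json.Unmarshal(data, &d); err != nil {
		return fmt.Errorf("schema mismatch: %w", err)
	}
	// Assumption: total_files counts the entries in the flat files array.
	if d.Summary.TotalFiles != len(d.Files) {
		return fmt.Errorf("total_files=%d but files has %d entries",
			d.Summary.TotalFiles, len(d.Files))
	}
	for _, f := range d.Files {
		if f.Content == nil {
			return fmt.Errorf("file %q has no content field", f.Path)
		}
	}
	return nil
}

func main() {
	sample := []byte(`{"summary":{"source":".","total_files":1,"total_size":12},
		"tree":[],"files":[{"path":"main.go","content":"package main"}]}`)
	fmt.Println(checkDigest(sample, false))       // <nil>
	fmt.Println(checkDigest(sample, true) != nil) // true: git_info expected but absent
}
```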
```go
	fmt.Fprintf(os.Stderr, "Error formatting JSON output: %v\n", errJSON)
	os.Exit(1)
}
outputContent = string(jsonBytes)
```
`outputContent = string(jsonBytes)` forces an extra copy of the entire JSON payload, and the file-write path later converts it back to `[]byte`. For large outputs, keep JSON as `[]byte` and write it directly to stdout/file to avoid these allocations.
```go
fmt.Fprintln(os.Stderr, "\n--- Summary ---")
fmt.Fprint(os.Stderr, ingestResult.Summary)
if outputFormat != "json" {
	ingestResult.FormatOutput(opts)
```
`FormatOutput` is invoked twice in the text path (once to build `outputContent`, and again before printing the summary). This re-traverses the tree and rebuilds large strings unnecessarily; call it once and reuse `ingestResult.Summary`/`TreeStructure`/`FileContents`.
```go
	ingestResult.FormatOutput(opts)
	outputContent = ingestResult.TreeStructure + "\n" + ingestResult.FileContents
}
```
In text mode, `outputContent = ingestResult.TreeStructure + "\n" + ingestResult.FileContents` creates an additional full copy of the digest in memory. For large repos, prefer writing `TreeStructure` and `FileContents` directly to the selected writer (stdout/file) instead of concatenating.
```go
} else {
	fmt.Println(ingestResult.TreeStructure)
	fmt.Println(ingestResult.FileContents)
	fmt.Println(outputContent)
```
`fmt.Println(outputContent)` always appends an extra newline. For `--format json` this can be undesirable when piping to tools that expect exact JSON without trailing output; use `fmt.Print`/`os.Stdout.Write` to emit the JSON bytes as-is.
```diff
-fmt.Println(outputContent)
+fmt.Print(outputContent)
```
```go
if outputFormat == "json" {
	jsonBytes, errJSON := ingestResult.FormatJSON(opts)
	if errJSON != nil {
		fmt.Fprintf(os.Stderr, "Error formatting JSON output: %v\n", errJSON)
		os.Exit(1)
	}
	outputContent = string(jsonBytes)
} else {
```
`--format` values other than `"json"` silently fall back to text output. Consider validating the flag (accept only `"text"`/`"json"`, optionally case-insensitive) and returning a clear error for unsupported values to avoid surprising CLI behavior.
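A minimal sketch of such validation. `parseFormat` is a hypothetical helper, and case-insensitive matching is an optional choice. Where it is called (for example, at the start of the command's run function) depends on the CLI wiring, and the sketch assumes no particular flag library.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseFormat normalizes a --format value and rejects anything other than
// "text" or "json". Matching is case-insensitive and ignores surrounding spaces.
func parseFormat(raw string) (string, error) {
	switch f := strings.ToLower(strings.TrimSpace(raw)); f {
	case "text", "json":
		return f, nil
	default:
		return "", fmt.Errorf("unsupported --format %q (want \"text\" or \"json\")", raw)
	}
}

func main() {
	for _, v := range []string{"json", "TEXT", "yaml"} {
		f, err := parseFormat(v)
		if err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			continue
		}
		fmt.Println("format:", f)
	}
}
```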
- Validate `--format` flag (reject unsupported values like `yaml`)
- Remove duplicate `FormatOutput` call in text path
- Write JSON bytes directly to file/stdout (avoid string round-trip)
- Use `fmt.Print` for JSON stdout (no trailing newline from `Println`)
- Remove `omitempty` from `exclude_patterns`/`include_patterns` (stable JSON schema)
- Always include `content` field for files (eliminate empty-file ambiguity)
- Extract `writeOutputFile` helper to reduce duplication
- Add comprehensive tests for `FormatJSON` (6 test cases)
Summary
Add structured JSON output format via `--format json` (`-f json`) flag, enabling programmatic consumption of pathdigest output by tools, CI pipelines, and LLM integrations.

Motivation
The README mentions JSON output as "in progress". This PR implements it. Structured JSON output is essential for:
Usage
JSON Schema
```json
{
  "summary": {
    "source": "/path/to/project",
    "total_files": 42,
    "total_size": 123456,
    "total_size_human": "120.6 KB",
    "exclude_patterns": ["node_modules/", ".git/"],
    "include_patterns": [],
    "max_file_size": 10485760
  },
  "tree": [
    {
      "name": "src",
      "path": "src",
      "type": "directory",
      "size": 4096,
      "children": [...]
    }
  ],
  "files": [
    {
      "path": "src/main.go",
      "size": 1234,
      "type": "file",
      "content": "package main\n..."
    }
  ],
  "git_info": {
    "repo_url": "https://github.com/user/repo.git",
    "branch": "main"
  }
}
```

Changes
- `internal/digest/json.go`: New file with JSON types and `FormatJSON` method
- `cmd/root.go`: Add `--format`/`-f` flag (default: `text`), route to JSON or text formatting

Test plan
- `go build ./cmd/pathdigest` compiles successfully
- `pathdigest . -f json -o -` produces valid JSON
- `pathdigest . -f text` maintains backward-compatible text output
- `git_info` in JSON output