feat(#752): Embed LLM traces in sync log entries by niti-go · Pull Request #784 · promptdriven/pdd

niti-go · 2026-04-19T23:58:49Z

Summary

Closes #752

Store LLM trace data in sync log entries so developers can see every prompt, response, and reasoning step when a pdd operation gives unexpected results.

It will be helpful to have a standardized way of storing LLM internal thinking processes so we can reference them when iterating upon pdd agentic LLM prompts.

Before this PR, the only method to view agentic traces was to manually dig into provider-specific folders to search for session files (~/.claude/projects/, ~/.gemini/tmp/). And for non-agentic pdd operations, the LLM input/output pairs were not saved entirely.

Move sync log from .pdd/meta/ to .pdd/logs/ (gitignored). Migrate legacy files automatically on read and write.
llm_traces: Every LiteLLM operation (generate, test, example, fix) now records all prompt/response pairs in an llm_traces list on the log entry — on both success and failure. Each item has prompt, response, model, and thinking (string or null).
agentic_trace: When a CLI agent runs (Claude, Gemini, Codex), the log entry gets an agentic_trace pointer to the provider's session file on disk.
Session discovery: Claude (construct path from session_id), Gemini (glob by first-8-chars of UUID via projects.json slug), Codex (filesystem snapshot-and-diff). Best-effort, never raises, never warns.
Truncation & redaction: Fields capped at 20k chars. Bearer tokens, api_key, token, secret, password scrubbed from all text fields — including JSON-quoted values like "api_key": "sk-...".
Thinking normalization: List-of-objects, int, None all normalized to string-or-null.
Core dump: No longer embeds *_sync.log in file_contents. Adds sync_log_refs array. Reads sync_steps directly from disk.
Old agentic logging removed: _log_agentic_interaction, AGENTIC_LOG_DIR, _AGENTIC_SESSION_ID deleted. .pdd/agentic-logs/ added to .gitignore.
Thread safety: _last_agentic_trace changed from plain global to ContextVar, matching llm_trace.py's pattern.
Stale trace drain: Pre-operation cleanup now drains both LLM pairs and agentic trace to prevent leakage from a prior exception path.
Migration race safety: Fast path uses os.rename (atomic on POSIX) with FileNotFoundError/OSError handling for concurrent access. Append path uses a single r+b file handle to check trailing newline and write in one shot, eliminating TOCTOU.
ContextVar default fix: _pairs_by_operation default changed from {} (shared mutable) to None to prevent cross-context mutation.
pop_last_pair non-destructive: Backward-compat alias now peeks at the last pair without draining the list, so it doesn't destroy traces that pop_all_pairs expects to collect.
Import cleanup: Two mid-file import logging as ... in agentic_common.py consolidated to a single top-level import logging.

Files changed

File	What changed
`pdd/core/llm_trace.py`	List of pairs per op, `thinking` field, `pop_all_pairs`, JSON-quoted redaction pattern, `ContextVar` default `None` instead of `{}`, `pop_last_pair` is now non-destructive peek
`pdd/llm_invoke.py`	Pass `thinking` to `record_llm_pair` at all 3 call sites
`pdd/operation_log.py`	New `.pdd/logs/` path, migration via atomic `os.rename` with race-safe fallback, `r+b` newline guard
`pdd/sync_orchestration.py`	Attach `llm_traces` and `agentic_trace` after each op, pre-op drain of stale agentic trace
`pdd/agentic_common.py`	Session discovery, 4-tuple return, old logging removed, `_last_agentic_trace` as `ContextVar`, top-level `import logging`
`pdd/core/dump.py`	`sync_log_refs`, read steps from disk, stop embedding logs
`.gitignore`	`.pdd/meta/*_sync.log`, `.pdd/logs/`, `.pdd/agentic-logs/`

Tests

`test_llm_traces.py` — 90 mocked tests

No LLM calls. Fast, deterministic, runs in CI.

python -m pytest tests/test_llm_traces.py -v

Section	Tests	What they cover
Log location & migration (6)	`test_new_write_path`, `test_migration_on_write`, `test_migration_on_read`, `test_already_migrated`, `test_only_sync_logs_migrate`, `test_directory_creation_safe_twice`	`get_log_path()` returns `.pdd/logs/`, legacy files move from `.pdd/meta/`, fingerprint and run report stay put
LLM trace structure (7)	`test_generate_gets_llm_traces_list`, `test_crash_retries_collect_all`, `test_trace_item_schema`, `test_traces_on_success_and_failure`, `test_failed_operation_keeps_all_traces`, `test_thinking_populated`, `test_thinking_null_when_absent`	Multiple pairs accumulate per operation, thinking is string or null, traces kept on failure
Agentic trace structure (4)	`test_claude_discovery_returns_correct_schema`, `test_gemini_discovery_returns_json_format`, `test_codex_discovery_returns_correct_schema`, `test_both_llm_traces_and_agentic_trace_on_same_entry`	Each discovery function returns correct `{session_file, provider, format}` shape; sync_orchestration attaches both `llm_traces` and `agentic_trace` to the same entry
Entries without traces (3)	`test_skip_entry_has_no_trace_keys`, `test_error_decision_entry_has_no_trace_keys`, `test_event_logged_via_log_event_has_no_trace_keys`	Runs sync_orchestration with skip/error decisions and `log_event()` directly — verifies logged entries have no `llm_traces` or `agentic_trace`
Truncation & redaction (4)	`test_prompt_truncation`, `test_response_truncation`, `test_thinking_truncation`, `test_secret_redaction`	Fields capped at 20k chars, Bearer/api_key/token/secret/password scrubbed
Session discovery: Claude (9)	`test_finds_session_by_id`, `test_slug_has_leading_dash`, `test_slug_with_spaces`, `test_deep_path`, `test_missing_session_id`, `test_file_not_found`, `test_missing_claude_dir`, `test_empty_session_file`, `test_parallel_safety`	Path construction from session_id, slug format, all failure modes return None
Session discovery: Gemini (12)	`test_happy_path`, `test_falls_back_to_jsonl`, `test_prefers_json_over_jsonl`, `test_multiple_json_picks_newest`, `test_uses_first_8_chars`, `test_slug_with_special_chars`, `test_rejects_subdirectory`, `test_bad_projects_json_schema`, `test_missing_projects_json`, `test_no_session_id`, `test_corrupt_projects_json`, `test_cwd_not_in_projects`, `test_parallel_safety`	Glob by UUID prefix, projects.json lookup, .json/.jsonl fallback, all failure modes
Session discovery: Codex (7)	`test_happy_path_one_new_file`, `test_multiple_new_files`, `test_pre_existing_files_ignored`, `test_respects_codex_home`, `test_missing_codex_dir`, `test_only_rollout_files`, `test_no_new_files`	Snapshot-and-diff of rollout files, CODEX_HOME env var, all failure modes
Session discovery: shared (6)	`test_never_raises_claude`, `test_never_raises_gemini`, `test_never_raises_codex`, `test_no_warnings`, `test_failed_discovery_does_not_block_logging`, `test_run_with_provider_sets_last_agentic_trace`	PermissionError returns None, only DEBUG logs, None discovery still writes entry without `agentic_trace` (via sync_orchestration), `_run_with_provider` stores trace in module state
Core dump (5)	`test_reads_sync_steps_from_logs_dir`, `test_includes_sync_log_refs`, `test_does_not_embed_sync_logs`, `test_sync_steps_excludes_trace_data`, `test_falls_back_to_meta_for_unmigrated`	Steps read from disk not file_contents, sync_log_refs present, trace data excluded from steps
Dry-run (3)	`test_reads_from_new_location`, `test_triggers_migration`, `test_does_not_show_trace_content`	`_display_sync_log` uses `get_log_path()`, doesn't print prompt/response text
Multi-operation sync (3)	`test_each_operation_gets_own_traces`, `test_no_trace_leakage`, `test_mixed_entries`	Operations get only their own traces, `pop_all_pairs` drains correctly
pop_all_pairs semantics (3)	`test_returns_all_recorded`, `test_empty_when_nothing_recorded`, `test_scoped_by_operation`	Returns all then empty, scoped by operation name
Concurrent sync (1)	`test_separate_log_files`	Two modules write to separate logs, no cross-contamination
Old agentic log removal (4)	`test_log_agentic_interaction_removed`, `test_agentic_log_dir_removed`, `test_agentic_session_id_removed`, `test_no_agentic_logs_dir_created`	Deleted symbols are gone from the module
Redaction on all fields (2)	`test_response_redacted`, `test_thinking_redacted`	Secrets scrubbed from response and thinking, not just prompt
Thinking normalization (3)	`test_thinking_list_of_objects`, `test_thinking_unexpected_type`, `test_thinking_none`	List of dicts joined to string, int to str, None to null
JSON-quoted secret redaction (4)	`test_json_api_key_redacted`, `test_json_token_redacted`, `test_json_secret_in_response`, `test_json_secret_in_thinking`	`"api_key": "sk-..."` style JSON values redacted across prompt, response, and thinking fields
Agentic trace thread safety (2)	`test_agentic_trace_is_context_var`, `test_agentic_trace_isolated_across_threads`	`_last_agentic_trace` is a `ContextVar`, traces set in one thread are invisible in another
Append migration newline boundary (1)	`test_append_migration_preserves_newline_boundary`	When both old and new sync logs exist and new file lacks trailing newline, merged result has valid JSONL (no corrupted lines)

`test_sync_orchestration.py` — 4 new tests (1 mocked + 3 real LLM)

The mocked test runs with all other sync orchestration tests. The 3 real LLM tests are gated by PDD_RUN_REAL_LLM_TESTS=1.

# Mocked (runs with the rest of the file):
python -m pytest tests/test_sync_orchestration.py -v -k "attaches_llm_trace"

# Real LLM:
pip install -e .
PDD_RUN_REAL_LLM_TESTS=1 python -m pytest tests/test_sync_orchestration.py -v -k "e2e_litellm or e2e_skip or e2e_legacy"

Test	Type	What it does
`test_sync_orchestration_attaches_llm_trace_on_failed_operation`	Mocked	Mocks `pop_all_pairs` and `code_generator_main`, verifies failed generate entry has `llm_traces`
`test_e2e_litellm_sync_traces`	Real LLM	Runs one `pdd sync`. Checks: log in `.pdd/logs/` not `.pdd/meta/`, generate entry has `llm_traces` with valid prompt/response/model/thinking, example has traces, model field is recognizable, every traced operation has >= 2 items (no orphaned calls), fingerprint intact
`test_e2e_skip_dryrun_coredump`	Real LLM	Syncs twice (second skips), then runs `--dry-run` and `--core-dump`. Checks: skip entries have no traces, dry-run shows operation names. Unconditionally asserts core dump file exists, contains `sync_log_refs`, and all refs point to `.pdd/logs/`
`test_e2e_legacy_sync_log_migrates`	Real LLM	Plants a legacy log in `.pdd/meta/`, syncs. Checks: old file gone, new file in `.pdd/logs/`, old entries preserved

`test_llm_traces_e2e.py` — 3 agentic tests (3 providers x 1 test)

Parametrized over [anthropic, google, openai]. Skips if the provider's CLI isn't installed.

# All providers:
pip install -e .
PDD_RUN_AGENTIC_TESTS=1 python -m pytest tests/test_llm_traces_e2e.py -v

# Single provider:
PDD_RUN_AGENTIC_TESTS=1 python -m pytest tests/test_llm_traces_e2e.py -v -k "google"
PDD_RUN_AGENTIC_TESTS=1 python -m pytest tests/test_llm_traces_e2e.py -v -k "anthropic"
PDD_RUN_AGENTIC_TESTS=1 python -m pytest tests/test_llm_traces_e2e.py -v -k "openai"

Test	What it does
`test_agentic_sync_clean_path`	Runs `pdd sync --agentic` on a fresh project. Asserts: log in `.pdd/logs/`, generate entry has `llm_traces`, any `agentic_trace` entries have valid session file on disk, provider-specific path patterns (Claude `/.claude/projects/`, Gemini `/.gemini/tmp/.../chats/`, Codex `jsonl`)

Test results

Suite	Result
`test_llm_traces.py` (90 mocked)	all passed
`test_sync_orchestration.py` mocked trace test	passed
`test_e2e_litellm_sync_traces`	passed
`test_e2e_skip_dryrun_coredump`	passed
`test_e2e_legacy_sync_log_migrates`	passed
`test_agentic_sync_clean_path[google]`	passed
`test_agentic_*[anthropic]`	skipped (no Claude CLI)
`test_agentic_*[openai]`	skipped (no Codex CLI)

Existing test suites — all green

Suite	Result
`test_operation_log.py`	27 passed
`test_core_dump.py`	28 passed
`test_agentic_common.py`	198 passed
`test_llm_invoke.py`	261 passed

Pre-existing test failures (not related to this PR)

This PR has 10 CI failures, but these tests also fail on upstream main (Unit Tests run 24949525104). These failures are unrelated to this PR, but described below so they can be fixed on main:

Group 1: `test_ci_detect_changed_modules.py` — 5 failures

Root cause: Test loader references scripts/ci_detect_changed_modules.py but the file lives at pdd/ci_detect_changed_modules.py. Wrong path in the test.

Test	Error
`test_basename_from_nested_code_paths`	`FileNotFoundError: scripts/ci_detect_changed_modules.py`
`test_basename_from_nested_context_and_tests`	Same
`test_reverse_dep_detects_recursive_nested_include`	Same
`test_reverse_dep_detects_include_many`	Same
`test_detect_combines_nested_direct_and_reverse_dependencies`	Same

Fix (not this PR): Change "scripts" to "pdd" on line 11 of the test.

Group 2: `test_agentic_e2e_fix_orchestrator.py` — 3 failures

Root cause: Tests assert the orchestrator prompt documents safety features (_detect_meaningful_changes, _is_noise_path, cycle_start_hashes), but the prompt file was never updated to include these tokens.

Test	Missing token
`test_orchestrator_prompt_documents_direct_edit_guard`	`_detect_meaningful_changes`, `has_direct_edits`
`test_orchestrator_prompt_documents_noise_filter`	`_is_noise_path`, `__pycache__`, `.pytest_cache`, `node_modules`
`test_orchestrator_prompt_documents_cycle_waste_breaker`	`cycle_start_hashes`, `current_cycle > 1`

Fix (not this PR): Add the required documentation tokens to prompts/agentic_e2e_fix_orchestrator_python.prompt.

Group 3: `test_issue_1240_generate_prompt_meta_framing.py` — 2 failures

Root cause: Tests expect utils/mcp/prompts/generate_prompt.prompt and utils/mcp/prompts/*_python.prompt files, but the MCP prompts directory was never committed to main.

Test	Error
`test_mcp_generate_prompt_template_is_not_meta_framed`	`FileNotFoundError: utils/mcp/prompts/generate_prompt.prompt`
`test_mcp_checked_in_python_prompts_are_not_meta_framed`	`AssertionError: no *_python.prompt files found`

Fix (not this PR): Commit the MCP prompt files or add skip guards for when the directory is missing.

…n#752) - Move sync log from .pdd/meta/ to .pdd/logs/ (gitignored); migrate legacy files on both read and write inside get_log_path() - Store all LLM prompt/response pairs per operation in llm_traces list (was: only last pair, only on failure) - Add thinking field to LLMTracePair; normalize list-of-objects, int, and None inputs to string-or-null before storing - Rename pop_last_pair to pop_all_pairs (backward-compat alias kept) - Pass thinking from LiteLLM responses to record_llm_pair at all 3 call sites in llm_invoke.py - Truncate prompt/response/thinking at 20k chars; redact Bearer tokens, api_key, token, secret, password from all text fields - Add agentic_trace pointer (session_file, provider, format) to log entries when a CLI agent runs; replaces inlined session content - Add session discovery for Claude (construct path from session_id), Gemini (glob by first-8-chars of UUID via projects.json slug), and Codex (filesystem snapshot-and-diff of rollout files) - Session discovery is best-effort: never raises, never warns, returns None on any failure, logs at DEBUG only - Remove _log_agentic_interaction, AGENTIC_LOG_DIR, _AGENTIC_SESSION_ID from agentic_common.py; add .pdd/agentic-logs/ to .gitignore - Core dump no longer embeds *_sync.log in file_contents; adds sync_log_refs array and reads sync_steps directly from disk - _display_sync_log (--dry-run) now uses get_log_path() instead of hardcoded .pdd/meta/ path - Add .pdd/meta/*_sync.log, .pdd/logs/, .pdd/agentic-logs/ to .gitignore - Add 82 Tier 1 mocked tests (sections A-Q) in test_llm_traces.py - Add Tier 2 E2E tests (sections S-Y) in test_llm_traces_e2e.py, gated by PDD_RUN_REAL_LLM_TESTS and PDD_RUN_AGENTIC_TESTS env vars - Update existing tests for new log path and 4-tuple return type Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The test mocked pop_last_pair but sync_orchestration now calls pop_all_pairs and writes traces to llm_traces (top-level list) instead of details.llm_trace (single pair). Update the mock and assertion to match. Same regression coverage, new API shape. This test was already failing before promptdriven#752 (the drain call on line 2170 consumed the mock before the attach call could use it). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge test_agentic_sync_writes_llm_traces, test_format_matches_provider, test_claude_path_from_session_id, and test_gemini_path_by_prefix into a single test_agentic_sync_clean_path that runs one pdd sync --agentic and checks everything: llm_traces on generate, agentic_trace structure, format/extension match, provider-specific path patterns, and session file exists on disk with content. Cuts agentic clean-path tests from 5 separate syncs to 1 per provider. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test_test_operation_has_traces failed because sync reused existing tests (cost 0, no LLM calls) instead of generating new ones. Only assert llm_traces on entries that actually called an LLM. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The test operation on this minimal greeter project always has cost 0 and no LLM calls — sync runs existing tests rather than generating new ones. The guarded assertion never executed, making it a vacuous pass. Coverage is already provided by test_generate_writes_traces (verifies llm_traces on a real LLM-calling operation) and test_every_llm_operation_has_traces (catches any operation with LLM calls that fails to capture traces). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge 5 separate LiteLLM tests (generate traces, example traces, model field, no log in meta, fingerprint, orphaned calls) into one test_litellm_sync_traces that runs a single pdd sync. Merge skip, dry-run, and core-dump tests into one test_skip_dryrun_coredump that syncs twice then checks all three. Total LLM sync runs drops from 11 to 5. Same assertions, less cost. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LiteLLM trace tests (sync traces, skip/dry-run/core-dump, migration) now live in test_sync_orchestration.py alongside the other sync tests, gated by PDD_RUN_REAL_LLM_TESTS=1. This matches the existing pattern where that file already has an @pytest.mark.e2e test. test_llm_traces_e2e.py now only contains agentic provider-parametrized tests (Claude/Gemini/Codex), gated by PDD_RUN_AGENTIC_TESTS=1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…cation The crash tests tried to force a "crash" operation by breaking code after initial sync, but operation classification is unreliable in this context. The clean-path test already covers the actual feature (traces in sync log). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…on, stale trace leak - Add JSON-quoted-value redaction pattern so "api_key": "sk-..." in LLM traces gets scrubbed (the old regex stopped at quote characters) - Change _last_agentic_trace from plain global to ContextVar for thread safety, matching llm_trace.py's existing pattern - Add pre-operation drain of stale agentic traces in sync_orchestration so an unconsumed trace from a prior exception path can't leak forward - Add defensive newline guard in append migration so JSONL lines don't merge when the target file lacks a trailing newline - Remove unused pop_last_pair import from sync_orchestration - Rewrite Sections C, D, and I.test_failed_discovery tests to exercise actual production code (discovery functions, sync_orchestration) instead of asserting on hand-built dict literals Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… imports, destructive alias - Replace shutil.move with os.rename (atomic on POSIX) for the fast migration path; handle FileNotFoundError/OSError for concurrent access. Append path uses a single r+b file handle to check trailing newline and write in one shot, eliminating TOCTOU between separate read and append. - Change _pairs_by_operation ContextVar default from {} (shared mutable) to None; all call sites already use `or {}` copy pattern. - Move two mid-file `import logging as ...` statements to a single top-level `import logging` in agentic_common.py. - Make pop_last_pair non-destructive: peek at last pair instead of draining all pairs via pop_all_pairs, so backward-compat callers don't silently destroy the trace list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The comment said .pdd/meta was used for "logs and locks". Since sync logs moved to .pdd/logs/, clarify that meta holds fingerprints and run reports while logs/ holds the sync logs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The .pdd directory listing said meta/ held sync logs. Since promptdriven#752 moved them to .pdd/logs/ (gitignored), update the listing and the version control note accordingly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-04-27T00:41:18Z

Greptile Summary

This PR embeds LLM traces (all prompt/response pairs per operation) and agentic session pointers directly into sync log entries, and moves the sync log from .pdd/meta/ to .pdd/logs/ with automatic migration. The feature covers LiteLLM and all three CLI providers (Claude, Gemini, Codex), with truncation, redaction, and ContextVar-based thread safety throughout.

Confidence Score: 5/5

Safe to merge; all remaining findings are minor style or edge-case quality issues with no impact on the primary sync path.

The implementation is thorough and well-tested (90 mocked + several real-LLM tests). All three findings are P2: an empty-list thinking normalisation edge case, a cosmetic redaction UX issue, and a partial-write recovery path that silently continues. None affect correctness under normal operation.

pdd/core/llm_trace.py (empty-list thinking normalizes to empty string instead of None), pdd/operation_log.py (silent swallow of write errors in slow-path migration).

Important Files Changed

Filename	Overview
pdd/core/llm_trace.py	Accumulates all LLM pairs per operation instead of keeping only the last; adds thinking normalization and JSON-quoted redaction. Two minor issues: empty-list thinking normalizes to "" instead of None; new redaction pattern strips key names alongside values.
pdd/operation_log.py	Moves sync log from .pdd/meta/ to .pdd/logs/ with best-effort migration. Atomic rename fast path is correct; slow-path append silently swallows write errors without resetting to a clean state, risking JSONL corruption on retry.
pdd/agentic_common.py	Adds session discovery for Claude/Gemini/Codex, ContextVar-based _last_agentic_trace, replaces _log_agentic_interaction with the new trace pointer scheme. All failure paths return None and never raise. 4-tuple return is consistent throughout.
pdd/sync_orchestration.py	Attaches llm_traces and agentic_trace to log entries on both success and failure; pre-op drain correctly flushes stale traces. Import of get_last_agentic_trace is guarded by ImportError for environments without agentic support.
pdd/core/dump.py	Replaces file_contents-based sync log embedding with direct disk reads; adds sync_log_refs. Correctly deduplicates logs_dir vs meta_dir entries and strips trace data from step records.
pdd/llm_invoke.py	Passes thinking output to record_llm_pair at all three call sites (cloud, litellm, retry); best-effort extraction wrapped in try/except at each site.

Sequence Diagram

sequenceDiagram
    participant SO as sync_orchestration
    participant LT as llm_trace (ContextVar)
    participant LI as llm_invoke
    participant AC as agentic_common
    participant OL as operation_log

    SO->>LT: set_current_operation(op)
    SO->>LT: pop_all_pairs(op)  [drain stale]
    SO->>AC: get_last_agentic_trace()  [drain stale]

    SO->>LI: code_generator_main / test / fix
    LI->>LT: record_llm_pair(prompt, response, model, thinking)
    Note over LT: accumulated in _pairs_by_operation[op]

    SO->>AC: run_agentic_task(provider, ...)
    AC->>AC: _run_with_provider()
    AC->>AC: _discover_session()
    AC->>LT: _last_agentic_trace.set(trace)

    SO->>LT: pop_all_pairs(op) -> llm_traces list
    SO->>AC: get_last_agentic_trace() -> agentic_trace

    SO->>OL: append_log_entry(entry + llm_traces + agentic_trace)
    OL->>OL: get_log_path() -> .pdd/logs/ (migrates legacy .pdd/meta/)

Prompt To Fix All With AI

This is a comment left during a code review.
Path: pdd/core/llm_trace.py
Line: 75-86

Comment:
**Empty-list thinking normalizes to `""` instead of `None`**

When `thinking` is `[]` or a list where every dict has empty `"thinking"`/`"text"`/`"content"`, `_normalize_thinking` returns `"\n".join([])` = `""`. Back in `record_llm_pair`, the guard `if thinking_text is not None` is satisfied by `""`, so `LLMTracePair.thinking` is stored as an empty string rather than `None`. Callers doing `thinking is None` to mean "absent" will silently get `False` and treat it as a non-null field with no content.

```suggestion
        return "\n".join(parts) or None
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: pdd/core/llm_trace.py
Line: 30

Comment:
**JSON-quoted redaction drops the key name**

The new pattern replaces the full `"api_key": "sk-..."` span with `<redacted>`, erasing the key name. A prompt snippet like `{"api_key": "sk-abc...", "model": "gpt-4"}` becomes `{<redacted>, "model": "gpt-4"}`, which gives developers no hint that a secret key field was present. Consider preserving the key while only scrubbing the value by using a replacement like `"\1": "<redacted>"`.

This is only a usability concern — the redaction itself is secure either way.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: pdd/operation_log.py
Line: 83-102

Comment:
**Partial write leaves `new_path` corrupted without clearing `old_path`**

If `f.write(old_content)` fails mid-write (e.g., disk full), the broad `except Exception: pass` swallows the error and `old_path.unlink()` never runs. On the next call to `get_log_path`, `old_path` still exists and `new_path` exists with a partial/corrupted tail, so the slow path fires again and appends `old_content` a second time. The partial first write can corrupt the JSONL line at the boundary, not merely duplicate it.

The doc comment acknowledges that "the worst case is a duplicate append (idempotent JSONL lines)" — but that covers the concurrent-process race, not the partial-write case. Logging the failure at DEBUG level (without re-raising) would at least surface the issue without breaking the workflow.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "Update README: sync logs moved from .pdd..." | Re-trigger Greptile}

- Update test_issue_902 mock return values to 4-tuple (added agentic_trace) - Update test_e2e_issue_894 unpacking to 4-tuple for _run_with_provider - Fix _normalize_thinking returning "" instead of None for empty lists - Preserve key names in JSON-quoted secret redaction - Log migration failures at DEBUG instead of silently swallowing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gltanaka

Thanks for taking this on. The motivation here is absolutely useful: having better traces around agentic and LLM-driven workflows would make prompt iteration and debugging much easier.

I am going to request changes because I think this PR currently solves a different problem than #752, and it also removes some existing debug coverage that we still need.

Issue #752 asks for compact, standardized agentic trace files under .pdd/agentic-traces/, with provider-specific raw logs distilled into a common, redacted format. This PR instead stores LiteLLM prompt/response pairs in sync logs and adds best-effort pointers to provider-owned session files. That is useful in places, but it is not the same artifact the issue describes.

A few concerns I would like addressed before merge:

Please preserve or replace the existing agentic failure logging path. Today main records agentic failures via .pdd/agentic-logs/; this PR removes that, while the new provider-session discovery misses cases like timeout, CLI not found, non-zero exit, invalid JSON, and parse failures.
Please wire agentic trace capture at the run_agentic_task() level, not only through sync_orchestration, since bug/change/fix/update/split/checkup workflows also use agentic tasks.
Please generate project-local standardized trace files, preferably under .pdd/agentic-traces/, rather than only storing absolute pointers to provider-private session files.
Please keep the sync-log relocation / migration / core-dump behavior separate unless it is strictly required. That part adds a lot of blast radius and seems independent from the trace feature.

The LiteLLM llm_traces work may be worth splitting into a smaller PR, since collecting all prompt/response pairs for sync operations is useful on its own. For #752 specifically, I would like to see this refocused around the standardized agentic trace artifact described in the issue.

Thanks again for pushing this forward. The direction is valuable; I think it just needs to be realigned with the requested trace format and avoid regressing the existing debug logs.

niti-go closed this Apr 20, 2026

niti-go force-pushed the feature/llm-traces-in-sync-log-752 branch from 2f29976 to da5810a Compare April 20, 2026 19:55

niti-go reopened this Apr 26, 2026

niti-go changed the title ~~feat(#752): embed LLM traces in sync log entries~~ feat(#752): LLM traces in sync log, move logs to .pdd/logs/ Apr 26, 2026

niti-go and others added 5 commits April 25, 2026 23:32

niti-go changed the title ~~feat(#752): LLM traces in sync log, move logs to .pdd/logs/~~ feat(#752): Embed LLM traces in sync log entries Apr 26, 2026

niti-go and others added 3 commits April 26, 2026 19:46

niti-go marked this pull request as ready for review April 27, 2026 00:31

niti-go and others added 2 commits April 26, 2026 20:34

greptile-apps Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread pdd/core/llm_trace.py

Comment thread pdd/core/llm_trace.py Outdated

Comment thread pdd/operation_log.py Outdated

gltanaka requested changes Apr 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(#752): Embed LLM traces in sync log entries#784

feat(#752): Embed LLM traces in sync log entries#784
niti-go wants to merge 13 commits intopromptdriven:mainfrom
niti-go:feature/llm-traces-in-sync-log-752

niti-go commented Apr 19, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Apr 27, 2026

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gltanaka left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

niti-go commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Files changed

Tests

test_llm_traces.py — 90 mocked tests

test_sync_orchestration.py — 4 new tests (1 mocked + 3 real LLM)

test_llm_traces_e2e.py — 3 agentic tests (3 providers x 1 test)

Test results

Existing test suites — all green

Pre-existing test failures (not related to this PR)

Group 1: test_ci_detect_changed_modules.py — 5 failures

Group 2: test_agentic_e2e_fix_orchestrator.py — 3 failures

Group 3: test_issue_1240_generate_prompt_meta_framing.py — 2 failures

Uh oh!

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gltanaka left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

niti-go commented Apr 19, 2026 •

edited

Loading

`test_llm_traces.py` — 90 mocked tests

`test_sync_orchestration.py` — 4 new tests (1 mocked + 3 real LLM)

`test_llm_traces_e2e.py` — 3 agentic tests (3 providers x 1 test)

Group 1: `test_ci_detect_changed_modules.py` — 5 failures

Group 2: `test_agentic_e2e_fix_orchestrator.py` — 3 failures

Group 3: `test_issue_1240_generate_prompt_meta_framing.py` — 2 failures