Skip to content

feat(#752): Embed LLM traces in sync log entries#784

Open
niti-go wants to merge 13 commits intopromptdriven:mainfrom
niti-go:feature/llm-traces-in-sync-log-752
Open

feat(#752): Embed LLM traces in sync log entries#784
niti-go wants to merge 13 commits intopromptdriven:mainfrom
niti-go:feature/llm-traces-in-sync-log-752

Conversation

@niti-go
Copy link
Copy Markdown
Contributor

@niti-go niti-go commented Apr 19, 2026

Summary

Closes #752

Store LLM trace data in sync log entries so developers can see every prompt, response, and reasoning step when a pdd operation gives unexpected results.

It will be helpful to have a standardized way of storing LLM internal thinking processes so we can reference them when iterating upon pdd agentic LLM prompts.

Before this PR, the only method to view agentic traces was to manually dig into provider-specific folders to search for session files (~/.claude/projects/, ~/.gemini/tmp/). And for non-agentic pdd operations, the LLM input/output pairs were not saved entirely.

  • Move sync log from .pdd/meta/ to .pdd/logs/ (gitignored). Migrate legacy files automatically on read and write.
  • llm_traces: Every LiteLLM operation (generate, test, example, fix) now records all prompt/response pairs in an llm_traces list on the log entry — on both success and failure. Each item has prompt, response, model, and thinking (string or null).
  • agentic_trace: When a CLI agent runs (Claude, Gemini, Codex), the log entry gets an agentic_trace pointer to the provider's session file on disk.
  • Session discovery: Claude (construct path from session_id), Gemini (glob by first-8-chars of UUID via projects.json slug), Codex (filesystem snapshot-and-diff). Best-effort, never raises, never warns.
  • Truncation & redaction: Fields capped at 20k chars. Bearer tokens, api_key, token, secret, password scrubbed from all text fields — including JSON-quoted values like "api_key": "sk-...".
  • Thinking normalization: List-of-objects, int, None all normalized to string-or-null.
  • Core dump: No longer embeds *_sync.log in file_contents. Adds sync_log_refs array. Reads sync_steps directly from disk.
  • Old agentic logging removed: _log_agentic_interaction, AGENTIC_LOG_DIR, _AGENTIC_SESSION_ID deleted. .pdd/agentic-logs/ added to .gitignore.
  • Thread safety: _last_agentic_trace changed from plain global to ContextVar, matching llm_trace.py's pattern.
  • Stale trace drain: Pre-operation cleanup now drains both LLM pairs and agentic trace to prevent leakage from a prior exception path.
  • Migration race safety: Fast path uses os.rename (atomic on POSIX) with FileNotFoundError/OSError handling for concurrent access. Append path uses a single r+b file handle to check trailing newline and write in one shot, eliminating TOCTOU.
  • ContextVar default fix: _pairs_by_operation default changed from {} (shared mutable) to None to prevent cross-context mutation.
  • pop_last_pair non-destructive: Backward-compat alias now peeks at the last pair without draining the list, so it doesn't destroy traces that pop_all_pairs expects to collect.
  • Import cleanup: Two mid-file import logging as ... in agentic_common.py consolidated to a single top-level import logging.

Files changed

File What changed
pdd/core/llm_trace.py List of pairs per op, thinking field, pop_all_pairs, JSON-quoted redaction pattern, ContextVar default None instead of {}, pop_last_pair is now non-destructive peek
pdd/llm_invoke.py Pass thinking to record_llm_pair at all 3 call sites
pdd/operation_log.py New .pdd/logs/ path, migration via atomic os.rename with race-safe fallback, r+b newline guard
pdd/sync_orchestration.py Attach llm_traces and agentic_trace after each op, pre-op drain of stale agentic trace
pdd/agentic_common.py Session discovery, 4-tuple return, old logging removed, _last_agentic_trace as ContextVar, top-level import logging
pdd/core/dump.py sync_log_refs, read steps from disk, stop embedding logs
.gitignore .pdd/meta/*_sync.log, .pdd/logs/, .pdd/agentic-logs/

Tests

test_llm_traces.py — 90 mocked tests

No LLM calls. Fast, deterministic, runs in CI.

python -m pytest tests/test_llm_traces.py -v
Section Tests What they cover
Log location & migration (6) test_new_write_path, test_migration_on_write, test_migration_on_read, test_already_migrated, test_only_sync_logs_migrate, test_directory_creation_safe_twice get_log_path() returns .pdd/logs/, legacy files move from .pdd/meta/, fingerprint and run report stay put
LLM trace structure (7) test_generate_gets_llm_traces_list, test_crash_retries_collect_all, test_trace_item_schema, test_traces_on_success_and_failure, test_failed_operation_keeps_all_traces, test_thinking_populated, test_thinking_null_when_absent Multiple pairs accumulate per operation, thinking is string or null, traces kept on failure
Agentic trace structure (4) test_claude_discovery_returns_correct_schema, test_gemini_discovery_returns_json_format, test_codex_discovery_returns_correct_schema, test_both_llm_traces_and_agentic_trace_on_same_entry Each discovery function returns correct {session_file, provider, format} shape; sync_orchestration attaches both llm_traces and agentic_trace to the same entry
Entries without traces (3) test_skip_entry_has_no_trace_keys, test_error_decision_entry_has_no_trace_keys, test_event_logged_via_log_event_has_no_trace_keys Runs sync_orchestration with skip/error decisions and log_event() directly — verifies logged entries have no llm_traces or agentic_trace
Truncation & redaction (4) test_prompt_truncation, test_response_truncation, test_thinking_truncation, test_secret_redaction Fields capped at 20k chars, Bearer/api_key/token/secret/password scrubbed
Session discovery: Claude (9) test_finds_session_by_id, test_slug_has_leading_dash, test_slug_with_spaces, test_deep_path, test_missing_session_id, test_file_not_found, test_missing_claude_dir, test_empty_session_file, test_parallel_safety Path construction from session_id, slug format, all failure modes return None
Session discovery: Gemini (12) test_happy_path, test_falls_back_to_jsonl, test_prefers_json_over_jsonl, test_multiple_json_picks_newest, test_uses_first_8_chars, test_slug_with_special_chars, test_rejects_subdirectory, test_bad_projects_json_schema, test_missing_projects_json, test_no_session_id, test_corrupt_projects_json, test_cwd_not_in_projects, test_parallel_safety Glob by UUID prefix, projects.json lookup, .json/.jsonl fallback, all failure modes
Session discovery: Codex (7) test_happy_path_one_new_file, test_multiple_new_files, test_pre_existing_files_ignored, test_respects_codex_home, test_missing_codex_dir, test_only_rollout_files, test_no_new_files Snapshot-and-diff of rollout files, CODEX_HOME env var, all failure modes
Session discovery: shared (6) test_never_raises_claude, test_never_raises_gemini, test_never_raises_codex, test_no_warnings, test_failed_discovery_does_not_block_logging, test_run_with_provider_sets_last_agentic_trace PermissionError returns None, only DEBUG logs, None discovery still writes entry without agentic_trace (via sync_orchestration), _run_with_provider stores trace in module state
Core dump (5) test_reads_sync_steps_from_logs_dir, test_includes_sync_log_refs, test_does_not_embed_sync_logs, test_sync_steps_excludes_trace_data, test_falls_back_to_meta_for_unmigrated Steps read from disk not file_contents, sync_log_refs present, trace data excluded from steps
Dry-run (3) test_reads_from_new_location, test_triggers_migration, test_does_not_show_trace_content _display_sync_log uses get_log_path(), doesn't print prompt/response text
Multi-operation sync (3) test_each_operation_gets_own_traces, test_no_trace_leakage, test_mixed_entries Operations get only their own traces, pop_all_pairs drains correctly
pop_all_pairs semantics (3) test_returns_all_recorded, test_empty_when_nothing_recorded, test_scoped_by_operation Returns all then empty, scoped by operation name
Concurrent sync (1) test_separate_log_files Two modules write to separate logs, no cross-contamination
Old agentic log removal (4) test_log_agentic_interaction_removed, test_agentic_log_dir_removed, test_agentic_session_id_removed, test_no_agentic_logs_dir_created Deleted symbols are gone from the module
Redaction on all fields (2) test_response_redacted, test_thinking_redacted Secrets scrubbed from response and thinking, not just prompt
Thinking normalization (3) test_thinking_list_of_objects, test_thinking_unexpected_type, test_thinking_none List of dicts joined to string, int to str, None to null
JSON-quoted secret redaction (4) test_json_api_key_redacted, test_json_token_redacted, test_json_secret_in_response, test_json_secret_in_thinking "api_key": "sk-..." style JSON values redacted across prompt, response, and thinking fields
Agentic trace thread safety (2) test_agentic_trace_is_context_var, test_agentic_trace_isolated_across_threads _last_agentic_trace is a ContextVar, traces set in one thread are invisible in another
Append migration newline boundary (1) test_append_migration_preserves_newline_boundary When both old and new sync logs exist and new file lacks trailing newline, merged result has valid JSONL (no corrupted lines)

test_sync_orchestration.py — 4 new tests (1 mocked + 3 real LLM)

The mocked test runs with all other sync orchestration tests. The 3 real LLM tests are gated by PDD_RUN_REAL_LLM_TESTS=1.

# Mocked (runs with the rest of the file):
python -m pytest tests/test_sync_orchestration.py -v -k "attaches_llm_trace"

# Real LLM:
pip install -e .
PDD_RUN_REAL_LLM_TESTS=1 python -m pytest tests/test_sync_orchestration.py -v -k "e2e_litellm or e2e_skip or e2e_legacy"
Test Type What it does
test_sync_orchestration_attaches_llm_trace_on_failed_operation Mocked Mocks pop_all_pairs and code_generator_main, verifies failed generate entry has llm_traces
test_e2e_litellm_sync_traces Real LLM Runs one pdd sync. Checks: log in .pdd/logs/ not .pdd/meta/, generate entry has llm_traces with valid prompt/response/model/thinking, example has traces, model field is recognizable, every traced operation has >= 2 items (no orphaned calls), fingerprint intact
test_e2e_skip_dryrun_coredump Real LLM Syncs twice (second skips), then runs --dry-run and --core-dump. Checks: skip entries have no traces, dry-run shows operation names. Unconditionally asserts core dump file exists, contains sync_log_refs, and all refs point to .pdd/logs/
test_e2e_legacy_sync_log_migrates Real LLM Plants a legacy log in .pdd/meta/, syncs. Checks: old file gone, new file in .pdd/logs/, old entries preserved

test_llm_traces_e2e.py — 3 agentic tests (3 providers x 1 test)

Parametrized over [anthropic, google, openai]. Skips if the provider's CLI isn't installed.

# All providers:
pip install -e .
PDD_RUN_AGENTIC_TESTS=1 python -m pytest tests/test_llm_traces_e2e.py -v

# Single provider:
PDD_RUN_AGENTIC_TESTS=1 python -m pytest tests/test_llm_traces_e2e.py -v -k "google"
PDD_RUN_AGENTIC_TESTS=1 python -m pytest tests/test_llm_traces_e2e.py -v -k "anthropic"
PDD_RUN_AGENTIC_TESTS=1 python -m pytest tests/test_llm_traces_e2e.py -v -k "openai"
Test What it does
test_agentic_sync_clean_path Runs pdd sync --agentic on a fresh project. Asserts: log in .pdd/logs/, generate entry has llm_traces, any agentic_trace entries have valid session file on disk, provider-specific path patterns (Claude /.claude/projects/, Gemini /.gemini/tmp/.../chats/, Codex jsonl)

Test results

Suite Result
test_llm_traces.py (90 mocked) all passed
test_sync_orchestration.py mocked trace test passed
test_e2e_litellm_sync_traces passed
test_e2e_skip_dryrun_coredump passed
test_e2e_legacy_sync_log_migrates passed
test_agentic_sync_clean_path[google] passed
test_agentic_*[anthropic] skipped (no Claude CLI)
test_agentic_*[openai] skipped (no Codex CLI)

Existing test suites — all green

Suite Result
test_operation_log.py 27 passed
test_core_dump.py 28 passed
test_agentic_common.py 198 passed
test_llm_invoke.py 261 passed

Pre-existing test failures (not related to this PR)

This PR has 10 CI failures, but these tests also fail on upstream main (Unit Tests run 24949525104). These failures are unrelated to this PR, but described below so they can be fixed on main:

Group 1: test_ci_detect_changed_modules.py — 5 failures

Root cause: Test loader references scripts/ci_detect_changed_modules.py but the file lives at pdd/ci_detect_changed_modules.py. Wrong path in the test.

Test Error
test_basename_from_nested_code_paths FileNotFoundError: scripts/ci_detect_changed_modules.py
test_basename_from_nested_context_and_tests Same
test_reverse_dep_detects_recursive_nested_include Same
test_reverse_dep_detects_include_many Same
test_detect_combines_nested_direct_and_reverse_dependencies Same

Fix (not this PR): Change "scripts" to "pdd" on line 11 of the test.

Group 2: test_agentic_e2e_fix_orchestrator.py — 3 failures

Root cause: Tests assert the orchestrator prompt documents safety features (_detect_meaningful_changes, _is_noise_path, cycle_start_hashes), but the prompt file was never updated to include these tokens.

Test Missing token
test_orchestrator_prompt_documents_direct_edit_guard _detect_meaningful_changes, has_direct_edits
test_orchestrator_prompt_documents_noise_filter _is_noise_path, __pycache__, .pytest_cache, node_modules
test_orchestrator_prompt_documents_cycle_waste_breaker cycle_start_hashes, current_cycle > 1

Fix (not this PR): Add the required documentation tokens to prompts/agentic_e2e_fix_orchestrator_python.prompt.

Group 3: test_issue_1240_generate_prompt_meta_framing.py — 2 failures

Root cause: Tests expect utils/mcp/prompts/generate_prompt.prompt and utils/mcp/prompts/*_python.prompt files, but the MCP prompts directory was never committed to main.

Test Error
test_mcp_generate_prompt_template_is_not_meta_framed FileNotFoundError: utils/mcp/prompts/generate_prompt.prompt
test_mcp_checked_in_python_prompts_are_not_meta_framed AssertionError: no *_python.prompt files found

Fix (not this PR): Commit the MCP prompt files or add skip guards for when the directory is missing.

@niti-go niti-go closed this Apr 20, 2026
@niti-go niti-go force-pushed the feature/llm-traces-in-sync-log-752 branch from 2f29976 to da5810a Compare April 20, 2026 19:55
…n#752)

- Move sync log from .pdd/meta/ to .pdd/logs/ (gitignored); migrate
  legacy files on both read and write inside get_log_path()
- Store all LLM prompt/response pairs per operation in llm_traces list
  (was: only last pair, only on failure)
- Add thinking field to LLMTracePair; normalize list-of-objects, int,
  and None inputs to string-or-null before storing
- Rename pop_last_pair to pop_all_pairs (backward-compat alias kept)
- Pass thinking from LiteLLM responses to record_llm_pair at all 3
  call sites in llm_invoke.py
- Truncate prompt/response/thinking at 20k chars; redact Bearer tokens,
  api_key, token, secret, password from all text fields
- Add agentic_trace pointer (session_file, provider, format) to log
  entries when a CLI agent runs; replaces inlined session content
- Add session discovery for Claude (construct path from session_id),
  Gemini (glob by first-8-chars of UUID via projects.json slug), and
  Codex (filesystem snapshot-and-diff of rollout files)
- Session discovery is best-effort: never raises, never warns, returns
  None on any failure, logs at DEBUG only
- Remove _log_agentic_interaction, AGENTIC_LOG_DIR, _AGENTIC_SESSION_ID
  from agentic_common.py; add .pdd/agentic-logs/ to .gitignore
- Core dump no longer embeds *_sync.log in file_contents; adds
  sync_log_refs array and reads sync_steps directly from disk
- _display_sync_log (--dry-run) now uses get_log_path() instead of
  hardcoded .pdd/meta/ path
- Add .pdd/meta/*_sync.log, .pdd/logs/, .pdd/agentic-logs/ to .gitignore
- Add 82 Tier 1 mocked tests (sections A-Q) in test_llm_traces.py
- Add Tier 2 E2E tests (sections S-Y) in test_llm_traces_e2e.py,
  gated by PDD_RUN_REAL_LLM_TESTS and PDD_RUN_AGENTIC_TESTS env vars
- Update existing tests for new log path and 4-tuple return type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@niti-go niti-go reopened this Apr 26, 2026
The test mocked pop_last_pair but sync_orchestration now calls
pop_all_pairs and writes traces to llm_traces (top-level list)
instead of details.llm_trace (single pair). Update the mock and
assertion to match. Same regression coverage, new API shape.

This test was already failing before promptdriven#752 (the drain call on line
2170 consumed the mock before the attach call could use it).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@niti-go niti-go changed the title feat(#752): embed LLM traces in sync log entries feat(#752): LLM traces in sync log, move logs to .pdd/logs/ Apr 26, 2026
niti-go and others added 5 commits April 25, 2026 23:32
Merge test_agentic_sync_writes_llm_traces, test_format_matches_provider,
test_claude_path_from_session_id, and test_gemini_path_by_prefix into a
single test_agentic_sync_clean_path that runs one pdd sync --agentic and
checks everything: llm_traces on generate, agentic_trace structure,
format/extension match, provider-specific path patterns, and session
file exists on disk with content.

Cuts agentic clean-path tests from 5 separate syncs to 1 per provider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
test_test_operation_has_traces failed because sync reused existing
tests (cost 0, no LLM calls) instead of generating new ones. Only
assert llm_traces on entries that actually called an LLM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test operation on this minimal greeter project always has cost 0
and no LLM calls — sync runs existing tests rather than generating new
ones. The guarded assertion never executed, making it a vacuous pass.

Coverage is already provided by test_generate_writes_traces (verifies
llm_traces on a real LLM-calling operation) and
test_every_llm_operation_has_traces (catches any operation with LLM
calls that fails to capture traces).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge 5 separate LiteLLM tests (generate traces, example traces,
model field, no log in meta, fingerprint, orphaned calls) into one
test_litellm_sync_traces that runs a single pdd sync.

Merge skip, dry-run, and core-dump tests into one
test_skip_dryrun_coredump that syncs twice then checks all three.

Total LLM sync runs drops from 11 to 5. Same assertions, less cost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LiteLLM trace tests (sync traces, skip/dry-run/core-dump, migration)
now live in test_sync_orchestration.py alongside the other sync tests,
gated by PDD_RUN_REAL_LLM_TESTS=1. This matches the existing pattern
where that file already has an @pytest.mark.e2e test.

test_llm_traces_e2e.py now only contains agentic provider-parametrized
tests (Claude/Gemini/Codex), gated by PDD_RUN_AGENTIC_TESTS=1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@niti-go niti-go changed the title feat(#752): LLM traces in sync log, move logs to .pdd/logs/ feat(#752): Embed LLM traces in sync log entries Apr 26, 2026
niti-go and others added 3 commits April 26, 2026 19:46
…cation

The crash tests tried to force a "crash" operation by breaking code after
initial sync, but operation classification is unreliable in this context.
The clean-path test already covers the actual feature (traces in sync log).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on, stale trace leak

- Add JSON-quoted-value redaction pattern so "api_key": "sk-..." in LLM
  traces gets scrubbed (the old regex stopped at quote characters)
- Change _last_agentic_trace from plain global to ContextVar for thread
  safety, matching llm_trace.py's existing pattern
- Add pre-operation drain of stale agentic traces in sync_orchestration
  so an unconsumed trace from a prior exception path can't leak forward
- Add defensive newline guard in append migration so JSONL lines don't
  merge when the target file lacks a trailing newline
- Remove unused pop_last_pair import from sync_orchestration
- Rewrite Sections C, D, and I.test_failed_discovery tests to exercise
  actual production code (discovery functions, sync_orchestration) instead
  of asserting on hand-built dict literals

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… imports, destructive alias

- Replace shutil.move with os.rename (atomic on POSIX) for the fast
  migration path; handle FileNotFoundError/OSError for concurrent access.
  Append path uses a single r+b file handle to check trailing newline and
  write in one shot, eliminating TOCTOU between separate read and append.
- Change _pairs_by_operation ContextVar default from {} (shared mutable)
  to None; all call sites already use `or {}` copy pattern.
- Move two mid-file `import logging as ...` statements to a single
  top-level `import logging` in agentic_common.py.
- Make pop_last_pair non-destructive: peek at last pair instead of
  draining all pairs via pop_all_pairs, so backward-compat callers
  don't silently destroy the trace list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@niti-go niti-go marked this pull request as ready for review April 27, 2026 00:31
niti-go and others added 2 commits April 26, 2026 20:34
The comment said .pdd/meta was used for "logs and locks". Since sync
logs moved to .pdd/logs/, clarify that meta holds fingerprints and run
reports while logs/ holds the sync logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The .pdd directory listing said meta/ held sync logs. Since promptdriven#752 moved
them to .pdd/logs/ (gitignored), update the listing and the version
control note accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR embeds LLM traces (all prompt/response pairs per operation) and agentic session pointers directly into sync log entries, and moves the sync log from .pdd/meta/ to .pdd/logs/ with automatic migration. The feature covers LiteLLM and all three CLI providers (Claude, Gemini, Codex), with truncation, redaction, and ContextVar-based thread safety throughout.

Confidence Score: 5/5

Safe to merge; all remaining findings are minor style or edge-case quality issues with no impact on the primary sync path.

The implementation is thorough and well-tested (90 mocked + several real-LLM tests). All three findings are P2: an empty-list thinking normalisation edge case, a cosmetic redaction UX issue, and a partial-write recovery path that silently continues. None affect correctness under normal operation.

pdd/core/llm_trace.py (empty-list thinking normalizes to empty string instead of None), pdd/operation_log.py (silent swallow of write errors in slow-path migration).

Important Files Changed

Filename Overview
pdd/core/llm_trace.py Accumulates all LLM pairs per operation instead of keeping only the last; adds thinking normalization and JSON-quoted redaction. Two minor issues: empty-list thinking normalizes to "" instead of None; new redaction pattern strips key names alongside values.
pdd/operation_log.py Moves sync log from .pdd/meta/ to .pdd/logs/ with best-effort migration. Atomic rename fast path is correct; slow-path append silently swallows write errors without resetting to a clean state, risking JSONL corruption on retry.
pdd/agentic_common.py Adds session discovery for Claude/Gemini/Codex, ContextVar-based _last_agentic_trace, replaces _log_agentic_interaction with the new trace pointer scheme. All failure paths return None and never raise. 4-tuple return is consistent throughout.
pdd/sync_orchestration.py Attaches llm_traces and agentic_trace to log entries on both success and failure; pre-op drain correctly flushes stale traces. Import of get_last_agentic_trace is guarded by ImportError for environments without agentic support.
pdd/core/dump.py Replaces file_contents-based sync log embedding with direct disk reads; adds sync_log_refs. Correctly deduplicates logs_dir vs meta_dir entries and strips trace data from step records.
pdd/llm_invoke.py Passes thinking output to record_llm_pair at all three call sites (cloud, litellm, retry); best-effort extraction wrapped in try/except at each site.

Sequence Diagram

sequenceDiagram
    participant SO as sync_orchestration
    participant LT as llm_trace (ContextVar)
    participant LI as llm_invoke
    participant AC as agentic_common
    participant OL as operation_log

    SO->>LT: set_current_operation(op)
    SO->>LT: pop_all_pairs(op)  [drain stale]
    SO->>AC: get_last_agentic_trace()  [drain stale]

    SO->>LI: code_generator_main / test / fix
    LI->>LT: record_llm_pair(prompt, response, model, thinking)
    Note over LT: accumulated in _pairs_by_operation[op]

    SO->>AC: run_agentic_task(provider, ...)
    AC->>AC: _run_with_provider()
    AC->>AC: _discover_session()
    AC->>LT: _last_agentic_trace.set(trace)

    SO->>LT: pop_all_pairs(op) -> llm_traces list
    SO->>AC: get_last_agentic_trace() -> agentic_trace

    SO->>OL: append_log_entry(entry + llm_traces + agentic_trace)
    OL->>OL: get_log_path() -> .pdd/logs/ (migrates legacy .pdd/meta/)
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: pdd/core/llm_trace.py
Line: 75-86

Comment:
**Empty-list thinking normalizes to `""` instead of `None`**

When `thinking` is `[]` or a list where every dict has empty `"thinking"`/`"text"`/`"content"`, `_normalize_thinking` returns `"\n".join([])` = `""`. Back in `record_llm_pair`, the guard `if thinking_text is not None` is satisfied by `""`, so `LLMTracePair.thinking` is stored as an empty string rather than `None`. Callers doing `thinking is None` to mean "absent" will silently get `False` and treat it as a non-null field with no content.

```suggestion
        return "\n".join(parts) or None
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: pdd/core/llm_trace.py
Line: 30

Comment:
**JSON-quoted redaction drops the key name**

The new pattern replaces the full `"api_key": "sk-..."` span with `<redacted>`, erasing the key name. A prompt snippet like `{"api_key": "sk-abc...", "model": "gpt-4"}` becomes `{<redacted>, "model": "gpt-4"}`, which gives developers no hint that a secret key field was present. Consider preserving the key while only scrubbing the value by using a replacement like `"\1": "<redacted>"`.

This is only a usability concern — the redaction itself is secure either way.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: pdd/operation_log.py
Line: 83-102

Comment:
**Partial write leaves `new_path` corrupted without clearing `old_path`**

If `f.write(old_content)` fails mid-write (e.g., disk full), the broad `except Exception: pass` swallows the error and `old_path.unlink()` never runs. On the next call to `get_log_path`, `old_path` still exists and `new_path` exists with a partial/corrupted tail, so the slow path fires again and appends `old_content` a second time. The partial first write can corrupt the JSONL line at the boundary, not merely duplicate it.

The doc comment acknowledges that "the worst case is a duplicate append (idempotent JSONL lines)" — but that covers the concurrent-process race, not the partial-write case. Logging the failure at DEBUG level (without re-raising) would at least surface the issue without breaking the workflow.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Update README: sync logs moved from .pdd..." | Re-trigger Greptile

Comment thread pdd/core/llm_trace.py
Comment thread pdd/core/llm_trace.py Outdated
Comment thread pdd/operation_log.py Outdated
- Update test_issue_902 mock return values to 4-tuple (added agentic_trace)
- Update test_e2e_issue_894 unpacking to 4-tuple for _run_with_provider
- Fix _normalize_thinking returning "" instead of None for empty lists
- Preserve key names in JSON-quoted secret redaction
- Log migration failures at DEBUG instead of silently swallowing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@gltanaka gltanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on. The motivation here is absolutely useful: having better traces around agentic and LLM-driven workflows would make prompt iteration and debugging much easier.

I am going to request changes because I think this PR currently solves a different problem than #752, and it also removes some existing debug coverage that we still need.

Issue #752 asks for compact, standardized agentic trace files under .pdd/agentic-traces/, with provider-specific raw logs distilled into a common, redacted format. This PR instead stores LiteLLM prompt/response pairs in sync logs and adds best-effort pointers to provider-owned session files. That is useful in places, but it is not the same artifact the issue describes.

A few concerns I would like addressed before merge:

  • Please preserve or replace the existing agentic failure logging path. Today main records agentic failures via .pdd/agentic-logs/; this PR removes that, while the new provider-session discovery misses cases like timeout, CLI not found, non-zero exit, invalid JSON, and parse failures.
  • Please wire agentic trace capture at the run_agentic_task() level, not only through sync_orchestration, since bug/change/fix/update/split/checkup workflows also use agentic tasks.
  • Please generate project-local standardized trace files, preferably under .pdd/agentic-traces/, rather than only storing absolute pointers to provider-private session files.
  • Please keep the sync-log relocation / migration / core-dump behavior separate unless it is strictly required. That part adds a lot of blast radius and seems independent from the trace feature.

The LiteLLM llm_traces work may be worth splitting into a smaller PR, since collecting all prompt/response pairs for sync operations is useful on its own. For #752 specifically, I would like to see this refocused around the standardized agentic trace artifact described in the issue.

Thanks again for pushing this forward. The direction is valuable; I think it just needs to be realigned with the requested trace format and avoid regressing the existing debug logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Store simplified LLM session traces after all agentic tasks

2 participants