feat: enhance citation handling with confidence and match kind #230

Open
rkarmaka wants to merge 4 commits into NVIDIA-AI-Blueprints:develop from rkarmaka:feat/citation-confidence-scoring

@rkarmaka

  • Updated the citation emission process to include confidence scores and match kinds when a SourceRegistry is attached, allowing the UI to display verification badges.
  • Modified the API interfaces and internal state management to accommodate the new citation attributes.
  • Added tests to ensure proper handling of confidence and match kind in citation updates.
  • Implemented a confidence threshold in the DeepResearcherAgent to filter citations based on their strength before inclusion in reports.

This change improves the user experience by providing clearer insights into citation validity and enhances the overall citation verification process.

Signed-off-by: Ranit Karmakar <rkarmaka@mtu.edu>
@rkarmaka force-pushed the feat/citation-confidence-scoring branch from b556739 to aac7b3c on May 12, 2026 20:46
@greptile-apps
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR wires citation confidence scores and match strategies (MatchKind) from the Python source-registry resolver all the way to the UI, enabling a verification badge on CitationCard for heuristic-matched citations. It also adds a verifications list to CitationVerificationResult and introduces the passthrough_threshold parameter to verify_citations.

  • Backend (citation_verification.py): Introduces MatchKind, _ResolveMatch, and VerifiedCitation dataclasses; resolve_url now returns structured match objects instead of bare strings; verify_citations populates a verifications list and emits OTel metrics per match kind.
  • API callback (callbacks.py): _emit_cited_urls replaces has_url with resolve_url and forwards confidence/match_kind in the citation_use SSE event when a registry is attached.
  • Frontend: CitationSource type gains confidence/matchKind fields; the store propagates them through with monotonic-max confidence; CitationCard renders a ConfidenceBadge only for non-exact/non-normalized matches.
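The data flow described in the bullets above can be sketched as follows. This is a minimal illustration, not the PR's code: the `MatchKind` members, the `ResolveMatch` fields, and the `build_citation_use_event` helper are assumptions made for the example; only the names `MatchKind`, `confidence`, `match_kind`, and the `citation_use` event come from the summary.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class MatchKind(str, Enum):
    # Hypothetical members. The summary only states that "exact" and
    # "normalized" matches exist and that other matches are heuristic.
    EXACT = "exact"
    NORMALIZED = "normalized"
    HEURISTIC = "heuristic"


@dataclass(frozen=True)
class ResolveMatch:
    # Assumed shape of the structured object a resolver returns
    # instead of a bare string.
    url: str
    confidence: float
    match_kind: MatchKind


def build_citation_use_event(url: str, match: ResolveMatch | None) -> dict:
    """Build a citation_use payload (hypothetical helper).

    The extra fields are attached only when a registry produced a match;
    without one, the event keeps its original shape.
    """
    event: dict = {"type": "citation_use", "url": url}
    if match is not None:
        event["confidence"] = match.confidence
        event["match_kind"] = match.match_kind.value
    return event


matched = build_citation_use_event(
    "https://example.com/a",
    ResolveMatch("https://example.com/a", 0.7, MatchKind.HEURISTIC),
)
unmatched = build_citation_use_event("https://example.com/b", None)
```

Keeping the fields optional means a consumer that predates the change can ignore them, which matches the "falls back correctly when no registry" behaviour noted for callbacks.py.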

Confidence Score: 3/5

Not safe to merge — citation verification crashes at runtime for any report that has a references section.

The refactoring in verify_citations removed the valid_citations and removed_citations local variable initializations, but the dedup block that iterates for c in valid_citations: (line 1018) was left in place. This raises NameError on every invocation against a report with a references section, crashing citation verification in both researcher agents.

src/aiq_agent/common/citation_verification.py — lines 1008-1044 (dedup block) reference variables removed by this PR.
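The failure mode can be reproduced in isolation. The sketch below assumes, as the review states, that the loop reads `valid_citations` before the only assignment to it in the same function; the function and variable layout here is illustrative, not copied from citation_verification.py. Under that assumption Python raises `UnboundLocalError`, which is a subclass of `NameError`, so an `except NameError` handler would still catch it.

```python
def dedup_before_assignment(citations):
    seen = []
    # The read happens here ...
    for c in valid_citations:
        seen.append(c)
    # ... but the only assignment is further down. Because the name is
    # assigned somewhere in this function, Python treats it as a local,
    # and the read above fails before this line is reached.
    valid_citations = list(citations)
    return seen


def run() -> str:
    try:
        dedup_before_assignment([{"url": "https://example.com"}])
    except NameError as exc:
        return type(exc).__name__
    return "no error"


result = run()
```

The error is raised regardless of the input, since the lookup fails before any citation is inspected.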

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/aiq_agent/common/citation_verification.py | Adds MatchKind, _ResolveMatch, VerifiedCitation dataclasses and rewires verify_citations to emit structured records, but the dedup block (lines 1008-1044) references valid_citations/removed_citations, which were removed from local initialization, causing a NameError at runtime. |
| frontends/aiq_api/src/aiq_api/jobs/callbacks.py | _emit_cited_urls now calls resolve_url() instead of has_url() and forwards confidence/match_kind into the citation_use SSE event when a registry is attached; falls back correctly when no registry. |
| frontends/ui/src/features/layout/components/CitationCard.tsx | Adds ConfidenceBadge component with tier classification (high/medium/low) and tooltip; only renders for non-exact/non-normalized matches. |
| frontends/ui/src/features/chat/store.ts | Extends addDeepResearchCitation with confidence/matchKind params; confidence is treated as monotonically increasing (Math.max) to prevent downgrading existing scores. |
| frontends/ui/src/features/chat/types.ts | Adds CitationMatchKind union type and extends CitationSource with confidence/matchKind fields. |
| src/aiq_agent/agents/deep_researcher/agent.py | Adds passthrough_threshold to the verify_citations call and updates _is_report_complete to use v.resolved without the threshold, consistent with the new contract. |
| src/aiq_agent/agents/shallow_researcher/agent.py | Passes passthrough_threshold=0.0 to verify_citations so all resolved shallow-research citations are kept. |
| frontends/ui/src/adapters/api/deep-research-client.ts | Extends ArtifactUpdateEvent and onCitationUpdate callback type with confidence/match_kind; both citation_source and citation_use cases forward the new fields correctly. |
| tests/aiq_agent/common/test_citation_verification.py | Comprehensive new TestVerifyCitationsConfidence test class covering per-match-kind confidence values, threshold semantics, telemetry log structure, and ambiguous/unmatched edge cases. |
| frontends/ui/src/features/layout/components/CitationCard.spec.tsx | Adds full test coverage for the confidence badge across all match kinds and tier classifications. |
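The two frontend rules in the overview (monotonic-max confidence in the store, and a badge only for matches that are neither exact nor normalized) are implemented in TypeScript in the PR. The Python sketch below only illustrates the logic; the function names, the tier cutoffs of 0.8 and 0.5, the `"heuristic"` label, and the choice to hide the badge when no match kind is known are all assumptions, since the page does not state them.

```python
from typing import Optional


def merge_confidence(existing: Optional[float], incoming: Optional[float]) -> Optional[float]:
    """Keep the highest confidence seen so far; a later, weaker event never downgrades it."""
    if existing is None:
        return incoming
    if incoming is None:
        return existing
    return max(existing, incoming)


def badge_tier(
    confidence: Optional[float],
    match_kind: Optional[str],
    high: float = 0.8,    # assumed cutoff, not taken from the PR
    medium: float = 0.5,  # assumed cutoff, not taken from the PR
) -> Optional[str]:
    """Return "high", "medium", or "low", or None when no badge should render."""
    if confidence is None or match_kind is None:
        return None
    if match_kind in ("exact", "normalized"):
        # Exact and normalized matches show no badge.
        return None
    if confidence >= high:
        return "high"
    if confidence >= medium:
        return "medium"
    return "low"


merged = merge_confidence(0.9, 0.4)
tier_exact = badge_tier(0.95, "exact")
tier_heuristic = badge_tier(0.6, "heuristic")
```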

Comments Outside Diff (1)

  1. src/aiq_agent/common/citation_verification.py, line 1018-1044 (link)

    P1 Dedup block references undefined valid_citations — NameError at runtime

    This PR removed the earlier variable initializations (valid_citations: list[dict] = [] and removed_citations: list[dict] = []) and replaced them with the new verifications: list[VerifiedCitation] = [] idiom — but the dedup block at lines 1008-1044 was not updated. Line 1018 (for c in valid_citations:) references valid_citations before it is defined; the variable is only assigned at line 1055.

    Every call to verify_citations on a report that has a references section will raise NameError: name 'valid_citations' is not defined. This breaks citation verification for both the deep and shallow researcher agents. Additionally, duplicate_rewrites will remain empty because the loop is never entered, so the dedup rewrite at lines 1108-1109 is silently a no-op, meaning duplicate citations pointing to the same source are no longer collapsed.
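One possible shape for a fix is to derive both lists from `verifications` before the dedup block runs. The sketch below is an assumption about how that could look, not the change that was eventually pushed: the `number` and `url` fields and the `split_and_dedup` helper are hypothetical, while `resolved` and `duplicate_rewrites` are names that appear in the review.

```python
from dataclasses import dataclass


@dataclass
class VerifiedCitation:
    number: int     # hypothetical: the [n] marker used in the report body
    url: str        # hypothetical: the source the citation resolved to
    resolved: bool  # named in the review (v.resolved)


def split_and_dedup(verifications):
    """Derive the valid/removed lists first, then collapse duplicate sources."""
    valid = [v for v in verifications if v.resolved]
    removed = [v for v in verifications if not v.resolved]

    first_number_by_url = {}
    duplicate_rewrites = {}  # later citation number -> first number for that source
    for v in valid:
        if v.url in first_number_by_url:
            duplicate_rewrites[v.number] = first_number_by_url[v.url]
        else:
            first_number_by_url[v.url] = v.number
    return valid, removed, duplicate_rewrites


valid, removed, duplicate_rewrites = split_and_dedup(
    [
        VerifiedCitation(1, "https://example.com/a", True),
        VerifiedCitation(2, "https://example.com/a", True),
        VerifiedCitation(3, "https://example.com/b", False),
    ]
)
```

With both lists computed up front, the dedup loop has a defined input and duplicate citations pointing to the same source are collapsed again.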

Reviews (4): Last reviewed commit: "Merge branch 'develop' into feat/citatio..."

Comment thread src/aiq_agent/common/citation_verification.py Outdated
Comment thread src/aiq_agent/agents/deep_researcher/agent.py Outdated
Comment thread src/aiq_agent/agents/deep_researcher/agent.py Outdated
Comment thread src/aiq_agent/common/citation_verification.py
- Updated the `citation_passthrough_threshold` documentation to specify its role in marking citations as verified in the UI, rather than filtering them from the report.
- Adjusted the `verify_citations` function to ensure that only unresolved citations are stripped from the report, allowing all resolved citations to remain and carry their confidence scores.
- Enhanced comments throughout the code to improve clarity on citation verification processes and UI interactions.

These changes improve the understanding of citation handling and ensure that the UI accurately reflects citation confidence without losing important context in reports.
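The revised contract can be illustrated with a short sketch: resolution alone decides whether a citation stays in the report, and the threshold only decides which kept citations are marked as verified. The `VerifiedCitation` fields, the `apply_threshold` helper, and the use of `>=` for the comparison are assumptions for the example; `passthrough_threshold` and `resolved` are names used in the PR.

```python
from dataclasses import dataclass


@dataclass
class VerifiedCitation:
    url: str
    resolved: bool
    confidence: float  # hypothetical field name


def apply_threshold(verifications, passthrough_threshold: float):
    # Only unresolved citations are stripped from the report.
    kept = [v for v in verifications if v.resolved]
    stripped = [v for v in verifications if not v.resolved]
    # The threshold marks citations as verified; it does not filter them.
    verified_urls = {v.url for v in kept if v.confidence >= passthrough_threshold}
    return kept, stripped, verified_urls


citations = [
    VerifiedCitation("https://example.com/a", True, 0.95),
    VerifiedCitation("https://example.com/b", True, 0.40),
    VerifiedCitation("https://example.com/c", False, 0.0),
]

kept, stripped, verified_urls = apply_threshold(citations, passthrough_threshold=0.8)
kept_all, _, verified_all = apply_threshold(citations, passthrough_threshold=0.0)
```

In this sketch a threshold of 0.0, the value the shallow researcher passes, marks every resolved citation as verified.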
Comment thread src/aiq_agent/agents/deep_researcher/register.py Outdated
…tConfig

- Updated the documentation for the confidence cutoff parameter to clarify its role in marking citations as verified in the UI.
- Improved the explanation of how the threshold affects citation verification without filtering them from the report body.
- Ensured that the default value and its implications for citation handling are clearly articulated.

These changes aim to provide better guidance on citation confidence settings and their impact on the user interface.
@AjayThorve
Collaborator

Thank you @rkarmaka, we will review this soon
