PaperSource is a string enum in src/perspicacite/models/papers.py that records
the origin database or ingestion path for every Paper object in the system. It
appears in:
- `Paper.source`: the source value stamped at ingest time
- KB statistics (`GET /api/kb/{name}/stats`): breakdown of papers by source
- Provenance traces: source database of each retrieved paper
- SQLite paper metadata table
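
The per-source breakdown reported by the KB statistics endpoint can be pictured as a simple count over `Paper.source` values. The sketch below is illustrative only: `source_breakdown` is a hypothetical helper, not the endpoint's actual implementation, and it operates on plain source strings rather than real `Paper` objects.

```python
from collections import Counter
from typing import Iterable


def source_breakdown(sources: Iterable[str]) -> dict[str, int]:
    """Count papers per source string (hypothetical helper, not the real endpoint code)."""
    return dict(Counter(sources))


# Example input: the source strings of five papers in a KB.
sources = ["openalex", "openalex", "arxiv", "bibtex", "openalex"]
breakdown = source_breakdown(sources)
print(breakdown)
```
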
| Value | String | When assigned |
|---|---|---|
| `OPENALEX` | `"openalex"` | Paper discovered and fetched primarily via OpenAlex (e.g., DOI-based lookup, citation-graph expansion) |
| `PUBMED` | `"pubmed"` | Paper from PubMed/NCBI search (pubmed-search command, entrez fetcher) |
| `ARXIV` | `"arxiv"` | Paper from the arXiv API (arXiv-specific fetch path, arXiv HTML retrieval) |
| `CROSSREF` | `"crossref"` | Paper metadata resolved via Crossref (DOI metadata enrichment, CrossRef-primary path) |
| `SEMANTIC_SCHOLAR` | `"semantic_scholar"` | Paper fetched directly via the Semantic Scholar API (SS fallback in cite-graph expansion, direct SS search) |
| `BIBTEX` | `"bibtex"` | Paper ingested from a user-provided .bib file |
| `SCILEX` | `"scilex"` | Paper returned by a SciLEx multi-database fan-out search |
| `WEB_SEARCH` | `"web_search"` | Legacy value; kept for backward compatibility. No ingestion path currently assigns this. |
| `USER_UPLOAD` | `"user_upload"` | Paper from a direct user upload (UI or API upload of a PDF without a DOI) |
| `CITATION_FOLLOW` | `"citation_follow"` | Paper added by following a citation link (pre-migration legacy value; now superseded by `OPENALEX`/`SEMANTIC_SCHOLAR` for new ingest) |
| `LOCAL` | `"local"` | Paper ingested from the local file system (ingest-local command or MCP ingest_local_documents) |
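
For orientation, a string enum with these members could be declared roughly as follows. This is a sketch, not a copy of src/perspicacite/models/papers.py: the member names and string values come from the table above, while the `str, Enum` base is an assumption about how "string enum" is realized.

```python
from enum import Enum


class PaperSource(str, Enum):
    """Sketch of the enum; names and values are taken from the table above."""

    OPENALEX = "openalex"
    PUBMED = "pubmed"
    ARXIV = "arxiv"
    CROSSREF = "crossref"
    SEMANTIC_SCHOLAR = "semantic_scholar"
    BIBTEX = "bibtex"
    SCILEX = "scilex"
    WEB_SEARCH = "web_search"  # legacy, no longer assigned
    USER_UPLOAD = "user_upload"
    CITATION_FOLLOW = "citation_follow"  # legacy, superseded for new ingest
    LOCAL = "local"


# Round-trip between a stored string and the enum member.
restored = PaperSource("semantic_scholar")
stored = PaperSource.OPENALEX.value
```

With a `str` mixin, members compare equal to their string values, which is convenient when the value is read back from a text column or a JSON payload.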
Before the PaperSource migration, most papers ingested via the download pipeline were
stamped WEB_SEARCH regardless of which API actually returned them. The migration
(commit `feat(models): thread PaperSource through CrossRef + cite-graph adapters` and
related commits) changed every Paper construction site to stamp the true origin:
- CrossRef enrichment → `CROSSREF`
- OpenAlex citation edges → `OPENALEX`
- Semantic Scholar citation edges → `SEMANTIC_SCHOLAR`
- arXiv-primary fetch → `ARXIV`
- PubMed search → `PUBMED`
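
The idea of stamping the origin at each construction site can be sketched as below. Everything here is a stand-in for illustration: the `Paper` dataclass, the two adapter functions, and the record shapes are hypothetical and far simpler than the real models and API payloads.

```python
from dataclasses import dataclass
from enum import Enum


class PaperSource(str, Enum):
    # Subset of the enum, repeated so the sketch is self-contained.
    CROSSREF = "crossref"
    OPENALEX = "openalex"
    WEB_SEARCH = "web_search"


@dataclass
class Paper:
    # Stand-in for the real Paper model; only the fields the sketch needs.
    id: str
    title: str
    source: PaperSource


def paper_from_crossref(record: dict) -> Paper:
    """Hypothetical CrossRef adapter: stamps CROSSREF at construction time."""
    return Paper(id=record["DOI"], title=record["title"], source=PaperSource.CROSSREF)


def paper_from_openalex_edge(edge: dict) -> Paper:
    """Hypothetical OpenAlex citation-edge adapter: stamps OPENALEX."""
    return Paper(id=edge["doi"], title=edge["title"], source=PaperSource.OPENALEX)


a = paper_from_crossref({"DOI": "10.1000/example-a", "title": "Example A"})
b = paper_from_openalex_edge({"doi": "10.1000/example-b", "title": "Example B"})
```

Because each adapter sets its own value, no shared default such as `WEB_SEARCH` is needed at the point where the paper is created.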
The WEB_SEARCH value is preserved for backward compatibility (existing SQLite rows
from before the migration keep their value) but is no longer assigned by any ingestion
path in v2.0.0+.
If you have a KB created before 2026-05-15, papers in it may carry web_search as
their source. Re-ingesting them (by deleting and rebuilding the KB from the same
.bib) will assign the correct source values.
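
One way to check whether a KB still holds legacy rows is to count them directly in SQLite. The sketch below uses an in-memory database; the table name `papers` and the column name `source` are assumptions made for the example and may not match the real schema.

```python
import sqlite3

# In-memory database standing in for a KB's SQLite file.
# The table name "papers" and the column "source" are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (id TEXT PRIMARY KEY, source TEXT)")
conn.executemany(
    "INSERT INTO papers (id, source) VALUES (?, ?)",
    [
        ("10.1000/example-a", "web_search"),
        ("10.1000/example-b", "openalex"),
        ("10.1000/example-c", "web_search"),
    ],
)

# Count rows that still carry the legacy value.
(legacy_count,) = conn.execute(
    "SELECT COUNT(*) FROM papers WHERE source = ?", ("web_search",)
).fetchone()
print(legacy_count)
conn.close()
```

A non-zero count suggests the KB predates the migration and would pick up corrected values after a rebuild.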

    from perspicacite.models.papers import Paper, PaperSource

    paper = Paper(
        id="10.1038/s41586-023-06924-6",
        title="...",
        source=PaperSource.OPENALEX,
        # ...
    )

    # Check source in provenance filtering
    if paper.source in (PaperSource.ARXIV, PaperSource.SEMANTIC_SCHOLAR):
        # arXiv-seeded paper; may benefit from SS fallback cite-graph
        ...

- concepts/citation-graph.md: how `OPENALEX` and `SEMANTIC_SCHOLAR` values are assigned during cite-graph expansion
- concepts/provenance.md: where `source` appears in retrieval traces
- VISION.md: the design principle behind honest sourcing