docs: drop pyOpenSci issue references from changelogs, API docs, and docstrings
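
The reworded pipeline docs in this commit state that fallback paths never fabricate data: an entry that cannot be resolved is marked ``identification_failed``. A minimal sketch of that rule follows, for illustration only. The helper name `decide`, the `status` key, and the `adopted_cautiously` label are hypothetical and are not OneCite's actual API. Only the `>= 50` threshold and the `identification_failed` marker come from the docs in this diff, and the higher confidence tiers are left out because their thresholds are not shown here.

```python
from typing import Dict, Optional

# The only threshold visible in this diff's context lines.
CAUTIOUS_THRESHOLD = 50


def decide(entry: Dict, candidate: Optional[Dict], match_score: float) -> Dict:
    """Hypothetical helper sketching the lowest tier and the failure path."""
    has_title = candidate is not None and bool(candidate.get("title"))
    if has_title and match_score >= CAUTIOUS_THRESHOLD:
        # Cautious adoption: merge the candidate's metadata into the entry.
        return {**entry, **candidate, "status": "adopted_cautiously"}
    # No trustworthy match: keep the input as-is and flag it,
    # without inventing any metadata fields.
    return {**entry, "status": "identification_failed"}
```

With this shape, a caller can filter on the status value instead of guessing whether the metadata was looked up or made up.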

Ang · Ang · commit e5934f2715d7 · 2026-04-18T04:54:01.000+08:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -67,19 +67,17 @@ as a deprecated alias for this release cycle.
 
 ### Removed
 - `IdentifierModule._check_doi_content_consistency` and the
-  `consistency_score` / `low_consistency` warning path. The fuzzy
-  string-similarity score was empirically unable to detect subtle
-  LLM-hallucinated references (scored 85/100 on author-only
-  hallucinations against a real DOI) and was only surfaced as a
-  `logger.warning` that downstream tools could not observe, producing
-  false reassurance. Citation authenticity verification belongs at the
-  abstract-vs-claim semantic layer in the consuming tool (e.g. the
-  `sci` skill), not at the bibliographic-string layer here.
+  `consistency_score` / `low_consistency` warning path. A fuzzy
+  string-similarity score on bibliographic fields is not a reliable
+  signal for detecting fabricated references, and it was only emitted
+  as a `logger.warning` that downstream tools could not act on.
+  Citation-authenticity verification belongs at the abstract-vs-claim
+  semantic layer in the consuming tool, not at the bibliographic-string
+  layer here.
 
 ## [0.1.0] - 2026-04-17
 
-First formal PyPI release since `0.0.12`.  Incorporates the complete
-pyOpenSci review pass (issues #3, #5–#34, #36) plus follow-up cleanup.
+First formal PyPI release since `0.0.12`.
 
 ### Added
 - RST documentation using Sphinx
@@ -94,25 +92,25 @@ pyOpenSci review pass (issues #3, #5–#34, #36) plus follow-up cleanup.
 - **Split monolithic `pipeline.py` (~3000 lines)** into a proper
   `onecite/pipeline/` package with one module per stage
   (`parser.py` / `identifier.py` / `enricher.py` / `formatter.py`)
-  plus a `_utils.py` for shared helpers (#17).  Public imports
+  plus a `_utils.py` for shared helpers.  Public imports
   (`from onecite.pipeline import IdentifierModule`) and mocking targets
   (`patch("onecite.pipeline.requests.get", ...)`) continue to work
   unchanged because `__init__.py` re-exports every public symbol and
   keeps `requests` at the package level.
-- Unify CrossRef request and parsing methods (#26); all CrossRef calls
+- Unify CrossRef request and parsing methods; all CrossRef calls
   now go through a single helper with a proper `User-Agent` header and
-  `mailto` query-string parameter (#21).
+  `mailto` query-string parameter.
 - Rewrite fuzzy-search scoring as a weighted title / author / year /
   venue model with three confidence tiers (auto-adopt / interactive /
-  cautious) and a unified low-confidence threshold (#3, #23, #27).
+  cautious) and a unified low-confidence threshold.
 - Simplify identifier routing; CrossRef and Semantic Scholar are always
   consulted for text queries, with signal-based additional queries to
-  PubMed / Google Books / OpenAIRE / BASE (#8, #23).
-- Use `bibtexparser.dumps()` for BibTeX rendering (#30).
+  PubMed / Google Books / OpenAIRE / BASE.
+- Use `bibtexparser.dumps()` for BibTeX rendering.
 - Expose `use_google_scholar` as a real CLI flag and API parameter
-  instead of a hard-coded `False` (#10).
+  instead of a hard-coded `False`.
 - Clarify that templates define metadata-field requirements and a
-  fallback BibTeX entry type, not output formatting (#16, #29).
+  fallback BibTeX entry type, not output formatting.
 - Refactored exception hierarchy
 - Added type hints to Python API
 - Updated README examples
@@ -125,42 +123,40 @@ pyOpenSci review pass (issues #3, #5–#34, #36) plus follow-up cleanup.
 - APA and MLA output renderers; they produced inconsistent output and
   the CLI now rejects anything other than `--output-format bibtex`.
   Users wanting APA/MLA should post-process the BibTeX through pandoc
-  or citeproc-py (#31, #32).
+  or citeproc-py.
 - Hard-coded "well-known paper" shortcut that masked failures on the
-  main example input (#19).
+  main example input.
 - MCP integration page and all related references
 - `.readthedocs.yml` (docs now hosted on GitHub Pages)
 - `docs/_build/` build artifacts from repository
 
 ### Fixed
 - README / `docs/index.rst` / `docs/faq.rst` no longer advertise
   OpenAlex or dblp as data sources — they were never wired into the
-  code (#6).
+  code.
 - README quick-start example now shows `booktitle` (NeurIPS) instead
-  of `journal = "arXiv preprint"` for the `@inproceedings` sample
-  (#28).
+  of `journal = "arXiv preprint"` for the `@inproceedings` sample.
 - `docs/api/pipeline.rst` rewritten to match the actual module
   structure; removed references to classes and methods that never
   existed (`Validator` / `Identifier` / `Completer` / `Formatter`,
-  `set_source_priority`, `set_timeout`, `add_template_path`) (#11).
+  `set_source_priority`, `set_timeout`, `add_template_path`).
 - `docs/output_formats.rst`, `docs/faq.rst`, `docs/quick_start.rst`,
   `docs/python_api.rst`, `docs/templates.rst`, `docs/index.rst` and
   docstrings in `core.py` / `formatter.py` no longer advertise APA /
-  MLA output (#31, #32).
+  MLA output.
 - Crossref author names parsed as `given family` instead of mangled
-  concatenations (#22).
+  concatenations.
 - Semantic Scholar HTTP 429 responses return an empty candidate list
-  cleanly instead of bubbling up (#25).
+  cleanly instead of bubbling up.
 - Previously-unused exception classes (`ParseError`, `ValidationError`,
-  `FormatError`) are now actually raised in the right places (#13).
+  `FormatError`) are now actually raised in the right places.
 - `CONTRIBUTING.md` no longer tells developers to use a `requirements.txt`
-  that does not exist; the documented install is `pip install -e .[dev]`
-  (#12).
+  that does not exist; the documented install is `pip install -e .[dev]`.
 - `black` formatting is enforced via `pyproject.toml` `[tool.black]`
-  plus a pre-commit hook (#15).
-- URL-bearing entries are no longer queried twice (#20).
+  plus a pre-commit hook.
+- URL-bearing entries are no longer queried twice.
 - Fallback paths mark entries as `identification_failed` rather than
-  fabricating plausible-looking but invented metadata (#24).
+  fabricating plausible-looking but invented metadata.
 - CrossRef and Semantic Scholar response parsing edge cases
 - API documentation using incorrect return value fields (`output_content` -> `results`)
 - Version number inconsistencies across metadata files
diff --git a/docs/api/pipeline.rst b/docs/api/pipeline.rst
@@ -12,10 +12,9 @@ into formatted BibTeX:
 3. **Enrich** — fetch full metadata for the identified entries
 4. **Format** — render the completed entries as BibTeX
 
-Since pyOpenSci review issue #17, the implementation lives in the
-``onecite/pipeline/`` package with one module per stage.  For
-backwards-compatibility all public symbols remain importable from
-``onecite.pipeline``:
+The implementation lives in the ``onecite/pipeline/`` package with one
+module per stage.  For backwards-compatibility all public symbols remain
+importable from ``onecite.pipeline``:
 
 .. code-block:: python
 
@@ -116,8 +115,9 @@ year / venue similarity to the query.  The decision logic in
 - ``match_score >= 50`` and a title is present → adopt cautiously
 - otherwise → mark the entry as ``identification_failed``
 
-This matches what the pyOpenSci review flagged in issues #3, #23 and
-#27.  Fallbacks never fabricate data (see #24).
+Fallback paths never fabricate data: an entry that cannot be resolved is
+marked ``identification_failed`` rather than filled with invented
+metadata.
 
 .. code-block:: python
 
diff --git a/docs/changelog.rst b/docs/changelog.rst
@@ -76,22 +76,18 @@ Removed
 ~~~~~~~
 
 - ``IdentifierModule._check_doi_content_consistency`` and the
-  ``consistency_score`` / ``low_consistency`` warning path.  The fuzzy
-  string-similarity score was empirically unable to detect subtle
-  LLM-hallucinated references (scored 85/100 on author-only
-  hallucinations against a real DOI) and was only surfaced as a
-  ``logger.warning`` that downstream tools could not observe, producing
-  false reassurance.  Citation-authenticity verification belongs at
-  the abstract-vs-claim semantic layer in the consuming tool
-  (e.g. the ``sci`` skill), not at the bibliographic-string layer
-  here.
+  ``consistency_score`` / ``low_consistency`` warning path.  A fuzzy
+  string-similarity score on bibliographic fields is not a reliable
+  signal for detecting fabricated references, and it was only emitted
+  as a ``logger.warning`` that downstream tools could not act on.
+  Citation-authenticity verification belongs at the abstract-vs-claim
+  semantic layer in the consuming tool, not at the bibliographic-string
+  layer here.
 
 [0.1.0] - 2026-04-17
 ---------------------
 
-First formal PyPI release since ``0.0.12``.  Incorporates the complete
-pyOpenSci review pass (issues #3, #5–#34, #36) plus follow-up cleanup.
-See ``CHANGELOG.md`` at the repository root for the full per-issue list.
+First formal PyPI release since ``0.0.12``.
 
 Added
 ~~~~~
@@ -108,18 +104,18 @@ Changed
 ~~~~~~~
 
 - **Split monolithic pipeline.py (~3000 lines)** into a proper
-  ``onecite/pipeline/`` package with one module per stage (#17)
+  ``onecite/pipeline/`` package with one module per stage
 - Unify CrossRef request and parsing methods, with ``User-Agent`` and
-  ``mailto`` set per CrossRef etiquette (#21, #26)
+  ``mailto`` set per CrossRef etiquette
 - Rewrite fuzzy-search scoring as a weighted title/author/year/venue
-  model with three confidence tiers (#3, #23, #27)
+  model with three confidence tiers
 - Simplify identifier routing; CrossRef and Semantic Scholar are the
   always-on sources, with signal-based PubMed / Google Books /
-  OpenAIRE / BASE queries (#8, #23)
-- Use ``bibtexparser.dumps()`` for BibTeX rendering (#30)
-- Expose ``use_google_scholar`` as a real CLI flag and API parameter (#10)
+  OpenAIRE / BASE queries
+- Use ``bibtexparser.dumps()`` for BibTeX rendering
+- Expose ``use_google_scholar`` as a real CLI flag and API parameter
 - Clarify that templates define metadata-field requirements and a
-  fallback BibTeX entry type, not output formatting (#16, #29)
+  fallback BibTeX entry type, not output formatting
 - Refactored exception hierarchy
 - Added type hints to Python API
 
@@ -128,9 +124,9 @@ Removed
 
 - APA and MLA output renderers; the CLI now rejects anything other than
   ``--output-format bibtex``.  Use pandoc or citeproc-py to convert the
-  generated BibTeX to APA / MLA (#31, #32)
+  generated BibTeX to APA / MLA
 - Hard-coded "well-known paper" shortcut that masked failures on the
-  main example input (#19)
+  main example input
 - MCP integration page and all related references
 - ``.readthedocs.yml`` (docs now hosted on GitHub Pages)
 - ``docs/_build/`` build artifacts from repository
@@ -139,20 +135,19 @@ Fixed
 ~~~~~
 
 - OpenAlex and dblp no longer listed as data sources — they were never
-  wired into the code (#6)
+  wired into the code
 - ``docs/api/pipeline.rst`` rewritten to match the real modules;
-  removed references to nonexistent classes / methods (#11)
+  removed references to nonexistent classes / methods
 - README and docs ``@inproceedings`` example now uses ``booktitle``
-  instead of ``journal = "arXiv preprint"`` (#28)
-- Crossref author names parsed as ``given family`` (#22)
-- Semantic Scholar HTTP 429 handled cleanly (#25)
+  instead of ``journal = "arXiv preprint"``
+- Crossref author names parsed as ``given family``
+- Semantic Scholar HTTP 429 handled cleanly
 - Previously-unused exception classes now raised in the right places
-  (#13)
 - ``CONTRIBUTING.md`` documents ``pip install -e .[dev]`` instead of
-  the non-existent ``requirements.txt`` (#12)
-- URL-bearing entries no longer queried twice (#20)
+  the non-existent ``requirements.txt``
+- URL-bearing entries no longer queried twice
 - Fallback paths mark entries as ``identification_failed`` rather than
-  fabricating invented metadata (#24)
+  fabricating invented metadata
 - CrossRef and Semantic Scholar response parsing edge cases
 - API documentation using incorrect return value fields
 - Version number inconsistencies across metadata files
diff --git a/onecite/pipeline/__init__.py b/onecite/pipeline/__init__.py
@@ -3,9 +3,9 @@
 
 """OneCite's 4-stage processing pipeline.
 
-Historically this lived in a single ``pipeline.py`` of ~3000 lines.  It was
-split per pyOpenSci review issue #17 into one module per stage.  All public
-symbols are re-exported here so callers and tests that do
+Historically this lived in a single ``pipeline.py`` of ~3000 lines.  It has
+been split into one module per stage.  All public symbols are re-exported
+here so callers and tests that do
 
     from onecite.pipeline import IdentifierModule
     import onecite.pipeline as pm  # and then: patch("onecite.pipeline.requests.get", ...)
diff --git a/onecite/pipeline/enricher.py b/onecite/pipeline/enricher.py
@@ -581,11 +581,12 @@ def _complete_fields(self, base_record: Dict, template: Dict,
         ``10.1007/s10462-019-09792-7``), which is strictly worse for
         downstream semantic checks than returning ``None``.
 
-        Older versions of this function attempted template-driven completion
-        of many fields across several sources (including Google Scholar),
-        which the pyOpenSci review (#29) correctly flagged as a no-op in
-        the default CLI path and as structurally wrong. That machinery is
-        not being reintroduced. The narrow abstract cascade here is
+        Older versions of this function attempted template-driven
+        completion of many fields across several sources (including Google
+        Scholar), which was a no-op in the default CLI path and
+        structurally wrong (the declared sources were never actually wired
+        for broad field completion). That machinery is not being
+        reintroduced. The narrow abstract cascade here is
         directly observable by downstream tools via the ``abstract`` field
         in the emitted BibTeX and was empirically the only way to bridge
         the gap between CrossRef-only (~44% coverage on a 10-DOI
diff --git a/tests/test_pipeline_unit.py b/tests/test_pipeline_unit.py
@@ -1207,9 +1207,8 @@ def test_conference_proceedings_type(self):
 
     def test_complete_fields_no_google_scholar_for_abstract(self):
         """Even when template asks for google_scholar_scraper, we never call
-        Google Scholar from _complete_fields (pyOpenSci #29). PubMed is the
-        only abstract fallback, and if it returns nothing the result stays
-        untouched."""
+        Google Scholar from _complete_fields. PubMed is the only abstract
+        fallback, and if it returns nothing the result stays untouched."""
         enr = EnricherModule(use_google_scholar=False)
         base = {'title': 'T', 'author': 'A', 'year': '2020'}
         template = {'fields': [{'name': 'abstract', 'source_priority': ['google_scholar_scraper']}]}