Commit 780a7ad

Merge pull request #71 from ContextLab/claude/investigate-issue-37-011CUqwKy1eU4BhoDN8nHAuR
Claude/investigate issue 37
2 parents 0899c33 + e244fba commit 780a7ad

2 files changed

Lines changed: 672 additions & 10 deletions

README.md

Lines changed: 101 additions & 10 deletions
@@ -10,11 +10,13 @@ The main bibtex file ([cdl.bib](https://raw.githubusercontent.com/ContextLab/CDL
 - [Using the bibtex checker tools](#using-the-bibtex-checker-tools)
 - [Installation](#installation)
 - [Overview](#overview)
+- [bibcheck.py - Format Verification](#bibcheckpy---format-verification)
+- [bibverify.py - Accuracy Verification](#bibverifypy---accuracy-verification)
 - [Suggested workflow](#suggested-workflow)
 - [Additional information and usage instructions](#additional-information-and-usage-instructions)
-- [`verify`](#verify)
-- [`compare`](#compare)
-- [`commit`](#commit)
+- [`bibcheck verify`](#verify)
+- [`bibcheck compare`](#compare)
+- [`bibcheck commit`](#commit)
 - [Using the bibtex file as a common bibliography for all *local* LaTeX files](#using-the-bibtex-file-as-a-common-bibliography-for-all-local-latex-files)
 - [General Unix/Linux Setup (Command Line Compilation)](#general-unixlinux-setup-command-line-compilation)
 - [MacOS Setup with TeXShop and TeX Live](#macos-setup-with-texshop-and-tex-live)
@@ -35,10 +37,27 @@ You may find the included bibtex file and/or readme file useful for any of the f
 - Instructions for adding this repository as a sub-module to Overleaf projects, so that you can share a common bibtex file across your Overleaf projects

 ## Using the bibtex checker tools
-You may find the bibtex checker tools useful for:
-- Verifying the integrity of a .bib file
+
+This repository includes two complementary verification tools:
+
+1. **bibcheck.py** - Verifies formatting and consistency
+- Checks key naming conventions
+- Validates author/editor name formatting
+- Ensures proper capitalization
+- Verifies page number formatting
+- Removes duplicate entries
+
+2. **bibverify.py** - Verifies accuracy against external sources
+- Cross-references entries with CrossRef database (170M+ records)
+- Validates volume, issue/number, and page fields
+- Detects common errors (e.g., DOI in pages field)
+- Uses conservative matching to prevent false positives
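
The "DOI in pages field" check listed above could be approximated as follows. This is a minimal sketch, not the code in bibverify.py: the names `DOI_PATTERN`, `looks_like_doi`, and `find_doi_in_pages`, the regular expression itself, and the use of plain dictionaries with an `ID` key are all assumptions made for illustration.

```python
import re

# Assumed pattern: a DOI starts with "10.", then a 4-9 digit registrant code,
# a slash, and a non-empty suffix. An optional "doi:" or resolver-URL prefix
# is tolerated. This is an illustrative heuristic, not the tool's own rule.
DOI_PATTERN = re.compile(
    r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+$",
    re.IGNORECASE,
)


def looks_like_doi(value: str) -> bool:
    """Return True if the whole string looks like a DOI."""
    return bool(DOI_PATTERN.match(value.strip()))


def find_doi_in_pages(entries):
    """Return the keys of entries whose pages field holds a DOI-like string."""
    return [
        entry["ID"]
        for entry in entries
        if looks_like_doi(entry.get("pages", ""))
    ]


# Hypothetical entries, modeled as dictionaries for the sake of the example.
entries = [
    {"ID": "Smith2020", "pages": "123--145"},
    {"ID": "Jones2019", "pages": "10.1000/xyz123"},
    {"ID": "Lee2021"},
]
flagged = find_doi_in_pages(entries)
print(flagged)
```

Anchoring the pattern to the whole field keeps ordinary page ranges and article numbers from being flagged.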
+
+You may find these tools useful for:
+- Verifying the integrity and accuracy of a .bib file
 - Autocorrecting a .bib file (use with caution!)
 - Automatically generating change logs and commit messages
+- Finding and fixing metadata errors

 ### Installation
 The bibtex checker has only been tested on MacOS, but it will probably work without modification on other Unix systems, and with minor modification on Windows systems.
@@ -51,7 +70,9 @@ pip install -r requirements.txt

 ### Overview

-The included checker has three general functions: `verify`, `compare`, and `commit`:
+#### bibcheck.py - Format Verification
+
+The format verification tool has three main functions: `verify`, `compare`, and `commit`:
 ```bash
 Usage: bibcheck.py [OPTIONS] COMMAND [ARGS]...

@@ -68,25 +89,95 @@ Commands:
 verify
 ```

+#### bibverify.py - Accuracy Verification
+
+The accuracy verification tool checks entries against the CrossRef database:
+```bash
+Usage: python bibverify.py [OPTIONS] COMMAND [ARGS]...
+
+Commands:
+verify Verify bibliographic entries against CrossRef database
+info Show information about the verification tool
+```
+
+**Key Features:**
+- **Fast:** Verifies 6,151 entries in ~6 minutes using parallel processing
+- **Conservative:** Requires strong similarity in title, authors, AND journal before reporting issues
+- **Accurate:** Prevents false positives by rejecting uncertain matches
+- **Focused:** Only checks volume, issue, and pages metadata (not formatting)
+
+**Basic Usage:**
+```bash
+# Verify entire bibliography with 10 parallel workers
+python bibverify.py verify cdl.bib --workers 10
+
+# Get detailed output
+python bibverify.py verify cdl.bib --verbose --workers 10
+
+# Save report to file
+python bibverify.py verify cdl.bib --workers 10 > verification_report.txt 2>&1
+```
+
+**How it Works:**
+1. Queries CrossRef API by DOI (if present) or by title/authors
+2. **Conservative Matching:** Requires ALL of:
+- Title similarity ≥ 85%
+- Author similarity ≥ 70%
+- Journal similarity ≥ 60%
+- Year difference ≤ 1 year
+3. Only reports discrepancies when confident it's the same paper
+4. Checks for volume/number mismatches, incorrect pages, and common errors
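
The conservative matching rule in step 2 can be sketched as below. The four thresholds come from the list above; everything else is an assumption for illustration: the README does not say which similarity metric bibverify.py uses, so `difflib.SequenceMatcher` stands in for it here, and the function names and dictionary fields are hypothetical.

```python
from difflib import SequenceMatcher

# Thresholds taken from the "Conservative Matching" list above.
TITLE_MIN = 0.85
AUTHOR_MIN = 0.70
JOURNAL_MIN = 0.60
MAX_YEAR_DIFF = 1


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for the tool's metric."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_confident_match(local: dict, remote: dict) -> bool:
    """Accept a candidate record only if ALL four criteria hold."""
    return (
        similarity(local["title"], remote["title"]) >= TITLE_MIN
        and similarity(local["author"], remote["author"]) >= AUTHOR_MIN
        and similarity(local["journal"], remote["journal"]) >= JOURNAL_MIN
        and abs(int(local["year"]) - int(remote["year"])) <= MAX_YEAR_DIFF
    )


# Hypothetical placeholder record, not a real publication.
local = {
    "title": "An example paper title",
    "author": "Doe, Jane",
    "journal": "Journal of Examples",
    "year": "2020",
}
print(is_confident_match(local, dict(local, year="2021")))  # year off by one
print(is_confident_match(local, dict(local, year="2023")))  # year off by three
```

Because the criteria are combined with `and`, a single weak field is enough to reject a candidate, which is what keeps uncertain matches from being reported as errors.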
+
+**Example Output:**
+```
+============================================================
+VERIFICATION SUMMARY
+============================================================
+✓ Verified: 3,988 (65%)
+✗ Errors: 724 (12%)
+⚠ Warnings: 1,434 (23%)
+
+Common errors found:
+- Volume/issue number mismatches
+- Page range errors or off-by-one issues
+- DOI placed in pages field instead of doi field
+- Year discrepancies (preprint vs published versions)
+```
+
+**Performance:** With 10 workers, verifies ~17 entries/second. Full bibliography verification takes approximately 6 minutes.
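
For scale, 6,151 entries in about 6 minutes (360 seconds) works out to roughly 17 entries per second. One way a `--workers` option could fan out the lookups is sketched below. This is an assumption-laden illustration: the README does not say whether bibverify.py uses threads or processes, a thread pool is chosen here because the work is network-bound, and `verify_entry` is a hypothetical placeholder that performs no real CrossRef request.

```python
from concurrent.futures import ThreadPoolExecutor


def verify_entry(entry: dict) -> str:
    """Hypothetical placeholder for one lookup; returns a status string."""
    return "verified" if entry.get("doi") else "warning"


def verify_all(entries, workers: int = 10):
    """Check entries concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(verify_entry, entries))


# Hypothetical entries used only to exercise the sketch.
entries = [
    {"ID": "a", "doi": "10.1000/a"},
    {"ID": "b"},
    {"ID": "c", "doi": "10.1000/c"},
]
statuses = verify_all(entries, workers=10)
print(statuses)
```

`pool.map` preserves input order, so each status can be paired with its entry without extra bookkeeping.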
+
+**Note:** 23% of entries may not be found in CrossRef (arXiv preprints, technical reports, very new/old publications). The tool correctly rejects uncertain matches rather than suggesting false corrections.
+
 # Suggested workflow

 After making changes to `cdl.bib` (manually, using
 [bibdesk](https://bibdesk.sourceforge.io/), etc.), please follow the suggested
 workflow below in order to safely update the shared lab resource:

-1. Verify the integrity of the modified cdl.bib file (correct any changes until this passes):
+1. **(Optional) Verify accuracy against CrossRef:**
+```bash
+python bibverify.py verify cdl.bib --workers 10 > verification_report.txt 2>&1
+# Review verification_report.txt and fix any genuine errors found
+```
+
+2. Verify the formatting/integrity of the modified cdl.bib file (correct any changes until this passes):
 ```bash
 python bibcheck.py verify --verbose
 ```
-2. Generate a change log and commit your changes:
+
+3. Generate a change log and commit your changes:
 ```bash
 python bibcheck.py commit --verbose
 ```
-3. Push your changes to your fork:
+
+4. Push your changes to your fork:
 ```bash
 git push
 ```
-4. Create a pull request for pulling your changes into the ContextLab fork
+
+5. Create a pull request for pulling your changes into the ContextLab fork
+
+**Note:** The bibverify step is optional but recommended for catching metadata errors. It's especially useful when adding new entries or updating existing ones.

 ## Additional information and usage instructions
