Download PDFs for list.bib entries from Zotero API #19
…list.bib Co-authored-by: physikerwelt <2777736+physikerwelt@users.noreply.github.com>
@copilot That works nicely. As a follow-up: only download the PDF if it does not already exist in bib/docs/preprints, and rename it according to the naming convention (derived from the unique tex key). New files should also be committed and, if the field tex.preprint is not set, referenced as the preprint. All files placed in bib/docs/preprints will be available at https://ag-gipp.github.io/bib/preprints/filename.pdf
….preprint Co-authored-by: physikerwelt <2777736+physikerwelt@users.noreply.github.com>
Done in commit
@copilot Not bad, but the files are not renamed. For example, it downloads to bib/docs/preprints/Petrera2021--MORS--zbmath_open_api_solutions_and_research_challenges.pdf instead of bib/docs/preprints/PetreraTBE21a.pdf, which would be derived from the BibTeX key. Also, the API key I used for testing has no write access; see the log below:
2026-02-19 19:49:13,980 - root - INFO - Downloaded IKC87KPI -> bib/docs/preprints/Petrera2021--MORS--zbmath_open_api_solutions_and_research_challenges.pdf
Please add a top-level entry point or a CLI parameter, e.g. max-writes=1, so I can manually test writing changes to individual entries. Double-check that existing tex.preprint entries are not overwritten or modified.
…d argparse CLI Co-authored-by: physikerwelt <2777736+physikerwelt@users.noreply.github.com>
Fixed in commit
Example usage for testing a single write:
@copilot The filename is still incorrect. For the example mentioned above, the JSON snippet looks like this:
{
"biblatex": "\n@inproceedings{Petrera2021,\n\ttitle = {{zbMATH} Open: {API} Solutions and Research Challenges},\n\tvolume = {2976},\n\turl = {http://ceur-ws.org/Vol-2976/paper-1.pdf},\n\tseries = {{CEUR} Workshop Proceedings},\n\tpages = {4--13},\n\tbooktitle = {Proceedings of the Workshop on Digital Infrastructures for Scholarly Content Objects ({DISCO} 2021) co-located with {ACM}/{IEEE} Joint Conference on Digital Libraries 2021 ({JCDL} 2021), Online (due to the global pandemic), September 30, 2021},\n\tpublisher = {{CEUR}-{WS}.org},\n\tauthor = {Petrera, Matteo and Trautwein, Dennis and Beckenbach, Isabel and Ehsani, Dariush and M\u00fcller, Fabian and Teschke, Olaf and Gipp, Bela and Schubotz, Moritz},\n\teditor = {Balke, Wolf-Tilo and de Waard, Anita and Fu, Yuanxi and Hua, Bolin and Schneider, Jodi and Song, Ningyuan and Wang, Xiaoguang},\n\tdate = {2021},\n\tnote = {tex.topic: mathir},\n\tkeywords = {!ms\\_author, no-doi, \u26d4 No {DOI} found},\n}\n",
"data": {
"DOI": "",
"ISBN": "",
"ISSN": "",
"abstractNote": "",
"accessDate": "",
"archive": "",
"archiveLocation": "",
"callNumber": "",
"citationKey": "Petrera2021",
"collections": [],
"conferenceName": "",
"creators": [
{
"creatorType": "author",
"firstName": "Matteo",
"lastName": "Petrera"
},
{
"creatorType": "author",
"firstName": "Dennis",
"lastName": "Trautwein"
},
{
"creatorType": "author",
"firstName": "Isabel",
"lastName": "Beckenbach"
},
{
"creatorType": "author",
"firstName": "Dariush",
"lastName": "Ehsani"
},
{
"creatorType": "author",
"firstName": "Fabian",
"lastName": "M\u00fcller"
},
{
"creatorType": "author",
"firstName": "Olaf",
"lastName": "Teschke"
},
{
"creatorType": "author",
"firstName": "Bela",
"lastName": "Gipp"
},
{
"creatorType": "author",
"firstName": "Moritz",
"lastName": "Schubotz"
},
{
"creatorType": "editor",
"firstName": "Wolf-Tilo",
"lastName": "Balke"
},
{
"creatorType": "editor",
"firstName": "Anita",
"lastName": "de Waard"
},
{
"creatorType": "editor",
"firstName": "Yuanxi",
"lastName": "Fu"
},
{
"creatorType": "editor",
"firstName": "Bolin",
"lastName": "Hua"
},
{
"creatorType": "editor",
"firstName": "Jodi",
"lastName": "Schneider"
},
{
"creatorType": "editor",
"firstName": "Ningyuan",
"lastName": "Song"
},
{
"creatorType": "editor",
"firstName": "Xiaoguang",
"lastName": "Wang"
}
],
"date": "2021",
"dateAdded": "2023-10-03T19:54:33Z",
"dateModified": "2025-05-29T10:30:37Z",
"eventPlace": "",
"extra": "tex.topic: mathir",
"issue": "",
"itemType": "conferencePaper",
"key": "SIEJ7T47",
"language": "",
"libraryCatalog": "",
"numberOfVolumes": "",
"pages": "4\u201313",
"place": "",
"proceedingsTitle": "Proceedings of the Workshop on Digital Infrastructures for Scholarly Content Objects (DISCO 2021) co-located with ACM/IEEE Joint Conference on Digital Libraries 2021 (JCDL 2021), Online (due to the global pandemic), September 30, 2021",
"publisher": "CEUR-WS.org",
"relations": {},
"rights": "",
"series": "CEUR Workshop Proceedings",
"seriesNumber": "",
"shortTitle": "",
"tags": [
{
"tag": "!ms_author"
},
{
"tag": "no-doi"
},
{
"tag": "\u26d4 No DOI found",
"type": 1
}
],
"title": "zbMATH Open: API Solutions and Research Challenges",
"url": "http://ceur-ws.org/Vol-2976/paper-1.pdf",
"version": 39746,
"volume": "2976"
},
"key": "SIEJ7T47",
"library": {
"id": 2480461,
"links": {
"alternate": {
"href": "https://www.zotero.org/groups/2480461",
"type": "text/html"
}
},
"name": "ag-gipp",
"type": "group"
},
"links": {
"alternate": {
"href": "https://www.zotero.org/groups/2480461/items/SIEJ7T47",
"type": "text/html"
},
"attachment": {
"attachmentSize": 1734052,
"attachmentType": "application/pdf",
"href": "https://api.zotero.org/groups/2480461/items/IKC87KPI",
"type": "application/json"
},
"self": {
"href": "https://api.zotero.org/groups/2480461/items/SIEJ7T47",
"type": "application/json"
}
},
"meta": {
"createdByUser": {
"id": 3075416,
"links": {
"alternate": {
"href": "https://www.zotero.org/physikerwelt",
"type": "text/html"
}
},
"name": "Moritz Schubotz",
"username": "physikerwelt"
},
"creatorSummary": "Petrera et al.",
"lastModifiedByUser": {
"id": 8024352,
"links": {
"alternate": {
"href": "https://www.zotero.org/frederickirstein",
"type": "text/html"
}
},
"name": "",
"username": "FredericKirstein"
},
"numChildren": 1,
"parsedDate": "2021"
},
"version": 39746
},
So you don't want to parse the dirty BibTeX entry; instead, use "citationKey": "Petrera2021" as the filename.
Co-authored-by: physikerwelt <2777736+physikerwelt@users.noreply.github.com>
... Fixed in commit
PDF attachments for publication list entries are not downloaded from Zotero, leaving bib/docs/preprints/ unpopulated.

Changes

download-pdfs.py (new)
- … download.py --tag argument (same literal-match logic as extract-bibtex.py)
- … (itemType: attachment, contentType: application/pdf) via parent key lookup
- … data['citationKey'] (the structured field populated by Better BibTeX), with fallback to Citation Key: in the extra field, then the original attachment filename
- … GET https://api.zotero.org/{prefix}/items/{key}/file with env var validation and network error handling
- If tex.preprint is not set on the parent Zotero item, PATCHes the item via the API to add tex.preprint: https://ag-gipp.github.io/bib/preprints/<citation_key>.pdf
- Existing tex.preprint values are never overwritten
- --max-writes N limits the number of Zotero PATCH requests in a single run (useful for testing; default: unlimited)

CI workflow
Added a step after list.bib extraction that downloads PDFs to bib/docs/preprints, committed and pushed with the rest of the bib repo updates.

Makefile
Added pdfs target for local use:
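
The naming and write rules listed for download-pdfs.py can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script from this PR: the function names (`target_filename`, `extra_with_preprint`, `plan`, `select_patches`) are hypothetical, and it assumes that tex.preprint is stored as a line in the item's extra field, the way tex.topic appears in the JSON snippet above. The sketch makes no network calls; it only decides what a run would download and patch.

```python
import re
from pathlib import Path
from typing import Optional

# Public base URL and target directory as stated in the conversation.
PREPRINT_BASE_URL = "https://ag-gipp.github.io/bib/preprints/"
PREPRINT_DIR = Path("bib/docs/preprints")


def target_filename(parent_data: dict, attachment_filename: str) -> str:
    """Pick the PDF name: citationKey, then 'Citation Key:' in extra, then the original name."""
    key = (parent_data.get("citationKey") or "").strip()
    if not key:
        extra = parent_data.get("extra") or ""
        match = re.search(r"^Citation Key:[ \t]*(\S+)", extra, re.MULTILINE)
        key = match.group(1) if match else ""
    return f"{key}.pdf" if key else attachment_filename


def extra_with_preprint(extra: str, filename: str) -> Optional[str]:
    """Return extra with a tex.preprint line appended, or None if one already exists."""
    # Assumption: tex.* fields are plain 'name: value' lines inside extra.
    if re.search(r"^tex\.preprint\s*:", extra or "", re.MULTILINE):
        return None  # existing values are never overwritten
    line = f"tex.preprint: {PREPRINT_BASE_URL}{filename}"
    return f"{extra}\n{line}" if extra else line


def plan(parent_data: dict, attachment_filename: str,
         preprint_dir: Path = PREPRINT_DIR) -> dict:
    """Decide, for one parent item, whether to download and whether to patch."""
    filename = target_filename(parent_data, attachment_filename)
    return {
        "filename": filename,
        # Download only if the renamed file is not already in the directory.
        "download": not (preprint_dir / filename).exists(),
        # None means the item already has tex.preprint and needs no PATCH.
        "new_extra": extra_with_preprint(parent_data.get("extra") or "", filename),
    }


def select_patches(plans: list, max_writes: Optional[int] = None) -> list:
    """Keep the plans that need a PATCH, capped at max_writes (None = unlimited)."""
    pending = [p for p in plans if p["new_extra"] is not None]
    return pending if max_writes is None else pending[:max_writes]


if __name__ == "__main__":
    # Fields taken from the JSON snippet quoted in the conversation.
    sample = {"citationKey": "Petrera2021", "extra": "tex.topic: mathir"}
    original = "Petrera2021--MORS--zbmath_open_api_solutions_and_research_challenges.pdf"
    print(plan(sample, original))
```

Separating the decision from the HTTP calls keeps the "never overwrite" and "at most N writes" rules testable with a read-only API key, which is the situation described in the conversation.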