arXiv Article Fetcher Bot

A GitHub CI/CD pipeline that automatically fetches the latest scientific papers from arXiv.org, stores their PDFs, and saves metadata in CSV files — all version‑controlled inside a Git repository.

🎯 Why this exists

Researchers often need offline access to the newest literature, especially in regions with unreliable internet. This project was born to help Iranian scholars cope with recurring internet blackouts by caching articles during windows of connectivity. Once downloaded, the papers live in the repository and stay accessible even when the network is down. As I write this README, the internet has been blocked by the government for 69 days.

🤖 This entire project was vibe‑coded by DeepSeek, the AI assistant, with no human hand in its creation.

✨ Features

Fetches the top 15 newest papers for each configured arXiv category
Downloads full PDFs and saves them in a clean directory structure
Creates a CSV metadata file (latest_articles.csv) with title, first author, DOI, and category
Runs automatically every 1st and 16th of the month (approximately every 15 days) via GitHub Actions
Manual trigger supported for on‑demand updates
No API keys or authentication needed — arXiv’s public API is free
Built‑in repository size management with an optional history‑squash workflow
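
The fetching step can be sketched with only the Python standard library against arXiv's public query API (https://export.arxiv.org/api/query), which returns an Atom feed and needs no key. This is a minimal illustration under assumptions, not the code in fetch_articles.py: build_query_url, parse_feed, and fetch_latest are hypothetical names, and sorting by submittedDate is one reasonable reading of "newest".

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "https://export.arxiv.org/api/query"
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def build_query_url(category: str, max_results: int = 15) -> str:
    """Build a query for the newest submissions in one arXiv category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(xml_data: bytes, category: str) -> list[dict]:
    """Turn an Atom feed into rows with title, first author, DOI, category and PDF URL."""
    root = ET.fromstring(xml_data)
    rows = []
    for entry in root.findall("atom:entry", NS):
        # The entry id looks like http://arxiv.org/abs/2401.01234v1
        abs_url = entry.findtext("atom:id", default="", namespaces=NS)
        arxiv_id = abs_url.rsplit("/abs/", 1)[-1]
        # Collapse the line breaks that appear inside long titles
        raw_title = entry.findtext("atom:title", default="", namespaces=NS)
        title = " ".join(raw_title.split())
        # The first <author> element is the first author
        first_author = entry.findtext("atom:author/atom:name", default="", namespaces=NS)
        # arxiv:doi is only present when a DOI was supplied
        doi = entry.findtext("arxiv:doi", default="", namespaces=NS)
        pdf_url = ""
        for link in entry.findall("atom:link", NS):
            if link.get("title") == "pdf":
                pdf_url = link.get("href", "")
                break
        rows.append({
            "arxiv_id": arxiv_id,
            "title": title,
            "first_author": first_author,
            "doi": doi,
            "category": category,
            "pdf_url": pdf_url,
        })
    return rows


def fetch_latest(category: str, max_results: int = 15) -> list[dict]:
    """Download and parse the feed for one category (requires network access)."""
    url = build_query_url(category, max_results)
    with urllib.request.urlopen(url, timeout=60) as resp:
        return parse_feed(resp.read(), category)
```

For example, fetch_latest("cs.AI") would return up to 15 dicts whose pdf_url could then be downloaded with urllib.request.urlretrieve. arXiv asks API clients to pace their requests, so a short pause between categories is sensible.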

🚀 Quick start

Fork this repository to your own GitHub account
Go to Actions → enable workflows
Edit fetch_articles.py and adjust the CATEGORIES set to your interests:

   CATEGORIES = {"cs.AI", "cs.CL", "quant-ph"}

(Optional) Change the schedule in .github/workflows/articles-bot.yml if you need a different frequency
The bot will run automatically on the 1st and 16th, or you can manually trigger it from the Actions tab
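
The 1st/16th schedule is why the interval is only approximately 15 days. Assuming articles-bot.yml uses a cron line such as `0 0 1,16 * *` (the actual time of day may differ; GitHub Actions evaluates cron in UTC), this small hypothetical helper lists the run dates in a range and the gaps between them:

```python
from datetime import date, timedelta

RUN_DAYS = (1, 16)  # days of the month the scheduled workflow fires


def scheduled_runs(start: date, end: date) -> list[date]:
    """Return every date in [start, end] whose day of the month is in RUN_DAYS."""
    runs = []
    day = start
    while day <= end:
        if day.day in RUN_DAYS:
            runs.append(day)
        day += timedelta(days=1)
    return runs


def gaps_in_days(runs: list[date]) -> list[int]:
    """Number of days between consecutive runs."""
    return [(b - a).days for a, b in zip(runs, runs[1:])]


runs = scheduled_runs(date(2024, 1, 1), date(2024, 3, 1))
print(gaps_in_days(runs))  # [15, 16, 15, 14]
```

The gap from the 16th to the next 1st depends on the month's length, so it ranges from 13 days (February in a non-leap year) to 16 days (31-day months).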

📁 Repository structure

.
├── .github/workflows/
│   ├── articles-bot.yml      # Main scheduled workflow
│   └── squash-history.yml    # Manual history clean‑up
├── articles/
│   └── <category>/           # e.g., cs/AI/
│       ├── <arxiv_id>.pdf    # Downloaded papers
│       └── latest_articles.csv
├── arxiv_categories.md
├── fetch_articles.py         # The core bot script
├── LICENSE
└── README.md
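
The layout above can be reproduced with a few helpers. This is a hypothetical sketch (category_dir, pdf_path, and write_metadata_csv are not names taken from fetch_articles.py). It assumes the dot in a category becomes a path separator, as in the cs/AI/ example, and that the slash in old-style IDs such as hep-th/9901001 is replaced so each PDF stays inside its category folder. The CSV columns follow the Features list (title, first author, DOI, category).

```python
import csv
from pathlib import Path

ARTICLES_ROOT = Path("articles")
CSV_FIELDS = ["title", "first_author", "doi", "category"]


def category_dir(category: str, root: Path = ARTICLES_ROOT) -> Path:
    """Map "cs.AI" to articles/cs/AI and "quant-ph" to articles/quant-ph."""
    return root.joinpath(*category.split("."))


def pdf_path(category: str, arxiv_id: str, root: Path = ARTICLES_ROOT) -> Path:
    """Place <arxiv_id>.pdf in its category folder; '/' in old-style IDs becomes '_'."""
    return category_dir(category, root) / f"{arxiv_id.replace('/', '_')}.pdf"


def write_metadata_csv(category: str, rows: list[dict], root: Path = ARTICLES_ROOT) -> Path:
    """Overwrite latest_articles.csv for one category and return its path."""
    folder = category_dir(category, root)
    folder.mkdir(parents=True, exist_ok=True)
    out = folder / "latest_articles.csv"
    with out.open("w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops keys such as pdf_url that are not CSV columns
        writer = csv.DictWriter(fh, fieldnames=CSV_FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return out
```

Overwriting the CSV on each run matches the idea that latest_articles.csv describes only the most recent batch; appending instead would turn it into a running log.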

⚙️ Customization

Simply edit the CATEGORIES set in fetch_articles.py. Each entry must be a valid arXiv category (e.g., "cs.LG", "math.NA", "physics.optics"). The complete category taxonomy is listed on arXiv.org.
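
A quick format check can catch typos in CATEGORIES before a run. The sketch below is hypothetical and only checks the shape of each identifier: an archive such as cs or quant-ph, optionally followed by a dot and a subject class such as AI, NA, optics, or mes-hall. It does not confirm that the category actually exists on arXiv.

```python
import re

# archive ("cs", "quant-ph") plus an optional ".subject" ("AI", "optics", "mes-hall")
CATEGORY_RE = re.compile(r"[a-z]+(?:-[a-z]+)?(?:\.[A-Za-z]+(?:-[A-Za-z]+)?)?")


def invalid_categories(categories: set[str]) -> list[str]:
    """Return, sorted, the entries that do not look like arXiv category identifiers."""
    return sorted(c for c in categories if not CATEGORY_RE.fullmatch(c))


# Example set with one deliberate typo ("cs AI" instead of "cs.AI")
CATEGORIES = {"cs.AI", "cs.CL", "quant-ph", "cs AI"}
print(invalid_categories(CATEGORIES))  # ['cs AI']
```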

You can also change the number of articles fetched per category by modifying max_results=15 inside the script.

📦 Repository size management

Because every run adds new PDFs, the repo can grow quickly. A manual squash workflow is included:

Go to Actions → Squash Article History → Run workflow
Type YES to confirm

This rewrites the main branch, keeping only the latest version of all articles while discarding old commits. ⚠️ It rewrites Git history — run it only when the repository becomes too large (recommended every few months).
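
To judge when the repository has become too large, one rough signal is the size of the .git directory in a local clone, because Git history keeps every PDF ever committed even after it is replaced. The helper below is a hypothetical sketch; the function names and the 500 MB threshold are illustrative assumptions, not values used by this project.

```python
import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files below `path` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = Path(dirpath) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def should_squash(repo: Path, limit_mb: float = 500.0) -> bool:
    """True when the repository's .git folder exceeds `limit_mb` (illustrative threshold)."""
    return dir_size_bytes(repo / ".git") > limit_mb * 1024 * 1024
```

Running should_squash(Path(".")) from the root of a clone gives a yes/no hint; `git count-objects -vH` reports the packed object size directly if you prefer Git's own numbers.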

🙏 Credits

All code in this repository was generated by DeepSeek, a large language model, through an iterative vibe‑coding process.
Thank you to arXiv for providing an amazing open‑access resource and API.

📜 License

GPL-3.0: use, modify, and redistribute the code freely, as long as derivative works stay under GPL-3.0. Just keep the papers free.
