Commit 2a1ded3

committed
adding colab stuff
1 parent 2e9f96d commit 2a1ded3

8 files changed

Lines changed: 674 additions & 0 deletions

LAUNCH_COLAB.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
# Launch Pygetpapers v2.0 in Google Colab
2+
3+
## Quick Start (3 Steps)
4+
5+
### 1. Open the Notebook
6+
**Option A - Download & Upload (Recommended):**
7+
1. **Download the notebook file:**
8+
- Go to this repository: https://github.com/pygetpapers/pygetpapers
9+
- Click on the file `pygetpapers_colab_demo.ipynb` in the file list
10+
- Click the "Download" button (or right-click → "Save link as...")
11+
2. **Upload to Colab:**
12+
- Go to [Google Colab](https://colab.research.google.com/)
13+
- Click "File" → "Upload notebook"
14+
- Select the downloaded `pygetpapers_colab_demo.ipynb` file
15+
16+
**Option B - Manual Creation:**
17+
1. Go to [Google Colab](https://colab.research.google.com/)
18+
2. Create a new notebook
19+
3. Copy the code from the cells in `pygetpapers_colab_demo.ipynb`
20+
21+
**Option C - GitHub (After Commit):**
22+
```
https://colab.research.google.com/github/pygetpapers/pygetpapers/blob/main/pygetpapers_colab_demo.ipynb
```
25+
26+
### 2. Install Pygetpapers
27+
Run the first code cell to install:
28+
```python
!pip install git+https://github.com/pygetpapers/pygetpapers.git
```
31+
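To confirm that the install cell worked before running the demos, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and is not part of pygetpapers.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pygetpapers")
if version:
    print(f"pygetpapers version: {version}")
else:
    print("pygetpapers is not installed yet; rerun the install cell")
```

If the second message appears, restarting the runtime and rerunning the install cell is usually the first thing to try.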
32+
### 3. Run Demos
33+
Execute cells sequentially to see:
34+
- 🌍 Climate change research across repositories
35+
- 📄 PDF downloads from Europe PMC
36+
- 📊 Interactive datatables
37+
- 🔍 No-execute mode (count results)
38+
39+
## What You'll Get
40+
41+
**5 repositories tested** (Europe PMC, Crossref, bioRxiv, medRxiv, OpenAlex)
42+
**Climate change queries** (following project style guide)
43+
**CSV metadata files** for analysis
44+
**HTML datatables** for browsing
45+
**PDF downloads** where available
46+
47+
## Output Location
48+
All files saved to: `/content/pygetpapers_output/`
49+
50+
## Customize
51+
Change queries in any cell:
52+
```python
query = "your topic"  # Replace with your research topic
limit = 10            # Number of papers to download
```
56+
57+
## Troubleshooting
58+
59+
**"Notebook not found" error?**
60+
- Use Option A (Download & Upload) instead
61+
- The GitHub link will work after the notebook is committed to the repository
62+
63+
**Can't find the notebook file?**
64+
- Look for `pygetpapers_colab_demo.ipynb` in the main directory of the repository
65+
- If not visible, the file may not be committed yet - use Option B instead
66+
67+
---
68+
69+
**Ready to research? Download the notebook and upload to Colab! 🚀**

README_COLAB.md

Lines changed: 220 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,220 @@
1+
# Pygetpapers v2.0 Google Colab Demo
2+
3+
This document provides instructions for launching and using the pygetpapers v2.0 Google Colab notebook.
4+
5+
## Quick Start
6+
7+
### Option 1: Direct Launch (Recommended)
8+
9+
1. **Click the link below to open the notebook directly in Google Colab:**
10+
```
https://colab.research.google.com/github/pygetpapers/pygetpapers/blob/main/pygetpapers_colab_demo.ipynb
```
13+
14+
2. **If the above link doesn't work, use this alternative:**
15+
- Go to [Google Colab](https://colab.research.google.com/)
16+
- Click "File" → "Open notebook"
17+
- Select "GitHub" tab
18+
- Enter: `pygetpapers/pygetpapers`
19+
- Select `pygetpapers_colab_demo.ipynb`
20+
21+
### Option 2: Manual Upload
22+
23+
1. **Download the notebook:**
24+
- Download `pygetpapers_colab_demo.ipynb` from this repository
25+
26+
2. **Upload to Colab:**
27+
- Go to [Google Colab](https://colab.research.google.com/)
28+
- Click "File" → "Upload notebook"
29+
- Select the downloaded `.ipynb` file
30+
31+
## What the Notebook Demonstrates
32+
33+
### 🌍 Climate Change Research Queries
34+
The notebook uses climate-related queries as per the project style guide:
35+
- **Climate change** (Europe PMC)
36+
- **Global warming** (Crossref)
37+
- **Carbon dioxide** (bioRxiv)
38+
- **Temperature** (medRxiv)
39+
- **Greenhouse gas** (OpenAlex)
40+
- **Atmospheric CO2** (PDF downloads)
41+
42+
### 📚 Repository Coverage
43+
- **Europe PMC**: Biomedical and life sciences papers
44+
- **Crossref**: Cross-disciplinary academic papers
45+
- **bioRxiv**: Biology preprints
46+
- **medRxiv**: Medicine preprints
47+
- **OpenAlex**: Academic papers and citations
48+
49+
### 🛠️ Features Demonstrated
50+
- ✅ Installation and setup
51+
- ✅ Multi-repository searching
52+
- ✅ PDF downloads (where available)
53+
- ✅ CSV metadata export
54+
- ✅ HTML datatables creation
55+
- ✅ No-execute mode (count results without downloading)
56+
- ✅ Batch processing
57+
58+
## Running the Notebook
59+
60+
### Prerequisites
61+
- Google account (free)
62+
- Internet connection
63+
- No local installation required
64+
65+
### Step-by-Step Instructions
66+
67+
1. **Open the notebook** using one of the methods above
68+
69+
2. **Runtime Setup:**
70+
- Click "Runtime" → "Change runtime type"
71+
- Ensure "Python 3" is selected
72+
- GPU/TPU not required (CPU is sufficient)
73+
74+
3. **Run Installation Cell:**
75+
- Execute the first code cell to install pygetpapers
76+
- Wait for installation to complete (may take 1-2 minutes)
77+
78+
4. **Run Demo Cells:**
79+
- Execute cells sequentially to see different features
80+
- Each cell demonstrates a different repository or feature
81+
- Results are saved to `/content/pygetpapers_output/`
82+
83+
5. **View Results:**
84+
- Use Colab's file browser to explore downloaded files
85+
- Open HTML files in new tabs to view datatables
86+
- Download files to your local machine if needed
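For the last step, downloading many files one by one from the file browser gets tedious. One possible approach, sketched here with the standard library, is to zip the whole output folder first. The helper name `zip_outputs` and the archive location are assumptions for illustration; only the `/content/pygetpapers_output/` path comes from this guide.

```python
import shutil
from pathlib import Path


def zip_outputs(output_dir: str, archive_stem: str) -> Path:
    """Zip output_dir into <archive_stem>.zip and return the archive path."""
    source = Path(output_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"No output directory at {source}")
    archive = shutil.make_archive(archive_stem, "zip", root_dir=source)
    return Path(archive)


if __name__ == "__main__":
    out = Path("/content/pygetpapers_output")
    if out.is_dir():
        archive = zip_outputs(str(out), "/content/pygetpapers_output")
        print(f"Created {archive}")
        # In Colab, the archive can then be sent to the browser with:
        #   from google.colab import files
        #   files.download(str(archive))
```

The `google.colab` lines are left as comments so the cell also runs outside Colab.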
87+
88+
## Expected Outputs
89+
90+
### File Structure
91+
```
/content/pygetpapers_output/
├── europe_pmc_climate/          # Climate change papers
│   ├── datatables.html          # Interactive table
│   ├── metadata.csv             # Paper metadata
│   └── index.html               # Summary page
├── crossref_global_warming/     # Global warming papers
├── biorxiv_carbon_dioxide/      # Carbon dioxide papers
├── medrxiv_temperature/         # Temperature papers
├── openalex_greenhouse_gas/     # Greenhouse gas papers
└── europe_pmc_pdfs/             # PDF downloads
    ├── *.pdf                    # Downloaded PDFs
    └── datatables.html          # Table with PDF links
```
105+
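To compare what a run actually produced against the layout above, a short listing like the following can be used. It is a sketch only: `summarize_outputs` is a hypothetical helper, and it assumes one subfolder per search, as shown in the tree.

```python
from pathlib import Path


def summarize_outputs(root: str) -> dict:
    """Map each search folder under root to the sorted file names it contains."""
    summary = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            summary[folder.name] = sorted(
                p.name for p in folder.iterdir() if p.is_file()
            )
    return summary


if __name__ == "__main__":
    root = Path("/content/pygetpapers_output")
    if root.is_dir():
        for name, files in summarize_outputs(str(root)).items():
            print(f"{name}: {len(files)} file(s)")
            for file_name in files:
                print(f"  {file_name}")
```

A folder that is missing or empty in this listing points to the search that needs rerunning.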
106+
### Sample Results
107+
- **5-10 papers per repository** (configurable)
108+
- **CSV files** with metadata (title, authors, DOI, etc.)
109+
- **HTML datatables** for easy browsing
110+
- **PDF files** where available (Europe PMC)
111+
- **Summary statistics** for each search
112+
113+
## Customization
114+
115+
### Modify Queries
116+
Change the search terms in any cell:
117+
```python
query = "your search term"  # Replace with your topic
limit = 10                  # Number of papers to download
```
121+
122+
### Add New Repositories
123+
Use different APIs:
124+
```python
!pygetpapers --query "your query" --api europe_pmc --limit 5
!pygetpapers --query "your query" --api crossref --limit 5
!pygetpapers --query "your query" --api biorxiv --limit 5
```
129+
130+
### Download Formats
131+
Choose what to download:
132+
```python
# Metadata only
!pygetpapers --query "query" --makecsv --makehtml

# Include PDFs (where available)
!pygetpapers --query "query" --pdf --makecsv --datatables

# Full XML content
!pygetpapers --query "query" --xml --makecsv
```
142+
143+
## Troubleshooting
144+
145+
### Common Issues
146+
147+
1. **Installation fails:**
148+
- Check internet connection
149+
- Restart runtime: "Runtime" → "Restart runtime"
150+
- Try running installation cell again
151+
152+
2. **No results found:**
153+
- Try different search terms
154+
- Check if repository is accessible
155+
- Reduce limit to 1-2 papers for testing
156+
157+
3. **PDF downloads fail:**
158+
- PDFs are only available for Europe PMC and some OpenAlex papers
159+
- Other repositories don't provide PDF downloads
160+
161+
4. **Slow performance:**
162+
- Reduce `limit` parameter
163+
- Some repositories may be slow to respond
164+
- Check Colab's runtime status
165+
166+
### Getting Help
167+
168+
- **Repository Issues**: Check the [pygetpapers GitHub repository](https://github.com/pygetpapers/pygetpapers)
169+
- **Colab Issues**: See [Google Colab documentation](https://colab.research.google.com/notebooks/basic_features_overview.ipynb)
170+
- **Network Issues**: Try different repositories or check your internet connection
171+
172+
## Advanced Usage
173+
174+
### Batch Processing
175+
Run multiple searches at once:
176+
```python
queries = ["climate change", "global warming", "carbon dioxide"]
apis = ["europe_pmc", "crossref", "biorxiv"]

# Run every query against every API, with one output folder per (api, query) pair
for api in apis:
    for query in queries:
        out_dir = f"/content/batch_output/{api}_{query.replace(' ', '_')}"
        !pygetpapers --query "{query}" --api {api} --limit 3 --output {out_dir}
```
184+
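The `!` syntax above only works inside a notebook. Outside Colab, the same batch could be driven with `subprocess`, roughly as sketched below. The helpers `build_commands` and `run_all` are hypothetical names; the flags (`--query`, `--api`, `--limit`, `--output`) are the ones used in the loop above.

```python
import itertools
import subprocess


def build_commands(queries, apis, limit, output_root):
    """Return one pygetpapers argument list per (api, query) pair."""
    commands = []
    for api, query in itertools.product(apis, queries):
        out_dir = f"{output_root}/{api}_{query.replace(' ', '_')}"
        commands.append([
            "pygetpapers", "--query", query, "--api", api,
            "--limit", str(limit), "--output", out_dir,
        ])
    return commands


def run_all(commands):
    """Run each command in turn and report failures without stopping the batch."""
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        status = "ok" if result.returncode == 0 else f"failed ({result.returncode})"
        print(f"{cmd[2]!r} on {cmd[4]}: {status}")


commands = build_commands(
    ["climate change", "global warming", "carbon dioxide"],
    ["europe_pmc", "crossref", "biorxiv"],
    limit=3,
    output_root="/content/batch_output",
)
print(f"Prepared {len(commands)} commands")
# Call run_all(commands) once pygetpapers is installed.
```

Passing the arguments as a list avoids shell quoting problems with multi-word queries.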
185+
### Data Analysis
186+
Use the CSV files for further analysis:
187+
```python
import pandas as pd

# Load metadata
df = pd.read_csv("/content/pygetpapers_output/europe_pmc_climate/metadata.csv")
print(f"Found {len(df)} papers")
print(df.head())
```
195+
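To look at the results of several searches together, the per-folder CSV files can be merged into one list of rows. The sketch below uses only the standard `csv` module and assumes each search folder contains a `metadata.csv`, as in the file structure shown earlier. The helper `load_all_metadata` and the added `search_folder` key are illustrative, not part of pygetpapers.

```python
import csv
from pathlib import Path


def load_all_metadata(root: str, filename: str = "metadata.csv") -> list:
    """Read every <root>/<search folder>/<filename> into one list of row dicts."""
    rows = []
    for csv_path in sorted(Path(root).glob(f"*/{filename}")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Record which search folder the row came from (added key).
                row["search_folder"] = csv_path.parent.name
                rows.append(row)
    return rows


if __name__ == "__main__":
    rows = load_all_metadata("/content/pygetpapers_output")
    print(f"Loaded {len(rows)} rows from all searches")
```

The resulting list can be passed to `pd.DataFrame(rows)` if pandas is preferred for the analysis.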
196+
### Custom Datatables
197+
Modify the datatables for your needs:
198+
```python
# Create custom datatables with specific columns
!pygetpapers --query "your query" --datatables /content/custom_output --makecsv
```
202+
203+
## Contributing
204+
205+
If you find issues or want to improve the notebook:
206+
207+
1. **Fork the repository**
208+
2. **Make your changes**
209+
3. **Test in Colab**
210+
4. **Submit a pull request**
211+
212+
## License
213+
214+
This notebook is part of the pygetpapers project and follows the same license terms.
215+
216+
---
217+
218+
**Happy researching! 🌍📚**
219+
220+
*For more information, visit: https://github.com/pygetpapers/pygetpapers*
