This FAQ addresses common questions, issues, and use cases for the Publication Figure Retrieval Tool.
The Publication Figure Retrieval Tool is an open-source Node.js application that automatically downloads figures from scientific publications. It searches the NCBI PMC (PubMed Central) database for articles related to specific species and extracts figure images for research and analysis purposes.
- Input: Species names (scientific and common names)
- Output: Image files extracted from article packages; supported extensions are defined in the code (IMAGE_EXTENSIONS in src/constants.ts) and include common formats such as jpg, png, tiff, gif, and svg
- Data: A progress cache file (build/output/cache/id.json) that stores processed PMC IDs as a JSON array, written alongside the extracted image files
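As a rough illustration of how an extension allow-list decides which files in a package count as figures, the sketch below filters file names against a short list. IMAGE_EXTENSIONS_EXAMPLE, extensionOf, and isImageFile are hypothetical names; the real values live in IMAGE_EXTENSIONS in src/constants.ts and may differ.

```typescript
// Hypothetical allow-list, assumed from the formats named above; the real
// list is IMAGE_EXTENSIONS in src/constants.ts and may differ.
const IMAGE_EXTENSIONS_EXAMPLE: readonly string[] = [".jpg", ".png", ".tiff", ".gif", ".svg"];

// Return the lower-cased extension of the last path segment, e.g. ".jpg",
// or "" when there is none (dotfiles such as ".hidden" count as no extension).
export function extensionOf(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot).toLowerCase();
}

// True when the file's extension appears in the allow-list.
export function isImageFile(
  filePath: string,
  allowed: readonly string[] = IMAGE_EXTENSIONS_EXAMPLE,
): boolean {
  return allowed.includes(extensionOf(filePath));
}

// Example: keep only the figure images from a package listing.
const listing = ["PMC123456/article.nxml", "PMC123456/figure1.JPG", "PMC123456/figure2.png"];
console.log(listing.filter((name) => isImageFile(name)));
```

Comparing lower-cased extensions means files such as figure1.JPG are still picked up.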
The project is open-source; consult the repository package.json for the declared license. Users must comply with NCBI API usage guidelines and any applicable publication copyright restrictions.
A:
- Node.js 20 or higher
- 2GB available disk space (recommended)
- Stable internet connection
A:
# Clone the repository
git clone https://github.com/AlexJSully/Publication-Figure-Retrieval.git
cd Publication-Figure-Retrieval
# Install dependencies
npm ci
# Run the tool
npm run start
A: An NCBI API key is optional but recommended for better rate limits:
# Add to .env file
NCBI_API_KEY=your_api_key_here
Get your API key from: https://www.ncbi.nlm.nih.gov/account/settings/
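For context, E-utilities accepts the key as an api_key query parameter. The sketch below shows one way a request URL could include it. buildEutilsUrl is a hypothetical helper, and the usage line assumes the key has already been loaded into process.env.NCBI_API_KEY (for example from the .env file); how this project actually reads and forwards the key may differ.

```typescript
const EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

// Hypothetical helper: build an E-utilities URL and attach api_key only
// when a key is available, so the same code works with or without one.
export function buildEutilsUrl(
  endpoint: string,
  params: Record<string, string>,
  apiKey?: string,
): string {
  const url = new URL(`${EUTILS_BASE}/${endpoint}`);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, value); // URLSearchParams handles encoding
  }
  if (apiKey) {
    url.searchParams.set("api_key", apiKey);
  }
  return url.toString();
}

// Usage, assuming the key was loaded into the environment beforehand.
// Avoid logging this URL: it contains the key.
const searchUrl = buildEutilsUrl(
  "esearch.fcgi",
  { db: "pmc", term: "Homo sapiens", retmode: "json" },
  process.env.NCBI_API_KEY,
);
```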
A: Add your species to src/data/species.json:
{
"Homo_sapiens": {
"alias": ["Human", "Homo sapiens", "H. sapiens"]
}
}
A: Yes, the tool processes all species defined in the species.json file:
# This will process all species in the configuration
npm run start
A: Currently, the tool searches all available articles. You can modify the search parameters in src/processor/searchArticleBySpecies.ts:
// Modify the retmax parameter
const searchUrl = `${BASE_URL}/esearch.fcgi?db=pmc&term=${encodedTerm}&retmode=json&retmax=100`;
For available esearch parameters, refer to the NCBI E-utilities documentation.
For the example above:
- db: The database to search (e.g., pmc)
- term: The search term (e.g., species name)
- retmode: The response format (e.g., json)
- retmax: The maximum number of results to return
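With retmode=json, ESearch returns an esearchresult object whose count field is the total number of matches and whose idlist holds at most retmax IDs. The sketch below, using hypothetical names, shows how to tell whether retmax truncated the results. The PMC prefix added to each ID is an assumption about downstream labelling, not confirmed project behaviour.

```typescript
// Subset of the ESearch JSON response used here (numeric fields arrive as strings).
interface ESearchResult {
  esearchresult: {
    count: string;
    retmax: string;
    retstart: string;
    idlist: string[];
  };
}

interface SearchSummary {
  total: number; // all matches reported by NCBI
  returned: number; // IDs actually included in this response (<= retmax)
  truncated: boolean; // true when more matches exist beyond this page
  pmcIds: string[]; // IDs with a "PMC" prefix (assumed labelling)
}

// Hypothetical helper summarising one ESearch page.
export function summarizeSearch(response: ESearchResult): SearchSummary {
  const { count, retstart, idlist } = response.esearchresult;
  const total = Number(count);
  const returned = idlist.length;
  return {
    total,
    returned,
    truncated: Number(retstart) + returned < total,
    pmcIds: idlist.map((id) => (id.startsWith("PMC") ? id : `PMC${id}`)),
  };
}
```

If truncated is true, either raise retmax or request further pages with the retstart parameter.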
A: At runtime the tool writes extracted images to the build/output/ directory (when running the compiled JavaScript). The layout is organized by species and PMC ID; the package extraction and write behaviour are implemented in src/processor/parseFigures.ts and src/processor/downloadArticlePackage.ts. Example:
build/output/
├── cache/
│ └── id.json
├── Homo_sapiens/
│ ├── PMC123456/
│ │ ├── figure1.jpg
│ │ └── figure2.png
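The layout above maps directly onto path joins. A minimal sketch, with hypothetical helper names, assuming figure files are flattened to their base name inside each PMC ID folder (the actual write logic lives in parseFigures.ts and downloadArticlePackage.ts and may differ):

```typescript
import { basename, join } from "node:path";

// build/output/<species>/<PMC ID>/<figure file>
export function figureOutputPath(
  outputRoot: string,
  species: string,
  pmcId: string,
  fileNameInPackage: string,
): string {
  // basename() drops any directories inside the article package, so a file
  // such as "figs/figure1.jpg" lands directly in the PMC ID folder.
  return join(outputRoot, species, pmcId, basename(fileNameInPackage));
}

// build/output/cache/id.json
export function cacheFilePath(outputRoot: string): string {
  return join(outputRoot, "cache", "id.json");
}
```

Using path.join rather than string concatenation keeps the separators correct on every platform.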
A: Several optimization strategies:
- Increase throttle limits, staying within NCBI limits (3 requests/second without an API key, 10 with one):
// In src/index.ts
const throttle = throttledQueue({ maxPerInterval: 5, interval: 1000 }); // 5 requests per second (requires an API key)
- Use an API key for better rate limits
- Reduce species count for testing
- Check your internet connection
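To make the maxPerInterval/interval trade-off concrete, here is a self-contained sliding-window throttle. It is an illustrative stand-in written for this FAQ, not the throttled-queue package the project uses, and createThrottle is a hypothetical name. It ensures that no more than maxPerInterval tasks start within any intervalMs window.

```typescript
// Illustrative sliding-window throttle (not the throttled-queue package).
// At most `maxPerInterval` tasks may start within any `intervalMs` window.
export function createThrottle(maxPerInterval: number, intervalMs: number) {
  const recentStarts: number[] = []; // start timestamps still inside the window
  let queue: Promise<void> = Promise.resolve(); // serialises slot allocation

  const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

  async function waitForSlot(): Promise<void> {
    for (;;) {
      const now = Date.now();
      // Forget starts that have slid out of the window.
      while (recentStarts.length > 0 && now - (recentStarts[0] as number) >= intervalMs) {
        recentStarts.shift();
      }
      if (recentStarts.length < maxPerInterval) {
        recentStarts.push(now);
        return;
      }
      // Wait until the oldest start leaves the window, then re-check.
      await sleep(intervalMs - (now - (recentStarts[0] as number)));
    }
  }

  return function throttle<T>(task: () => Promise<T>): Promise<T> {
    const slot = queue.then(waitForSlot);
    queue = slot;
    return slot.then(task);
  };
}
```

Each request would be wrapped the same way as with the project's throttle, e.g. throttle(() => fetch(url)). A sliding window is stricter than fixed one-second buckets, which can briefly allow up to twice the limit around a bucket boundary.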
A: Common solutions:
- Check internet connection
- Verify NCBI service status
- Reduce request rate:
const throttle = throttledQueue({ maxPerInterval: 2, interval: 2000 }); // Slower rate: 2 requests every 2 seconds
- Note: Failed requests are logged and skipped; the tool continues with the next item rather than retrying automatically
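The log-and-skip behaviour described in the note can be pictured as a loop with a per-item try/catch. This is a hypothetical sketch (processAll and processItem are placeholder names), not the project's code:

```typescript
// Hypothetical sketch: each item is processed independently; a failure is
// logged and recorded instead of aborting the whole run.
export async function processAll<T>(
  items: readonly T[],
  processItem: (item: T) => Promise<void>,
  log: (message: string) => void = console.error,
): Promise<{ succeeded: T[]; failed: T[] }> {
  const succeeded: T[] = [];
  const failed: T[] = [];
  for (const item of items) {
    try {
      await processItem(item);
      succeeded.push(item);
    } catch (error) {
      // No automatic retry: log the failure and move on to the next item.
      const reason = error instanceof Error ? error.message : String(error);
      log(`Skipping ${String(item)}: ${reason}`);
      failed.push(item);
    }
  }
  return { succeeded, failed };
}
```

Returning the failed items makes it easy to rerun only those later, which is one way to add retries without changing the main loop.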
A: Possible causes and solutions:
- Species not found in PMC:
  - Check species names in species.json
  - Add more aliases for better search coverage
- Articles have no figures:
  - Some articles may not contain extractable figures
  - Check the console output for per-article processing logs
- Network issues:
  - Check console output for error messages
  - Verify internet connection
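Since aliases drive search coverage, one way to combine them is a single query that quotes each alias and joins them with OR. This is a hypothetical sketch (buildSearchTerm and encodedTermsBySpecies are illustrative names); searchArticleBySpecies.ts may instead search aliases separately or build the term differently.

```typescript
// Shape of src/data/species.json as shown in this FAQ.
type SpeciesConfig = Record<string, { alias: string[] }>;

// Quote each alias (so multi-word names are searched as phrases), drop
// blanks and duplicates, and join with OR.
export function buildSearchTerm(aliases: readonly string[]): string {
  const cleaned = aliases
    .map((alias) => alias.replace(/"/g, "").trim())
    .filter((alias) => alias.length > 0);
  return Array.from(new Set(cleaned))
    .map((alias) => `"${alias}"`)
    .join(" OR ");
}

// Map each species key to its URL-encoded search term.
export function encodedTermsBySpecies(config: SpeciesConfig): Record<string, string> {
  const terms: Record<string, string> = {};
  for (const [species, { alias }] of Object.entries(config)) {
    terms[species] = encodeURIComponent(buildSearchTerm(alias));
  }
  return terms;
}
```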
A: Modify the species aliases in src/data/species.json:
{
"Mus_musculus": {
"alias": ["Mouse", "Mus musculus", "M. musculus", "Laboratory mouse", "House mouse"]
}
}
A: See our Contributing Guide for detailed instructions:
- Fork the repository
- Create a branch
- Make your changes
- Add tests
- Submit a pull request
A:
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage
# Run specific test file
npm test -- searchArticleBySpecies.test.ts
A: Several debugging options:
- Console logging:
console.log("Debug info:", variable);
- VS Code debugging:
- Set breakpoints in TypeScript files
- Press F5 to start debugging
A: The tool primarily uses:
- PMC (PubMed Central): For searching open-access articles
- ESearch: For article search functionality
- EFetch: For retrieving article details
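EFetch accepts a comma-separated id list, so lookups can be batched. The sketch below splits IDs into batches and builds one EFetch URL per batch. The batch size of 200 and the helper names are assumptions for illustration; this does not claim to be how the tool calls EFetch.

```typescript
const EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// One EFetch URL per batch of PMC IDs, requesting XML records.
export function efetchUrls(pmcIds: readonly string[], batchSize = 200): string[] {
  return chunk(pmcIds, batchSize).map((batch) => {
    const url = new URL(EFETCH_URL);
    url.searchParams.set("db", "pmc");
    url.searchParams.set("id", batch.join(","));
    url.searchParams.set("retmode", "xml");
    return url.toString();
  });
}
```

Each URL still counts as one request against the NCBI rate limit, so batching reduces the total request count rather than bypassing the limit.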
A: The current implementation primarily tracks progress in build/output/cache/id.json and writes extracted image files to per-species/per-PMCID directories. It does not currently generate a per-article metadata JSON file.
A: The tool automatically handles duplicates:
- Articles are identified by PMC ID
- Duplicate PMC IDs are skipped during processing
- Existing figure files are not re-downloaded
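Duplicate skipping can be sketched with a Set seeded from id.json. This assumes, as described above, that the cache file is a flat JSON array of PMC ID strings; loadProcessedIds, saveProcessedIds, and selectUnprocessed are hypothetical names.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Read the cache; a missing file means nothing has been processed yet.
export function loadProcessedIds(cacheFile: string): Set<string> {
  if (!existsSync(cacheFile)) return new Set();
  const parsed: unknown = JSON.parse(readFileSync(cacheFile, "utf8"));
  return new Set(Array.isArray(parsed) ? parsed.map(String) : []);
}

// Persist the cache as a JSON array, creating the cache directory if needed.
export function saveProcessedIds(cacheFile: string, ids: Set<string>): void {
  mkdirSync(dirname(cacheFile), { recursive: true });
  writeFileSync(cacheFile, JSON.stringify(Array.from(ids), null, 2));
}

// Keep only IDs not yet processed, also dropping repeats within `candidates`.
// The `processed` set passed in is not modified.
export function selectUnprocessed(candidates: readonly string[], processed: Set<string>): string[] {
  const seen = new Set(processed);
  return candidates.filter((id) => {
    if (seen.has(id)) return false;
    seen.add(id);
    return true;
  });
}
```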
A: The tool can process thousands of articles, but consider:
- NCBI rate limits: 3 requests/second without API key, 10 with key
- Disk space: Each figure is typically 100KB-2MB
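A back-of-the-envelope estimate helps size a run. The sketch below converts a request count into minutes at a given rate, and a figure count into a disk range using the 100 KB to 2 MB per-figure figure quoted above (decimal units). The number of requests a run needs is an input you must estimate yourself; the helper names are hypothetical.

```typescript
// Minutes needed for `requests` calls at a steady `requestsPerSecond`.
export function estimateMinutes(requests: number, requestsPerSecond: number): number {
  return requests / requestsPerSecond / 60;
}

// Disk range in MB for `figures` files of 100 KB to 2 MB each (1 MB = 1000 KB).
export function estimateDiskMB(figures: number): { min: number; max: number } {
  return { min: (figures * 100) / 1000, max: (figures * 2000) / 1000 };
}

console.log(estimateMinutes(10_000, 3).toFixed(1)); // 55.6 (no API key)
console.log(estimateMinutes(10_000, 10).toFixed(1)); // 16.7 (with API key)
console.log(estimateDiskMB(1_000)); // { min: 100, max: 2000 }
```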
A: NCBI guidelines:
- Without API key: 3 requests per second
- With API key: 10 requests per second
- Large jobs: Contact NCBI for permission
Need more help? Check our documentation or open an issue on GitHub.