Frequently Asked Questions (FAQ)

This FAQ addresses common questions, issues, and use cases for the Publication Figure Retrieval Tool.

General Questions

What is the Publication Figure Retrieval Tool?

The Publication Figure Retrieval Tool is an open-source Node.js application that automatically downloads figures from scientific publications. It searches the NCBI PMC (PubMed Central) database for articles related to specific species and extracts figure images for research and analysis purposes.

What formats are supported?

Input: Species names (scientific and common names)
Output: Image formats extracted from article packages; supported extensions are defined in the code (IMAGE_EXTENSIONS) and include common formats such as jpg, png, tiff, gif, and svg (see src/constants.ts).
Data: A progress cache file (build/output/cache/id.json) that stores processed PMC IDs as a JSON array, written alongside the extracted image files
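
As a rough illustration of how an extension allow-list such as IMAGE_EXTENSIONS might be applied, the sketch below filters file names by extension. The list shown is a hypothetical stand-in built from the formats named above (the authoritative list is in src/constants.ts), and isImageFile is an illustrative name, not a function from the project.

```typescript
// Hypothetical stand-in for the list in src/constants.ts; check the source for the real values.
const IMAGE_EXTENSIONS: readonly string[] = [".jpg", ".png", ".tiff", ".gif", ".svg"];

// Returns true when the file name ends with one of the allowed extensions (case-insensitive).
function isImageFile(fileName: string): boolean {
	const lower = fileName.toLowerCase();
	return IMAGE_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Example package contents (made up for illustration).
const packageFiles = ["figure1.JPG", "article.nxml", "figure2.png", "supplement.pdf"];
const images = packageFiles.filter(isImageFile);
console.log(images);
```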

Is this tool free to use?

The project is open-source; consult the repository package.json for the declared license. Users must comply with NCBI API usage guidelines and any applicable publication copyright restrictions.

Installation and Setup

Q: What are the system requirements?

Node.js 20 or higher
2GB available disk space (recommended)
Stable internet connection

Q: How do I install the tool?

# Clone the repository
git clone https://github.com/AlexJSully/Publication-Figure-Retrieval.git
cd Publication-Figure-Retrieval

# Install dependencies
npm ci

# Run the tool
npm run start

Q: Do I need an API key?

A: An NCBI API key is optional but recommended, because it raises the rate limit from 3 to 10 requests per second:

# Add to .env file
NCBI_API_KEY=your_api_key_here

Get your API key from: https://www.ncbi.nlm.nih.gov/account/settings/
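
NCBI E-utilities accept the key as an api_key query parameter. The sketch below shows one way a key could be appended to a request URL when it is present. The helper name withApiKey is hypothetical, and how the tool itself reads and applies NCBI_API_KEY is not shown here; a caller would typically pass process.env.NCBI_API_KEY.

```typescript
// Appends the api_key query parameter only when a key is available.
// withApiKey is an illustrative helper, not part of the project.
function withApiKey(url: string, apiKey?: string): string {
	if (!apiKey) return url;
	const separator = url.includes("?") ? "&" : "?";
	return `${url}${separator}api_key=${encodeURIComponent(apiKey)}`;
}

// Without a key the URL is returned unchanged.
console.log(withApiKey("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pmc"));
// With a key the parameter is added to the existing query string.
console.log(withApiKey("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pmc", "your_api_key_here"));
```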

Usage Questions

Q: How do I search for a specific species?

A: Add your species to src/data/species.json:

{
	"Homo_sapiens": {
		"alias": ["Human", "Homo sapiens", "H. sapiens"]
	}
}
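
To show how such an entry could feed a search, here is a minimal sketch that turns the aliases into a single query string. The SpeciesConfig type mirrors the JSON shape above; joining quoted aliases with OR is an assumed strategy, and buildSearchTerm is a hypothetical helper. The tool's actual query construction lives in src/processor/searchArticleBySpecies.ts and may differ.

```typescript
// Shape of src/data/species.json as shown above.
type SpeciesConfig = Record<string, { alias: string[] }>;

const species: SpeciesConfig = {
	Homo_sapiens: { alias: ["Human", "Homo sapiens", "H. sapiens"] },
};

// Quote each alias and join with OR so that any alias can match (assumed strategy).
function buildSearchTerm(aliases: string[]): string {
	return aliases.map((alias) => `"${alias}"`).join(" OR ");
}

for (const [name, entry] of Object.entries(species)) {
	console.log(name, "->", buildSearchTerm(entry.alias));
}
```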

Q: Can I download figures for multiple species at once?

A: Yes, the tool processes all species defined in the species.json file:

# This will process all species in the configuration
npm run start

Q: How do I limit the number of articles searched?

A: Currently, the tool searches all available articles. To cap the number of results, edit the retmax value in the search URL built in src/processor/searchArticleBySpecies.ts:

// Modify the retmax parameter
const searchUrl = `${BASE_URL}/esearch.fcgi?db=pmc&term=${encodedTerm}&retmode=json&retmax=100`;

For available esearch parameters, refer to the NCBI E-utilities documentation.

For the example above:

db: The database to search (e.g., pmc)
term: The search term (e.g., species name)
retmode: The return mode (e.g., json)
retmax: The maximum number of results to return
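
The parameters above can be assembled in a small helper so that the limit becomes an argument rather than a hard-coded value. This is a sketch under assumptions: buildSearchUrl and SearchOptions are hypothetical names, and BASE_URL is set to the public E-utilities base URL, which may not match the constant used in the project. The clamp reflects the ESearch documentation, which caps retmax at 10,000.

```typescript
// Public E-utilities base URL; the project's own BASE_URL constant is assumed to be equivalent.
const BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

interface SearchOptions {
	term: string; // search term, for example a species name
	retmax: number; // maximum number of results to return
}

// Builds an ESearch URL for the PMC database with a bounded retmax.
function buildSearchUrl({ term, retmax }: SearchOptions): string {
	const encodedTerm = encodeURIComponent(term);
	// Keep retmax a whole number between 1 and the documented ESearch maximum of 10,000.
	const boundedRetmax = Math.min(Math.max(1, Math.floor(retmax)), 10000);
	return `${BASE_URL}/esearch.fcgi?db=pmc&term=${encodedTerm}&retmode=json&retmax=${boundedRetmax}`;
}

console.log(buildSearchUrl({ term: "Homo sapiens", retmax: 20 }));
```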

Q: Where are the downloaded figures saved?

A: At runtime the tool writes extracted images to the build/output/ directory (when running the compiled JavaScript). The layout is organized by species and PMC ID; the package extraction and write behaviour are implemented in src/processor/parseFigures.ts and src/processor/downloadArticlePackage.ts. Example:

build/output/
├── cache/
│   └── id.json
├── Homo_sapiens/
│   ├── PMC123456/
│   │   ├── figure1.jpg
│   │   └── figure2.png
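
The layout above corresponds to a path of the form output root, species, PMC ID, file name. A minimal sketch of that composition follows; figureOutputPath is a hypothetical helper, and the real write logic in src/processor/parseFigures.ts may build paths differently.

```typescript
import { join } from "node:path";

// Composes <outputRoot>/<species>/<pmcId>/<fileName> using the platform's path separator.
function figureOutputPath(outputRoot: string, species: string, pmcId: string, fileName: string): string {
	return join(outputRoot, species, pmcId, fileName);
}

const examplePath = figureOutputPath("build/output", "Homo_sapiens", "PMC123456", "figure1.jpg");
console.log(examplePath);
```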

Troubleshooting

Q: The tool is running slowly. How can I speed it up?

A: Several optimization strategies:

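Increase throttle limits, staying within the NCBI limits (3 requests per second without an API key, 10 with one):

// In src/index.ts
const throttle = throttledQueue({ maxPerInterval: 5, interval: 1000 }); // 5 requests per second; above the 3/second limit unless an API key is set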

Use an API key for better rate limits
Reduce species count for testing
Check your internet connection
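
One way to keep the throttle within NCBI's limits is to derive it from whether an API key is configured. The sketch below assumes the options shape used in the throttledQueue call above; throttleOptionsFor is a hypothetical helper, not part of the project.

```typescript
// Options shape mirrors the throttledQueue call shown above.
interface ThrottleOptions {
	maxPerInterval: number;
	interval: number; // milliseconds
}

// NCBI allows 3 requests per second without an API key and 10 with one.
function throttleOptionsFor(apiKey?: string): ThrottleOptions {
	return { maxPerInterval: apiKey ? 10 : 3, interval: 1000 };
}

console.log(throttleOptionsFor()); // no key configured
console.log(throttleOptionsFor("your_api_key_here")); // key configured
```

The returned object could then be passed to throttledQueue in src/index.ts, with process.env.NCBI_API_KEY as the argument.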

Q: I'm getting "Request failed" errors. What should I do?

A: Common solutions:

Check internet connection
Verify NCBI service status
Reduce request rate:

const throttle = throttledQueue({ maxPerInterval: 2, interval: 2000 }); // 2 requests every 2 seconds (1 per second on average)

Note: Failed requests are logged and skipped; the tool continues with the next item rather than retrying automatically.
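
The log-and-skip behaviour described in the note can be pictured with the following sketch. It is an illustration of the pattern only: processAll and the handler are hypothetical, and the tool's own error handling may be structured differently. Collecting the failed items makes it possible to rerun them manually later.

```typescript
// Runs the handler for each item in order; a failure is logged and the loop moves on.
async function processAll<T>(items: T[], handler: (item: T) => Promise<void>): Promise<T[]> {
	const failed: T[] = [];
	for (const item of items) {
		try {
			await handler(item);
		} catch (error) {
			// No automatic retry: record the failure and continue with the next item.
			console.error(`Request failed for ${String(item)}:`, error);
			failed.push(item);
		}
	}
	return failed;
}

// Example run with a handler that fails for one made-up ID.
const failedRun = processAll(["PMC1", "PMC2", "PMC3"], async (id) => {
	if (id === "PMC2") throw new Error("Request failed");
});
failedRun.then((failed) => console.log("Skipped:", failed));
```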

Q: No figures are being downloaded. Why?

A: Possible causes and solutions:

Species not found in PMC:
- Check species names in species.json
- Add more aliases for better search coverage
Articles have no figures:
- Some articles may not contain extractable figures
- Check the console output for per-article processing logs
Network issues:
- Check console output for error messages
- Verify internet connection

Configuration Questions

Q: How do I customize the search terms?

A: Modify the species aliases in src/data/species.json:

{
	"Mus_musculus": {
		"alias": ["Mouse", "Mus musculus", "M. musculus", "Laboratory mouse", "House mouse"]
	}
}

Development Questions

Q: How do I contribute to the project?

A: See our Contributing Guide for detailed instructions:

Fork the repository
Create a branch
Make your changes
Add tests
Submit a pull request

Q: How do I run tests during development?

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage

# Run specific test file
npm test -- searchArticleBySpecies.test.ts

Q: How do I debug the application?

A: Several debugging options:

Console logging:

console.log("Debug info:", variable);

VS Code debugging:
- Set breakpoints in TypeScript files
- Press F5 to start debugging

API and Data Questions

Q: What NCBI databases are used?

A: The tool uses one NCBI database, accessed through the E-utilities API:

PMC (PubMed Central): The database searched for open-access articles
ESearch: The E-utility used for article search
EFetch: The E-utility used for retrieving article details

Q: What metadata is collected?

A: The current implementation primarily tracks progress in build/output/cache/id.json and writes extracted image files to per-species/per-PMCID directories. It does not currently generate a per-article metadata JSON file.
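
For readers who want to inspect or post-process the cache, the sketch below parses and serializes a JSON array of IDs. It assumes the IDs are stored as strings; parseCache and serializeCache are hypothetical helpers, not the project's own functions.

```typescript
// Parses cache file contents; returns an empty list for missing or malformed data.
function parseCache(text: string | undefined): string[] {
	if (!text) return [];
	try {
		const data: unknown = JSON.parse(text);
		if (!Array.isArray(data)) return [];
		// Keep only string entries (assumes PMC IDs are stored as strings).
		return data.filter((id): id is string => typeof id === "string");
	} catch {
		return [];
	}
}

// Serializes IDs as a JSON array, dropping repeats while keeping first-seen order.
function serializeCache(ids: Iterable<string>): string {
	return JSON.stringify(Array.from(new Set(ids)));
}

console.log(parseCache('["PMC123456","PMC654321"]'));
console.log(serializeCache(["PMC123456", "PMC123456", "PMC654321"]));
```

In practice the text would come from reading build/output/cache/id.json, for example with readFileSync from node:fs.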

Q: How are duplicate articles handled?

A: The tool automatically handles duplicates:

Articles are identified by PMC ID
Duplicate PMC IDs are skipped during processing
Existing figure files are not re-downloaded
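
The skip logic described above amounts to a set-membership check on PMC IDs. The following is a minimal sketch of that idea, not the project's implementation; selectUnprocessed is a hypothetical name.

```typescript
// Returns the candidate IDs that are not already processed, also dropping repeats within the candidates.
function selectUnprocessed(candidateIds: string[], processedIds: Iterable<string>): string[] {
	const seen = new Set(processedIds);
	const result: string[] = [];
	for (const id of candidateIds) {
		if (seen.has(id)) continue; // already processed or already queued
		seen.add(id);
		result.push(id);
	}
	return result;
}

// PMC1 is already in the cache and PMC2 appears twice in the search results.
const toProcess = selectUnprocessed(["PMC1", "PMC2", "PMC2", "PMC3"], ["PMC1"]);
console.log(toProcess);
```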

Performance and Limitations

Q: How many articles can the tool process?

A: The tool can process thousands of articles, but consider:

NCBI rate limits: 3 requests/second without API key, 10 with key
Disk space: Each figure is typically 100KB-2MB
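
To turn the per-figure size range into a disk budget, multiply it by the expected number of figures. The sketch below does that arithmetic; the figure count per article is an assumption you should replace with a value that fits your corpus, and estimateDiskUsage is a hypothetical helper.

```typescript
const KB = 1024;
const MB = 1024 * KB;
const GB = 1024 * MB;

// Estimates total disk usage from the 100KB-2MB per-figure range quoted above.
function estimateDiskUsage(articles: number, figuresPerArticle: number) {
	const figures = articles * figuresPerArticle;
	return { figures, minBytes: figures * 100 * KB, maxBytes: figures * 2 * MB };
}

// Assumed workload: 1000 articles with 5 figures each.
const estimate = estimateDiskUsage(1000, 5);
console.log(`${estimate.figures} figures: ${(estimate.minBytes / GB).toFixed(2)} GB to ${(estimate.maxBytes / GB).toFixed(2)} GB`);
```

For this assumed workload the range is roughly 0.5 GB to just under 10 GB, so the recommended 2GB of free space may not be enough for large runs.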

Q: What are the NCBI usage limits?

A: NCBI guidelines:

Without API key: 3 requests per second
With API key: 10 requests per second
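Large jobs: NCBI asks that they run on weekends or between 9:00 PM and 5:00 AM Eastern time on weekdays; contact NCBI if you need a higher request rate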

Need more help? Check our documentation or open an issue on GitHub.
