**Contributing:** Pull requests for bug fixes and security patches are still welcome. Please review the [Contributing Guidelines](CONTRIBUTING.md) before submitting.
- Processes XML article data to extract figure information:
+ Processes XML article data to extract PMC IDs and orchestrate package downloads:

```mermaid
  graph LR
- A[XML Article Data] --> B[Parse XML Structure]
- B --> C[Extract Article Metadata]
- C --> D[Find Figure Elements]
- D --> E[Extract Figure URLs]
- E --> F[Process Each Figure]
- F --> G[Download Figure]
- G --> H[Save to File System]
+ A[XML Article Data] --> B[Parse XML Structure]
+ B --> C[Extract Article Metadata]
+ C --> D[Locate PMC ID]
+ D --> E[Request Article Package]
+ E --> F[Extract Images from Package]
+ F --> G[Save Images to File System]
```

- **XML Structure Navigation:**
+ **XML Structure Navigation (PMC ID extraction):**
+
+ The parser locates the PMC identifier in the article front matter (see implementation: [`src/processor/parseFigures.ts`](../src/processor/parseFigures.ts)).
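
To illustrate the PMC ID lookup described in the added line above, here is a minimal sketch. It is not the code in `parseFigures.ts`; it assumes JATS-style XML in which the front matter carries an `<article-id pub-id-type="pmc">` element (some records may use `pmcid` instead), and the helper name `extractPmcId` and the sample XML are hypothetical.

```typescript
// Hypothetical helper, not the project's parseFigures.ts implementation.
// Assumes JATS-style XML where the front matter contains
// <article-id pub-id-type="pmc">...</article-id> (or pub-id-type="pmcid").
function extractPmcId(articleXml: string): string | null {
  const match = articleXml.match(
    /<article-id\b[^>]*\bpub-id-type=["'](?:pmc|pmcid)["'][^>]*>\s*([^<\s]+)\s*<\/article-id>/i
  );
  if (!match) return null;
  const raw = match[1];
  // Normalize to the "PMC123456" form whether or not the prefix is present.
  return /^PMC/i.test(raw) ? raw.toUpperCase() : `PMC${raw}`;
}

// Made-up sample document for illustration only.
const sampleXml = `
<article>
  <front>
    <article-meta>
      <article-id pub-id-type="pmid">12345</article-id>
      <article-id pub-id-type="pmc">123456</article-id>
    </article-meta>
  </front>
</article>`;

console.log(extractPmcId(sampleXml)); // PMC123456
```

A regular expression keeps the sketch dependency-free; a real parser would more likely walk the tree produced by an XML library.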
- Handles actual file downloads with proper error handling:
+ Downloads a complete PMC article package (.tar.gz) and extracts image files. The implementation fetches a package URL from the OA Web Service API, downloads the archive, extracts media, and selects the highest-priority image format per basename before copying results to the output directory (see implementation: [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts)).
+ - Fetches OA package metadata via the OA API and converts FTP links to HTTPS (see [`src/processor/fetchPackageUrl.ts`](../src/processor/fetchPackageUrl.ts)).
+ - Downloads the package archive and extracts it to a temporary directory (see [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts)).
+ - Groups files by basename and keeps the highest-priority extension using the `IMAGE_EXTENSIONS` priority map (see [`src/constants.ts`](../src/constants.ts)).
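
The FTP-to-HTTPS conversion and the per-basename format selection listed above could look roughly like the sketch below. It is an illustration under assumptions, not the repository's code: the `IMAGE_PRIORITY` map, its ordering, the helper names `toHttps` and `pickPreferredImages`, and the sample URL are all hypothetical, and the real `IMAGE_EXTENSIONS` constant may have a different shape.

```typescript
// Hypothetical priority map: a lower number means a more preferred format.
// The real IMAGE_EXTENSIONS in src/constants.ts may differ in shape and order.
const IMAGE_PRIORITY = new Map<string, number>([
  ["jpg", 0],
  ["png", 1],
  ["tiff", 2],
  ["gif", 3],
  ["svg", 4],
]);

// Rewrite an ftp:// link as https://, leaving any other URL untouched.
function toHttps(href: string): string {
  return href.replace(/^ftp:\/\//i, "https://");
}

// Keep one file per basename, choosing the extension with the best priority.
function pickPreferredImages(fileNames: string[]): string[] {
  const best = new Map<string, { name: string; rank: number }>();
  for (const name of fileNames) {
    const dot = name.lastIndexOf(".");
    if (dot <= 0) continue; // no extension (or a dotfile): skip
    const base = name.slice(0, dot);
    const ext = name.slice(dot + 1).toLowerCase();
    const rank = IMAGE_PRIORITY.get(ext);
    if (rank === undefined) continue; // not a recognized image format
    const current = best.get(base);
    if (!current || rank < current.rank) best.set(base, { name, rank });
  }
  return Array.from(best.values()).map((entry) => entry.name);
}

// Made-up inputs for illustration only.
console.log(toHttps("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC123456.tar.gz"));
console.log(pickPreferredImages(["fig1.tiff", "fig1.jpg", "fig2.gif", "notes.txt"]));
// → ["fig1.jpg", "fig2.gif"]
```

Keying the map by basename means a figure shipped as both `.tiff` and `.jpg` is written only once, in whichever format ranks higher.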
+
+ Console-level messages written by the implementation include `Fetching package URL for <PMCID>`, `Package downloaded. Extracting images...`, `Extracted image: <filename>`, and `Successfully extracted <N> images from package.` (see [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts)).
docs/faq.md: 9 additions & 9 deletions
@@ -11,12 +11,12 @@ The Publication Figure Retrieval Tool is an open-source Node.js application that
  ### What formats are supported?

  - **Input**: Species names (scientific and common names)
- - **Output**: JPEG images from publication figures
+ - **Output**: Image formats extracted from article packages; supported extensions are defined in the code (`IMAGE_EXTENSIONS`) and include common formats such as `jpg`, `png`, `tiff`, `gif`, and `svg` (see [`src/constants.ts`](../src/constants.ts)).
  - **Data**: JSON metadata files with article and figure information

  ### Is this tool free to use?

- Yes, this is an open-source tool released under the MIT License. However, please respect the NCBI API usage guidelines and publication copyright restrictions.
+ The project is open-source; consult the repository [`package.json`](../package.json) for the declared license. Users must comply with NCBI API usage guidelines and any applicable publication copyright restrictions.

  ## Installation and Setup

@@ -99,16 +99,16 @@ For the example above:

  ### Q: Where are the downloaded figures saved?

- **A:** Figures are saved in the `src/output/` directory, organized by species:
+ **A:** At runtime the tool writes extracted images to the `build/output/` directory (when running the compiled JavaScript). The layout is organized by species and PMC ID; the package extraction and write behavior are implemented in [`src/processor/parseFigures.ts`](../src/processor/parseFigures.ts) and [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts). Example:
docs/index.md: 9 additions & 10 deletions
@@ -38,8 +38,8 @@ graph TD
  D --> E[Get Article PMC IDs]
  E --> F[Fetch Article Details]
  F --> G[Parse XML Response]
- G --> H[Extract Figure URLs]
- H --> I[Download Figures]
+ G --> H[Download Article Package]
+ H --> I[Extract Images from Package]
  I --> J[Save to Species/PMCID Directory]
  J --> K{More Species?}
  K -->|Yes| C
@@ -102,10 +102,10 @@ npm run start

  The tool will:

- 1. Read species from `src/data/species.json`
- 2. Search PMC for each species
- 3. Download figures to `build/output/[species]/[pmcid]/`
- 4. Cache progress in `build/output/cache/id.json`
+ 1. Read species from [`src/data/species.json`](../src/data/species.json)
+ 2. Search PMC for each species (see [`src/processor/searchArticleBySpecies.ts`](../src/processor/searchArticleBySpecies.ts))
+ 3. For each article: fetch article XML, identify the PMC ID, download the article package (.tar.gz) and extract images into `build/output/[species]/[pmcid]/` (see [`src/processor/parseFigures.ts`](../src/processor/parseFigures.ts) and [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts))
+ 4. Cache progress in `build/output/cache/id.json` to enable resume
@@ -40,8 +40,8 @@ export async function searchArticlesBySpecies(throttle: any, species: string): P
  ### Return Value

  - **Type**: `Promise<string[]>`
- - **Description**: Array of PMC IDs (without "PMC" prefix)
- - **Example**: `["123456", "789012", "345678"]`
+ - **Description**: The function returns the ID list provided by the NCBI response at `response.data.esearchresult.idlist`. The implementation returns the value directly from the API response (see [`src/processor/searchArticleBySpecies.ts`](../src/processor/searchArticleBySpecies.ts)).
+ - **Example**: `["PMC123456", "PMC789012"]` (exact contents depend on the API response)
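
As a rough illustration of reading `response.data.esearchresult.idlist`, the sketch below pulls the list out of an ESearch-style JSON response. It is not the code in `searchArticleBySpecies.ts`: the `ESearchResponse` interface, the `readIdList` helper, and the mock values are hypothetical, the `data` wrapper assumes an axios-style response object, and the empty-array fallback is a defensive addition that the documented implementation (which returns the value directly) may not have.

```typescript
// Minimal shape of the parts of an ESearch JSON response used here.
// The `data` wrapper mirrors an axios-style response (an assumption).
interface ESearchResponse {
  data?: { esearchresult?: { idlist?: string[] } };
}

// Hypothetical helper: returns the ID list unchanged, or [] if the field is missing.
function readIdList(response: ESearchResponse): string[] {
  return response.data?.esearchresult?.idlist ?? [];
}

// Made-up mock response for illustration only.
const mockResponse: ESearchResponse = {
  data: { esearchresult: { idlist: ["123456", "789012"] } },
};

console.log(readIdList(mockResponse)); // [ '123456', '789012' ]
console.log(readIdList({})); // []
```

Because the list is passed through unchanged, whether the IDs carry a "PMC" prefix depends entirely on what the API returns.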
+ 4. **Download article packages and extract images** into `build/output/[species]/[pmcid]/` (see [`src/processor/parseFigures.ts`](../src/processor/parseFigures.ts) and [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts))
  5. **Cache progress** for resume capability

  ### Example Output
@@ -67,10 +67,13 @@ Searching articles for the species: Arabidopsis_thaliana...
  Found 1,247 articles for Arabidopsis_thaliana
  Fetching Arabidopsis thaliana article details for batch 1-50...
  Processing article PMC ID: PMC123456
- Found 3 figures in the article.
- Downloaded image: figure1.jpg
- Downloaded image: figure2.png
- Downloaded image: supplementary1.tiff
+ Fetching package URL for PMC123456...
+ Downloading package from https://.../PMC123456.tar.gz
+ Package downloaded. Extracting images...
+ Extracted image: figure1.jpg (priority: jpg)
+ Extracted image: figure2.png (priority: png)
+ Successfully extracted 2 images from package.
+ Successfully processed article package for PMC123456