Commit f0c1c1e

Upgrade TypeScript to v6 (#236)
Upgrade TypeScript to v6 as well as update other packages at the same time.
1 parent 0c296c7 commit f0c1c1e

15 files changed

Lines changed: 755 additions & 1166 deletions

README.md

Lines changed: 6 additions & 6 deletions
@@ -107,17 +107,17 @@ We aim to make this tool as perfect as possible but unfortunately, there may be

 ## Documentation

-For comprehensive documentation, see the [`docs/`](docs/) folder:
+For comprehensive documentation, see the [docs](docs/index.md):

 - [**Getting Started**](docs/index.md) - Complete overview and setup guide
-- [**Architecture**](docs/architecture/) - Technical architecture and design decisions
-- [**Usage Examples**](docs/usage/examples/) - Detailed usage examples and troubleshooting
-- [**API Reference**](docs/usage/api/) - Complete API documentation
-- [**Contributing**](docs/contributing/) - Development setup and contribution guidelines
+- [**Architecture**](docs/architecture/index.md) - Technical architecture and processing flow
+- [**Usage Guide**](docs/usage/index.md) - Setup, run, and troubleshooting workflows
+- [**API Reference**](docs/usage/api/index.md) - Module-level function documentation
+- [**Contributing**](CONTRIBUTING.md) - Development setup and contribution guidelines

 ## License

-[GLP-2.0](LICENSE.md)
+[GNU GPL v2.0](LICENSE.md)

 ## Maintenance Mode

docs/architecture/dependencies.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ This project makes use of the following open-source dependencies and APIs:

 ## Open Source Dependencies

-The following open-source packages are used in this project. For a complete and up-to-date list, see the `package.json` file in the project root.
+The following open-source packages are used in this project. For a complete and up-to-date list, see [`package.json`](../../package.json) in the project root.

 - axios
 - throttled-queue
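
The link change in this hunk depends on how a relative Markdown link resolves against the directory of the file that contains it. A minimal sketch of that resolution, assuming POSIX-style repository paths; `resolveDocLink` is a hypothetical helper for illustration and is not part of the repository:

```typescript
import * as path from "node:path";

/**
 * Resolve a relative Markdown link against the file that contains it.
 * Hypothetical helper, shown only to illustrate the path arithmetic.
 */
function resolveDocLink(fromFile: string, link: string): string {
  // Links are relative to the containing file's directory, not the repo root.
  return path.posix.join(path.posix.dirname(fromFile), link);
}

const source = "docs/architecture/dependencies.md";

// Two levels up from docs/architecture/ reaches the repository root.
console.log(resolveDocLink(source, "../../package.json")); // package.json

// One level up only reaches docs/, which is the wrong location.
console.log(resolveDocLink(source, "../package.json")); // docs/package.json
```

The same arithmetic applies to any file under `docs/architecture/`: a link needs two `../` segments to reach a path at the repository root.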

docs/architecture/index.md

Lines changed: 12 additions & 13 deletions
@@ -157,7 +157,7 @@ graph LR

 **XML Structure Navigation (PMC ID extraction):**

-The parser locates the PMC identifier in the article front matter (see implementation: [`src/processor/parseFigures.ts`](../src/processor/parseFigures.ts)).
+The parser locates the PMC identifier in the article front matter (see implementation: [`src/processor/parseFigures.ts`](../../src/processor/parseFigures.ts)).

 ```xml
 <pmc-articleset>
@@ -173,15 +173,15 @@ The parser locates the PMC identifier in the article front matter (see implement

 ### 5. Download Module (`src/processor/downloadArticlePackage.ts`)

-Downloads a complete PMC article package (.tar.gz) and extracts image files. The implementation fetches a package URL from the OA Web Service API, downloads the archive, extracts media, and selects the highest-priority image format per basename before copying results to the output directory (see implementation: [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts)).
+Downloads a complete PMC article package (.tar.gz) and extracts image files. The implementation fetches a package URL from the OA Web Service API, downloads the archive, extracts media, and selects the highest-priority image format per basename before copying results to the output directory (see implementation: [`src/processor/downloadArticlePackage.ts`](../../src/processor/downloadArticlePackage.ts)).

 Key implementation behaviors (implementation proof):

-- Fetches OA package metadata via the OA API and converts FTP links to HTTPS (see [`src/processor/fetchPackageUrl.ts`](../src/processor/fetchPackageUrl.ts)).
-- Downloads the package archive and extracts it to a temporary directory (see [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts)).
-- Groups files by basename and keeps the highest-priority extension using the `IMAGE_EXTENSIONS` priority map (see [`src/constants.ts`](../src/constants.ts)).
+- Fetches OA package metadata via the OA API and converts FTP links to HTTPS (see [`src/processor/fetchPackageUrl.ts`](../../src/processor/fetchPackageUrl.ts)).
+- Downloads the package archive and extracts it to a temporary directory (see [`src/processor/downloadArticlePackage.ts`](../../src/processor/downloadArticlePackage.ts)).
+- Groups files by basename and keeps the highest-priority extension using the `IMAGE_EXTENSIONS` priority map (see [`src/constants.ts`](../../src/constants.ts)).

-Console-level messages written by the implementation include `Fetching package URL for <PMCID>`, `Package downloaded. Extracting images...`, `Extracted image: <filename>`, and `Successfully extracted <N> images from package.` (see [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts)).
+Console-level messages written by the implementation include `Fetching package URL for <PMCID>`, `Package downloaded. Extracting images...`, `Extracted image: <filename>`, and `Successfully extracted <N> images from package.` (see [`src/processor/downloadArticlePackage.ts`](../../src/processor/downloadArticlePackage.ts)).

 ## Data Flow Architecture
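
The "highest-priority extension per basename" step described in the Download Module hunk above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the contents of `IMAGE_EXTENSIONS` are not shown in this diff, so `EXTENSION_PRIORITY` and its ordering are hypothetical, as is the function name `selectPreferredImages`.

```typescript
import * as path from "node:path";

// Hypothetical priority map (lower number wins). The real `IMAGE_EXTENSIONS`
// in src/constants.ts may list different extensions or a different order.
const EXTENSION_PRIORITY = new Map<string, number>([
  [".jpg", 0],
  [".png", 1],
  [".gif", 2],
  [".tif", 3],
]);

/** Keep one file per basename: the one whose extension ranks best. */
function selectPreferredImages(files: string[]): string[] {
  const best = new Map<string, { file: string; rank: number }>();
  for (const file of files) {
    const ext = path.posix.extname(file);
    const rank = EXTENSION_PRIORITY.get(ext.toLowerCase());
    if (rank === undefined) continue; // skip files that are not known images
    const base = path.posix.basename(file, ext);
    const current = best.get(base);
    if (current === undefined || rank < current.rank) {
      best.set(base, { file, rank });
    }
  }
  return Array.from(best.values(), (entry) => entry.file);
}

console.log(selectPreferredImages(["fig1.tif", "fig1.jpg", "fig2.png", "notes.xml"]));
// [ 'fig1.jpg', 'fig2.png' ]
```

Grouping by basename before copying means a figure shipped in several formats ends up in the output directory only once.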

@@ -282,14 +282,14 @@ graph TB
 H --> I[Save Cache to Disk]
 ```

-### 4. Error Recovery and Resilience
+### 4. Error Handling and Continuation

-The system implements multiple levels of error recovery:
+The system logs operation-level failures and continues processing subsequent species/articles:

-1. **Network Level**: Automatic retries with exponential backoff
-2. **API Level**: Rate limit compliance and quota management
-3. **Data Level**: Graceful handling of malformed XML or missing figures
-4. **File Level**: Directory creation and permission handling
+1. **Search failures**: `searchArticlesBySpecies` returns an empty list on request failures
+2. **Batch fetch failures**: `fetchArticleDetails` logs batch-level errors and continues with remaining batches
+3. **Package failures**: `parseFigures` logs package-level failures and continues with remaining articles
+4. **Filesystem setup**: output/cache directories are created on demand before writes

 ## Performance Considerations
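
The log-and-continue behavior described in the rewritten "Error Handling and Continuation" section above can be sketched generically. This is a minimal illustration, assuming each unit of work is an async function; `processAll` and its parameters are hypothetical names and do not reflect the signatures of `fetchArticleDetails` or `parseFigures`.

```typescript
/**
 * Run `handler` for every item in order. A failure is logged and recorded,
 * then processing moves on to the next item instead of aborting the run.
 * Hypothetical helper for illustration only.
 */
async function processAll<T>(
  items: T[],
  handler: (item: T) => Promise<void>,
  log: (message: string) => void = console.error,
): Promise<T[]> {
  const failed: T[] = [];
  for (const item of items) {
    try {
      await handler(item);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      log(`Failed to process ${String(item)}: ${reason}`);
      failed.push(item);
    }
  }
  return failed; // callers can report or retry these later
}
```

Returning the failed items keeps the continuation policy explicit: nothing is retried automatically, which matches the diff's removal of the earlier "automatic retries" claim.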

@@ -321,7 +321,6 @@ graph TD

 - [Dependencies](./dependencies.md) - External libraries and tools used
 - [Pipelines](./pipelines.md) - Detailed workflow diagrams
-- [Design Decisions](./design-decisions.md) - Architectural choices and trade-offs

 ## Real-World Scenarios

docs/architecture/pipelines.md

Lines changed: 14 additions & 18 deletions
@@ -24,8 +24,8 @@ graph TD
 J --> L{More Species?}
 K --> M[Fetch Article Details]
 M --> N[Parse XML Content]
-N --> O[Extract Figure URLs]
-O --> P[Download Figures]
+N --> O[Download Article Package]
+O --> P[Extract Images from Package]
 P --> Q[Update Progress Cache]
 Q --> L

@@ -85,7 +85,7 @@ graph TD
 F --> N
 ```

-### Step 3: XML Parsing and Figure Extraction
+### Step 3: XML Parsing and Package Extraction

 ```mermaid
 graph LR
@@ -101,16 +101,16 @@ graph LR
 F --> G[Extract Figure Elements]
 end

-subgraph "Figure Processing"
-G --> H[Process Figure Graphics]
-H --> I[Construct Figure URLs]
-I --> J[Validate URL Format]
-J --> K[Add .jpg if No Extension]
+subgraph "Article Package Processing"
+G --> H[Resolve OA Package URL]
+H --> I[Download .tar.gz Package]
+I --> J[Extract Package Contents]
+J --> K[Select Highest-Priority Image Per Basename]
 end

 subgraph "Download Orchestration"
 K --> L[Create Output Directory]
-L --> M[Queue Figure Download]
+L --> M[Copy Selected Images]
 M --> N[Update Progress]
 end
 ```
@@ -238,11 +238,7 @@ sequenceDiagram
 ### Cache Structure

 ```json
-{
-"cached_ids": ["PMC123456", "PMC789012", "PMC345678"],
-"last_updated": "2024-01-15T10:30:00Z",
-"species_processed": ["Arabidopsis_thaliana", "Cannabis_sativa"]
-}
+["PMC123456", "PMC789012", "PMC345678"]
 ```

 ## Performance Optimization Pipeline
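
With the cache documented above as a flat JSON array of PMC IDs, reading it and skipping already-processed articles can be sketched as below. This is a hedged illustration rather than the project's implementation: `loadCachedIds`, `saveCachedIds`, and `filterUncached` are hypothetical names, and the cache path is passed in by the caller.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** Read the cache file (a JSON array of PMC IDs) into a Set. */
function loadCachedIds(cacheFile: string): Set<string> {
  if (!fs.existsSync(cacheFile)) return new Set();
  const parsed: unknown = JSON.parse(fs.readFileSync(cacheFile, "utf8"));
  if (!Array.isArray(parsed)) {
    throw new Error(`Expected a JSON array in ${cacheFile}`);
  }
  return new Set(parsed.filter((id): id is string => typeof id === "string"));
}

/** Write the Set back as a flat array, creating the directory on demand. */
function saveCachedIds(cacheFile: string, ids: Set<string>): void {
  fs.mkdirSync(path.dirname(cacheFile), { recursive: true });
  fs.writeFileSync(cacheFile, JSON.stringify(Array.from(ids)));
}

/** Return only the IDs that have not been processed yet. */
function filterUncached(ids: string[], cached: Set<string>): string[] {
  return ids.filter((id) => !cached.has(id));
}
```

A batch whose IDs are all filtered out corresponds to the "already cached" log line shown in the monitoring output.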
@@ -332,10 +328,10 @@ graph TD
 [INFO] Found 1,234 articles for Arabidopsis_thaliana
 [INFO] Fetching Arabidopsis thaliana article details for batch 1-50...
 [INFO] Processing article PMC ID: PMC123456
-[INFO] Found 3 figures in the article.
-[INFO] Downloaded image: figure1.jpg
-[INFO] Downloaded image: figure2.png
-[INFO] Downloaded image: supplementary1.tiff
+[INFO] Fetching package URL for PMC123456...
+[INFO] Package downloaded. Extracting images...
+[INFO] Extracted image: figure1.jpg (priority: jpg)
+[INFO] Successfully extracted 1 images from package.
 [INFO] All IDs in Arabidopsis thaliana batch 51-100 are already cached.
 [INFO] Processing complete for Arabidopsis_thaliana
 ```

docs/faq.md

Lines changed: 5 additions & 25 deletions
@@ -25,7 +25,6 @@ The project is open-source; consult the repository [`package.json`](../package.j
 **A:**

 - Node.js 20 or higher
-- npm 9 or higher
 - 2GB available disk space (recommended)
 - Stable internet connection

@@ -42,7 +41,7 @@ cd Publication-Figure-Retrieval
 npm ci

 # Run the tool
-npm start
+npm run start
 ```

 ### Q: Do I need an API key?
@@ -76,7 +75,7 @@ Get your API key from: <https://www.ncbi.nlm.nih.gov/account/settings/>

 ```bash
 # This will process all species in the configuration
-npm start
+npm run start
 ```

 ### Q: How do I limit the number of articles searched?
@@ -178,7 +177,7 @@ const throttle = throttledQueue(2, 2000); // Slower rate

 ### Q: How do I contribute to the project?

-**A:** See our [Contributing Guide](../contributing/index.md) for detailed instructions:
+**A:** See our [Contributing Guide](../CONTRIBUTING.md) for detailed instructions:

 1. Fork the repository
 2. Create a branch
@@ -230,26 +229,7 @@ console.log("Debug info:", variable);

 ### Q: What metadata is collected?

-**A:** For each article:
-
-```json
-{
-"pmcId": "PMC1234567",
-"title": "Article Title",
-"authors": ["Author 1", "Author 2"],
-"journal": "Journal Name",
-"publicationDate": "2023-01-15",
-"doi": "10.1000/example",
-"figureCount": 3,
-"figures": [
-{
-"caption": "Figure caption",
-"url": "https://...",
-"filename": "figure1.jpg"
-}
-]
-}
-```
+**A:** The current implementation primarily tracks progress in `build/output/cache/id.json` and writes extracted image files to per-species/per-PMCID directories. It does not currently generate a per-article metadata JSON file.

 ### Q: How are duplicate articles handled?

@@ -276,4 +256,4 @@ console.log("Debug info:", variable);
 - **With API key**: 10 requests per second
 - **Large jobs**: Contact NCBI for permission

-Need more help? Check our [documentation](../index.md) or [open an issue](https://github.com/AlexJSully/Publication-Figure-Retrieval/issues) on GitHub.
+Need more help? Check our [documentation](./index.md) or [open an issue](https://github.com/AlexJSully/Publication-Figure-Retrieval/issues) on GitHub.
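
The per-second request limits quoted in the FAQ context above can be honored by spacing request start times evenly. The sketch below is a self-contained illustration of that idea and does not use the `throttled-queue` package the project depends on; `createRateLimiter` is a hypothetical name.

```typescript
/**
 * Build a scheduler that starts at most `maxPerInterval` tasks per
 * `intervalMs` by spacing start times evenly. Hypothetical helper that
 * illustrates the idea; it is not the throttled-queue API.
 */
function createRateLimiter(maxPerInterval: number, intervalMs: number) {
  const spacingMs = intervalMs / maxPerInterval;
  let nextSlot = 0; // earliest timestamp at which the next task may start

  return async function schedule<T>(task: () => Promise<T>): Promise<T> {
    const now = Date.now();
    const startAt = Math.max(now, nextSlot);
    nextSlot = startAt + spacingMs; // reserve the slot before waiting
    const delay = startAt - now;
    if (delay > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
    return task();
  };
}

// With an API key the FAQ lists 10 requests per second: one start every 100 ms.
const withApiKey = createRateLimiter(10, 1000);
```

Each call such as `withApiKey(() => fetchSomething())` waits for its reserved slot, so a burst of calls is spread out rather than rejected.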

docs/index.md

Lines changed: 1 addition & 7 deletions
@@ -223,18 +223,12 @@ Each species entry includes aliases for better search coverage:
 }
 ```

-## Screenshots
-
-{INSERT SCREENSHOT HERE - Terminal output showing progress}
-{INSERT SCREENSHOT HERE - File explorer showing organized output structure}
-{INSERT SCREENSHOT HERE - Example downloaded scientific figures}
-
 ## Next Steps

 - [Architecture Overview](./architecture/index.md) - Understand the system design
 - [Usage Guide](./usage/index.md) - Detailed usage instructions and examples
 - [API Documentation](./usage/api/index.md) - Module and function references
-- [Contributing](./contributing/index.md) - How to contribute to the project
+- [Contributing](../CONTRIBUTING.md) - How to contribute to the project
 - [FAQ](./faq.md) - Common questions and troubleshooting

 ## Support
