docs/architecture/dependencies.md (1 addition & 1 deletion)
@@ -4,7 +4,7 @@ This project makes use of the following open-source dependencies and APIs:
## Open Source Dependencies
-The following open-source packages are used in this project. For a complete and up-to-date list, see the `package.json` file in the project root.
+The following open-source packages are used in this project. For a complete and up-to-date list, see [`package.json`](../../package.json) in the project root.
docs/architecture/index.md (12 additions & 13 deletions)
@@ -157,7 +157,7 @@ graph LR
**XML Structure Navigation (PMC ID extraction):**
-The parser locates the PMC identifier in the article front matter (see implementation: [`src/processor/parseFigures.ts`](../src/processor/parseFigures.ts)).
+The parser locates the PMC identifier in the article front matter (see implementation: [`src/processor/parseFigures.ts`](../../src/processor/parseFigures.ts)).
```xml
<pmc-articleset>
@@ -173,15 +173,15 @@ The parser locates the PMC identifier in the article front matter (see implement
-Downloads a complete PMC article package (.tar.gz) and extracts image files. The implementation fetches a package URL from the OA Web Service API, downloads the archive, extracts media, and selects the highest-priority image format per basename before copying results to the output directory (see implementation: [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts)).
+Downloads a complete PMC article package (.tar.gz) and extracts image files. The implementation fetches a package URL from the OA Web Service API, downloads the archive, extracts media, and selects the highest-priority image format per basename before copying results to the output directory (see implementation: [`src/processor/downloadArticlePackage.ts`](../../src/processor/downloadArticlePackage.ts)).
-- Fetches OA package metadata via the OA API and converts FTP links to HTTPS (see [`src/processor/fetchPackageUrl.ts`](../src/processor/fetchPackageUrl.ts)).
-- Downloads the package archive and extracts it to a temporary directory (see [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts)).
-- Groups files by basename and keeps the highest-priority extension using the `IMAGE_EXTENSIONS` priority map (see [`src/constants.ts`](../src/constants.ts)).
+- Fetches OA package metadata via the OA API and converts FTP links to HTTPS (see [`src/processor/fetchPackageUrl.ts`](../../src/processor/fetchPackageUrl.ts)).
+- Downloads the package archive and extracts it to a temporary directory (see [`src/processor/downloadArticlePackage.ts`](../../src/processor/downloadArticlePackage.ts)).
+- Groups files by basename and keeps the highest-priority extension using the `IMAGE_EXTENSIONS` priority map (see [`src/constants.ts`](../../src/constants.ts)).
-Console-level messages written by the implementation include `Fetching package URL for <PMCID>`, `Package downloaded. Extracting images...`, `Extracted image: <filename>`, and `Successfully extracted <N> images from package.` (see [`src/processor/downloadArticlePackage.ts`](../src/processor/downloadArticlePackage.ts)).
+Console-level messages written by the implementation include `Fetching package URL for <PMCID>`, `Package downloaded. Extracting images...`, `Extracted image: <filename>`, and `Successfully extracted <N> images from package.` (see [`src/processor/downloadArticlePackage.ts`](../../src/processor/downloadArticlePackage.ts)).
## Data Flow Architecture
@@ -282,14 +282,14 @@ graph TB
H --> I[Save Cache to Disk]
```
-### 4. Error Recovery and Resilience
+### 4. Error Handling and Continuation
-The system implements multiple levels of error recovery:
+The system logs operation-level failures and continues processing subsequent species/articles:
-1. **Network Level**: Automatic retries with exponential backoff
-2. **API Level**: Rate limit compliance and quota management
-3. **Data Level**: Graceful handling of malformed XML or missing figures
-4. **File Level**: Directory creation and permission handling
+1. **Search failures**: `searchArticlesBySpecies` returns an empty list on request failures
+2. **Batch fetch failures**: `fetchArticleDetails` logs batch-level errors and continues with remaining batches
+3. **Package failures**: `parseFigures` logs package-level failures and continues with remaining articles
+4. **Filesystem setup**: output/cache directories are created on demand before writes
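
The log-and-continue behaviour described in the added list can be sketched roughly as follows. This is a minimal illustration, not the project's code: `searchOrEmpty`, `processInBatches`, `describeError`, and `BatchOutcome` are hypothetical names, and the sketch is synchronous for brevity (the same `try`/`catch` shape applies with `await` inside the loop).

```typescript
// Illustrative sketch only: every name below is hypothetical and does not
// come from the project's source files.

interface BatchOutcome<T> {
  results: T[];
  failedBatches: number[]; // indexes of batches whose worker threw
}

// Returns the search results, or an empty list if the search throws.
function searchOrEmpty<T>(search: () => T[]): T[] {
  try {
    return search();
  } catch (err) {
    console.error(`Search failed: ${describeError(err)}`);
    return [];
  }
}

// Splits items into batches and runs the worker on each one.
// A failing batch is logged and recorded, then the loop moves on.
function processInBatches<I, O>(
  items: I[],
  batchSize: number,
  worker: (batch: I[]) => O[],
): BatchOutcome<O> {
  if (batchSize < 1) {
    throw new RangeError("batchSize must be at least 1");
  }
  const results: O[] = [];
  const failedBatches: number[] = [];
  for (let start = 0, index = 0; start < items.length; start += batchSize, index++) {
    const batch = items.slice(start, start + batchSize);
    try {
      results.push(...worker(batch));
    } catch (err) {
      console.error(`Batch ${index} failed: ${describeError(err)}`);
      failedBatches.push(index);
    }
  }
  return { results, failedBatches };
}

// Normalises an unknown thrown value into a printable message.
function describeError(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}
```

Returning the failed batch indexes alongside the results is a design choice of this sketch; it lets a caller report what was skipped without aborting the run.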
## Performance Considerations
@@ -321,7 +321,6 @@ graph TD
-[Dependencies](./dependencies.md) - External libraries and tools used
**A:** The current implementation primarily tracks progress in `build/output/cache/id.json` and writes extracted image files to per-species/per-PMCID directories. It does not currently generate a per-article metadata JSON file.
-Need more help? Check our [documentation](../index.md) or [open an issue](https://github.com/AlexJSully/Publication-Figure-Retrieval/issues) on GitHub.
+Need more help? Check our [documentation](./index.md) or [open an issue](https://github.com/AlexJSully/Publication-Figure-Retrieval/issues) on GitHub.