Commit 9d4b959

graphical docs
1 parent aa6c0c1 commit 9d4b959

18 files changed

Lines changed: 2144 additions & 1 deletion

docs/graphviz_diagrams/README.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# Graphviz Diagrams

**Date:** February 6, 2026 (system date)

## Overview

This directory contains Graphviz DOT format (`.gv`) files converted from Mermaid diagrams. These files can be used to generate high-quality SVG, PNG, PDF, or other image formats using Graphviz.

## Generated Files

1. **architecture_diagram_1.gv** - Current Architecture (v1.x)
2. **architecture_diagram_2.gv** - Proposed Architecture (v2.0)
3. **declarative-operations-framework_diagram_1.gv** - Declarative Operations Framework

## Converting to Images

### Prerequisites

Install Graphviz:
```bash
# macOS
brew install graphviz

# Ubuntu/Debian
sudo apt-get install graphviz

# Or download from: https://graphviz.org/download/
```

### Generate SVG Images

```bash
# Run from the repository root (adjust the path to your checkout)
cd /Users/pm286/workspace/pygetpapers

# Make sure the output directory exists
mkdir -p docs/images

# Convert single file
dot -Tsvg docs/graphviz_diagrams/architecture_diagram_1.gv -o docs/images/architecture-v1.svg

# Convert all files
for gv_file in docs/graphviz_diagrams/*.gv; do
    dot -Tsvg "$gv_file" -o "docs/images/$(basename "$gv_file" .gv).svg"
done
```

### Generate PNG Images

```bash
# Convert single file
dot -Tpng docs/graphviz_diagrams/architecture_diagram_1.gv -o docs/images/architecture-v1.png

# Convert all files
for gv_file in docs/graphviz_diagrams/*.gv; do
    dot -Tpng "$gv_file" -o "docs/images/$(basename "$gv_file" .gv).png"
done
```

### Generate PDF Images

```bash
dot -Tpdf docs/graphviz_diagrams/architecture_diagram_1.gv -o docs/images/architecture-v1.pdf
```

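The per-format loops above can also be driven from Python, which fits a repository that already keeps its conversion script in `scripts/`. The following is a minimal sketch, not part of this commit: the function names `build_dot_command` and `render_all` are hypothetical, and it assumes the `dot` executable is on `PATH`. It only uses the `-T<format>` and `-o` options already shown in this README.

```python
import shutil
import subprocess
from pathlib import Path


def build_dot_command(gv_file: Path, out_dir: Path, fmt: str) -> list[str]:
    """Return the `dot` invocation that renders gv_file into out_dir as fmt."""
    out_file = out_dir / f"{gv_file.stem}.{fmt}"
    return ["dot", f"-T{fmt}", str(gv_file), "-o", str(out_file)]


def render_all(src_dir: Path, out_dir: Path, formats=("svg", "png", "pdf")) -> list[Path]:
    """Render every .gv file in src_dir to each format; return the paths written."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz `dot` was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    written = []
    for gv_file in sorted(src_dir.glob("*.gv")):
        for fmt in formats:
            cmd = build_dot_command(gv_file, out_dir, fmt)
            subprocess.run(cmd, check=True)  # raises if dot reports an error
            written.append(Path(cmd[-1]))
    return written
```

From the repository root, `render_all(Path("docs/graphviz_diagrams"), Path("docs/images"))` would produce one SVG, PNG, and PDF per diagram, named after the `.gv` file.
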
## Output Formats

Graphviz supports many output formats:
- **SVG** (`-Tsvg`) - Scalable vector graphics (recommended)
- **PNG** (`-Tpng`) - Raster image
- **PDF** (`-Tpdf`) - PDF document
- **EPS** (`-Teps`) - Encapsulated PostScript
- **PNG, high resolution** (`-Tpng -Gdpi=300`) - Raster image rendered at a higher DPI

## Customization

You can edit the `.gv` files to customize:
- Node colors: `node [fillcolor=lightblue, style="rounded,filled"]`
- Edge styles: `edge [style=dashed, color=gray]`
- Layout: Change `rankdir=TB` to `rankdir=LR` for left-right layout
- Font sizes: `node [fontsize=12]`

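Because the `.gv` files are regenerated from Mermaid, hand edits may be overwritten. One option is to apply a tweak such as the `rankdir` change programmatically after each regeneration. The sketch below is an illustration under that assumption; `set_rankdir` is a hypothetical helper, not something shipped in this repository, and it handles only the simple `rankdir=TB;` form used in these files.

```python
import re

# Layout directions that Graphviz accepts for the rankdir attribute.
VALID_DIRECTIONS = {"TB", "BT", "LR", "RL"}


def set_rankdir(dot_source: str, direction: str) -> str:
    """Return dot_source with its first rankdir attribute set to direction."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unsupported rankdir: {direction!r}")
    new_source, count = re.subn(
        r'\brankdir\s*=\s*"?\w+"?',  # matches rankdir=TB and rankdir="TB"
        f"rankdir={direction}",
        dot_source,
        count=1,
    )
    if count == 0:
        raise ValueError("no rankdir attribute found in DOT source")
    return new_source
```

Reading a `.gv` file, passing its text through `set_rankdir(text, "LR")`, and writing the result back switches the diagram to a left-to-right layout without touching anything else.
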
## Conversion Script

The conversion script is located at:
```
scripts/mermaid_to_graphviz.py
```

To regenerate Graphviz files from Mermaid:
```bash
python scripts/mermaid_to_graphviz.py
```

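The contents of `scripts/mermaid_to_graphviz.py` are not shown here, so the following is not that script. It is a deliberately small sketch of the general idea, assuming Mermaid flowchart lines of the form `A[Label] --> B[Label]`; the function name `mermaid_to_dot` and the handling rules are assumptions, and subgraphs, edge labels, and other Mermaid syntax are ignored.

```python
import re

# One flowchart edge: ID, optional [label], arrow, ID, optional [label].
EDGE = re.compile(
    r'^\s*(\w+)(?:\["?(.*?)"?\])?\s*-->\s*(\w+)(?:\["?(.*?)"?\])?\s*$'
)


def mermaid_to_dot(mermaid: str) -> str:
    """Convert simple `A[Label] --> B[Label]` lines into a DOT digraph."""
    labels: dict[str, str] = {}
    edges: list[tuple[str, str]] = []
    for line in mermaid.splitlines():
        match = EDGE.match(line)
        if match is None:
            continue  # skip headers such as "graph TD" and unsupported syntax
        src, src_label, dst, dst_label = match.groups()
        for node, label in ((src, src_label), (dst, dst_label)):
            labels.setdefault(node, node)  # fall back to the ID as the label
            if label:
                labels[node] = label
        edges.append((src, dst))

    out = ["digraph G {", "    rankdir=TB;", "    node [shape=box, style=rounded];"]
    for node, label in labels.items():
        escaped = label.replace('"', '\\"')  # keep the DOT string well-formed
        out.append(f'    {node} [label="{escaped}"];')
    for src, dst in edges:
        out.append(f"    {src} -> {dst};")
    out.append("}")
    return "\n".join(out)
```

The header lines it emits mirror the ones at the top of the `.gv` files in this directory.
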
## Advantages of Graphviz

- **High-quality output** - Professional-looking diagrams
- **Multiple formats** - SVG, PNG, PDF, EPS, etc.
- **Customizable** - Full control over styling
- **Command-line friendly** - Easy to automate
- **No browser required** - Unlike mermaid-cli

## File Structure

```
docs/
├── mermaid_diagrams/     # Source Mermaid files (.mmd)
├── graphviz_diagrams/    # Graphviz DOT files (.gv)
└── images/               # Generated images (SVG/PNG/PDF)
```

---

**Note:** The Graphviz `.gv` files are converted output. The original Mermaid `.mmd` files remain the source of truth.
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
digraph G {
    rankdir=TB;
    node [shape=box, style=rounded];

    subgraph cluster_Output {
        label="Output";
        style=rounded;
        FILES [label="Downloaded Files\n- XML\n- PDF\n- HTML\n- Metadata"];
        METADATA [label="Metadata Files\n- JSON\n- CSV"];
    }

    subgraph cluster_Frontend_Layer {
        label="Frontend Layer";
        style=rounded;
        ST [label="Streamlit App\nstreamlit_app.py\n3700+ lines"];
        UI [label="UI Components\n- Search Interface\n- Corpus Manager\n- Data Tables\n- File Browser"];
    }

    subgraph cluster_Integration_Layer {
        label="Integration Layer";
        style=rounded;
        BI [label="BioRxiv Integration\nbiorxiv_integration.py"];
        BS [label="BioRxiv Scraper\nbiorxiv_advanced_scraper.py"];
        DT [label="DataTables Integration\ndatatables_integration.py"];
        JATS [label="JATS4R Integration\njats4r_integration.py"];
    }

    subgraph cluster_Core_pygetpapers {
        label="Core pygetpapers";
        style=rounded;
        PP [label="pygetpapers Core\npygetpapers/pygetpapers.py"];
        REPO [label="Repository Modules\n- europe_pmc.py\n- arxiv.py\n- crossref.py\n- openalex.py"];
        WEB [label="Web Scraping\npygetpapers/web_scraping/"];
    }

    subgraph cluster_Data_Sources {
        label="Data Sources";
        style=rounded;
        EUPMC [label="Europe PMC API"];
        ARXIV [label="arXiv API"];
        CROSSREF [label="Crossref API"];
        OPENALEX [label="OpenAlex API"];
        BIORXIV [label="BioRxiv Web"];
        MEDRXIV [label="MedRxiv Web"];
    }

    // Edges
    ST -> BI;
    ST -> DT;
    ST -> JATS;
    ST -> PP;
    BI -> BS;
    BS -> BIORXIV;
    BS -> MEDRXIV;
    PP -> REPO;
    REPO -> EUPMC;
    REPO -> ARXIV;
    REPO -> CROSSREF;
    REPO -> OPENALEX;
    PP -> FILES;
    PP -> METADATA;
    BI -> FILES;
    BI -> METADATA;
}
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
digraph G {
    rankdir=TB;
    node [shape=box, style=rounded];

    subgraph cluster_External_APIs {
        label="External APIs";
        style=rounded;
        EUPMC_API [label="Europe PMC API"];
        ARXIV_API [label="arXiv API"];
        CROSSREF_API [label="Crossref API"];
        OPENALEX_API [label="OpenAlex API"];
        BIORXIV_WEB [label="BioRxiv Web"];
        MEDRXIV_WEB [label="MedRxiv Web"];
        REDALYC_WEB [label="Redalyc Web"];
    }

    subgraph cluster_Presentation_Layer {
        label="Presentation Layer";
        style=rounded;
        ST [label="Streamlit UI\nstreamlit_app.py\n~500 lines"];
        CLI [label="CLI Interface\npygetpapers_cli.py"];
        API [label="REST API\npygetpapers_api.py"];
    }

    subgraph cluster_Service_Layer {
        label="Service Layer";
        style=rounded;
        SS [label="Search Service\nservices/search_service.py"];
        CS [label="Corpus Service\nservices/corpus_service.py"];
        FS [label="File Service\nservices/file_service.py"];
        MS [label="Metadata Service\nservices/metadata_service.py"];
    }

    subgraph cluster_Repository_Layer {
        label="Repository Layer";
        style=rounded;
        RI [label="Repository Interface\nrepositories/base.py"];
    }

    subgraph cluster_Repository_Implementations {
        label="Repository Implementations";
        style=rounded;
        EUPMC [label="Europe PMC\nrepositories/europe_pmc.py"];
        ARXIV [label="arXiv\nrepositories/arxiv.py"];
        CROSSREF [label="Crossref\nrepositories/crossref.py"];
        OPENALEX [label="OpenAlex\nrepositories/openalex.py"];
        BIORXIV [label="BioRxiv\nrepositories/biorxiv.py"];
        MEDRXIV [label="MedRxiv\nrepositories/medrxiv.py"];
        REDALYC [label="Redalyc\nrepositories/redalyc.py"];
    }

    subgraph cluster_Web_Scraping_Layer {
        label="Web Scraping Layer";
        style=rounded;
        WS [label="Web Scraper Base\nscrapers/base_scraper.py"];
    }

    subgraph cluster_Scraper_Implementations {
        label="Scraper Implementations";
        style=rounded;
        BS [label="BioRxiv Scraper\nscrapers/biorxiv_scraper.py"];
        MEDS [label="MedRxiv Scraper\nscrapers/medrxiv_scraper.py"];
        RS [label="Redalyc Scraper\nscrapers/redalyc_scraper.py"];
    }

    subgraph cluster_Data_Processing_Layer {
        label="Data Processing Layer";
        style=rounded;
        PARSER [label="Content Parser\nprocessors/content_parser.py"];
        CONVERTER [label="Format Converter\nprocessors/format_converter.py"];
        VALIDATOR [label="Data Validator\nprocessors/data_validator.py"];
    }

    subgraph cluster_Storage_Layer {
        label="Storage Layer";
        style=rounded;
        FS_STORAGE [label="File Storage\nstorage/file_storage.py"];
        DB [label="Database\nstorage/database.py"];
        CACHE [label="Cache\nstorage/cache.py"];
    }

    // Edges
    ST -> SS;
    ST -> CS;
    ST -> FS;
    ST -> MS;
    CLI -> SS;
    CLI -> CS;
    API -> SS;
    API -> CS;
    SS -> RI;
    CS -> RI;
    FS -> FS_STORAGE;
    MS -> DB;
    RI -> EUPMC;
    RI -> ARXIV;
    RI -> CROSSREF;
    RI -> OPENALEX;
    RI -> BIORXIV;
    RI -> MEDRXIV;
    RI -> REDALYC;
    BIORXIV -> WS;
    MEDRXIV -> WS;
    REDALYC -> WS;
    WS -> BS;
    WS -> MEDS;
    WS -> RS;
    BS -> BIORXIV_WEB;
    MEDS -> MEDRXIV_WEB;
    RS -> REDALYC_WEB;
    EUPMC -> EUPMC_API;
    ARXIV -> ARXIV_API;
    CROSSREF -> CROSSREF_API;
    OPENALEX -> OPENALEX_API;
    PARSER -> FS_STORAGE;
    CONVERTER -> FS_STORAGE;
    VALIDATOR -> DB;
}
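DOT treats every statement that uses the same ID as referring to one node, so two declarations that share an ID in different clusters collapse into a single node. For generated files with many short IDs, a quick check for repeated declarations can catch this. The sketch below is an assumption-laden illustration: `duplicate_node_ids` is a hypothetical helper, and it only recognizes declarations written as `ID [label=...]` at the start of a line, which is the form used in these files.

```python
import re
from collections import Counter

# A node declaration as written in these files: an ID followed by [label=...].
NODE_DECL = re.compile(r"^\s*(\w+)\s*\[label=", re.MULTILINE)


def duplicate_node_ids(dot_source: str) -> list[str]:
    """Return node IDs that are declared with a label more than once."""
    counts = Counter(NODE_DECL.findall(dot_source))
    return sorted(node for node, n in counts.items() if n > 1)


# Made-up fragment for illustration: the ID MS is declared twice.
sample = """digraph G {
    MS [label="Metadata Service"];
    BS [label="BioRxiv Scraper"];
    MS [label="MedRxiv Scraper"];
}"""
print(duplicate_node_ids(sample))
```
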
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
digraph G {
    rankdir=TB;
    node [shape=box, style=rounded];

    subgraph cluster_Integration {
        label="Integration";
        style=rounded;
        CLI [label="CLI Interface"];
        API [label="Python API"];
        EXISTING [label="Existing pygetpapers"];
    }

    subgraph cluster_Configuration_Layer {
        label="Configuration Layer";
        style=rounded;
        YAML [label="YAML Config"];
        JSON [label="JSON Config"];
        INI [label="INI Config"];
    }

    subgraph cluster_Framework_Core {
        label="Framework Core";
        style=rounded;
        DOM [label="DeclarativeOperationsManager"];
        OP [label="Operation"];
        DEP [label="Dependency"];
        PAT [label="File Patterns"];
    }

    subgraph cluster_Dependency_Engine {
        label="Dependency Engine";
        style=rounded;
        CHECK [label="Dependency Checker"];
        SORT [label="Topological Sort"];
        EXEC [label="Operation Executor"];
    }

    // Edges
    YAML -> DOM;
    JSON -> DOM;
    INI -> DOM;
    DOM -> OP;
    DOM -> DEP;
    DOM -> PAT;
    DOM -> CHECK;
    DOM -> SORT;
    DOM -> EXEC;
    CLI -> DOM;
    API -> DOM;
    EXEC -> EXISTING;
}
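The Dependency Engine cluster above names a dependency checker, a topological sort, and an operation executor, but the commit does not show how they are implemented. As a rough illustration of the ordering step only, the sketch below uses Python's standard `graphlib` module. The operation names and the `execution_order` function are hypothetical and do not reflect the framework's actual API.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order operations so that each one runs after everything it depends on.

    dependencies maps an operation name to the set of operations it needs.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical operations: parse needs download, report needs parse.
operations = {
    "download": set(),
    "parse": {"download"},
    "report": {"parse"},
}
print(execution_order(operations))
```

A cycle such as `{"a": {"b"}, "b": {"a"}}` raises `CycleError`, which is the point where a dependency checker would reject the configuration.
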
