# BioGraph-Edge-Quantizer

**Lead Architect:** Swapin Vidya <br>
**ORCID:** [0009-0009-5758-3845](https://orcid.org/0009-0009-5758-3845) <br>
**Email:** [swapin@peachbot.in](mailto:swapin@peachbot.in)
---

## Overview

BioGraph-Edge-Quantizer is a **resource-aware Graph Neural Network pipeline** designed for:

* edge-constrained inference
* large-scale biological graphs
* reproducible performance evaluation

The system focuses on:

* **bounded-variance inference latency**
* **reduced model footprint via INT8 weight packing**
* **deployable execution using TorchScript**

---

## Problem Definition

This project models protein–protein interaction graphs derived from the STRING database, ingested through a data structuring and preprocessing layer.

**Task (Current Prototype):**

* Node-level inference (binary classification placeholder)

**Input Characteristics:**

* Node features: 4096-dimensional embeddings
* Graph size: ~10,000 nodes / ~50,000 edges

**Objective:**
Enable **practical inference under CPU-only, edge-constrained environments**.

---

## System Architecture

* **`core_quantizer/`**
  Python-based GNN pipeline using a GraphSAGE architecture built on PyTorch Geometric

* **`api_gateway/`**
  Laravel 12 interface exposing inference results through a structured API
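
For intuition, here is a minimal NumPy sketch of the mean-aggregation step a GraphSAGE layer performs. The actual model lives in `core_quantizer/` and uses PyTorch Geometric, so the function name, dimensions, and toy graph below are purely illustrative:

```python
import numpy as np

def sage_mean_layer(h, neighbors, w_self, w_neigh):
    """One GraphSAGE layer with mean aggregation:
    h_v' = ReLU(W_self @ h_v + W_neigh @ mean(h_u for u in N(v)))."""
    out = np.zeros((h.shape[0], w_self.shape[0]))
    for v, nbrs in enumerate(neighbors):
        agg = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])
        out[v] = w_self @ h[v] + w_neigh @ agg
    return np.maximum(out, 0.0)  # ReLU

# Toy graph: 3 nodes, 4-dim features, 2-dim output
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
neighbors = [[1, 2], [0], [0]]  # adjacency lists
w_self = rng.normal(size=(2, 4))
w_neigh = rng.normal(size=(2, 4))
h_next = sage_mean_layer(h, neighbors, w_self, w_neigh)
```

Because the aggregation uses only a node's neighborhood, the same weights apply to nodes never seen during training, which is what makes the architecture inductive.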

---

## ⚙️ Setup & Initialization

### 1. ML Core (Python)

```bash
cd core_quantizer
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

pip install pandas torch torch-geometric scikit-learn numpy

python -m src.data_loader --generate-sample   # ingests a STRING dataset slice
python -m src.quantizer                       # writes the INT8-packed model
python -m src.benchmark                       # produces the latency metrics
```

---

### 2. API Gateway (Laravel)

The gateway bridges clinical requests and the ML core. `PYTHON_PATH` in `.env` must point at the Python executable inside the `core_quantizer` virtual environment so execution stays deterministic.

```bash
cd api_gateway
composer install
echo "PYTHON_PATH=$(pwd)/../core_quantizer/venv/Scripts/python.exe" >> .env   # Unix: venv/bin/python
php artisan migrate   # sets up logging and audit-trail tables
php artisan serve     # default: http://localhost:8000
```

---

## Benchmark Configuration

**Hardware:**

* CPU: Intel Core i5-10210U (4C/8T, 1.60 GHz)
* RAM: 8 GB (7.88 GB usable)
* OS: Windows 11 Home Single Language (Build 22600, x64)
* System: AVITA NS14A8

**Execution Settings:**

* Runs: 100
* Threads: 1 (controlled-variance mode)
* Input: full graph

---

## Performance Results

Benchmarks use research-grade parameters (4096-dimensional embeddings) to simulate production workloads.

| Metric | FP32 Baseline | INT8 Packed | Observation |
| :--- | :--- | :--- | :--- |
| **Model Weights** | 64.03 MB | **16.02 MB** | **~75% reduction** |
| **Avg Latency** | 323.36 ms | 313.64 ms | ~3% improvement |
| **P95 Latency** | 334.77 ms | 333.91 ms | negligible change |
| **Std Dev (Jitter)** | ±13.90 ms | ±14.46 ms | bounded variance |

---

## Key Insight

Quantization does **not significantly improve latency** in this pipeline because:

* graph aggregation dominates compute
* moving high-dimensional features is memory-bound
* `Linear` layers are not the primary bottleneck
* compute gains only become visible once model size exceeds the target CPU's L3 cache, shifting the workload onto memory bandwidth

👉 **Conclusion:**
Optimization primarily reduces the **storage footprint**, not raw compute time.
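
A back-of-envelope check of the memory-bound claim, using only the graph dimensions and model sizes stated in this document:

```python
# Working-set arithmetic for the graph described above.
nodes, feat_dim = 10_000, 4096
feature_bytes = nodes * feat_dim * 4          # FP32 node features
weight_bytes_fp32 = int(64.03 * 1024 ** 2)    # reported FP32 weights
weight_bytes_int8 = int(16.02 * 1024 ** 2)    # reported INT8 weights

# Features alone are ~156 MiB, roughly 2.4x the FP32 weights,
# so each forward pass streams far more feature data than weight data.
feature_mb = feature_bytes / 1024 ** 2
```

Shrinking the weights by 48 MB barely dents total memory traffic, which is consistent with the flat latency numbers above.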

---

## Quantization Strategy

This implementation uses **manual INT8 weight packing** rather than library-driven quantization, keeping full control over the stored footprint:

* Weights are converted to `int8`
* Scale factors are stored separately
* Dequantization occurs during inference

**Trade-offs:**

* ✔ ~70–75% model size reduction
* ❗ Dequantization overhead: for small models (<10 MB), dequantizing back to FP32 can cost more than the bandwidth savings
* ❗ Limited latency gain under the current architecture
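
A minimal sketch of this scheme, assuming symmetric per-tensor scales; the repository's actual packing code may differ in layout and granularity:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w ≈ q * scale with q in int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover FP32 weights at inference time (the overhead noted above)."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

size_ratio = q.nbytes / w.nbytes   # 0.25: int8 vs float32 storage
max_err = np.abs(w - w_hat).max()  # bounded by scale / 2
```

Storing `q` plus one scale per tensor is what yields the 4x reduction in the weights table, at the price of the rounding error and the per-inference dequantization pass.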

---

## 🔌 System Integration

Current pipeline:

```
Laravel → subprocess → Python → GNN → Response
```

The gateway launches the Python side via `proc_open` for each request.

**Measured Overhead:**

* ~10–15 ms per request, dominated by process initialization

**Limitation:**

* Not scalable for high-throughput systems

**Future Direction:**

* Replace the subprocess with a persistent inference service (FastAPI / gRPC)
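
One way to picture the bridge is a one-shot stdin/stdout JSON exchange. This is an illustrative sketch only, not the repository's actual protocol; the function and field names are hypothetical:

```python
import json
import sys

def handle_request(payload):
    """Illustrative handler: echoes node ids with placeholder scores.
    A real handler would load the packed model once and run inference."""
    node_ids = payload.get("node_ids", [])
    return {
        "node_ids": node_ids,
        "scores": [0.5 for _ in node_ids],  # placeholder logits
    }

def main(stdin=sys.stdin, stdout=sys.stdout):
    """One-shot mode: read one JSON request, write one JSON response.
    A persistent service would loop here instead, amortizing the
    ~10-15 ms process-startup cost across many requests."""
    request = json.loads(stdin.read())
    stdout.write(json.dumps(handle_request(request)))
```

Under the current design the gateway pays the interpreter-startup cost on every call to `main`; a long-lived service keeps the model resident and removes that fixed overhead.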

---

## Clinical Alignment (Experimental)

The system emits structured output compatible with FHIR-style schemas (e.g. `DiagnosticReport`-shaped resources) to simulate integration into clinical workflows.

**Note:**
This is a research prototype and **not validated for medical use**.

---

## ⚠️ Limitations

* No formal accuracy benchmarking yet
* Quantization does not significantly reduce latency
* TorchScript serialization adds a fixed metadata overhead (~4–8 MB) that masks compression gains on small models
* Subprocess-based execution adds per-request overhead
* No ARM / edge hardware validation yet

---

## Intellectual Property

Developed in coordination with modular on-device clinical intelligence research, Indian Patent Application **202541127477**.

---

## Roadmap

* [ ] Accuracy validation (FP32 vs INT8)
* [ ] ARM / edge hardware benchmarking
* [ ] Persistent inference service
* [ ] Sparse GNN optimization
* [ ] ONNX INT8 deployment pipeline

---

## Technical Glossary

| Term | Description |
| :--- | :--- |
| GraphSAGE | Inductive GNN that generalizes to unseen nodes |
| STRING | Protein–protein interaction dataset |
| Quantization | FP32 → INT8 weight conversion |
| FHIR | Standard for exchanging electronic health records |
| P95 Latency | Latency under which 95% of requests fall |