Merged
29 commits
4957986
Enhance README and architectural documentation for clarity and comple…
Zalfsten Apr 1, 2026
7085b91
Update VS Code settings and enhance README for Dev Container usage
Zalfsten Apr 1, 2026
01c24d1
Enhance README with updated development instructions and example YAML…
Zalfsten Apr 1, 2026
33fc13a
Update VS Code settings and enhance README with detailed configuratio…
Zalfsten Apr 1, 2026
6bb3825
Enhance development environment with demo setup and configuration upd…
Zalfsten Apr 2, 2026
5dfb151
Enhance demo API with type hints, error handling, and directory owner…
Zalfsten Apr 2, 2026
28bb7b8
Enhance demo environment by adding health checks for db-init service …
Zalfsten Apr 2, 2026
542e249
Enhance README and configuration files with quick start instructions,…
Zalfsten Apr 2, 2026
a08081b
Fix documentation links in README for correct relative paths
Zalfsten Apr 2, 2026
a53bafe
Fix relative paths in README documentation links for consistency
Zalfsten Apr 2, 2026
2999cb9
Enhance database and processor modules with improved comments and err…
Zalfsten Apr 2, 2026
72abfa1
Enhance development environment by adding new extensions for improved…
Zalfsten Apr 8, 2026
a9555a3
Merge branch 'main' into feature/doc_enhancement
Zalfsten Apr 8, 2026
f6e86c2
Potential fix for pull request finding 'CodeQL / Uncontrolled data us…
Zalfsten Apr 8, 2026
cf2305f
Potential fix for pull request finding 'CodeQL / Uncontrolled data us…
Zalfsten Apr 8, 2026
48a934e
Potential fix for pull request finding 'CodeQL / Uncontrolled data us…
Zalfsten Apr 8, 2026
d76eb75
Fix typo in Architecture Rules section header in AGENTS.md
Zalfsten Apr 8, 2026
5c84645
Refactor db-init service: remove infinite sleep and update healthchec…
Zalfsten Apr 8, 2026
d365783
Potential fix for pull request finding 'CodeQL / Uncontrolled data us…
Zalfsten Apr 8, 2026
0f3b4b7
Merge branch 'feature/doc_enhancement' of https://github.com/fairagro…
Zalfsten Apr 8, 2026
e864741
Potential fix for pull request finding 'CodeQL / Uncontrolled data us…
Zalfsten Apr 8, 2026
5e8c2af
Potential fix for pull request finding 'CodeQL / Uncontrolled data us…
Zalfsten Apr 8, 2026
9510a63
Potential fix for pull request finding 'CodeQL / Uncontrolled data us…
Zalfsten Apr 8, 2026
986dcfd
Potential fix for pull request finding 'CodeQL / Uncontrolled data us…
Zalfsten Apr 8, 2026
a3222d1
Potential fix for pull request finding 'CodeQL / Uncontrolled data us…
Zalfsten Apr 8, 2026
2a7e549
Refactor error handling in ARC processing to simplify logging
Zalfsten Apr 8, 2026
e1df71c
Refactor Docker Compose setup to streamline PostgreSQL initialization…
Zalfsten Apr 8, 2026
345a9bf
Refactor _derive_safe_arc_id to ensure valid ARC identifiers and prev…
Zalfsten Apr 8, 2026
7e6becc
Update PostgreSQL environment variables in Docker Compose to use dyna…
Zalfsten Apr 8, 2026
4 changes: 3 additions & 1 deletion .devcontainer/antigravity/devcontainer.json
@@ -101,7 +101,9 @@
"github.copilot-chat",
"charliermarsh.ruff",
"tim-koehler.helm-intellisense",
"vadzimnestsiarenka.helm-template-preview-and-more"
"vadzimnestsiarenka.helm-template-preview-and-more",
"jebbs.plantuml",
"systemticks.c4-dsl-extension"
]
}
},
4 changes: 3 additions & 1 deletion .devcontainer/vscode/devcontainer.json
@@ -107,7 +107,9 @@
"github.copilot-chat",
"charliermarsh.ruff",
"tim-koehler.helm-intellisense",
"vadzimnestsiarenka.helm-template-preview-and-more"
"vadzimnestsiarenka.helm-template-preview-and-more",
"jebbs.plantuml",
"systemticks.c4-dsl-extension"
]
}
},
1 change: 1 addition & 0 deletions .gitattributes
@@ -30,4 +30,5 @@ Dockerfile text eol=lf
*.dll binary
*.so binary
*.sql filter=lfs diff=lfs merge=lfs -text
dev_environment/demo.sql text !filter !diff !merge
docs/create_empty_views.sql text !filter !diff !merge
2 changes: 2 additions & 0 deletions .gitignore
@@ -224,3 +224,5 @@ helmchart/**/server.conf

# ggshield cache
.cache_ggshield

dev_environment/demo_output/
2 changes: 1 addition & 1 deletion .vscode/copilot-instructions.md
@@ -63,7 +63,7 @@ uv run mypy middleware/
source scripts/load-env.sh

# Docker
cd dev_environment && ./start.sh --build
cd dev_environment && ./start-dev.sh --build
```

## ⚠️ Common Patterns
4 changes: 3 additions & 1 deletion .vscode/extensions.json
@@ -21,6 +21,8 @@
"charliermarsh.ruff",
"tim-koehler.helm-intellisense",
"vadzimnestsiarenka.helm-template-preview-and-more",
"ms-vscode-remote.remote-containers"
"ms-vscode-remote.remote-containers",
"jebbs.plantuml",
"systemticks.c4-dsl-extension"
],
}
2 changes: 1 addition & 1 deletion .vscode/launch.json
@@ -13,7 +13,7 @@
"envFile": "${workspaceFolder}/.env",
"args": [
"-c",
"${workspaceFolder}/dev_environment/debug_config.yaml"
"${workspaceFolder}/dev_environment/config.debug.yaml"
]
},
{
1 change: 0 additions & 1 deletion .vscode/settings.json
@@ -11,7 +11,6 @@
"python.testing.autoTestDiscoverOnSaveEnabled": true,
"python.testing.pytestPath": "${workspaceFolder}/.venv/bin/pytest",
"python.testing.cwd": "${workspaceFolder}",
"python-envs.pythonProjects": [],

// Code formatting - use ruff for both linting and formatting
"[python]": {
33 changes: 26 additions & 7 deletions AGENTS.md
@@ -38,9 +38,9 @@ scripts/
└── post-merge

dev_environment/
├── start.sh # Start Docker Compose (Postgres + Converter)
├── compose.yaml # Docker services definition
└── config.yaml # Development configuration for the converter
├── start-dev.sh # Start Docker Compose (Postgres + Converter)
├── compose.dev.yaml # Docker services definition
└── config.dev.yaml # Development configuration for the converter
```

## 🔧 Important Commands
@@ -61,9 +61,13 @@ uv sync --dev --all-packages
### Development Environment

```bash
# Start local database and run converter
# Start a full local demo (including mock API, no secrets/mTLS required)
cd dev_environment
./start.sh --build
./start-demo.sh --build

# Start local database and run converter (requires decryption via sops)
cd dev_environment
./start-dev.sh --build

# View logs
docker compose logs -f
@@ -72,7 +76,22 @@ docker compose logs -f
docker compose down
```

## 📝 Key Implementation Details
## Architecture Rules

Before generating or modifying code, read **[docs/ARCHITECTURE_RULES.md](docs/ARCHITECTURE_RULES.md)**.

It defines binding constraints that MUST be followed:

- **Module Dependency Graph**: Which module may import from which (no circular imports).
- **Extension Points**: How to add new DB entities, mapper functions, or config values.
- **Concurrency Rules**: IPC contract for worker processes, Semaphore scope.
- **Error Handling**: Per-investigation failure isolation, stats update pattern.
- **Config**: NEVER use `os.environ` directly — always extend `Config` in `config.py`.
- **Database Access**: All SQL goes through `Database`; always use server-side cursors and bulk fetches.
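
The **Config** rule above could look roughly like the following sketch. It is not the project's actual `config.py`: the `from_mapping` constructor, the `describe` helper, and the `batch_size` setting are hypothetical names used only for illustration, while `api_url` and the connection string appear elsewhere in this change. The idea is that new settings become fields on `Config`, and application code receives a `Config` instance instead of reading the environment itself.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for the project's Config class in config.py."""

    connection_string: str
    api_url: str
    batch_size: int = 100  # hypothetical new setting, added by extending Config

    @classmethod
    def from_mapping(cls, raw: Mapping[str, str]) -> "Config":
        """Build a Config from already-parsed key/value pairs (e.g. YAML)."""
        return cls(
            connection_string=raw["connection_string"],
            api_url=raw["api_url"],
            batch_size=int(raw.get("batch_size", 100)),
        )


def describe(config: Config) -> str:
    """Application code takes a Config and never touches os.environ."""
    return f"{config.api_url} (batch size {config.batch_size})"
```

A caller would build the object once at startup, for example `Config.from_mapping(parsed_yaml)`, and pass it down to the modules that need it.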

---

## 📝 Key Implementation Details

### External Dependencies

@@ -108,7 +127,7 @@ services:
sql_to_arc: # The converter component (this repo)
```

**Configuration**: `dev_environment/config.yaml`
**Configuration**: `dev_environment/config.dev.yaml`

- Connects to `postgres` service on port 5432.
- Uses `api_url` pointing to an external Middleware API if needed.
76 changes: 74 additions & 2 deletions README.md
@@ -1,3 +1,75 @@
# m4.2_advanced_middleware_api
# FAIRagro SQL-to-ARC Middleware

The API component of the advanced middleware that accepts ARCs in RO-Create format and pushes them to the datahub
This repository contains the **SQL-to-ARC Converter**, a core component of the FAIRagro advanced middleware architecture. It enables Research Data Infrastructure (RDI) providers to transform their relational metadata into standardized Annotated Research Context (ARC) objects and transmit them to the central FAIRagro Middleware API.

## 📁 Repository Structure

| Folder | Description |
| :--- | :--- |
| [middleware/](middleware/) | Source code of the converter component. |
| [docs/](docs/) | Architectural design, database view specifications, and API documentation. |
| [dev_environment/](dev_environment/) | Docker-based local development setup (Postgres, Mock API). |
| [scripts/](scripts/) | Tooling for quality checks, environment setup, and Git LFS. |
| [docker/](docker/) | Dockerfiles and container structure tests. |

## 🌟 Quick Start (Full Local Demo)

For the best **out-of-the-box experience**, you can run a complete local demonstration. This setup starts a PostgreSQL database with demo data, a local Mock Middleware API, and the SQL-to-ARC converter to process and save results locally:

```bash
# Start the full demo stack (requires Docker)
./dev_environment/start-demo.sh --build
```

> **Note:** This demo does not require any secrets or mTLS keys. Generated ARCs will be saved to `dev_environment/demo_output/`.

## 🚀 Getting Started (Development)

The preferred method for working with this repository is using the **Dev Container** (VS Code).

While it is possible to develop without the Dev Container (see next steps below), this approach is not tested and is therefore neither documented nor officially supported.

### 1. Prerequisites (for manual setups only)

- **Python 3.12+**
- **[uv](https://github.com/astral-sh/uv)** (Dependency Management & Workspace Orchestration)
- **Docker & Docker Compose**
- **Git LFS** (installed via `./scripts/setup-git-lfs.sh`)

### 2. Environment Setup

Clone the repository and install all workspace dependencies:

```bash
uv sync --all-packages
```

### 3. Start Local Development Environment

The `dev_environment` folder provides a full stack including a PostgreSQL database pre-filled with edaphobase data.

Please refer to the **[Development Environment README](dev_environment/README.md)** for detailed instructions on prerequisites (like secret management and mTLS keys), setup, and usage.

## 🔧 Component Documentation

Detailed information on how to use, configure, and deploy the specific components can be found in their respective subdirectories:

- **[SQL-to-ARC Converter README](middleware/sql_to_arc/README.md)**: Configuration (YAML/Env), CLI options, and production deployment.
- **[Architectural Design](docs/ARCHITECTURAL_DESIGN.md)**: Deep dive into the concurrency model, memory management, and data flow.
- **[Database View Spec](docs/sql_to_arc_database_views.md)**: The SQL views required for the RDI provider database.

## 🧪 Quality Standards

We maintain high code quality through automated checks:

```bash
# Run all quality checks (Ruff, Mypy, Pylint, Bandit)
./scripts/quality-check.sh

# Run unit and integration tests
uv run pytest middleware/sql_to_arc/tests/
```

---
**Maintained by:** FAIRagro Middleware Team
**License:** [LICENSE](LICENSE)
53 changes: 33 additions & 20 deletions dev_environment/README.md
@@ -30,30 +30,43 @@ The SQL-to-ARC converter that:
- Waits for db-init to complete
- Connects to PostgreSQL and Middleware API
- Mounts encrypted secrets via sops
- Currently set to `sleep 3600` (modify compose.yaml to enable converter)
- Currently set to `sleep 3600` (modify compose.dev.yaml to enable converter)

### 4. middleware-api
### 4. middleware-api (Demo Only)

The FAIRagro Middleware API service that:
A simple mock service that simulates the Middleware API for local testing.

- Builds from `../docker/Dockerfile.api`
- Runs on port `8000`
- Provides REST API for ARC management
- No mTLS validation in dev mode (HTTP without client certs)
- Health check via `/live` endpoint
- No mTLS required (uses HTTP)
- Logs all incoming ARC uploads
- Available via `http://localhost:8000`
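
To illustrate what such a mock does, here is a minimal standard-library sketch. It is not the `demo_api_main.py` used by the demo (which, judging by the compose file, runs on FastAPI and uvicorn). The `/live` path matches the healthcheck in this change; the handler name, the `start_mock_api` helper, the catch-all POST behaviour, and the `202` response are assumptions made for illustration.

```python
import json
import logging
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("mock-middleware-api")


class MockApiHandler(BaseHTTPRequestHandler):
    """Answers liveness probes and logs every upload it receives."""

    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/live":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        # Assumption: any POST is treated as an ARC upload and only logged.
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        log.info("upload on %s (%d bytes)", self.path, len(payload))
        self._send_json(202, {"received_bytes": len(payload)})

    def log_message(self, format: str, *args) -> None:
        # Route the default access log through the logging module.
        log.debug(format, *args)


def start_mock_api(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start the mock in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), MockApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_mock_api(port=8000)` would serve plain HTTP on `http://localhost:8000` without any client certificates, which mirrors the demo setup described above.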

## Quick Start
## Quick Start (Demo Mode - Recommended for First Time)

If you don't have the mTLS keys yet and just want to see the workflow in action with a local database and a mock API:

```bash
./start-demo.sh --build
```

This starts:

1. **postgres**: A local DB.
2. **db-init**: Fills the DB with sample data.
3. **middleware-api**: A local mock API.
4. **sql-to-arc**: The converter, pointing to the local mock.

## Quick Start (Standard/External Mode)

### Prerequisites

- Docker and Docker Compose
- [sops](https://github.com/getsops/sops) for secret management
- Age or PGP key configured for sops decryption

### Start Everything
### Start with Decryption

```bash
./start.sh
./start-dev.sh
```

This will:
@@ -65,7 +78,7 @@ This will:
With image rebuild:

```bash
./start.sh --build
./start-dev.sh --build
```

### Start with External Middleware API
@@ -126,9 +139,9 @@ sops client.key
sops -d client.key
```

The `start.sh` script uses `sops exec-file` to temporarily decrypt `client.key` during container startup.
The `start-dev.sh` script uses `sops exec-file` to temporarily decrypt `client.key` during container startup.

### config.yaml
### config.dev.yaml

Application configuration for sql_to_arc:

@@ -172,7 +185,7 @@ docker compose logs sql_to_arc
Common issues:

- Secrets not mounted → verify sops decryption works: `sops -d client.key`
- API unreachable → check `api_url` in config.yaml
- API unreachable → check `api_url` in config.dev.yaml
- Database connection → verify db-init completed successfully

### Rebuild specific service
@@ -182,7 +195,7 @@ docker compose build sql_to_arc
docker compose up sql_to_arc
```

## Manual Usage (without start.sh)
## Manual Usage (without start-dev.sh)

If you don't want to use sops or the start script:

@@ -201,15 +214,15 @@ sops exec-file client.key \
## Development Workflow

1. Make changes to sql_to_arc code
2. Rebuild image: `./start.sh --build`
2. Rebuild image: `./start-dev.sh --build`
3. View logs: `docker compose logs -f sql_to_arc`
4. Iterate

## Files

- `compose.yaml` - Docker Compose service definitions
- `config.yaml` - Application configuration
- `compose.dev.yaml` - Docker Compose service definitions
- `config.dev.yaml` - Application configuration
- `client.crt` - Client certificate (plain)
- `client.key` - Client private key (encrypted with sops)
- `start.sh` - Startup script with sops integration
- `start-dev.sh` - Startup script with sops integration
- `run.sh` - **DEPRECATED** - Old script (kept for reference)
57 changes: 57 additions & 0 deletions dev_environment/compose.demo.yaml
@@ -0,0 +1,57 @@
services:
postgres:
image: postgres:15
restart: unless-stopped
environment:
POSTGRES_USER: ${POSTGRES_USER}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
POSTGRES_DB: rdi
ports:
- "5432:5432"
volumes:
- ./demo.sql:/docker-entrypoint-initdb.d/01-demo.sql:ro
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres -d rdi"]
interval: 5s
timeout: 5s
retries: 20

middleware-api:
image: python:3.12-slim
container_name: middleware-api-demo
environment:
- LOG_LEVEL=INFO
- LOCAL_UID=${LOCAL_UID:-1000}
- LOCAL_GID=${LOCAL_GID:-1000}
ports:
- "8000:8000"
volumes:
- ./demo_output:/data/arcs
- ./demo_api_main.py:/app/main.py:ro
command:
- /bin/sh
- -c
- |
set -e
pip install fastapi uvicorn arctrl
uvicorn main:app --app-dir /app --host 0.0.0.0 --port 8000
healthcheck:
test: ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8000/live')\" 2>/dev/null || exit 1"]
interval: 5s
timeout: 5s
retries: 10

sql_to_arc:
image: sql_to_arc:latest
build:
context: ..
dockerfile: docker/Dockerfile.sql_to_arc
depends_on:
postgres:
condition: service_healthy
middleware-api:
condition: service_healthy
volumes:
- ./config.demo.yaml:/etc/sql_to_arc/config.yaml:ro
environment:
SQL_TO_ARC_CONNECTION_STRING: "postgresql+psycopg://postgres:postgres@postgres:5432/rdi"
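
The `middleware-api` healthcheck above probes `/live` every 5 seconds, up to 10 times, and `sql_to_arc` only starts once that probe succeeds. The same wait can be reproduced from the host, for example in a test script, with a sketch like the one below. The function name `wait_until_live` and its parameters are hypothetical; the defaults simply mirror the `interval`, `timeout`, and `retries` values in the compose file.

```python
import time
import urllib.error
import urllib.request


def wait_until_live(
    url: str,
    retries: int = 10,
    interval: float = 5.0,
    timeout: float = 5.0,
) -> bool:
    """Poll `url` until it answers with HTTP 200 or the retries run out."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or a non-2xx status: retry.
            pass
        if attempt < retries - 1:
            time.sleep(interval)
    return False
```

With the demo stack running, `wait_until_live("http://localhost:8000/live")` should return `True` once the mock API is up.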