Commit 542e249

Enhance README and configuration files with quick start instructions, example configurations, and improved environment setup for better usability
1 parent: 28bb7b8

4 files changed (107 additions & 51 deletions)

README.md

Lines changed: 11 additions & 0 deletions

@@ -12,6 +12,17 @@ This repository contains the **SQL-to-ARC Converter**, a core component of the F
 | [scripts/](scripts/) | Tooling for quality checks, environment setup, and Git LFS. |
 | [docker/](docker/) | Dockerfiles and container structure tests. |
 
+## 🌟 Quick Start (Full Local Demo)
+
+For the best **out-of-the-box experience**, you can run a complete local demonstration. This setup starts a PostgreSQL database with demo data, a local Mock Middleware API, and the SQL-to-ARC converter to process and save results locally:
+
+```bash
+# Start the full demo stack (requires Docker)
+./dev_environment/start-demo.sh --build
+```
+
+> **Note:** This demo does not require any secrets or mTLS keys. Generated ARCs will be saved to `dev_environment/demo_output/`.
+
 ## 🚀 Getting Started (Development)
 
 The preferred method for working with this repository is using the **Dev Container** (VS Code).
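The Quick Start only works when Docker is installed and its daemon is reachable. A minimal preflight sketch, assuming a bash shell; `require_cmd` and `docker_daemon_up` are hypothetical helpers, not part of the repository, and `docker info` is the only Docker CLI call used:

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers for the Quick Start demo (not part of the repo).

# Succeeds if the given command is on PATH; prints a hint to stderr otherwise.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "❌ '$1' not found on PATH" >&2
        return 1
    fi
}

# Succeeds if the Docker daemon answers; `docker info` fails when it is not reachable.
docker_daemon_up() {
    docker info >/dev/null 2>&1
}

# Usage from the repository root:
#   require_cmd docker && docker_daemon_up && ./dev_environment/start-demo.sh --build
```

Failing before `start-demo.sh` runs gives a clearer message than a half-started stack.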

middleware/sql_to_arc/README.md

Lines changed: 10 additions & 14 deletions

@@ -102,6 +102,8 @@ In containerized environments, sensitive values like the `connection_string` or
 
 ## Usage
 
+The following examples assume you are in the root of the repository.
+
 ### 1. From Source (Development)
 
 Requires [uv](https://github.com/astral-sh/uv) installed.
@@ -110,26 +112,22 @@ Requires [uv](https://github.com/astral-sh/uv) installed.
 # Install dependencies for all workspace members
 uv sync --all-packages
 
-# Run the converter with a specific config file
-uv run python -m middleware.sql_to_arc.main -c my_config.yaml
+# Run the converter using the example config directly
+uv run python -m middleware.sql_to_arc.main -c middleware/sql_to_arc/config.example.yaml
 ```
 
 ### 2. Local Docker Image
 
 Build the image from the repository root:
 
 ```bash
+# Build the converter image
 docker build -f docker/Dockerfile.sql_to_arc -t sql-to-arc:local .
-```
-
-Run with environment variables:
 
-```bash
+# Run using the example config via a volume mount
 docker run --rm \
-  -e SQL_TO_ARC_CONNECTION_STRING="postgresql://..." \
-  -e SQL_TO_ARC_RDI="my-rdi" \
-  -v $(pwd)/certs:/certs \
-  -e SQL_TO_ARC_API_CLIENT__CLIENT_CERT_PATH="/certs/client.crt" \
+  --env-file .env \
+  -v $(pwd)/middleware/sql_to_arc/config.example.yaml:/etc/sql_to_arc/config.yaml:ro \
   sql-to-arc:local
 ```
 
@@ -138,12 +136,10 @@ docker run --rm \
 Pull the latest official image from Docker Hub (once available):
 
 ```bash
-docker pull fairagro/sql-to-arc:latest
-
 docker run --rm \
   --env-file .env \
-  -v $(pwd)/config.yaml:/etc/sql_to_arc/config.yaml:ro \
-  fairagro/sql-to-arc:latest -c /etc/sql_to_arc/config.yaml
+  -v $(pwd)/middleware/sql_to_arc/config.example.yaml:/etc/sql_to_arc/config.yaml:ro \
+  zalf/fairagro-advanced-middleware-sql_to_arc/sql-to-arc:latest
 ```
 
 ---
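The Docker examples now take settings from `.env` through `--env-file` and mount the config read-only. A sketch of two hypothetical helpers, assuming bash: `write_env_file` writes a minimal `.env` using the `SQL_TO_ARC_*` variable names from the removed example (whether the converter still reads them is an assumption), and `config_mount_arg` builds the `-v` spec from an absolute host path, which Docker bind mounts require, so the result can be passed quoted even when the path contains spaces:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the Docker usage examples (not part of the repo).

# Write a minimal .env for `docker run --env-file`: one KEY=value per line,
# values written without quotes and kept on a single line.
write_env_file() {
    local target="$1" connection_string="$2" rdi="$3"
    printf 'SQL_TO_ARC_CONNECTION_STRING=%s\nSQL_TO_ARC_RDI=%s\n' \
        "$connection_string" "$rdi" > "$target"
}

# Print the read-only bind-mount spec for a config file. Docker needs an
# absolute host path here, so resolve the file's directory first.
config_mount_arg() {
    local config="$1" dir
    dir="$(cd "$(dirname "$config")" && pwd)" || return 1
    printf '%s/%s:/etc/sql_to_arc/config.yaml:ro\n' "$dir" "$(basename "$config")"
}

# Usage from the repository root:
#   write_env_file .env "postgresql+psycopg://postgres:postgres@localhost:5432/rdi" edaphobase
#   docker run --rm --env-file .env \
#     -v "$(config_mount_arg middleware/sql_to_arc/config.example.yaml)" \
#     sql-to-arc:local
```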
Lines changed: 80 additions & 33 deletions

@@ -1,49 +1,96 @@
 # SQL to ARC Converter Configuration Example
-# Copy this file to config.yaml and adjust the values
+# This file contains ALL available configuration options with their default or example values.
 
-# Database Connection Settings
-# ---------------------------
-# Name of the source database (e.g., PostgreSQL database name)
-db_name: "edaphobase"
+# ------------------------------------------------------------------------------
+# 1. CORE SETTINGS
+# ------------------------------------------------------------------------------
 
-# Database user with read access
-db_user: "reader"
+# Full SQLAlchemy connection string for the source database.
+# Supported: postgresql+psycopg
+connection_string: "postgresql+psycopg://postgres:postgres@localhost:5432/rdi"
 
-# Database password (will be handled securely)
-db_password: "secure_password_here"
+# Unique identifier for the Research Data Infrastructure (RDI).
+rdi: "edaphobase"
 
-# Database host address (hostname or IP)
-db_host: "localhost"
+# Public URL of the RDI portal (used for provenance metadata).
+rdi_url: "https://edaphobase.org"
 
-# Database port (default: 5432 for PostgreSQL)
-db_port: 5432
+# Console logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL).
+log_level: "INFO"
 
-# RDI Identifier
-# -------------
-# Unique identifier for the Research Data Infrastructure (RDI)
-# This is used to tag or namespace the converted ARCs
-rdi: "edaphobase"
 
+# ------------------------------------------------------------------------------
+# 2. PROCESSING & PERFORMANCE
+# ------------------------------------------------------------------------------
+
+# Number of parallel worker processes for ARC generation (CPU-bound).
+# Recommended: Number of CPU cores available.
+max_concurrent_arc_builds: 4
+
+# Maximum concurrent tasks (IO + CPU). Defaults to 4 * max_concurrent_arc_builds.
+max_concurrent_tasks: ~
+
+# Number of investigations to fetch from the database per batch.
+db_batch_size: 100
+
+# Safety limit: Maximum number of studies per investigation.
+max_studies: 5000
+
+# Safety limit: Maximum number of assays per investigation.
+max_assays: 10000
+
+# Timeout in minutes for generating a single ARC.
+arc_generation_timeout_minutes: 30
+
+# (Optional) Limit processing to the first N investigations for debugging.
+debug_limit: ~
 
-# API Client Configuration
-# -----------------------
-# Settings for connecting to the Middleware API to upload ARCs
-api_client:
-  # Base URL of the Middleware API
-  api_url: "http://localhost:8000"
 
-  # Path to the client certificate file (PEM format) for mTLS authentication
-  client_cert_path: "/path/to/client.crt"
+# ------------------------------------------------------------------------------
+# 3. MIDDLEWARE API CLIENT (mTLS)
+# ------------------------------------------------------------------------------
 
-  # Path to the client private key file (PEM format)
-  client_key_path: "/path/to/client.key"
+api_client:
+  # Base URL of the FAIRagro Middleware API.
+  api_url: "http://localhost:8000"
 
-  # Path to the CA certificate file (optional, for self-signed server certs)
-  # ca_cert_path: "/path/to/ca.crt"
+  # Mutual TLS (mTLS) Credentials
+  client_cert_path: "dev_environment/client.crt"
+  client_key_path: "dev_environment/client.key"
+
+  # (Optional) Path to a custom CA certificate to verify the API server.
+  ca_cert_path: ~
 
-  # Request timeout in seconds (default: 30.0)
+  # Request timeout in seconds.
   timeout: 30.0
 
-  # Verify SSL certificates (default: true)
-  # Set to false only for testing with self-signed certs without CA
+  # Whether to verify the API server's SSL certificate.
   verify_ssl: true
+
+  # Whether to follow HTTP redirects.
+  follow_redirects: true
+
+  # Maximum concurrent HTTP requests to the API.
+  max_concurrency: 10
+
+  # Maximum retries for transient HTTP errors (5xx, timeouts).
+  max_retries: 3
+
+  # Exponential backoff factor for retries.
+  retry_backoff_factor: 2.0
+
+
+# ------------------------------------------------------------------------------
+# 4. OPENTELEMETRY TRACING
+# ------------------------------------------------------------------------------
+
+otel:
+  # OTel collector endpoint (e.g., http://localhost:4318).
+  # If null (~), tracing is disabled or uses default env vars.
+  endpoint: ~
+
+  # Whether to print OpenTelemetry spans to the console in a readable format.
+  log_console_spans: false
+
+  # Logging level for OTLP log export.
+  log_level: "INFO"
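The previous header told users to copy this file to `config.yaml` and adjust the values. A sketch of doing that from a script, assuming bash and a POSIX `sed`; `set_top_level_key` and `default_max_concurrent_tasks` are hypothetical helpers. Anchoring the pattern at the start of the line means only unindented keys change, so the nested `otel.log_level` is left alone. The second helper only restates the documented default `max_concurrent_tasks = 4 * max_concurrent_arc_builds`:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for deriving a local config from config.example.yaml.

# Replace the value of an unindented "key: value" line in a YAML file.
# Indented keys (e.g. otel.log_level) never match because of the ^ anchor.
# Caution: values containing "|", "&" or "\" would need escaping for sed.
set_top_level_key() {
    local file="$1" key="$2" value="$3" tmp status
    tmp="$(mktemp)" || return 1
    # Avoid the non-portable `sed -i`: write to a temp file, then copy the
    # result back so the original file keeps its permissions.
    sed "s|^${key}:.*|${key}: ${value}|" "$file" > "$tmp"
    status=$?
    [ "$status" -eq 0 ] && cat "$tmp" > "$file"
    rm -f "$tmp"
    return "$status"
}

# Documented default when max_concurrent_tasks is null (~).
default_max_concurrent_tasks() {
    echo $((4 * $1))
}

# Usage from the repository root:
#   cp middleware/sql_to_arc/config.example.yaml config.yaml
#   set_top_level_key config.yaml max_concurrent_arc_builds "$(getconf _NPROCESSORS_ONLN)"
#   set_top_level_key config.yaml debug_limit 5
#   default_max_concurrent_tasks "$(getconf _NPROCESSORS_ONLN)"
```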

scripts/load-env.sh

Lines changed: 6 additions & 4 deletions

@@ -126,13 +126,15 @@ fi
 # Decrypt the encrypted file and write to .env
 if grep -q '"sops"' "$ENCRYPTED_FILE" 2>/dev/null; then
     # Decrypt encrypted file and write to .env
-    sops -d "$ENCRYPTED_FILE" > "$DECRYPTED_FILE" 2>/dev/null
+    # We remove the CLIENT_KEY for the .env file because it breaks Docker's --env-file parser.
+    # We use a perl regex to find the CLIENT_KEY="..." multiline block and delete it entirely.
+    sops -d "$ENCRYPTED_FILE" 2>/dev/null | perl -0777 -pe 's/CLIENT_KEY=".*?"\n?//gs' > "$DECRYPTED_FILE"
     if [ $? -eq 0 ]; then
-        echo "✅ Encrypted secrets decrypted to $DECRYPTED_FILE"
+        echo "✅ Encrypted secrets decrypted to $DECRYPTED_FILE (CLIENT_KEY omitted for Docker compatibility)"
 
-        # Also load for current shell
+        # Also load for current shell (the shell CAN handle the full file, so we re-decrypt for memory)
         set -a
-        source "$DECRYPTED_FILE"
+        source <(sops -d "$ENCRYPTED_FILE" 2>/dev/null)
         set +a
     else
         echo "❌ Error decrypting $ENCRYPTED_FILE"
