Commit 0983e6c

feat: Add comprehensive AI Agent Workflow documentation
- Introduced `ai_workflow.md` detailing the integration of AI coding agents, project structure, and VS Code integration.
- Documented the entry point for agents in `AGENTS.md`, emphasizing its role in providing project context.
- Explained the spec-driven development approach with a two-level layout for specs in `spec/` and `middleware/*/spec/`.
- Outlined the use of agent skills in `.agents/skills/`, including discovery, activation, and execution processes.
- Provided practical workflow examples for agents, including adding new config fields and fixing serialization bugs.
- Added new specs and design documents for API upload, ARC building, database access, and SQL-to-ARC conversion.
- Established principles for the FAIRagro SQL-to-ARC project, emphasizing correctness, memory safety, and security.
- Created demo environment specifications to facilitate local testing without external dependencies.
1 parent b2a5fa5 commit 0983e6c

18 files changed

Lines changed: 527 additions & 243 deletions

Lines changed: 86 additions & 60 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,16 @@
1-
# ARCtrl — Usage Skill
1+
---
2+
name: arctrl
3+
description: >
4+
Reference for using the arctrl Python library (v3.x) to build ARC (Annotated
5+
Research Context) objects and serialize them to RO-Crate JSON-LD. Use when
6+
working with ArcInvestigation, ArcStudy, ArcAssay, ArcTable, CompositeHeader,
7+
CompositeCell, OntologyAnnotation, OntologySourceReference, Person, or
8+
Publication objects, or when calling ToROCrateJsonString / WriteAsync.
9+
compatibility: Python 3.12+, arctrl (Fable-transpiled F# library)
10+
---
11+
12+
# ARCtrl — Usage Reference
213

3-
Reference for using the `arctrl` Python library (v3.x) in this project.
414
ARCtrl is a Fable-transpiled F# library — the Python surface is idiomatic
515
but some internals are Fable runtime types.
616

@@ -37,19 +47,46 @@ from arctrl.py.fable_modules.fable_library.async_ import start_as_task # type:
3747
# Empty / unknown
3848
oa = OntologyAnnotation()
3949

40-
# With values: name=term, tan=URI (TermAccessionNumber), tsr=TermSourceREF
41-
oa = OntologyAnnotation(name="soil texture", tan="http://purl.obolibrary.org/obo/ENVO_00002001", tsr="")
42-
# tsr is usually left empty when only a URI is available
50+
# With values — all parameters optional:
51+
oa = OntologyAnnotation(
52+
name="soil texture", # human-readable term
53+
tan="http://purl.obolibrary.org/obo/ENVO_00002001", # TermAccessionNumber (URI)
54+
tsr="ENVO", # TermSourceREF: short name of the ontology source
55+
)
56+
# tsr is a back-reference to an OntologySourceReference registered on the
57+
# investigation (by its .Name). If no OntologySourceReference is registered,
58+
# tsr can be left empty or omitted.
59+
```
60+
61+
### OntologySourceReference
62+
63+
Registered on `ArcInvestigation.OntologySourceReferences`. Describes an
64+
ontology source and holds its version.
65+
66+
```python
67+
from arctrl import OntologySourceReference
68+
69+
osr = OntologySourceReference(
70+
name="ENVO", # short name — must match OntologyAnnotation.tsr
71+
description="Environment Ontology",
72+
file="http://purl.obolibrary.org/obo/envo.owl",
73+
version="2024-01-01", # ontology version / access date
74+
)
75+
investigation.OntologySourceReferences.append(osr)
4376
```
4477

78+
**Relationship:** `OntologyAnnotation.tsr` is a string key that references
79+
`OntologySourceReference.name`. ARCtrl does not enforce referential integrity
80+
at runtime, but the RO-Crate serialization will include both objects.
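
Since ARCtrl does not check that every `tsr` matches a registered source name, a caller may want to run that check itself before serializing. The sketch below works on plain strings only; `find_unregistered_sources` is a hypothetical helper (not part of arctrl), and collecting the `tsr` values and source names from the ARC objects is left to the caller.

```python
from __future__ import annotations

from collections.abc import Iterable


def find_unregistered_sources(
    annotation_tsrs: Iterable[str | None],
    registered_names: Iterable[str],
) -> set[str]:
    """Return tsr values that match no registered OntologySourceReference name.

    Hypothetical helper, not part of arctrl. Empty or missing tsr values are
    skipped, because tsr may legitimately be left empty.
    """
    registered = set(registered_names)
    return {tsr for tsr in annotation_tsrs if tsr and tsr not in registered}


# "ENVO" is registered, "PATO" is not; empty and None values are ignored.
missing = find_unregistered_sources(["ENVO", "PATO", "", None], ["ENVO"])
print(sorted(missing))
```

A non-empty result could be logged as a warning or turned into a test failure, depending on how strict the project wants to be.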
81+
4582
### ArcInvestigation
4683

4784
```python
4885
inv = ArcInvestigation.create(
49-
identifier="inv001", # required, must be non-empty
86+
identifier="inv001", # required, must be non-empty
5087
title="My Investigation",
5188
description="...",
52-
submission_date="2024-01-15", # ISO string or None
89+
submission_date="2024-01-15", # ISO string or None
5390
public_release_date="2025-01-01",
5491
)
5592
```
@@ -73,7 +110,7 @@ assay = ArcAssay.create(
73110
identifier="assay001",
74111
measurement_type=OntologyAnnotation("soil metagenome", "http://...", ""),
75112
technology_type=OntologyAnnotation("nucleotide sequencing", "http://...", ""),
76-
technology_platform=OntologyAnnotation("Illumina", None, None), # platform is text; OA is allowed
113+
technology_platform=OntologyAnnotation("Illumina", None, None),
77114
# technology_platform=None is fine if unknown
78115
)
79116
```
@@ -121,16 +158,16 @@ arc.AddRegisteredStudy(study)
121158
arc.AddAssay(assay)
122159

123160
# 4. Link assay → study
124-
study.RegisterAssay(assay.Identifier) # pass the string identifier
161+
study.RegisterAssay(assay.Identifier) # pass the string identifier
125162

126163
# 5. Attach contacts
127-
arc.Contacts.append(person) # investigation-level
128-
study.Contacts.append(person) # study-level
129-
assay.Performers.append(person) # assay-level
164+
arc.Contacts.append(person) # investigation-level
165+
study.Contacts.append(person) # study-level
166+
assay.Performers.append(person) # assay-level
130167

131168
# 6. Attach publications
132-
arc.Publications.append(pub) # investigation-level
133-
study.Publications.append(pub) # study-level
169+
arc.Publications.append(pub) # investigation-level
170+
study.Publications.append(pub) # study-level
134171

135172
# 7. Serialize to RO-Crate JSON-LD string
136173
json_str: str = arc.ToROCrateJsonString()
@@ -140,51 +177,45 @@ json_str: str = arc.ToROCrateJsonString()
140177

141178
## ArcTable (Annotation Tables)
142179

143-
Tables attach to a study or assay.
144-
145180
```python
146181
# Create table
147182
table = ArcTable.init("my-table-name")
148183

149-
# Build a header
150-
header_input = CompositeHeader.input(IOType.of_string("source_name"))
184+
# Build headers
185+
header_input = CompositeHeader.input(IOType.of_string("source_name"))
151186
header_output = CompositeHeader.output(IOType.of_string("sample_name"))
152-
header_char = CompositeHeader.characteristic(OntologyAnnotation("pH", "", ""))
187+
header_char = CompositeHeader.characteristic(OntologyAnnotation("pH", "", ""))
153188
header_factor = CompositeHeader.factor(OntologyAnnotation("temperature", "", ""))
154-
header_param = CompositeHeader.parameter(OntologyAnnotation("extraction", "", ""))
155-
header_comp = CompositeHeader.component(OntologyAnnotation("reagent", "", ""))
156-
header_cmt = CompositeHeader.comment("My comment label")
157-
header_perf = CompositeHeader.performer # property, not callable
158-
header_date = CompositeHeader.date # property, not callable
189+
header_param = CompositeHeader.parameter(OntologyAnnotation("extraction", "", ""))
190+
header_comp = CompositeHeader.component(OntologyAnnotation("reagent", "", ""))
191+
header_cmt = CompositeHeader.comment("My comment label")
192+
header_perf = CompositeHeader.performer # property, not callable
193+
header_date = CompositeHeader.date # property, not callable
159194
# Fallback for unknown/simple header names:
160-
header_any = CompositeHeader.OfHeaderString("SomeColumnName")
195+
header_any = CompositeHeader.OfHeaderString("SomeColumnName")
161196

162197
# IOType known strings (IOType.of_string):
163198
# "source_name", "sample_name", "raw_data_file", "derived_data_file",
164199
# "image_file", "material"
165200

166-
# Build cells (one per row, same order as rows in the table)
167-
cell_text = CompositeCell.free_text("some value")
168-
cell_term = CompositeCell.term(OntologyAnnotation("sandy loam", "http://...", ""))
169-
cell_unitized = CompositeCell.unitized("6.8", OntologyAnnotation("pH", "http://...", ""))
170-
cell_empty = CompositeCell.free_text("")
201+
# Build cells
202+
cell_text = CompositeCell.free_text("some value")
203+
cell_term = CompositeCell.term(OntologyAnnotation("sandy loam", "http://...", ""))
204+
cell_unitized = CompositeCell.unitized("6.8", OntologyAnnotation("pH", "http://...", ""))
205+
cell_empty = CompositeCell.free_text("")
171206

172207
# Add column (header + matching cell list)
173208
table.AddColumn(header_char, [cell_term, cell_term, cell_empty])
174209

175-
# Attach table to study or assay
176-
study.AddTable(table)
177-
assay.AddTable(table)
178-
```
179-
180-
### CompositeHeader.IsTermColumn
181-
182-
```python
183-
# Check before building a cell: if True, wrap plain values in OntologyAnnotation
210+
# Check whether a header expects a term cell
184211
if header.IsTermColumn:
185212
cell = CompositeCell.term(OntologyAnnotation(str(value), "", ""))
186213
else:
187214
cell = CompositeCell.free_text(str(value))
215+
216+
# Attach table to study or assay
217+
study.AddTable(table)
218+
assay.AddTable(table)
188219
```
189220

190221
---
@@ -196,7 +227,6 @@ else:
196227
arc = ARC.from_rocrate_json_string(json_str)
197228

198229
# Async write to directory (creates ISA file structure on disk)
199-
# Must be awaited via start_as_task (Fable async bridge)
200230
await start_as_task(arc.WriteAsync("/path/to/output/dir"))
201231
```
202232

@@ -213,53 +243,49 @@ await start_as_task(arc.WriteAsync("/path/to/output/dir"))
213243
## Known Pitfalls
214244

215245
**`start_as_task` is untyped** — always add `# type: ignore[import-untyped]`
216-
on the import. It is an internal Fable module and has no stubs.
246+
on the import.
217247

218248
**`CompositeHeader.performer` and `.date` are properties, not constructors**
219249
— call them without `()`:
220250

221251
```python
222-
# CORRECT
223-
header = CompositeHeader.performer
224-
# WRONG
225-
header = CompositeHeader.performer() # TypeError
252+
header = CompositeHeader.performer # CORRECT
253+
header = CompositeHeader.performer() # TypeError
226254
```
227255

228-
**`OntologyAnnotation()` without args is valid** — use it for empty/unknown
229-
ontology terms rather than `None` to avoid null-ref errors in the F# layer.
256+
**`OntologyAnnotation()` without args is valid** — use for empty/unknown terms
257+
instead of `None` to avoid null-ref errors in the F# layer.
230258

231-
**ARC objects carry .NET interop state** — do not pickle them or transfer
232-
them across multiprocessing boundaries. Always serialize to JSON string first.
259+
**ARC objects carry .NET interop state** — do not pickle or transfer across
260+
multiprocessing boundaries. Serialize to JSON string first.
233261

234262
**`ToROCrateJsonString()` + `gc.collect()`** — after serializing in a worker
235263
process, explicitly `del arc` and call `gc.collect()` to release .NET bridge
236264
memory promptly.
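
The serialize-then-release pattern described above might look like the following sketch. `FakeArc` is a stand-in so the example runs without arctrl, and `serialize_and_release` is a hypothetical helper name; only `ToROCrateJsonString()` comes from the reference itself.

```python
import gc
import json


class FakeArc:
    """Stand-in for an ARC object so the sketch runs without arctrl."""

    def ToROCrateJsonString(self) -> str:
        return json.dumps({"@graph": [{"identifier": "inv001"}]})


def serialize_and_release(build_arc) -> str:
    """Build an ARC, serialize it, and drop the object before returning.

    Only the JSON string leaves this function, so it is also the only thing
    that has to cross a process boundary.
    """
    arc = build_arc()
    json_str = arc.ToROCrateJsonString()
    del arc       # drop the local reference to the object graph
    gc.collect()  # collect any remaining reference cycles right away
    return json_str


result = serialize_and_release(FakeArc)
print(json.loads(result)["@graph"][0]["identifier"])
```

In a real worker, `build_arc` would be whatever callable constructs the ARC from the input data.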
237265

238-
**`ArcAssay.create(technology_platform=None)`** — passing `None` is safe and
239-
means "unknown platform". Passing an empty `OntologyAnnotation()` is also
240-
accepted.
266+
**`ArcAssay.create(technology_platform=None)`**: `None` is safe. An empty
267+
`OntologyAnnotation()` is also accepted.
241268

242269
---
243270

244271
## RO-Crate JSON-LD Output Shape
245272

246273
```json
247274
{
248-
"@context": { ... },
275+
"@context": { "...": "..." },
249276
"@graph": [
250-
{ "@id": "inv001", "@type": "Dataset", "identifier": "inv001", ... },
251-
{ "@id": "study001", "@type": "Dataset", ... },
252-
{ "@id": "assay001", "@type": "Dataset", ... },
253-
{ "@id": "#Doe_John", "@type": "Person", "familyName": "Doe", ... },
254-
...
277+
{ "@id": "inv001", "@type": "Dataset", "identifier": "inv001" },
278+
{ "@id": "study001", "@type": "Dataset" },
279+
{ "@id": "assay001", "@type": "Dataset" },
280+
{ "@id": "#Doe_John", "@type": "Person", "familyName": "Doe" }
255281
]
256282
}
257283
```
258284

259-
Useful for test assertions:
285+
Test assertion pattern:
260286

261287
```python
262288
graph = json.loads(arc.ToROCrateJsonString()).get("@graph", [])
263289
inv_node = next(item for item in graph if item.get("identifier") == "inv001")
264-
person = next(item for item in graph if item.get("familyName") == "Doe")
290+
person = next(item for item in graph if item.get("familyName") == "Doe")
265291
```
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
1+
---
2+
name: config-wrapper
3+
description: >
4+
Reference for the ConfigWrapper / ConfigBase pattern from middleware.shared.
5+
Use when adding config fields, reading config values, overriding via
6+
environment variables or Docker secrets, or extending ConfigBase in a new
7+
component. ConfigWrapper is the single source of truth for all configuration.
8+
compatibility: Python 3.12+, pydantic v2, middleware.shared
9+
---
10+
11+
# ConfigWrapper — Usage Reference
12+
13+
`ConfigWrapper` (from `middleware.shared.config`) wraps a YAML file and adds
14+
environment variable and Docker secret overrides. A component's `Config` class
15+
extends `ConfigBase` and is populated via `Config.from_config_wrapper(wrapper)`.
16+
17+
---
18+
19+
## Loading Configuration
20+
21+
```python
22+
from middleware.shared.config.config_wrapper import ConfigWrapper
23+
from mycomponent.config import Config # extends ConfigBase
24+
25+
wrapper = ConfigWrapper.from_yaml_file(path, prefix="MY_PREFIX")
26+
config = Config.from_config_wrapper(wrapper)
27+
```
28+
29+
---
30+
31+
## Override Resolution Order
32+
33+
For every config field, the wrapper resolves values in this order:
34+
35+
1. **Environment variable**: `{PREFIX}_{FIELD_PATH}` (uppercase, `_` as separator)
36+
2. **Docker secret file**: `/run/secrets/{prefix}_{field_path}` (lowercase)
37+
3. **YAML file value**
38+
4. **Pydantic field default**
39+
40+
Nested fields use `_` as path separator:
41+
- `api_client.api_url` with prefix `MY_APP` → `MY_APP_API_CLIENT_API_URL`
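
The resolution order above can be sketched in a few lines of standard-library Python. This illustrates the lookup sequence only and is not the `ConfigWrapper` implementation: `resolve_value`, its parameters, and the whitespace stripping of secret files are assumptions made for the example.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Any


def resolve_value(
    prefix: str,
    field_path: str,
    yaml_values: dict[str, Any],
    default: Any = None,
    secrets_dir: Path = Path("/run/secrets"),
) -> Any:
    """Look up a field: env var, then secret file, then YAML, then default.

    Hypothetical function. field_path uses dots for nesting,
    e.g. "api_client.api_url".
    """
    key = f"{prefix}_{field_path.replace('.', '_')}"

    # 1. Environment variable: uppercase key
    env_key = key.upper()
    if env_key in os.environ:
        return os.environ[env_key]

    # 2. Docker secret file: lowercase key under the secrets directory
    secret_file = secrets_dir / key.lower()
    if secret_file.is_file():
        return secret_file.read_text().strip()  # stripping is an assumption

    # 3. YAML value: walk the nested mapping along the dotted path
    node: Any = yaml_values
    for part in field_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default  # 4. Fall back to the field default
        node = node[part]
    return node


yaml_values = {"api_client": {"api_url": "http://from-yaml"}}
os.environ["MY_APP_API_CLIENT_API_URL"] = "http://from-env"
print(resolve_value("MY_APP", "api_client.api_url", yaml_values))
```

With the environment variable set, the YAML value is never consulted; removing the variable lets the lookup fall through to the secret file and then to the YAML mapping.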
42+
43+
---
44+
45+
## Type Coercion (env / secret values are always strings)
46+
47+
| String value | Parsed as |
48+
|---|---|
49+
| `"true"` / `"True"` / `"TRUE"` | `True` (bool) |
50+
| `"false"` / `"False"` / `"FALSE"` | `False` (bool) |
51+
| `"123"` | `123` (int) |
52+
| `"3.14"` | `3.14` (float) |
53+
| `""` (empty) | `None` |
54+
| anything else | `str` |
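
A minimal sketch of the coercion table, assuming the rules are applied in the order bool, int, float, string. `coerce` is a hypothetical function, not the wrapper's actual code, and edge cases such as `"nan"` or `"1_000"` follow Python's own `int()` / `float()` parsing here, which may differ from the real implementation.

```python
from __future__ import annotations


def coerce(raw: str) -> bool | int | float | str | None:
    """Coerce an env/secret string following the table above (illustrative)."""
    if raw == "":
        return None
    if raw in ("true", "True", "TRUE"):
        return True
    if raw in ("false", "False", "FALSE"):
        return False
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw  # anything else stays a string


for value in ["true", "FALSE", "123", "3.14", "", "postgres://db"]:
    print(repr(value), "->", repr(coerce(value)))
```

Checking for booleans before numbers keeps the rules unambiguous, and trying `int` before `float` makes `"123"` an integer instead of `123.0`.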
55+
56+
---
57+
58+
## Extending ConfigBase
59+
60+
`ConfigBase` is an optional convenience base class from `middleware.shared`
61+
that bundles config options shared across FAIRagro middleware components. You
62+
can subclass it to inherit those fields, or use plain `pydantic.BaseModel` if
63+
your component doesn't need them.
64+
65+
```python
66+
from typing import Annotated
67+
from pydantic import Field, SecretStr
68+
from middleware.shared.config.config_base import ConfigBase # optional
69+
70+
71+
class Config(ConfigBase): # or BaseModel if ConfigBase fields aren't needed
72+
# Required field (no default)
73+
connection_string: Annotated[SecretStr, Field(description="DB connection URI")]
74+
75+
# Optional field with default
76+
batch_size: Annotated[
77+
int,
78+
Field(description="Records to fetch per batch.", ge=1),
79+
] = 100
80+
```
81+
82+
---
83+
84+
## ConfigBase (optional convenience base)
85+
86+
`ConfigBase` from `middleware.shared` is a FAIRagro-specific convenience class.
87+
Use it when your component should share the standard logging and OpenTelemetry
88+
fields; skip it for components that don't need them.
89+
90+
Inherited fields:
91+
92+
```python
93+
log_level: LogLevel = "INFO" # "DEBUG" | "INFO" | "WARNING" | "ERROR" | "CRITICAL"
94+
otel: OtelConfig # OpenTelemetry settings
95+
```
96+
97+
`OtelConfig` fields:
98+
- `endpoint: str | None` — OTLP collector URL
99+
- `log_console_spans: bool` — print spans to stdout
100+
- `log_level: LogLevel` — OTLP log export level
101+
102+
---
103+
104+
## Secrets Handling
105+
106+
- `SecretStr` fields: access the value as `.get_secret_value()` only at the
107+
point of use (e.g., when creating a DB engine). Never pass them to `str()`
108+
or log them directly.
109+
- Docker secrets: mount files to `/run/secrets/`; the wrapper resolves them
110+
automatically using the full key name (lowercase).
111+
112+
---
113+
114+
## Testing
115+
116+
In unit tests, instantiate `Config` directly without the wrapper:
117+
118+
```python
119+
config = Config(
120+
connection_string=SecretStr("postgresql+asyncpg://user:pass@localhost/db"),
121+
# ... other required fields
122+
)
123+
```
124+
125+
In integration tests, mock at the wrapper boundary:
126+
127+
```python
128+
mocker.patch("mycomponent.main.ConfigWrapper.from_yaml_file")
129+
mocker.patch("mycomponent.main.Config.from_config_wrapper", return_value=mock_config)
130+
```
