
Commit 3304440

committed
cooking domain fuzzy matching patch
Made-with: Cursor
1 parent da0901a commit 3304440

5 files changed

Lines changed: 351 additions & 27 deletions

File tree

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -197,3 +197,6 @@ service-account.json
197197
tests/data/okh_generation/clones/
198198
.llm_chunk_cache/*
199199

200+
201+
# frontend node_modules
202+
frontend/node_modules/

.repo-map.md

Lines changed: 57 additions & 18 deletions
@@ -3,7 +3,7 @@ REPOSITORY MAP (Aider Style)
33
================================================================================
44
Repository: supply-graph-ai
55

6-
Total Python files: 300
6+
Total Python files: 303
77

88
├── demo/
99
│ ├── __init__.py
@@ -501,17 +501,23 @@ Total Python files: 300
501501
│ │ │ │ class Context
502502
│ │ │ │ class ContextsResponse
503503
│ │ │ │ class ErrorResponse
504-
│ │ │ └── validation/
504+
│ │ │ ├── validation/
505+
│ │ │ │ ├── __init__.py
506+
│ │ │ │ ├── context.py
507+
│ │ │ │ │ class ValidationContextModel
508+
│ │ │ │ ├── request.py
509+
│ │ │ │ │ class ValidationRequest
510+
│ │ │ │ │ class ValidationContextRequest
511+
│ │ │ │ └── response.py
512+
│ │ │ │ class ValidationIssue
513+
│ │ │ │ class ValidationResponse
514+
│ │ │ │ class ValidationContextResponse
515+
│ │ │ └── visualization/
505516
│ │ │ ├── __init__.py
506-
│ │ │ ├── context.py
507-
│ │ │ │ class ValidationContextModel
508-
│ │ │ ├── request.py
509-
│ │ │ │ class ValidationRequest
510-
│ │ │ │ class ValidationContextRequest
511517
│ │ │ └── response.py
512-
│ │ │ class ValidationIssue
513-
│ │ │ class ValidationResponse
514-
│ │ │ class ValidationContextResponse
518+
│ │ │ class VisualizationSection
519+
│ │ │ class VisualizationBundleData
520+
│ │ │ class VisualizationBundleResponse
515521
│ │ └── routes/
516522
│ │ ├── __init__.py
517523
│ │ ├── convert.py
@@ -528,6 +534,7 @@ Total Python files: 300
528534
│ │ │ def render_match_summary()
529535
│ │ │ def _build_optional_human_summary()
530536
│ │ │ def _build_match_suggestions()
537+
│ │ │ def _detect_domain_from_manifest()
531538
│ │ │ def _matches_filters()
532539
│ │ ├── okh.py
533540
│ │ ├── okw.py
@@ -551,6 +558,8 @@ Total Python files: 300
551558
│ │ │ │ class CookingExtractor (_initial_parse_requirements, _detailed_extract_requirements, _initial_parse_capabilities...)
552559
│ │ │ ├── matchers.py
553560
│ │ │ │ class CookingMatcher (match, _can_satisfy, generate_supply_tree)
561+
│ │ │ │ def _fuzzy_match()
562+
│ │ │ │ def _fuzzy_overlap_count()
554563
│ │ │ ├── models.py
555564
│ │ │ │ class KitchenCapability (from_dict, to_dict, is_kitchen_data...)
556565
│ │ │ ├── validation/
@@ -1044,13 +1053,15 @@ Total Python files: 300
10441053
│ │ │ def register_cooking_domain()
10451054
│ │ │ def register_manufacturing_domain()
10461055
│ │ │ def initialize_default_domains()
1047-
│ │ └── storage_service.py
1048-
│ │ class StorageRegistry (register_handler, get_handler)
1049-
│ │ class StorageService (__init__)
1050-
│ │ class DomainStorageHandler (__init__, _get_domain, _get_storage_key...)
1051-
│ │ class OKHStorageHandler (_serialize, _deserialize, _get_object_id...)
1052-
│ │ class OKWStorageHandler (_serialize, _deserialize, _get_object_id...)
1053-
│ │ def _register_handlers()
1056+
│ │ ├── storage_service.py
1057+
│ │ │ class StorageRegistry (register_handler, get_handler)
1058+
│ │ │ class StorageService (__init__)
1059+
│ │ │ class DomainStorageHandler (__init__, _get_domain, _get_storage_key...)
1060+
│ │ │ class OKHStorageHandler (_serialize, _deserialize, _get_object_id...)
1061+
│ │ │ class OKWStorageHandler (_serialize, _deserialize, _get_object_id...)
1062+
│ │ │ def _register_handlers()
1063+
│ │ └── visualization_service.py
1064+
│ │ class VisualizationService (_now_iso, build_match_visualization_bundle, build_solution_visualization_bundle...)
10541065
│ ├── storage/
10551066
│ │ ├── __init__.py
10561067
│ │ ├── auth_storage.py
@@ -1260,7 +1271,7 @@ Total Python files: 300
12601271

12611272
## Overview
12621273

1263-
Total files analyzed: 300
1274+
Total files analyzed: 303
12641275

12651276
## Entry Points
12661277

@@ -1653,6 +1664,18 @@ Public entrypoint:
16531664
- `DomainStorageHandler` (inherits: Generic)
16541665
- Base class for domain-specific storage handlers...
16551666

1667+
### `src/core/services/visualization_service.py`
1668+
> Visualization bundle and report generation service....
1669+
1670+
**Exports:** VisualizationService
1671+
1672+
**Classes:**
1673+
- `VisualizationService`
1674+
- Methods: build_match_visualization_bundle, build_solution_visualization_bundle, render_html_report, normalize_graphml_metadata
1675+
- Builds canonical visualization artifacts for API/CLI consumers....
1676+
1677+
**Internal Dependencies:** 1 imports
1678+
16561679
### `src/core/storage/migration_service.py`
16571680
> Storage Migration Service
16581681
@@ -2109,6 +2132,22 @@ This module provides Pydantic models for validation r...
21092132
- `ValidationContextResponse` (inherits: BaseModel)
21102133
- Response model for validation context operations...
21112134

2135+
### `src/core/api/models/visualization/__init__.py`
2136+
> Visualization API models....
2137+
2138+
### `src/core/api/models/visualization/response.py`
2139+
> Response models for visualization artifacts....
2140+
2141+
**Exports:** VisualizationSection, VisualizationBundleData, VisualizationBundleResponse
2142+
2143+
**Classes:**
2144+
- `VisualizationSection` (inherits: BaseModel)
2145+
- Generic visualization section for additive contract evolution....
2146+
- `VisualizationBundleData` (inherits: BaseModel)
2147+
- Top-level visualization payload....
2148+
- `VisualizationBundleResponse` (inherits: SuccessResponse)
2149+
- Standard success envelope for visualization payloads....
2150+
21122151
### `src/core/domains/cooking/models.py`
21132152

21142153
**Exports:** KitchenCapability
Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
1+
# Match endpoint — end-to-end validation runbook
2+
3+
Repeatable process for verifying the `POST /v1/api/match` integration between
4+
`project-data-platform-ts` (frontend) and `supply-graph-ai` (backend).
5+
6+
---
7+
8+
## Prerequisites
9+
10+
| Requirement | Notes |
11+
|---|---|
12+
| Docker Desktop running | `docker info` must succeed |
13+
| Node ≥ 18 | For the verification script |
14+
| Network access to Azure Blob Storage | `projdatablobstorage.blob.core.windows.net` |
15+
| `.env` present in `supply-graph-ai/` | Contains Azure credentials |
16+
17+
---
18+
19+
## 1 — Start the backend
20+
21+
The backend must be running before any match request is made.
22+
23+
```bash
24+
cd supply-graph-ai
25+
26+
# First-time or after code changes: build the image
27+
docker compose build ohm-api
28+
29+
# Start (or restart) the container
30+
docker compose up -d --force-recreate ohm-api
31+
```
32+
33+
> **Important:** always use `--force-recreate` so the container picks up the
34+
> latest image and `.env` values. Plain `docker compose up -d` reuses an
35+
> existing container and may run stale code even after a rebuild.
36+
37+
Wait until the health check passes (takes ~20 s while spaCy models load):
38+
39+
```bash
40+
# Poll until HTTP 200
41+
until curl -sf http://localhost:8001/health; do sleep 3; done && echo "Ready"
42+
```
43+
44+
Expected response:
45+
```json
46+
{"status":"ok","domains":["cooking","manufacturing"],"version":"1.0.0"}
47+
```
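The shell loop above polls forever if the container never becomes healthy. A minimal Python sketch of the same check with an overall timeout follows; the function names `probe_health` and `wait_until_ready` and the 120 s default are illustrative assumptions, not part of either repository.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_health(url: str = "http://localhost:8001/health") -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(
    probe: Callable[[], bool] = probe_health,
    timeout_s: float = 120.0,   # assumed upper bound, not from the runbook
    interval_s: float = 3.0,    # same 3 s interval as the shell loop
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    print("Ready" if wait_until_ready() else "Backend did not become healthy in time")
```

The probe and sleep functions are injectable so the loop can be exercised without a running container.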
48+
49+
---
50+
51+
## 2 — Run the frontend verification script
52+
53+
`project-data-platform-ts` ships a script that fires exactly the same request
54+
shape the Nuxt app uses.
55+
56+
```bash
57+
cd project-data-platform-ts
58+
59+
# Cookie-recipe demo (OKH file already in Azure `okh` container)
60+
OKH_FNAME=okh-chococolate-chip-cookies-recipe.json node scripts/verify-ohm-match.mjs
61+
```
62+
63+
**Expected output:**
64+
```
65+
Request URL: http://localhost:8001/v1/api/match
66+
Request body: {"okh_url":"https://projdatablobstorage.blob.core.windows.net/okh/okh-chococolate-chip-cookies-recipe.json"}
67+
HTTP 200 OK
68+
Response envelope: {
69+
status: 'success',
70+
message: 'Matching completed successfully',
71+
total_solutions: 2,
72+
solutions_length: 2
73+
}
74+
OK: match request matches front-end contract and returned HTTP 200.
75+
```
76+
77+
Any `total_solutions` value other than 2, or a non-zero exit code, indicates a problem;
78+
see the **Troubleshooting** section below.
79+
80+
---
81+
82+
## 3 — Optional: inspect solution detail via curl
83+
84+
```bash
85+
curl -s -X POST http://localhost:8001/v1/api/match \
86+
-H 'Content-Type: application/json' \
87+
-d '{"okh_url":"https://projdatablobstorage.blob.core.windows.net/okh/okh-chococolate-chip-cookies-recipe.json"}' \
88+
| python3 -c "
89+
import sys, json
90+
d = json.load(sys.stdin)
91+
print('total_solutions:', d['data']['total_solutions'])
92+
for s in d['data']['solutions']:
93+
meta = s.get('tree', {}).get('metadata', {}) if isinstance(s.get('tree'), dict) else {}
94+
print(f\" {s['facility_name']} confidence={s['confidence']:.3f}\")
95+
print(f\" ingredient_overlap={meta.get('ingredient_overlap','?')}/{meta.get('ingredient_count','?')}\")
96+
print(f\" tool_overlap={meta.get('tool_overlap','?')}/{meta.get('tool_count','?')}\")
97+
"
98+
```
99+
100+
---
101+
102+
## 4 — Using a different OKH file
103+
104+
Set `OKH_URL` to the full public blob URL, or `OKH_FNAME` to the filename
105+
inside the `okh` container:
106+
107+
```bash
108+
# By filename
109+
OKH_FNAME=my-other-recipe.json node scripts/verify-ohm-match.mjs
110+
111+
# By full URL
112+
OKH_URL=https://projdatablobstorage.blob.core.windows.net/okh/my-file.json \
113+
node scripts/verify-ohm-match.mjs
114+
```
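How the two variables map to a request URL can be sketched as below. This is an illustration only: the base URL is taken from the example request bodies in this runbook, and the rule that `OKH_URL` wins when both are set is an assumption about `verify-ohm-match.mjs`, whose source is not shown here. `resolve_okh_url` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional

# Base URL of the `okh` container, as it appears in the example requests above.
OKH_CONTAINER_BASE = "https://projdatablobstorage.blob.core.windows.net/okh"


def resolve_okh_url(env: Mapping[str, str]) -> Optional[str]:
    """Return the OKH URL implied by OKH_URL / OKH_FNAME, or None if neither is set."""
    url = env.get("OKH_URL", "").strip()
    if url:
        # Assumption: a full URL takes precedence over a bare filename.
        return url
    fname = env.get("OKH_FNAME", "").strip()
    if fname:
        return f"{OKH_CONTAINER_BASE}/{fname.lstrip('/')}"
    return None


if __name__ == "__main__":
    print(resolve_okh_url(os.environ))
```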
115+
116+
---
117+
118+
## Troubleshooting
119+
120+
### `total_solutions: 0` — empty result set
121+
122+
Work through the checklist in order:
123+
124+
#### A. Is the container running the latest image?
125+
126+
```bash
127+
cd supply-graph-ai
128+
docker compose ps # container should show "Up"
129+
docker compose build ohm-api && docker compose up -d --force-recreate ohm-api
130+
```
131+
132+
Verify the domain re-detection patch is present in the running container:
133+
134+
```bash
135+
docker exec ohm-api grep -c "_detect_domain_from_manifest" \
136+
/app/src/core/api/routes/match.py
137+
# Must print 2; if it prints 0, the image is stale — rebuild above
138+
```
139+
140+
#### B. Does the container reach Azure Storage?
141+
142+
Tail the logs for a request:
143+
144+
```bash
145+
docker compose logs ohm-api -f &
146+
# (trigger a request in another terminal)
147+
```
148+
149+
Look for these log lines in order — each confirms a stage of the pipeline:
150+
151+
| Log message | What it means |
152+
|---|---|
153+
| `Detected domain: manufacturing` | Request received; initial detection (expected) |
154+
| `Re-detected domain as cooking from OKH manifest content` | Manifest inspected; domain corrected to cooking ✓ |
155+
| `Listing kitchen capabilities` | OKW service called for cooking domain ✓ |
156+
| `Found N unique kitchen capabilities` | N kitchen OKW files loaded from Azure ✓ |
157+
| `Filtered facilities: N out of N` | Facilities passed filter ✓ |
158+
| `Enhanced matching completed: N results` | Matcher produced N candidates ✓ |
159+
| `Processed matching results: N solutions` | Final count after confidence threshold |
160+
161+
**If `Re-detected domain as cooking` is missing:** the OKH manifest did not
162+
trigger cooking detection. Check that the manifest has at least one of:
163+
- `domain: "cooking"` field
164+
- `manufacturing_processes` containing only cooking terms (bake, mix, etc.)
165+
- A `function` field with a cooking keyword plus `tool_list` or `making_instructions`
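
To check a manifest locally against the three conditions above, a rough sketch of the rules might look like this. It is not the backend's `_detect_domain_from_manifest` implementation: the keyword set `COOKING_TERMS` and the function name `looks_like_cooking` are hypothetical, and the real vocabulary may differ.

```python
import re
from typing import Any, Mapping

# Hypothetical keyword list; the backend's actual cooking vocabulary is not shown here.
COOKING_TERMS = {"bake", "mix", "boil", "fry", "roast", "whisk", "knead", "cook"}


def looks_like_cooking(manifest: Mapping[str, Any]) -> bool:
    """Apply the three runbook conditions to a parsed OKH manifest."""
    # Condition 1: explicit `domain: "cooking"` field.
    if str(manifest.get("domain", "")).lower() == "cooking":
        return True

    # Condition 2: manufacturing_processes is non-empty and contains only cooking terms.
    processes = manifest.get("manufacturing_processes") or []
    if processes and all(str(p).lower() in COOKING_TERMS for p in processes):
        return True

    # Condition 3: `function` mentions a cooking keyword AND the manifest has
    # a tool_list or making_instructions.
    words = set(re.findall(r"[a-z]+", str(manifest.get("function", "")).lower()))
    has_keyword = bool(words & COOKING_TERMS)
    has_support = bool(manifest.get("tool_list")) or bool(manifest.get("making_instructions"))
    return has_keyword and has_support
```

For example, `looks_like_cooking({"manufacturing_processes": ["bake", "weld"]})` is false under this sketch because one process is not a cooking term.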
166+
167+
**If `Found 0 unique kitchen capabilities`:** no kitchen OKW files are visible
168+
in the configured storage container. Verify `.env`:
169+
```
170+
STORAGE_PROVIDER=azure_blob
171+
AZURE_STORAGE_ACCOUNT=projdatablobstorage
172+
AZURE_STORAGE_CONTAINER=newformats # contains ButlerKitchen.json etc.
173+
AZURE_STORAGE_OKW_CONTAINER_NAME=okw # path prefix inside container
174+
```
175+
176+
**If `Processed matching results: 0 solutions`** after `Enhanced matching
177+
completed: N results`: the confidence filter is removing all candidates.
178+
The default threshold is `min_confidence=0.1`. Pass a lower value explicitly
179+
to confirm:
180+
181+
```bash
182+
curl -s -X POST http://localhost:8001/v1/api/match \
183+
-H 'Content-Type: application/json' \
184+
-d '{"okh_url":"https://projdatablobstorage.blob.core.windows.net/okh/okh-chococolate-chip-cookies-recipe.json","min_confidence":0.05}'
185+
```
186+
187+
If solutions appear with a lower threshold, the kitchen OKW files have sparse
188+
capability data. Update the files in Azure to add more `ingredients`,
189+
`tools`, and `appliances` so matches score above 0.1.
190+
191+
### Container crashes / fails to start
192+
193+
```bash
194+
docker compose logs ohm-api --tail 50
195+
```
196+
197+
Common causes:
198+
- Missing `.env` or missing `AZURE_STORAGE_KEY`
199+
- spaCy model not downloaded inside the image (re-run `docker compose build`)
200+
201+
### `node: not found` when running the verify script
202+
203+
Install Node.js ≥ 18 or use `npx node@18`:
204+
```bash
205+
OKH_FNAME=okh-chococolate-chip-cookies-recipe.json npx --yes node@18 scripts/verify-ohm-match.mjs
206+
```
207+
208+
---
209+
210+
## How the pipeline works (summary)
211+
212+
```
213+
frontend supply-graph-ai
214+
─────── ───────────────
215+
POST /v1/api/match
216+
{ okh_url: "...blob.../okh/...json" }
217+
──────────►
218+
1. Fetch OKH manifest from okh_url
219+
2. Inspect manifest fields → detect domain
220+
(cooking if manufacturing_processes = ["bake"] etc.)
221+
3. Load kitchen OKW files from Azure (newformats/okw/)
222+
4. For each kitchen:
223+
- Extract capabilities (ingredients, tools, appliances)
224+
- Fuzzy-match against recipe requirements
225+
- Compute confidence score (0–1)
226+
5. Filter by min_confidence (default 0.1)
227+
6. Return solutions sorted by confidence
228+
◄──────────────────────
229+
{ data: { solutions: [...], total_solutions: 2 } }
230+
231+
frontend renders facility cards
232+
with facility_name + confidence
233+
```
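
Steps 5 and 6 of the diagram can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the backend's code: `select_solutions` is a hypothetical name, the threshold is treated as inclusive, and "sorted by confidence" is taken to mean highest first.

```python
from typing import Any, Dict, List


def select_solutions(
    candidates: List[Dict[str, Any]],
    min_confidence: float = 0.1,  # default threshold quoted in this runbook
) -> List[Dict[str, Any]]:
    """Drop candidates below the threshold, then sort the rest by confidence."""
    # Assumption: a candidate exactly at the threshold is kept.
    kept = [c for c in candidates if c["confidence"] >= min_confidence]
    # Assumption: highest confidence comes first.
    return sorted(kept, key=lambda c: c["confidence"], reverse=True)


if __name__ == "__main__":
    candidates = [
        {"facility_name": "Rob's Dessert Kitchen", "confidence": 0.23},
        {"facility_name": "Sparse Kitchen", "confidence": 0.05},  # made-up entry
        {"facility_name": "Butler Kitchen", "confidence": 0.30},
    ]
    for s in select_solutions(candidates):
        print(s["facility_name"], s["confidence"])
```

With the default threshold the made-up 0.05 candidate is filtered out, which mirrors the `Processed matching results: 0 solutions` symptom when every candidate scores that low.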
234+
235+
---
236+
237+
## Known data quality notes
238+
239+
The kitchen OKW files in Azure (`newformats/okw/`) currently have partial
240+
ingredient and tool lists. Confidence scores reflect actual overlap:
241+
242+
| Kitchen | Confidence (cookie recipe) | Notes |
243+
|---|---|---|
244+
| Butler Kitchen | ~0.30 | flour, sugar, chocolate chips match; spatula matches |
245+
| Rob's Dessert Kitchen | ~0.23 | flour, sugar match; spatula matches |
246+
247+
The matching engine uses **fuzzy substring matching** (e.g. `"sugar"` in
248+
kitchen matches `"brown sugar"` in recipe, `"chocolate chips"` matches
249+
`"chocolate chip"`). To improve scores, add more ingredients and appliances
250+
to the kitchen OKW files in the `newformats` container.
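
The substring behaviour described above can be illustrated with a small Python sketch. The repository map lists `_fuzzy_match()` and `_fuzzy_overlap_count()` in the cooking matcher, but their bodies are not shown in this commit view, so the functions below are an assumed, simplified version: case-insensitive, whitespace-normalized, and matching when either term contains the other.

```python
from typing import Iterable


def _normalize(term: str) -> str:
    """Lowercase and collapse whitespace so 'Brown  Sugar' equals 'brown sugar'."""
    return " ".join(term.lower().split())


def fuzzy_match(a: str, b: str) -> bool:
    """True if either normalized term is a substring of the other."""
    a, b = _normalize(a), _normalize(b)
    if not a or not b:
        return False  # an empty term would otherwise match everything
    return a in b or b in a


def fuzzy_overlap_count(required: Iterable[str], available: Iterable[str]) -> int:
    """Count required items that fuzzily match at least one available item."""
    available = list(available)
    return sum(1 for r in required if any(fuzzy_match(r, a) for a in available))


if __name__ == "__main__":
    recipe = ["flour", "brown sugar", "chocolate chip", "butter"]
    kitchen = ["Flour", "sugar", "chocolate chips"]
    print(fuzzy_overlap_count(recipe, kitchen), "of", len(recipe))  # prints "3 of 4"
```

Plain substring matching is deliberately permissive and can over-match: under this sketch `"salt"` would also match `"unsalted butter"`.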
