The Lab 4 schema summary evaluator. An Express server that accepts schema summaries, grades them with Claude against a six-criterion rubric, and issues a Credly badge on pass.
TypeScript, compiled to dist/ via tsc. npm run dev uses tsx --watch for hot-reload from backend/*.ts; npm start runs the compiled dist/backend/server.js. Dockerfile is multi-stage: build stage compiles TS, runtime stage runs node dist/backend/server.js.
- backend/server.ts: Express. POST /api/evaluate accepts {session, name, email, summary}. Calls Anthropic via the Grove gateway (grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic/v1/messages, authenticated with the api-key header) with the rubric as system prompt. On pass, calls backend/credly.ts (unless CREDLY_DRY_RUN=1).
- backend/credly.ts: Credly v1 API integration. HTTP Basic auth (token as username, blank password). Up to 3 retries on 5xx. 422 with duplicate-badge is treated as success.
- frontend/index.html: single-file vanilla JS form. Requires ?session=... query param; refuses to render without it. Submits to /api/evaluate and renders the verdict.
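
As a rough illustration of the gateway call described for backend/server.ts, the sketch below builds the request without sending it. The function name buildGroveRequest, the https scheme, the max_tokens value, and the choice to send the summary as a single user message are all assumptions, not taken from the actual server code. Whether the gateway needs further headers is not stated here.

```typescript
// Hypothetical helper: not the real server code, just the shape of the call.
// The https:// scheme is assumed; the source only gives host and path.
const GROVE_URL =
  "https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic/v1/messages";

interface GroveRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildGroveRequest(
  apiKey: string,
  model: string,
  rubricPrompt: string,
  summary: string,
): GroveRequest {
  return {
    url: GROVE_URL,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        // Grove authenticates with api-key, not Anthropic's x-api-key.
        "api-key": apiKey,
      },
      body: JSON.stringify({
        model,
        max_tokens: 2048, // assumed value for illustration
        system: rubricPrompt, // the rubric goes in as the system prompt
        messages: [{ role: "user", content: summary }],
      }),
    },
  };
}
```

The returned object can be passed to fetch as fetch(req.url, req.init). Keeping construction separate from sending makes the header choice easy to check in a unit test.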
- GROVE_API_KEY: required. Auth for the Grove gateway, which proxies Anthropic. Passed as the api-key header (not x-api-key). Static, separate from anything else.
- ANTHROPIC_MODEL: defaults to claude-opus-4-8. Don't downgrade. Model IDs stay Anthropic-shaped because Grove forwards to Anthropic.
- CREDLY_TOKEN: Credly Acclaim API token. Required in production.
- CREDLY_ORG_ID: Credly organization UUID. Required in production.
- CREDLY_BADGE_TEMPLATE_ID: Badge template UUID. Required in production.
- CREDLY_DRY_RUN: set to 1 for local rehearsal. Logs "would issue badge" without calling Credly.
- PORT: defaults to 8080.
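
One way to read these variables at startup is sketched below. The loadConfig function and the Config shape are hypothetical. The sketch also assumes that "required in production" can be approximated as "required whenever CREDLY_DRY_RUN is not 1"; the real server may decide this differently.

```typescript
// Hypothetical config loader; names and shape are illustrative only.
interface CredlyConfig {
  token: string;
  orgId: string;
  badgeTemplateId: string;
}

interface Config {
  groveApiKey: string;
  anthropicModel: string;
  credlyDryRun: boolean;
  credly: CredlyConfig | undefined;
  port: number;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): Config {
  const groveApiKey = env["GROVE_API_KEY"];
  if (!groveApiKey) throw new Error("GROVE_API_KEY is required");

  const credlyDryRun = env["CREDLY_DRY_RUN"] === "1";
  const token = env["CREDLY_TOKEN"];
  const orgId = env["CREDLY_ORG_ID"];
  const badgeTemplateId = env["CREDLY_BADGE_TEMPLATE_ID"];

  let credly: CredlyConfig | undefined;
  if (token && orgId && badgeTemplateId) {
    credly = { token, orgId, badgeTemplateId };
  } else if (!credlyDryRun) {
    // Assumption: outside dry-run mode, all three Credly values must be set.
    throw new Error(
      "CREDLY_TOKEN, CREDLY_ORG_ID and CREDLY_BADGE_TEMPLATE_ID are required unless CREDLY_DRY_RUN=1",
    );
  }

  return {
    groveApiKey,
    anthropicModel: env["ANTHROPIC_MODEL"] || "claude-opus-4-8",
    credlyDryRun,
    credly,
    port: Number(env["PORT"] || "8080"),
  };
}
```

In a Node process this would typically be called once as loadConfig(process.env), so a missing key fails at boot rather than on the first submission.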
The rubric lives in two places. /rubric.md is the human-readable version. The RUBRIC_PROMPT constant in backend/server.ts is what the grading model sees. Both must stay in sync.
Six criteria, weights sum to 100. Pass threshold: 80, with no criterion at 0.
| # | Criterion | Weight |
|---|---|---|
| 1 | Schema reflects access patterns, not relational normalization habits | 20 |
| 2 | Embed vs. reference decisions are justified with explicit reasoning | 20 |
| 3 | Indexes are present and tied to specific query patterns | 15 |
| 4 | No naive relational translation | 15 |
| 5 | MongoDB-native features are used | 15 |
| 6 | The schema could evolve without a migration | 15 |
The grading model is instructed to return JSON only, in this exact shape:
```
{
  "overall_score": 0,
  "overall_verdict": "pass" | "needs-revision",
  "criteria": [
    { "name": "...", "weight": 20, "score": 0, "verdict": "pass" | "partial" | "needs-revision", "feedback": "..." }
  ]
}
```

Server-side, overall_score is recomputed from the criterion scores (don't trust the model's sum) and overall_verdict is enforced as pass only when overall_score >= 80 AND no criterion scored 0.
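
The server-side enforcement can be sketched as follows. The function name enforceVerdict and the TypeScript interfaces are hypothetical, and the sketch assumes each criterion score is expressed in points out of its weight, so the overall score is the plain sum of the criterion scores.

```typescript
// Hypothetical types mirroring the verdict JSON shape.
type CriterionVerdict = "pass" | "partial" | "needs-revision";

interface CriterionResult {
  name: string;
  weight: number;
  score: number; // assumed to be points out of `weight`
  verdict: CriterionVerdict;
  feedback: string;
}

interface Verdict {
  overall_score: number;
  overall_verdict: "pass" | "needs-revision";
  criteria: CriterionResult[];
}

const PASS_THRESHOLD = 80;

function enforceVerdict(modelOutput: Verdict): Verdict {
  // Ignore the model's own total and recompute it from the criteria.
  const overall = modelOutput.criteria.reduce((sum, c) => sum + c.score, 0);
  // A single zero blocks a pass even when the total clears the threshold.
  const anyZero = modelOutput.criteria.some((c) => c.score === 0);
  return {
    overall_score: overall,
    overall_verdict:
      overall >= PASS_THRESHOLD && !anyZero ? "pass" : "needs-revision",
    criteria: modelOutput.criteria,
  };
}
```

For example, criterion scores of 20, 20, 15, 15, 15, 0 sum to 85 but still yield needs-revision because one criterion scored 0.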
This is the part that needs the most human judgment. Generate a deliberately mediocre summary and a deliberately excellent one, submit both, and tune the rubric prompt until the mediocre one fails and the excellent one passes. Don't delegate this calibration to the agent. You are the only one who knows what good schema design looks like for this workshop.
Don't change the rubric weights or the pass threshold without an explicit conversation with the workshop owner. Other MongoDB skill badges use 80/100 and the calibration is consistent across them.
Don't change the verdict JSON shape without updating the frontend renderer to match. The frontend assumes the exact field names above.
Don't print sensitive data (Grove key, Credly token) in logs. The submission log includes email, name, session, scores, and badge issuance result; it never includes the raw summary text or API credentials.
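
A minimal sketch of a log record that follows this rule is shown below. The names toLogEntry, LogEntry, and the BadgeResult values are hypothetical; only the set of logged fields comes from the rule above.

```typescript
// Hypothetical shapes; only the list of logged fields is taken from this document.
interface Submission {
  session: string;
  name: string;
  email: string;
  summary: string;
}

type BadgeResult = "issued" | "duplicate" | "dry-run" | "not-attempted" | "failed";

interface LogEntry {
  session: string;
  name: string;
  email: string;
  overallScore: number;
  criterionScores: number[];
  badgeResult: BadgeResult;
}

function toLogEntry(
  submission: Submission,
  overallScore: number,
  criterionScores: number[],
  badgeResult: BadgeResult,
): LogEntry {
  // Copy fields one by one instead of spreading `submission`,
  // so the raw summary text can never end up in the log.
  return {
    session: submission.session,
    name: submission.name,
    email: submission.email,
    overallScore,
    criterionScores: [...criterionScores],
    badgeResult,
  };
}
```

Listing the fields explicitly means that adding a new field to the submission later does not silently add it to the log.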
The session ID is opaque to the evaluator. It exists for tracking which event a submission came from (e.g., ai-coding-with-mongodb-devday-20260120-newyork). The evaluator records it but doesn't validate it.
Credly issuance is idempotent. A repeated submission with the same email and session should not double-issue. Credly's 422 duplicate response is treated as success, and the dry-run path skips the call entirely.
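
The issuance behavior described here and in the backend/credly.ts notes (dry-run skips the call, up to 3 retries on 5xx, 422 duplicate counts as success) can be sketched like this. Everything named below is hypothetical: issueBadge, the send callback that performs the HTTP request, and the isDuplicate predicate, which stands in for whatever check the real code applies to Credly's 422 body. The sketch retries immediately; real code would likely wait between attempts.

```typescript
// Hypothetical sketch of the issuance flow; not the real credly.ts.
type IssueResult = "issued" | "duplicate" | "dry-run";

interface CredlyResponse {
  status: number;
  body: string;
}

async function issueBadge(
  send: () => Promise<CredlyResponse>, // performs the actual HTTP call
  isDuplicate: (body: string) => boolean, // stand-in for the duplicate-badge check
  dryRun: boolean,
  maxRetries = 3,
): Promise<IssueResult> {
  if (dryRun) {
    console.log("would issue badge");
    return "dry-run"; // Credly is never called in dry-run mode
  }
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    if (res.status >= 200 && res.status < 300) return "issued";
    // A duplicate means the badge already exists, so treat it as success.
    if (res.status === 422 && isDuplicate(res.body)) return "duplicate";
    // Retry only server errors, and only up to maxRetries extra attempts.
    if (res.status >= 500 && attempt < maxRetries) continue;
    throw new Error(`Credly returned ${res.status}`);
  }
}
```

With maxRetries at 3, a persistently failing 5xx endpoint is called four times in total (one initial attempt plus three retries) before the error surfaces.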