evaluator/CLAUDE.md

The Lab 4 schema summary evaluator. An Express server that accepts schema summaries, grades them with Claude against a six-criterion rubric, and issues a Credly badge on pass.

Architecture

TypeScript, compiled to dist/ via tsc. npm run dev uses tsx --watch for hot-reload from backend/*.ts; npm start runs the compiled dist/backend/server.js. Dockerfile is multi-stage: build stage compiles TS, runtime stage runs node dist/backend/server.js.

backend/server.ts: Express. POST /api/evaluate accepts {session, name, email, summary}. Calls Anthropic via the Grove gateway (grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic/v1/messages, authenticated with the api-key header) with the rubric as system prompt. On pass, calls backend/credly.ts (unless CREDLY_DRY_RUN=1).
backend/credly.ts: Credly v1 API integration. HTTP Basic auth (token as username, blank password). Up to 3 retries on 5xx. 422 with duplicate-badge is treated as success.
frontend/index.html: single-file vanilla JS form. Requires ?session=... query param; refuses to render without it. Submits to /api/evaluate and renders the verdict.
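
The gateway call described for backend/server.ts can be sketched as a small request builder. This is an illustration, not the code in the repo: `buildGroveRequest`, `GroveRequest`, the `https://` scheme, the `anthropic-version` header, and the `max_tokens` value are all assumptions. Only the gateway path, the `api-key` header, and the use of the rubric as the system prompt come from the description above.

```typescript
// Hypothetical request builder; the real backend/server.ts may be structured differently.
interface GroveRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Host and path are from this document; the https:// scheme is assumed.
const GROVE_URL =
  "https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic/v1/messages";

function buildGroveRequest(
  apiKey: string,
  model: string,
  rubricPrompt: string,
  summary: string,
): GroveRequest {
  return {
    url: GROVE_URL,
    init: {
      method: "POST",
      headers: {
        "api-key": apiKey, // Grove expects api-key, not x-api-key
        "content-type": "application/json",
        "anthropic-version": "2023-06-01", // assumption: forwarded to Anthropic unchanged
      },
      body: JSON.stringify({
        model,
        max_tokens: 2048, // assumed limit, not taken from the repo
        system: rubricPrompt, // the rubric is sent as the system prompt
        messages: [{ role: "user", content: summary }],
      }),
    },
  };
}
```

The result can be passed straight to `fetch(req.url, req.init)`. Keeping the builder free of I/O makes the header choice easy to check in a unit test.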

Environment variables

GROVE_API_KEY: required. Auth for the Grove gateway, which proxies Anthropic. Passed as the api-key header (not x-api-key). It is a static key and is independent of every other credential.
ANTHROPIC_MODEL: defaults to claude-opus-4-8. Don't downgrade. Model IDs stay Anthropic-shaped because Grove forwards to Anthropic.
CREDLY_TOKEN: Credly Acclaim API token. Required in production.
CREDLY_ORG_ID: Credly organization UUID. Required in production.
CREDLY_BADGE_TEMPLATE_ID: Badge template UUID. Required in production.
CREDLY_DRY_RUN: set to 1 for local rehearsal. Logs "would issue badge" without calling Credly.
PORT: defaults to 8080.
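
A minimal sketch of how these variables might be read and validated at startup. `loadConfig` and `EvaluatorConfig` are hypothetical names. The sketch also assumes that "production" means "CREDLY_DRY_RUN is not 1", which this document does not state. The environment is passed in as a plain object so the function can be exercised without touching `process.env`.

```typescript
// Hypothetical config loader; names and structure are illustrative only.
interface EvaluatorConfig {
  groveApiKey: string;
  anthropicModel: string;
  credlyDryRun: boolean;
  credly: { token: string; orgId: string; badgeTemplateId: string } | null;
  port: number;
}

function loadConfig(env: Record<string, string | undefined>): EvaluatorConfig {
  const groveApiKey = env["GROVE_API_KEY"];
  if (!groveApiKey) throw new Error("GROVE_API_KEY is required");

  const credlyDryRun = env["CREDLY_DRY_RUN"] === "1";
  const token = env["CREDLY_TOKEN"];
  const orgId = env["CREDLY_ORG_ID"];
  const badgeTemplateId = env["CREDLY_BADGE_TEMPLATE_ID"];

  let credly: EvaluatorConfig["credly"] = null;
  if (token && orgId && badgeTemplateId) {
    credly = { token, orgId, badgeTemplateId };
  } else if (!credlyDryRun) {
    // Assumption: anything that is not a dry run counts as production.
    throw new Error("Credly variables are required unless CREDLY_DRY_RUN=1");
  }

  return {
    groveApiKey,
    anthropicModel: env["ANTHROPIC_MODEL"] ?? "claude-opus-4-8", // documented default
    credlyDryRun,
    credly,
    port: Number(env["PORT"] ?? "8080"), // documented default
  };
}
```

In the server this would be called once as `loadConfig(process.env)`, so a missing key fails at boot rather than on the first submission.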

Rubric coordination

The rubric lives in two places. /rubric.md is the human-readable version. The RUBRIC_PROMPT constant in backend/server.ts is what the grading model sees. Both must stay in sync.

Six criteria, weights sum to 100. Pass threshold: 80, with no criterion at 0.

| # | Criterion | Weight |
|---|-----------|--------|
| 1 | Schema reflects access patterns, not relational normalization habits | 20 |
| 2 | Embed vs. reference decisions are justified with explicit reasoning | 20 |
| 3 | Indexes are present and tied to specific query patterns | 15 |
| 4 | No naive relational translation | 15 |
| 5 | MongoDB-native features are used | 15 |
| 6 | The schema could evolve without a migration | 15 |
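
One cheap guard against the two rubric copies drifting apart is a startup check that the weights still sum to 100. The sketch below is an assumption about how that could look: `RUBRIC_WEIGHTS` and `totalWeight` are hypothetical names, and the real source of truth remains /rubric.md and RUBRIC_PROMPT.

```typescript
// Hypothetical mirror of the rubric table; criterion text copied from the table above.
const RUBRIC_WEIGHTS: ReadonlyArray<{ criterion: string; weight: number }> = [
  { criterion: "Schema reflects access patterns, not relational normalization habits", weight: 20 },
  { criterion: "Embed vs. reference decisions are justified with explicit reasoning", weight: 20 },
  { criterion: "Indexes are present and tied to specific query patterns", weight: 15 },
  { criterion: "No naive relational translation", weight: 15 },
  { criterion: "MongoDB-native features are used", weight: 15 },
  { criterion: "The schema could evolve without a migration", weight: 15 },
];

// Sum the weights so a bad edit to the table is caught immediately.
function totalWeight(rubric: ReadonlyArray<{ weight: number }>): number {
  return rubric.reduce((sum, c) => sum + c.weight, 0);
}

if (totalWeight(RUBRIC_WEIGHTS) !== 100) {
  throw new Error("Rubric weights must sum to 100");
}
```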

Verdict shape

The grading model is instructed to return JSON only, in this exact shape:

{
  "overall_score": 0,
  "overall_verdict": "pass" | "needs-revision",
  "criteria": [
    { "name": "...", "weight": 20, "score": 0, "verdict": "pass" | "partial" | "needs-revision", "feedback": "..." }
  ]
}

Server-side, overall_score is recomputed from the criterion scores (don't trust the model's sum) and overall_verdict is enforced as pass only when overall_score >= 80 AND no criterion scored 0.
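
The server-side enforcement can be sketched as a pure function over the verdict shape. The type and function names (`Verdict`, `CriterionResult`, `enforceVerdict`) are hypothetical. The sketch assumes each criterion's `score` is awarded out of its `weight`, so that the overall score is the plain sum of criterion scores; if scores are on a different scale, the recomputation would need to be weighted accordingly.

```typescript
type CriterionVerdict = "pass" | "partial" | "needs-revision";

interface CriterionResult {
  name: string;
  weight: number;
  score: number; // assumed to range from 0 to weight
  verdict: CriterionVerdict;
  feedback: string;
}

interface Verdict {
  overall_score: number;
  overall_verdict: "pass" | "needs-revision";
  criteria: CriterionResult[];
}

const PASS_THRESHOLD = 80;

// Recompute the total and the verdict instead of trusting the model's values.
function enforceVerdict(modelVerdict: Verdict): Verdict {
  const overall_score = modelVerdict.criteria.reduce((sum, c) => sum + c.score, 0);
  const anyZero = modelVerdict.criteria.some((c) => c.score === 0);
  const overall_verdict =
    overall_score >= PASS_THRESHOLD && !anyZero ? "pass" : "needs-revision";
  return { ...modelVerdict, overall_score, overall_verdict };
}
```

Under that assumption, a submission scoring 20, 20, 15, 15, 15, 0 totals 85 but is still returned as needs-revision, because one criterion scored 0.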

Calibration

This is the part that needs the most human judgment. Generate a deliberately mediocre summary and a deliberately excellent one, submit both, and tune the rubric prompt until both score appropriately. Don't delegate this calibration to the agent. You are the only one who knows what good schema design looks like for this workshop.

Things to be careful about

Don't change the rubric weights or the pass threshold without an explicit conversation with the workshop owner. Other MongoDB skill badges use 80/100 and the calibration is consistent across them.

Don't change the verdict JSON shape without updating the frontend renderer to match. The frontend assumes the exact field names above.

Don't print sensitive data (Grove key, Credly token) in logs. The submission log includes email, name, session, scores, and badge issuance result; it never includes the raw summary text or API credentials.
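
One way to keep the raw summary and credentials out of the log is to build the log entry by picking fields explicitly rather than spreading the request body. The sketch below illustrates that idea; `Submission`, `SubmissionLogEntry`, and `buildLogEntry` are hypothetical names, and the exact field layout of the real submission log is not specified in this document.

```typescript
// Shape of the POST /api/evaluate body, per the Architecture section.
interface Submission {
  session: string;
  name: string;
  email: string;
  summary: string;
}

// Hypothetical log entry: only the fields the document says are logged.
interface SubmissionLogEntry {
  session: string;
  name: string;
  email: string;
  overallScore: number;
  criterionScores: number[];
  badgeResult: string;
}

function buildLogEntry(
  submission: Submission,
  overallScore: number,
  criterionScores: number[],
  badgeResult: string,
): SubmissionLogEntry {
  // Fields are copied one by one; spreading `submission` would leak the summary.
  return {
    session: submission.session,
    name: submission.name,
    email: submission.email,
    overallScore,
    criterionScores: [...criterionScores],
    badgeResult,
  };
}
```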

The session ID is opaque to the evaluator. It exists for tracking which event a submission came from (e.g., ai-coding-with-mongodb-devday-20260120-newyork). The evaluator records it but doesn't validate it.

Credly issuance is idempotent. A repeated submission with the same email and session should not double-issue. Credly's 422 duplicate response is treated as success, and the dry-run path skips the call entirely.
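
The issuance rules (dry run skips the call, up to 3 retries on 5xx, 422 duplicate counts as success) can be sketched as a loop around an injected sender. Everything named here is hypothetical: `issueBadge`, `CredlyResponse`, and the `duplicateBadge` flag stand in for whatever backend/credly.ts actually does, and how a duplicate is detected in Credly's 422 response body is deliberately left out. The sketch also omits any backoff delay between attempts.

```typescript
// Hypothetical, already-parsed view of a Credly response.
interface CredlyResponse {
  status: number;
  duplicateBadge: boolean; // assumed to be derived from the 422 body by the caller
}

type IssueOutcome = "issued" | "already-issued" | "dry-run" | "failed";

const MAX_RETRIES = 3; // retries after the first attempt, so at most 4 calls

async function issueBadge(
  send: () => Promise<CredlyResponse>,
  dryRun: boolean,
): Promise<IssueOutcome> {
  // Dry run: never call Credly (the real server logs "would issue badge" here).
  if (dryRun) return "dry-run";

  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    const res = await send();
    if (res.status >= 200 && res.status < 300) return "issued";
    // A duplicate means the badge already exists, which is the desired end state.
    if (res.status === 422 && res.duplicateBadge) return "already-issued";
    // Other 4xx responses are not retried in this sketch.
    if (res.status < 500) return "failed";
    // 5xx: fall through and try again until retries are exhausted.
  }
  return "failed";
}
```

Treating "already-issued" as a success outcome is what makes a repeated submission safe: the caller can report the badge as issued either way.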
