Architecture

Read this before writing code. For the why behind key decisions, follow the ADR links in docs/decisions/. Do not make architectural changes without first writing an ADR.


Stack

Layer                  Technology
Frontend               Next.js 16 (App Router) + Tailwind CSS
Backend                FastAPI (Python 3.13) + Pydantic v2
Database               PostgreSQL 18 + Alembic migrations
Cache / rate-limiting  Redis
Shared types           FastAPI openapi.json → TypeScript via @hey-api/openapi-ts (CI-generated, never hand-written)
Deployment             Single AWS EC2 instance (ELB to be added when needed)
CI/CD                  GitHub Actions

Repo layout

routis/
├── .github/          # Workflows, issue/PR templates, CODEOWNERS
├── apps/
│   ├── web/          # Next.js 16
│   └── api/          # FastAPI
│       └── migrations/  ← Alembic — canonical schema source
├── packages/
│   └── shared/       # Pydantic schemas + generated TS types
├── infra/            # EC2 setup scripts, nginx config, systemd units
├── docs/
│   ├── decisions/    # ADRs
│   ├── schema/       # schema.dbml + erd.png (CI-generated)
│   ├── roadmap/      # One .md per milestone
│   └── runbooks/     # Deploy, migrate, incident procedures
├── docker-compose.yml
├── turbo.json
├── CONTRIBUTING.md
└── README.md

Data model

Core principle: reviews attach to a CourseImplementation (course + year + period), never directly to a Course. This is what enables rating timelines and prevents a redesigned course from inheriting an old reputation.

Faculty → DegreeProgram → Course → CourseImplementation → Review
                                            ↑
                                 Lecturer (many-to-many)

Key entities:

Course — stable identity keyed by course_code. Holds only static metadata.

CourseImplementation — one run of a course: course_id + academic_year + period. All reviews hang off this.

Review — contains all fields related to a review submission. No user FK. No email. No IP address.

Lecturer — linked to CourseImplementation via join table.

ReviewVote — keyed by device fingerprint hash. Rate-limited in Redis. Never in Postgres.

VerifiedSubmission — stores SHA256(email + implementation_id + SECRET_SALT). Checked before issuing a token; created on successful verification. This is the only trace of a verification event. It proves a slot is taken without revealing whose slot it is. One row per student per CourseImplementation. No email, no user FK.
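
A minimal sketch of the core principle, using plain dataclasses rather than the project's actual models. The field names name, rating and submitted_on, the academic_year string format, and the helper reviews_for are assumptions for illustration; only course_code, academic_year, period and is_verified come from this document.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Course:
    course_code: str  # stable identity
    name: str         # hypothetical static metadata field


@dataclass(frozen=True)
class CourseImplementation:
    course_code: str    # stands in for course_id in this sketch
    academic_year: str  # string format is an assumption
    period: int         # teaching period 1-5


@dataclass
class Review:
    implementation: CourseImplementation  # the only parent link
    rating: int          # hypothetical field
    submitted_on: date   # hypothetical field
    is_verified: bool = False
    # Deliberately no user, email, or IP fields.


def reviews_for(reviews: list[Review], implementation: CourseImplementation) -> list[Review]:
    """Return only the reviews attached to one run of a course."""
    return [r for r in reviews if r.implementation == implementation]
```

Because a Review points at a CourseImplementation and never at a Course, two runs of the same course_code keep separate review sets, which is what makes a per-year rating timeline possible.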

Alembic migrations are the source of truth. docs/schema/schema.dbml mirrors them in human-readable form. docs/schema/erd.png is regenerated by CI on every migration PR — never draw it by hand.


Anonymity model

See ADR-001.

Two submission paths:

Unverified (default) — student submits form. Anti-abuse check via device fingerprint hash + IP hash (Redis only, 1h TTL). Review stored with is_verified = false. No identity data collected.

NAT warning: University networks (Eduroam, PanOulu) use NAT, so hundreds of students may share one public IP. Device fingerprint is the primary rate-limit signal; IP hash is a secondary, softer limit. Never hard-block on IP alone, or you will lock out an entire campus.

Verified (opt-in) — student provides university email → receives one-time token → submits token with review. The anonymity service:

  1. Validates the token (confirms @student.oulu.fi address existed).
  2. Computes SHA256(email + implementation_id + SECRET_SALT).
  3. Checks verified_submissions for that hash — rejects if it exists (one verified review per student per course).
  4. Inserts the hash into verified_submissions.
  5. Marks the token consumed.
  6. Sets is_verified = true on the review row.
  7. Drops the email. It is never written to any database table.
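
Steps 2 to 4 above can be sketched as follows. This is an illustration, not the anonymity service's code: the function names slot_hash and claim_slot are hypothetical, a Python set stands in for the verified_submissions table, and normalising the email (strip + lower case) before hashing is an assumption this document does not state.

```python
import hashlib


def slot_hash(email: str, implementation_id: int, secret_salt: str) -> str:
    """SHA256(email + implementation_id + SECRET_SALT) as a hex digest."""
    # Assumption: normalise the address so "A@x" and "a@x " map to one slot.
    normalised = email.strip().lower()
    material = f"{normalised}{implementation_id}{secret_salt}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def claim_slot(
    email: str,
    implementation_id: int,
    secret_salt: str,
    verified_submissions: set[str],
) -> bool:
    """Return True and record the hash if the slot is free, else False."""
    digest = slot_hash(email, implementation_id, secret_salt)
    if digest in verified_submissions:
        return False  # step 3: one verified review per student per implementation
    verified_submissions.add(digest)  # step 4: only the hash is kept
    return True
```

The email exists only as a local variable inside these functions; nothing but the digest reaches the set. In a real table, one option for closing the gap between the check and the insert is a unique constraint on the hash column.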

What is and is not stored:

Data                            Stored?
Email address                   No (dropped in step 7)
IP address (raw)                No
IP hash (salted)                Redis only, 1h TTL, rate-limiting only
Device fingerprint hash         Redis only, 1h TTL
SHA256(email + impl_id + salt)  Yes, in the verified_submissions table, permanently
is_verified flag                Yes, on the review row

The hash in verified_submissions is a one-way slot marker. It cannot be reversed to recover the email. It only answers the question "has this slot been used?" — nothing else.
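
For the unverified path, the two-signal rate limit described in the NAT warning might look like the sketch below. It uses an in-memory counter as a stand-in for Redis keys with a TTL. The class TtlCounter, the function check_submission, both limit values, and the "challenge" soft response are all hypothetical; this document only fixes the 1h TTL and the rule that IP alone never hard-blocks.

```python
class TtlCounter:
    """In-memory stand-in for Redis counters that expire after a TTL."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries: dict[str, tuple[int, float]] = {}

    def hit(self, key: str, now: float) -> int:
        """Increment the counter for key and return the new count."""
        count, expires_at = self._entries.get(key, (0, now + self.ttl_seconds))
        if now >= expires_at:
            # Window elapsed: start a fresh one.
            count, expires_at = 0, now + self.ttl_seconds
        count += 1
        self._entries[key] = (count, expires_at)
        return count


FINGERPRINT_LIMIT = 3  # hypothetical: submissions per device per window
IP_SOFT_LIMIT = 200    # hypothetical: high, because campus NAT shares IPs


def check_submission(counter: TtlCounter, fingerprint_hash: str, ip_hash: str, now: float) -> str:
    """Return "allow", "challenge" (soft limit), or "block" (hard limit)."""
    fp_count = counter.hit(f"fp:{fingerprint_hash}", now)
    ip_count = counter.hit(f"ip:{ip_hash}", now)
    if fp_count > FINGERPRINT_LIMIT:
        return "block"      # primary signal: the device itself is over the limit
    if ip_count > IP_SOFT_LIMIT:
        return "challenge"  # secondary signal: add friction, never hard-block
    return "allow"
```

The only path to "block" goes through the fingerprint counter, so a lecture hall full of students behind one public IP can at worst see the softer response.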


Aggregation

Scores are calculated from review rows and cached in Redis. They are never stored as columns.

Weighted average: score = Σ(rating × weight) / Σ(weight)

Weight decay: weight = 1.0 / (1 + 0.002 × days_since_submission), which reaches 0.5 at 500 days. The decay is hyperbolic rather than exponential: the weight is 1/3 at 1000 days, not 0.25.

Verified reviews get a 1.5× weight multiplier before decay.
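
The three rules above combine as in this sketch. The function names and the tuple layout are assumptions; the constants come from this section. Since the verified multiplier and the decay are both multiplicative, applying the multiplier "before decay" gives the same number as applying it after.

```python
from typing import Iterable, Optional, Tuple


def review_weight(days_since_submission: float, is_verified: bool) -> float:
    """Per-review weight: optional 1.5x verified multiplier, then time decay."""
    base = 1.5 if is_verified else 1.0
    return base / (1 + 0.002 * days_since_submission)


def weighted_score(reviews: Iterable[Tuple[float, float, bool]]) -> Optional[float]:
    """Weighted average over (rating, days_since_submission, is_verified) tuples.

    Returns None when there are no reviews, so callers never divide by zero.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rating, days, verified in reviews:
        w = review_weight(days, verified)
        weighted_sum += rating * w
        total_weight += w
    if total_weight == 0:
        return None
    return weighted_sum / total_weight
```

For example, a fresh 5 and a 500-day-old 1, both unverified, have weights 1.0 and 0.5, so the score is (5 × 1.0 + 1 × 0.5) / 1.5, about 3.67 rather than the plain mean of 3.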

Leaderboard: composite of average_rating × 0.6 + participation_score × 0.4. Recalculated by a daily cron job. Not real-time by design.

participation_score is not a raw review count, because that would permanently favour large mandatory first-year courses over small, highly rated master's courses. Instead: log(1 + review_count) / log(1 + enrolled_students), capped at 1.0. The logarithm means the benefit of volume tapers off after ~20–30 reviews. enrolled_students comes from manually maintained course metadata; if unknown, a configurable course-size-tier default is used.
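
A sketch of the two formulas as written. The function names are assumptions. This document does not say whether average_rating is normalised before it is combined with participation_score, so the composite below applies the stated coefficients to whatever scale the caller passes in. The course-size-tier fallback for unknown enrolment is not modelled.

```python
import math


def participation_score(review_count: int, enrolled_students: int) -> float:
    """log(1 + review_count) / log(1 + enrolled_students), capped at 1.0."""
    if enrolled_students <= 0:
        raise ValueError("enrolled_students must be positive")
    raw = math.log(1 + review_count) / math.log(1 + enrolled_students)
    return min(raw, 1.0)


def leaderboard_score(average_rating: float, participation: float) -> float:
    """Composite used for ranking: 60% rating, 40% participation."""
    return average_rating * 0.6 + participation * 0.4
```

With these definitions, a 30-student course with 25 reviews scores roughly 0.95 on participation, while a 400-student course with 60 reviews scores roughly 0.69, even though it has more than twice as many reviews.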

Helpfulness (Wilson score): recalculated on every vote event and stored on reviews.helpfulness_score.
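
For reference, the Wilson score lower bound can be computed as below. The function name, the 95% confidence level (z = 1.96), and returning 0.0 for a review with no votes are assumptions; this document only states that the Wilson score is used.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote proportion."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # assumption: unvoted reviews sort last
    p = upvotes / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)
```

The lower bound rewards evidence as well as ratio: 90 up and 10 down ranks above 9 up and 1 down, although both are 90% positive.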


Deployment

Single EC2 instance running all services as Docker containers via Docker Compose. Nginx sits in front as a reverse proxy, terminating TLS and routing /api/* to FastAPI and everything else to Next.js.

Internet → Nginx (EC2) → Next.js container  (port 3000)
                       → FastAPI container  (port 8000)
                       → Postgres container (port 5432)
                       → Redis container    (port 6379)

ELB and a second EC2 instance will be added when traffic justifies it. The Compose setup is designed so containers can be split across hosts without changes to application code.

Database backups: A db-backup sidecar container runs pg_dump on a cron schedule and uploads the compressed dump to a dedicated S3 bucket. Retention: daily dumps for 30 days, weekly dumps for 6 months. The backup bucket has versioning and object lock enabled. The runbook for restore is in docs/runbooks/restore-db.md. If the project migrates to RDS, this sidecar is removed and RDS automated snapshots replace it — that migration path is documented in docs/decisions/.
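
The retention rule above can be expressed as a small predicate. This is a sketch, not the sidecar's code: the function name should_keep, treating "6 months" as 183 days, and treating the Sunday dump as the weekly one are all assumptions. The real policy may equally be enforced by bucket lifecycle rules.

```python
from datetime import date, timedelta

DAILY_RETENTION = timedelta(days=30)
WEEKLY_RETENTION = timedelta(days=183)  # assumption: "6 months" as 183 days
WEEKLY_DUMP_WEEKDAY = 6                 # assumption: Sunday (Monday is 0)


def should_keep(dump_date: date, today: date) -> bool:
    """Return True if a dump taken on dump_date is still within retention."""
    age = today - dump_date
    if age < timedelta(0):
        return True  # future-dated dump: never delete on a clock error
    if age <= DAILY_RETENTION:
        return True  # every daily dump is kept for 30 days
    # Older than 30 days: only the weekly dump survives, up to 6 months.
    return dump_date.weekday() == WEEKLY_DUMP_WEEKDAY and age <= WEEKLY_RETENTION
```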

Config and scripts for nginx, TLS (Let's Encrypt via certbot), systemd service management, and the backup sidecar live in infra/.


CI/CD

Every PR to develop runs: lint → typecheck → schema-sync check → openapi codegen drift check → tests → build.

On merge to develop: staging EC2 → git pull → docker compose up -d --build → alembic upgrade head.

On merge to main: same against production EC2 → regenerate docs/schema/erd.png → commit back.


Local development

git clone https://github.com/your-org/routis.git && cd routis
cp .env.example .env
docker compose up
docker compose exec api python scripts/seed.py   # seed course catalog

Service   URL
Web       http://localhost:3000
API       http://localhost:8000
API docs  http://localhost:8000/docs

Run migrations: docker compose exec api alembic upgrade head

Regenerate TS types after API changes: pnpm --filter web run codegen — this fetches openapi.json from the running API and runs @hey-api/openapi-ts. Commit the output.


Constraints worth knowing

  • No user accounts for reviewers — by design. A submitter cannot edit or delete their review because there is no way to re-identify them as the author.
  • One verified review per student per course — enforced via the verified_submissions hash table. A student can submit verified reviews for multiple courses but cannot submit a second verified review for the same CourseImplementation.
  • Verification is stateless for the student — no account, no session. Each verification goes through the email token flow independently.
  • Leaderboard is daily, not real-time — do not attempt to make it real-time.
  • Course catalog is manually seeded — no live Peppi integration in v1.

Glossary

Term                  Definition
CourseImplementation  One run of a course: course_id + academic_year + period. The unit reviews attach to.
ADR                   Architecture Decision Record. Immutable log of a significant design choice. Lives in docs/decisions/.
Verified review       Submitted via email token flow. is_verified = true. No identity stored.
Weight                Per-review float used in score aggregation. Decays over time.
Helpfulness score     Wilson score lower bound from upvote/downvote counts.
VerifiedSubmission    A row storing SHA256(email + implementation_id + salt). Proves a verification slot is taken without revealing who took it.
Period                University of Oulu teaching period 1–5 (5 = summer).