# Solution: Elite Track / Algorithms and Complexity Lab

> **STOP** -- Have you attempted this project yourself first?
>
> Learning happens in the struggle, not in reading answers.
> Spend at least 20 minutes trying before reading this solution.
> Check the [README](./README.md) for requirements and the
> [Walkthrough](./WALKTHROUGH.md) for guided hints.

---

## Complete solution

```python
"""Algorithms and Complexity Lab.

This project is part of the elite extension track.
It intentionally emphasizes explicit, testable engineering decisions.
"""

# WHY deterministic execution? -- Algorithms labs must produce repeatable results
# so learners can compare Big-O predictions against actual run behavior. Any
# randomness would make performance analysis impossible to reproduce.

from __future__ import annotations

import argparse
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def parse_args() -> argparse.Namespace:
    """Parse CLI inputs for deterministic project execution."""
    parser = argparse.ArgumentParser(description="Algorithms and Complexity Lab")
    # WHY required --input/--output? -- Explicit file paths make every run
    # reproducible. No hidden defaults means the learner controls exactly
    # where data comes from and where results go.
    parser.add_argument("--input", required=True, help="Path to input text data")
    parser.add_argument("--output", required=True, help="Path to output JSON summary")
    # WHY optional run-id? -- Traceability across repeated benchmark sessions.
    # When comparing algorithm variants, the run-id distinguishes results.
    parser.add_argument("--run-id", default="manual-run", help="Optional run identifier")
    return parser.parse_args()


def load_lines(input_path: Path) -> list[str]:
    """Load normalized input lines and reject empty datasets safely."""
    if not input_path.exists():
        raise FileNotFoundError(f"input file not found: {input_path}")
    # WHY strip whitespace? -- Cross-platform consistency. Windows editors
    # add \r\n, macOS uses \n. Stripping normalizes both to the same output
    # so test assertions are stable across platforms.
    raw_lines = input_path.read_text(encoding="utf-8").splitlines()
    lines = [line.strip() for line in raw_lines if line.strip()]
    if not lines:
        raise ValueError("input file contains no usable lines")
    return lines


def classify_line(line: str) -> dict[str, Any]:
    """Transform one CSV-like line into structured fields with validation."""
    parts = [piece.strip() for piece in line.split(",")]
    # WHY exactly 3 fields? -- Strict schema validation catches corrupt data
    # at parse time rather than producing mysterious downstream errors.
    if len(parts) != 3:
        raise ValueError(f"invalid line format (expected 3 comma fields): {line}")

    name, score_raw, severity = parts
    score = int(score_raw)
    return {
        "name": name,
        "score": score,
        "severity": severity,
        # WHY is_high_risk boolean? -- Creates a consistent risk lens for
        # downstream summaries. Centralizing the definition of "high risk"
        # here means the summary logic does not need to re-derive it.
        "is_high_risk": severity in {"warn", "critical"} or score < 5,
    }


def build_summary(records: list[dict[str, Any]], project_title: str, run_id: str) -> dict[str, Any]:
    """Build deterministic summary payload for testing and teach-back review."""
    high_risk_count = sum(1 for record in records if record["is_high_risk"])
    avg_score = round(sum(record["score"] for record in records) / len(records), 2)

    return {
        "project_title": project_title,
        "run_id": run_id,
        "generated_utc": datetime.now(timezone.utc).isoformat(),
        "record_count": len(records),
        "high_risk_count": high_risk_count,
        "average_score": avg_score,
        "records": records,
    }


def write_summary(output_path: Path, payload: dict[str, Any]) -> None:
    """Write JSON output with parent directory creation for first-time runs."""
    # WHY mkdir(parents=True)? -- First-time runners may not have the data/
    # directory yet. Creating it automatically prevents a confusing
    # FileNotFoundError on the very first run.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def main() -> int:
    """Execute end-to-end project run."""
    args = parse_args()
    input_path = Path(args.input)
    output_path = Path(args.output)

    lines = load_lines(input_path)
    records = [classify_line(line) for line in lines]

    payload = build_summary(records, "Algorithms and Complexity Lab", args.run_id)
    write_summary(output_path, payload)

    print(f"summary written to {output_path}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```
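For a quick sanity check of the risk rule, here is a condensed, standalone copy of the `classify_line` logic applied to a few invented rows (the names, scores, and severities are illustrative only):

```python
from typing import Any


def classify_line(line: str) -> dict[str, Any]:
    # Same rule as the solution: split, validate 3 fields, flag high risk.
    parts = [piece.strip() for piece in line.split(",")]
    if len(parts) != 3:
        raise ValueError(f"invalid line format (expected 3 comma fields): {line}")
    name, score_raw, severity = parts
    score = int(score_raw)
    return {
        "name": name,
        "score": score,
        "severity": severity,
        "is_high_risk": severity in {"warn", "critical"} or score < 5,
    }


print(classify_line("alpha, 9, info"))   # low risk: score >= 5 and benign severity
print(classify_line("beta, 3, info"))    # high risk: score < 5
print(classify_line("gamma, 8, warn"))   # high risk: severity in {"warn", "critical"}
```

Note that either trigger alone (low score or elevated severity) is enough to set the flag, which is why the rule uses `or` rather than `and`.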

## Design decisions

| Decision | Why | Alternative considered |
|----------|-----|------------------------|
| Deterministic CLI pipeline (input -> transform -> output) | Reproducible runs enable comparing algorithm variants by diffing output files | Interactive REPL -- non-reproducible, cannot be automated in CI |
| Fail-fast validation (FileNotFoundError, ValueError) | Corrupt data is caught immediately at the source rather than producing silently wrong results downstream | Lenient parsing with defaults -- hides data quality issues |
| Pure transformation functions | Each function takes input and returns output with no side effects; easy to test, easy to benchmark | Stateful class with mutable state -- harder to isolate for performance measurement |
| JSON output with embedded records | Full traceability: summary stats plus raw data for debugging; diffable across runs | Summary-only output -- loses the ability to inspect individual records |

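One caveat on the diffability claim: the `generated_utc` field changes on every run, so a naive file diff of two otherwise identical runs will never be clean. A small helper along these lines (hypothetical, not part of the solution) can compare payloads while ignoring volatile fields:

```python
from typing import Any

# Fields expected to differ between runs even when the data is identical.
VOLATILE_FIELDS = {"generated_utc"}


def summaries_match(a: dict[str, Any], b: dict[str, Any]) -> bool:
    """Compare two summary payloads, ignoring volatile fields."""
    stable_a = {k: v for k, v in a.items() if k not in VOLATILE_FIELDS}
    stable_b = {k: v for k, v in b.items() if k not in VOLATILE_FIELDS}
    return stable_a == stable_b


run1 = {"run_id": "a", "generated_utc": "2024-01-01T00:00:00+00:00", "record_count": 2}
run2 = {"run_id": "a", "generated_utc": "2024-01-02T00:00:00+00:00", "record_count": 2}
print(summaries_match(run1, run2))  # True: only the timestamp differs
```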
## Alternative approaches

### Approach B: Timing-instrumented benchmark runner

```python
import time
from typing import Callable


def benchmark(fn: Callable, *args, iterations: int = 100) -> dict[str, float]:
    """Measure function execution time across multiple iterations."""
    times: list[float] = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn(*args)
        elapsed = time.perf_counter_ns() - start
        times.append(elapsed / 1_000_000)  # convert nanoseconds to milliseconds

    return {
        "min_ms": round(min(times), 3),
        "max_ms": round(max(times), 3),
        "avg_ms": round(sum(times) / len(times), 3),
        "iterations": iterations,
    }
```

**Trade-off:** Adding timing instrumentation lets learners compare theoretical Big-O complexity against measured wall-clock time. However, the current scaffold focuses on data-pipeline correctness first. Timing can be layered on top once the pipeline is solid, following the principle of "make it correct, then make it fast."

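As a usage sketch, the benchmark helper can make the O(n) list scan vs O(1) set lookup gap visible. The helper is reproduced here in condensed form (without `max_ms`) so the snippet runs standalone; the collection sizes are arbitrary:

```python
import time
from typing import Callable


def benchmark(fn: Callable, *args, iterations: int = 100) -> dict[str, float]:
    """Condensed copy of the Approach B helper so this sketch runs standalone."""
    times: list[float] = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn(*args)
        times.append((time.perf_counter_ns() - start) / 1_000_000)
    return {
        "min_ms": round(min(times), 3),
        "avg_ms": round(sum(times) / len(times), 3),
        "iterations": iterations,
    }


data_list = list(range(100_000))
data_set = set(data_list)
target = 99_999  # worst case for the list: a full scan before the match

list_stats = benchmark(lambda: target in data_list, iterations=50)
set_stats = benchmark(lambda: target in data_set, iterations=50)
print(list_stats["avg_ms"], set_stats["avg_ms"])  # the set lookup should be markedly faster
```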
## Common pitfalls

| Scenario | What happens | Prevention |
|----------|--------------|------------|
| Input file has trailing blank lines | `load_lines` drops them via the `if line.strip()` filter, so they are silently ignored | The filter itself is the prevention; no action needed |
| Score field contains non-integer text | `int(score_raw)` raises `ValueError` and the run aborts | Wrap the conversion in try/except and re-raise with a message naming the offending line |
| Output directory does not exist | `mkdir(parents=True, exist_ok=True)` creates it automatically | Already handled by `write_summary` |
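
The hardened conversion suggested in the second row could look like this (a hypothetical `parse_score` helper, not part of the solution above):

```python
def parse_score(score_raw: str, line: str) -> int:
    """Convert the score field, failing with a message that names the bad line."""
    try:
        return int(score_raw)
    except ValueError:
        # Re-raise with context so the learner sees which input row is corrupt.
        raise ValueError(f"score must be an integer, got {score_raw!r} in line: {line}") from None


print(parse_score("7", "alpha, 7, info"))  # 7
```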