| 1 | +# Solution: Level 1 / Project 01 - Input Validator Lab |
| 2 | + |
| 3 | +> **STOP** — Have you attempted this project yourself first? |
| 4 | +> |
| 5 | +> Learning happens in the struggle, not in reading answers. |
| 6 | +> Spend at least 20 minutes trying before reading this solution. |
| 7 | +> If you are stuck, try the [Walkthrough](./WALKTHROUGH.md) first — it guides |
| 8 | +> your thinking without giving away the answer. |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | + |
| 13 | +## Complete solution |
| 14 | + |
| 15 | +```python |
| 16 | +"""Level 1 project: Input Validator Lab. |
| 17 | + |
| 18 | +Validate common input formats: email addresses and phone numbers with |
| 19 | +string methods, and zip codes with one small regular expression. |
| 20 | + |
| 21 | +Concepts: string methods (find, count, isdigit), validation patterns, re basics. |
| 22 | +""" |
| 23 | + |
| 24 | + |
| 25 | +import argparse |
| 26 | +import json |
| 27 | +import re |
| 28 | +from pathlib import Path |
| 29 | + |
| 30 | + |
| 31 | +# WHY validate_email: Emails are the most common user-submitted data on |
| 32 | +# the web. Validating them with basic string methods teaches you that |
| 33 | +# you can do a lot of useful checking before ever learning regex. |
| 34 | +def validate_email(email: str) -> dict: |
| 35 | +    """Check whether a string looks like a valid email address.""" |
| 36 | +    # WHY strip: Users often paste text with trailing spaces or newlines. |
| 37 | +    # Stripping whitespace prevents those invisible characters from |
| 38 | +    # causing a false "contains spaces" error. |
| 39 | +    email = email.strip() |
| 40 | +    errors = [] |
| 41 | + |
| 42 | +    # WHY check spaces first: Spaces are never valid in email addresses. |
| 43 | +    # Checking this independently of the @ logic keeps each rule simple. |
| 44 | +    if " " in email: |
| 45 | +        errors.append("contains spaces") |
| 46 | + |
| 47 | +    # WHY count("@") != 1: An email must have exactly one @ symbol. |
| 48 | +    # count() returns how many times a character appears in the string, |
| 49 | +    # which is more precise than just checking ("@" in email). |
| 50 | +    if email.count("@") != 1: |
| 51 | +        errors.append("must contain exactly one @") |
| 52 | +    elif "@" in email: |
| 53 | +        # WHY split on @: Once we know there is exactly one @, splitting |
| 54 | +        # gives us the local part (before @) and the domain part (after @). |
| 55 | +        local, domain = email.split("@") |
| 56 | +        if not local: |
| 57 | +            errors.append("nothing before @") |
| 58 | +        # WHY check for dot in domain: A domain like "example" without |
| 59 | +        # a TLD (.com, .org) is not valid for email delivery. |
| 60 | +        if not domain or "." not in domain: |
| 61 | +            errors.append("domain must contain a dot") |
| 62 | + |
| 63 | +    # WHY return a dict: Returning structured data (not just True/False) |
| 64 | +    # lets the caller display helpful error messages to the user. |
| 65 | +    return {"value": email, "type": "email", "valid": len(errors) == 0, "errors": errors} |
| 66 | + |
| 67 | + |
| 68 | +# WHY validate_phone: Phone numbers come in many formats (dashes, |
| 69 | +# parentheses, spaces). Extracting only digits is a practical technique |
| 70 | +# that handles the common 10-digit US formats in one pass. |
| 71 | +def validate_phone(phone: str) -> dict: |
| 72 | +    """Check whether a string looks like a US phone number.""" |
| 73 | +    phone = phone.strip() |
| 74 | +    # WHY build digits manually: At this level, a for-loop with isdigit() |
| 75 | +    # is more understandable than a regex like r"\d". It shows exactly |
| 76 | +    # what is happening: keep only the digit characters. |
| 77 | +    digits = "" |
| 78 | +    for char in phone: |
| 79 | +        if char.isdigit(): |
| 80 | +            digits += char |
| 81 | + |
| 82 | +    errors = [] |
| 83 | +    # WHY 10 digits: Without the +1 country code, a US phone number is |
| 84 | +    # 10 digits (area code + 7-digit number). This is the simplest check. |
| 85 | +    if len(digits) != 10: |
| 86 | +        errors.append(f"expected 10 digits, got {len(digits)}") |
| 87 | + |
| 88 | +    return {"value": phone, "type": "phone", "valid": len(errors) == 0, "errors": errors} |
| 89 | + |
| 90 | + |
| 91 | +# WHY validate_zip_code: Zip codes are a fixed-format string, making |
| 92 | +# them a perfect first use of regex. The pattern is simple enough to |
| 93 | +# learn without being overwhelming. |
| 94 | +def validate_zip_code(zipcode: str) -> dict: |
| 95 | +    """Check whether a string looks like a US zip code.""" |
| 96 | +    zipcode = zipcode.strip() |
| 97 | +    errors = [] |
| 98 | + |
| 99 | +    # WHY regex here: Zip codes follow a strict pattern (5 digits, or |
| 100 | +    # 5+4 digits with a dash). A regex captures this in one line, |
| 101 | +    # whereas string methods would require multiple checks. |
| 102 | +    # ^ anchors to the start, $ anchors to the end, \d{5} matches |
| 103 | +    # exactly 5 digits, (-\d{4})? optionally matches a dash + 4 digits. |
| 104 | +    pattern = r"^\d{5}(-\d{4})?$" |
| 105 | +    if not re.match(pattern, zipcode): |
| 106 | +        errors.append("must be 5 digits or 5+4 format (12345-6789)") |
| 107 | + |
| 108 | +    return {"value": zipcode, "type": "zip", "valid": len(errors) == 0, "errors": errors} |
| 109 | + |
| 110 | + |
| 111 | +# WHY validate_input: This is the dispatcher — it reads a line like |
| 112 | +# "email: user@example.com", figures out which validator to call, and |
| 113 | +# routes the value to the right function. This pattern scales: adding |
| 114 | +# a new type means adding one entry to the validators dict. |
| 115 | +def validate_input(line: str) -> dict: |
| 116 | +    """Parse a line like 'email: user@example.com' and validate it.""" |
| 117 | +    # WHY check for colon: The colon separates the type label from the |
| 118 | +    # value. Without it, we cannot determine which validator to use. |
| 119 | +    if ":" not in line: |
| 120 | +        return {"raw": line.strip(), "error": "Expected format: type: value"} |
| 121 | + |
| 122 | +    # WHY maxsplit=1: The value itself might contain colons (e.g., a |
| 123 | +    # URL). maxsplit=1 ensures we only split on the first colon. |
| 124 | +    input_type, value = line.split(":", maxsplit=1) |
| 125 | +    input_type = input_type.strip().lower() |
| 126 | +    value = value.strip() |
| 127 | + |
| 128 | +    # WHY a dict of validators: This is the "dispatch table" pattern. |
| 129 | +    # Instead of a chain of if/elif, we map type names to functions. |
| 130 | +    # Adding a new validator is one line, not a new branch. |
| 131 | +    validators = { |
| 132 | +        "email": validate_email, |
| 133 | +        "phone": validate_phone, |
| 134 | +        "zip": validate_zip_code, |
| 135 | +    } |
| 136 | + |
| 137 | +    if input_type not in validators: |
| 138 | +        return {"raw": line.strip(), "error": f"Unknown type: {input_type}"} |
| 139 | + |
| 140 | +    return validators[input_type](value) |
| 141 | + |
| 142 | + |
| 143 | +# WHY process_file: Separating file I/O from validation logic makes |
| 144 | +# each function testable on its own. validate_email() can be tested |
| 145 | +# with strings; process_file() handles the filesystem. |
| 146 | +def process_file(path: Path) -> list[dict]: |
| 147 | +    """Read input lines and validate each one.""" |
| 148 | +    if not path.exists(): |
| 149 | +        raise FileNotFoundError(f"Input file not found: {path}") |
| 150 | + |
| 151 | +    lines = path.read_text(encoding="utf-8").splitlines() |
| 152 | +    results = [] |
| 153 | +    for line in lines: |
| 154 | +        # WHY skip blanks: Empty lines in input files are common. |
| 155 | +        # Skipping them avoids parse errors on non-data. |
| 156 | +        if not line.strip(): |
| 157 | +            continue |
| 158 | +        results.append(validate_input(line)) |
| 159 | +    return results |
| 160 | + |
| 161 | + |
| 162 | +# WHY parse_args: argparse gives us --input and --output flags for free, |
| 163 | +# with help text and error handling. Hardcoded paths would make the |
| 164 | +# script inflexible. |
| 165 | +def parse_args() -> argparse.Namespace: |
| 166 | +    parser = argparse.ArgumentParser(description="Input Validator Lab") |
| 167 | +    parser.add_argument("--input", default="data/sample_input.txt") |
| 168 | +    parser.add_argument("--output", default="data/output.json") |
| 169 | +    return parser.parse_args() |
| 170 | + |
| 171 | + |
| 172 | +# WHY main: Wrapping the top-level logic in a main() function keeps |
| 173 | +# the module importable without side effects. Other code can import |
| 174 | +# validate_email() without triggering file reads. |
| 175 | +def main() -> None: |
| 176 | +    args = parse_args() |
| 177 | +    results = process_file(Path(args.input)) |
| 178 | + |
| 179 | +    print("=== Validation Results ===\n") |
| 180 | +    for r in results: |
| 181 | +        if "error" in r: |
| 182 | +            print(f" PARSE ERROR: {r['error']}") |
| 183 | +        elif r["valid"]: |
| 184 | +            print(f" PASS [{r['type']}] {r['value']}") |
| 185 | +        else: |
| 186 | +            print(f" FAIL [{r['type']}] {r['value']} -- {', '.join(r['errors'])}") |
| 187 | + |
| 188 | +    valid_count = sum(1 for r in results if r.get("valid", False)) |
| 189 | +    print(f"\n {valid_count}/{len(results)} passed validation") |
| 190 | + |
| 191 | +    output_path = Path(args.output) |
| 192 | +    output_path.parent.mkdir(parents=True, exist_ok=True) |
| 193 | +    output_path.write_text(json.dumps(results, indent=2), encoding="utf-8") |
| 194 | + |
| 195 | + |
| 196 | +if __name__ == "__main__": |
| 197 | +    main() |
| 198 | +``` |
| 199 | + |
| 200 | +## Design decisions |
| 201 | + |
| 202 | +| Decision | Why | Alternative considered | |
| 203 | +|----------|-----|----------------------| |
| 204 | +| Separate validator functions per type | Each validator has its own rules and error messages; isolating them makes testing and modification easy | One big function with nested if/elif for all types — harder to test and extend | |
| 205 | +| Dispatch table (dict mapping type to function) | Adding a new type is one line instead of a new elif branch; the dict is also iterable for help text | if/elif chain — works but does not scale well and is harder to extend | |
| 206 | +| Return dict with `valid` + `errors` list | Gives the caller both the pass/fail decision and the specific reasons, enabling rich error messages | Return just True/False — caller loses context about what failed | |
| 207 | +| Use `re.match` for zip codes but string methods for email/phone | Zip codes have a strict fixed pattern ideal for regex; emails and phones benefit from step-by-step checks that are easier to understand at this level | Use regex for everything — works but is harder to debug at Level 1 | |
| 208 | + |
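The dispatch-table row above can be illustrated with a minimal sketch. It assumes a hypothetical `username` validator and a module-level `VALIDATORS` dict, neither of which is part of the project; the point is only that registering a type is a single dict entry, and that the dict's keys can be reused in an error message.

```python
def validate_zip(value: str) -> dict:
    """Tiny stand-in validator: exactly five digits."""
    ok = len(value) == 5 and value.isdigit()
    return {"value": value, "type": "zip", "valid": ok,
            "errors": [] if ok else ["must be 5 digits"]}


def validate_username(value: str) -> dict:
    """Hypothetical new validator (not part of the project)."""
    ok = value.isalnum() and 3 <= len(value) <= 20
    return {"value": value, "type": "username", "valid": ok,
            "errors": [] if ok else ["must be 3-20 letters or digits"]}


# Registering the new type is the only change the dispatcher needs.
VALIDATORS = {
    "zip": validate_zip,
    "username": validate_username,
}


def dispatch(line: str) -> dict:
    """Split 'type: value' on the first colon and route to a validator."""
    if ":" not in line:
        return {"raw": line.strip(), "error": "Expected format: type: value"}
    input_type, value = line.split(":", maxsplit=1)
    input_type = input_type.strip().lower()
    if input_type not in VALIDATORS:
        # The dict is iterable, so the known types come for free.
        known = ", ".join(sorted(VALIDATORS))
        return {"raw": line.strip(),
                "error": f"Unknown type: {input_type} (known: {known})"}
    return VALIDATORS[input_type](value.strip())


if __name__ == "__main__":
    print(dispatch("username: ada99"))
    print(dispatch("ssn: 123-45-6789"))
```

Listing the known types in the error message is an addition for illustration; the solution's `validate_input()` reports only the unknown type.
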
| 209 | +## Alternative approaches |
| 210 | + |
| 211 | +### Approach B: All-regex validation |
| 212 | + |
| 213 | +```python |
| 214 | +import re |
| 215 | + |
| 216 | +def validate_email_regex(email: str) -> dict: |
| 217 | +    """Use a single regex pattern for email validation.""" |
| 218 | +    email = email.strip() |
| 219 | +    # WHY this pattern: \S+ matches non-whitespace before @, then |
| 220 | +    # requires at least one dot in the domain portion. |
| 221 | +    pattern = r"^\S+@\S+\.\S+$" |
| 222 | +    valid = bool(re.match(pattern, email)) |
| 223 | +    errors = [] if valid else ["does not match email pattern"] |
| 224 | +    return {"value": email, "type": "email", "valid": valid, "errors": errors} |
| 225 | +``` |
| 226 | + |
| 227 | +**Trade-off:** A regex approach is more concise but gives less specific error messages. The string-methods approach tells users exactly what is wrong ("nothing before @", "domain must contain a dot"), while the regex approach can only say "does not match pattern". For a learning project, the string-methods approach teaches more about how validation logic works step by step. |
| 228 | + |
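To make the trade-off concrete, here is a small side-by-side sketch. The two helper names are hypothetical; the rules and the pattern are copied from `validate_email()` and `validate_email_regex()`. It also surfaces a difference in strictness: because `\S` matches `@`, the Approach B pattern accepts an address with two `@` symbols, which the string-methods version rejects.

```python
import re


def check_with_string_methods(email: str) -> list[str]:
    """Same rules as validate_email() in the solution; returns only the errors."""
    email = email.strip()
    errors = []
    if " " in email:
        errors.append("contains spaces")
    if email.count("@") != 1:
        errors.append("must contain exactly one @")
    else:
        local, domain = email.split("@")
        if not local:
            errors.append("nothing before @")
        if not domain or "." not in domain:
            errors.append("domain must contain a dot")
    return errors


def check_with_regex(email: str) -> list[str]:
    """Same pattern as validate_email_regex() in Approach B."""
    matched = re.match(r"^\S+@\S+\.\S+$", email.strip())
    return [] if matched else ["does not match email pattern"]


if __name__ == "__main__":
    for sample in ["user@example.com", "user@", "@example.com", "a@@b.c"]:
        print(f"{sample!r:20} methods={check_with_string_methods(sample)} "
              f"regex={check_with_regex(sample)}")
```

For `user@` and `@example.com` both versions reject the input, but only the string-methods version says why. For `a@@b.c` the regex version reports no errors at all.
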
| 229 | +## What could go wrong |
| 230 | + |
| 231 | +| Scenario | What happens | Prevention | |
| 232 | +|----------|-------------|------------| |
| 233 | +| Input line has no colon separator | `validate_input()` returns an error dict instead of crashing, because we check for ":" before splitting | The check is already in place; always test with malformed input | |
| 234 | +| Email like `user@` (no domain) | `validate_email()` catches it — after splitting on @, the domain part is empty, triggering "domain must contain a dot" | The elif branch handles this; add test cases for edge-case emails | |
| 235 | +| Unknown type like `ssn: 123-45-6789` | `validate_input()` returns `{"error": "Unknown type: ssn"}` because `ssn` is not in the validators dict | Already handled; the dict lookup pattern naturally rejects unknown keys | |
| 236 | +| File does not exist | `process_file()` raises `FileNotFoundError` with a clear message before attempting to read | The existence check is explicit; argparse defaults to a sample file | |
| 237 | + |
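The last row says `process_file()` raises `FileNotFoundError`; the solution's `main()` does not catch it, so a missing file ends in a traceback. One possible way for a caller to turn that into a short message and an exit code is sketched below. `read_lines()` is a simplified stand-in for the file-reading half of `process_file()`, and `run()` is a hypothetical wrapper, not part of the project.

```python
import sys
from pathlib import Path


def read_lines(path: Path) -> list[str]:
    """Stand-in for process_file(): same existence check, no validation."""
    if not path.exists():
        raise FileNotFoundError(f"Input file not found: {path}")
    text = path.read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]


def run(path_str: str) -> int:
    """Hypothetical wrapper: return an exit code instead of showing a traceback."""
    try:
        lines = read_lines(Path(path_str))
    except FileNotFoundError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(f"read {len(lines)} non-blank lines")
    return 0


if __name__ == "__main__":
    # The path below is an assumption chosen so that the file is missing.
    sys.exit(run("no/such/dir/input.txt"))
```

Keeping the `raise` inside the reader and the `except` at the outermost caller preserves the separation the solution already has: library functions report problems, and only the command-line layer decides how to present them.
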
| 238 | +## Key takeaways |
| 239 | + |
| 240 | +1. **Validate one thing at a time.** Each validator checks a single format, and each check within a validator tests one rule. This makes bugs easy to isolate and fixes easy to verify. |
| 241 | +2. **Return structured results, not just True/False.** Returning a dict with `valid`, `type`, `value`, and `errors` gives callers everything they need to display helpful feedback, a pattern common in form validation libraries. |
| 242 | +3. **The dispatch table pattern (dict mapping names to functions) will appear repeatedly** in future projects: command dispatchers, API routers, plugin systems. Learning it here at Level 1 prepares you for the pattern everywhere else. |