05 - Automation: Files and Excel Reporting (Capstone A Build Guide)

Who this is for

Learners ready to produce business-value automation from spreadsheets.
Teams that need repeatable reporting with validation and auditability.

What you will build

A full Excel ingestion pipeline that:

scans input/ for .xlsx files,
normalizes headers,
validates rows,
writes Master_Report.xlsx, Master_Report.csv, and rejects.csv,
writes a run log with summary counts.

Prerequisites

Foundations phase complete.
Quality tooling baseline from 09_QUALITY_TOOLING.md.
Installed packages: openpyxl (required), pandas (optional, for Step 9).

Step-by-step lab pack

Step 1 - Project scaffolding

Create structure:

excel_merger/
  input/
  output/
  logs/
  src/
    excel_merger/
      __init__.py
      main.py
      schema.py
      normalize.py
      validate.py
      io_excel.py
      io_csv.py
  tests/

Step 2 - Define schema contract

Required columns:

Customer
Site
Status
Opened
TicketID

Define allowed status values and date rules.
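
One way schema.py could express this contract is sketched below. The guide does not list the allowed statuses or the date rules, so ALLOWED_STATUSES, DATE_FORMAT, the "no future dates" rule, and the helper name parse_opened are all assumptions to replace with your own.

```python
from datetime import date, datetime

# Required columns, taken from the list above.
REQUIRED_COLUMNS = ("Customer", "Site", "Status", "Opened", "TicketID")

# Assumption: placeholder statuses, compared in lowercase.
ALLOWED_STATUSES = frozenset({"open", "in progress", "closed", "critical"})

# Assumption: text dates use ISO format and may not lie in the future.
DATE_FORMAT = "%Y-%m-%d"


def parse_opened(value, today=None):
    """Return a date, or None when the value breaks the assumed date rules."""
    today = today or date.today()
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        parsed = value.date()
    elif isinstance(value, date):
        parsed = value
    else:
        try:
            parsed = datetime.strptime(str(value).strip(), DATE_FORMAT).date()
        except ValueError:
            return None
    return parsed if parsed <= today else None
```

Keeping the contract in one module means normalization, validation, and tests can all import the same constants instead of repeating them.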

Step 3 - Header normalization rules

Implement normalize_header(text) rules:

trim whitespace,
lowercase,
remove _, -, and extra spaces,
map known aliases (for example ticket id, ticket_id, ticketid).
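
The rules above can be sketched as follows. The alias table HEADER_ALIASES is hypothetical; only the ticket id spellings come from the guide, and unknown headers are returned in their cleaned form rather than rejected, which is a design choice you may want to change.

```python
import re

# Hypothetical alias table: cleaned spelling -> canonical column name.
HEADER_ALIASES = {
    "customer": "Customer",
    "site": "Site",
    "status": "Status",
    "opened": "Opened",
    "ticketid": "TicketID",   # also reached from ticket_id and ticket-id
    "ticket id": "TicketID",
}


def normalize_header(text):
    """Trim, lowercase, drop _ and -, collapse spaces, then apply aliases."""
    cleaned = "" if text is None else str(text)
    cleaned = cleaned.strip().lower()
    cleaned = re.sub(r"[_-]", "", cleaned)          # remove _ and -
    cleaned = re.sub(r"\s+", " ", cleaned).strip()  # collapse extra spaces
    return HEADER_ALIASES.get(cleaned, cleaned)
```

For example, normalize_header("  Ticket_ID ") and normalize_header("ticket   id") both resolve to TicketID under this table.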

Step 4 - Workbook ingestion (`openpyxl` path)

Detect header row.
Build row dictionaries by normalized headers.
Capture source filename and row number.
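
A minimal sketch of these three tasks follows. It assumes the header is the first row (within the first ten) that contains every required column, that data lives on the first worksheet, and it uses a simplified stand-in (_key) for normalize_header so the block runs on its own. The names find_header_row, rows_to_dicts, and source_row are illustrative. The row logic works on plain tuples, which is what openpyxl's iter_rows(values_only=True) yields, so it can be tested without a workbook.

```python
REQUIRED = {"customer", "site", "status", "opened", "ticketid"}


def _key(cell):
    # Simplified stand-in for normalize_header: keep letters and digits only.
    return "".join(ch for ch in str(cell or "").lower() if ch.isalnum())


def find_header_row(rows, max_scan=10):
    """Return (index, keys) for the first row holding every required header."""
    for index, row in enumerate(rows[:max_scan]):
        keys = [_key(cell) for cell in row]
        if REQUIRED.issubset(keys):
            return index, keys
    return None, []


def rows_to_dicts(rows, source_file):
    """Build one dict per data row, tagged with its source file and row number."""
    rows = list(rows)
    header_index, keys = find_header_row(rows)
    if header_index is None:
        raise ValueError(f"{source_file}: no header row found")
    records = []
    # Sheet rows are 1-based, so the first data row is header_index + 2.
    for row_number, row in enumerate(rows[header_index + 1:], start=header_index + 2):
        if all(cell is None for cell in row):
            continue  # skip fully blank rows
        record = {key: value for key, value in zip(keys, row) if key}
        record["source_file"] = source_file
        record["source_row"] = row_number
        records.append(record)
    return records


def read_workbook(path):
    # Imported here so the row logic above can be tested without openpyxl.
    from openpyxl import load_workbook

    workbook = load_workbook(path, data_only=True)
    sheet = workbook.worksheets[0]  # assumption: data is on the first sheet
    return rows_to_dicts(sheet.iter_rows(values_only=True), str(path))
```

Recording source_row at ingestion time is what later lets rejects.csv point back to an exact spreadsheet row.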

Step 5 - Row validation

Implement validate_row(row) checks:

missing required fields,
invalid status,
invalid date format,
duplicate TicketID in same file.

Write failures to rejects.csv with reason codes.
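
The checks above might look like the sketch below. The reason-code names, the allowed statuses, the ISO date format, and the extra seen_ticket_ids argument (one set per file, used for duplicate detection) are assumptions; the guide only specifies validate_row(row). Rows are assumed to be dicts with lowercase normalized keys plus source_file and source_row.

```python
import csv
from datetime import date, datetime

REQUIRED = ("customer", "site", "status", "opened", "ticketid")
ALLOWED_STATUSES = {"open", "in progress", "closed", "critical"}  # assumption


def validate_row(row, seen_ticket_ids):
    """Return a list of reason codes; an empty list means the row is valid."""
    reasons = [f"MISSING_{name.upper()}" for name in REQUIRED
               if row.get(name) in (None, "")]

    status = str(row.get("status") or "").strip().lower()
    if status and status not in ALLOWED_STATUSES:
        reasons.append("INVALID_STATUS")

    opened = row.get("opened")
    if opened not in (None, "") and not isinstance(opened, (date, datetime)):
        try:
            datetime.strptime(str(opened).strip(), "%Y-%m-%d")
        except ValueError:
            reasons.append("INVALID_DATE")

    ticket_id = row.get("ticketid")
    if ticket_id not in (None, ""):
        if ticket_id in seen_ticket_ids:
            reasons.append("DUPLICATE_TICKETID")
        seen_ticket_ids.add(ticket_id)
    return reasons


def write_rejects(path, rejects):
    """Write (row, reasons) pairs to a CSV with one line per rejected row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["source_file", "source_row", "reason_codes"])
        for row, reasons in rejects:
            writer.writerow([row.get("source_file"), row.get("source_row"),
                             ";".join(reasons)])
```

Create a fresh seen_ticket_ids set for each workbook so duplicates are only flagged within the same file, as the rule requires.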

Step 6 - Merge and transform

Combine valid rows from all files.
Add metadata fields:
- source_file
- ingested_at_utc
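
A small sketch of the merge, assuming valid rows are grouped in a dict keyed by source filename (the function name merge_valid_rows and that input shape are illustrative). It uses one timestamp per run, so every row from the same run carries the same ingested_at_utc.

```python
from datetime import datetime, timezone


def merge_valid_rows(rows_by_file, ingested_at=None):
    """Combine rows from all files and stamp each with its metadata fields."""
    moment = ingested_at or datetime.now(timezone.utc)
    stamp = moment.isoformat(timespec="seconds")
    merged = []
    # Sorted filenames keep the merged order stable across reruns.
    for source_file in sorted(rows_by_file):
        for row in rows_by_file[source_file]:
            merged.append({**row,
                           "source_file": source_file,
                           "ingested_at_utc": stamp})
    return merged
```

Because ingested_at_utc changes on every run, exclude it (or pass a fixed ingested_at) when comparing outputs from two runs for determinism.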

Step 7 - Write outputs

output/Master_Report.csv
output/Master_Report.xlsx
- worksheet AllRows
- worksheet CriticalRows
- highlight critical rows.
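
The sketch below writes both artifacts from a list of row dicts keyed by canonical column names. The guide does not define what makes a row critical, so is_critical (Status equal to "critical") is a placeholder assumption, as are the fill colour and the choice to highlight critical rows inside AllRows.

```python
import csv

COLUMNS = ["Customer", "Site", "Status", "Opened", "TicketID",
           "source_file", "ingested_at_utc"]


def is_critical(row):
    # Assumption: a placeholder rule; replace with your own definition.
    return str(row.get("Status", "")).strip().lower() == "critical"


def write_csv(rows, path):
    """Write the master CSV with a fixed column order."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def write_xlsx(rows, path):
    """Write AllRows and CriticalRows sheets, highlighting critical rows."""
    # Imported here so write_csv stays usable without openpyxl installed.
    from openpyxl import Workbook
    from openpyxl.styles import PatternFill

    fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
    workbook = Workbook()
    all_rows = workbook.active
    all_rows.title = "AllRows"
    critical_rows = workbook.create_sheet("CriticalRows")
    for sheet in (all_rows, critical_rows):
        sheet.append(COLUMNS)
    for row in rows:
        values = [row.get(column) for column in COLUMNS]
        all_rows.append(values)
        if is_critical(row):
            for cell in all_rows[all_rows.max_row]:  # the row just appended
                cell.fill = fill
            critical_rows.append(values)
    workbook.save(path)
```

Opening both files in write mode overwrites the previous run's output, which is one simple way to keep reruns from producing duplicate artifacts.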

Step 8 - Logging standards

one log file per run: logs/run_YYYYMMDD_HHMMSS.log
include:
- files discovered,
- rows accepted/rejected,
- final output paths,
- fatal errors with traceback.
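
A per-run logger along these lines would satisfy the standard. The logger name "excel_merger", the message format, and the helper name configure_run_logger are assumptions; the filename pattern follows the one given above.

```python
import logging
from datetime import datetime
from pathlib import Path


def configure_run_logger(log_dir, now=None):
    """Create logs/run_YYYYMMDD_HHMMSS.log and return (logger, log_path)."""
    now = now or datetime.now()
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"run_{now:%Y%m%d_%H%M%S}.log"

    logger = logging.getLogger("excel_merger")
    logger.setLevel(logging.INFO)
    # Drop handlers left over from an earlier run in the same process.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, log_path
```

Inside an except block, logger.exception("fatal error") writes the message together with the full traceback, which covers the last item in the list.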

Step 9 - Optional `pandas` path

After openpyxl baseline works:

load dataframes,
apply vectorized transforms,
compare results to baseline outputs.
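
As a rough illustration of the last two items, the sketch below applies two vectorized transforms and compares the result with a baseline frame. The specific transforms, the ISO date format, and the helper names transform and same_rows are assumptions; in practice the frames would come from pd.read_excel and from the baseline Master_Report.csv.

```python
import pandas as pd


def transform(frame):
    """Vectorized cleanup: normalize Status text and parse Opened dates."""
    out = frame.copy()
    out["Status"] = out["Status"].str.strip().str.lower()
    # Unparseable dates become NaT instead of raising.
    out["Opened"] = pd.to_datetime(out["Opened"], format="%Y-%m-%d", errors="coerce")
    return out


def same_rows(left, right, key_column="TicketID"):
    """True when both frames hold the same rows, ignoring row and column order."""
    columns = sorted(left.columns)
    if columns != sorted(right.columns):
        return False
    a = left[columns].sort_values(key_column).reset_index(drop=True)
    b = right[columns].sort_values(key_column).reset_index(drop=True)
    try:
        pd.testing.assert_frame_equal(a, b, check_dtype=False)
    except AssertionError:
        return False
    return True
```

Sorting by TicketID before comparing means a difference in row order between the two paths is not reported as a mismatch.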

Step 10 - CLI wrapper

Implement command:

python -m excel_merger.main --input ./input --output ./output --log-dir ./logs
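
A possible argparse skeleton for main.py is shown below. The three flags match the command above; the default paths, the exit code 2 for a missing input directory, and the helper discover_workbooks are assumptions. The pipeline steps themselves are left out.

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(
        prog="excel_merger",
        description="Merge .xlsx files into a validated master report.",
    )
    parser.add_argument("--input", type=Path, default=Path("input"),
                        help="directory scanned for .xlsx files")
    parser.add_argument("--output", type=Path, default=Path("output"),
                        help="directory for the master report and rejects")
    parser.add_argument("--log-dir", type=Path, default=Path("logs"),
                        help="directory for per-run log files")
    return parser


def discover_workbooks(input_dir):
    """Return .xlsx paths in sorted order, skipping Excel lock files (~$name.xlsx)."""
    return sorted(path for path in Path(input_dir).glob("*.xlsx")
                  if not path.name.startswith("~$"))


def main(argv=None):
    args = build_parser().parse_args(argv)
    if not args.input.is_dir():
        print(f"input directory not found: {args.input}")
        return 2
    args.output.mkdir(parents=True, exist_ok=True)
    args.log_dir.mkdir(parents=True, exist_ok=True)
    workbooks = discover_workbooks(args.input)
    print(f"discovered {len(workbooks)} workbook(s)")
    # Ingestion, validation, and output writing would be called from here.
    return 0
```

To make python -m excel_merger.main work, end the module with raise SystemExit(main()) under the usual if __name__ == "__main__": guard. Accepting argv as a parameter lets tests call main([...]) directly.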

Expected output

A rerunnable tool that handles malformed files safely.
Clear output artifacts and rejects report.
Deterministic behavior when rerun on same input.

Break/fix drills

Remove the TicketID column from one file and confirm its rows land in rejects.csv with a missing-field reason.
Introduce mixed header styles and confirm normalization works.
Add a corrupted workbook and confirm pipeline continues with logging.
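
For the corrupted-workbook drill, the per-file isolation could be sketched as below. The function name ingest_all is hypothetical, and read_workbook stands for whatever reader you built in Step 4; the loop catches any exception the reader raises for a single file, logs it with a traceback, and moves on.

```python
import logging

logger = logging.getLogger("excel_merger")


def ingest_all(paths, read_workbook):
    """Read every workbook; log and skip the ones that cannot be read."""
    rows, failed = [], []
    for path in paths:
        try:
            rows.extend(read_workbook(path))
        except Exception:
            # One bad file must not stop the run; the traceback goes to the log.
            logger.exception("skipping unreadable workbook: %s", path)
            failed.append(path)
    return rows, failed
```

Returning the failed paths alongside the rows lets the run summary report how many files were skipped.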

Troubleshooting

Missing package errors:
- activate .venv and reinstall dependencies.
Date parsing errors:
- normalize date formats before validation.
Excel formatting issues:
- verify workbook writes happen after data transforms, not before.

Mastery check

You are ready for SQL integration when you can:

process 20 files with mixed quality,
produce clean master outputs,
explain every reject reason,
rerun without duplicate or conflicting outputs.

Learning-style options (Play/Build/Dissect/Teach-back)

Play: modify header aliases and test edge cases.
Build: follow steps exactly and track completion.
Dissect: inspect one malformed file and explain why it failed.
Teach-back: present schema and validation rules to a teammate.

Acceptance checklist and rubric

Pass criteria:

functional: all required outputs generated.
reliability: malformed data does not crash full run.
traceability: each rejected row has a reason.
maintainability: tests exist for normalization and validation.

Scoring rubric (0-2 each):

correctness,
resilience,
logging quality,
test coverage,
usability of CLI.

Next: projects/level-3/README.md →
