Skip to content

feat: add pytest test suite for 3 Python scripts#2247

Open
AnandSundar wants to merge 3 commits into
OWASP:masterfrom
AnandSundar:feat/add-pytest-tests-3-scripts
Open

feat: add pytest test suite for 3 Python scripts#2247
AnandSundar wants to merge 3 commits into
OWASP:masterfrom
AnandSundar:feat/add-pytest-tests-3-scripts

Conversation

@AnandSundar

@AnandSundar AnandSundar commented Jun 23, 2026

Copy link
Copy Markdown

Summary

Adds a pytest test suite for scripts/Update_CheatSheets_Index.py — the one script on the live build path (Generate_Site_mkDocs.shbuild-and-deploy-website.yml). Refactors the script to expose pure helper functions (extract_languages_snippet_provided, group_by_letter, clean_trailing_whitespace) and a parameterized main(). The refactor preserves byte-for-byte output of the original script when invoked from the scripts/ directory — verified via smoke test against the real cheatsheets/ directory.

What's in this PR

  • Refactor of 1 script (1 modified file): Update_CheatSheets_Index.py now has importable helpers and a main() that returns an exit code, with an if __name__ == "__main__": guard. No behavioral change for the existing CLI usage.
  • Test suite of 16 cases (1 test file, surviving after scope narrowing): unit tests for the helpers + integration tests for main() end-to-end. Uses tmp_path-based fixtures from tests/conftest.py; no new runtime dependencies beyond pytest.
  • Pytest infra (pytest.ini, requirements-dev.txt, tests/__init__.py, tests/conftest.py): kept separate from requirements.txt so the production image does not pull in pytest.
  • .gitignore cleanup: __pycache__/ and *.pyc added; docs/plans/ line dropped. .claude/ rules resolved during rebase against upstream commit 40f21c5 (kept upstream's specific .claude/settings.local.json, .claude/settings.json, .claude/worktrees rules).
  • CI (.github/workflows/test-python.yml): new workflow runs python -m pytest on every pull request to master. Note: the new check is informational-only until a maintainer adds it to the branch protection required-checks list for master — that admin step is a follow-up. Per the branch protection query, the only required checks are lint and Build website.

Test results

  • python -m pytest exits 0
  • 16 passed in <1s with --strict-markers enabled

PR scope-check compliance

  • Cheat sheets touched: 0
  • Net additions: <400 (well under the 1500-line limit)
  • 0 markdown files modified (the md_lint_check.yml workflow is unaffected)

Rebase

The PR was rebased onto current master (18 commits ahead of the original base) with a single conflict resolution in .gitignore. Used --force-with-lease per the standing rule for security-critical repos.

Out of scope (deferred to follow-up PRs)

  • Refactor + test Generate_RSS_Feed.py and Identify_Old_Issue_And_PR.py (network-dependent; the mocking pattern from this PR carries forward).
  • Index.md drift CI — tracked in test: add CI check that regenerates Index.md and fails on drift #2273 (regenerate Index.md, diff against the committed copy, fail on drift). This is the "higher-value" follow-up suggested in mackowski's review; the implementation is deferred to its own PR so it can be scoped on its own merits.

@mackowski mackowski left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @AnandSundar — nicely built: I ran the suite (56 tests pass) and confirmed the Update_CheatSheets_Index.py refactor is byte-for-byte output-identical to master. Most of it protects code the project doesn't actively use, so I'd like to narrow the scope.

Worth keeping: the tests for Update_CheatSheets_Index.py — it's the one script on the live build path (Generate_Site_mkDocs.sh → build-and-deploy-website.yml).

Not adding value — please drop: tests/refactors for Generate_CheatSheets_TOC.py (only used by the retired GitBook build) and Generate_Technologies_JSON.py (referenced by no workflow at all). Both are effectively dead code, untouched for ~4–7 years.

Needed before merge:

  1. Add requests to requirements-dev.txt — it only lists pytest, so the suite fails to collect in a clean env (ModuleNotFoundError: requests).
  2. Add a CI job that runs pytest — nothing runs it today, so as-is it provides zero regression protection.
  3. Drop the unrelated docs/plans/ entry from .gitignore.

One higher-value idea instead of unit tests: a CI check that regenerates Index.md and fails on drift. The committed Index.md is already stale ("92 cheat sheets" vs ~120 actual) and nothing catches it.

Happy to re-review once it's scoped to the active script with a CI runner in place. Thanks again.

@AnandSundar

Copy link
Copy Markdown
Author

@mackowski Thank you for reviewing my PR. I'll work on your recommendations and update the PR soon.

Introduces a pytest-based test suite for the three build-helper scripts
in scripts/ that have no network dependencies:

  - Generate_CheatSheets_TOC.py
  - Update_CheatSheets_Index.py
  - Generate_Technologies_JSON.py

Each script is minimally refactored to expose pure helper functions
(to_display_name, parse_index_line, group_by_letter, etc.) and to
wrap its imperative top-level code in a main() function with
parameterized paths. Byte-for-byte output of the original scripts is
preserved when invoked from the scripts/ directory.

The test suite covers 56 cases (unit + integration), including:

  - helper-function unit tests (display names, language detection,
    letter grouping, whitespace cleanup, line parsing)
  - main() integration tests against fixture cheatsheet directories
  - mocked HTTP path for Generate_Technologies_JSON.main()

Files added:
  - pytest.ini, requirements-dev.txt
  - tests/__init__.py, tests/conftest.py
  - tests/test_*.py for the three scripts

The remaining two scripts (Generate_RSS_Feed.py, Identify_Old_Issue_And_PR.py)
and a CI workflow that runs pytest on PRs are deferred to follow-up PRs.

AI assistance: this change was developed with the help of an AI coding
assistant. The refactor, tests, and commit message were reviewed by the
author before submission.
@AnandSundar AnandSundar force-pushed the feat/add-pytest-tests-3-scripts branch from 9887f6e to bc98d25 Compare July 3, 2026 15:12
Drop the refactors and tests for Generate_CheatSheets_TOC.py (retired GitBook) and Generate_Technologies_JSON.py (no live consumer) per mackowski's review. Retain refactor + tests for the one script on the live build path, Update_CheatSheets_Index.py. .gitignore was already cleaned of docs/plans/ during rebase conflict resolution; this commit only handles the script and test file changes.
Adds .github/workflows/test-python.yml that runs python -m pytest on every pull request to master. Trigger mirrors md_lint_check.yml (pull_request only, no push). actions/checkout and actions/setup-python are pinned to commit SHAs already in use across the repo. python-version is pinned to 3.12 to avoid silent drift. requirements-dev.txt is the test-only deps file, intentionally separate from requirements.txt so the production image does not pull in pytest. Per the plan, this new check is informational-only until a maintainer adds it to the branch protection required-checks list for master; that admin step is a follow-up.
@AnandSundar

Copy link
Copy Markdown
Author

Thanks for the careful review, mackowski. Addressing each point:

Tests for the two retired scripts and their refactors — dropped. Commits on the rebased head: bc98d25 (rebase) and 2d4e8bc (chore(pr-2247): narrow scope to Update_CheatSheets_Index.py) revert scripts/Generate_CheatSheets_TOC.py and scripts/Generate_Technologies_JSON.py to upstream/master and delete tests/test_generate_cheatsheets_toc.py and tests/test_generate_technologies_json.py. The remaining 16 tests in tests/test_update_cheatsheets_index.py all pass locally with --strict-markers.

requests in requirements-dev.txt — not added. Two reasons: (1) the only consumer of requests in the test suite was the dropped tests/test_generate_technologies_json.py (which mocked it), so there is no test left that needs it; (2) requests==2.34.2 is already pinned in production requirements.txt, so any CI step that runs make install-python-requirements already has it. Adding it to dev-reqs would be busywork future maintainers would second-guess.

New .github/workflows/test-python.yml — added (commit 06ec9b2). Mirrors the existing repo's CI conventions: pull_request trigger only on master, ubuntu-24.04, contents: read, actions/checkout and actions/setup-python pinned to commit SHAs already in use across the repo, python-version: '3.12' (pinned, not '3.x', to avoid silent drift), python -m pip install --user -r requirements-dev.txt for the test-only deps, then python -m pytest. The requirements-dev.txt file is intentionally separate from requirements.txt so the production mkdocs image never pulls in pytest.

docs/plans/ gitignore line — dropped. The rebase conflict resolution (upstream commit 40f21c5 added .claude/settings.local.json, .claude/settings.json, .claude/worktrees rules, which overlap with the PR's blanket .claude/ rule) also removed the PR's docs/plans/ line. The rebased .gitignore retains only the Python bytecode additions (__pycache__/, *.pyc) from the PR plus upstream's specific .claude/ rules.

Index.md drift check — declined for this PR. Tracked as a separate issue: the idea is a higher-value follow-up than the test infra, but bundling it would push net additions toward pr-scope-check.yml's 1500-line cap and conflate two different concerns (documentation freshness vs test infrastructure). Filing the issue as a follow-up that a maintainer can route on its own merits.

Rebase onto current master — done. PR moved from mergeable: CONFLICTING to mergeable: MERGEABLE. Upstream master moved 18 commits ahead during the work; the rebase resolved a single conflict in .gitignore and used --force-with-lease (per the standing rule for security-critical repos).

One note for the maintainer audience — the new pytest check is currently in action_required state awaiting a maintainer's first-run approval (standard for new workflows on a fork PR). It is not in the branch protection required-checks list for master — that admin step is a follow-up after this PR merges. Per the branch protection query, the only required checks are lint and Build website, both of which were green on the prior head and should re-pass on the rebased head once the workflows are approved.

Note: the original plan called for an inline review-reply API endpoint that doesn't actually exist in GitHub's REST API for review-summary replies — this comment is the closest scripting equivalent and mentions you directly, so the notification still reaches you.

@AnandSundar

Copy link
Copy Markdown
Author

@mackowski — scope narrowed per your review (commits bc98d25, 2d4e8bc, 06ec9b2). The two retired scripts + their tests are dropped, requests is intentionally not in requirements-dev.txt (rationale in the previous comment), the new .github/workflows/test-python.yml is in, and the docs/plans/ gitignore line is gone. Rebased onto current master, so mergeable: MERGEABLE. Index.md drift is filed as #2273.

One thing to flag for whoever clicks the merge button: the 6 workflows on the PR (5 existing + the new pytest one) are currently in action_required state awaiting a maintainer's first-run approval — standard for new workflows from a fork PR, and the contributor can't advance it from the fork side. A maintainer with write access on OWASP/CheatSheetSeries needs to click "Approve & run" on each in the PR's Checks tab. After that, the only required checks per branch protection are lint and Build website, both of which were green on the prior head and should re-pass on the rebased head. Once they're approved and pass, the PR should be ready to merge.

Ready for re-review when you have a moment — thanks again for the careful reading.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants