With AI-usage also coming to homeracker we need to maintain a high quality standard in our repositories by installing deterministic guardrails.
To achieve this we need deterministic tests which can proof quantity and quality of the python code.
A combination of codecoverage (quantity) and mutation testing (quality) in combination with an intelligent agent and/or skill and proper instructions in the repository shall exactly achieve that.
We need to:
- integrate a well-known and well-maintained codecoverage tool that can be run locally and in a worklfow.
- The workflow must run on pull-requests and on main pushes.
- The codecov call (e.g. a script wrapper) must fail when coverage is below 80% in the first run
- Optional: a file is persisted with the actual codecov value so that subsequent runs MUST reach at least this value. This way we only increase in coverage.
- Depending on current (must research in official docs) best practises we either adapt the copilot-instructions and or create a specialized agent/skill for creating/maintaining tests, running coverage and fixing/enhancing tests accordingly
- Since codeverage alone is not really a good metric as it only tells us about the amount of code we tested but not about the quality of our tests, I want to introduce mutation tests as well.
- They shall be invocable locally as well as via a nightly/weekly (idk yet, depends on duration) cron job.
- We shall choose a well-known and well-maintained framework here.
Depends on implementation of #326 to even have some meaningful tests in the first place.
With AI-usage also coming to homeracker we need to maintain a high quality standard in our repositories by installing deterministic guardrails.
To achieve this we need deterministic tests which can proof quantity and quality of the python code.
A combination of codecoverage (quantity) and mutation testing (quality) in combination with an intelligent agent and/or skill and proper instructions in the repository shall exactly achieve that.
We need to:
Depends on implementation of #326 to even have some meaningful tests in the first place.