Developer and maintainer steps to install, configure, and run sw-metadata-bot.
- Run pitfalls detection using RSMetaCheck package
- Handle analysis for multiple repositories
- Create repository centric output subfolders
- Python 3.10, 3.11, or 3.12
- Git (
gitexecutable available onPATH) - GitHub or GitLab personal access token with permission to create issues
- Optional:
uv(recommended)
Git is a system dependency used for repository commit lookup
(git ls-remote fallback) and is not installed through pyproject.toml.
After cloning this project, set up the python environnment with uv:
uv syncThis will create the virtual environnment, download the dependencies and build the package locally.
With uv (recommended):
uv add sw-metadata-botOr with pip
With pip:
pip install sw-metadata-botThe package metadata also exposes standard extras for release builds:
pip install "sw-metadata-bot[docs]"
pip install "sw-metadata-bot[test]"
pip install "sw-metadata-bot[dev]"Export your tokens (only set what you need):
export GITHUB_API_TOKEN=ghp_xxxxxxxxxxxx # GitHub.com
export GITLAB_API_TOKEN=glpat_xxxxxxxxxxxx # GitLab.comConvenient one-liner to load a .env file:
set -a; source .env; set +aExample .env:
GITHUB_API_TOKEN=ghp_xxxxxxxxxxxx
GITLAB_API_TOKEN=glpat_xxxxxxxxxxxxThe pipeline now uses a single configuration file as its source of truth for:
- repository list
"repositories" - custom issue message
"custom_message" - inline opt-outs
"opt_outs" - output root
"root_dir" - run name
"run_name" - default snapshot-tag format
"snapshot_tag_format"
Example config:
{
"repositories": [
"https://github.com/SoftwareUnderstanding/sw-metadata-bot",
"https://github.com/SoftwareUnderstanding/RsMetaCheck"
],
"issues": {
"custom_message": "This is a issue created for testing purposes. Several metadata issues were identified and could be addressed.",
"opt_outs": [
]
},
"outputs": {
"root_dir": "assets",
"run_name": "example_run",
"snapshot_tag_format": "%Y%m%d"
}
}opt-out lists the repositories for which issues won't be submitted but the repository is still analysed and part of the result outputs.
To run the bot, start with analysis.
uv run sw-metadata-bot run-analysis \
--config-file assets/ossr_list_url.jsonYou can override the generated snapshot tag when needed.
uv run sw-metadata-bot run-analysis \
--config-file <path_to_your_config.json> \
--snapshot-tag <example_suffix>Output structure using the example config file provided:
assets/
└── example_run/
└── <snapshot_tag>/
├── analysis_results.json
├── config.json
├── run_report.json
├── github_com_softwareunderstanding_sw_metadata_bot/
│ ├── issue_report.md
│ ├── pitfall.jsonld
│ ├── report.json
│ └── somef_output.json
└── github_com_softwareunderstanding_rsmetacheck/
├── issue_report.md
├── pitfall.jsonld
├── report.json
└── somef_output.json
assets/: output root selected by"root_dir": "assets"in the config.example_run/: run namespace selected by"run_name": "example_run"; this keeps outputs from different campaigns separated.<snapshot_tag>/: one analysis snapshot folder, usually generated from"snapshot_tag_format": "%Y%m%d"unless you override it with--snapshot-tag.analysis_results.json: global analysis summary for the full run, including the repositories that were evaluated and metadata such as commit identifiers.config.json: a copy of the effective configuration used for this snapshot, stored for reproducibility.run_report.json: top-level decision report for the whole batch, with counters and one record per repository.github_com_softwareunderstanding_sw_metadata_bot/: per-repository folder forhttps://github.com/SoftwareUnderstanding/sw-metadata-bot; repository URLs are sanitized to lowercase folder names.github_com_softwareunderstanding_rsmetacheck/: per-repository folder forhttps://github.com/SoftwareUnderstanding/RsMetaCheck; it contains the same artifact set as the other repository folder.issue_report.md: human-readable markdown report that can be reviewed before publication and reused as issue content.pitfall.jsonld: raw RSMetacheck JSON-LD output for the repository, including detected checks and evidence.report.json: machine-readable per-repository report summarizing findings, action/decision, identifiers, and links to generated artifacts.somef_output.json: raw SOMEF metadata extraction output used as input for the analysis pipeline.
The exact repository folder names depend on the repository URLs in your config, but the files created inside each repository folder follow this same pattern.
If you want to submit the analysis to the actual repositories, you can publish from an existing analysis snapshot (no new analysis is generated):
uv run sw-metadata-bot publish \
--analysis-root outputs/ossr/<snapshot_tag>This requires setting up environment variables GITHUB_API_TOKEN / GITLAB_API_TOKEN with working tokens.
We recommend creating a .env file.
You can use the uv run sw-metadata-bot verify-tokens command to test them after set up.
- Auth failed / 401: Check
GITHUB_API_TOKEN/GITLAB_API_TOKENare exported and valid. - 403 / 404 on issue creation: You need write/triage permissions on the repository. Test with repos you own first.
- Platform not supported: Repo must be GitHub or GitLab (self-hosted GitLab is auto-detected).
- Review before posting: Inspect snapshot reports after
run-analysis, then runpublishon the selected analysis root. - Publish failed because of transient API issues: Fix the cause (for example wait for rate-limit reset or refresh token), then rerun publish with retry mode:
uv run sw-metadata-bot publish \
--analysis-root outputs/ossr/<snapshot_tag> \
--retry-failedAlready posted records remain skipped for idempotency; retry mode only re-attempts eligible failed records.
The CI intentionally separates deterministic unit tests from real GitLab integration tests:
- Unit tests are always blocking and run on every PR/push.
- GitLab integration tests are conditional and run only on
main, scheduled runs, or manual dispatch. - Integration tests require
GITLAB_API_TOKENto be configured in GitHub repository secrets.
This keeps contributor PRs stable while still validating live GitLab API behavior regularly.
Run default (unit) suite:
uv run pytest -m "not integration_gitlab"Run GitLab integration tests (requires token):
export GITLAB_API_TOKEN=glpat_xxxxxxxxxxxx
uv run pytest --run-integration-gitlab -m "integration_gitlab"The CI intentionally separates deterministic unit tests from real GitLab integration tests:
- Unit tests are always blocking and run on every PR/push.
- GitLab integration tests are conditional and run only on
main, scheduled runs, or manual dispatch. - Integration tests require
GITLAB_API_TOKENto be configured in GitHub repository secrets.
This keeps contributor PRs stable while still validating live GitLab API behavior regularly.
Run default (unit) suite:
uv run pytest -m "not integration_gitlab"Run GitLab integration tests (requires token):
export GITLAB_API_TOKEN=glpat_xxxxxxxxxxxx
uv run pytest --run-integration-gitlab -m "integration_gitlab"- GitHub.com
- Gitlab.com
Self-hosted GitLab instances are supported when you provide a token for the target host (not been tested yet). Run the bot against your own organization or repositories where you have permission to open and manage issues.