This document lays out guidelines and advice for contributing to this project. If you're thinking of contributing, please start by reading this document and getting a feel for how contributing to this project works. If you have any questions, feel free to reach out to Pieter Robberechts, the primary maintainer.
The guide is split into sections based on the type of contribution you're thinking of making.
Bug reports are hugely important! Before you raise one, though, please check through the GitHub issues, both open and closed, to confirm that the bug hasn't been reported before.
When filing an issue, make sure to answer these questions:
- Which Python version are you using?
- Which version of soccerdata are you using?
- What did you do?
- What did you expect to see?
- What did you see instead?
The best way to get your bug fixed is to provide a test case, and/or steps to reproduce the issue.
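The two version questions above can be answered with a short script along the following lines. This is only a sketch: the helper name `collect_versions` is illustrative, and it assumes the library is installed under the distribution name `soccerdata`.

```python
import platform
from importlib import metadata


def collect_versions(package: str = "soccerdata") -> dict:
    """Return the Python version and the installed version of ``package``."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the active environment.
        package_version = "not installed"
    return {"python": platform.python_version(), package: package_version}


if __name__ == "__main__":
    # Print one "name: version" line per entry, ready to paste into an issue.
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Pasting the printed lines into the issue covers the first two questions; the remaining three still need a description in your own words.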
If you believe there is a feature missing, feel free to raise a feature request on the Issue Tracker.
Documentation improvements are always welcome! The documentation files live in
the docs/ directory of the codebase. They're written in
reStructuredText, and use Sphinx to generate the full suite of
documentation.
You do not have to set up a development environment to make small changes to the docs. Instead, you can edit files directly on GitHub and suggest changes.
When contributing documentation, please do your best to follow the style of the documentation files. This means a soft-limit of 79 characters wide in your text files and a semi-formal, yet friendly and approachable, prose style.
When presenting Python code, use double-quoted strings ("hello" instead of
'hello').
If you intend to contribute code, do not feel the need to sit on your contribution until it is perfectly polished and complete. It helps everyone involved for you to seek feedback as early as you possibly can. Submitting an early, unfinished version of your contribution for feedback can save you from putting a lot of work into a contribution that is not suitable for the project.
To test out code changes, you'll need to set up a Python environment with all required dependencies. It's recommended to use uv to set up your environment. On macOS and Linux, you can use curl to download the installation script and execute it:
$ curl -LsSf https://astral.sh/uv/install.sh | sh
Information about installation options and instructions for Windows can be found in uv's installation guide.
Next, create and activate a virtual environment using uv with a Python version that soccerdata supports:
$ uv venv --python 3.10
$ source .venv/bin/activate
Alternatively, you can use pip. You'll need to have at least the minimum Python version that soccerdata supports. Next, install the required dependencies in a virtual environment as follows:
$ python3 -m venv .venv
$ source .venv/bin/activate
$ python -m pip install -e .
$ python -m pip install -r requirements.txt
When contributing code, you'll want to follow this checklist:
- Fork the repository on GitHub.
- Run the tests to confirm they all pass on your system. If they don't, you'll need to investigate why they fail. If you're unable to diagnose this yourself, raise it as a bug report.
- Write tests that demonstrate your bug or feature. Ensure that they fail.
- Make your change.
- Run the entire test suite again, confirming that all tests pass including the ones you just added.
- Make sure your code follows the code style discussed below.
- Send a GitHub Pull Request to the main repository's master branch. GitHub Pull Requests are the expected method of code collaboration on this project.
Run the full test suite:
$ make test
You can also run tests for a specific data source. For example, invoke the unit test suite like this to run tests for ClubElo only:
$ make test-class clubelo
Unit tests are located in the tests directory,
and are written using the pytest testing framework.
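As an illustration of the "write a failing test first" step in the checklist, a test module might be laid out as follows. This is a sketch under stated assumptions: `normalize_team_name` is a hypothetical helper defined inline so that the example is self-contained, and it is not part of soccerdata's API.

```python
def normalize_team_name(name: str) -> str:
    """Hypothetical helper: trim and collapse whitespace in a team name."""
    return " ".join(name.split())


def test_normalize_team_name_collapses_whitespace() -> None:
    """Leading, trailing, and repeated spaces should be removed."""
    assert normalize_team_name("  Club   Brugge ") == "Club Brugge"


def test_normalize_team_name_keeps_clean_names() -> None:
    """A name that is already clean should come back unchanged."""
    assert normalize_team_name("Ajax") == "Ajax"
```

With pytest's default discovery rules, functions whose names start with `test_` in a file named `test_*.py` are collected automatically. In a real contribution the helper would live in the package and be imported by the test module.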
The project uses DVC to manage the test data used by the test suite. This data
is stored in the tests/appdata/data directory and is tracked by the
tests/appdata/data.dvc file.
Using DVC ensures that tests are fast and reproducible by providing a cached version of the data, thus avoiding the need to scrape external websites during every test run. This is particularly important for CI, but also helps during local development.
To pull the latest test data from the remote storage, run:
$ uv run dvc pull
If you've added a new scraper or modified an existing one, you might need to
update the test data. To do this, run the test suite (or a specific test) with
the SOCCERDATA_DIR environment variable set to tests/appdata. This will
cause the scrapers to save the downloaded data to the test data directory.
$ SOCCERDATA_DIR=tests/appdata uv run pytest tests/test_MyNewScraper.py
After running the tests, you can add the new data to DVC:
$ uv run dvc add tests/appdata/data
This will update the tests/appdata/data.dvc file, which you should then
include in your pull request. Note that only maintainers have write access to
the DVC remote, so you won't be able to run dvc push. The maintainers will
push the new data once your pull request is merged.
DVC is included in the test dependency group and should be available if
you've followed the environment setup instructions.
The soccerdata codebase uses the PEP 8 code style. In addition, we have a few guidelines:
- Line-length can exceed 79 characters, to 100, when convenient.
- Line-length can exceed 100 characters, when doing otherwise would be terribly inconvenient.
- Always use double-quoted strings (e.g.
"#soccer"), unless a double-quote occurs within the string.
To ensure all code conforms to this format, you can format the code using the pre-commit hooks:
$ make pre-commit-test
Docstrings are to follow the numpydoc guidelines.
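For reference, a function documented in the numpydoc layout and written with double-quoted strings could look like the sketch below. The function `points_per_game` is a hypothetical example chosen for illustration; it does not exist in soccerdata.

```python
def points_per_game(points: int, games_played: int) -> float:
    """Compute the average number of points earned per game.

    Hypothetical example function, used only to show the docstring layout.

    Parameters
    ----------
    points : int
        Total number of points earned.
    games_played : int
        Number of games played. Must be positive.

    Returns
    -------
    float
        The average number of points per game.

    Raises
    ------
    ValueError
        If ``games_played`` is not positive.

    Examples
    --------
    >>> points_per_game(6, 3)
    2.0
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return points / games_played
```

The section names (Parameters, Returns, Raises, Examples) and their dashed underlines are the parts that numpydoc prescribes; which sections a given function needs depends on what it does.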
Open a pull request to submit changes to this project.
Your pull request needs to meet the following guidelines for acceptance:
- The test suite must pass without errors and warnings.
- Include unit tests.
- If your changes add functionality, update the documentation accordingly.
Feel free to submit early, though. We can always iterate on this.
To run linting and code formatting checks before committing your change, you can install pre-commit as a Git hook by running the following command:
$ make pre-commit-test
To automatically run the pre-commit checks before committing, you can run:
$ make pre-commit-update
It is recommended to open an issue before starting work on anything.