While dbldatagen cannot accept direct contributions from external contributors, all users can create GitHub Issues to propose new functionality. The dbldatagen team will review and prioritize new features based on user feedback.
To set up your local environment:
- Ensure any Non-Python Dependencies are installed locally.
- Clone the repository:
git clone "repository URL"
- Open the repository in your IDE. Run the following terminal command to create a local development environment:
make dev
When contributing new functionality:
- Sync changes from the master branch:
git checkout master && git pull
- Check out a new branch from master:
git checkout -b "branch name"
- Add your functionality, tests, documentation, and examples.
dbldatagen aims to follow PEP8 standards. Code style should be checked for any new commits.
To validate code locally:
- Run the following terminal command in your IDE:
make fmt
- Fix any issues until no messages remain.
dbldatagen aims to have the highest possible test coverage. Code should be tested for any new commits.
To run unit tests locally:
- Run the following terminal command in your IDE:
make test-coverage
- Verify that all tests pass.
- Open the coverage report in your browser.
- Verify that all modified modules have full coverage.
To submit a pull request:
- Squash all local commits in your branch.
- Push your changes:
git push
- Navigate to the Pull Requests page and click New pull request.
- Complete the template.
- Submit your PR.
Documentation can be reviewed locally. To build and open the documentation in your browser, run the following terminal command:
make docs-serve
dbldatagen can be built locally as a Python wheel. To build the wheel, run the following terminal command:
make build
dbldatagen supports Python 3.10+ and is tested with Python 3.10 and later.
All development tools are configured in pyproject.toml.
All Python dependencies are defined in pyproject.toml:
- [project.dependencies] lists runtime dependencies installed with the dbldatagen library
- [dependency-groups] lists development, linting, and testing dependencies managed by uv
dbldatagen is tested against Databricks Runtime version 13.3 LTS and OpenJDK 17.
Spark and Java dependencies are not installed automatically by the build process and should be installed manually to develop and run dbldatagen locally.
All code should adhere to the following standards:
- Be formatted and linted to PEP8 standards.
- Be type-validated using mypy.
- Use clearly named variables, classes, and methods.
- Include docstrings that detail functionality and usage.
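As an illustration of these standards, a function might look like the sketch below. The function scale_values is hypothetical and not part of dbldatagen, and the docstring layout shown is an assumption rather than a style the project mandates.

```python
from collections.abc import Sequence


def scale_values(values: Sequence[float], factor: float) -> list[float]:
    """Multiply each value by a constant factor.

    Args:
        values: Numeric values to scale.
        factor: Multiplier applied to every value.

    Returns:
        A new list containing the scaled values, in the original order.
    """
    # Type hints on the signature let mypy check callers of this function.
    return [value * factor for value in values]
```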
All tests should use pytest with fixtures and parameterization where appropriate. This includes:
- Unit tests cover functionality which does not require a Databricks workspace and should always be preferred to integration tests when possible.
- Integration tests cover functionality which requires Databricks compute, Unity Catalog, or other workspace features.
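A unit test that combines a fixture with parameterization could be sketched as follows. The helper build_column_names is hypothetical and stands in for any function that runs without a Databricks workspace; it is not part of the dbldatagen API.

```python
import pytest


def build_column_names(prefix: str, count: int) -> list[str]:
    """Hypothetical helper: return `count` numbered column names."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [f"{prefix}_{i}" for i in range(count)]


@pytest.fixture
def prefix() -> str:
    # Shared input, injected into every test that names it as an argument.
    return "col"


@pytest.mark.parametrize(
    ("count", "expected"),
    [
        (0, []),
        (1, ["col_0"]),
        (3, ["col_0", "col_1", "col_2"]),
    ],
)
def test_build_column_names(prefix: str, count: int, expected: list[str]) -> None:
    # Runs once per (count, expected) pair listed above.
    assert build_column_names(prefix, count) == expected


def test_build_column_names_rejects_negative_count(prefix: str) -> None:
    with pytest.raises(ValueError):
        build_column_names(prefix, -1)
```

Parameterization keeps each input case visible in the test report, while the fixture avoids repeating shared setup across tests.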
All local development should branch from master and adhere to the following naming convention:
- feat_<feature_name> for new functionality
- fix_<issue_number>_<fix_name> for bugfixes
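The naming convention can be checked mechanically before pushing a branch. The sketch below is one possible check, not a tool shipped with the project: the function name is hypothetical, and restricting names to letters, digits, and underscores is an assumption the convention does not state.

```python
import re

# Assumption: names contain only letters, digits, and underscores.
FEATURE_BRANCH = re.compile(r"feat_[A-Za-z0-9_]+")
FIX_BRANCH = re.compile(r"fix_\d+_[A-Za-z0-9_]+")


def is_valid_branch_name(name: str) -> bool:
    """Return True if `name` matches either branch naming pattern."""
    return bool(FEATURE_BRANCH.fullmatch(name) or FIX_BRANCH.fullmatch(name))


print(is_valid_branch_name("feat_new_distribution"))  # True
print(is_valid_branch_name("fix_123_null_handling"))  # True
print(is_valid_branch_name("fix_null_handling"))      # False: no issue number
```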
All pull requests should adhere to the following standards:
- Pull requests should be scoped to a single repository issue.
- Local commits should be squashed on your branch before opening a pull request.
- All pull requests should include functionality, tests, documentation, and examples.
When you contribute code, you affirm that the contribution is your original work and that you license the work to the project under the project's Databricks license. Whether or not you state this explicitly, by submitting any copyrighted material via pull request, email, or other means you agree to license the material under the project's Databricks license and warrant that you have the legal authority to do so.