cloud_archive provides Bazel repository rules for downloading private files
and archives from cloud storage with the provider's CLI, verifying them by
SHA-256, and exposing them as normal external repositories.
Supported providers:
- AWS S3
- Google Cloud Storage
- MinIO
- Backblaze B2
Archive rules can extract content, apply strip_prefix and add_prefix, and
run patches before the repository becomes available to the build. The project
supports both Bzlmod and legacy WORKSPACE usage.
- Bazel. This repository is pinned to
8.5.1in.bazelversion. - Linux or macOS.
- A provider CLI available one of these ways:
- on
$PATH - configured once via
cloud_archive.configure()in Bzlmod - passed per rule via
tool_target
- on
- The CLI must already be authenticated so its native download command works:
- S3:
aws s3api get-object - GCS:
gsutil cp - MinIO:
mc cp - B2:
b2 download-file-by-name
- S3:
Bzlmod is the recommended setup for Bazel 8+.
# MODULE.bazel
bazel_dep(name = "cloud_archive", version = "<released version>")If the provider CLIs are already on $PATH, you can skip this. If you manage
them as Bazel dependencies, configure them once with the module extension and
every cloud_archive rule in the repo will use them automatically.
# MODULE.bazel
cloud_archive = use_extension("@cloud_archive//:extensions.bzl", "cloud_archive")
cloud_archive.configure(
s3 = "@my_awscli//:aws",
gs = "@my_gsutil//:gsutil",
# minio = "@my_mc//:mc",
# b2 = "@my_b2//:b2",
)
use_repo(cloud_archive, "cloud_archive_tools")All fields are optional. Omitted providers fall back to $PATH. When multiple
modules configure tools, the root module wins.
Use use_repo_rule() to instantiate the rules you need:
# MODULE.bazel
s3_archive = use_repo_rule("@cloud_archive//:cloud_archive.bzl", "s3_archive")
gs_file = use_repo_rule("@cloud_archive//:cloud_archive.bzl", "gs_file")
s3_archive(
name = "model_weights",
bucket = "my-ml-artifacts",
file_path = "models/v3/weights.tar.gz",
sha256 = "abc123...",
strip_prefix = "weights",
build_file = "//third_party:model_weights.BUILD",
)
gs_file(
name = "deploy_tool",
bucket = "my-tools",
file_path = "bin/tool-linux-amd64",
sha256 = "def456...",
executable = True,
)Archive rules expose whatever targets are defined by build_file or
build_file_contents. File rules expose a single filegroup at
@name//file:file.
# BUILD.bazel
cc_binary(
name = "train",
srcs = ["train.cc"],
data = ["@model_weights//:files"],
)
sh_binary(
name = "run_tool",
srcs = ["wrapper.sh"],
data = ["@deploy_tool//file:file"],
)Use this only for projects that still rely on legacy external dependency registration.
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "cloud_archive",
sha256 = "<release archive sha256>",
strip_prefix = "cloud_archive-<version>",
urls = ["https://github.com/1e100/cloud_archive/archive/refs/tags/<version>.tar.gz"],
)In WORKSPACE mode there is no module extension, so call
cloud_archive_setup() once before declaring any cloud_archive rules.
# WORKSPACE
load(
"@cloud_archive//:cloud_archive.bzl",
"cloud_archive_setup",
"gs_archive",
"s3_archive",
)
cloud_archive_setup()
s3_archive(
name = "model_weights",
bucket = "my-ml-artifacts",
file_path = "models/v3/weights.tar.gz",
sha256 = "abc123...",
strip_prefix = "weights",
build_file = "//third_party:model_weights.BUILD",
)
gs_archive(
name = "training_data",
bucket = "my-datasets",
file_path = "train/train.tar.zst",
sha256 = "789fed...",
strip_prefix = "train",
build_file_contents = """
package(default_visibility = ["//visibility:public"])
filegroup(name = "files", srcs = glob(["**"]))
""",
)If you need to use a Bazel-managed CLI binary in WORKSPACE mode, set
tool_target on that rule directly:
s3_archive(
name = "private_dep",
bucket = "my-bucket",
file_path = "deps/private_dep.tar.gz",
sha256 = "012345...",
tool_target = "@my_awscli//:aws",
build_file = "//third_party:private_dep.BUILD",
)BUILD file usage is otherwise identical to the Bzlmod case.
| Rule | Provider | Type |
|---|---|---|
s3_archive |
AWS S3 | archive |
s3_file |
AWS S3 | file |
gs_archive |
Google Cloud Storage | archive |
gs_file |
Google Cloud Storage | file |
minio_archive |
MinIO | archive |
minio_file |
MinIO | file |
b2_archive |
Backblaze B2 | archive |
b2_file |
Backblaze B2 | file |
For MinIO rules, file_path should be the full mc source path, for example
"local/mybucket/path/to/archive.tar.zst". The other providers use separate
bucket and file_path attributes.
| Attribute | Required | Description |
|---|---|---|
name |
yes | Repository name |
file_path |
yes | Object path inside the bucket, or the full MinIO source path |
sha256 |
yes | Expected SHA-256 checksum |
tool_target |
no | Label of a CLI binary; overrides extension config and $PATH |
download_max_attempts |
no | Maximum download attempts before failing. Default: 3. Set to 1 to disable retries. |
download_backoff_base |
no | Base delay in seconds between retry attempts; doubles each attempt (exponential backoff). Default: 5. |
| Attribute | Description |
|---|---|
build_file |
Label of a BUILD file for the extracted archive |
build_file_contents |
Inline BUILD contents; takes precedence over build_file |
strip_prefix |
Directory prefix to strip during extraction |
add_prefix |
Directory prefix to add after extraction |
type |
Explicit archive type for extensionless files such as tar.gz or tar.zst |
patches |
Patch files to apply after extraction |
patch_args |
Arguments for the patch tool, for example ["-p1"] |
patch_cmds |
Shell commands to run after patching |
| Attribute | Description |
|---|---|
downloaded_file_path |
Path assigned to the downloaded file inside the repo |
executable |
Whether to make the downloaded file executable |
| Attribute | Providers | Description |
|---|---|---|
bucket |
S3, GCS, B2 | Bucket name |
profile |
S3 | AWS CLI profile to use |
file_version |
S3 | Version ID for versioned objects |
When a rule needs a provider CLI, it resolves it in this order:
- Per-rule
tool_target - Module extension config from
cloud_archive.configure() $PATH
That lets you set a default once for the whole repo, then override a specific rule if needed.
Run the repository-local tests with:
bazel test //...That exercises the local_file and local_archive test rules without needing
real cloud credentials.
MinIO end-to-end coverage lives in e2e/README.md and can be
run with:
cd e2e
./run_tests.shThat suite spins up a local MinIO server, configures mc via MC_CONFIG_DIR,
uploads test objects, and exercises the local/bucket/key MinIO path flow.
Apache 2.0. See LICENSE.