Merged
8 changes: 4 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -9,7 +9,7 @@ resolver = "2"
 
 [workspace.package]
 edition = "2021"
-license = "MIT"
+license = "Apache-2.0"
 publish = false
 version = "0.0.3"
 
45 changes: 37 additions & 8 deletions README.md
@@ -4,14 +4,33 @@ Recurring Convex export pipelines for local analytics, Databricks, and downstrea
 
 [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/shpitdev/convex-sync-kit)
 [![Release](https://img.shields.io/github/v/release/shpitdev/convex-sync-kit?display_name=tag)](https://github.com/shpitdev/convex-sync-kit/releases)
-[![License: MIT](https://img.shields.io/badge/license-MIT-2ea44f)](LICENSE)
+[![License: Apache-2.0](https://img.shields.io/github/license/shpitdev/convex-sync-kit)](LICENSE)
 
 ![Rust](https://img.shields.io/badge/Rust-000000?logo=rust&logoColor=white)
 ![Convex](https://img.shields.io/badge/Convex-EE342F?logo=convex&logoColor=white)
 ![Amazon S3](https://img.shields.io/badge/Amazon%20S3-569A31?logo=amazons3&logoColor=white)
 ![Databricks](https://img.shields.io/badge/Databricks-FF3621?logo=databricks&logoColor=white)
 ![Palantir Foundry](https://img.shields.io/badge/Palantir%20Foundry-virtual%20tables-101828)
 
+## Required Inputs
+
+These are the minimum inputs almost everyone needs before a recurring sync will work:
+
+```bash
+export CONVEX_DEPLOYMENT_URL=https://your-deployment.convex.cloud
+export CONVEX_DEPLOY_KEY=your-convex-deploy-key
+```
+
+Target-specific requirements:
+
+| Target | Also required |
+|---|---|
+| Local recurring analysis | writable output paths |
+| S3/export | AWS credentials, `--bucket`, optional `--prefix` |
+| Databricks Delta | Databricks profile plus a SQL warehouse ID for bootstrap |
+| Databricks over S3 | Databricks profile, SQL warehouse ID, and Unity Catalog external-location coverage |
+| Palantir Foundry | either Databricks/Unity Catalog or an S3 path to connect Foundry to |
+
 ## Choose Your Path
 
 ```mermaid
@@ -34,6 +53,8 @@ flowchart TD
 
 If you only need a one-time export or ad hoc backfill, use the official Convex tooling directly. This repo is aimed at recurring pipelines, not the simplest possible one-shot export.
 
+See:
+
 - [Convex streaming import/export](https://docs.convex.dev/production/integrations/streaming-import-export)
 - [Convex streaming export API](https://docs.convex.dev/streaming-export-api)
 
@@ -68,21 +89,29 @@ There are two supported Databricks paths:
 Recommended Databricks Delta flow:
 
 ```bash
-export CONVEX_SYNC_SOURCE=meshix-api
+export CONVEX_SYNC_SOURCE=<source-slug>
 
-just databricks-delta-bootstrap 63d28889f3eb3c4b
+just databricks-delta-bootstrap <warehouse-id>
 just databricks-delta-sync-secret DEFAULT
 just databricks-delta-deploy DEFAULT prod
 just databricks-delta-run DEFAULT prod
 ```
 
+The Delta path creates and updates:
+
+- `convex_sync_kit_<source>_delta_control`
+- `convex_sync_kit_<source>_delta_bronze`
+- `convex_sync_kit_<source>_delta_silver`
+
+The silver schema is expected to stay empty until you stand up a Lakeflow `AUTO CDC` pipeline for the tables you actually want to materialize there.
+
 Reference Databricks over S3 flow:
 
 ```bash
-export CONVEX_SYNC_SOURCE=meshix-api
+export CONVEX_SYNC_SOURCE=<source-slug>
 
-just run --bucket your-bucket --prefix prod
-just databricks-sync-staging-views --warehouse-id 63d28889f3eb3c4b --bucket your-bucket --prefix prod
+just run --bucket <bucket> --prefix prod
+just databricks-sync-staging-views --warehouse-id <warehouse-id> --bucket <bucket> --prefix prod
 ```
 
 ### 4. Using Palantir Foundry
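Both README flows above assume the Required Inputs are exported before any `just` recipe runs. The following is a hedged sketch of a wrapper that checks them first and can print the recipe sequence instead of running it. The function names and the `DRY_RUN` switch are hypothetical (not repo scripts), `CONVEX_SYNC_SOURCE` is treated as required because both flows export it first, and `DEFAULT`/`prod` are assumed to be a Databricks CLI profile and a deploy target, as in the README examples.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the README's `just` recipes; not a repo script.
set -euo pipefail

# Report every missing Required Input at once instead of failing one by one.
require_vars() {
  local missing=() name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    if [[ -z "${!name:-}" ]]; then
      missing+=("$name")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing required inputs: ${missing[*]}" >&2
    return 1
  fi
}

# Print the command when DRY_RUN=1, otherwise execute it.
run_step() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Databricks Delta flow. Assumption: DEFAULT is a Databricks CLI profile and
# prod a deploy target, mirroring the README example arguments.
delta_flow() {
  local warehouse_id="$1" profile="${2:-DEFAULT}" target="${3:-prod}"
  require_vars CONVEX_DEPLOYMENT_URL CONVEX_DEPLOY_KEY CONVEX_SYNC_SOURCE || return 1
  # Explicit `|| return 1` because errexit is suspended when called from `if`.
  run_step just databricks-delta-bootstrap "$warehouse_id" || return 1
  run_step just databricks-delta-sync-secret "$profile" || return 1
  run_step just databricks-delta-deploy "$profile" "$target" || return 1
  run_step just databricks-delta-run "$profile" "$target"
}

# Databricks over S3 flow: publish snapshots, then sync the staging views.
s3_flow() {
  local bucket="$1" warehouse_id="$2" prefix="${3:-prod}"
  require_vars CONVEX_DEPLOYMENT_URL CONVEX_DEPLOY_KEY CONVEX_SYNC_SOURCE || return 1
  run_step just run --bucket "$bucket" --prefix "$prefix" || return 1
  run_step just databricks-sync-staging-views \
    --warehouse-id "$warehouse_id" --bucket "$bucket" --prefix "$prefix"
}
```

For example, `DRY_RUN=1 delta_flow <warehouse-id>` prints the four Delta recipes in order without running them; without `DRY_RUN` it runs them and stops at the first failing step.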
@@ -118,7 +147,7 @@ Relevant Foundry docs:
 | Databricks over S3 | Unity Catalog views over published parquet snapshots | `convex_sync_kit_<source>_s3` |
 | Databricks Delta | checkpoint table, bronze CDC tables, silver current-state tables | `convex_sync_kit_<source>_delta_{control,bronze,silver}` |
 
-The current checked-in source profile is [`sources/meshix-api/env.sh`](sources/meshix-api/env.sh). That is only one source profile, not a repo identity. Add more source directories as you onboard more Convex projects.
+The checked-in [`sources/meshix-api/env.sh`](sources/meshix-api/env.sh) file is only an example source profile, not a repo identity. Add more source directories as you onboard more Convex projects.
 
 ## Output Paths And Defaults
 
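Since the loader no longer falls back to `meshix-api`, a caller has to pick a slug explicitly, and the usable slugs are whatever directories under `sources/` contain an `env.sh`. A small hypothetical helper (not part of the repo) that lists them, assuming the `sources/<slug>/env.sh` layout used above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the slugs that have a profile under sources/,
# i.e. the values CONVEX_SYNC_SOURCE can usefully take in this checkout.
# The body runs in a subshell ( ... ), so `shopt -s nullglob` and the loop
# variables do not leak into the caller.
list_source_profiles() (
  repo_root="$1"
  # nullglob: a checkout with no profiles yields no iterations, not a literal glob.
  shopt -s nullglob
  for env_file in "$repo_root"/sources/*/env.sh; do
    # sources/<slug>/env.sh -> <slug>
    basename "$(dirname "$env_file")"
  done
)
```

Usage: `list_source_profiles "$(git rev-parse --show-toplevel)"`. Directories without an `env.sh`, and `sources/README.md` itself, are skipped.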
@@ -191,4 +220,4 @@ There is a more detailed capture list in [docs/demo-storyboard.md](docs/demo-sto
 
 ## License
 
-[MIT](LICENSE)
+[Apache License 2.0](LICENSE)
13 changes: 7 additions & 6 deletions scripts/load-source-config.sh
@@ -3,15 +3,16 @@ set -euo pipefail
 
 load_convex_sync_source_config() {
   local repo_root="$1"
-  local source_name="${CONVEX_SYNC_SOURCE:-meshix-api}"
+  local source_name="${CONVEX_SYNC_SOURCE:-}"
+
+  if [[ -z "$source_name" ]]; then
+    return 0
+  fi
   local source_file="$repo_root/sources/$source_name/env.sh"
 
   if [[ ! -f "$source_file" ]]; then
-    if [[ -n "${CONVEX_SYNC_SOURCE:-}" ]]; then
-      echo "unknown Convex source config: $source_name" >&2
-      return 1
-    fi
-    return 0
+    echo "unknown Convex source config: $source_name" >&2
+    return 1
   fi
 
   # shellcheck source=/dev/null
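The new function distinguishes three cases: `CONVEX_SYNC_SOURCE` unset, set to a slug with no profile, and set to a slug with a profile. The sketch below restates the visible part of the diff under a hypothetical name so those cases can be exercised against a throwaway repo root. The final `source` line is an assumption: the rest of the real function is collapsed in this view.

```bash
#!/usr/bin/env bash
# Sketch of the loader's new control flow under a hypothetical name. The body
# mirrors the visible diff; the final `source` is assumed, not shown upstream.
load_source_config_sketch() {
  local repo_root="$1"
  local source_name="${CONVEX_SYNC_SOURCE:-}"

  # No source selected: load nothing instead of falling back to a built-in slug.
  if [[ -z "$source_name" ]]; then
    return 0
  fi

  local source_file="$repo_root/sources/$source_name/env.sh"

  # A selected but missing profile is now always an error.
  if [[ ! -f "$source_file" ]]; then
    echo "unknown Convex source config: $source_name" >&2
    return 1
  fi

  # shellcheck source=/dev/null
  source "$source_file"
}
```

With `CONVEX_SYNC_SOURCE` unset the call returns 0 and loads nothing; set to a slug without `sources/<slug>/env.sh` it prints the error and returns 1; set to an existing slug it sources that file.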
6 changes: 3 additions & 3 deletions sources/README.md
@@ -22,6 +22,6 @@ Recommended contents:
 - `DATABRICKS_DELTA_SILVER_SCHEMA`
 - `DATABRICKS_DELTA_CHECKPOINT_TABLE`
 
-Scripts load `sources/${CONVEX_SYNC_SOURCE:-meshix-api}/env.sh` automatically.
-Explicit environment variables still win because the source files only set
-defaults.
+Scripts load `sources/$CONVEX_SYNC_SOURCE/env.sh` automatically when
+`CONVEX_SYNC_SOURCE` is set. Explicit environment variables still win because
+the source files only set defaults.
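The precedence rule above only holds if every assignment in a profile is conditional. One way to write that in bash is the `: "${VAR:=default}"` idiom. The profile below is hypothetical: the variable names come from the list above, but the values and the idiom are assumptions, not copied from the checked-in profiles.

```bash
#!/usr/bin/env bash
# Illustration only: hypothetical profile contents showing defaults-only assignment.
set -euo pipefail

profile="$(mktemp)"
cat >"$profile" <<'EOF'
# ":=" assigns only when the variable is unset or empty, so values the
# caller already exported take precedence over these defaults.
: "${DATABRICKS_DELTA_SILVER_SCHEMA:=example_silver}"
: "${DATABRICKS_DELTA_CHECKPOINT_TABLE:=example_checkpoint}"
export DATABRICKS_DELTA_SILVER_SCHEMA DATABRICKS_DELTA_CHECKPOINT_TABLE
EOF

# Case 1: nothing exported beforehand, so the default applies.
default_schema="$(unset DATABRICKS_DELTA_SILVER_SCHEMA; source "$profile"; echo "$DATABRICKS_DELTA_SILVER_SCHEMA")"

# Case 2: an explicitly exported value wins over the default.
explicit_schema="$(export DATABRICKS_DELTA_SILVER_SCHEMA=my_silver; source "$profile"; echo "$DATABRICKS_DELTA_SILVER_SCHEMA")"

echo "default:  $default_schema"
echo "explicit: $explicit_schema"
rm -f "$profile"
```

Running it prints `default:  example_silver` and `explicit: my_silver`. Each case runs in a command-substitution subshell, so neither leaks its variables into the other.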
anand-testcompare marked this conversation as resolved.