Constraints: Apache Iceberg + Apache Spark are mandatory. No DuckDB. Target stack: Go (orchestrator) + Scala/PySpark (distributed runtime) + Python sidecar (lightweight/notebook) + Spark-on-K8s Operator + Lakekeeper (Iceberg REST catalog) + S3 compatible (Rook/Ceph).
Current acknowledged state (do not redo):
services/pipeline-build-service/internal/spark/spark.goalready rendersSparkApplicationCRs and delivers them to the Spark Operator via REST.services/pipeline-runner-spark/(Scala) executes--inline-sqlagainst the Iceberg catalog and publishes withdf.writeTo(...).createOrReplace().services/pipeline-runner/(Go) acts as a wrapper forspark-submit.services/pipeline-build-service/internal/handler/distributed_runtime.goalready wires thespark|pysparkengine (Flink is stubbed).libs/pipeline-expression/(~4.3k LoC) already has parser, evaluator, type inference, and function catalog.services/notebook-runtime-service/internal/kernel/python.gowires thelibs/python-sidecar.services/dataset-versioning-service/internal/hasbackingfs,runtime,domain, and 18 migrations.What's missing for 1:1 parity is in the tasks below. Each task is a self-contained prompt to hand off to a coding agent, with links to Palantir's official documentation.
Context: the service has 49+ handler files and a rich domain, but I don't
see a consolidated declaration of the REST endpoints that already exist nor a
mapping to proto/pipeline/{builds,pipeline,schedules,schedule_runs}.proto.
proto/pipeline/transform.proto and proto/pipeline/schedule.proto are 2-line
stubs.
Prompt:
Audit
services/pipeline-build-service/internal/handler/handlers.goand any route registration ininternal/server/. List every endpoint already implemented (method + route + handler). Cross-reference the result withproto/pipeline/builds.proto,proto/pipeline/pipeline.proto,proto/pipeline/schedules.proto,proto/pipeline/schedule_runs.proto, andproto/pipeline/lineage.proto. For each proto RPC that has no HTTP route, implement the handler using the existing repositories and runners. Fill inproto/pipeline/transform.protowith the RPCsCompileTransform,ValidateTransform,PreviewTransform,RegisterPythonTransform,RegisterSqlTransform,RegisterPipelineBuilderGraph. Generate code withmake gen. Ensure parity with the Foundry model: build = single execution with atomic transactions over output datasets; pipeline = compilable node graph; schedule = cron/event-based trigger that fires builds.References:
- Builds overview: https://www.palantir.com/docs/foundry/data-integration/builds-overview
- Schedules: https://www.palantir.com/docs/foundry/data-integration/schedules-overview
- Build queue & resource management: https://www.palantir.com/docs/foundry/data-integration/build-queue
Context: in internal/handler/execution.go I see OpenPipelineRun,
MarkPipelineRunRunning, FinishPipelineRun, AbortPipelineRun,
ListBuildQueue, QueueSummary, UpdatePipelineNextRun, attemptNumber,
retryOfRunID. This is the foundation, but we still need to verify forced
transitions, idempotency, and retries with exponential backoff.
Prompt:
In
services/pipeline-build-service/internal/domain/, formalize the build state machine usinglibs/state-machine. States:QUEUED,WAITING_FOR_RESOURCES,RUNNING,SUCCEEDED,FAILED,CANCELLED,TIMED_OUT,RETRYING. Define legal transitions, the events that fire them (submit,assigned,started,node_failed,all_nodes_done,user_aborted,deadline_exceeded), and emit events tolibs/event-bus-dataon every transition. Implement automatic retries with exponential backoff (configurable:max_attempts,initial_delay_ms,multiplier,jitter) when the failure is transient (Spark Operator reportsFAILED_SUBMISSIONor a kube-client network error). Uselibs/idempotencyso that aPOST /pipelines/{id}/buildswith the sameidempotency-keyreturns the same build. Tests: include a table-driven test covering every legal and forbidden transition.References:
- Job statuses / retry: https://www.palantir.com/docs/foundry/data-integration/job-status
- Retries: https://www.palantir.com/docs/foundry/data-integration/build-retries
- Build cancellation: https://www.palantir.com/docs/foundry/data-integration/aborting-builds
Context: ListBuildQueue and QueueSummary already exist. We still need
the engine that prioritizes, allocates resources, and respects per-project /
per-tenant quotas.
Prompt:
Design and implement a dispatcher in
services/pipeline-build-service/internal/domain/dispatcherthat:
- Reads builds in
QUEUEDstate frompipeline_runs.- Applies a Foundry-style scheduling policy: priority (user-set), project, round-robin fairness across projects in the same tenant, and respect for
resource_pools(CPU/RAM totals assigned to a compute pool).- Before marking
RUNNING, calls the Spark Client (spark.SparkClient) to reserve capacity; if none is available, leaves the build inWAITING_FOR_RESOURCESwith the reason.- Exposes
GET /resource-pools,POST /resource-pools,PATCH /resource-pools/{id}to administer pools.- Emits Prometheus metrics:
pipeline_build_queue_depth{pool},pipeline_build_wait_seconds{pool,priority},pipeline_build_pool_utilization{pool}.References:
- Resource queues: https://www.palantir.com/docs/foundry/data-integration/resource-queues
- Compute usage units: https://www.palantir.com/docs/foundry/resource-management/compute-usage
Context: job_logs_test.go exists but I don't see an actual handler
exposed.
Prompt:
Implement
GET /builds/{id}/logs?follow=trueas an SSE/WebSocket that connects to the Kubernetes API to dokubectl logs -fagainst the driver pod of theSparkApplication. Multiplex stdout/stderr and normalize to the[pipeline-runner pipeline_id=… run_id=…] …format thatpipeline-runner-spark/PipelineRunner.scalaalready emits. Persist the full log to S3 when the build finishes (key:s3://logs/builds/{build_id}/driver.log) and store the reference inpipeline_runs.log_uri. Add a viewer in apps/web/src/lib/components/pipeline/ that consumes the stream.References:
Context: libs/pipeline-expression/catalog.go already has a function
catalog. What's missing is the contract for the visual graph that the UI
saves.
Prompt:
Define in
libs/pipeline-expression/graph.gothe canonical JSON schema of a "Pipeline Builder logic graph" with parity to Palantir:
- Node types:
dataset_input,dataset_output,media_set_output,virtual_table_output,ontology_object_output,filter,select,join(inner|left|right|outer|anti|semi|cross|knn|lookup),aggregate,aggregate_over_window,project_over_window,pivot,unpivot,union,intersect,except,sort,rank,cast,derived_column,geo_join,media_transform,checkpoint,sample,expectation(data quality assertion).- Connections with
from_node_id/from_output_port/to_node_id/to_input_port(multi-port per node, e.g. join hasleft/right).- Expressions as an embedded AST (the format already emitted by
libs/pipeline-expression/parser.go).- Graph versioning (each save generates a
graph_version).- Metadata:
pipeline_id,branch_id,compiled_at,compiler_version.Provide a validator
ValidateGraph(g Graph) []ValidationErrorthat verifies there are no cycles, that every node receives all its required ports, that expressions reference existing columns, and that types match (lean oninfer.goandnode_check.go).References:
- Pipeline Builder nodes: https://www.palantir.com/docs/foundry/pipeline-builder/overview
- Pipeline Builder transforms: https://www.palantir.com/docs/foundry/pipeline-builder/transforms
- Pipeline Builder expressions: https://www.palantir.com/docs/foundry/pipeline-builder/expressions-overview
Context: today distributed_runtime.go ships --inline-sql with ONE
statement; the real Pipeline Builder compiles an entire graph into an
optimized Spark plan.
Prompt:
Create
libs/pipeline-expression/compiler/with a graph → Spark SQL compiler that runs in a singleSparkApplication. The compiler:
- Topologically sorts the nodes.
- Assigns each intermediate node a
TEMP VIEWwith a stable name (node_<short_hash_id>).- For each node, emits a
CREATE OR REPLACE TEMP VIEW node_x AS …in SparkSQL, translated from the node type (join →JOIN, aggregate →GROUP BYwith aggregators from the catalog, window →OVER(PARTITION BY …), pivot →PIVOTfunction, media_transform → UDF call toOF_MEDIA_TRANSFORM(...)).- For each
dataset_output, emits the final statement asINSERT INTO <iceberg_table>forAPPEND/INSERT OVERWRITE <iceberg_table>forSNAPSHOT/MERGE INTOforUPDATE(withMATCHED/NOT MATCHEDkeyed onmerge_keys).- Returns
CompiledPlan { Statements []string, Inputs []DatasetRef, Outputs []OutputBinding, EstimatedShuffle int64 }.Generate plans identical to the pattern
pipeline-runner-sparkalready expects (statements separated by;, last statement is the writer). Cover with golden tests underlibs/pipeline-expression/compiler/testdata/.References:
- Pipeline Builder compilation model: https://www.palantir.com/docs/foundry/pipeline-builder/pipeline-builder-architecture
- Joins: https://www.palantir.com/docs/foundry/pipeline-builder/joins
- Aggregations: https://www.palantir.com/docs/foundry/pipeline-builder/aggregations
- Windows: https://www.palantir.com/docs/foundry/pipeline-builder/window-functions
Context: preview.go exists, but its current contract is for in-memory
node-by-node preview. End-to-end preview from a full graph is missing.
Prompt:
In
pipeline-build-service, add two endpoints:
POST /pipelines/{id}/compilethat validates + compiles the graph into aCompiledPlanand persists it aspipeline_planswith a deterministic hash of the graph. Returns the plan and aplan_id.POST /pipelines/{id}/previewwith body{node_id, sample_rows: 100, sampling_strategy: "head"|"random"|"stratified", branch_id?}. The handler:
- Compiles only the subgraph upstream of
node_id.- Injects
LIMIT 100 SAMPLE 5 PERCENTso it doesn't read everything.- Submits the plan to a Spark "preview pool" with a short timeout (90s).
- Returns rows + inferred schema + warnings. Ensure UX parity with Pipeline Builder Preview: 15-minute cache keyed on (graph_version, node_id, sampling).
References:
- Pipeline Builder preview: https://www.palantir.com/docs/foundry/pipeline-builder/preview
- Sampling strategies: https://www.palantir.com/docs/foundry/pipeline-builder/sampling
Prompt:
Implement
checkpointandexpectationnodes in the compiler.
checkpoint: after compilation, emits aCACHE TABLE node_xbefore expensive nodes; optionally persists the TEMP VIEW as an intermediate Iceberg table unders3://intermediates/{pipeline_id}/{build_id}/{node_id}/.expectation: emits post-execution asserts (SELECT COUNT(*) FROM node_x WHERE NOT (<predicate>)); if > 0 it fails the build withexpectation_violatedand saves a sample of violating rows inbuild_expectation_violations. The UI must display the violations.References:
- Checkpoints: https://www.palantir.com/docs/foundry/pipeline-builder/checkpoints
- Data expectations: https://www.palantir.com/docs/foundry/data-quality/data-expectations
- Health checks: https://www.palantir.com/docs/foundry/data-health/overview
Context: today pipeline-runner-spark is Scala with --inline-sql. Foundry
supports PySpark as a first-class citizen with decorators
@transform_df, @transform, @incremental, @configure(profile=...).
Prompt:
Create a Python package
sdks/python/foundry-transforms/that reproduces Foundry'stransforms.api:
@transform(my_output=Output('rid'), my_input=Input('rid'))with a function that receivesctx,my_input(TransformInput), andmy_output(TransformOutput).my_input.dataframe()returns a Spark DataFrame read from the Iceberg catalog.my_output.write_dataframe(df)orset_mode("snapshot"|"append"|"update"|"delete").@transform_df(Output(...), Input(...))sugar for 1 output, 1+ inputs.@transform_pandas(...)for small datasets.@configure(profile_name="...")for resource profile selection.@incremental(snapshot_inputs=[...], require_incremental=False, semantic_version=1)with Foundry's semantics.TransformContextwithctx.is_incremental,ctx.fallback_branches,ctx.spark_session,ctx.auth_header,ctx.parameters.Build a runner equivalent to today's Scala one at
services/pipeline-runner-spark-python/(PySpark, with aDockerfilethat installs the SDK) that receives--module-zip(a .zip of the code repo),--entrypoint module:function, and the standard args frompipeline-runner-spark/PipelineRunner.scala. It invokes the decorated function and reads/writes against the Iceberg catalog.Modify
services/pipeline-build-service/internal/handler/distributed_runtime.goto useApplicationType: "Python"when the node ispython_transformand to mount the repo zip into the pod.References:
- Python transforms API: https://www.palantir.com/docs/foundry/transforms-python/transforms-python-api
@transform,@transform_df: https://www.palantir.com/docs/foundry/transforms-python/transforms-python-overview@incrementalreference: https://www.palantir.com/docs/foundry/transforms-python/incremental-reference/index.html- Profile / @configure: https://www.palantir.com/docs/foundry/transforms-python/profiles
- Incremental usage: https://www.palantir.com/docs/foundry/transforms-python/incremental-usage
Prompt:
Create
sdks/java/foundry-transforms/with the Java/Scala equivalent:@Compute,Input<Dataset>,Output<Dataset>,IncrementalTransform,RetryStrategy,BuildContext. The resulting JAR is used bypipeline-runner-spark(which is already Scala) by loading the user's class reflectively from the uploaded JAR. The code-repo compiler (Task C5) must publish a single uber-JAR per commit, mounted as an additional dependency on theSparkApplication.References:
- Java transforms: https://www.palantir.com/docs/foundry/transforms-java/overview
Prompt:
Support pure
*.sqlrepos with a tiny YAML header:-- @output: ri.foundry.main.dataset.abc123 -- @inputs: { sales: ri.foundry.main.dataset.def456 } -- @mode: snapshot SELECT … FROM ${sales} …In
libs/pipeline-expression/sql/, parse that header, substitute placeholders with fully qualified Iceberg names (catalog.namespace.table), and emit the statement asinline_sqlfor the Scala runner (which already supports this). Add a validator that rejects SQL with directINSERT/UPDATE/DELETEwhen the declared@modedoes not allow it.References:
- SQL transforms: https://www.palantir.com/docs/foundry/transforms-sql/overview
Prompt:
Create a
pipeline_profilestable + endpointsGET/POST /pipeline-profiles. Each profile defines{driver_cores, driver_memory, executor_cores, executor_instances, executor_memory, spark_conf: map<string,string>, spark_packages: [], allowed_for: ["python","java","sql","pipeline-builder"]}. Transforms reference a profile by name. The renderer ininternal/spark/spark.goalready hasSparkResourceOverrides; wire it so it first resolves the profile and then applies per-node overrides.Replicate Foundry's default profiles:
KUBERNETES_MEMORY_LARGE,KUBERNETES_MEMORY_EXTRA_LARGE,DRIVER_MEMORY_LARGE,EXECUTOR_CORES_MEDIUM,EXECUTOR_MEMORY_LARGE,DYNAMIC_ALLOCATION_ENABLED,NUM_EXECUTORS_8.References:
- Profiles: https://www.palantir.com/docs/foundry/transforms-python/profiles
- Compute usage / resource units: https://www.palantir.com/docs/foundry/resource-management/compute-usage
Context: services/code-repository-review-service exists with 2 handlers
and the code_repo/*.proto is a stub. Foundry compiles each commit of a
transforms repo into a frozen artifact.
Prompt:
Implement the Code Repositories CI cycle:
- Complete
proto/code_repo/{repository,branch,review}.protowith CRUD for repos, branches, commits, and PRs.- In a new service
services/code-repository-ci-service/, implement a webhookPOST /webhook/gitthat triggers a build:
- Python: runs
pip install -r requirements.txt, runspytest, packagesmodule.zipwith the commit SHA.- Java/Scala: invokes
sbt clean assemblyto produce an uber-jar.- SQL: validates each file with Task C3.
- Publishes the artifact to
s3://code-artifacts/{repo_id}/{commit_sha}/…and records a row incode_repo_builds.- Every transform declared in the repo (
@transformor SQL@output) is automatically registered inpipeline_transformswithrepo_id,commit_sha,entrypoint,profile.- When
pipeline-build-serviceresolves a build, it looks up the transform inpipeline_transforms, downloads the artifact, and hands it to the Spark runner.References:
- Code Repositories: https://www.palantir.com/docs/foundry/code-repositories/overview
- Checks (CI): https://www.palantir.com/docs/foundry/code-repositories/checks
- Tags and artifacts: https://www.palantir.com/docs/foundry/code-repositories/tags
Context: the Python SDK (Task C1) declares @incremental, but the runtime
needs to translate it into Iceberg "since snapshot X" reads.
Prompt:
Extend
pipeline-runner-spark(Scala) and the Python SDK for incremental:
- Before executing, read
pipeline_runsto find the last SUCCEEDED run of the same transform and pulllast_input_snapshots(map dataset_rid → Iceberg snapshot_id used).- For every non-snapshot input, expose to the user:
my_input.dataframe()→ full snapshot.my_input.dataframe("added")→SELECT * FROM table.changes WHERE snapshot_id > last_seen.my_input.dataframe("modified")anddataframe("removed")for tables with CDC enabled.- For outputs, expose:
my_output.write_dataframe(df, mode="snapshot"|"append"|"update").my_output.previous_dataframe()to read the previous output.- After success, persist
last_input_snapshotsand the output's own snapshot. If the graph changes (semantic_versionbump), discard history and force a snapshot.- Cover
require_incremental=Truewith an explicit failure.Use the Iceberg "Incremental reads" API (
spark.read.format("iceberg").option("start-snapshot-id", X)).References:
- Incremental overview: https://www.palantir.com/docs/foundry/transforms-python/incremental-overview
- Incremental reference: https://www.palantir.com/docs/foundry/transforms-python/incremental-reference/index.html
- Iceberg incremental reads: https://iceberg.apache.org/docs/latest/spark-queries/#incremental-read
- Historical dataset from snapshots: https://www.palantir.com/docs/foundry/transforms-python/create-historical-dataset
Context: distributed_runtime.go has a flink branch that returns
flink_runtime_not_configured. Foundry uses Flink, but since the stack is
Spark, use Spark Structured Streaming first (without Flink).
Prompt:
Add
mode: "streaming"topipeline_runsand to the pipeline definitions. Inpipeline-runner-spark, support--stream-trigger(once|continuous|processing-time:5s) and--checkpoint-location(s3://checkpoints/{pipeline_id}/). The runner:
- Creates
spark.readStream.format("iceberg")...load(input).- Applies the compiled plan (B2), which for streaming requires compatible nodes (no
pivot, noaggregate_over_windowwithout a watermark).- Writes with
.writeStream.format("iceberg").outputMode("append") .option("checkpointLocation", ...).toTable(output).- The SparkApplication CR switches to
restartPolicy: Alwaysand lives as aDeployment. Create a new CR-type renderer ininternal/spark/spark.gofor this case.Add endpoints
POST /pipelines/{id}/streams/startandPOST /pipelines/{id}/streams/stopwith Prometheus metrics (stream_events_per_sec,stream_lag_ms).References:
- Streaming pipelines overview: https://www.palantir.com/docs/foundry/data-integration/streaming-overview
- Streaming transforms: https://www.palantir.com/docs/foundry/streaming/overview
- Iceberg streaming reads: https://iceberg.apache.org/docs/latest/spark-structured-streaming/
Prompt:
Integrate
services/ingestion-replication-service(which already has 12 handlers + 11 migrations) with the pipeline's streaming mode. The connector publishes to a Kafka topiccdc.<source>.<table>; a streaming pipeline consumes the topic withspark.readStream.format("kafka")and materializes to Iceberg with MERGE INTO on themerge_keysdeclared in the dataset.References:
- Streaming source connectors: https://www.palantir.com/docs/foundry/data-connection/streaming
- CDC patterns: https://www.palantir.com/docs/foundry/data-connection/cdc-overview
Context: dataset_output_committer.go and internal/iceberg/ exist.
Verify and complete.
Prompt:
In
services/dataset-versioning-service/, formalize the transactions API:
POST /datasets/{rid}/transactions {type: SNAPSHOT|UPDATE|APPEND|DELETE, branch: "master"}→ returnstransaction_ridin stateOPEN.POST /datasets/{rid}/transactions/{tx}/filesto upload raw files (streaming multipart) landing unders3://datasets/{rid}/{tx}/….POST /datasets/{rid}/transactions/{tx}/commit→ validates schema, calls Iceberg viaiceberg-catalog-serviceto create the corresponding snapshot:
SNAPSHOT→INSERT OVERWRITE.APPEND→INSERT INTO.UPDATE→MERGE INTO ... WHEN MATCHED UPDATE SET ... WHEN NOT MATCHED INSERT ...over theprimary_keydeclared in the schema.DELETE→DELETE FROM ... WHERE <predicate>.POST /datasets/{rid}/transactions/{tx}/abort→ frees raw space and marksABORTED.GET /datasets/{rid}/transactions?branch=masterlists history.- When a build calls the committer: open tx → write via Spark → commit atomically. Use
libs/sagato guarantee rollback if Iceberg fails partially.References:
- Dataset transactions: https://www.palantir.com/docs/foundry/data-integration/datasets-overview
- Transaction types: https://www.palantir.com/docs/foundry/data-integration/datasets-views
- Iceberg MERGE: https://iceberg.apache.org/docs/latest/spark-writes/#merge-into
Context: proto/dataset/branch.proto has 59 lines. Verify whether the
logic is executed.
Prompt:
Implement dataset branches on top of Iceberg
branches(an Iceberg 1.2+ feature):
POST /datasets/{rid}/branches {name, from_branch}→ creates an Iceberg branch withALTER TABLE … CREATE BRANCH <name>.GET /datasets/{rid}/brancheslists them.POST /datasets/{rid}/branches/{name}/merge {into}→ fast-forward.POST /datasets/{rid}/branches/{name}/delete. Each build writes to the branch it receives in its context (ctx.branch_id). The compiled plan must inject... VERSION AS OF BRANCH '<name>'for reads.References:
- Branches in Foundry: https://www.palantir.com/docs/foundry/data-integration/branching-overview
- Iceberg branching: https://iceberg.apache.org/docs/latest/branching/
Context: schema_validation.go and schema_guidance.go exist.
Prompt:
Formalize the schema evolution policy:
- Backwards-compatible (add column, widen type, make nullable) → allow without intervention.
- Breaking (drop column, narrow type, rename) → require
?allow_schema_break=trueand saveschema_break_auditwith the user who approved it. The validator runs before executing the Spark plan; if it rejects, the build moves toFAILEDwithschema_incompatibleand the UI shows the diff. Lean on IcebergUPDATE SCHEMAto apply the change on commit.References:
- Schema evolution: https://www.palantir.com/docs/foundry/data-integration/schema-overview
- Iceberg schema evolution: https://iceberg.apache.org/docs/latest/evolution/
Prompt:
Implement
dataset_viewsas IcebergCREATE VIEW:POST /datasets/views {name, query, parent_dataset_rids}. The view appears as a normal dataset in the UI, but its backing is an Iceberg view. When a build reads it, Spark resolves the view without materializing.References:
- Foundry views: https://www.palantir.com/docs/foundry/data-integration/datasets-views
- Iceberg views: https://iceberg.apache.org/docs/latest/sql-views/
Context: proto/pipeline/schedules.proto is 205 lines;
libs/scheduling-cron/ is implemented with DST tests. Exposure is missing.
Prompt:
Implement the schedules domain:
POST /pipelines/{id}/schedules {cron: "0 */4 * * *", timezone: "America/New_York", trigger: "cron"|"on_data_change"| "on_upstream_success", upstream_dataset_rids: [], retry_policy: {…}}.- A worker in
pipeline-build-servicequeriesListDuePipelines(already exists) every 30s and creates builds.- For
on_data_change, subscribe to the Kafka topicdataset.{rid}.transaction.committedand fire a build with dataset inputs pinned to the snapshot just published.- For
on_upstream_success, hook into thebuild.succeededevent frompipeline-build-service.- Pause/resume schedule, run history (
schedule_runs.protoalready exists, 66 lines).References:
Prompt:
Implement Foundry-style "force builds": the user selects a final dataset/pipeline and the platform computes and fires every required upstream build. Use
lineage-serviceto resolve the DAG. Endpoint:POST /builds/force {target_dataset_rids: [...], branch: "...", ignore_recent: true}. Create builds in topological order withdepends_on_build_idso each one waits on the previous.References:
- Force build / build target: https://www.palantir.com/docs/foundry/data-integration/build-target
Context: notebook-runtime-service already wires libs/python-sidecar;
notebook.proto/cell.proto/kernel.proto are in place. But the
"dataset-as-variable + cells produce datasets" model isn't wired up that I
can see.
Prompt:
In
services/notebook-runtime-service, add:
POST /workbookswithkernel_type: "pyspark"|"python"|"r".- Each cell has
{kind: "code"|"markdown"|"visualization", language: "python"|"r"|"sql", output_dataset_rid?: string, input_dataset_rids: [], depends_on_cells: []}.- When a cell with
output_dataset_ridexecutes, the code output (a Spark DataFrame in theresultvariable) is persisted as a SNAPSHOT transaction (Task E1) on the dataset; subsequent cells can importfrom foundry import datasets; df = datasets.dataset('rid').dataframe().- Support PySpark kernels with a
SparkSessionalready configured against the Lakekeeper catalog (clone the logic from the CR template).POST /workbooks/{id}/run-allexecutes cells in topological order.- "Productionize":
POST /workbooks/{id}/promotegenerates a Code Repository with one transform peroutput_dataset_rid(Task C5).References:
- Code Workbook overview: https://www.palantir.com/docs/foundry/code-workbook/overview
- Concepts: https://www.palantir.com/docs/foundry/code-workbook/concepts
- Productionize: https://www.palantir.com/docs/foundry/code-workbook/productionizing
- Visualizations: https://www.palantir.com/docs/foundry/code-workbook/visualizations
Prompt:
Modify
libs/python-sidecar(or create a new sidecarlibs/pyspark-sidecar) that launches a local Spark driver (local[*]mode for small workbooks) or a remote driver on K8s for heavier workbooks. Per-workbook persistent session, with TTL and hibernation after inactivity. Automatically injectdataset('rid').dataframe()anddataset('rid').write_dataframe(df)into the kernel namespace.References:
- Spark profile selection in Workbook: https://www.palantir.com/docs/foundry/code-workbook/spark-profiles
Context: I don't see a dedicated service. This is a large Foundry module.
Prompt:
Create
services/code-workspaces-service/(cloningdocs/templates/service-skeleton/). Endpoints:
POST /workspaces {type: "jupyterlab"|"vscode"|"rstudio", profile_id, repo_id?, branch?, environment_id}.GET /workspaces/{id}returns{state, url, last_active_at, idle_ttl}.POST /workspaces/{id}/start,/stop,/hibernate.- Each workspace is provisioned as a Kubernetes
StatefulSetwith a personal PVC per user:
- JupyterLab: image
jupyter/pyspark-notebook+ OpenFoundry client library preinstalled.- VS Code: image
codercom/code-serverwith default extensions.- RStudio: image
rocker/rstudio.- Per-workspace ingress at
workspace-{id}.<host>with an auth proxy againstidentity-federation-service(already exists).- Hibernation: if the workspace is
idle_ttlminutes without traffic, runkubectl scale --replicas=0. Wake-on-request: the ingress has middleware that scales to 1 on the first request.- Volume mounted at
/home/user/workspace, persistent.References:
- Code Workspaces overview: https://www.palantir.com/docs/foundry/code-workspaces/overview
- Getting started: https://www.palantir.com/docs/foundry/code-workspaces/getting-started
- JupyterLab: https://www.palantir.com/docs/foundry/code-workspaces/jupyterlab
- VS Code workspaces: https://www.palantir.com/docs/foundry/vs-code/overview
- Lifecycle/FAQ: https://www.palantir.com/docs/foundry/code-workspaces/code-workspaces-faq
Prompt:
Preinstall a
foundry-fsclient (preferably FUSE or sidecar process) in the Jupyter/VS Code image that lazily mounts/datasets/<dataset_rid>/<branch>/<files>from S3 using delegated access tokens for the current user. Honorauthorization-policy-serviceto restrict access. Parity:import foundry; df = foundry.datasets.dataset('rid').dataframe()must work identically to the Python SDK (Task C1).References:
- Interact with data in Code Workspaces: https://www.palantir.com/docs/foundry/code-workspaces/data
Prompt:
Create
services/library-environment-service/that manages "managed environments":
POST /environments {name, kind: "conda"|"pip"|"r-cran", spec: <yaml or requirements.txt>}→ resolves dependencies in a build job (base image +conda env createorpip install), publishes the resulting image to the internal registry.- Every Code Workspace and Code Workbook can reference an
environment_idmounted at startup.- The same reference is valid in transforms (Task C1) for reproducibility.
- Versioning: each
speccreates a new immutable revision.References:
- Maestro / managed envs: https://www.palantir.com/docs/foundry/code-workspaces/managed-environments
Prompt:
Each tenant (or resource pool from Task A3) runs
SparkApplications in a dedicated Kubernetes namespace with a NetworkPolicy that only allows traffic toiceberg-catalog-serviceand the internal S3 endpoint. Wire the dynamic namespace intopipeline-build-service/internal/spark/spark.go::PipelineRunInput. Implement a job inservices/tenancy-organizations-servicethat creates the namespaces and ServiceAccounts via a K8s operator when a tenant is provisioned.References:
- Compute isolation: https://www.palantir.com/docs/foundry/security/network-isolation
- Projects: https://www.palantir.com/docs/foundry/projects/overview
Prompt:
Add
profile.spot_enabled: boolandprofile.spot_max_price. The CR renderer applies tolerations + nodeSelectorcloud.google.com/gke-spot: "true"or the AWS/Azure equivalent. If the driver is evicted, mark the build asRETRYINGwithfailure_reason: "preempted"and requeue (respecting max_attempts).References:
- Spot instances in Foundry: https://www.palantir.com/docs/foundry/resource-management/spot-execution
Prompt:
Enable the Spark
metricsServletand a Prometheus sink in the CR template; exposeGET /builds/{id}/metricsthat aggregates:
- rows read/written per node (via Iceberg
snapshot.summary).- duration per stage, GC time, shuffle bytes.
- skew histogram (max/min/median task duration).
- estimated costs (CPU·s · €/CPU·s · executor count). Store in
pipeline_run_metricsand draw in a "Build Inspector"-style UI.References:
- Build inspector: https://www.palantir.com/docs/foundry/data-integration/build-inspector
- Spark metrics & monitoring: https://spark.apache.org/docs/latest/monitoring.html
Prompt:
Each
SUCCEEDEDbuild emits an OpenLineage event toservices/lineage-servicewithinputs,outputs,job_facets(graph_version, plan_id), anddataset_facets(schema, row_count, snapshot_id). The sink already exists inlineage-service. The automatic producer inside the executor is what's missing.References:
- Lineage in Foundry: https://www.palantir.com/docs/foundry/data-lineage/overview
- OpenLineage spec: https://openlineage.io/docs/spec/0-overview
Prompt:
Under
tests/parity/compute/, create a Go suite that runs end to end against a local k3s/kind cluster:
- Creates a CSV dataset → registers it as Iceberg.
- Creates a Code Repository with a Python
@transform.- Creates a pipeline in Pipeline Builder (graph JSON) with join+aggregate.
- Creates a cron-every-minute schedule.
- Waits for three builds: the
Build Inspectormust show Iceberg-level metrics.- Modifies the graph → new branch → preview → merge.
- Converts the pipeline to streaming → publishes Kafka events → verifies rows in Iceberg with
processing_time_lag < 30s. Run as part ofmake test-integration.References:
- Tutorials: https://www.palantir.com/docs/foundry/tutorials/build-an-ontology
- Reference architectures: https://www.palantir.com/docs/foundry/reference-architecture/overview
- A1, A2, E1 — foundation: API + lifecycle + real transactions.
- B1, B2, B3 — compilable Pipeline Builder.
- C1, C4, C5 — Python transforms + repos + profiles.
- E2, E3, F1, F2 — branches + schedules + dep-driven.
- D1 — incremental.
- A3, I1, J1 — queue, multi-tenancy, observability.
- G1, G2 — Code Workbook.
- H1, H2, H3 — Code Workspaces.
- D2, D3 — Streaming.
- C2, C3 — Java/SQL transforms.
- B4, J2, K1 — checkpoints, lineage, smoke E2E.
Each block produces a deliverable slice that can be tested independently.