Commit a775ae6

scotwells and claude committed
docs: propose datumctl compute developer experience
Outlines the proposed `datumctl compute` command group — covering deploy, status, rollout, logs, instance inspection, and quota — with example output for each workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent c39d4b7 commit a775ae6

1 file changed

Lines changed: 330 additions & 0 deletions

@@ -0,0 +1,330 @@
1+
# `datumctl compute` — Developer Experience
2+
3+
**Status:** Draft
4+
5+
---
6+
7+
## Summary
8+
9+
This document proposes a `compute` subcommand group in `datumctl` designed around the workflows developers actually perform: deploying a workload, watching it roll out across cities, understanding why something isn't running, and inspecting instances when something goes wrong.
10+
11+
The goal is to close the gap between "I have a container image" and "my workload is healthy across multiple locations" without requiring developers to understand the platform's internal resource model or write YAML to do common things.
12+
13+
---
14+
15+
## The problem today
16+
17+
Running a workload on Datum Cloud today requires a developer to:
18+
19+
1. Write a YAML manifest with the correct `apiVersion`, `kind`, and nested spec structure.
20+
2. Apply it with `datumctl apply -f` and wait with no visibility into what's happening.
21+
3. Run `datumctl get workloads` to check status, and then manually interpret raw condition fields.
22+
4. Look up individual instance names to get logs.
23+
24+
Each of these steps has friction that compounds. A developer who hits a quota block on their first deploy gets a raw API condition with no explanation and no next step. Someone who wants to tail logs from their app across two cities has to discover instance names, then run multiple commands.
25+
26+
This experience works. It doesn't feel like a product yet.
27+
28+
---
29+
30+
## Who this is for
31+
32+
The primary audience is a **backend developer** deploying a containerized service to Datum Cloud for the first time or as part of their daily workflow. They are comfortable with the terminal. They may have used Heroku, Railway, Fly.io, or GCP before. They should not need to know anything about how the platform's internal resource model works to deploy and operate their application.
33+
34+
The secondary audience is a **platform operator** or **DevOps engineer** who needs scripting-friendly access to the full resource hierarchy for automation and debugging.
35+
36+
---
37+
38+
## Workflows
39+
40+
The design centers on five workflows, ordered by frequency.
41+
42+
### 1. Deploy a workload
43+
44+
The developer has a container image. They want it running in one or more cities.
45+
46+
The fastest path requires no YAML:
47+
48+
```
49+
$ datumctl compute deploy api \
50+
--image=ghcr.io/acme/api:1.4.2 \
51+
--instance-type=d1-standard-2 \
52+
--city=DFW,IAD \
53+
--min=2 \
54+
--port=8080
55+
56+
Resolving workload "api" in project acme-prod...
57+
Workload does not exist — creating.
58+
Placement "default": cities=[DFW, IAD], min=2
59+
60+
Applying...
61+
workload/api created
62+
63+
Waiting for rollout. Ctrl-C to detach (rollout continues in background).
64+
65+
PLACEMENT CITY DESIRED READY PHASE
66+
default DFW 2 0 Starting
67+
default IAD 2 0 Starting
68+
default DFW 2 2 Running
69+
default IAD 2 2 Running
70+
71+
Rollout complete in 47s.
72+
73+
Instances:
74+
DFW api-dfw-0 203.0.113.10
75+
api-dfw-1 203.0.113.11
76+
IAD api-iad-0 198.51.100.20
77+
api-iad-1 198.51.100.21
78+
79+
Saved workload config to ./workload.yaml — commit this file to manage deployments declaratively.
80+
```
81+
82+
If a developer prefers an interactive walk-through:
83+
84+
```
85+
$ datumctl compute deploy
86+
? Workload name: api
87+
? Container image: ghcr.io/acme/api:1.4.2
88+
? Instance type [d1-standard-2]:
89+
? Cities (comma-separated) [DFW]: DFW,IAD
90+
? Min replicas per city [1]: 2
91+
? Expose port (optional): 8080
92+
93+
workload: api
94+
image: ghcr.io/acme/api:1.4.2
95+
instance type: d1-standard-2
96+
cities: DFW, IAD
97+
replicas: min=2
98+
ports: 8080/tcp
99+
100+
Proceed? (Y/n)
101+
```
102+
103+
For teams managing workloads declaratively, `deploy` also accepts a manifest file. It shows a human-readable diff before applying, rather than applying silently:
104+
105+
```
106+
$ datumctl compute deploy -f workload.yaml
107+
108+
Changes to workload "api":
109+
image: ghcr.io/acme/api:1.4.1 → ghcr.io/acme/api:1.4.2
110+
min replicas (default/DFW): 2 → 3
111+
112+
Apply? (Y/n) y
113+
workload/api updated
114+
```
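The human-readable diff shown above could be produced by comparing a flattened view of the current and desired workload. The sketch below is only an illustration of that idea: the function name, the flat dictionary shape, and the `(unset)` placeholder are assumptions, not part of the proposal.

```python
def describe_changes(old, new):
    """Return one 'field: old → new' line for every field that differs."""
    lines = []
    # Visit the union of keys so added and removed fields are reported too.
    for key in sorted(old.keys() | new.keys()):
        before = old.get(key, "(unset)")
        after = new.get(key, "(unset)")
        if before != after:
            lines.append(f"{key}: {before} → {after}")
    return lines


current = {"image": "ghcr.io/acme/api:1.4.1", "min replicas (default/DFW)": 2}
desired = {"image": "ghcr.io/acme/api:1.4.2", "min replicas (default/DFW)": 3}
for line in describe_changes(current, desired):
    print(line)
```

An empty result would mean there is nothing to apply, which is a natural point for the CLI to exit without prompting.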
115+
116+
All three paths — flags, interactive, manifest — converge on the same underlying representation. A developer can start with flags and graduate to a manifest when they need multi-placement topology, custom networking, or volume configuration.
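One way to picture that shared representation is a small set of data classes that the flag path fills in directly. This is a sketch under assumptions: the class names, field names, and the `spec_from_flags` helper are hypothetical, since the proposal does not define the manifest schema.

```python
from dataclasses import dataclass, field


@dataclass
class Placement:
    # Hypothetical shape; the real manifest schema is not given in this proposal.
    name: str
    cities: list
    min_replicas: int


@dataclass
class WorkloadSpec:
    name: str
    image: str
    instance_type: str
    placements: list = field(default_factory=list)
    ports: list = field(default_factory=list)


def spec_from_flags(name, image, instance_type, city, min_replicas, port=None):
    """Build the shared representation from `deploy` flag values."""
    # --city=DFW,IAD arrives as one comma-separated string.
    cities = [c.strip().upper() for c in city.split(",") if c.strip()]
    if not cities:
        raise ValueError("at least one city is required")
    placement = Placement(name="default", cities=cities, min_replicas=min_replicas)
    ports = [port] if port is not None else []
    return WorkloadSpec(name, image, instance_type, [placement], ports)


spec = spec_from_flags(
    "api", "ghcr.io/acme/api:1.4.2", "d1-standard-2", "DFW,IAD", 2, 8080
)
print(spec.placements[0])
```

The interactive and manifest paths would construct the same object from prompts or a parsed file, so everything downstream (diffing, applying, saving `workload.yaml`) only needs to handle one type.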
117+
118+
For automated pipelines, pass `-y` to skip the confirmation prompt. The CLI also suppresses the prompt automatically when stdin is not a terminal.
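The prompt-suppression rule can be expressed as a single predicate. The sketch below assumes a hypothetical `should_prompt` helper and only illustrates the decision; it is not how `datumctl` itself is implemented.

```python
import io
import sys


def should_prompt(assume_yes, stream=None):
    """Return True only when a confirmation prompt should be shown."""
    stream = sys.stdin if stream is None else stream
    if assume_yes:
        # -y was passed: never prompt.
        return False
    # isatty() is False for pipes and redirected files, which is the
    # usual situation inside CI jobs and scripts.
    return stream.isatty()


# A piped or redirected stdin behaves like this in-memory stream.
print(should_prompt(assume_yes=False, stream=io.StringIO("")))
```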
119+
120+
### 2. Check workload health
121+
122+
The developer wants to know if their workload is healthy and how many instances are running across each city.
123+
124+
```
125+
$ datumctl compute status api
126+
127+
Workload api project: acme-prod
128+
Image ghcr.io/acme/api:1.4.2
129+
Updated 47s ago Revision #7
130+
131+
Health Available — all placements at desired replicas
132+
133+
PLACEMENT  CITY  READY  DESIRED  TYPE
134+
default    DFW   2/2    2        d1-standard-2
135+
           IAD   2/2    2        d1-standard-2
136+
```
137+
138+
When something is wrong, the status view explains it in plain terms and tells the developer what to do next:
139+
140+
```
141+
$ datumctl compute status api
142+
143+
Workload api project: acme-prod
144+
Image ghcr.io/acme/api:1.4.3
145+
Updated 1m ago Revision #8
146+
147+
Health Degraded — 2 instances blocked in IAD
148+
149+
PLACEMENT  CITY  READY  DESIRED  TYPE
150+
default    DFW   2/2    2        d1-standard-2
151+
           IAD   2/4    4        d1-standard-2  [degraded]
152+
153+
IAD: 2 instances could not start — quota exceeded
154+
Requested 4 CPU. 2 CPU available in IAD.
155+
156+
Next steps:
157+
Reduce replicas: datumctl compute scale api --min=2
158+
Check quota: datumctl compute quota
159+
View instances: datumctl compute instances --workload=api
160+
```
161+
162+
The developer never sees raw condition names or internal state reasons. If they need that level of detail for debugging or scripting, `datumctl compute workloads describe api` exposes it.
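A lookup table is one simple way to translate raw conditions into the plain-language output described here. In the sketch below the raw reason `QuotaExceeded`, the `explain` helper, and the fallback wording are assumptions; the suggested commands are the ones that appear in this document.

```python
# Hypothetical raw reasons mapped to (plain explanation, suggested commands).
EXPLANATIONS = {
    "QuotaExceeded": (
        "quota exceeded",
        ["datumctl compute scale {workload} --min=<N>", "datumctl compute quota"],
    ),
}


def explain(reason, workload):
    """Translate a raw condition reason into plain text plus next steps."""
    # Unknown reasons fall back to the detailed resource view.
    fallback = ("not ready", ["datumctl compute workloads describe {workload}"])
    summary, commands = EXPLANATIONS.get(reason, fallback)
    return summary, [c.format(workload=workload) for c in commands]


summary, steps = explain("QuotaExceeded", "api")
print(summary)
for step in steps:
    print("  " + step)
```

Keeping the fallback pointed at `workloads describe` means a condition the CLI does not yet recognize still leads the developer somewhere useful.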
163+
164+
### 3. Watch a rollout
165+
166+
When a developer updates their workload (new image, changed replica count, config change), they can watch the rollout progress city by city:
167+
168+
```
169+
$ datumctl compute rollout api
170+
171+
Rolling workload "api" rev #7 → #8
172+
173+
PLACEMENT CITY UPDATED READY OLD PHASE
174+
default DFW 0 2 2 Pending
175+
default IAD 0 2 2 Pending
176+
default DFW 1 1 1 Updating
177+
default DFW 2 2 0 Done
178+
default IAD 1 1 1 Updating
179+
default IAD 2 2 0 Done
180+
181+
Rollout complete in 1m 12s.
182+
```
183+
184+
If the rollout stalls because of a resource or scheduling issue, the output pauses on the affected row and gives an explanation:
185+
186+
```
187+
default IAD 1 1 1 Blocked
188+
189+
2 instances waiting: quota exceeded in IAD
190+
The rollout will resume when quota becomes available.
191+
Ctrl-C to detach — the rollout continues in the background.
192+
```
193+
194+
`Ctrl-C` always detaches from the watch. It never cancels the rollout itself.
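The detach-only behaviour amounts to treating the interrupt as the end of the watch rather than as a command. A minimal sketch, assuming a hypothetical `watch_rollout` helper that consumes an iterator of already-formatted rows:

```python
def watch_rollout(events, out=print):
    """Print rollout rows until the stream ends or the user presses Ctrl-C."""
    try:
        for row in events:
            out(row)
    except KeyboardInterrupt:
        # Detach only: nothing in this branch sends a cancel request.
        out("Detached. The rollout continues in the background.")
        return "detached"
    return "complete"


rows = ["default DFW 1 1 1 Updating", "default DFW 2 2 0 Done"]
print(watch_rollout(iter(rows)))
```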
195+
196+
Rollout history is accessible at any time:
197+
198+
```
199+
$ datumctl compute rollout history api
200+
201+
REV WHEN IMAGE CHANGES BY STATUS
202+
#8 2m ago ghcr.io/acme/api:1.4.3 image updated alice@acme.io active
203+
#7 3h ago ghcr.io/acme/api:1.4.2 min replicas 2 → 3 ci-deploy —
204+
#6 yesterday ghcr.io/acme/api:1.4.2 LOG_LEVEL info → warn bob@acme.io —
205+
```
206+
207+
To roll back to a previous revision:
208+
209+
```
210+
$ datumctl compute rollout undo api --to-revision=7
211+
Creating revision #9 (copy of #7)...
212+
Rollout started. Run `datumctl compute rollout api` to watch progress.
213+
```
214+
215+
Undo creates a new revision rather than rewriting history — the audit trail stays append-only. The platform retains the 20 most recent revisions per workload; revisions beyond that are no longer available for undo.
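The append-only model with a retention window can be sketched in a few lines. The `History` and `Revision` names below are hypothetical and the storage is in-memory; only the two rules from the text are modeled: undo appends a copy, and at most 20 revisions are kept.

```python
from dataclasses import dataclass

MAX_REVISIONS = 20  # retention limit stated in this proposal


@dataclass(frozen=True)
class Revision:
    number: int
    image: str


class History:
    def __init__(self):
        self._revisions = []
        self._next = 1

    def append(self, image):
        rev = Revision(self._next, image)
        self._next += 1
        self._revisions.append(rev)
        # Drop the oldest entries that fall outside the retention window.
        del self._revisions[:-MAX_REVISIONS]
        return rev

    def undo(self, to_revision):
        """Roll back by appending a copy of an earlier, still-retained revision."""
        for rev in self._revisions:
            if rev.number == to_revision:
                return self.append(rev.image)
        raise LookupError(f"revision #{to_revision} is no longer available")

    def revisions(self):
        return list(self._revisions)


history = History()
for tag in ["1.4.0", "1.4.1", "1.4.2", "1.4.3"]:
    history.append(f"ghcr.io/acme/api:{tag}")
print(history.undo(to_revision=3))
```

Because undo never edits or removes an existing entry, the revision numbers in `rollout history` only ever grow.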
216+
217+
### 4. Get logs
218+
219+
`datumctl compute logs` treats the workload as the target, not the individual instance. By default it returns logs across all instances and prefixes each line with the city and instance short name:
220+
221+
```
222+
$ datumctl compute logs api --follow
223+
224+
Tailing logs for workload "api" in DFW, IAD. Ctrl-C to stop.
225+
226+
[DFW/api-dfw-0] 10:14:02 GET /healthz 200 3ms
227+
[IAD/api-iad-1] 10:14:02 GET /v1/users 200 18ms
228+
[DFW/api-dfw-1] 10:14:03 POST /v1/login 401 4ms
229+
[IAD/api-iad-0] 10:14:03 GET /healthz 200 2ms
230+
```
231+
232+
Common filters reduce the output without requiring instance name lookup:
233+
234+
```
235+
$ datumctl compute logs api --city=IAD --follow
236+
$ datumctl compute logs api --since=15m
237+
$ datumctl compute logs api -c worker --follow
238+
```
239+
240+
All filters translate to label selectors against the platform's telemetry system. There is no per-city fan-out — the CLI queries a single endpoint and the label index handles scoping.
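To illustrate the translation, the sketch below turns the CLI filters into a single selector string. The label keys (`workload`, `city`, `instance`, `container`), the selector syntax, and the `build_selector` helper are all assumptions; the proposal does not specify the telemetry system's label names or query format.

```python
def build_selector(workload, city=None, instance=None, container=None):
    """Combine the provided filters into one label selector string."""
    # Hypothetical label keys; unset filters are simply left out.
    labels = {
        "workload": workload,
        "city": city,
        "instance": instance,
        "container": container,
    }
    return ",".join(f'{key}="{value}"' for key, value in labels.items() if value is not None)


# Equivalent of: datumctl compute logs api --city=IAD
print(build_selector("api", city="IAD"))
```

Because each filter only narrows one selector, the number of cities a workload runs in does not change the number of queries the CLI makes.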
241+
242+
### 5. Inspect and debug instances
243+
244+
When something is wrong with a specific instance, `datumctl compute instances` gives a per-instance view across the whole project:
245+
246+
```
247+
$ datumctl compute instances
248+
249+
NAME WORKLOAD CITY EXTERNAL IP INTERNAL IP TYPE AGE STATUS
250+
api-dfw-0 api DFW 203.0.113.10 10.4.1.5 d1-standard-2 2d Running
251+
api-dfw-1 api DFW 203.0.113.11 10.4.1.6 d1-standard-2 2d Running
252+
api-iad-0 api IAD 198.51.100.20 10.5.1.7 d1-standard-2 2d Running
253+
api-iad-1 api IAD 198.51.100.21 10.5.1.8 d1-standard-2 2d Running
254+
worker-dfw-0 worker DFW 203.0.113.30 10.4.1.9 d1-standard-4 6h Running
255+
256+
5 instances — 5 Running, 0 Pending, 0 Failed
257+
```
258+
259+
Pass `--workload` to narrow the view to a single workload:
260+
261+
```
262+
$ datumctl compute instances --workload=api
263+
```
264+
265+
Instances that haven't started show why, inline:
266+
267+
```
268+
api-iad-2 api IAD — — d1-standard-2 30s Pending (quota exceeded)
269+
api-iad-3 api IAD — — d1-standard-2 30s Pending (network provisioning)
270+
```
271+
272+
Drilling into a single instance gives the full picture with actionable context:
273+
274+
```
275+
$ datumctl compute instances describe api-iad-2
276+
277+
Instance api-iad-2
278+
Workload api / default / IAD
279+
Type d1-standard-2
280+
Age 1m 12s
281+
282+
Status Not running — quota exceeded
283+
Requested 4 CPU. 2 CPU available in IAD.
284+
285+
Runtime
286+
Image: ghcr.io/acme/api:1.4.3
287+
Env: DATABASE_URL (from secret), LOG_LEVEL=info
288+
Ports: 8080/tcp
289+
290+
Network Waiting for addresses (not yet scheduled)
291+
292+
Next steps
293+
datumctl compute scale api --min=2
294+
datumctl compute quota
295+
```
296+
297+
---
298+
299+
## Command reference
300+
301+
### Short-form commands (the everyday interface)
302+
303+
```
304+
datumctl compute deploy Deploy or update a workload
305+
datumctl compute status Show health across all cities
306+
datumctl compute instances List all instances (--workload, --city to filter)
307+
datumctl compute logs Stream logs (--workload, --city, --instance, -c/--container, --since, --follow)
308+
datumctl compute rollout Watch a rollout in progress
309+
datumctl compute rollout history List recent revisions
310+
datumctl compute rollout undo Roll back to a previous revision
311+
datumctl compute scale Adjust replica counts
312+
datumctl compute restart Restart instances (rolling)
313+
datumctl compute destroy Delete a workload
314+
datumctl compute quota Show project quota usage
315+
```
316+
317+
### Resource commands (for scripting and advanced use)
318+
319+
```
320+
datumctl compute workloads [get | describe | delete | edit]
321+
datumctl compute workloads rollout [status | history | undo]
322+
datumctl compute workloads set image NAME CONTAINER=IMAGE
323+
324+
datumctl compute instances [get | describe | logs]
325+
326+
datumctl compute cities [list | describe]
327+
datumctl compute instance-types [list | describe]
328+
datumctl compute quota [--breakdown | --constrained | --city=CITY]
329+
```
330+
