Commit 2330024

AWS rule hardening - part 3 (#160)
1 parent f7bde0f commit 2330024

10 files changed

Lines changed: 4719 additions & 1106 deletions

cleancloud/providers/aws/rules/ai/bedrock_provisioned_idle.py

Lines changed: 329 additions & 348 deletions
Large diffs are not rendered by default.

cleancloud/providers/aws/rules/rds_snapshot_old.py

Lines changed: 302 additions & 101 deletions
Large diffs are not rendered by default.

cleancloud/providers/aws/rules/untagged_resources.py

Lines changed: 422 additions & 120 deletions
Large diffs are not rendered by default.
Lines changed: 392 additions & 0 deletions
@@ -0,0 +1,392 @@
1+
# aws.bedrock.provisioned_throughput.idle — Canonical Rule Specification
2+
3+
## 1. Intent
4+
5+
Detect Amazon Bedrock Provisioned Throughputs in the currently evaluated account/Region that
6+
are currently serving capacity and show **no observed runtime request activity** for the full
7+
configured observation window, so they can be reviewed as potential FinOps cleanup or
8+
rightsizing candidates.
9+
10+
This is a **read-only review-candidate rule**. It is not proof that the throughput is safe to
11+
delete, not proof that no one intends to use it, and not proof that immediate savings are
12+
available if the throughput is under commitment.
13+
14+
---
15+
16+
## 2. AWS API Grounding
17+
18+
Based on official Amazon Bedrock control-plane, runtime, monitoring, pricing, and CloudWatch
19+
API documentation.
20+
21+
### Key AWS facts
22+
23+
1. `ListProvisionedModelThroughputs` is the canonical Bedrock control-plane inventory API for
24+
Provisioned Throughputs in the account and supports pagination.
25+
2. `ListProvisionedModelThroughputs` can filter by `statusEquals`.
26+
3. `ProvisionedModelSummary.status` valid values are `Creating`, `InService`, `Updating`,
27+
and `Failed`.
28+
4. `ProvisionedModelSummary.creationTime` is documented and required in list responses.
29+
5. `ProvisionedModelSummary.provisionedModelArn` is documented and required in list
30+
responses.
31+
6. `ProvisionedModelSummary.provisionedModelName` is documented and required in list
32+
responses.
33+
7. `ProvisionedModelSummary.modelArn`, `foundationModelArn`, `modelUnits`,
34+
`desiredModelUnits`, `commitmentDuration`, and `commitmentExpirationTime` are documented
35+
control-plane fields.
36+
8. Bedrock Provisioned Throughput is billed **hourly** until deleted.
37+
9. Bedrock user-guide documentation states you can purchase Provisioned Throughput with
38+
commitment options including no commitment, one month, and six months.
39+
10. Bedrock user-guide documentation states you **can't delete a Provisioned Throughput by
40+
Model Units with commitment before the commitment term is complete**.
41+
11. Bedrock runtime APIs (`InvokeModel`, `Converse`) allow the request `modelId` to be the
42+
ARN of a Provisioned Throughput.
43+
12. Bedrock runtime metrics are published in CloudWatch namespace `AWS/Bedrock`.
44+
13. Bedrock monitoring docs state runtime metrics use dimension `ModelId`.
45+
14. Bedrock monitoring docs define:
46+
- `Invocations` = number of successful runtime requests
47+
- `InvocationClientErrors` = number of invocation client-side errors
48+
- `InvocationServerErrors` = number of invocation server-side errors
49+
- `InvocationThrottles` = number of throttled invocation requests
50+
15. CloudWatch `GetMetricStatistics` requires the exact metric/dimension combination that was
51+
published, uses inclusive `StartTime` and exclusive `EndTime`, rounds `StartTime`, does
52+
not guarantee datapoint order, and enforces retention/period constraints.
53+
16. The fetched Bedrock pricing documentation does **not** publish a canonical
per-Model-Unit monthly USD table for Provisioned Throughput; pricing depends on model, units,
Region, and commitment term, and some providers direct customers to their account team.
56+
57+
### Implications
58+
59+
- Only `InService` Provisioned Throughputs are eligible.
60+
- Age thresholding is supportable because `creationTime` is documented.
61+
- Runtime-activity evidence must be grounded in documented CloudWatch metrics under
62+
`AWS/Bedrock`.
63+
- The documented request identifier for invoking a Provisioned Throughput is the
64+
`provisionedModelArn`; this is the canonical `ModelId` dimension value for this rule.
65+
- `Invocations` alone does **not** cover failed or throttled request attempts; if the rule
66+
wants “no observed runtime request activity,” it must also consider error/throttle metrics.
67+
- Missing CloudWatch datapoints are **not** documented as zero activity and must not be
68+
interpreted as zero by default.
69+
- No canonical per-Model-Unit price is documented, so `estimated_monthly_cost_usd = null`.
70+
71+
---
72+
73+
## 3. Scope and Terminology
74+
75+
- **"Provisioned Throughput"** — an item returned by `ListProvisionedModelThroughputs`.
76+
- **"idle"** — no observed Bedrock runtime request activity via the documented CloudWatch
77+
metrics contract for the full configured observation window.
78+
- **"runtime request activity"** — any observed positive value in one or more of:
79+
`Invocations`, `InvocationClientErrors`, `InvocationServerErrors`, `InvocationThrottles`.
80+
- **`idle_days_threshold`** — operator-configurable threshold, default `7`.
81+
- **`observation_window_start_utc = now_utc − idle_days_threshold × 86400 seconds`**
82+
- **`observation_window_end_utc = now_utc`**
83+
- **`age_days = floor((now_utc − creation_time_utc) / 86400 seconds)`**
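
The window and age definitions above can be sketched in Python as follows. This is a minimal illustration, not the rule's actual implementation; the function names `observation_window` and `age_days` and the sample timestamps are assumptions made for the example.

```python
import math
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86400


def observation_window(now_utc, idle_days_threshold=7):
    """Return (start, end) of the observation window as tz-aware UTC datetimes."""
    start = now_utc - timedelta(seconds=idle_days_threshold * SECONDS_PER_DAY)
    return start, now_utc


def age_days(now_utc, creation_time_utc):
    """floor((now_utc - creation_time_utc) / 86400 seconds)."""
    elapsed = (now_utc - creation_time_utc).total_seconds()
    return math.floor(elapsed / SECONDS_PER_DAY)


# Hypothetical sample values: created one second short of seven full days ago.
now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
created = datetime(2025, 1, 8, 12, 0, 1, tzinfo=timezone.utc)
start, end = observation_window(now)
too_new = age_days(now, created) < 7  # floor() keeps this item below the threshold
```

Because `age_days` uses `floor`, an item becomes eligible only after the full threshold has elapsed, which matches the "too new to evaluate" exclusion.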
84+
85+
### Included
86+
87+
- Provisioned Throughputs in the currently evaluated Region/account
88+
- `status == "InService"`
89+
- `age_days >= idle_days_threshold`
90+
- full required CloudWatch activity evidence available
91+
- all required activity metrics show zero observed activity in the observation window
92+
93+
### Excluded
94+
95+
- `Creating`, `Updating`, `Failed`
96+
- missing or invalid stable identity
97+
- missing or invalid `creationTime`
98+
- too new to evaluate (`age_days < idle_days_threshold`)
99+
- missing CloudWatch datapoints for any required activity metric
100+
- any observed runtime request activity
101+
102+
---
103+
104+
## 4. Canonical Rule Statement
105+
106+
A Provisioned Throughput is eligible only when **all** of the following are true:
107+
108+
- stable Provisioned Throughput identity exists
109+
- `status == "InService"`
110+
- `creationTime` is valid and not in the future
111+
- `age_days >= idle_days_threshold`
112+
- required Bedrock runtime activity metrics are available under
113+
`ModelId = provisionedModelArn`
114+
- all required activity metrics sum to zero over the observation window
115+
116+
No additional predicate may be required for baseline eligibility, including:
117+
118+
- model family
119+
- custom-vs-foundation model type
120+
- commitment duration
121+
- commitment expiration
122+
- model units / desired model units
123+
- inferred pricing band
124+
- tags
125+
- foundation model ARN presence
126+
127+
---
128+
129+
## 5. Normalization Contract
130+
131+
All rule logic must operate on normalized fields only.
132+
133+
| Canonical field | Source field | Absent / invalid |
|---|---|---|
| `resource_id` | `provisionedModelArn` | skip item |
| `provisioned_model_arn` | `provisionedModelArn` | skip item |
| `provisioned_model_name` | `provisionedModelName` | null |
| `normalized_status` | `status` | skip item |
| `creation_time_utc` | `creationTime` (tz-aware UTC) | skip item |
| `age_days` | floor((now − creation_time_utc) / 86400) | skip item |
| `model_arn` | `modelArn` | null |
| `foundation_model_arn` | `foundationModelArn` | null |
| `model_units` | `modelUnits` (int only) | null |
| `desired_model_units` | `desiredModelUnits` (int only) | null |
| `commitment_duration` | `commitmentDuration` | null |
| `commitment_expiration_time_utc` | `commitmentExpirationTime` (tz-aware UTC) | null |
| `last_modified_time_utc` | `lastModifiedTime` (tz-aware UTC) | null |
148+
149+
### Normalization requirements
150+
151+
- String-valued identifiers must normalize only from non-empty strings.
152+
- Timestamp fields must be timezone-aware UTC before use; naive → skip item for required
153+
timestamps, null for contextual timestamps.
154+
- Future `creationTime` → skip item.
155+
- `resource_id` must be the `provisionedModelArn`, not the friendly name.
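
One way the normalization contract could look in Python is sketched below. It assumes each raw item is a plain `dict` shaped like a `ProvisionedModelSummary`; the helper names (`_non_empty_str`, `_utc`, `_int_only`, `normalize_item`) are hypothetical and only illustrate the skip-versus-null behaviour from the table.

```python
from datetime import datetime, timezone


def _non_empty_str(value):
    """Accept only non-empty strings; anything else normalizes to None."""
    return value if isinstance(value, str) and value.strip() else None


def _utc(value):
    """Return a tz-aware UTC datetime, or None for naive or non-datetime input."""
    if not isinstance(value, datetime) or value.utcoffset() is None:
        return None
    return value.astimezone(timezone.utc)


def _int_only(value):
    # bool is a subclass of int in Python, so it is rejected explicitly.
    return value if isinstance(value, int) and not isinstance(value, bool) else None


def normalize_item(raw, now_utc):
    """Return the normalized dict, or None when the item must be skipped."""
    arn = _non_empty_str(raw.get("provisionedModelArn"))
    status = _non_empty_str(raw.get("status"))
    created = _utc(raw.get("creationTime"))
    # Required fields: missing identity, status, or a naive/future creationTime skips.
    if arn is None or status is None or created is None or created > now_utc:
        return None
    return {
        "resource_id": arn,  # always the ARN, never the friendly name
        "provisioned_model_arn": arn,
        "provisioned_model_name": _non_empty_str(raw.get("provisionedModelName")),
        "normalized_status": status,
        "creation_time_utc": created,
        "age_days": int((now_utc - created).total_seconds() // 86400),
        "model_arn": _non_empty_str(raw.get("modelArn")),
        "foundation_model_arn": _non_empty_str(raw.get("foundationModelArn")),
        "model_units": _int_only(raw.get("modelUnits")),
        "desired_model_units": _int_only(raw.get("desiredModelUnits")),
        "commitment_duration": _non_empty_str(raw.get("commitmentDuration")),
        # Contextual timestamps degrade to None instead of skipping the item.
        "commitment_expiration_time_utc": _utc(raw.get("commitmentExpirationTime")),
        "last_modified_time_utc": _utc(raw.get("lastModifiedTime")),
    }
```

Returning `None` for a skipped item keeps the skip decision in one place, so later rule logic only ever sees normalized fields.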
156+
157+
---
158+
159+
## 6. Idle-Activity Determination
160+
161+
CloudWatch is the **sole trusted runtime-activity source** for this rule.
162+
163+
### Required CloudWatch contract
164+
165+
| Field | Value |
|---|---|
| Namespace | `AWS/Bedrock` |
| Dimension | `ModelId = provisionedModelArn` |
| Statistics | `Sum` |
| Period | `idle_days_threshold × 86400` (satisfies CloudWatch retention constraints) |
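
As an illustration of the contract above, the request parameters for one metric could be assembled like this. The builder name `metric_request` and the placeholder ARN are assumptions for the example; the keys mirror the parameters of CloudWatch `GetMetricStatistics`.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_METRICS = (
    "Invocations",
    "InvocationClientErrors",
    "InvocationServerErrors",
    "InvocationThrottles",
)


def metric_request(metric_name, provisioned_model_arn, now_utc, idle_days_threshold=7):
    """Build GetMetricStatistics parameters for one required metric."""
    window_seconds = idle_days_threshold * 86400
    return {
        "Namespace": "AWS/Bedrock",
        "MetricName": metric_name,
        # The ModelId dimension value is the provisioned ARN, not a foundation-model ID.
        "Dimensions": [{"Name": "ModelId", "Value": provisioned_model_arn}],
        "StartTime": now_utc - timedelta(seconds=window_seconds),
        "EndTime": now_utc,
        "Period": window_seconds,  # a single period spanning the whole window
        "Statistics": ["Sum"],
    }


# Hypothetical ARN, used only to show the shape of the request.
example_arn = "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/example"
now = datetime(2025, 1, 15, tzinfo=timezone.utc)
requests = [metric_request(name, example_arn, now) for name in REQUIRED_METRICS]
```

With a boto3 CloudWatch client, each dict could then be passed as `client.get_metric_statistics(**request)`.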
171+
172+
### Required metrics
173+
174+
1. `Invocations`
175+
2. `InvocationClientErrors`
176+
3. `InvocationServerErrors`
177+
4. `InvocationThrottles`
178+
179+
### Interpretation rules
180+
181+
- If any required metric returns datapoints with `Sum > 0` → **not idle** → **SKIP ITEM**
182+
- The Provisioned Throughput is idle only when **all required metrics return datapoints** and
183+
all observed `Sum` values are exactly `0`
184+
185+
### Datapoint completeness
186+
187+
- Missing datapoints **must not** be interpreted as zero runtime activity
188+
- If any required metric returns no datapoints → **SKIP ITEM** (insufficient evidence)
189+
- If retrieval of any required metric fails → **FAIL RULE**
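
The interpretation and completeness rules can be sketched as a small classifier. This is a hedged example: `classify_activity` and its return labels are hypothetical names, and the input is assumed to map each metric name to the `Datapoints` list from a `GetMetricStatistics` response.

```python
REQUIRED_METRICS = (
    "Invocations",
    "InvocationClientErrors",
    "InvocationServerErrors",
    "InvocationThrottles",
)


def classify_activity(datapoints_by_metric):
    """Return 'insufficient_evidence', 'active', or 'idle' for one Provisioned Throughput."""
    series = [datapoints_by_metric.get(name) or [] for name in REQUIRED_METRICS]
    # Missing datapoints are never treated as zero activity.
    if any(len(points) == 0 for points in series):
        return "insufficient_evidence"
    # Any positive Sum on any metric counts as observed runtime request activity.
    if any(point["Sum"] > 0 for points in series for point in points):
        return "active"
    return "idle"


all_zero = {name: [{"Sum": 0.0}] for name in REQUIRED_METRICS}
throttled = {**all_zero, "InvocationThrottles": [{"Sum": 3.0}]}
incomplete = {**all_zero, "InvocationServerErrors": []}
```

Only `'idle'` leads to an emitted finding; the other two outcomes are item-level skips. Retrieval failures are not modelled here because they fail the whole rule before classification.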
190+
191+
### Semantic boundary
192+
193+
- This rule detects **no observed Bedrock runtime request activity**, not “no business value”
194+
- `Invocations` covers successful requests only; the error/throttle metrics are required so
195+
that failed/throttled attempts are still treated as observed activity
196+
197+
---
198+
199+
## 7. Pricing / Commitment Boundary
200+
201+
- `estimated_monthly_cost_usd = null`
202+
203+
### Mandatory rules
204+
205+
- MUST NOT emit a fixed per-Model-Unit monthly estimate; the fetched AWS docs do not provide one
206+
- MUST NOT infer immediate savings from idle state alone
207+
- MAY surface `model_units`, `desired_model_units`, `commitment_duration`, and
208+
`commitment_expiration_time` as context only
209+
210+
### Required caveats
211+
212+
- Billing continues until the Provisioned Throughput is deleted
213+
- Committed Model Unit Provisioned Throughputs may not be deletable before term completion
214+
- Idle state does not necessarily mean the cost is immediately avoidable
215+
216+
---
217+
218+
## 8. Deterministic Evaluation Order
219+
220+
1. Retrieve and fully paginate `ListProvisionedModelThroughputs(statusEquals="InService")`
221+
2. Normalize each item
222+
3. For each normalized item:
223+
- `provisioned_model_arn` absent → **SKIP ITEM**
224+
- `normalized_status` absent or not `InService` → **SKIP ITEM**
225+
- `creation_time_utc` absent/invalid/future → **SKIP ITEM**
226+
- `age_days < idle_days_threshold` → **SKIP ITEM**
227+
- retrieve all required CloudWatch activity metrics using `ModelId = provisionedModelArn`
228+
- any required metric retrieval failure → **FAIL RULE**
229+
- any required metric has no datapoints → **SKIP ITEM**
230+
- any required metric has `Sum > 0` → **SKIP ITEM**
231+
- otherwise → **EMIT**
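
The evaluation order can be illustrated with a loop over already-normalized items. The sketch assumes normalization has already dropped items with missing, naive, or future creation times, and that `fetch_datapoints(arn, metric_name)` is a caller-supplied function (hypothetical here) returning a list of datapoint dicts; any exception it raises propagates, which corresponds to **FAIL RULE**.

```python
REQUIRED_METRICS = (
    "Invocations",
    "InvocationClientErrors",
    "InvocationServerErrors",
    "InvocationThrottles",
)


def evaluate(normalized_items, fetch_datapoints, idle_days_threshold=7):
    """Return the ARNs that would be emitted, following the ordered checks."""
    emitted = []
    for item in normalized_items:
        arn = item.get("provisioned_model_arn")
        if not arn:
            continue  # SKIP ITEM: malformed identity
        if item.get("normalized_status") != "InService":
            continue  # SKIP ITEM: absent status or not currently serving
        age = item.get("age_days")
        if age is None or age < idle_days_threshold:
            continue  # SKIP ITEM: invalid age source or too new
        # No try/except on purpose: a retrieval error must fail the whole rule.
        series = {name: fetch_datapoints(arn, name) for name in REQUIRED_METRICS}
        if any(not points for points in series.values()):
            continue  # SKIP ITEM: insufficient evidence
        if any(p["Sum"] > 0 for points in series.values() for p in points):
            continue  # SKIP ITEM: observed runtime request activity
        emitted.append(arn)  # EMIT
    return emitted
```

Metrics are fetched only after the cheap control-plane checks pass, so skipped items cost no CloudWatch calls.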
232+
233+
---
234+
235+
## 9. Exclusion Rules
236+
237+
1. `provisioned_model_arn` absent → malformed identity
238+
2. `normalized_status` absent → missing current-state signal
239+
3. `normalized_status != "InService"` → not currently serving provisioned capacity
240+
4. `creation_time_utc` absent/naive/future → invalid age source
241+
5. `age_days < idle_days_threshold` → too new
242+
6. any required CloudWatch activity metric has no datapoints → insufficient trusted evidence
243+
7. any required CloudWatch activity metric has positive observed activity → not idle
244+
245+
---
246+
247+
## 10. Failure Model
248+
249+
### Rule-level failures (FAIL RULE)
250+
251+
- `ListProvisionedModelThroughputs` request/pagination failure
252+
- `GetMetricStatistics` failure for any required activity metric
253+
- permission failure for required Bedrock or CloudWatch APIs
254+
255+
### Item-level skips (SKIP ITEM)
256+
257+
- malformed identity or creation time
258+
- non-`InService` state
259+
- too new
260+
- insufficient CloudWatch datapoints
261+
- observed runtime request activity
262+
263+
---
264+
265+
## 11. Confidence Model
266+
267+
| Condition | Confidence |
|---|---|
| All required activity metrics present and all sums zero over full window | `HIGH` |
270+
271+
**Mandatory rule:** use `HIGH` confidence. The finding is based on direct control-plane
272+
status plus direct runtime-activity metrics with full required metric coverage.
273+
274+
---
275+
276+
## 12. Risk Model
277+
278+
| Condition | Risk |
|---|---|
| Finding emitted | `HIGH` |
281+
282+
**Mandatory rule:** use `HIGH` risk. Provisioned Throughput is dedicated always-on Bedrock
283+
capacity that continues billing while serving no observed runtime requests.
284+
285+
---
286+
287+
## 13. Evidence / Details Contract
288+
289+
### Required details fields
290+
291+
Each emitted finding should include, at minimum:
292+
293+
```text
evaluation_path = "idle-bedrock-provisioned-throughput-review-candidate"
provisioned_model_arn
provisioned_model_name
normalized_status = "InService"
creation_time
age_days
idle_days_threshold
model_arn
foundation_model_arn
model_units
desired_model_units
commitment_duration
commitment_expiration_time
activity_metrics_checked
```
309+
310+
### Required `activity_metrics_checked`
311+
312+
```text
["Invocations", "InvocationClientErrors", "InvocationServerErrors", "InvocationThrottles"]
```
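
A details payload carrying the required fields might be assembled as in the sketch below. `build_details` is a hypothetical helper, the input is assumed to be a normalized item, and rendering timestamps as ISO 8601 strings is an assumption of this example rather than something the specification mandates.

```python
from datetime import datetime, timezone

REQUIRED_METRICS = (
    "Invocations",
    "InvocationClientErrors",
    "InvocationServerErrors",
    "InvocationThrottles",
)


def _iso(value):
    """Render a datetime as ISO 8601 text, passing None through unchanged."""
    return value.isoformat() if value is not None else None


def build_details(item, idle_days_threshold):
    """Assemble the required details fields for one emitted finding."""
    return {
        "evaluation_path": "idle-bedrock-provisioned-throughput-review-candidate",
        "provisioned_model_arn": item["provisioned_model_arn"],
        "provisioned_model_name": item.get("provisioned_model_name"),
        "normalized_status": "InService",
        "creation_time": _iso(item["creation_time_utc"]),
        "age_days": item["age_days"],
        "idle_days_threshold": idle_days_threshold,
        "model_arn": item.get("model_arn"),
        "foundation_model_arn": item.get("foundation_model_arn"),
        # Context only: these fields never affect eligibility or a cost estimate.
        "model_units": item.get("model_units"),
        "desired_model_units": item.get("desired_model_units"),
        "commitment_duration": item.get("commitment_duration"),
        "commitment_expiration_time": _iso(item.get("commitment_expiration_time_utc")),
        "activity_metrics_checked": list(REQUIRED_METRICS),
    }
```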
315+
316+
### Required evidence wording
317+
318+
Signals used should state:
319+
320+
- Provisioned Throughput is `InService`
321+
- required Bedrock runtime activity metrics were queried under `ModelId = provisionedModelArn`
322+
- no observed runtime request activity was present over the configured window
323+
324+
Signals not checked should state major blind spots, such as:
325+
326+
- whether the throughput is intentionally kept warm for failover or rare batch windows
327+
- whether a commitment term prevents immediate deletion
328+
- whether future traffic is expected soon
329+
- application/business criticality
330+
- exact current pricing and immediate avoidable savings
331+
332+
---
333+
334+
## 14. Non-goals / Blind Spots
335+
336+
This rule does **not** prove any of the following:
337+
338+
- that the Provisioned Throughput is safe to delete
339+
- that the Provisioned Throughput is not intentionally reserved for future or failover use
340+
- that immediate savings are available despite commitment constraints
341+
- that no one attempted model usage outside the observation window
342+
- that there are no operational dependencies on the provisioned ARN
343+
344+
---
345+
346+
## 15. API and IAM Contract
347+
348+
### Required APIs
349+
350+
- `bedrock:ListProvisionedModelThroughputs`
351+
- `cloudwatch:GetMetricStatistics`
352+
353+
### Mandatory API usage rules
354+
355+
- `ListProvisionedModelThroughputs` must be paginated
356+
- inventory should be filtered with `statusEquals="InService"`, or non-`InService` items must be excluded during evaluation
357+
- `GetMetricStatistics` must query the exact published dimension combination
358+
- this rule must use `ModelId = provisionedModelArn`
359+
- undocumented fallback metric dimensions (for example foundation-model IDs) must not be
360+
required for canonical correctness
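
The pagination requirement can be sketched with a manual `nextToken` loop. The function name `list_in_service` is hypothetical; `list_page` is assumed to be a callable with the shape of a boto3 Bedrock client's `list_provisioned_model_throughputs` method, injected so the loop can be exercised without AWS access.

```python
def list_in_service(list_page):
    """Collect every InService summary by following nextToken until it is absent."""
    summaries = []
    token = None
    while True:
        kwargs = {"statusEquals": "InService"}
        if token:
            kwargs["nextToken"] = token
        # Exceptions are deliberately not caught: a listing failure fails the rule.
        page = list_page(**kwargs)
        summaries.extend(page.get("provisionedModelSummaries", []))
        token = page.get("nextToken")
        if not token:
            return summaries
```

In real use this could be called as `list_in_service(bedrock_client.list_provisioned_model_throughputs)`, where `bedrock_client` is a boto3 `bedrock` client. Even with the server-side filter, the evaluation step still re-checks the status of each item.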
361+
362+
---
363+
364+
## 16. Acceptance Scenarios
365+
366+
### Must emit
367+
368+
1. `InService` Provisioned Throughput older than threshold, all 4 required metrics have
369+
datapoints, and all sums are `0`
370+
371+
### Must skip
372+
373+
2. `Creating` Provisioned Throughput
374+
3. `Updating` Provisioned Throughput
375+
4. `Failed` Provisioned Throughput
376+
5. `InService` Provisioned Throughput younger than threshold
377+
6. malformed item without `provisionedModelArn`
378+
7. malformed item with missing/invalid/future `creationTime`
379+
8. any required activity metric returns no datapoints
380+
9. `Invocations` has `Sum > 0`
381+
10. `InvocationClientErrors` has `Sum > 0`
382+
11. `InvocationServerErrors` has `Sum > 0`
383+
12. `InvocationThrottles` has `Sum > 0`
384+
385+
### Must fail
386+
387+
13. `ListProvisionedModelThroughputs` request/pagination failure
388+
14. any required `GetMetricStatistics` request failure
389+
390+
---
391+
392+
Rule: aws.bedrock.provisioned_throughput.idle
