Rubric Gap Analysis

| Field | Value |
|-------|-------|
| Type | Rule Enhancement (AK-regression investigation) |
| Target Rule | `query-parameterize` |
| Severity | HIGH |
| Source | SCOPE Python Flask Webstore on Azure — V-A, Phase 3 (75 runs / 15 profiles) |
| Labels | `enhancement`, `SCOPE`, `agent-kit`, `rule:query` |

## Summary

`query-parameterize` is the only rule in the Python Flask Webstore eval where **AK profiles perform worse than control**: AK 17/30 (57%) vs control 31/45 (69%), Δ −12 pp. This inverts the expected skill-improves-outcomes signal and suggests the current rule example either (a) under-covers a Python dynamic-filter pattern the skill nudges agents toward, or (b) encourages composition via f-strings for non-user-input fragments that then get reused with user-input fragments. Every AK-profile failure site uses Python f-strings to assemble SQL — including column/filter-list fragments that mix safe identifiers with user-controlled values.

## Rubric Gap Analysis

- **Scenario**: Flask routes with dynamic filters (search, category, price range). Agents implement these as query builders that concatenate SQL fragments.
- **Method**: All 75 L5 snapshots grepped for Python f-strings / `str.format()` appearing in the same statement as `SELECT`, `WHERE`, `FROM c`, or `query_items(query=...)`. Sites manually triaged into: (1) log-message-only false positives (e.g., `f"Error …{str(e)}"`), (2) true anti-patterns, (3) mixed (parameterized user input but f-string-composed filter structure).
- **Measured**:
  - 14 f-string-in-query sites across the 75 runs, 8 are true (or borderline) anti-patterns concentrated in AK profiles (P04, P06) and mixed profiles (P11, P13).
  - Raw compliance: 48/75 (64%) overall, AK 17/30 (57%), control 31/45 (69%), Δ **−12 pp**.
- **Signal**: AK profiles are more likely to attempt dynamic multi-field filtering (they implement richer product-search UX). The current rule shows a simple "1 param → 1 value" example and does not cover the **multi-filter AND-join pattern** that is the AK regression's common shape.

## Evidence

### Existing Rule Coverage

The current `query-parameterize.md` rule is **C#-only**. Verbatim excerpts below (no Python or Node.js content in the rule as of 2026-04-20):

> Always use parameterized queries instead of string concatenation. This prevents injection attacks and enables query plan caching.

Single-parameter example (C#):

```csharp
var query = new QueryDefinition("SELECT * FROM c WHERE c.userId = @userId")
    .WithParameter("@userId", userId);
```

Multi-parameter example (C#):

```csharp
var query = new QueryDefinition(@"
    SELECT * FROM c 
    WHERE c.customerId = @customerId 
    AND c.status = @status
    AND c.orderDate >= @startDate")
    .WithParameter("@customerId", customerId)
    .WithParameter("@status", "active")
    .WithParameter("@startDate", startDate);
```

Array parameter for `IN` / `ARRAY_CONTAINS` (C#) and a LINQ note ("SDK handles parameterization automatically") round out the rule.

**Gaps for Python agents:**

1. **No Python code at all.** Python agents must transpose the C# `QueryDefinition(...).WithParameter(...)` shape to `container.query_items(query=..., parameters=[{"name": ..., "value": ...}])`. The rule does not show this.
2. **No dynamic / variadic-WHERE example in any language.** The multi-parameter C# example hard-codes three fixed predicates. Agents building a Flask route with *optional* filters (`category`, `min_price`, `max_price`) must invent the composition shape themselves — and they choose f-strings.
3. **No guidance on structure-vs-values trust boundary.** The rule's "benefits" list mentions injection but does not make explicit that **values** go through parameters while **predicate structure** must come from a closed whitelist (never external input).

### Missing Anti-Pattern / Finding

Anonymized from AK profile P04 (run 2 and run 5 both exhibit this; multi-filter WHERE assembled from a list of fragments):

```python
# ❌ Anti-pattern — safe identifiers assembled with f-string, user values parameterized separately
def search_products(category, min_price, max_price):
    filters = []
    params = []
    if category:
        filters.append("c.categoryId = @cat")
        params.append({"name": "@cat", "value": category})
    if min_price is not None:
        filters.append("c.price >= @minP")
        params.append({"name": "@minP", "value": min_price})
    if max_price is not None:
        filters.append("c.price <= @maxP")
        params.append({"name": "@maxP", "value": max_price})

    # ❌ f-string composes fragment list into final SQL
    query = f"SELECT * FROM c WHERE {' AND '.join(filters)} ORDER BY c.updatedAt DESC"
    return list(container.query_items(query=query, parameters=params))
```

And from AK profile P06 (R03):

```python
# ❌ Extends a filter string with an externally-supplied fragment
where = f"WHERE c._type = @_type"
if extra_filters:
    where += f" AND {extra_filters}"   # ❌ fragment-level trust boundary
```

Why this recurs in AK profiles: the `query-parameterize` rule nudges the agent to parameterize **values**, but says nothing about how to safely **compose WHERE clauses from a variable set of predicates**. The agent correctly parameterizes each value and then builds the predicate structure with f-strings — satisfying the letter of the rule but introducing a composition anti-pattern.

### Correct Pattern

```python
# ✅ Build both predicates and parameters from a whitelisted (field, param_name, op) map.
# Never interpolate field names or operators from external input.
_SEARCH_FIELDS = {
    "category":  ("c.categoryId", "=",  "@cat"),
    "min_price": ("c.price",      ">=", "@minP"),
    "max_price": ("c.price",      "<=", "@maxP"),
}

def search_products(category=None, min_price=None, max_price=None):
    predicates, params = [], []
    values = {"category": category, "min_price": min_price, "max_price": max_price}
    for key, value in values.items():
        if value is None:
            continue
        field, op, placeholder = _SEARCH_FIELDS[key]  # ✅ whitelisted
        predicates.append(f"{field} {op} {placeholder}")
        params.append({"name": placeholder, "value": value})

    where = f"WHERE {' AND '.join(predicates)}" if predicates else ""
    query = f"SELECT c.id, c.name, c.price, c.categoryId FROM c {where} ORDER BY c.updatedAt DESC"

    # Pass partition_key only when we have a concrete value; omit otherwise to
    # allow cross-partition. Passing partition_key=None would filter on the
    # literal None/null PK value, not enable cross-partition.
    kwargs = {"query": query, "parameters": params}
    if category is not None:
        kwargs["partition_key"] = category
    else:
        kwargs["enable_cross_partition_query"] = True
    return list(container.query_items(**kwargs))
```

Key properties:

1. **Predicate structure is authored from a closed enum** (`_SEARCH_FIELDS`) — never from external input, never extended by callers.
2. **Values are the only interpolation** and they go through `parameters=`.
3. Pairs naturally with `query-use-projections` (dedicated column list) and `query-avoid-cross-partition` (partition key when available).

## Runtime Validation (Phase 4a / 6a)

- [ ] PoC: `versionA/P04/R02/_scratch/query-parameterize-regression/`
  - Unit test that asserts the anti-pattern `f"WHERE {' AND '.join(filters)}"` emits valid SQL for a controlled filter list, then asserts the same builder with a tampered filter string produces an injection string.
  - Second test asserts the whitelisted builder above **rejects** unknown keys (raises `KeyError`) and produces the expected parameter array.
- [ ] Integration test against emulator: exercise a search endpoint with `category` + `min_price` + `max_price` combinations; record RU and row counts for both implementations.

## Rule Contradiction Scan

| Rule | Interaction |
|---|---|
| `query-parameterize` | Primary target — enhancement adds variadic/dynamic-filter pattern. No contradiction. |
| `query-use-projections` | Correct pattern uses explicit columns — reinforces #116 / `query-use-projections`. |
| `query-avoid-cross-partition` | Correct pattern shows `partition_key=` where applicable. No contradiction. |
| `query-top-literal` | Unaffected — no TOP clause in this shape. |
| Phase 5 sibling draft `issue-1` (identity auth) | No interaction. |
| Phase 5 sibling draft `issue-3` (monitoring-diagnostic-logs) | No interaction. |

## Documentation Cross-reference

- Microsoft Learn — [Parameterized queries in Azure Cosmos DB for NoSQL](https://learn.microsoft.com/azure/cosmos-db/nosql/query/parameterized-queries)
- Microsoft Learn — [NoSQL query WHERE clause](https://learn.microsoft.com/azure/cosmos-db/nosql/query/where)
- OWASP — [SQL Injection Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html) (principle: parameterize values, whitelist structure).

## Recommended cosmosdb-agent-kit Fix

Two additions to `query-parameterize.md`:

**A. Add a Python example** mirroring the existing C# single-parameter and multi-parameter examples — since the rule is currently C#-only:

```python
# Single parameter
query = "SELECT * FROM c WHERE c.userId = @userId"
params = [{"name": "@userId", "value": user_id}]
items = list(container.query_items(query=query, parameters=params))

# Multi-parameter
query = ("SELECT * FROM c "
         "WHERE c.customerId = @customerId "
         "AND c.status = @status "
         "AND c.orderDate >= @startDate")
params = [
    {"name": "@customerId", "value": customer_id},
    {"name": "@status",     "value": "active"},
    {"name": "@startDate",  "value": start_date},
]
```

**B. Add a new "Dynamic / multi-filter WHERE clauses" sub-section** after the single-field example (applies to all languages; shown in Python since that is the regression's shape):

```diff
+### Dynamic / multi-filter WHERE clauses (Python)
+
+When the query has a variable set of filters, parameterize values AND keep
+predicate structure in a **closed whitelist** — never compose predicates from
+external input.
+
+```python
+_FIELDS = {
+    "category":  ("c.categoryId", "=",  "@cat"),
+    "min_price": ("c.price",      ">=", "@minP"),
+    "max_price": ("c.price",      "<=", "@maxP"),
+}
+
+def search(category=None, min_price=None, max_price=None):
+    predicates, params = [], []
+    for key, value in {"category": category, "min_price": min_price, "max_price": max_price}.items():
+        if value is None: continue
+        field, op, placeholder = _FIELDS[key]
+        predicates.append(f"{field} {op} {placeholder}")
+        params.append({"name": placeholder, "value": value})
+
+    where = f"WHERE {' AND '.join(predicates)}" if predicates else ""
+    kwargs = {"query": f"SELECT c.id, c.name, c.price FROM c {where}", "parameters": params}
+    if category is not None:
+        kwargs["partition_key"] = category
+    else:
+        kwargs["enable_cross_partition_query"] = True
+    return list(container.query_items(**kwargs))
+```
+
+Anti-pattern to reject:
+
+```python
+# ❌ Filter string extended with externally-supplied fragment
+where = "WHERE c.type = @type"
+if extra_filters:
+    where += f" AND {extra_filters}"   # trust-boundary violation
+```
```

## Cross-Eval Check

Searches performed 2026-04-20:

- `/memories/scope-issue-tracking.md`, `/memories/scope-ak-issue-tracking.md` — no prior filings on `query-parameterize`.
- Filed open (#102, #104–#107, #116–#120) and filed closed (#28–#57, #101) — none touch `query-parameterize`.
- Other evals' drafts folders — no overlapping title.
- Result: **Novel** — no overlap. Proceed to draft. Phase 4a gate pending.


Rule	Interaction
`query-parameterize`	Primary target — enhancement adds variadic/dynamic-filter pattern. No contradiction.
`query-use-projections`	Correct pattern uses explicit columns — reinforces #116 / `query-use-projections`.
`query-avoid-cross-partition`	Correct pattern shows `partition_key=` where applicable. No contradiction.
`query-top-literal`	Unaffected — no TOP clause in this shape.
Phase 5 sibling draft `issue-1` (identity auth)	No interaction.
Phase 5 sibling draft `issue-3` (monitoring-diagnostic-logs)	No interaction.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rubric Gap Analysis — `query-parameterize` enhancement: AK regression on Python Flask (−12 pp vs control) #128

Summary

Evidence

Existing Rule Coverage

Missing Anti-Pattern / Finding

Correct Pattern

Runtime Validation (Phase 4a / 6a)

Rule Contradiction Scan

Documentation Cross-reference

Recommended cosmosdb-agent-kit Fix

Cross-Eval Check

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Field	Value
Type	Rule Enhancement (AK-regression investigation)
Target Rule	`query-parameterize`
Severity	HIGH
Source	SCOPE Python Flask Webstore on Azure — V-A, Phase 3 (75 runs / 15 profiles)
Labels	`enhancement`, `SCOPE`, `agent-kit`, `rule:query`

Rubric Gap Analysis — query-parameterize enhancement: AK regression on Python Flask (−12 pp vs control) #128

Description

Summary