Skip to content

Commit 078afe6

Browse files
committed
Merge main into metrics migration
2 parents aca9f7f + 544b50f commit 078afe6

152 files changed

Lines changed: 17004 additions & 5464 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.claude/skills/sesh-mode/SKILL.md

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,88 @@ Stateright DST Tests Production Metrics
4444
(exhaustive) (simulation) (Observability)
4545
```
4646

47+
## STOP, Don't Weaken — Specs are Load-Bearing
48+
49+
Whether the load-bearing claim is a formal property (TLA+ invariant,
50+
Stateright property, DST assertion) or an English-language user
51+
requirement, the rule is the same: **MUST NOT** silently weaken it.
52+
First action is diagnosis and an explicit conversation with the user,
53+
not modification.
54+
55+
### Verification check fails
56+
57+
When a TLA+ invariant, Stateright property, or DST assertion fails, the
58+
failure has exactly two possible causes:
59+
60+
1. **The implementation/model has a real bug.** Fix the bug.
61+
2. **The property is over-strong** — it asserts something the design does not
62+
actually guarantee. *Sometimes* the right answer is to weaken or replace
63+
the property — but the failing trace was probably also revealing the
64+
real, weaker safety property the design *does* guarantee. Almost never
65+
should the answer be just "remove the property."
66+
67+
**Required protocol**:
68+
- Read the failing trace. State out loud what the property meant to claim,
69+
and what the trace shows.
70+
- If unsure whether (1) or (2) applies — **stop and ask the user**.
71+
Do not silently rewrite the property.
72+
- If (2) applies and you propose to weaken/replace, present the candidate
73+
replacement *and* the safety property the original was reaching for, then
74+
ask before changing. The replacement should usually preserve the spirit
75+
via a different formulation (action property, liveness, narrower precondition)
76+
— not just delete the constraint.
77+
78+
**Forbidden** without explicit user approval:
79+
- Renaming an invariant to make its negation trivially true
80+
- Deleting an invariant that just produced a counter-example
81+
- Adding `=> TRUE` or other no-op weakenings
82+
- Changing `\A` to `\E`, `[]` to `<>`, or similar quantifier flips, when the
83+
motive is to suppress a violation rather than to capture a different claim
84+
85+
### User requirement seems hard or out-of-scope
86+
87+
The same rule applies when translating from the user's spec into a plan,
88+
or from the plan into implementation. If a stated requirement looks hard,
89+
expensive, or "more than this PR needs," **stop and ask first** — do not
90+
decide unilaterally that "close enough" is acceptable.
91+
92+
Silent-weakening moves to watch for:
93+
- **Granularity downgrade**: user asked for page-by-page streaming;
94+
plan says "column-chunk granularity is the natural unit." The memory
95+
bound the user actually wanted is gone.
96+
- **Constraint dropping**: user said "must not OOM under load"; plan
97+
treats it as "bounded in the typical case." The qualifier was doing
98+
load-bearing work and you removed it.
99+
- **Strength reduction**: user said "byte-identical metadata"; plan
100+
says "logically equivalent metadata." Test passes; spec is gone.
101+
- **MUST → SHOULD**: user said "one GET per file"; plan allows "two in
102+
the rare retry case." May be reasonable, but only the user can
103+
authorize it.
104+
- **Reframing as out-of-scope**: user described a property as part of
105+
the goal; plan declares it a follow-up PR. If the user didn't say
106+
"split it," you don't get to.
107+
108+
**Required protocol** mirrors the verification case: quote the user's
109+
original phrasing back to yourself; if you cannot point at the source
110+
phrase that authorizes the weaker version, you are not authorized to
111+
ship it. Surface the gap explicitly: *"you asked for X; my plan does
112+
Y; here's why I considered Y; ok to weaken, or do you want X?"* The
113+
default answer is X.
114+
115+
**Plan-document approval is not spec approval.** When the plan you write
116+
diverges from the spec at write time, the user reviews the diverged
117+
plan, not the original requirement. Accountability for the divergence
118+
is on you — they approved what you wrote, not what they asked for.
119+
Re-read the user's original message before writing code.
120+
121+
### Why this matters
122+
123+
Specs — formal or English — describe the system's promise to its
124+
users. When a check fails or a requirement seems too strict, that has
125+
just told you either that the implementation is wrong, or that you've
126+
been claiming a stronger promise than you actually keep. Both deserve
127+
a conscious decision, never a silent edit.
128+
47129
## Testing Through Production Path
48130

49131
**MUST NOT** claim a feature works unless tested through the actual network stack.

.github/CODEOWNERS

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,7 @@
11
# CODEOWNERS — see https://docs.github.com/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
22
#
3-
# Last matching rule per file wins. Approval from any listed owner satisfies
4-
# the requirement. quickwit-dev is listed on every rule so it can always
5-
# approve; an additional team is listed on the metrics Parquet pipeline paths
6-
# so PRs scoped to those paths can be approved by either team.
3+
# Last matching rule per file wins. Multiple owners listed on the same line
4+
# means approval from ANY ONE of them satisfies the requirement.
75

86
# Default: quickwit-core owns everything
97
* @quickwit-oss/quickwit-core
@@ -13,4 +11,11 @@
1311
/quickwit/quickwit-datafusion/ @quickwit-oss/byoc-metrics
1412
/quickwit/quickwit-df-core/ @quickwit-oss/byoc-metrics
1513
/quickwit/quickwit-dst/ @quickwit-oss/byoc-metrics
16-
/quickwit/quickwit-indexing/src/actors/metrics_pipeline/ @quickwit-oss/byoc-metrics
14+
/quickwit/quickwit-indexing/src/actors/parquet_pipeline/ @quickwit-oss/byoc-metrics
15+
16+
# Shared paths — either team can approve. `docs/internals/` is shared
17+
# architecture + verification docs that any team working on the codebase
18+
# may need to update. `Cargo.lock` churns on routine dependency bumps
19+
# and doesn't carry domain-specific review value.
20+
/docs/internals/ @quickwit-oss/quickwit-core @quickwit-oss/byoc-metrics
21+
/quickwit/Cargo.lock @quickwit-oss/quickwit-core @quickwit-oss/byoc-metrics

.github/workflows/ci.yml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,6 @@ permissions:
1616

1717
env:
1818
CARGO_INCREMENTAL: 0
19-
QW_DISABLE_TELEMETRY: 1
2019
QW_TEST_DATABASE_URL: postgres://quickwit-dev:quickwit-dev@localhost:5432/quickwit-metastore-dev
2120
RUST_BACKTRACE: 1
2221
RUSTDOCFLAGS: -Dwarnings -Arustdoc::private_intra_doc_links

.github/workflows/coverage.yml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,6 @@ env:
2020
AWS_SECRET_ACCESS_KEY: "placeholder"
2121
CARGO_INCREMENTAL: 0
2222
PUBSUB_EMULATOR_HOST: "localhost:8681"
23-
QW_DISABLE_TELEMETRY: 1
2423
QW_S3_ENDPOINT: "http://localhost:4566" # Services are exposed as localhost because we are not running coverage in a container.
2524
QW_S3_FORCE_PATH_STYLE_ACCESS: 1
2625
QW_TEST_DATABASE_URL: postgres://quickwit-dev:quickwit-dev@localhost:5432/quickwit-metastore-dev

.github/workflows/datafusion-ci.yml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,6 @@ permissions:
1818

1919
env:
2020
CARGO_INCREMENTAL: 0
21-
QW_DISABLE_TELEMETRY: 1
2221
RUST_BACKTRACE: 1
2322
RUSTFLAGS: -Dwarnings --cfg tokio_unstable
2423

.github/workflows/dependency.yml

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,4 +20,11 @@ jobs:
2020
with:
2121
# GHSA-c38w-74pg-36hr, GHSA-4grx-2x9w-596c: minor vuln on the rsa crate, used for google storage.
2222
# GHSA-cq8v-f236-94qc: rand 0.8.6 unsound with custom logger + rand::rng(), not affected (log feature disabled, transitive dep from fail/sqlx).
23-
allow-ghsas: GHSA-c38w-74pg-36hr,GHSA-4grx-2x9w-596c,GHSA-cq8v-f236-94qc
23+
# GHSA-2f9f-gq7v-9h6m: thrift 0.17 excessive-allocation on untrusted input. Already a transitive
24+
# dep via parquet on main; PR-4 (streaming Parquet reader) adds it as a direct dep for
25+
# `parquet::format::PageHeader::read_from_in_protocol`. Inputs are parquet files we wrote to
26+
# our own object store (trusted source). Defense-in-depth: the streaming reader caps each
27+
# Thrift parse buffer at `max_page_header_bytes` (1 MiB default) BEFORE handing bytes to
28+
# thrift, so the outer buffer thrift can read from is bounded regardless of any inner
29+
# length prefix. No newer thrift release fixes this — 0.17.0 is the latest on crates.io.
30+
allow-ghsas: GHSA-c38w-74pg-36hr,GHSA-4grx-2x9w-596c,GHSA-cq8v-f236-94qc,GHSA-2f9f-gq7v-9h6m

.gitignore

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,3 +32,6 @@ qwdata
3232

3333
# GSD planning artifacts (local only)
3434
.planning
35+
36+
# TLC model checker working directory (state dumps from `tla2tools.jar`)
37+
states/

CLAUDE.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,7 @@ See `quickwit/CLAUDE.md` for architecture overview, crate descriptions, and buil
4343
| Recreates futures in `select!` loops | Use `&mut fut` to resume, not recreate — dropping loses data | GAP-002 |
4444
| Holds locks across await points | Invariant violations on cancel. Use message passing or synchronous critical sections | GAP-002 |
4545
| Silently swallows unexpected state | If a condition "shouldn't happen," return an error or assert — don't silently return Ok. Skipping optional/missing data is fine; pretending a bug didn't occur is not | Code quality |
46+
| Replies to PR review comments as standalone inline comments | Use `gh api repos/.../pulls/{pr}/comments/{codex_comment_id}/replies` (or `in_reply_to_id` in the POST body) so the reply is **threaded under** the original review comment. Standalone inline comments at the same line are NOT replies; they show as separate threads. Verify with the GitHub API that `in_reply_to_id` is set on your reply. | Review hygiene |
4647

4748
## Engineering Priority
4849

0 commit comments

Comments
 (0)