chore: Add Claude Code agent skills and update CLAUDE.md #25
@@ -0,0 +1,234 @@
---
name: clickhouse-best-practices
description: MUST USE when reviewing ClickHouse schemas, queries, or configurations. Contains 28 rules that MUST be checked before providing recommendations. Always read relevant rule files and cite specific rules in responses.
license: Apache-2.0
metadata:
  author: ClickHouse Inc
  version: "0.3.0"
---

# ClickHouse Best Practices

Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.

> **Official docs:** [ClickHouse Best Practices](https://clickhouse.com/docs/best-practices)
## IMPORTANT: How to Apply This Skill

**Before answering ClickHouse questions, follow this priority order:**

1. **Check for applicable rules** in the `rules/` directory
2. **If rules exist:** Apply them and cite them in your response using "Per `rule-name`..."
3. **If no rule exists:** Use the LLM's ClickHouse knowledge or search documentation
4. **If uncertain:** Use web search for current best practices
5. **Always cite your source:** rule name, "general ClickHouse guidance", or URL

**Why rules take priority:** ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.

### For Formal Reviews

When performing a formal review of schemas, queries, or data ingestion, follow the corresponding procedure in the next section.

---
## Review Procedures

### For Schema Reviews (CREATE TABLE, ALTER TABLE)

**Read these rule files in order:**

1. `rules/schema-pk-plan-before-creation.md` - ORDER BY is immutable
2. `rules/schema-pk-cardinality-order.md` - Column ordering in keys
3. `rules/schema-pk-prioritize-filters.md` - Filter column inclusion
4. `rules/schema-types-native-types.md` - Proper type selection
5. `rules/schema-types-minimize-bitwidth.md` - Numeric type sizing
6. `rules/schema-types-lowcardinality.md` - LowCardinality usage
7. `rules/schema-types-avoid-nullable.md` - Nullable vs DEFAULT
8. `rules/schema-partition-low-cardinality.md` - Partition count limits
9. `rules/schema-partition-lifecycle.md` - Partitioning purpose

**Check for:**

- [ ] PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
- [ ] Data types match actual data ranges
- [ ] LowCardinality applied to appropriate string columns
- [ ] Partition key cardinality bounded (100-1,000 values)
- [ ] ReplacingMergeTree has a version column if used
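As an illustration, a schema that passes every item on this checklist might look like the following sketch (the table and column names are invented for the example):

```sql
-- Hypothetical events table satisfying the schema checklist
CREATE TABLE events
(
    tenant_id   UInt32,                  -- smallest integer type that fits the range
    event_type  LowCardinality(String),  -- bounded set of string values
    event_time  DateTime,                -- native type, not String
    user_id     UInt64,
    payload     String DEFAULT ''        -- DEFAULT instead of Nullable
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)        -- monthly partitions keep partition count bounded
ORDER BY (event_type, tenant_id, event_time);  -- low-to-high cardinality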
### For Query Reviews (SELECT, JOIN, aggregations)

**Read these rule files:**

1. `rules/query-join-choose-algorithm.md` - Algorithm selection
2. `rules/query-join-filter-before.md` - Pre-join filtering
3. `rules/query-join-use-any.md` - ANY vs regular JOIN
4. `rules/query-index-skipping-indices.md` - Secondary index usage
5. `rules/schema-pk-filter-on-orderby.md` - Filter alignment with ORDER BY

**Check for:**

- [ ] Filters use ORDER BY prefix columns
- [ ] JOINs filter tables before joining (not after)
- [ ] Correct JOIN algorithm for table sizes
- [ ] Skipping indices for non-ORDER BY filter columns
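A query that satisfies the first two checks could be sketched as follows, assuming hypothetical `events` and `users` tables where `events` is ordered by `(event_type, ...)`:

```sql
-- Filter uses the ORDER BY prefix; both sides are filtered before joining
SELECT e.event_type, count() AS purchases
FROM (
    SELECT user_id, event_type
    FROM events
    WHERE event_type = 'purchase'   -- ORDER BY prefix column, prunes granules
) AS e
ANY INNER JOIN (
    SELECT user_id
    FROM users
    WHERE country = 'DE'            -- filtered before the join, not after
) AS u USING (user_id)
GROUP BY e.event_type;
```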
### For Insert Strategy Reviews (data ingestion, updates, deletes)

**Read these rule files:**

1. `rules/insert-batch-size.md` - Batch sizing requirements
2. `rules/insert-mutation-avoid-update.md` - UPDATE alternatives
3. `rules/insert-mutation-avoid-delete.md` - DELETE alternatives
4. `rules/insert-async-small-batches.md` - Async insert usage
5. `rules/insert-optimize-avoid-final.md` - OPTIMIZE TABLE risks

**Check for:**

- [ ] Batch size 10K-100K rows per INSERT
- [ ] No ALTER TABLE UPDATE for frequent changes
- [ ] ReplacingMergeTree or CollapsingMergeTree for update patterns
- [ ] Async inserts enabled for high-frequency small batches
---

## Output Format

Structure your response as follows:

```
## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...

## Findings

### Violations
- **`rule-name`**: Description of the issue
  - Current: [what the code does]
  - Required: [what it should do]
  - Fix: [specific correction]

### Compliant
- `rule-name`: Brief note on why it's correct

## Recommendations
[Prioritized list of changes, citing rules]
```

---
## Rule Categories by Priority

| Priority | Category | Impact | Prefix | Rule Count |
|----------|----------|--------|--------|------------|
| 1 | Primary Key Selection | CRITICAL | `schema-pk-` | 4 |
| 2 | Data Type Selection | CRITICAL | `schema-types-` | 5 |
| 3 | JOIN Optimization | CRITICAL | `query-join-` | 5 |
| 4 | Insert Batching | CRITICAL | `insert-batch-` | 1 |
| 5 | Mutation Avoidance | CRITICAL | `insert-mutation-` | 2 |
| 6 | Partitioning Strategy | HIGH | `schema-partition-` | 4 |
| 7 | Skipping Indices | HIGH | `query-index-` | 1 |
| 8 | Materialized Views | HIGH | `query-mv-` | 2 |
| 9 | Async Inserts | HIGH | `insert-async-` | 2 |
| 10 | OPTIMIZE Avoidance | HIGH | `insert-optimize-` | 1 |
| 11 | JSON Usage | MEDIUM | `schema-json-` | 1 |

---
## Quick Reference

### Schema Design - Primary Key (CRITICAL)

- `schema-pk-plan-before-creation` - Plan ORDER BY before table creation (immutable)
- `schema-pk-cardinality-order` - Order columns low-to-high cardinality
- `schema-pk-prioritize-filters` - Include frequently filtered columns
- `schema-pk-filter-on-orderby` - Query filters must use ORDER BY prefix

### Schema Design - Data Types (CRITICAL)

- `schema-types-native-types` - Use native types, not String for everything
- `schema-types-minimize-bitwidth` - Use smallest numeric type that fits
- `schema-types-lowcardinality` - LowCardinality for <10K unique strings
- `schema-types-enum` - Enum for finite value sets with validation
- `schema-types-avoid-nullable` - Avoid Nullable; use DEFAULT instead
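A before/after sketch of these type rules, using an invented `metrics` table:

```sql
-- Anti-pattern: everything as String / Nullable
CREATE TABLE metrics_bad
(
    status Nullable(String),
    code   String,    -- actually a small integer
    ts     String     -- actually a timestamp
) ENGINE = MergeTree ORDER BY tuple();

-- Applying the type rules above
CREATE TABLE metrics_good
(
    host   LowCardinality(String),        -- few thousand unique values
    ts     DateTime,                      -- native type
    status Enum8('ok' = 1, 'error' = 2),  -- finite set, validated on insert
    code   UInt16,                        -- smallest numeric type that fits
    note   String DEFAULT ''              -- DEFAULT instead of Nullable
) ENGINE = MergeTree ORDER BY (host, ts);
```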
### Schema Design - Partitioning (HIGH)

- `schema-partition-low-cardinality` - Keep partition count 100-1,000
- `schema-partition-lifecycle` - Use partitioning for data lifecycle, not queries
- `schema-partition-query-tradeoffs` - Understand partition pruning trade-offs
- `schema-partition-start-without` - Consider starting without partitioning

### Schema Design - JSON (MEDIUM)

- `schema-json-when-to-use` - JSON for dynamic schemas; typed columns for known

### Query Optimization - JOINs (CRITICAL)

- `query-join-choose-algorithm` - Select algorithm based on table sizes
- `query-join-use-any` - ANY JOIN when only one match needed
- `query-join-filter-before` - Filter tables before joining
- `query-join-consider-alternatives` - Dictionaries/denormalization vs JOIN
- `query-join-null-handling` - join_use_nulls=0 for default values
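Two of these patterns sketched in SQL (the `events`/`users` tables and the `users_dict` dictionary are hypothetical and assumed to exist):

```sql
-- ANY JOIN stops at the first match per left-side row;
-- the algorithm is chosen to suit the right-hand table size
SELECT e.user_id, u.country
FROM events AS e
ANY LEFT JOIN users AS u ON e.user_id = u.user_id
SETTINGS join_algorithm = 'partial_merge';  -- right table too large for in-memory hash

-- Alternative to the JOIN: dictionary lookup for a small dimension table
SELECT user_id, dictGet('users_dict', 'country', user_id) AS country
FROM events;
```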
### Query Optimization - Indices (HIGH)

- `query-index-skipping-indices` - Skipping indices for non-ORDER BY filters

### Query Optimization - Materialized Views (HIGH)

- `query-mv-incremental` - Incremental MVs for real-time aggregations
- `query-mv-refreshable` - Refreshable MVs for complex joins
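An incremental MV can be sketched as follows, again against a hypothetical `events` table:

```sql
-- Pre-aggregate at insert time into a summing target table
CREATE TABLE daily_counts
(
    day        Date,
    event_type LowCardinality(String),
    cnt        UInt64
) ENGINE = SummingMergeTree ORDER BY (day, event_type);

CREATE MATERIALIZED VIEW daily_counts_mv TO daily_counts AS
SELECT toDate(event_time) AS day, event_type, count() AS cnt
FROM events
GROUP BY day, event_type;

-- Reads should still aggregate, since background merges are asynchronous
SELECT day, event_type, sum(cnt) FROM daily_counts GROUP BY day, event_type;
```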
### Insert Strategy - Batching (CRITICAL)

- `insert-batch-size` - Batch 10K-100K rows per INSERT

### Insert Strategy - Async (HIGH)

- `insert-async-small-batches` - Async inserts for high-frequency small batches
- `insert-format-native` - Native format for best performance

### Insert Strategy - Mutations (CRITICAL)

- `insert-mutation-avoid-update` - ReplacingMergeTree instead of ALTER UPDATE
- `insert-mutation-avoid-delete` - Lightweight DELETE or DROP PARTITION
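A sketch of the two mutation-avoidance patterns, using invented table names:

```sql
-- Updates: insert a new version instead of ALTER TABLE ... UPDATE
CREATE TABLE user_profiles
(
    user_id    UInt64,
    email      String,
    updated_at DateTime
) ENGINE = ReplacingMergeTree(updated_at)  -- version column picks the latest row on merge
ORDER BY user_id;

INSERT INTO user_profiles VALUES (42, 'new@example.com', now());

-- Deletes: lightweight DELETE instead of ALTER TABLE ... DELETE
DELETE FROM user_profiles WHERE user_id = 42;

-- Or drop a whole partition when retiring data in bulk
ALTER TABLE events DROP PARTITION '202401';
```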
### Insert Strategy - Optimization (HIGH)

- `insert-optimize-avoid-final` - Let background merges work

---

## When to Apply

This skill activates when you encounter:

- `CREATE TABLE` statements
- `ALTER TABLE` modifications
- `ORDER BY` or `PRIMARY KEY` discussions
- Data type selection questions
- Slow query troubleshooting
- JOIN optimization requests
- Data ingestion pipeline design
- Update/delete strategy questions
- ReplacingMergeTree or other specialized engine usage
- Partitioning strategy decisions

---

## Rule File Structure

Each rule file in `rules/` contains:

- **YAML frontmatter**: title, impact level, tags
- **Brief explanation**: Why this rule matters
- **Incorrect example**: Anti-pattern with explanation
- **Correct example**: Best practice with explanation
- **Additional context**: Trade-offs, when to apply, references

---

## Full Compiled Document

For the complete guide with all rules expanded inline: `AGENTS.md`

Use `AGENTS.md` when you need to check multiple rules quickly without reading individual files.
@@ -0,0 +1,55 @@
---
title: Use Async Inserts for High-Frequency Small Batches
impact: HIGH
impactDescription: "Server-side buffering when client batching isn't practical"
tags: [insert, async, buffering, small-batches]
---

## Use Async Inserts for High-Frequency Small Batches

**Impact: HIGH**

When client-side batching isn't practical, async inserts buffer server-side and create larger parts automatically.

**Incorrect (small batches without async):**

```python
# Small batches without async_insert - creates too many parts
for batch in chunks(events, 100):
    client.execute("INSERT INTO events VALUES", batch)
```

**Correct (enable async inserts):**

```python
# Enable async_insert with safe defaults
client.execute("SET async_insert = 1")
client.execute("SET wait_for_async_insert = 1")  # Confirms durability

for batch in chunks(events, 100):
    client.execute("INSERT INTO events VALUES", batch)
    # Server buffers and creates larger parts automatically
```

```sql
-- Configure server-side for specific users
ALTER USER my_app_user SETTINGS
    async_insert = 1,
    wait_for_async_insert = 1,
    async_insert_max_data_size = 10000000,  -- Flush at 10MB
    async_insert_busy_timeout_ms = 1000;    -- Flush after 1s
```

**Flush conditions (whichever occurs first):**

- Buffer reaches `async_insert_max_data_size`
- Time threshold `async_insert_busy_timeout_ms` elapses
- The number of queued insert queries reaches `async_insert_max_query_number`
**Return modes:**

| Setting | Behavior | Use Case |
|---------|----------|----------|
| `wait_for_async_insert=1` | Waits for flush, confirms durability | **Recommended** |
| `wait_for_async_insert=0` | Fire-and-forget, unaware of errors | **Risky** - only if you accept data loss |
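For one-off queries, the same settings can also be supplied per INSERT rather than per user or session; the two-column row below is purely illustrative:

```sql
-- Enable async insert for a single query
INSERT INTO events
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES ('page_view', now());  -- hypothetical two-column schema
```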
Reference: [Selecting an Insert Strategy](https://clickhouse.com/docs/best-practices/selecting-an-insert-strategy)
@@ -0,0 +1,54 @@
---
title: Batch Inserts Appropriately (10K-100K rows)
impact: CRITICAL
impactDescription: "Each INSERT creates a part; single-row inserts overwhelm merge process"
tags: [insert, batching, parts, performance]
---

## Batch Inserts Appropriately (10K-100K rows)

**Impact: CRITICAL**

Each INSERT creates a new data part. Single-row or small-batch inserts create thousands of tiny parts, overwhelming the merge process and causing cluster instability.

**Incorrect (single-row or tiny batches):**

```python
# Single-row inserts - creates 10,000 parts!
for event in events:
    client.execute("INSERT INTO events VALUES", [event])

# Tiny batches - still too many parts
for batch in chunks(events, 100):  # 100 rows per INSERT
    client.execute("INSERT INTO events VALUES", batch)
```

**Correct (proper batch size):**

```python
# Ideal batch size: 10,000-100,000 rows
BATCH_SIZE = 10_000
for batch in chunks(events, BATCH_SIZE):
    client.execute("INSERT INTO events VALUES", batch)
```
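The `chunks` helper used in these snippets is not defined in the rule; a minimal sketch of what it is assumed to do:

```python
def chunks(items, size):
    """Yield consecutive batches of at most `size` items from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```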
**Recommended batch sizes:**

| Threshold | Value |
|-----------|-------|
| Minimum | 1,000 rows |
| Ideal range | 10,000-100,000 rows |
| Insert rate (sync) | ~1 insert per second |

**Validation:**

```sql
-- Monitor part count (>3000 per partition blocks inserts)
SELECT table, count() AS parts, sum(rows) AS total_rows
FROM system.parts
WHERE active AND database = 'default'
GROUP BY table
ORDER BY parts DESC;
```

Reference: [Selecting an Insert Strategy](https://clickhouse.com/docs/best-practices/selecting-an-insert-strategy)
@@ -0,0 +1,29 @@
---
title: Use Native Format for Best Insert Performance
impact: MEDIUM
impactDescription: "Native format is most efficient; JSONEachRow is expensive to parse"
tags: [insert, format, Native, performance]
---

## Use Native Format for Best Insert Performance

**Impact: MEDIUM**

Data format affects insert performance. Native format is column-oriented with minimal parsing overhead.

**Performance Ranking (fastest to slowest):**

| Format | Notes |
|--------|-------|
| **Native** | Most efficient. Column-oriented, minimal parsing. Recommended. |
| **RowBinary** | Efficient row-based alternative |
| **JSONEachRow** | Easier to use but expensive to parse |

**Example:**

```python
# Use Native format for best performance
client.execute("INSERT INTO events VALUES", data, settings={'input_format': 'Native'})
```
---

**Review comment on lines +24 to +26** (suggested change, replacing the Python example with `clickhouse-client`):

```bash
# Use Native format for best performance with clickhouse-client
clickhouse-client --query="INSERT INTO events FORMAT Native" < events.native
```

**Review comment:** The skill manifest declares version `0.3.0`, but `AGENTS.md` declares `Version 0.1.0`. These should match to avoid confusion when referencing the skill/version in reviews. Consider updating `AGENTS.md`'s header to the same version as `SKILL.md` (or vice versa), and keep the ClickHouse version/date consistent with that release.