Commit 496de04
authored
release: Arc v26.02.1 (#175)
## New Features
### InfluxDB Client Compatibility
Arc's Line Protocol endpoints now use the same paths as InfluxDB, enabling drop-in compatibility with all official InfluxDB client libraries (Go, Python, JavaScript, Java, C#, PHP, Ruby, Telegraf, Node-RED).
- `/api/v1/write` → `/write` (InfluxDB 1.x clients, Telegraf)
- `/api/v1/write/influxdb` → `/api/v2/write` (InfluxDB 2.x clients)
- Arc-native endpoint `/api/v1/write/line-protocol` preserved
- Supports Bearer token, Token header, API key header, and query parameter authentication
### MQTT Ingestion Support
Native MQTT subscription for IoT and edge data ingestion. Connect directly to MQTT brokers without requiring additional infrastructure.
- Subscribe to multiple MQTT topics with wildcard support (`+`, `#`)
- Dynamic subscription management via REST API
- TLS/SSL connections with certificate validation
- Authentication via username/password or client certificates
- Connection auto-reconnect with exponential backoff
- Per-subscription statistics and monitoring
- Passwords encrypted at rest
- Auto-start subscriptions on server restart
### S3 File Caching via cache_httpfs Extension (PR #149)
Optional in-memory caching of S3 Parquet files via DuckDB's `cache_httpfs` community extension. 5-10x query performance improvement for workloads with repeated file access (CTEs, subqueries, Grafana dashboards).
- In-memory only — no disk caching, preserves stateless compute philosophy
- Opt-in, configurable cache size and TTL
- Graceful degradation if extension fails to load
*Contributed by @khalid244*
### Relative Time Expression Support in Partition Pruning
Queries using `NOW() - INTERVAL` now benefit from partition pruning. Supports seconds, minutes, hours, days, weeks, months.
## Bug Fixes
- **Control Characters in Measurement Names Break S3 (Issue #122)** — Added strict validation for measurement names across all ingestion endpoints (Line Protocol, MsgPack, Continuous Queries). Names must start with a letter, contain only alphanumeric/underscore/hyphen, max 128 chars.
- **Partition Pruner Fails on Non-Existent S3 Partitions (Issue #125, #144, PR #145)** — Fixed "No files found" errors when time range includes non-existent S3 partitions. Extended `filterExistingPaths()` for S3/Azure storage. Also fixed `filepath.Join()` mangling S3 URLs. *(Day-level fix by @khalid244)*
- **Server Timeout Config Values Ignored (Issue #126)** — `server.read_timeout` and `server.write_timeout` now correctly use configuration values instead of hardcoded 30s.
- **Large Payload Ingestion (413 Request Entity Too Large)** — `MaxPayloadSize` config now correctly passed to Fiber's `BodyLimit`.
- **Query Results Timestamp Timezone Inconsistency** — All timestamps in query results now normalized to UTC.
- **Azure Blob Storage Query SSL Certificate Errors on Linux (PR #92)** — Fixed by using system curl for SSL on Linux. *(Contributed by @schotime)*
- **UTC Consistency for Compaction Filenames (PR #132)** — Compacted filenames now use UTC timestamps consistently. *(Contributed by @schotime)*
- **S3 Subprocess Configuration Issues (Issue #131)** — Fixed missing credentials and SSL config forwarding to compaction subprocess.
- **Query Failures with Non-UTF8 Data (Issue #136)** — Added automatic UTF-8 sanitization during ingestion with optimized fast-path (~6-25ns overhead).
- **Nanosecond Timestamp Support for MessagePack Ingestion** — Extended timestamp detection to handle 19-digit nanosecond timestamps (important for InfluxDB migrations).
- **WHERE Clause Regex Fails to Match Multi-Line Queries (Issue #146, PR #148)** — Fixed partition pruner failing on multi-line SQL queries. *(Contributed by @khalid244)*
- **String Literals Containing SQL Keywords Break Partition Pruning** — Added string literal masking before regex matching.
- **Buffer Age-Based Flush Timing Under High Load (Issue #142)** — Ticker now fires at half the configured interval for more consistent flush timing (~25% improvement).
- **Arrow Writer Panic During High-Concurrency Writes (Issue #130)** — Fixed out-of-bounds panic during schema evolution with concurrent writes.
- **Empty Directories Not Cleaned Up After Daily Compaction** — Automatic cleanup of empty hour-level partition directories after compaction.
- **Compactor OOM and Segfaults with Large Datasets (Issue #102)** — Streaming I/O, memory limit passthrough, file batching (1000 max), and adaptive batch sizing on failure.
- **Orphaned Compaction Temp Directories (Issue #164, PR #165)** — Two-layer cleanup: startup cleanup + parent-side cleanup after subprocess completion.
- **Compaction Data Duplication on Crash (Issue #157, PR #163)** — Manifest-based tracking in S3 prevents re-compaction after crash. *(Contributed by @khalid244)*
- **WAL-Based S3 Recovery (Issue #159, PR #162)** — Fixed data loss during S3 outages with startup recovery, periodic recovery, and backpressure handling. *(Contributed by @khalid244)*
- **Tiered Storage Query Routing (Issue #166, PR #167)** — Fixed queries not routing to cold tier data and database listing not showing cold-only databases.
- **Retention Policies for S3/Azure Storage Backends (Issue #169, PR #170)** — Retention policies now work with all storage backends.
- **Retention Policy Empty Directory Cleanup (Issue #171)** — Empty directories cleaned up after retention policy file deletion.
- **Query Timeout for S3 Disconnection (Issue #151, PR #152)** — Configurable query timeout prevents indefinite hangs. Returns HTTP 504 when exceeded. *(Contributed by @khalid244)*
## Improvements
- **Configurable Server Idle and Shutdown Timeouts** — `server.idle_timeout` and `server.shutdown_timeout` now configurable.
- **Automatic Time Function Query Optimization** — `time_bucket()` and `date_trunc()` automatically rewritten to epoch arithmetic (2-2.5x faster GROUP BY queries).
- **Parallel Partition Scanning** — Queries spanning 3+ partitions now execute concurrently (2-4x speedup).
- **Two-Stage Distributed Aggregation (Enterprise)** — 5-20x speedup for cross-shard aggregations via scatter/gather.
- **DuckDB Query Engine Optimizations** — Parquet metadata caching, prefetching, pool-wide `SET GLOBAL` consistency (18-24% faster aggregations). *(SET GLOBAL fix by @khalid244)*
- **Automatic Regex-to-String Function Optimization** — URL domain extraction patterns rewritten to native string functions (2x+ faster).
- **Database Header for Query Optimization** — `x-arc-database` header skips regex parsing (5-17% faster queries).
- **MQTT Client Auto-Generated Client ID** — Prevents client ID collisions across multiple Arc instances.
- **MQTT Restart Endpoint** — `/api/v1/restart` to apply MQTT config changes without server restart.
## Security
- **Token Hashing Security Model** — New tokens hashed with bcrypt (cost 10). SHA256 prefixes for O(1) lookup. Legacy tokens supported.
## Breaking Changes
- `/api/v1/write` → `/write` and `/api/v1/write/influxdb` → `/api/v2/write`. Update client config if using old Arc-specific paths. InfluxDB client libraries need no changes.
## Contributors
Thanks to @schotime (Adam Schroder) and @khalid244 for their contributions to this release.1 parent a97d16a commit 496de04
1 file changed
Lines changed: 8 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
32 | | - | |
| 32 | + | |
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
| |||
50 | 50 | | |
51 | 51 | | |
52 | 52 | | |
53 | | - | |
54 | | - | |
| 53 | + | |
| 54 | + | |
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
| |||
128 | 128 | | |
129 | 129 | | |
130 | 130 | | |
131 | | - | |
| 131 | + | |
132 | 132 | | |
133 | 133 | | |
134 | 134 | | |
135 | 135 | | |
136 | 136 | | |
137 | 137 | | |
138 | | - | |
| 138 | + | |
139 | 139 | | |
140 | 140 | | |
141 | 141 | | |
| |||
155 | 155 | | |
156 | 156 | | |
157 | 157 | | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
0 commit comments