Support HTTPS Influx client; add wide-schema docs, v0.2.1

haoruizhou · haoruizhou · commit 8e82cb09ac81 · 2026-03-16T16:05:17.000-04:00
Introduce an HTTP-based Influx client and wide-format (columnar) workflow across the library and docs. Changes include:

- Implement HttpInfluxClient plus _ArrowLike/_Scalar adapters in src/slicks/fetcher.py and auto-select it in get_influx_client() for https:// hosts; preserve InfluxDBClient3 for non-HTTPS. This enables querying via /api/v3/query_sql when gRPC/Flight is not available.
- Update discover_sensors to obtain its client via get_influx_client(), relax client typing and creation, and simplify the narrow/wide code paths.
- Expand fetcher typing and client selection logic; add minimal adapters so HTTP responses can be consumed like Arrow results.
- API/docs updates: emphasize wide vs narrow schemas, add schema and resample params to fetch_telemetry, document fetch_telemetry_chunked, WideWriter, CAN decoding and line-protocol writing, and update examples to use schema="wide".
- Bump package version to 0.2.1 and add httpx dependency to pyproject.toml.
- Misc: update README examples (switch the example database from WFR25 to WFR26 and add a wide-schema fetch example) and add CLAUDE.md to .gitignore.
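The following is a minimal sketch of the HTTPS path described above, for reviewers. It assumes that InfluxDB 3's `POST /api/v3/query_sql` endpoint accepts a JSON body with `db`, `q`, and `format` fields plus a Bearer token, and that the JSON response is a list of row objects. `pick_client_kind`, `build_query_request`, `ArrowLikeSketch`, and `ScalarSketch` are hypothetical names that illustrate the idea. They are not the actual `HttpInfluxClient`, `_ArrowLike`, or `_Scalar` code in src/slicks/fetcher.py.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


def pick_client_kind(host: str) -> str:
    """Hypothetical selection rule: https:// hosts go through the HTTP SQL API,
    everything else keeps the gRPC/Flight-based InfluxDBClient3."""
    return "http" if host.lower().startswith("https://") else "flight"


def build_query_request(host: str, token: str, db: str, sql: str) -> Tuple[str, Dict[str, str], bytes]:
    """Build (url, headers, body) for an assumed POST /api/v3/query_sql call."""
    url = host.rstrip("/") + "/api/v3/query_sql"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"db": db, "q": sql, "format": "json"}).encode("utf-8")
    return url, headers, body


@dataclass(frozen=True)
class ScalarSketch:
    """Mimics pyarrow.Scalar.as_py() for a plain Python value."""
    value: Any

    def as_py(self) -> Any:
        return self.value


class ArrowLikeSketch:
    """Wraps JSON rows (a list of dicts) behind a few pyarrow.Table-style accessors."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows
        # Union of keys across rows, in first-seen order, in case rows omit null columns.
        seen: Dict[str, None] = {}
        for row in rows:
            for key in row:
                seen.setdefault(key, None)
        self.column_names = list(seen)

    @property
    def num_rows(self) -> int:
        return len(self._rows)

    def column(self, name: str) -> List[ScalarSketch]:
        # Missing keys become None, like nulls in an Arrow column.
        return [ScalarSketch(row.get(name)) for row in self._rows]

    def to_pylist(self) -> List[Dict[str, Any]]:
        return [dict(row) for row in self._rows]


# Example: a response body shaped like a list of row objects (assumed shape).
table = ArrowLikeSketch(json.loads('[{"column_name": "INV_Motor_Speed"}, {"column_name": "PackCurrent"}]'))
print(table.num_rows, [s.as_py() for s in table.column("column_name")])
# prints: 2 ['INV_Motor_Speed', 'PackCurrent']
```

The design idea is that an adapter only needs the few pyarrow-style accessors its callers touch (assumed here to be `column(...)` and `.as_py()`). This lets the HTTP path stand in for Arrow results without building real Arrow tables from the JSON response.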

These changes make the library more robust over HTTPS tunnels and standardize wide-format telemetry usage and writing.
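The wide row shape documented for `WideWriter` writes one point per CAN message. `messageName` and `canId` are tags, and each decoded signal is a float field. The hypothetical `to_wide_line` helper below reproduces that shape using standard line-protocol escaping. It is an illustration only, not the library's `frame_to_line_protocol`. It assumes signal values are floats and drops non-finite values.

```python
import math
from typing import Dict


def _escape_tag_or_key(text: str) -> str:
    # Tag keys, tag values, and field keys escape commas, equals signs, and spaces.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_measurement(text: str) -> str:
    # Measurement names only escape commas and spaces.
    return text.replace(",", r"\,").replace(" ", r"\ ")


def to_wide_line(measurement: str, message_name: str, can_id: int,
                 signals: Dict[str, float], timestamp_ns: int) -> str:
    """Hypothetical helper: one decoded CAN message -> one wide-format point."""
    tags = f"messageName={_escape_tag_or_key(message_name)},canId={can_id}"
    fields = ",".join(
        f"{_escape_tag_or_key(name)}={float(value)!r}"
        for name, value in signals.items()
        if math.isfinite(value)  # line protocol has no NaN/inf float literal
    )
    if not fields:
        raise ValueError("a line-protocol point needs at least one field")
    return f"{_escape_measurement(measurement)},{tags} {fields} {timestamp_ns}"


print(to_wide_line("WFR26", "BMS_Status", 0x200,
                   {"PackCurrent": -3264.0, "SOC": 85.0},
                   1_700_000_000_000_000_000))
# prints: WFR26,messageName=BMS_Status,canId=512 PackCurrent=-3264.0,SOC=85.0 1700000000000000000
```

With every signal stored as its own field, wide-format reads can select signals directly as columns, without the pivot step that the narrow EAV layout requires.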
diff --git a/.gitignore b/.gitignore
@@ -16,3 +16,4 @@ coverage.xml
 htmlcov/
 /examples/__pycache__
 .DS_Store
+CLAUDE.md
diff --git a/README.md b/README.md
@@ -7,9 +7,10 @@
 The home baked data pipeline for **Western Formula Racing**.
 
 This package handles:
-1. **Data Ingestion:** Reliable fetching from InfluxDB 3.0.
-2. **Movement Detection:** Smart filtering of "Moving" vs "Idle" car states.
-3. **Sensor Discovery:** Tools to explore available sensors on any given race day.
+1. **Data Ingestion:** Reliable fetching from InfluxDB 3.0 in wide (columnar) or narrow (legacy EAV) format.
+2. **Data Writing:** `WideWriter` encodes CAN frames directly to InfluxDB wide format line protocol.
+3. **Movement Detection:** Smart filtering of "Moving" vs "Idle" car states.
+4. **Sensor Discovery:** Tools to explore available sensors on any given race day.
 
 ## Documentation
 
@@ -32,14 +33,15 @@ pip install slicks
 import slicks
 from datetime import datetime
 
-# 1. Connect (Auto-configured or custom)
-slicks.connect_influxdb3(db="WFR25")
+# 1. Connect (auto-configured from env vars or explicit)
+slicks.connect_influxdb3(db="WFR26")
 
-# 2. Fetch Data (One-liner)
+# 2. Fetch Data — wide format (columnar, preferred)
 df = slicks.fetch_telemetry(
-    datetime(2025, 9, 28), 
-    datetime(2025, 9, 30), 
-    "INV_Motor_Speed"
+    datetime(2025, 9, 28),
+    datetime(2025, 9, 30),
+    "INV_Motor_Speed",
+    schema="wide",
 )
 
 print(df.describe())
diff --git a/docs/advanced_usage.md b/docs/advanced_usage.md
@@ -1,24 +1,42 @@
 # Advanced Usage & Workflows
 
-## 1. Dynamic Sensor Discovery
-Not sure what sensors are available for a specific test day? Don't guess. Use the discovery tool.
+## 1. Wide vs Narrow Format
+
+The database stores telemetry in **wide format**: each CAN signal is its own column. This is faster to query and requires no pivot step.
 
 ```python
-from slicks import discover_sensors
+import slicks
 from datetime import datetime
 
-start = datetime(2025, 9, 28)
-end = datetime(2025, 9, 30)
+start = datetime(2025, 9, 28, 12, 0)
+end   = datetime(2025, 9, 28, 14, 0)
+
+# Wide format (default, preferred) — direct column access
+df = slicks.fetch_telemetry(start, end, ["INV_Motor_Speed", "PackCurrent"], schema="wide")
+
+# Narrow format (legacy EAV) — only for old data that was never migrated
+df = slicks.fetch_telemetry(start, end, ["INV_Motor_Speed", "PackCurrent"], schema="narrow")
+```
+
+Use `schema="wide"` for all new work.
+
+---
+
+## 2. Dynamic Sensor Discovery
+Not sure what sensors are available? With wide format, discovery is instant — it reads column metadata rather than scanning data rows.
+
+```python
+from slicks import discover_sensors
 
-# This physically queries the DB to find what tags exist
-available_sensors = discover_sensors(start, end)
+# Wide: instant metadata lookup (no time range needed)
+available_sensors = discover_sensors(None, None, schema="wide")
 
 print(f"Found {len(available_sensors)} sensors:")
 for sensor in available_sensors:
     print(f" - {sensor}")
 ```
 
-## 2. Managing Environments
+## 3. Managing Environments
 You often need to switch between `Development`, `Testing`, and `Production` databases, or switch to a local replay server.
 
 ### Option A: Environment Variables (Best for CI/CD)
@@ -38,7 +56,7 @@ slicks.connect_influxdb3(
 )
 ```
 
-## 3. Bulk Export for CSV Analysis
+## 4. Bulk Export for CSV Analysis
 If you need to hand off data to the aerodynamics team who uses Excel/MATLAB, use the bulk fetcher. It handles day-by-day chunking to avoid crashing the computer.
 
 ```python
@@ -48,14 +66,41 @@ from slicks import bulk_fetch_season
 bulk_fetch_season(start, end, output_file="full_weekend_data.csv")
 ```
 
-## 4. Customizing Movement Detection
+## 5. Writing CAN Data (Wide Format)
+
+If you're ingesting raw CAN bus data (e.g., from a replay script or a live logger), use `WideWriter`. It decodes CAN frames with a DBC file and writes them as wide-format line protocol.
+
+```python
+from slicks import WideWriter
+
+writer = WideWriter(
+    url="http://localhost:8086",
+    token="my-token",
+    bucket="WFR26",
+    measurement="WFR26",
+    dbc_path="path/to/WFR26.dbc",
+)
+
+# Decode and queue one 8-byte CAN frame (ts_ns: nanoseconds since the Unix epoch)
+writer.decode_and_queue(can_id=0x200, data=bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08]), ts_ns=1_700_000_000_000_000_000)
+
+# Flush remaining data when done
+writer.close()
+```
+
+Each decoded CAN message becomes one row with all of its signals as fields:
+```
+WFR26,messageName=BMS_Status,canId=512 PackCurrent=-3264.0,SOC=85.0 1700000000000000000
+```
+
+## 6. Customizing Movement Detection
 If you are analyzing **Charging** or **Static Testing**, the default movement filter will hide your data. Disable it:
 
 ```python
 # Fetch Battery Current even when car is stopped
 df = slicks.fetch_telemetry(
-    start, end, 
-    signals="PackCurrent", 
+    start, end,
+    signals="PackCurrent",
     filter_movement=False
 )
 ```
diff --git a/docs/api_reference.md b/docs/api_reference.md
@@ -20,33 +20,55 @@ slicks.connect_influxdb3(url=None, token=None, org=None, db=None)
 
 ### `slicks.fetch_telemetry`
 
-The primary function to retrieve data. It handles querying, pivoting, resampling, and movement filtering.
+The primary function to retrieve data. It handles querying, resampling, and movement filtering.
 
 ```python
-slicks.fetch_telemetry(start_time, end_time, signals=None, client=None, filter_movement=True)
+slicks.fetch_telemetry(start_time, end_time, signals=None, client=None,
+                       filter_movement=True, resample="1s", schema="wide")
 ```
 
 - **start_time** *(datetime)*: Start of the query range.
 - **end_time** *(datetime)*: End of the query range.
-- **signals** *(str or list[str])*: A single sensor name or a list of sensor names to fetch. Defaults to standard configuration if None.
+- **signals** *(str or list[str])*: A single sensor name or a list of sensor names to fetch. Defaults to standard configuration if `None`.
 - **client** *(InfluxDBClient3, optional)*: An existing client instance (advanced use).
-- **filter_movement** *(bool)*: If `True` (default), strips out rows where the car is stationary. If `False`, returns all raw data.
+- **filter_movement** *(bool)*: If `True` (default), strips out rows where the car is stationary.
+- **resample** *(str or None)*: Pandas frequency string for resampling (e.g. `"1s"`, `"100ms"`). Pass `None` for raw data.
+- **schema** *(str)*: `"wide"` (default, columnar — each signal is a column) or `"narrow"` (legacy EAV — requires pivot).
 
-**Returns:** `pandas.DataFrame` indexed by time, with 1-second resolution. Returns `None` if no data is found.
+**Returns:** `pandas.DataFrame` indexed by time. Returns `None` if no data is found.
+
+---
+
+### `slicks.fetch_telemetry_chunked`
+
+Same interface as `fetch_telemetry`, but splits large date ranges into chunks and runs them in parallel. Handles server resource limits automatically via adaptive query bisection.
+
+```python
+slicks.fetch_telemetry_chunked(start_time, end_time, signals=None, client=None,
+                                filter_movement=True, resample="1s", schema="wide",
+                                chunk_size=timedelta(hours=6), max_workers=4)
+```
+
+- **chunk_size** *(timedelta)*: Window size per chunk (default: 6 hours).
+- **max_workers** *(int)*: Number of parallel threads (default: 4).
+
+**Returns:** `pandas.DataFrame` concatenated from all chunks, or `None`.
 
 ---
 
 ### `slicks.discover_sensors`
 
-Scans the database to find which sensors actually recorded data during a time period.
+Returns the list of available sensor/signal names.
 
 ```python
-slicks.discover_sensors(start_time, end_time, chunk_size_days=1)
+slicks.discover_sensors(start_time, end_time, chunk_size_days=7,
+                        client=None, show_progress=True, schema="wide")
 ```
 
-- **start_time** *(datetime)*: Start of scan.
-- **end_time** *(datetime)*: End of scan.
-- **chunk_size_days** *(int)*: How many days to query at once (prevents timeouts).
+- **start_time** *(datetime)*: Start of scan (used only in `"narrow"` schema).
+- **end_time** *(datetime)*: End of scan (used only in `"narrow"` schema).
+- **chunk_size_days** *(int)*: Days per chunk for narrow schema scans (default: 7).
+- **schema** *(str)*: `"wide"` performs an instant metadata lookup (`information_schema.columns`) — no time range required. `"narrow"` scans actual data rows.
 
 **Returns:** `list[str]` of unique sensor names sorted alphabetically.
 
@@ -76,3 +98,66 @@ slicks.detect_movement_ratio(df, speed_column="INV_Motor_Speed")
 ```
 
 **Returns:** `dict` containing `total_rows`, `moving_rows`, `idle_rows`, and `movement_ratio` (0.0 - 1.0).
+
+---
+
+## Wide Format Writing
+
+### `slicks.WideWriter`
+
+Encodes CAN frames to InfluxDB wide format line protocol and writes them in batches.
+
+```python
+from slicks import WideWriter
+
+writer = WideWriter(
+    url,                    # InfluxDB URL
+    token,                  # Auth token
+    bucket,                 # Bucket/database name (e.g. "WFR26")
+    measurement,            # Measurement name (e.g. "WFR26")
+    dbc_path=None,          # Path to DBC file (or set WFR_DBC_PATH env var)
+    batch_size=5000,        # Points per write batch
+)
+```
+
+**Methods:**
+
+- `decode_and_queue(can_id, data, ts_ns)` — Decode raw CAN bytes and queue for batch write.
+- `write_lines(lines)` — Write pre-formatted line protocol strings directly.
+- `flush()` — Flush the pending batch.
+- `close()` — Flush and close the connection.
+
+**Line protocol format:**
+```
+WFR26,messageName=BMS_Status,canId=512 PackCurrent=-3264.0,SOC=85.0 1700000000000000000
+```
+
+---
+
+## CAN Decoding
+
+### `slicks.decode_frame`
+
+Decodes a raw CAN frame into named signals using a DBC database.
+
+```python
+from slicks import load_dbc, decode_frame
+
+db = load_dbc("path/to/WFR26.dbc")
+frame = decode_frame(db, can_id, raw_bytes)  # → DecodedFrame or None
+```
+
+**`DecodedFrame` fields:**
+- `message_name` *(str)*: CAN message name from the DBC.
+- `can_id` *(int)*: CAN frame ID.
+- `signals` *(dict[str, float])*: Decoded signal values.
+
+### `slicks.frame_to_line_protocol`
+
+Converts a `DecodedFrame` to an InfluxDB line protocol string.
+
+```python
+from slicks import frame_to_line_protocol
+
+line = frame_to_line_protocol(frame, measurement="WFR26", timestamp_ns=ts)
+```
diff --git a/docs/example_analysis.md b/docs/example_analysis.md
@@ -29,14 +29,14 @@ Connecting to Slicks Telemetry Database...
 
 ## 2. Discovering Sensors
 
-Before fetching data, we need to know exactly what sensors were recording during our test session. We'll scan a specific window to see the available signals.
+Before fetching data, we can check what sensors are available. With the wide schema this is an instant metadata lookup: no time range is needed, and `start_time`/`end_time` are ignored if passed.
 
 ```python
 start_time = datetime(2025, 9, 28, 20, 20, 0)
 end_time   = datetime(2025, 9, 28, 21, 0, 0)
 
-print(f"Scanning for sensors between {start_time} and {end_time}...")
-available_sensors = slicks.discover_sensors(start_time, end_time)
+# Wide format: instant column metadata lookup
+available_sensors = slicks.discover_sensors(start_time, end_time, schema="wide")
 
 # Filter for Inverter (INV) related sensors to narrow our search
 inv_sensors = [s for s in available_sensors if s.startswith("INV_")]
@@ -45,8 +45,6 @@ print(f"Found {len(inv_sensors)} Inverter sensors. Examples: {inv_sensors[:5]}")
 
 **Output:**
 ```text
-Scanning for sensors between 2025-09-28 20:20:00 and 2025-09-28 21:00:00...
-Discovering sensors from 2025-09-28 20:20:00 to 2025-09-28 21:00:00...
 Discovery Complete. Found 342 unique sensors.
 Found 91 Inverter sensors. Examples: ['INV_Analog_Input_1', 'INV_Analog_Input_2', 'INV_Analog_Input_3', 'INV_Analog_Input_4', 'INV_Analog_Input_5']
 ```
@@ -62,9 +60,14 @@ target_signals = ["INV_Motor_Speed", "INV_DC_Bus_Current"]
 
 print(f"Fetching data for: {target_signals}...")
 
-# Fetch 1-second resampled data. 
+# Fetch 1-second resampled data in wide format.
 # We disable filter_movement to capture the full session including startup.
-df = slicks.fetch_telemetry(start_time, end_time, signals=target_signals, filter_movement=False)
+df = slicks.fetch_telemetry(
+    start_time, end_time,
+    signals=target_signals,
+    filter_movement=False,
+    schema="wide",
+)
 
 if df is not None:
     print(f"Successfully loaded {len(df)} data points.")
@@ -77,8 +80,8 @@ Fetching data for: ['INV_Motor_Speed', 'INV_DC_Bus_Current']...
 Executing query for range: 2025-09-28 20:20:00 to 2025-09-28 21:00:00...
 Fetched 117 rows.
 Successfully loaded 117 data points.
-signalName           INV_DC_Bus_Current  INV_Motor_Speed
-time                                                    
+                     INV_DC_Bus_Current  INV_Motor_Speed
+time
 2025-09-28 20:21:27                 0.0              0.0
 2025-09-28 20:21:28                 0.0              0.0
 2025-09-28 20:21:29                 0.0              0.0
diff --git a/docs/getting_started.md b/docs/getting_started.md
@@ -54,8 +54,8 @@ end   = datetime(2025, 9, 28, 14, 0, 0) # Sept 28, 2025 at 02:00 PM
 You can request a single sensor or a list of sensors.
 
 ```python
-# Fetch Motor Speed
-df = slicks.fetch_telemetry(start, end, "INV_Motor_Speed")
+# Fetch Motor Speed (wide format — each signal is a column)
+df = slicks.fetch_telemetry(start, end, "INV_Motor_Speed", schema="wide")
 
 if df is not None:
     print(df.head())
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "slicks"
-version = "0.2.0"
+version = "0.2.1"
 description = "The home baked data pipeline for Western Formula Racing"
 readme = "README.md"
 authors = [
@@ -21,6 +21,7 @@ dependencies = [
     "influxdb3-python>=0.1.0",
     "influxdb-client>=1.30.0",
     "cantools>=39.0.0",
+    "httpx>=0.27.0",
     "python-dotenv>=1.0.0",
     "matplotlib>=3.0.0",
     "tqdm>=4.0.0",
diff --git a/src/slicks/discovery.py b/src/slicks/discovery.py
diff --git a/src/slicks/fetcher.py b/src/slicks/fetcher.py