Commit 18f8ff2

Add: Fix Bruin Python asset ArrowInvalid UTC error
Closes #218
1 parent af028ff commit 18f8ff2

1 file changed

Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
---
id: 390f2bec4a
question: 'Bruin Python asset fails with ArrowInvalid: Cannot locate timezone ''UTC'':
  Timezone database not found'
sort_order: 4
---

Cause: On Windows, PyArrow does not ship a timezone database. When dlt/ingestr receives a DataFrame with naive (tz-unaware) timestamp columns, it calls `pyarrow.compute.assume_timezone` with `"UTC"` internally to annotate them, and that call needs timezone database files on disk. If those files are not where PyArrow expects them, the pipeline fails with `ArrowInvalid`, even if `tzdata` is listed in your `requirements.txt`: that package is installed only into the asset container, not into the host ingestr environment.
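
To check whether a DataFrame would hit this code path, you can list its timezone-naive datetime columns before returning it. The following is a minimal sketch that uses only pandas; the helper name `naive_datetime_columns` and the sample columns are hypothetical and not part of Bruin, dlt, or ingestr.

```python
import pandas as pd


def naive_datetime_columns(df: pd.DataFrame) -> list[str]:
    """Return the names of datetime columns that carry no timezone."""
    return [
        col
        for col in df.columns
        if pd.api.types.is_datetime64_any_dtype(df[col]) and df[col].dt.tz is None
    ]


# Hypothetical sample: one naive column and one tz-aware UTC column.
sample = pd.DataFrame(
    {
        "id": [1, 2],
        "created_at": pd.to_datetime(["2024-01-01 10:00", "2024-01-02 11:30"]),
        "updated_at": pd.to_datetime(["2024-01-01 10:05", "2024-01-02 11:45"], utc=True),
    }
)

print(naive_datetime_columns(sample))  # ['created_at']
```

An empty list means no column in the frame is timezone-naive.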

Solution: Return timestamps that are already tz-aware UTC from your `materialize()` function. When columns arrive with timezone information already set, dlt skips the `assume_timezone` call entirely.
```python
import pandas as pd

# df is the DataFrame your materialize() function is about to return
for col in df.columns:
    if pd.api.types.is_datetime64_any_dtype(df[col]):
        if df[col].dt.tz is None:
            # naive timestamps: label them as UTC
            df[col] = df[col].dt.tz_localize("UTC")
        else:
            # tz-aware timestamps: convert them to UTC
            df[col] = df[col].dt.tz_convert("UTC")
        # microsecond precision avoids ns-overflow and is what pyarrow prefers
        df[col] = df[col].astype("datetime64[us, UTC]")

df["extracted_at"] = pd.Timestamp.now("UTC").floor("us")
```
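
The loop above can be wrapped in a small helper and called just before `materialize()` returns. This is a sketch under assumptions: `to_utc_microseconds` and the sample data are hypothetical, the zero-argument `materialize()` signature is used only for illustration, and the `astype("datetime64[us, UTC]")` cast assumes pandas 2.0 or later.

```python
import pandas as pd


def to_utc_microseconds(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose datetime columns are tz-aware UTC at microsecond precision."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            if out[col].dt.tz is None:
                out[col] = out[col].dt.tz_localize("UTC")
            else:
                out[col] = out[col].dt.tz_convert("UTC")
            out[col] = out[col].astype("datetime64[us, UTC]")
    return out


def materialize():
    # Hypothetical asset body: one naive column and one column with a +01:00 offset.
    df = pd.DataFrame(
        {
            "created_at": pd.to_datetime(["2024-01-01 10:00:00"]),
            "updated_at": pd.to_datetime(["2024-01-01 12:00:00+01:00"]),
        }
    )
    return to_utc_microseconds(df)


result = materialize()
print(result.dtypes)
```

Working on a copy leaves the caller's DataFrame untouched, which makes the helper safe to reuse across assets.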

Also bump pyarrow to 14+ in `requirements.txt`; support for `datetime64[us, UTC]` as a pandas dtype was stabilised in that release:
```
pyarrow==15.0.2
tzdata==2024.1
```
