Commit 16af8c8

Add documentation on DateTime and timezone behavior including handling of mixed formats and SDK type reconstruction examples (#18)
1 parent f218675 commit 16af8c8

1 file changed

Lines changed: 50 additions & 0 deletions

concepts/metadata-filtering.mdx

@@ -59,6 +59,56 @@ doc = db.ingest_text(

If you omit a hint, Morphik infers one automatically for simple scalars, but explicitly declaring types is recommended for reliable range queries.

### DateTime and Timezone Behavior

Morphik stores datetimes without adding or shifting timezone information: naive values stay naive, and timezone-aware values keep their offset. The only normalizations are that a trailing `Z` is written as `+00:00` and that UNIX timestamps are stored as UTC:

| Input | Stored As | Notes |
| --- | --- | --- |
| `datetime(2024, 1, 15)` (naive) | `"2024-01-15T00:00:00"` | No timezone added |
| `datetime(2024, 1, 15, tzinfo=UTC)` | `"2024-01-15T00:00:00+00:00"` | Timezone preserved |
| `"2024-01-15T12:00:00Z"` (string) | `"2024-01-15T12:00:00+00:00"` | `Z` converted to `+00:00` |
| `1705312800` (UNIX timestamp) | `"2024-01-15T10:00:00+00:00"` | Timestamps are inherently UTC |

**SDK Type Reconstruction:** When you retrieve a `Document` via the Python SDK, `datetime`, `date`, and `Decimal` values in `metadata` are automatically reconstructed to their Python types using the `metadata_types` hints, so you get back the same types you put in:
```python
from datetime import datetime

# Ingest with a naive datetime
doc = db.ingest_text("...", metadata={"created": datetime(2024, 1, 15)})

# Retrieve: metadata["created"] is a datetime object, not a string
retrieved = db.get_document(doc.external_id)
print(type(retrieved.metadata["created"]))  # <class 'datetime.datetime'>
print(retrieved.metadata["created"].tzinfo)  # None (still naive)
```

### Mixed Timezone Formats

**Morphik handles mixed formats correctly.** Filtering and comparisons work even if some documents store naive datetimes and others store timezone-aware ones:

```python
from datetime import datetime, UTC  # datetime.UTC requires Python 3.11+

# Mixed formats across documents: Morphik handles this fine
db.ingest_text("Doc A", metadata={"ts": datetime(2024, 1, 15)})  # naive
db.ingest_text("Doc B", metadata={"ts": datetime(2024, 6, 15, tzinfo=UTC)})  # aware

# Filtering works correctly
results = db.list_documents(filters={"ts": {"$gte": "2024-05-01"}})  # Returns Doc B
```

<Warning>
**Python comparisons fail with mixed formats.** If you retrieve mixed-format datetimes and compare them locally, Python raises `TypeError`:

```python
sorted([naive_dt, aware_dt])  # TypeError: can't compare offset-naive and offset-aware datetimes
```

**Recommendation:** Stay consistent: pick one format (preferably timezone-aware UTC) and use it throughout. Prefer filtering in Morphik over comparing or sorting datetimes in Python.
</Warning>

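If you do need to compare or sort retrieved datetimes locally, one option is to normalize them to timezone-aware UTC first. The following is a minimal sketch that assumes naive values represent UTC, which you would need to verify for your own data. `to_utc` is a hypothetical helper, not part of the Morphik SDK.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    """Return a timezone-aware UTC datetime."""
    if dt.tzinfo is None:
        # Assumption: naive values were recorded in UTC
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


naive_dt = datetime(2024, 1, 15)
aware_dt = datetime(2024, 6, 15, tzinfo=timezone.utc)
offset_dt = datetime(2024, 1, 15, 1, 0, tzinfo=timezone(timedelta(hours=2)))

# Comparing naive and aware values directly fails
try:
    sorted([naive_dt, aware_dt])
except TypeError as exc:
    print(f"TypeError: {exc}")

# Sorting on the normalized value works; the original objects are returned unchanged
ordered = sorted([aware_dt, naive_dt, offset_dt], key=to_utc)
print([to_utc(d).isoformat() for d in ordered])
```

Using `to_utc` as a sort key leaves the stored values untouched, so the normalization only affects ordering.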
## Implicit vs Explicit Syntax

- **Implicit equality** – Bare key/value pairs (`{"status": "active"}`) use JSON containment and are ideal for simple matching. They also check whether an array contains the value.
