Replies: 1 comment
Good morning! Thanks for the thoughtful write-up. Your reading of the code is correct.

Yes, Arc's consistency model is "eventual visibility after flush", not read-your-write. Records sit in [...]

We actually prototyped exactly the [...]

The short version: [...]

So the design is intentionally optimized around batched Parquet writes plus a short, fixed freshness delay, rather than paying query-path overhead to close a sub-100ms gap.

That said, what's your use case? Specifically, what kind of query are you running where ≤100ms freshness isn't enough? Read-your-write for a UI flow, an alerting/streaming pipeline, agent memory recall, something else? If there's a real workload that needs sub-flush visibility, we'd love to hear it. It might justify revisiting #249 with an opt-in design (e.g. [...]).

Thanks again for digging into the code!
Hi Arc team,
I’m trying to understand the query visibility semantics around the in-memory ArrowBuffer.

From reading the code, my current understanding is:

- [...] ArrowBuffer.
- [...] max_buffer_size or max_buffer_age_ms is reached.
- [...] read_parquet(...) and execute them through DuckDB.
- QueryHandler does not appear to hold a reference to ArrowBuffer, and normal query paths do not seem to call FlushAll() before executing the query.

So my question is:
Are records that have been accepted by the write API but have not yet been flushed to Parquet expected to be invisible to normal queries?
In other words, is Arc’s intended consistency model “eventual query visibility after flush”, rather than “read-your-write” visibility?
I also noticed that DuckDB can query Apache Arrow objects/datasets directly in some bindings, for example:
https://duckdb.org/docs/current/guides/python/sql_on_arrow#apache-arrow-datasets
Has Arc considered querying both persisted Parquet files and the current in-memory Arrow buffer, e.g. by combining:
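In toy form, the general idea of a combined read could look like the sketch below. Everything in it is hypothetical: store, QueryPersisted and QueryCombined are names made up for illustration, plain string slices stand in for Parquet files and the Arrow buffer, and no DuckDB or Arrow API is involved.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical model: persisted stands in for flushed Parquet
// files, buffered for records still held in the in-memory buffer.
type store struct {
	mu        sync.RWMutex
	persisted []string
	buffered  []string
}

// Write accepts a record into the in-memory buffer.
func (s *store) Write(rec string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.buffered = append(s.buffered, rec)
}

// Flush moves all buffered records into the persisted set.
func (s *store) Flush() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.persisted = append(s.persisted, s.buffered...)
	s.buffered = nil
}

// QueryPersisted reads flushed data only (visibility after flush).
func (s *store) QueryPersisted() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]string(nil), s.persisted...)
}

// QueryCombined reads persisted data plus a snapshot of the buffer.
// Both are read under one lock so that a concurrent Flush cannot make a
// record appear twice or disappear between the two reads.
func (s *store) QueryCombined() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]string, 0, len(s.persisted)+len(s.buffered))
	out = append(out, s.persisted...)
	out = append(out, s.buffered...)
	return out
}

func main() {
	s := &store{}
	s.Write("a")
	s.Flush()
	s.Write("b") // accepted, not yet flushed

	fmt.Println(s.QueryPersisted()) // [a]
	fmt.Println(s.QueryCombined())  // [a b]
}
```

Even in this toy version the combined path has to coordinate with the flush path through a shared lock, which is the kind of query-path cost a real implementation would need to weigh.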