Commit f63e2c6

feat(arrow/array): Add new arreflect package (#771)
### Rationale for this change

This attempts to address apache/arrow-adbc#4185: there is no built-in way to convert between Arrow arrays/records and native Go objects and types using reflection. Users currently must manually construct builders, iterate columns, and handle type mapping for their own schemas. Some other Arrow implementations (e.g. pyarrow) offer higher-level APIs for this, so we can close the gap for Go.

### What changes are included in this PR?

Adds a new opt-in sub-package `arrow/array/arreflect` providing bidirectional Go↔Arrow conversion via reflection.

**Public API**:

- `At[T]`, `ToSlice[T]`: Arrow array → Go value/slice
- `FromSlice[T]`: Go slice → Arrow array (variadic `Option` for dict/listview/ree/decimal/temporal overrides)
- `RecordToSlice[T]`, `RecordFromSlice[T]`: `RecordBatch` ↔ Go struct slices
- `RecordAt[T]`, `RecordAtAny`: single-row record accessors (typed and runtime-inferred)
- `RecordToAnySlice`: runtime-inferred full-record conversion (no compile-time Go type needed)
- `InferSchema[T]`, `InferType[T]`: infer `*arrow.Schema` / `arrow.DataType` from Go types
- `InferGoType`: invert the Arrow→Go type mapping at runtime via `reflect.StructOf`
- `AtAny`, `ToAnySlice`: dynamic accessors when the Go type is not known at compile time
- `WithDict()`, `WithListView()`, `WithREE()`, `WithDecimal(p,s)`, `WithTemporal(s)`: encoding options
- Sentinel errors `ErrUnsupportedType`, `ErrTypeMismatch` (usable with `errors.Is`)

**Supported Arrow types**: all primitives, Timestamp/Date32/Date64/Time32/Time64/Duration, Decimal32/64/128/256, Struct, List/LargeList/ListView/LargeListView (read), FixedSizeList, Map, Dictionary (`dict` tag), RunEndEncoded (`ree` tag).

*Struct tag control* (follows `encoding/json` conventions):

```go
type Row struct {
	Name  string         `arrow:"name"`
	Score float64        `arrow:"score"`
	Skip  string         `arrow:"-"`
	Enc   string         `arrow:"enc,dict"`
	When  time.Time      `arrow:"when,date32"`
	Vals  []int          `arrow:"vals,listview"`
	Price decimal128.Num `arrow:"price,decimal(18,2)"`
}
```

Key implementation details:

- Pointer fields → nullable Arrow fields (nil = null); multi-level pointers are fully dereferenced
- Embedded struct fields are promoted following `encoding/json` BFS rules (`collectFieldCandidates` + `resolveFieldCandidates`)
- Struct metadata is cached per type via `sync.Map`
- `WithTemporal` validates its input, returning `ErrUnsupportedType` for unrecognized values
- The `FromSlice` empty-slice path applies all encoding options consistently with the non-empty path (decimal, temporal, dict, listview, ree)
- Tag parsing uses the parenthesis-aware `splitTagTokens` for `decimal(p,s)`, avoiding fragile comma reassembly
- `InferGoType` validates all runes of exported field names, rejects non-identifier characters (hyphens, dots, spaces, digit prefixes), and detects duplicate exported names after capitalization
- `validateDictValueType` is enforced on all dict paths (struct tags, `FromSlice` opts, empty-slice)
- Primitive types are cached as package-level `reflect.Type` vars
- Internal duplication is minimized via helpers: `asTime`/`asDuration` (TypeAssert), `appendListElement` (list builder dispatch with checked type assertion), `listLike` interface (`Elem()` unification)
- Large list variants (`LARGE_LIST`, `LARGE_LIST_VIEW`) are supported for reading but not produced by `FromSlice`

### Are these changes tested?

Yes, comprehensive test coverage along with testable examples that will show up in the docs.

### Are there any user-facing changes?

Yes, the entirely new public API in the new `arrow/array/arreflect`
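The parenthesis-aware tag splitting mentioned under the key implementation details can be illustrated with a small standalone sketch. It is not the package's code: the function below only borrows the name `splitTagTokens` from the description, and its signature and behavior (trimming whitespace, keeping an empty leading name token) are assumptions made for illustration. The idea is that a comma inside `decimal(18,2)` must not end a token.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTagTokens is a hypothetical re-creation for illustration only; the
// real helper in arreflect may differ in signature and edge-case handling.
// It splits a tag value on commas, ignoring commas nested in parentheses.
func splitTagTokens(tag string) []string {
	var tokens []string
	depth, start := 0, 0
	for i, r := range tag {
		switch r {
		case '(':
			depth++
		case ')':
			if depth > 0 {
				depth--
			}
		case ',':
			// Only a top-level comma separates tokens.
			if depth == 0 {
				tokens = append(tokens, strings.TrimSpace(tag[start:i]))
				start = i + 1
			}
		}
	}
	// The remainder after the last top-level comma is the final token.
	return append(tokens, strings.TrimSpace(tag[start:]))
}

func main() {
	fmt.Printf("%q\n", splitTagTokens("price,decimal(18,2)")) // ["price" "decimal(18,2)"]
	fmt.Printf("%q\n", splitTagTokens(",decimal(18,2)"))      // ["" "decimal(18,2)"]
	fmt.Printf("%q\n", splitTagTokens("enc,dict"))            // ["enc" "dict"]
}
```

A plain `strings.Split(tag, ",")` would instead yield `decimal(18` and `2)` as separate tokens, which then have to be glued back together; tracking parenthesis depth avoids that step.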
1 parent c5f0943 commit f63e2c6

13 files changed

Lines changed: 7658 additions & 0 deletions

arrow/array/arreflect/doc.go

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package arreflect provides utilities for converting between
+// Apache Arrow arrays and Go structs using reflection.
+//
+// The primary entry points are the generic functions [At], [ToSlice],
+// [FromSlice], [RecordToSlice], and [RecordFromSlice], which convert
+// between Arrow arrays/records and Go slices of structs.
+//
+// Schema inference is available via [InferSchema] and [InferType].
+//
+// Arrow struct tags control field mapping:
+//
+//	type MyRow struct {
+//		Name  string    `arrow:"name"`
+//		Score float64   `arrow:"score"`
+//		Skip  string    `arrow:"-"`
+//		Enc   string    `arrow:"enc,dict"`
+//		T32   time.Time `arrow:"t32,time32"`
+//	}
+//
+// Temporal type overrides for time.Time fields:
+//
+//	arrow:"field,date32" — use Date32 instead of Timestamp
+//	arrow:"field,date64" — use Date64 instead of Timestamp
+//	arrow:"field,time32" — use Time32(ms) instead of Timestamp
+//	arrow:"field,time64" — use Time64(ns) instead of Timestamp
+//
+// Additional tag options:
+//
+//	arrow:"field,view" — use STRING_VIEW/BINARY_VIEW for string/bytes fields, or LIST_VIEW for slice fields
+//	arrow:"field,ree" — run-end encoding at top-level only (struct fields not supported)
+//	arrow:"field,decimal(precision,scale)" — override decimal precision and scale (e.g., arrow:",decimal(18,2)")
+package arreflect
