Describe the bug, including details regarding any error messages, version, and platform.
Describe the bug
When writing a FixedSizeList<float32> array to Parquet via pqarrow.FileWriter, the record is correct in memory before the write, but its values are read back as NULL when using pqarrow.FileReader.ReadTable.
The same write pattern using a standard List<float32> (with arrow.ListOf) produces correct values.
Reproduction
```go
// fixedsize_list_parquet_repro.go
//
// Minimal reproduction for FixedSizeList + Parquet issue in Arrow Go.
// Writes a FixedSizeList<float32>[8] with values [1..8] and reads it back
// via pqarrow. On v14.0.2 the values are read as nulls, while the in-memory
// record before writing is correct.
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	"github.com/apache/arrow/go/v14/arrow"
	"github.com/apache/arrow/go/v14/arrow/array"
	"github.com/apache/arrow/go/v14/arrow/memory"
	"github.com/apache/arrow/go/v14/parquet"
	"github.com/apache/arrow/go/v14/parquet/file"
	"github.com/apache/arrow/go/v14/parquet/pqarrow"
)

func main() {
	const dim = 8
	expected := []float32{1, 2, 3, 4, 5, 6, 7, 8}

	out := filepath.Join(os.TempDir(), "fixedsize_bug.parquet")
	fmt.Println("Parquet file:", out)

	// Schema: FixedSizeList<float32>[8]
	schema := arrow.NewSchema(
		[]arrow.Field{
			{
				Name: "embedding",
				Type: arrow.FixedSizeListOf(int32(dim), arrow.PrimitiveTypes.Float32),
			},
		},
		nil,
	)

	pool := memory.NewGoAllocator()

	// --- Write ---
	f, err := os.Create(out)
	if err != nil {
		panic(err)
	}
	props := parquet.NewWriterProperties()
	awProps := pqarrow.NewArrowWriterProperties()
	pw, err := pqarrow.NewFileWriter(schema, f, props, awProps)
	if err != nil {
		panic(err)
	}

	b := array.NewRecordBuilder(pool, schema)
	defer b.Release()
	flb := b.Field(0).(*array.FixedSizeListBuilder)
	vb := flb.ValueBuilder().(*array.Float32Builder)

	// Single FixedSizeList value [1..8]
	flb.Append(true)
	for _, v := range expected {
		vb.Append(v)
	}
	rec := b.NewRecord()
	defer rec.Release()

	fmt.Println("In-memory record before write:")
	fmt.Println(rec)

	if err := pw.Write(rec); err != nil {
		panic(err)
	}
	// Ensure Parquet footer and metadata are fully written
	if err := pw.Close(); err != nil {
		panic(err)
	}

	// --- Read back via pqarrow ---
	rf, err := os.Open(out)
	if err != nil {
		panic(err)
	}
	defer rf.Close()
	pr, err := file.NewParquetReader(rf)
	if err != nil {
		panic(err)
	}
	defer pr.Close()
	fr, err := pqarrow.NewFileReader(pr, pqarrow.ArrowReadProperties{}, pool)
	if err != nil {
		panic(err)
	}
	tbl, err := fr.ReadTable(context.Background())
	if err != nil {
		panic(err)
	}
	defer tbl.Release()

	fmt.Println("\nExpected values:", expected)
	fmt.Println("Table read back:")
	fmt.Println(tbl)
}
```
Example output on v14.0.2:
```
go run ./fixedsize_list_parquet_repro.go
Parquet file: /var/folders/95/j3gr9h157fq0djs38znqgkg80000gn/T/fixedsize_bug.parquet
In-memory record before write:
record:
schema:
fields: 1
- embedding: type=fixed_size_list<item: float32, nullable>[8]
rows: 1
col[0][embedding]: [[1 2 3 4 5 6 7 8]]

Expected values: [1 2 3 4 5 6 7 8]
Table read back:
schema:
fields: 1
- embedding: type=list<list: float32, nullable>
metadata: ["PARQUET:field_id": "-1"]
embedding: [[[(null) (null) (null) (null) (null) (null) (null) (null)]]]
```
Expected behavior
The embedding values should be read back as [1 2 3 4 5 6 7 8], matching the in-memory FixedSizeList[8] before the Parquet write.
Actual behavior
The embedding values are read back as a list of 8 NULL values when using pqarrow.FileReader.ReadTable, even though the in-memory record before writing is correct.
Likely root cause (code-level)
In parquet/pqarrow/path_builder.go (Arrow Go v14.0.2), the FIXED_SIZE_LIST case in pathBuilder.Visit does not update p.nullableInParent before visiting the child values, while the LIST case does.
addTerminalInfo increments p.info.maxDefLevel when p.nullableInParent is true. For LIST this flag is set, so present values get the higher definition level; for FIXED_SIZE_LIST it remains false, so present values are encoded/decoded with a definition level one lower than the schema's maximum and are interpreted as nulls.
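The definition-level argument can be made concrete with a toy model. This is a simplified sketch, not pqarrow's implementation: the `level` type and the `maxDef`, `writerDef`, and `decode` helpers are hypothetical, and the Parquet path (a required `embedding` group, a repeated `list` group, an optional float `element`) is an assumption inferred from the read-back type `list<list: float32, nullable>` and the non-nullable `embedding` field, not read from the file's schema. It models addTerminalInfo as described above: the leaf's extra level is only counted when `nullableInParent` is true.

```go
package main

import "fmt"

// level is one node on the (assumed) Parquet path from column root to leaf.
type level struct {
	name     string
	optional bool // an OPTIONAL node adds one definition level
	repeated bool // a REPEATED node adds one definition level
}

// maxDef returns the definition level at which a leaf counts as present:
// one level per OPTIONAL or REPEATED node on the path.
func maxDef(path []level) int {
	d := 0
	for _, l := range path {
		if l.optional || l.repeated {
			d++
		}
	}
	return d
}

// writerDef models the level a writer emits for a present leaf value:
// the ancestors' levels, plus one only when nullableInParent is set.
func writerDef(path []level, nullableInParent bool) int {
	d := maxDef(path[:len(path)-1])
	if nullableInParent && path[len(path)-1].optional {
		d++
	}
	return d
}

// decode interprets a leaf definition level for this specific path
// (required outer group, repeated list, optional element).
func decode(def, max int) string {
	switch {
	case def >= max:
		return "present value"
	case def == max-1:
		return "null element"
	default:
		return "empty list"
	}
}

func main() {
	path := []level{
		{name: "embedding"},
		{name: "list", repeated: true},
		{name: "element", optional: true},
	}
	readerMax := maxDef(path)
	for _, nip := range []bool{true, false} {
		d := writerDef(path, nip)
		fmt.Printf("nullableInParent=%v: writer def=%d, reader max=%d -> %s\n",
			nip, d, readerMax, decode(d, readerMax))
	}
}
```

Under these assumptions the reader's maximum is 2; with `nullableInParent` false each present value is written at level 1 and decodes as a null element, which matches a list of 8 nulls.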
A minimal fix appears to be setting p.nullableInParent = true in the FIXED_SIZE_LIST branch before Visit(larr.ListValues()), mirroring the LIST handling.
Environment
- Arrow Go: v14.0.2
- Go: 1.21+ (repro’d with go1.24 toolchain)
- OS: macOS (ARM64)
- Reader used: pqarrow.FileReader.ReadTable (behavior also visible when inspecting the Parquet file with DuckDB)
Component(s)
Parquet