[Go][Parquet] FixedSizeList values read back as NULL when written via pqarrow #584

@rmorgans

Describe the bug, including details regarding any error messages, version, and platform.

Describe the bug

When a FixedSizeList<float32> array is written to Parquet via pqarrow.FileWriter, the values are correct in the in-memory record before the write but are read back as NULL by pqarrow.FileReader.ReadTable.

The same write pattern using a standard List<float32> (with arrow.ListOf) produces correct values.

Reproduction

// fixedsize_list_parquet_repro.go
//
// Minimal reproduction for FixedSizeList + Parquet issue in Arrow Go.
// Writes a FixedSizeList<float32>[8] with values [1..8] and reads it back
// via pqarrow. On v14.0.2 the values are read as nulls, while the in-memory
// record before writing is correct.

package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	"github.com/apache/arrow/go/v14/arrow"
	"github.com/apache/arrow/go/v14/arrow/array"
	"github.com/apache/arrow/go/v14/arrow/memory"
	"github.com/apache/arrow/go/v14/parquet"
	"github.com/apache/arrow/go/v14/parquet/file"
	"github.com/apache/arrow/go/v14/parquet/pqarrow"
)

func main() {
	const dim = 8
	expected := []float32{1, 2, 3, 4, 5, 6, 7, 8}

	out := filepath.Join(os.TempDir(), "fixedsize_bug.parquet")
	fmt.Println("Parquet file:", out)

	// Schema: FixedSizeList<float32>[8]
	schema := arrow.NewSchema(
		[]arrow.Field{
			{
				Name: "embedding",
				Type: arrow.FixedSizeListOf(int32(dim), arrow.PrimitiveTypes.Float32),
			},
		},
		nil,
	)

	pool := memory.NewGoAllocator()

	// --- Write ---

	f, err := os.Create(out)
	if err != nil {
		panic(err)
	}

	props := parquet.NewWriterProperties()
	awProps := pqarrow.NewArrowWriterProperties()

	pw, err := pqarrow.NewFileWriter(schema, f, props, awProps)
	if err != nil {
		panic(err)
	}

	b := array.NewRecordBuilder(pool, schema)
	defer b.Release()

	flb := b.Field(0).(*array.FixedSizeListBuilder)
	vb := flb.ValueBuilder().(*array.Float32Builder)

	// Single FixedSizeList value [1..8]
	flb.Append(true)
	for _, v := range expected {
		vb.Append(v)
	}

	rec := b.NewRecord()
	defer rec.Release()

	fmt.Println("In-memory record before write:")
	fmt.Println(rec)

	if err := pw.Write(rec); err != nil {
		panic(err)
	}

	// Ensure Parquet footer and metadata are fully written
	if err := pw.Close(); err != nil {
		panic(err)
	}

	// --- Read back via pqarrow ---

	rf, err := os.Open(out)
	if err != nil {
		panic(err)
	}
	defer rf.Close()

	pr, err := file.NewParquetReader(rf)
	if err != nil {
		panic(err)
	}
	defer pr.Close()

	fr, err := pqarrow.NewFileReader(pr, pqarrow.ArrowReadProperties{}, pool)
	if err != nil {
		panic(err)
	}

	tbl, err := fr.ReadTable(context.Background())
	if err != nil {
		panic(err)
	}
	defer tbl.Release()

	fmt.Println("\nExpected values:", expected)
	fmt.Println("Table read back:")
	fmt.Println(tbl)
}

Example output on v14.0.2:

go run ./fixedsize_list_parquet_repro.go
Parquet file: /var/folders/95/j3gr9h157fq0djs38znqgkg80000gn/T/fixedsize_bug.parquet
In-memory record before write:
record:
  schema:
  fields: 1
    - embedding: type=fixed_size_list<item: float32, nullable>[8]
  rows: 1
  col[0][embedding]: [[1 2 3 4 5 6 7 8]]


Expected values: [1 2 3 4 5 6 7 8]
Table read back:
schema:
  fields: 1
    - embedding: type=list<list: float32, nullable>
           metadata: ["PARQUET:field_id": "-1"]
embedding: [[[(null) (null) (null) (null) (null) (null) (null) (null)]]]

Expected behavior

The embedding values should be read back as [1 2 3 4 5 6 7 8], matching the in-memory FixedSizeList[8] before the Parquet write.

Actual behavior

The embedding values are read back as a list of 8 NULL values when using pqarrow.FileReader.ReadTable, even though the in-memory record before writing is correct.

Likely root cause (code-level)

In parquet/pqarrow/path_builder.go (Arrow Go v14.0.2), the FIXED_SIZE_LIST case in pathBuilder.Visit does not update p.nullableInParent before visiting the child values, while the LIST case does.

addTerminalInfo increments p.info.maxDefLevel when p.nullableInParent is true. For LIST this flag is set, so present values get the higher definition level. For FIXED_SIZE_LIST it remains false, so present values are written with a definition level below the maximum the reader expects for the column, and the reader interprets them as nulls.

A minimal fix appears to be setting p.nullableInParent = true in the FIXED_SIZE_LIST branch before Visit(larr.ListValues()), mirroring the LIST handling.

Environment

  • Arrow Go: v14.0.2
  • Go: 1.21+ (reproduced with the go1.24 toolchain)
  • OS: macOS (ARM64)
  • Reader used: pqarrow.FileReader.ReadTable (behavior also visible when inspecting the Parquet file with DuckDB)

Component(s)

Parquet

Labels

Type: bug
