DataFrame doesn't decode boolean arrays correctly from Arrow #7115

@johnnyg

System Information:

  • OS & Version: Windows 10
  • ML.NET Version: 0.21.1
  • .NET Version: .NET 5.0

Describe the bug
Creating a DataFrame from an Arrow record batch that contains a boolean array column produces incorrect values, and occasionally throws an exception.
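
For context on what "decoding" involves here: the Arrow columnar format stores boolean values bit-packed (one bit per value, least-significant bit first), with a separate validity bitmap marking nulls. The report does not identify the cause of the bug, so the following is only a minimal sketch of how such a buffer pair would be read, under the assumption that both bitmaps are present. `ArrowBooleanSketch`, `GetBit`, and `Decode` are hypothetical names for illustration and are not part of Apache.Arrow or Microsoft.Data.Analysis.

```csharp
using System;

// Hypothetical illustration only; not the library's actual implementation.
static class ArrowBooleanSketch
{
    // Reads bit `index` from a bitmap using least-significant-bit numbering.
    static bool GetBit(ReadOnlySpan<byte> bitmap, int index) =>
        (bitmap[index / 8] & (1 << (index % 8))) != 0;

    // Expands bit-packed values plus a validity bitmap into bool?[].
    // Assumes a validity bitmap is always supplied (1 = valid, 0 = null).
    public static bool?[] Decode(ReadOnlySpan<byte> values, ReadOnlySpan<byte> validity, int length)
    {
        var result = new bool?[length];
        for (int i = 0; i < length; i++)
        {
            result[i] = GetBit(validity, i) ? GetBit(values, i) : (bool?)null;
        }
        return result;
    }

    static void Main()
    {
        // Same logical content as the repro: [null, false, true].
        byte[] validity = { 0b0000_0110 }; // rows 1 and 2 are valid
        byte[] values = { 0b0000_0100 };   // only row 2 is true
        foreach (bool? v in Decode(values, validity, 3))
        {
            Console.WriteLine(v?.ToString() ?? "null");
        }
    }
}
```

A decoder that treated the value buffer as one byte per row, or ignored the validity bitmap, would return wrong values for the three-row input above and could read past the end of the one-byte buffer; whether that is what happens in `DataFrame.FromArrowRecordBatch` is not established by this report.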

To Reproduce
Run:

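using Apache.Arrow;
using Apache.Arrow.Types;
using Microsoft.Data.Analysis;
using Xunit;

// Class name is illustrative; any xUnit test class works.
public class ArrowBooleanTests
{
    [Fact]
    public void DataFrameDecodeBooleans()
    {
        // Nullable boolean column with three rows: [null, false, true]
        BooleanArray boolArray = new BooleanArray.Builder().AppendNull().Append(false).Append(true).Build();
        Field boolField = new Field("col", BooleanType.Default, true);
        Schema schema = new Schema(new[] { boolField }, null);
        RecordBatch batch = new RecordBatch(schema, new IArrowArray[] { boolArray }, boolArray.Length);
        DataFrame df = DataFrame.FromArrowRecordBatch(batch);
        DataFrameColumn boolCol = df["col"];
        Assert.Equal(boolArray.Length, boolCol.Length);
        // Each DataFrame value should match the corresponding Arrow value, including the null
        for (int row = 0; row < boolArray.Length; row++)
        {
            Assert.Equal(boolArray.GetValue(row), boolCol[row]);
        }
    }
}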

Expected behavior
The test above passes.