Writing sparse arrays with variable length attributes bug #494

Description

@lunaroverlord

Consider this:

import numpy as np
import tiledb

array_name = "test"
ctx = tiledb.Ctx()
dom = tiledb.Domain(
    tiledb.Dim(name="id", domain=(0, 10), dtype=np.int64),
    ctx=ctx
)
attr = tiledb.Attr(name="val", var=True, dtype=np.int64, ctx=ctx)
schema = tiledb.ArraySchema(domain=dom, sparse=True, attrs=[attr], ctx=ctx)
tiledb.SparseArray.create(array_name, schema)

vals = np.array([
    np.array([1, 2, 9], dtype=np.int64), 
    np.array([3, 4, 5], dtype=np.int64)
], dtype='O')

with tiledb.open(array_name, "w") as array:
    array[[1, 2]] = dict(val=vals)

>>> ValueError: value length (6) does not match coordinate length (2)

This only happens when the sub-arrays in vals all have the same length, so that together they form a rectangular block. Neither of the following raises the error:

vals = np.array([
    np.array([1, 2], dtype=np.int64), 
    np.array([3, 4, 5], dtype=np.int64)
], dtype='O')


vals = np.array([
    np.array([1, 2, 9, 3], dtype=np.int64), 
    np.array([3, 4, 5], dtype=np.int64)
], dtype='O')

I think it's because NumPy coalesces equal-length sub-arrays into a single 2-D array, even when dtype='O' is requested, instead of keeping a 1-D array of array objects:

vals_hetero = np.array([
    np.array([1, 2], dtype=np.int64),
    np.array([3, 4, 5], dtype=np.int64)
], dtype='O')

vals_homo = np.array([
    np.array([1, 2, 9], dtype=np.int64),
    np.array([3, 4, 5], dtype=np.int64)
], dtype='O')

print(vals_hetero)
>>> [array([1, 2]) array([3, 4, 5])]

print(vals_homo)
>>> [[1 2 9]
     [3 4 5]]

print(vals_hetero.size, vals_homo.size) 
>>> 2 6

The exception is raised because TileDB compares attr_val.size with the number of coordinates (see libtiledb.pyx#L5241), and for the coalesced array that size is 6 rather than 2.

Is there a workaround or an alternative way of constructing the object?
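One candidate I can think of, sketched below with NumPy only: allocate a 1-D object array first and assign each sub-array into its slot, so NumPy never gets the chance to coalesce them. The helper name as_object_array is hypothetical (it is not part of TileDB or NumPy), and I have not confirmed that the TileDB write path accepts the result; the sketch only shows that size comes out as 2.

```python
import numpy as np

def as_object_array(rows):
    """Hypothetical helper: pack sub-arrays into a 1-D object array.

    Assigning element by element into a preallocated object array keeps
    each sub-array as a separate object, even when all lengths are equal.
    """
    out = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        out[i] = np.asarray(row, dtype=np.int64)
    return out

# Same data as the failing example: two sub-arrays of equal length.
vals = as_object_array([[1, 2, 9], [3, 4, 5]])

print(vals.shape, vals.size)
# (2,) 2
```

With this construction vals.size matches the number of coordinates (2), which is the condition the size check appears to require.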
