[Bug] Updating obs dataframe via update_obs after deleting fails. #4439

@scriptbotprime

Describe the bug
I deleted observations from an Experiment using the tiledbsoma.Experiment.obs_axis_delete method. Since then I can no longer update the obs dataframe using tiledbsoma.io.ingest.update_obs, which worked before the deletion. The problem seems to be that deleting data creates "holes" that update_obs can't handle.
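
A minimal sketch of what "holes" could mean here, assuming the term refers to gaps in the soma_joinid values left behind by deleted rows. It uses plain Python and no tiledbsoma calls; `find_holes` is a hypothetical helper, and whether update_obs actually requires contiguous ids is not confirmed.

```python
def find_holes(joinids):
    """Return the ids missing from the contiguous range 0..max(joinids)."""
    present = set(joinids)
    if not present:
        return []
    return [i for i in range(max(present) + 1) if i not in present]


# Hypothetical obs with six rows, ids 0..5, before any deletion.
before = list(range(6))
# Simulate deleting the rows with ids 2 and 3.
after = [i for i in before if i not in {2, 3}]

print(find_holes(before))  # []
print(find_holes(after))   # [2, 3]
```

Running the same check on the soma_joinid column of the obs dataframe read back after the delete would show whether such gaps are present.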

To Reproduce

  • Create an experiment from an AnnData H5AD file.

  • Delete something:

    with Experiment.open(uri, mode="d") as exp:
        exp.obs_axis_delete(value_filter=obs_criteria)
    
  • Read the obs dataframe:

    obs_df = exp.obs.read().concat().to_pandas()
    
  • Change something in obs_df.

  • Write obs_df to exp.obs:

    with Experiment.open(uri, "r") as exp:
        tiledbsoma.io.ingest.update_obs(exp=exp, new_data=obs_df)
    

Versions (please complete the following information):
tiledbsoma.version: 2.3.0
TileDB core version (libtiledbsoma): 2.30.0
python version: 3.14.3.final.0
OS version: Linux 5.15.0-171-generic

Additional context
Converting the pandas.DataFrame to an Arrow table, altering the schema if needed, and writing via tiledbsoma.DataFrame.write seems to work:

from tiledbsoma import Experiment
from tiledbsoma.io._util import get_arrow_str_format
from tiledbsoma.io import conversions
import pyarrow as pa

def write_obs_df(self):
    # Method of a wrapper class: self.obs holds the pandas obs dataframe,
    # self.db_uri the Experiment URI.
    arrow_table = conversions.df_to_arrow_table(self.obs)
    arrow_schema = arrow_table.schema.remove_metadata()

    # Bring the stored schema in line with the columns of the new table.
    with Experiment.open(self.db_uri, "r") as exp:
        old_cols = set(exp.obs.schema.names)
        new_cols = set(arrow_table.schema.names)
        drop_cols = list(old_cols - new_cols)
        add_keys = new_cols - old_cols
        if drop_cols or add_keys:
            add_attrs = {}
            add_enmrs = {}
            for add_key in add_keys:
                atype = arrow_schema.field(add_key).type
                if pa.types.is_dictionary(atype):
                    # Dictionary (categorical) column: the attribute uses the
                    # index type, the enumeration uses the value type.
                    add_attrs[add_key] = get_arrow_str_format(atype.index_type)
                    enmr_format = get_arrow_str_format(atype.value_type)
                    add_enmrs[add_key] = (enmr_format, atype.ordered)
                else:
                    add_attrs[add_key] = get_arrow_str_format(atype)
            exp.obs._handle._update_dataframe_schema(drop_cols, add_attrs, add_enmrs)

    # Write the full table back to obs.
    with Experiment.open(self.db_uri, "w") as exp:
        exp.obs.write(arrow_table)
