AOCS Sampling Fails with Large Objects (BLOB) #349

@Sanikadze

Bug Report: AOCS Sampling Fails with Large Objects (BLOB)

Summary

In Greenplum 6.29.0, ANALYZE and auto_stats fail on AOCS tables whose TEXT/JSONB columns contain large values (large objects), with the error:

ERROR: Advance not called on large datum stream object (datumstream.c:276)

Root Cause

Problem location: src/backend/access/aocs/aocsam.c, function aocs_gettuple_column()

if (chkvisimap && !isSnapshotAny && !AppendOnlyVisimap_IsVisible(&scan->visibilityMap, &aotid))
{
    ret = false;
    goto out;  // ← Returns WITHOUT calling datumstreamread_advance()
}

datumstreamread_find(ds, rownum - ds->blockFirstRowNum);  // Never reached on the early-exit path

When a BLOB block is read, largeObjectState is set to HaveAoContent. If aocs_gettuple_column() returns early (visibility check or other reasons), datumstreamread_advance() is never called, leaving largeObjectState = HaveAoContent.

On the next sample row iteration, datumstreamread_nth() is called (line 1015 in elog DEBUG2), which raises the error because largeObjectState is still HaveAoContent.
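The state problem described above can be illustrated with a small stand-alone model. This is only a sketch, not the Greenplum code: every `Toy*`/`toy_*` name is hypothetical, the three-value enum is a simplification, and the only thing taken from the report is that a pending HaveAoContent state makes the next positional read fail.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the stream's large-object state (names are hypothetical). */
typedef enum
{
    TOY_NONE,            /* no large datum pending */
    TOY_HAVE_AO_CONTENT, /* large datum read from the block, not yet advanced */
    TOY_ADVANCED         /* advance was called; the datum may be fetched */
} ToyLargeObjectState;

typedef struct
{
    ToyLargeObjectState largeObjectState;
} ToyStream;

/* Reading a block that holds a large datum leaves the stream "pending". */
static void
toy_read_block(ToyStream *ds)
{
    ds->largeObjectState = TOY_HAVE_AO_CONTENT;
}

/* Advancing moves the stream out of the pending state. */
static void
toy_advance(ToyStream *ds)
{
    if (ds->largeObjectState == TOY_HAVE_AO_CONTENT)
        ds->largeObjectState = TOY_ADVANCED;
}

/* Models the early exit: invisible rows return before any advance. */
static bool
toy_gettuple_column(ToyStream *ds, bool row_visible)
{
    if (!row_visible)
        return false;
    toy_advance(ds);
    return true;
}

/* Models the positional read: it refuses to run while a datum is pending. */
static bool
toy_nth(const ToyStream *ds)
{
    if (ds->largeObjectState == TOY_HAVE_AO_CONTENT)
    {
        fprintf(stderr, "Advance not called on large datum stream object\n");
        return false;
    }
    return true;
}

/* One sampling step: read a block, fetch a row, then do the positional read. */
static bool
toy_sample_step(bool row_visible)
{
    ToyStream ds = { TOY_NONE };

    toy_read_block(&ds);
    (void) toy_gettuple_column(&ds, row_visible);
    return toy_nth(&ds);
}
```

In this model, `toy_sample_step(true)` succeeds, while `toy_sample_step(false)` takes the early exit and the following read reports the same message as the error above.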

Reproduction

  1. Create an AOCS table with a TEXT/JSONB column containing large values (larger than the block size)
  2. Enable auto_stats: SET gp_autostats_mode = 'on_change';
  3. INSERT/COPY data into the table
  4. The error occurs during auto_stats or a manual ANALYZE

Workaround

Disable auto_stats:

SET gp_autostats_mode = 'none';

Then either run ANALYZE manually using the legacy method, or skip ANALYZE on the affected tables.

Suggested Fix

In aocs_gettuple_column(), call datumstreamread_advance() even for invisible rows to properly transition largeObjectState:

if (chkvisimap && !isSnapshotAny && !AppendOnlyVisimap_IsVisible(&scan->visibilityMap, &aotid))
{
    // Advance position for large objects to reset state
    if (ds->largeObjectState == DatumStreamLargeObjectState_HaveAoContent)
        datumstreamread_advance(ds);

    ret = false;
    goto out;
}
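A toy model of the suggested change, under the same simplifying assumptions as the report's description, shows the intended effect: the invisible-row path no longer leaves the stream in the pending state. All `Fix*`/`fix_*` names are hypothetical, and this sketch does not establish whether advancing on invisible rows is safe in the real aocs_gettuple_column().

```c
#include <stdbool.h>

/* Toy stand-in for the stream's large-object state (names are hypothetical). */
typedef enum
{
    FIX_NONE,            /* no large datum pending */
    FIX_HAVE_AO_CONTENT, /* large datum read from the block, not yet advanced */
    FIX_ADVANCED         /* advance was called */
} FixLargeObjectState;

typedef struct
{
    FixLargeObjectState largeObjectState;
} FixStream;

/* Advancing moves the stream out of the pending state. */
static void
fix_advance(FixStream *ds)
{
    if (ds->largeObjectState == FIX_HAVE_AO_CONTENT)
        ds->largeObjectState = FIX_ADVANCED;
}

/* Early exit with the suggested guard: advance before leaving. */
static bool
fix_gettuple_column(FixStream *ds, bool row_visible)
{
    if (!row_visible)
    {
        /* Reset the pending large-object state even though the row is skipped. */
        if (ds->largeObjectState == FIX_HAVE_AO_CONTENT)
            fix_advance(ds);
        return false;
    }
    fix_advance(ds);
    return true;
}

/* Returns true when the stream is no longer pending after one fetch. */
static bool
fix_sample_step(bool row_visible)
{
    /* Start as if a block holding a large datum has just been read. */
    FixStream ds = { FIX_HAVE_AO_CONTENT };

    (void) fix_gettuple_column(&ds, row_visible);
    return ds.largeObjectState != FIX_HAVE_AO_CONTENT;
}
```

Here `fix_sample_step()` returns true for both visible and invisible rows, which is the state the next positional read expects.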

Affected Tables

Tables with:

  • appendonly=true, orientation=column
  • TEXT, JSONB, or other varlena columns with large values (BLOBs)

Stack Trace

acquire_sample_rows -> analyze_rel -> vacuum -> auto_stats
datumstreamread_nthlarge (datumstream.c:276)

Related Files

  • src/backend/access/aocs/aocsam.c - aocs_gettuple_column(), aocs_gettuple()
  • src/backend/utils/datumstream/datumstream.c - datumstreamread_nthlarge() (line 276)
  • src/include/utils/datumstream.h - DatumStreamLargeObjectState enum

Status in other branches

Bug is NOT fixed in any branch (checked 2025-01-21):

  • origin/master - NOT FIXED (same goto out pattern)
  • origin/OPENGPDB_STABLE - NOT FIXED
  • origin/OPENGPDB_6_29_STABLE - NOT FIXED

The problematic goto out without calling datumstreamread_advance() exists in all branches.
