
[REST] First-class Tag concept in Iceberg REST Catalog #16165

@flyingImer

Description

Proposed Change

Context

I am new to the Iceberg community and am actively working on tagging in Polaris OSS. This issue opens a discussion on standardizing a first-class Tag concept in the REST Catalog spec, distinct from the in-flight read-restrictions and labels proposals. A discussion thread on the dev list will follow.

Summary

Add a first-class Tag entity to the Iceberg REST Catalog spec. Tags have identity (name + namespace) and optional schema (allowed values, inheritability). Tag attachments carry a value and are applied to tables, columns (via field-id), views, and namespaces.
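To make the proposed shape concrete, here is a minimal Python sketch of the two entities. The field names (`allowed_values`, `inheritable`, `kind`, `field_id`) and the validation rules are assumptions for illustration only; the actual wire format will come with the design document. Columns are addressed by table identifier plus Iceberg field ID, which stays stable across column renames and reorders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical shapes for illustration only; not the proposed wire format.

TARGET_KINDS = ("namespace", "table", "view", "column")


@dataclass(frozen=True)
class TagDefinition:
    namespace: Tuple[str, ...]                       # tags are namespace-scoped
    name: str                                        # unique within its namespace
    allowed_values: Optional[FrozenSet[str]] = None  # None = any value accepted
    inheritable: bool = False                        # assumed: propagates to child objects

    def validate(self, value: str) -> None:
        """Reject a value outside the optional allowed-values schema."""
        if self.allowed_values is not None and value not in self.allowed_values:
            qualified = ".".join(self.namespace + (self.name,))
            raise ValueError(f"{value!r} is not an allowed value for tag {qualified}")


@dataclass(frozen=True)
class Target:
    kind: str                       # one of TARGET_KINDS
    identifier: Tuple[str, ...]     # namespace levels, then the object name (if any)
    field_id: Optional[int] = None  # Iceberg field ID; only for kind == "column"

    def __post_init__(self) -> None:
        if self.kind not in TARGET_KINDS:
            raise ValueError(f"unknown target kind {self.kind!r}")
        if (self.kind == "column") != (self.field_id is not None):
            raise ValueError("field_id is required for columns and invalid otherwise")


@dataclass(frozen=True)
class TagAttachment:
    tag: TagDefinition
    target: Target
    value: str

    def __post_init__(self) -> None:
        self.tag.validate(self.value)  # enforce the definition's schema on attach


# Usage: classify column 3 of sales.orders as holding email addresses.
pii = TagDefinition(("governance",), "pii", frozenset({"email", "phone", "none"}))
email_col = TagAttachment(pii, Target("column", ("sales", "orders"), field_id=3), "email")
```

Keeping the value on the attachment rather than on the definition lets one definition (for example `governance.pii`) be reused with different values across many objects.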

Scope

  • Tag entity CRUD at namespace level
  • Tag attachment management: attach, detach, reverse lookup, dedicated retrieval endpoint
  • Normative behavior contracts: privilege enforcement, visibility filtering, rename atomicity
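The attachment behaviors in scope can be sketched as an in-memory store. This is a hypothetical Python illustration: `TagAttachmentStore`, its method names, and the `can_see`/`can_write` callbacks are assumptions standing in for a catalog's real authorization layer, and it tracks object-level attachments only (no columns). Returning nothing for invisible objects is one assumed reading of visibility filtering; rename moves attachments inside a single critical section, whereas a real catalog would do this in the same transaction as the object rename itself.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sketch; names and signatures are illustrative only.
ObjectKey = Tuple[str, Tuple[str, ...]]  # (kind, identifier), e.g. ("table", ("sales", "orders"))
TagRef = Tuple[Tuple[str, ...], str]     # (tag namespace, tag name)
Check = Callable[[ObjectKey], bool]      # stand-in for the catalog's authorizer


class TagAttachmentStore:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_object: Dict[ObjectKey, Dict[TagRef, str]] = {}

    def attach(self, obj: ObjectKey, tag: TagRef, value: str, can_write: Check) -> None:
        if not can_write(obj):  # privilege enforcement on writes
            raise PermissionError(f"not allowed to tag {obj}")
        with self._lock:
            self._by_object.setdefault(obj, {})[tag] = value  # re-attach overwrites

    def detach(self, obj: ObjectKey, tag: TagRef, can_write: Check) -> None:
        if not can_write(obj):
            raise PermissionError(f"not allowed to untag {obj}")
        with self._lock:
            self._by_object.get(obj, {}).pop(tag, None)

    def tags_for(self, obj: ObjectKey, can_see: Check) -> Dict[TagRef, str]:
        """Dedicated retrieval; returns {} for objects the caller cannot see (assumed)."""
        if not can_see(obj):
            return {}
        with self._lock:
            return dict(self._by_object.get(obj, {}))

    def objects_with(self, tag: TagRef, can_see: Check,
                     value: Optional[str] = None) -> List[ObjectKey]:
        """Reverse lookup, filtered to objects the caller is allowed to see."""
        with self._lock:
            hits = [obj for obj, tags in self._by_object.items()
                    if tag in tags and (value is None or tags[tag] == value)]
        return sorted(obj for obj in hits if can_see(obj))

    def rename(self, old: ObjectKey, new: ObjectKey) -> None:
        """Move attachments together with the object in one critical section."""
        with self._lock:
            if self._by_object.get(new):
                raise ValueError(f"{new} already has tag attachments")
            moved = self._by_object.pop(old, None)
            if moved:
                self._by_object[new] = moved


# Usage: this caller can see everything except the "hr" namespace.
store = TagAttachmentStore()
owner: TagRef = (("org",), "owner")
allow_all: Check = lambda obj: True
hide_hr: Check = lambda obj: obj[1][0] != "hr"
store.attach(("table", ("sales", "orders")), owner, "team-a", allow_all)
store.attach(("table", ("hr", "salaries")), owner, "team-b", allow_all)
print(store.objects_with(owner, hide_hr))  # only the sales.orders table is listed
store.rename(("table", ("sales", "orders")), ("table", ("sales", "orders_v2")))
```

Filtering in the store, rather than leaving it to clients, is what keeps reverse lookup from revealing the existence of objects a caller cannot otherwise see.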

Goals

  • Provide a portable classification primitive across catalogs so governance tooling, AI agents, FinOps dashboards, and discovery surfaces integrate once, not per catalog
  • Capture the pattern that has converged across ecosystems: Snowflake tags, Unity Catalog governed tags, Google Cloud Dataplex tag templates, Apache Atlas classifications, Apache Gravitino tags, and DataHub tags all expose a first-class entity with identity, optional schema, and attachments to objects
  • Serve multiple use cases: governance classification, ownership, FinOps, AI and semantic hints, data discovery
  • Complement read-restrictions as the classification input side of the governance pipeline (tag is input, read-restrictions is output)
  • Coexist with labels: tags and labels solve different problems

Non-Goals

  • No changes to Iceberg table format files
  • No typed multi-field per-attachment value schema (an advanced usage in Atlas and Dataplex; can be added later without breaking changes)
  • No Governed-vs-Standard type system (Unity Catalog's pattern; expressible via configuration rather than a type split)
  • No tag-to-policy binding wire format (belongs in a separate Policy authoring phase)
  • No LoadTableResult changes (tag attachments retrieved via dedicated endpoint)

Prior discussions

Earlier threads touched on related concepts without landing a concrete spec:

  • "Column-Level Key-Value Properties (Tags) in Iceberg" (Walaa Eldin Moustafa, 2024-01; revived by Anand Kumar Sankaran, 2026-02). The community raised concerns about time travel and schema versioning, a governance-specific framing that would narrow adoption, the burden on writers across all engines, and access control leaking through schema write paths. Thread: https://lists.apache.org/thread/yflg8w1h87qgwc4s3qtog4l8nx8nk8m0
  • "Adding Tags field to Iceberg V4" (Micah Kornfield, 2025-12). Concerns were raised about the impact on manifest size and about proprietary vendor metadata degrading interoperability.

My read is that this proposal addresses the earlier pushback:

  • Catalog-level entity, not a table metadata field. This aligns with the direction Ryan noted on the 2026-02 revival: "security labels should be tracked solely by the catalog and not as part of table metadata." Because tags live outside table metadata, schema evolution, time travel, and writer compatibility remain independent of tagging.
  • General classification primitive, serving ownership, FinOps, AI and semantic hints, data discovery, and governance. Governance is one use case, not the framing.
  • Concrete shape up front. TagDefinition and TagAttachment entities, specific endpoints, normative behavior contracts. Catalogs that do not need tags implement nothing.
  • Cross-ecosystem convergence lowers adoption risk, since the same shape already exists in the catalogs and tools listed under Goals.

Relationship to existing work

Next Steps

  • Full design document to follow: wire shape, behavior contracts, cross-catalog interoperability analysis
  • Seeking input from community reviewers, especially contributors outside Polaris
  • Looking for co-champions from catalog implementations interested in cross-catalog validation

Proposal document

To be added (design document in progress).

Specifications

  • Table
  • View
  • REST
  • Puffin
  • Encryption
  • Other
