
Epic: Vortex Type System #7706

@connortsui20


This Epic is for tracking the formalization and soundness of the Vortex type system.

There are some rough corners in our current implementation of the Vortex type system that we would ideally like to resolve. This issue links to all of these threads.

Status

Proposed.

It is not clear whether we actually need to make changes. If we do decide to move forward with anything, however, it will likely cause large refactors and breaking changes.

Goal / Motivation / Issues

We want the Vortex type system to be both sound and well-implemented. People building systems with Vortex in the future should not run into the issues we ourselves have run into. They should be able to fearlessly use our built-in types and add their own extension types without worrying about performance.

Unresolved questions

We have run into several issues (mostly performance-related) in how we have implemented Vortex types. Here are a few of them:

  • Even though List and ListView are logically equivalent, they have vastly different performance characteristics and semantics. Choosing to change the canonical list type from List to ListView (see Tracking Issue: Canonicalize ListView over List #4699) caused many performance issues that we still have not fully figured out.
  • There is a lot of duplication between Primitive and Decimal arrays: the physical encoding is essentially identical, but we split it into two different arrays.
  • Both Binary and Utf8 use VarBinView as their canonical encoding, even though they have different semantics and compression characteristics. This is the opposite problem to Primitive vs Decimal canonical arrays.
  • Should FixedSizeBinary be a new logical type? Or should we just have a Flat encoding that allows for any aligned fixed-size binary values that would encode Primitive, Decimal, and FixedSizeBinary? Note that this is also similar to FixedSizeList<u8>[size].
  • Decimal type logical vs physical mismatch issues: DecimalArray Logical and Physical type mismatch #5820
  • We should formalize the Vortex type system: Type System RFC rfcs#55

TODO: each of the items above should become its own tracking issue or discussion.
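To make the first point concrete, here is a minimal sketch of why two logically equivalent list layouts can behave very differently. It uses hypothetical, simplified structs (`ListLayout`, `ListViewLayout`), not Vortex's actual `ListArray`/`ListViewArray` APIs, and assumes the common layouts: a List stores `len + 1` offsets, while a ListView stores per-list `offsets` and `sizes` whose views may be out of order or overlap. Converting List to ListView only reinterprets the offsets, but converting ListView back to List may have to copy elements into a fresh contiguous buffer. This does not reproduce the specific performance issues mentioned above; it only illustrates one asymmetry between the two layouts.

```rust
/// Hypothetical, simplified List layout: `offsets` has `len + 1` entries and
/// list `i` is `elements[offsets[i]..offsets[i + 1]]`.
struct ListLayout {
    elements: Vec<i32>,
    offsets: Vec<usize>,
}

/// Hypothetical, simplified ListView layout: list `i` is
/// `elements[offsets[i]..offsets[i] + sizes[i]]`. Views may overlap or be out of order.
struct ListViewLayout {
    elements: Vec<i32>,
    offsets: Vec<usize>,
    sizes: Vec<usize>,
}

impl ListLayout {
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn get(&self, i: usize) -> &[i32] {
        &self.elements[self.offsets[i]..self.offsets[i + 1]]
    }

    /// Cheap direction: keep the element buffer as-is and derive each size
    /// from adjacent offsets. (The clone stands in for sharing the buffer.)
    fn to_list_view(&self) -> ListViewLayout {
        let offsets = self.offsets[..self.len()].to_vec();
        let sizes = self.offsets.windows(2).map(|w| w[1] - w[0]).collect();
        ListViewLayout { elements: self.elements.clone(), offsets, sizes }
    }
}

impl ListViewLayout {
    fn len(&self) -> usize {
        self.offsets.len()
    }

    fn get(&self, i: usize) -> &[i32] {
        let start = self.offsets[i];
        &self.elements[start..start + self.sizes[i]]
    }

    /// Potentially expensive direction: because views can be out of order or
    /// overlap, copy every view's elements into a new contiguous buffer.
    fn to_list(&self) -> ListLayout {
        let mut elements = Vec::new();
        let mut offsets = Vec::with_capacity(self.len() + 1);
        offsets.push(0);
        for i in 0..self.len() {
            elements.extend_from_slice(self.get(i));
            offsets.push(elements.len());
        }
        ListLayout { elements, offsets }
    }
}

fn main() {
    // Logical value [[1, 2], [], [3, 4, 5]] stored as a List.
    let list = ListLayout { elements: vec![1, 2, 3, 4, 5], offsets: vec![0, 2, 2, 5] };
    let view = list.to_list_view();
    for i in 0..list.len() {
        assert_eq!(list.get(i), view.get(i)); // same logical lists
    }

    // A ListView with out-of-order, overlapping views over a shared buffer:
    // logical value [[3, 4, 5], [1, 2], [2, 3]].
    let shared = ListViewLayout {
        elements: vec![1, 2, 3, 4, 5],
        offsets: vec![2, 0, 1],
        sizes: vec![3, 2, 2],
    };
    let rebuilt = shared.to_list();
    // Rebuilding as a List copies 7 elements out of a 5-element buffer.
    assert_eq!(rebuilt.elements, vec![3, 4, 5, 1, 2, 2, 3]);
    assert_eq!(rebuilt.offsets, vec![0, 3, 5, 7]);
    for i in 0..shared.len() {
        assert_eq!(shared.get(i), rebuilt.get(i));
    }
    println!("ok");
}
```

The same asymmetry also shows up on the read path: a List guarantees that consecutive lists are contiguous and ordered, so a kernel can process the whole element buffer in one pass, while a ListView kernel generally has to consult `offsets` and `sizes` per list unless it can first prove the views are sorted and non-overlapping.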

The issues above suggest that some adjustments are needed to ensure that we don't keep running into problems like these as we add more type logic to Vortex (extension types, new canonical types like Union, etc.). What those adjustments should be is still unclear.
