feat: vgplot Python API #1007

Open
karthik-anand wants to merge 220 commits into uwdata:main from cmudig:pythonAPI_improvements

karthik-anand commented Apr 5, 2026

This PR introduces a full Python API for vgplot, allowing users to build Mosaic visualizations programmatically in Python rather than writing raw JSON/YAML specs by hand. It adds a new packages/vgplot-python package and integrates Python spec output across the docs and examples.

Key additions

  • vgplot Python package, a fluent builder API mirroring the vgplot JavaScript API, with Python-idiomatic snake_case naming (e.g. line_y, x_domain instead of lineY, xDomain)
  • spec_classes module, which provides auto-generated Python classes derived from the Mosaic JSON schema, including typed wrappers for all marks, interactors, layout helpers, and plot options
  • api/ module, offering higher-level helpers for constructing plots (plot.py), managing data sources (data.py), and working with params/selections (params.py)
  • AST-to-Python code generator (ast-to-python.js), a JavaScript-side utility that translates parsed Mosaic spec ASTs into equivalent Python code, used to regenerate the specs/python/ examples
  • Python specs for all examples, with every example in docs/public/specs/python/ and specs/python/ regenerated using the updated codegen
  • Round-trip parity tests (test_full_round_trip.py), which validate that Python-built specs serialize to the same JSON as their source counterparts
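
The round-trip parity idea can be sketched without the package itself. This is a minimal illustration, not the contents of test_full_round_trip.py: the helper names normalize and assert_round_trip and the sample spec are hypothetical, and a plain dict stands in for the result of calling to_dict() on a Python-built spec.

```python
import json


def normalize(spec: dict) -> str:
    """Serialize a spec to canonical JSON so key order does not affect comparison."""
    return json.dumps(spec, sort_keys=True, separators=(",", ":"))


def assert_round_trip(python_built: dict, source_json: str) -> None:
    """Fail if the Python-built spec differs from the source JSON spec."""
    expected = json.loads(source_json)
    assert normalize(python_built) == normalize(expected), "spec mismatch"


# Hypothetical source spec, shaped like a JSON example file might be.
SOURCE_JSON = """
{
  "data": {"athletes": {"type": "parquet", "file": "data/athletes.parquet"}},
  "plot": [{"mark": "dot", "data": {"from": "athletes"}, "x": "weight", "y": "height"}],
  "width": 600
}
"""

# Stand-in for the dict a Python-built spec would return from to_dict().
# Keys are deliberately in a different order than in SOURCE_JSON.
python_built = {
    "width": 600,
    "plot": [{"mark": "dot", "data": {"from": "athletes"}, "x": "weight", "y": "height"}],
    "data": {"athletes": {"type": "parquet", "file": "data/athletes.parquet"}},
}

assert_round_trip(python_built, SOURCE_JSON)
```

Comparing canonical JSON strings rather than raw text keeps the check insensitive to key order and whitespace, which is usually what "same JSON" means in practice.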

How It's Implemented

The package is structured around a two-layer design:

  1. Schema-derived spec classes are generated at build time from the Mosaic JSON schema via generate_spec_classes.py. Each class inherits from SchemaBase and implements to_dict() for JSON serialization.

  2. High-level API helpers in the api/ module provide more convenient interfaces: plot() composes marks and directives into a plot dict, Param and Selection classes handle reactive state, and data() helpers reference named datasets. Everything is assembled into a Spec via vg.spec().

Naming follows a consistent convention: camelCase option names from the JS API become snake_case in Python. The codegen layer handles this mapping automatically when regenerating examples.
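
As an illustration of the naming rule, a camelCase to snake_case mapping can be written in a few lines. The actual mapping lives in the JavaScript codegen (ast-to-python.js); the Python functions below are hypothetical and only cover simple names, so runs of capitals such as acronyms may need special handling.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name: str) -> str:
    """Convert a camelCase option name to snake_case."""
    return _BOUNDARY.sub("_", name).lower()


def to_camel_case(name: str) -> str:
    """Convert a snake_case option name back to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


for js_name in ("lineY", "xDomain", "colorScheme"):
    py_name = to_snake_case(js_name)
    # The mapping should be reversible for simple names like these.
    assert to_camel_case(py_name) == js_name
    print(f"{js_name} -> {py_name}")
```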

How to Use

Install the package:

pip install vgplot

Build a basic scatter plot and display it in Jupyter:

import vgplot as vg
from mosaic_widget import MosaicWidget

_data = vg.data(athletes=vg.parquet("data/athletes.parquet"))

_view = vg.plot(
    vg.dot(data=vg.from_("athletes"), x="weight", y="height", fill="steelblue", opacity=0.5),
    vg.width(600),
    vg.height(400),
)

spec = vg.spec(data=_data, view=_view)
MosaicWidget(spec.to_dict())

Add an interactive selection:

# Continues from the previous example (reuses vg, _data, and MosaicWidget).
brush = vg.Selection.intersect()

_view = vg.plot(
    vg.dot(data=vg.from_("athletes"), x="weight", y="height", fill="sex"),
    {"select": "intervalXY", "as": brush},
    vg.width(600),
)

spec = vg.spec(data=_data, params={"brush": brush}, view=_view)
MosaicWidget(spec.to_dict())

Option names match the vgplot API reference, but in snake_case. For example, xDomain becomes x_domain and colorScheme becomes color_scheme.


8 participants