Merged
44 changes: 44 additions & 0 deletions AGENTS.md
@@ -0,0 +1,44 @@
# AGENTS.md

## Purpose and scope
- This repository is a multi-module Java + C++ codebase for the Pixels columnar engine, deployment daemons, and serverless query acceleration.
- `PIXELS_HOME` is a hard requirement for most workflows (`install.sh`, IntelliJ run configs, runtime scripts).
- Existing AI guidance files were not found; conventions here are inferred from project READMEs and build scripts.

## Big-picture architecture (read these first)
- Core format/runtime: `pixels-core`, shared APIs/types: `pixels-common`, cache service: `pixels-cache`.
- Control plane services run as stateless daemons (`pixels-daemon`): metadata, transaction, cache coordination, index endpoints.
- External API gateway: `pixels-server` (REST on `18891`, RPC on `18892`).
- Query pipeline: SQL parse (`pixels-parser`) -> physical planning (`pixels-planner`) -> operator execution (`pixels-executor`).
- Turbo/serverless path: `pixels-turbo/*` modules (`pixels-invoker-*`, `pixels-worker-*`, `pixels-scaling-*`) integrate Trino with Lambda/vHive/EC2 autoscaling.
- Retina CDC path: `pixels-retina` + `cpp/pixels-retina` replay log-based changes with MVCC; index backends are pluggable under `pixels-index/*`.

## Cross-component contracts and data flow
- RPC/data contracts are centralized in `proto/*.proto`; row batch schema is in `flatbuffers/rowBatch.fbs`.
- Storage backends are split by module (`pixels-storage/pixels-storage-{s3,hdfs,gcs,redis,http,localfs,...}`) and selected by table/storage settings.
- Typical operational flow: Trino connector reads Pixels metadata/files -> optional Turbo pushdown builds sub-plans -> workers execute and write intermediate/output storage.
- Example runtime switch for Turbo lives in Trino catalog config: `cloud.function.switch=off|on|auto` (`pixels-turbo/README.md`).

## Developer workflows (project-specific)
- Full build from repo root (default used by project):
  - `mvn -T 3 clean install`
- Install local runnable layout (`bin/`, `sbin/`, `etc/`) into `PIXELS_HOME`:
  - `./install.sh`
- Start core services after install:
  - `./sbin/start-pixels.sh` (from `PIXELS_HOME`)
- Run CLI tooling (load/compact/stat/eval):
  - `java -jar ./sbin/pixels-cli-*-full.jar`
- C++/DuckDB path is separate (`cpp/README.md`):
  - `cd cpp && make pull && make -j`
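
Because the install and start steps above depend on `PIXELS_HOME`, a small preflight check can catch a missing or wrong value before a long build. The following is a minimal sketch only: `check_pixels_home` is a hypothetical helper that does not exist in the repository, and it assumes nothing about what `install.sh` itself validates.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the Pixels repo).
# Returns 0 if PIXELS_HOME is set and names an existing directory,
# 1 if it is unset or empty, 2 if it does not name a directory.
check_pixels_home() {
    if [ -z "${PIXELS_HOME:-}" ]; then
        echo "PIXELS_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$PIXELS_HOME" ]; then
        echo "PIXELS_HOME is not a directory: $PIXELS_HOME" >&2
        return 2
    fi
    return 0
}
```

One possible use is to guard the install step, for example `check_pixels_home && ./install.sh`, so that the script is not invoked at all when the variable is unset.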

## Test/build caveats to avoid wasted cycles
- Root `pom.xml` sets `maven-surefire-plugin` with `<skipTests>true</skipTests>`; tests do not run unless explicitly enabled.
- Some JUnit tests need access to JDK internals that newer JDKs restrict (README notes JDK 8 for those tests), while integrations like Trino may require newer JDKs.
- Prefer module-scoped iterations when changing one area (for example `mvn -pl pixels-core -am ...`) to avoid rebuilding all modules.

## Conventions agents should follow in this repo
- Keep runtime/config edits aligned with `PIXELS_HOME/etc/pixels.properties` and scripts under `scripts/{bin,sbin,etc}`.
- When documenting/evaluating performance flows, mirror project examples in `docs/TPC-H.md` and `docs/CLICKBENCH.md` (for example `LOAD`, `COMPACT`, `STAT`).
- For storage- or index-related changes, update the matching pluggable module instead of hard-coding backend-specific logic in shared modules.
- For serverless changes, ensure planner/invoker/worker settings remain consistent (input/intermediate/output storage schemes in `pixels-turbo/README.md`).

13 changes: 6 additions & 7 deletions README.md
@@ -2,12 +2,6 @@ Pixels
=======
[![Pixels Daily Build](https://github.com/pixelsdb/pixels/actions/workflows/daily-build.yml/badge.svg)](https://github.com/pixelsdb/pixels/releases/tag/daily-latest)
![GitHub commits](https://img.shields.io/github/commit-activity/m/pixelsdb/pixels/master)
![GitHub Issues](https://img.shields.io/github/issues-closed/pixelsdb/pixels)
![GitHub Pull Requests](https://img.shields.io/github/issues-pr-closed/pixelsdb/pixels)
[![Visitors](https://api.visitorbadge.io/api/combined?path=https%3A%2F%2Fgithub.com%2Fpixelsdb%2Fpixels&label=visitors&countColor=%23ff8a65&style=flat)](https://visitorbadge.io/status?path=https%3A%2F%2Fgithub.com%2Fpixelsdb%2Fpixels)
![GitHub Created At](https://img.shields.io/github/created-at/pixelsdb/pixels)
![GitHub code size](https://img.shields.io/github/languages/code-size/pixelsdb/pixels)
![GitHub repo size](https://img.shields.io/github/repo-size/pixelsdb/pixels)
[![GitHub License](https://img.shields.io/github/license/pixelsdb/pixels)](https://github.com/pixelsdb/pixels/blob/master/LICENSE)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/pixelsdb/pixels)

@@ -27,7 +21,7 @@ The other integrations are opensourced in separate repositories:

Pixels also has its own query engine [Pixels-Turbo](pixels-turbo).
It prioritizes processing queries in an autoscaling MPP cluster (currently based on Trino) and exploits serverless functions
(e.g, [AWS Lambda](https://aws.amazon.com/lambda/) or [vHive / Knative](https://github.com/vhive-serverless/vHive))
(e.g, [AWS Lambda](https://aws.amazon.com/lambda/), [vHive / Knative](https://github.com/vhive-serverless/vHive), and [Spike](https://github.com/pixelsdb/pixels-spike))
to accelerate the processing of workload spikes. With `Pixels-Turbo`, we can achieve better performance and cost-efficiency
for continuous workloads while not compromising elasticity for workload spikes.

@@ -37,6 +31,11 @@ service levels in query urgency. It allows users to select whether to execute th
Pixels-Turbo can apply different resource scheduling and query execution policies for different levels of query urgency, which
will result in different monetary costs on resources.

Furthermore, Pixels has a real-time data synchronization framework, namely [Pixels-Retina](pixels-retina).
It replays data-change operations from log-based CDC sources as mirror transactions on the columnar table data,
using a lightweight MVCC mechanism to support concurrent analytical queries with 10-ms-level data freshness, significantly
outperforming the batch-granular merge-on-read approach used by existing lakehouses such as Apache Iceberg and Paimon.

## Build Pixels

Pixels is mainly implemented in both Java (with some JNI hooks of system calls and C/C++ libs) and C++.
2 changes: 1 addition & 1 deletion cpp/README.md
@@ -202,7 +202,7 @@ localfs.enable.async.io=true

## Common issues

### 1. How to fetch the lastest pixels reader and duckdb?
### 1. How to fetch the latest pixels reader and duckdb?

`pixels reader` and `duckdb` will be updated frequently in the next few months, so please keep the two submodules updated.

File renamed without changes.
1 change: 1 addition & 0 deletions docs/INSTALL.md
@@ -12,6 +12,7 @@ Here, we only install and configure the essential components for query processin
To use the following optional components, follow the instructions in the corresponding README.md after the basic installation:
* [Pixels Cache](../pixels-cache/README.md): The distributed columnar cache to accelerate query processing.
* [Pixels Turbo](../pixels-turbo/README.md): The hybrid query engine that invokes serverless resources to help process unpredictable workload spikes.
* [Pixels Retina](../pixels-retina/README.md): The transactional data synchronization framework that replays the data changes from the CDC (Change-Data-Capture) stream.
* [Pixels Amphi](../pixels-amphi/README.md): The adaptive query scheduler that enables cost-efficient query processing in both on-premises and in-cloud environments.

In AWS EC2, create an Ubuntu 22.04 instance with x86 arch and at least 4GB memory and 20GB root volume.
2 changes: 0 additions & 2 deletions pixels-cli/README.md
@@ -4,8 +4,6 @@ This is the command-line tool for benchmark evaluations (e.g., TPC-H).
We can use it to load data from csv files into Pixels, copy the data, compact the small files,
collect data statistics, and run the benchmark queries.

It was previously named `pixels-load`.

## Usage

[TPC-H Evaluation](../docs/TPC-H.md) provides an example of using `pixels-cli`.
4 changes: 4 additions & 0 deletions pixels-daemon/README.md
@@ -7,3 +7,7 @@ While the `Workers` are deployed together with the query engine workers and stor

Both the `Coordinator` and the `Worker` process are started as **stateless** daemon processes on the server.
Whenever a daemon process crashes or is killed, it only needs a restart to recover.

## Usage

[TPC-H Evaluation](../docs/TPC-H.md) provides an example of using `pixels-daemon`.
2 changes: 1 addition & 1 deletion pixels-retina/README.md
@@ -14,7 +14,7 @@ replay throughput, without compromising query performance or resource cost-effic
significantly outperforming state-of-the-art lakehouses, Iceberg and Paimon, which provide minute-level data freshness
and one order of magnitude lower data-change throughput.

## Retina Components
## Components

The components related to Retina are:

1 change: 1 addition & 0 deletions pixels-storage/README.md
@@ -14,6 +14,7 @@ Queries will load and call the providers to get access to the underlying storage
- `pixels-storage-s3qs` provides a storage based on AWS SQS and S3 for intermediate data shuffle.

## Usage

A storage provider can be used in either of the following ways:
1. Put the compiled jar and its dependencies in the CLASSPATH of your program.
2. If your program is built with Maven, you can also add the storage provider as a dependency.
1 change: 1 addition & 0 deletions pixels-turbo/README.md
@@ -7,6 +7,7 @@ and automatically invokes cloud functions to process unpredictable workload spik
enough resources.

## Components

Currently, `Pixels-Turbo` uses Trino as the query entry-point and the query processor in the MPP cluster,
and implements the serverless query accelerator from scratch.
`Pixels-Turbo` is composed of the following components: