OpenVCL

OpenVCL is a free VCL preprocessor for PlayStation 2 VU programs. It reads VCL-style source, performs register allocation and scheduling, and emits standard VSM/DSM-style output that can be assembled by the PS2 toolchain.

The project was originally written by Jesper Svennevid and Daniel Collin. This repository is currently being modernized around ps2gl compatibility, correct VU scheduling, and measurable VSM cost analysis.

Francisco Javier Trujillo Mata is the current main contributor and maintainer, recovering and extending the project after a long period without active development.

License

OpenVCL is licensed under AFL v2.0. See LICENSE.

Background

OpenVCL has been built from public VCL documentation and VCL source examples. No proprietary binary has been reverse engineered.

VU Command Line is a trademark of Sony Computer Entertainment. VCL is the abbreviated name for VU Command Line.

Build

make openvcl

Install into $PS2DEV/bin when PS2DEV is set:

make install

Or install into another prefix:

PREFIX=/usr/local make install

BSD users may need to use gmake.

Basic Usage

Compile VCL to VSM:

./openvcl input.vcl -o output.vsm

Read from stdin and write to stdout:

./openvcl < input.vcl > output.vsm

Run MASP as the gasp replacement:

./openvcl -g --gasp masp input.vcl -o output.vsm

Show command-line help:

./openvcl -h

Useful options:

option	purpose
`-c`	emit nearly original source as comments
`-C`	disable code reduction
`-d`	emit dumb/unscheduled-style code
`-e`	disable generated `[E]` bits
`-f`	disable generated `.align` directives
`-g`	run `gasp`, or the preprocessor named by `--gasp`, before VCL processing
`-G`	run the C preprocessor before VCL processing
`-I<path>`	include path for gasp/MASP
`-K`	keep preprocessor temporary files
`-L`	globally disable loop code generation
`-m`	generate `.mpg` and DMA tags automatically
`-n`	enable new syntax
`-o <file>`	output filename
`-t <n>`	optimizer timeout
`-u <text>`	unique label-generation string
`--gasp <name>`	run a specific gasp-compatible preprocessor
`--cpp <name>`	run a specific C preprocessor
`--bthres <n>`	dynamic branch visit threshold
`--show-reg-alloc`	print register allocation information
`--cost`	analyze scheduled `.vsm` cost
`--cost-json`	analyze scheduled `.vsm` cost as JSON
`--cost-loop <label>=<n>`	weight a block by expected iterations
`--cost-loop-preset ps2gl`	apply known ps2gl hot-loop weights
`--cost-compare <baseline>`	compare scheduled `.vsm` cost against a baseline
`--cost-compare-json <baseline>`	compare scheduled `.vsm` cost as JSON
`--cost-compare-markdown <baseline>`	compare scheduled `.vsm` cost as a Markdown table
`--cost-compare-list-markdown`	read baseline/candidate VSM pairs and emit one Markdown table
`--cost-compare-list-check <metric>`	fail if any listed candidate is slower than its baseline
`--dump-instruction-info`	print the VU instruction metadata table
`--dump-instruction-info-json`	print the VU instruction metadata table as JSON
`--dump-schedule-info`	print generic ready-scheduler issue slots
`--dump-schedule-info-json`	print generic ready-scheduler issue slots as JSON
`--enable-generic-software-pipelining`	enable safe generic software-pipeline rewrites, currently the default
`--disable-generic-software-pipelining`	disable generic software-pipeline rewrites for comparison/debugging
`--strict-schedule-slots`	emit from the typed scheduler slot model without legacy lookahead pairing

-M, -P, and -Z are accepted for VCL command-line compatibility.

VSM Cost Analysis

OpenVCL can also analyze already scheduled .vsm files. This works for both OpenVCL-generated VSM and SCE/reference VSM files.

Human-readable report:

./openvcl --cost shader.vsm

JSON report:

./openvcl --cost-json shader.vsm

The JSON report includes label_order and cost_by_label so tools can read a shader by its VSM labels instead of reverse-engineering the raw block list:

{
  "label_order": ["init_lid", "xform_loop_lid", "done_lid"],
  "cost_by_label": {
    "xform_loop_lid": {
      "affine_role": "loop",
      "static_cycles": 25,
      "estimated_cycles": 25,
      "weighted_estimated_cycles": 2500
    }
  }
}
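
As a rough illustration of how a tool might consume this report, the Python sketch below walks label_order and looks up each label in cost_by_label. The embedded report is the abbreviated example above rather than real OpenVCL output, and the helper name iter_label_costs is hypothetical. Labels without a cost entry are skipped, because the example only shows one block.

```python
import json

# Abbreviated report copied from the example above; a real report would
# normally come from the output of `./openvcl --cost-json shader.vsm`.
REPORT_TEXT = """
{
  "label_order": ["init_lid", "xform_loop_lid", "done_lid"],
  "cost_by_label": {
    "xform_loop_lid": {
      "affine_role": "loop",
      "static_cycles": 25,
      "estimated_cycles": 25,
      "weighted_estimated_cycles": 2500
    }
  }
}
"""


def iter_label_costs(report):
    """Yield (label, cost_dict) in label_order, skipping labels with no entry."""
    costs = report.get("cost_by_label", {})
    for label in report.get("label_order", []):
        if label in costs:
            yield label, costs[label]


report = json.loads(REPORT_TEXT)
rows = list(iter_label_costs(report))
for label, cost in rows:
    print(label, cost["affine_role"], cost["estimated_cycles"],
          cost["weighted_estimated_cycles"])
```

Iterating in label_order keeps the output in the same order as the labels appear in the VSM, which is usually easier to read than dictionary order.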

Weight hot blocks by expected loop iterations:

./openvcl --cost --cost-loop xform_loop_lid=100 shader.vsm

Apply the ps2gl 100-vertex hot-loop preset:

./openvcl --cost --cost-loop-preset ps2gl shader.vsm

When loop labels are configured, reports also include an affine cost expression:

affine_estimated_cycles: 120 + 25n

The base term is the one-time cost of setup plus teardown. The n term is the cost of the selected loop block(s) per vertex, so n is the number of vertices processed. The static affine cost uses the scheduled instruction cycles as emitted; the estimated affine cost also adds the modeled FDIV/EFU issue stalls and explicit waitq/waitp stalls.
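
To make the arithmetic concrete, the Python sketch below parses an expression of the form shown above and evaluates it for a given n. It assumes the expression always looks like `<base> + <loop>n`, which is inferred from the single example line; the function names parse_affine and affine_cycles are hypothetical.

```python
import re


def parse_affine(expr):
    """Parse a string like '120 + 25n' into (base, per_iteration)."""
    match = re.fullmatch(r"\s*(\d+)\s*\+\s*(\d+)n\s*", expr)
    if match is None:
        raise ValueError(f"unrecognized affine expression: {expr!r}")
    return int(match.group(1)), int(match.group(2))


def affine_cycles(expr, n):
    """Evaluate base + per_iteration * n for the given expression."""
    base, per_iteration = parse_affine(expr)
    return base + per_iteration * n


base, loop = parse_affine("120 + 25n")
total_100 = affine_cycles("120 + 25n", 100)
print(base, loop, total_100)  # 120 25 2620
```

For n = 100 the example expression gives 120 + 25 * 100 = 2620 cycles, of which 2500 come from the loop term.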

The preset recognizes both OpenVCL labels such as xform_loop_lid and SCE optimized main-loop labels such as EXPL_..._xform_loop_lid__MAIN_LOOP. It also maps SCE fast-family adcLoop_done_lid__MAIN_LOOP labels onto xform_loop_lid, so it can be used for side-by-side reference comparisons.
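
The label mapping described above could be approximated as follows. This is only a sketch of the idea, not OpenVCL's actual matching code: the suffix stripping, the trailing-name match, and the names canonical_loop_label, ALIASES, and KNOWN_LOOP_LABELS are all assumptions made for illustration.

```python
MAIN_LOOP_SUFFIX = "__MAIN_LOOP"

# Alias taken from the text above: SCE fast-family loops map onto xform_loop_lid.
ALIASES = {"adcLoop_done_lid": "xform_loop_lid"}

# Only the loop label named in the text above is listed here.
KNOWN_LOOP_LABELS = ("xform_loop_lid",)


def canonical_loop_label(label):
    """Return the OpenVCL-style loop label for `label`, or None if unrecognized."""
    if label.endswith(MAIN_LOOP_SUFFIX):
        label = label[: -len(MAIN_LOOP_SUFFIX)]
    label = ALIASES.get(label, label)
    for known in KNOWN_LOOP_LABELS:
        # SCE labels carry a generated prefix, so match on the trailing name.
        if label == known or label.endswith("_" + known):
            return known
    return None


for name in ("xform_loop_lid", "adcLoop_done_lid__MAIN_LOOP", "init_lid"):
    print(name, "->", canonical_loop_label(name))
```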

Compare a candidate shader against a reference shader:

./openvcl --cost-compare sce_reference.vsm openvcl_candidate.vsm

Emit comparison output for scripts or Markdown reports:

./openvcl --cost-compare-json sce_reference.vsm openvcl_candidate.vsm
./openvcl --cost-compare-markdown sce_reference.vsm openvcl_candidate.vsm

The comparison JSON also includes label_comparisons, keyed by canonical label matching where possible, so SCE optimized loop labels such as EXPL_...__MAIN_LOOP can be compared with their OpenVCL source labels.

Emit one Markdown table for a set of VSM pairs:

./openvcl --cost-compare-list-markdown --cost-loop-preset ps2gl pairs.txt

Each row of pairs.txt holds a baseline.vsm path followed by a candidate.vsm path, separated by whitespace. Blank lines and # comments are ignored.
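
A script that generates or validates such a file might parse it along these lines. The sketch assumes that comments occupy a whole line and that paths contain no spaces; neither detail is stated above. The paths in the example and the name parse_pairs are hypothetical.

```python
def parse_pairs(text):
    """Return a list of (baseline, candidate) path pairs from pairs-file text."""
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or whole-line comment
        fields = stripped.split()
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 paths, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


# Hypothetical file contents, used only to exercise the parser.
EXAMPLE = """\
# baseline            candidate
sce/general.vsm       out/general.vsm

sce/fast.vsm          out/fast.vsm
"""

pairs = parse_pairs(EXAMPLE)
print(pairs)
```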

Fail when any listed candidate is slower than its own baseline:

./openvcl --cost-compare-list-check weighted-estimated --cost-loop-preset ps2gl pairs.txt

Supported check metrics are static, estimated, weighted-static, weighted-estimated, affine-static-base, affine-static-loop, affine-estimated-base, and affine-estimated-loop. This check is per row: an OpenVCL shader only passes when that specific shader is equal to or faster than its matching SCE/reference VSM.
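
The per-row rule is stricter than comparing totals, as the small Python sketch below shows. The cycle counts are hypothetical values chosen only to illustrate the rule, and check_rows is not part of OpenVCL.

```python
def check_rows(rows):
    """rows: list of (name, baseline_cycles, candidate_cycles).

    Return the names whose candidate is slower than its own baseline.
    """
    return [name for name, baseline, candidate in rows if candidate > baseline]


# Hypothetical cycle counts, chosen only to illustrate the rule.
ROWS = [
    ("shader_a", 500, 300),  # faster than baseline: passes
    ("shader_b", 400, 400),  # equal to baseline: passes
    ("shader_c", 200, 260),  # slower than baseline: fails
]

failures = check_rows(ROWS)
aggregate_ok = sum(c for _, _, c in ROWS) <= sum(b for _, b, _ in ROWS)
print(failures, aggregate_ok)
```

Here the candidate total (960) is lower than the baseline total (1100), so an aggregate comparison would pass, yet shader_c still fails the per-row check.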

The report includes:

static scheduled cycles;
estimated cycles including modeled FDIV/EFU producer issue stalls and explicit waitq/waitp stalls;
loop-weighted totals when --cost-loop or --cost-loop-preset is used;
affine base + loop*n static and estimated costs for selected loop labels;
upper/lower slot usage, paired cycles, NOP slots, and nop-only cycles;
per-label block costs;
weighted hot-block, idle-slot, estimated-cost, and wait-stall rankings;
unknown-instruction and slot-mismatch checks.
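
To show what the slot-usage counters could mean, the sketch below tallies them over a list of (upper, lower) mnemonic pairs, one pair per scheduled cycle. The input representation and the exact definitions (a paired cycle has work in both slots, a nop-only cycle has work in neither) are assumptions for illustration and may differ from what OpenVCL reports.

```python
def slot_stats(cycles):
    """cycles: list of (upper, lower) mnemonics, one tuple per scheduled cycle."""
    stats = {"cycles": 0, "upper_used": 0, "lower_used": 0,
             "paired": 0, "nop_slots": 0, "nop_only": 0}
    for upper, lower in cycles:
        upper_busy = upper.lower() != "nop"
        lower_busy = lower.lower() != "nop"
        stats["cycles"] += 1
        stats["upper_used"] += upper_busy
        stats["lower_used"] += lower_busy
        # Both slots do useful work in the same cycle.
        stats["paired"] += upper_busy and lower_busy
        # Count each idle slot individually.
        stats["nop_slots"] += (not upper_busy) + (not lower_busy)
        # Neither slot does useful work.
        stats["nop_only"] += not (upper_busy or lower_busy)
    return stats


# Hypothetical four-cycle block.
CYCLES = [
    ("mulax", "lq"),
    ("madday", "nop"),
    ("nop", "iaddiu"),
    ("nop", "nop"),
]

stats = slot_stats(CYCLES)
print(stats)
```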

Instruction Metadata

OpenVCL exposes its shared VU instruction table for scheduling and tooling work. The text form is useful while inspecting opcodes:

./openvcl --dump-instruction-info

The JSON form is intended for scripts and regression tests:

./openvcl --dump-instruction-info-json

Each row includes the mnemonic, pipe, execution unit, throughput, latency, parser operand pattern, readable parameter summary, short description, implicit resources, memory flags, branch-delay slots, and special bypass notes. This is the canonical table to inspect before adding new parser, cost-analysis, or scheduler rules.

Generated-code helpers should use VuInstructionOpcode/vuInstructionMnemonic instead of spelling raw mnemonics directly in CodeGenerator.cpp. Hand-written software-pipeline paths can then share the same names as the parser, cost analyzer, and metadata dumps.

Scheduler Status

OpenVCL now performs conservative VU scheduling rather than only emitting VCL -d-style output. Current work includes:

bounded upper/lower pairing lookahead;
latency-gap filling with ready independent instructions;
deferred waitq/waitp emission;
Q/P, I, MAC, CLIP, ACC, VF/VI, and per-field VF dependency checks;
safe movement around selected plain loads/stores;
branch padding reuse when an existing pure nop/nop cycle is available;
adjacent upper/direct-branch pairing while preserving branch delay slots;
deterministic alias allocation for reproducible VSM output;
conservative VF lifetime expansion for --LoopCS-marked loop temporaries when register pressure allows, giving the scheduler room to overlap loads;
static cost reporting used to compare OpenVCL output with SCE/reference VSM.
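
The per-field VF dependency check in the list above can be pictured with a small Python sketch. It treats a write and a later read as dependent only when they name the same register and share at least one of the x, y, z, w fields. The function names and the (register, fields) tuples are hypothetical, and the real scheduler tracks many more resources than this.

```python
def field_mask(fields):
    """Convert a field string such as 'xyz' into a 4-bit mask."""
    bits = {"x": 8, "y": 4, "z": 2, "w": 1}
    mask = 0
    for field in fields:
        mask |= bits[field]
    return mask


def raw_hazard(write, read):
    """True when `read` consumes a field that `write` produces.

    Both arguments are (register, fields) tuples, e.g. ("VF01", "xyz").
    """
    write_reg, write_fields = write
    read_reg, read_fields = read
    if write_reg != read_reg:
        return False
    return (field_mask(write_fields) & field_mask(read_fields)) != 0


print(raw_hazard(("VF01", "xy"), ("VF01", "zw")))   # disjoint fields
print(raw_hazard(("VF01", "xyz"), ("VF01", "z")))   # shared field z
print(raw_hazard(("VF01", "xyzw"), ("VF02", "x")))  # different registers
```

Tracking fields rather than whole registers lets two instructions that touch disjoint fields of the same VF register be treated as independent.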

The current refactor is consolidating instruction facts into one canonical VU instruction metadata table. src/VuInstructionInfo.* now feeds parser operand construction and cost-analysis opcode classification. The scheduler should migrate onto the same table so resource and barrier rules are not duplicated across the codebase.

Current ps2gl pure-OpenVCL aggregate cost baseline:

metric	SCE/reference	OpenVCL	delta
static scheduled cycles	6308	5268	-1040
estimated cycles	6820	5838	-982
ps2gl-loop weighted static cycles	100358	334938	+234580
ps2gl-loop weighted estimated cycles	100870	373227	+272357

The estimated cycles row includes modeled FDIV/EFU producer issue stalls and explicit waitq/waitp stalls. These are static VSM estimates, not measured runtime per draw call. The loop-weighted rows apply --cost-loop-preset ps2gl to the 13 matched ps2gl renderer pairs. They expose the remaining hot-loop gap more clearly: the SCE/reference shaders use prolog/main/epilog software-pipelined loops, while OpenVCL currently relies on generic scheduling with limited generic software-pipeline coverage.

This baseline uses corrected ACC dependencies for multiply-add/subtract instructions plus safe generic software-pipeline rewrites where loop analysis can prove the cloned prolog/main/drain structure. The ACC rule is intentionally more conservative than older reports that let the scheduler move madd*/msub* instructions across ACC producer chains.

Conservative branch-delay filling handles independent integer instructions immediately before direct branches. Standalone branches no longer emit an extra pre-branch bubble once normal read-hazard padding is satisfied. Eligible pre-increment stores can also move into branch delay slots by adjusting their offsets against the incremented base register. Dead VI-only fallthrough integer instructions can fill forward conditional branch delay slots when the taken path overwrites the same VI value before reading it. Independent plain stores immediately before loop-counter increments can fill the following branch delay slot when the store does not depend on the updated counter value.

Load results can feed ftoi* conversions with the same narrow bypass behavior used by the SCE-generated ps2gl ADC setup, while normal load-to-VF consumers still keep the conservative load-use padding.

The performance target is per shader, not aggregate. A scheduler change is only complete when every matched ps2gl OpenVCL VSM is equal to or faster than its matching SCE/reference VSM for the selected static and estimated metrics.

Roadmap

The generic scheduler is now a descriptor-backed ready-set scheduler over typed basic blocks. The remaining work is to keep moving code emission and loop optimization decisions onto that explicit schedule model:

keep VuInstructionInfo as the canonical instruction table for parser, cost analysis, resource descriptors, and scheduler tooling;
move remaining default emission-time cycle state into the typed scheduler plan already used by --strict-schedule-slots;
improve exact memory aliasing beyond base register and constant offset;
extend typed branch-delay-slot metadata before attempting broader branch NOP removal or delay-slot filling in the default path;
optimize hot ps2gl loops for estimated block cost and dual-pipe occupancy, guided by --cost-compare and --cost-loop-preset ps2gl;
retire the remaining bounded textual lookahead once the typed scheduler and software-pipeline planner match the legacy output on correctness and cost.

Scheduling changes should preserve Q/P, I, MAC, CLIP, ACC, VF/VI, broadcast field, branch-delay, and memory-ordering correctness. Each new scheduling rule should have a focused unit or integration test, full OpenVCL test coverage, regenerated ps2gl pure-OpenVCL VSMs, and PCSX2 smoke coverage when generated output can affect visible examples.

Expression Solver

OpenVCL extends loi expressions with math functions. Source that uses these extensions is incompatible with standard VCL, but they are useful for standalone OpenVCL projects.

function	result
`abs(x)`	absolute value
`exp(x)`	exponential
`sin(x)`, `cos(x)`, `tan(x)`	trigonometry from radians
`sinh(x)`, `cosh(x)`, `tanh(x)`	hyperbolic trigonometry
`asin(x)`, `acos(x)`, `atan(x)`, `atan2(x, y)`	inverse trigonometry
`pow(x, y)`	`x` raised to `y`
`log(x)`, `log10(x)`	logarithms
`sqrt(x)`	square root
`pi()`	pi

Example:

loi sin(45 * (pi()/1.8e2))

Function names are case-sensitive.
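
As a sanity check on the example, the same expression can be evaluated in Python: 45 * (pi/180) is pi/4 radians, so the result should be close to sqrt(2)/2. The rounding step assumes the immediate ends up as a 32-bit single-precision float; how OpenVCL itself rounds and emits the value is not shown here.

```python
import math
import struct

# Same expression as the loi example, using Python's math module.
value = math.sin(45 * (math.pi / 1.8e2))

# Round to single precision, assuming the immediate is stored as a 32-bit float.
packed = struct.pack("<f", value)
as_float32 = struct.unpack("<f", packed)[0]
bits = struct.unpack("<I", packed)[0]

print(value, as_float32, hex(bits))
```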

Differences From SCE VCL

If an alias contains a register-field declaration and the argument slot does not support fields, OpenVCL rejects it.
I may not be used as an alias for integer registers.
Float expressions are evaluated for loi; GAS does not handle float-expression immediates.
Old-syntax field access by suffixing aliases, for example srcx, is not supported. Prefer new syntax such as src[x].
Selecting a specific simplification branch, for example mula d,s,t, limits simplification to that operand and does not expand back to the full VCL simplification set.

MASP / GASP

GNU gasp has been removed from newer binutils. Use MASP as a compatible replacement by installing masp on your PATH and passing:

./openvcl -g --gasp masp input.vcl -o output.vsm

You may also pass a full path to --gasp.
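
A build script might assemble this invocation as sketched below. The helper names openvcl_command and find_masp are hypothetical; only the -g, --gasp, and -o options documented above are used, and shutil.which simply resolves masp to a full path when it is on PATH.

```python
import shutil


def openvcl_command(source, output, gasp=None, openvcl="./openvcl"):
    """Build an argv list for the documented `-g --gasp <name>` invocation."""
    argv = [openvcl]
    if gasp is not None:
        argv += ["-g", "--gasp", gasp]
    argv += [source, "-o", output]
    return argv


def find_masp():
    """Return the full path to masp if it is on PATH, else None."""
    return shutil.which("masp")


# Fall back to the bare name when masp is not found on PATH.
cmd = openvcl_command("input.vcl", "output.vsm", gasp=find_masp() or "masp")
print(cmd)
```

The resulting list can be passed to subprocess.run(cmd, check=True) to run the tool and fail the build on a non-zero exit status.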

Tests

Build and run the unit/integration suite:

cmake --build test/build --target openvcl_unit_tests -j8
ctest --test-dir test/build --output-on-failure
