OpenVCL is a free VCL preprocessor for PlayStation 2 VU programs. It reads VCL-style source, performs register allocation and scheduling, and emits standard VSM/DSM-style output that can be assembled by the PS2 toolchain.
The project was originally written by Jesper Svennevid and Daniel Collin. This repository is currently being modernized around ps2gl compatibility, correct VU scheduling, and measurable VSM cost analysis.
Francisco Javier Trujillo Mata is the current main contributor and maintainer, recovering and extending the project after a long period without active development.
OpenVCL is licensed under AFL v2.0. See LICENSE.
OpenVCL has been built from public VCL documentation and VCL source examples. No proprietary binary has been reverse engineered.
VU Command Line is a trademark of Sony Computer Entertainment. VCL is the abbreviated name for VU Command Line.
Build:

    make openvcl

Install into `$PS2DEV/bin` when `PS2DEV` is set:

    make install

Or install into another prefix:

    PREFIX=/usr/local make install

BSD users may need to use `gmake`.

Compile VCL to VSM:

    ./openvcl input.vcl -o output.vsm

Read from stdin and write to stdout:

    ./openvcl < input.vcl > output.vsm

Run MASP as the gasp replacement:

    ./openvcl -g --gasp masp input.vcl -o output.vsm

Show command-line help:

    ./openvcl -h

Useful options:

| option | purpose |
|---|---|
| `-c` | emit nearly original source as comments |
| `-C` | disable code reduction |
| `-d` | emit dumb/unscheduled-style code |
| `-e` | disable generated `[E]` bits |
| `-f` | disable generated `.align` directives |
| `-g` | run gasp or `--gasp` before VCL processing |
| `-G` | run the C preprocessor before VCL processing |
| `-I<path>` | include path for gasp/MASP |
| `-K` | keep preprocessor temporary files |
| `-L` | globally disable loop code generation |
| `-m` | generate `.mpg` and DMA tags automatically |
| `-n` | enable new syntax |
| `-o <file>` | output filename |
| `-t <n>` | optimizer timeout |
| `-u <text>` | unique label-generation string |
| `--gasp <name>` | run a specific gasp-compatible preprocessor |
| `--cpp <name>` | run a specific C preprocessor |
| `--bthres <n>` | dynamic branch visit threshold |
| `--show-reg-alloc` | print register allocation information |
| `--cost` | analyze scheduled `.vsm` cost |
| `--cost-json` | analyze scheduled `.vsm` cost as JSON |
| `--cost-loop <label>=<n>` | weight a block by expected iterations |
| `--cost-loop-preset ps2gl` | apply known ps2gl hot-loop weights |
| `--cost-compare <baseline>` | compare scheduled `.vsm` cost against a baseline |
| `--cost-compare-json <baseline>` | compare scheduled `.vsm` cost as JSON |
| `--cost-compare-markdown <baseline>` | compare scheduled `.vsm` cost as a Markdown table |
| `--cost-compare-list-markdown` | read baseline/candidate VSM pairs and emit one Markdown table |
| `--cost-compare-list-check <metric>` | fail if any listed candidate is slower than its baseline |
| `--dump-instruction-info` | print the VU instruction metadata table |
| `--dump-instruction-info-json` | print the VU instruction metadata table as JSON |
| `--dump-schedule-info` | print generic ready-scheduler issue slots |
| `--dump-schedule-info-json` | print generic ready-scheduler issue slots as JSON |
| `--enable-generic-software-pipelining` | enable safe generic software-pipeline rewrites, currently the default |
| `--disable-generic-software-pipelining` | disable generic software-pipeline rewrites for comparison/debugging |
| `--strict-schedule-slots` | emit from the typed scheduler slot model without legacy lookahead pairing |

`-M`, `-P`, and `-Z` are accepted for VCL command-line compatibility.

OpenVCL can also analyze already scheduled .vsm files. This works for both
OpenVCL-generated VSM and SCE/reference VSM files.
Human-readable report:

    ./openvcl --cost shader.vsm

JSON report:

    ./openvcl --cost-json shader.vsm

The JSON report includes `label_order` and `cost_by_label` so tools can read a
shader by its VSM labels instead of reverse-engineering the raw block list:

    {
      "label_order": ["init_lid", "xform_loop_lid", "done_lid"],
      "cost_by_label": {
        "xform_loop_lid": {
          "affine_role": "loop",
          "static_cycles": 25,
          "estimated_cycles": 25,
          "weighted_estimated_cycles": 2500
        }
      }
    }

Weight hot blocks by expected loop iterations:

    ./openvcl --cost --cost-loop xform_loop_lid=100 shader.vsm

Apply the ps2gl 100-vertex hot-loop preset:

    ./openvcl --cost --cost-loop-preset ps2gl shader.vsm

When loop labels are configured, reports also include an affine cost expression:

    affine_estimated_cycles: 120 + 25n

The base term is the one-time cost for setup plus teardown. The n term is
the cost of the selected loop block(s) for each vertex. Static affine cost uses
the scheduled instruction cycles as emitted; estimated affine cost also adds the
modeled FDIV/EFU issue stalls and explicit waitq/waitp stalls.
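
To make the affine expression concrete, here is a minimal Python sketch that evaluates `base + loop*n` for the sample report line `120 + 25n`. The function name `affine_cycles` is hypothetical and not part of OpenVCL; only the two coefficients come from the sample above.

```python
def affine_cycles(base: int, per_iteration: int, n: int) -> int:
    """Evaluate an affine cost expression of the form base + per_iteration * n."""
    if n < 0:
        raise ValueError("iteration count must be non-negative")
    return base + per_iteration * n


# Coefficients taken from the sample report line: 120 + 25n.
for n in (1, 100):
    print(n, affine_cycles(120, 25, n))
```

For 100 vertices this gives 120 + 25 * 100 = 2620 cycles, of which 2500 come from the loop term and 120 from the one-time setup and teardown.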
The preset recognizes both OpenVCL labels such as xform_loop_lid and SCE
optimized main-loop labels such as EXPL_..._xform_loop_lid__MAIN_LOOP. It
also maps SCE fast-family adcLoop_done_lid__MAIN_LOOP labels onto
xform_loop_lid, so it can be used for side-by-side reference comparisons.
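
As an illustration of how a script might consume the `label_order` and `cost_by_label` fields of the JSON report, the following Python sketch ranks labels by a chosen metric. It assumes only the field names shown in the sample report; the helper name `rank_labels` and the inline sample document are hypothetical, and real reports may carry more fields.

```python
import json

# Inline sample mirroring the report shape shown earlier; a real script would
# read the output of `./openvcl --cost-json shader.vsm` instead.
SAMPLE_REPORT = """
{
  "label_order": ["init_lid", "xform_loop_lid", "done_lid"],
  "cost_by_label": {
    "xform_loop_lid": {
      "affine_role": "loop",
      "static_cycles": 25,
      "estimated_cycles": 25,
      "weighted_estimated_cycles": 2500
    }
  }
}
"""


def rank_labels(report: dict, metric: str = "weighted_estimated_cycles"):
    """Return (label, cost) pairs, most expensive first.

    Labels listed in label_order but missing from cost_by_label are skipped,
    because the sample report does not show an entry for every label.
    """
    costs = report.get("cost_by_label", {})
    rows = [
        (label, costs[label][metric])
        for label in report.get("label_order", [])
        if label in costs and metric in costs[label]
    ]
    # sorted() is stable, so ties keep their label_order position.
    return sorted(rows, key=lambda row: row[1], reverse=True)


report = json.loads(SAMPLE_REPORT)
for label, cost in rank_labels(report):
    print(f"{label}: {cost}")
```

Walking `label_order` rather than the dictionary keys keeps the output tied to the order in which labels appear in the VSM.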
Compare a candidate shader against a reference shader:

    ./openvcl --cost-compare sce_reference.vsm openvcl_candidate.vsm

Emit comparison output for scripts or Markdown reports:

    ./openvcl --cost-compare-json sce_reference.vsm openvcl_candidate.vsm
    ./openvcl --cost-compare-markdown sce_reference.vsm openvcl_candidate.vsm

The comparison JSON also includes `label_comparisons`, keyed by canonical label
matching where possible, so SCE optimized loop labels such as
`EXPL_...__MAIN_LOOP` can be compared with their OpenVCL source labels.

Emit one Markdown table for a set of VSM pairs:

    ./openvcl --cost-compare-list-markdown --cost-loop-preset ps2gl pairs.txt

`pairs.txt` contains whitespace-separated `baseline.vsm candidate.vsm` rows.
Blank lines and `#` comments are ignored.
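
The pair-list format is simple enough to generate or validate from a script. The Python sketch below parses it under two assumptions that the description above does not settle: a comment is a line whose first non-blank character is `#`, and paths contain no whitespace. The function name `parse_pairs` and the sample paths are hypothetical.

```python
from typing import List, Tuple


def parse_pairs(text: str) -> List[Tuple[str, str]]:
    """Parse baseline/candidate rows, skipping blank lines and # comment lines."""
    pairs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 paths, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


# Hypothetical file contents; the paths are placeholders, not real ps2gl names.
SAMPLE = """\
# baseline            candidate
sce/general.vsm       openvcl/general.vsm

sce/fast.vsm          openvcl/fast.vsm
"""

for baseline, candidate in parse_pairs(SAMPLE):
    print(baseline, "->", candidate)
```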
Fail when any listed candidate is slower than its own baseline:

    ./openvcl --cost-compare-list-check weighted-estimated --cost-loop-preset ps2gl pairs.txt

Supported check metrics are `static`, `estimated`, `weighted-static`,
`weighted-estimated`, `affine-static-base`, `affine-static-loop`,
`affine-estimated-base`, and `affine-estimated-loop`. This check is per row: an
OpenVCL shader only passes when that specific shader is equal to or faster than
its matching SCE/reference VSM.
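
The per-row rule can be sketched in a few lines of Python. This is not OpenVCL's implementation; the function name `failing_rows`, the row layout, and the cycle counts are all made up for illustration. The sample is chosen so that the candidates are faster in aggregate while one row still fails.

```python
from typing import Dict, List


def failing_rows(rows: List[Dict], metric: str) -> List[str]:
    """Return names of rows whose candidate cost exceeds the baseline cost."""
    return [
        row["name"]
        for row in rows
        if row["candidate"][metric] > row["baseline"][metric]
    ]


# Hypothetical per-shader costs; the numbers are invented for this example.
ROWS = [
    {"name": "shader_a", "baseline": {"estimated": 300}, "candidate": {"estimated": 250}},
    {"name": "shader_b", "baseline": {"estimated": 410}, "candidate": {"estimated": 410}},
    {"name": "shader_c", "baseline": {"estimated": 120}, "candidate": {"estimated": 135}},
]

failed = failing_rows(ROWS, "estimated")
print("FAIL" if failed else "PASS", failed)
```

Here the candidate total (795) beats the baseline total (830), yet the check fails because `shader_c` is slower than its own baseline. An equal cost, as in `shader_b`, passes.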
The report includes:
- static scheduled cycles;
- estimated cycles including modeled FDIV/EFU producer issue stalls and explicit `waitq`/`waitp` stalls;
- loop-weighted totals when `--cost-loop` or `--cost-loop-preset` is used;
- affine `base + loop*n` static and estimated costs for selected loop labels;
- upper/lower slot usage, paired cycles, NOP slots, and nop-only cycles;
- per-label block costs;
- weighted hot-block, idle-slot, estimated-cost, and wait-stall rankings;
- unknown-instruction and slot-mismatch checks.

OpenVCL exposes its shared VU instruction table for scheduling and tooling work. The text form is useful while inspecting opcodes:

    ./openvcl --dump-instruction-info

The JSON form is intended for scripts and regression tests:

    ./openvcl --dump-instruction-info-json

Each row includes the mnemonic, pipe, execution unit, throughput, latency, parser operand pattern, readable parameter summary, short description, implicit resources, memory flags, branch-delay slots, and special bypass notes. This is the canonical table to inspect before adding new parser, cost-analysis, or scheduler rules.
Generated-code helpers should use VuInstructionOpcode/vuInstructionMnemonic
instead of spelling raw mnemonics directly in CodeGenerator.cpp. Hand-written
software-pipeline paths can then share the same names as the parser, cost
analyzer, and metadata dumps.
OpenVCL now performs conservative VU scheduling rather than only emitting
VCL -d-style output. Current work includes:
- bounded upper/lower pairing lookahead;
- latency-gap filling with ready independent instructions;
- deferred `waitq`/`waitp` emission;
- Q/P, I, MAC, CLIP, ACC, VF/VI, and per-field VF dependency checks;
- safe movement around selected plain loads/stores;
- branch padding reuse when an existing pure `nop`/`nop` cycle is available;
- adjacent upper/direct-branch pairing while preserving branch delay slots;
- deterministic alias allocation for reproducible VSM output;
- `--LoopCS`-marked loop temporaries get conservative VF lifetime expansion when register pressure allows, giving the scheduler room to overlap loads;
- static cost reporting used to compare OpenVCL output with SCE/reference VSM.

The current refactor is consolidating instruction facts into one canonical VU
instruction metadata table. src/VuInstructionInfo.* now feeds parser operand
construction and cost-analysis opcode classification. The scheduler should
migrate onto the same table so resource and barrier rules are not duplicated
across the codebase.
Current ps2gl pure-OpenVCL aggregate cost baseline:
| metric | SCE/reference | OpenVCL | delta |
|---|---|---|---|
| static scheduled cycles | 6308 | 5268 | -1040 |
| estimated cycles | 6820 | 5838 | -982 |
| ps2gl-loop weighted static cycles | 100358 | 334938 | +234580 |
| ps2gl-loop weighted estimated cycles | 100870 | 373227 | +272357 |
estimated cycles includes modeled FDIV/EFU producer issue stalls and
explicit waitq/waitp stalls. These are static VSM estimates, not measured
runtime per draw call. The loop-weighted rows apply --cost-loop-preset ps2gl
to the 13 matched ps2gl renderer pairs; they better expose the remaining
hot-loop gap caused by SCE/reference prolog/main/epilog software-pipelined
loops versus OpenVCL's current generic scheduling and limited generic
software-pipeline coverage.
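
To put the hot-loop gap in relative terms, the short Python sketch below recomputes the deltas and derives OpenVCL/reference ratios from the table. The cycle counts are copied from the table above; the helper name `summarize` is hypothetical.

```python
# (SCE/reference, OpenVCL) cycle counts copied from the aggregate table.
TABLE = {
    "static scheduled cycles": (6308, 5268),
    "estimated cycles": (6820, 5838),
    "ps2gl-loop weighted static cycles": (100358, 334938),
    "ps2gl-loop weighted estimated cycles": (100870, 373227),
}


def summarize(reference: int, openvcl: int):
    """Return (delta, ratio), where ratio is OpenVCL cost over reference cost."""
    return openvcl - reference, openvcl / reference


for metric, (ref, ovcl) in TABLE.items():
    delta, ratio = summarize(ref, ovcl)
    print(f"{metric}: delta={delta:+d}, OpenVCL/reference={ratio:.2f}x")
```

Read this way, the unweighted rows favor OpenVCL, while both loop-weighted rows put OpenVCL at more than three times the reference cost.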
This baseline uses corrected ACC dependencies for multiply-add/subtract
instructions plus safe generic software-pipeline rewrites where loop analysis
can prove the cloned prolog/main/drain structure. Conservative branch-delay
filling handles independent integer instructions immediately before direct
branches. Standalone branches no longer emit an extra pre-branch bubble once
normal read-hazard padding is satisfied.
Eligible pre-increment stores can also move into branch delay slots by
adjusting their offsets against the incremented base register. Dead VI-only
fallthrough integer instructions can fill forward conditional branch delay
slots when the taken path overwrites the same VI value before reading it.
Independent plain stores immediately before loop-counter increments can fill
the following branch delay slot when the store does not depend on the updated
counter value. Load results can feed ftoi* conversions with the same narrow
bypass behavior used by the SCE-generated ps2gl ADC setup, while normal
load-to-VF consumers still keep the conservative load-use padding.
The ACC rule is intentionally more conservative than older reports that let the
scheduler move madd*/msub* instructions across ACC producer chains.
The performance target is per shader, not aggregate. A scheduler change is only complete when every matched ps2gl OpenVCL VSM is equal to or faster than its matching SCE/reference VSM for the selected static and estimated metrics.
The generic scheduler is now a descriptor-backed ready-set scheduler over typed basic blocks. The remaining work is to keep moving code emission and loop optimization decisions onto that explicit schedule model:
- keep `VuInstructionInfo` as the canonical instruction table for parser, cost analysis, resource descriptors, and scheduler tooling;
- move remaining default emission-time cycle state into the typed scheduler plan already used by `--strict-schedule-slots`;
- improve exact memory aliasing beyond base register and constant offset;
- extend typed branch-delay-slot metadata before attempting broader branch NOP removal or delay-slot filling in the default path;
- optimize hot ps2gl loops for estimated block cost and dual-pipe occupancy, guided by `--cost-compare` and `--cost-loop-preset ps2gl`;
- retire the remaining bounded textual lookahead once the typed scheduler and software-pipeline planner match the legacy output on correctness and cost.

Scheduling changes should preserve Q/P, I, MAC, CLIP, ACC, VF/VI, broadcast field, branch-delay, and memory-ordering correctness. Each new scheduling rule should have a focused unit or integration test, full OpenVCL test coverage, regenerated ps2gl pure-OpenVCL VSMs, and PCSX2 smoke coverage when generated output can affect visible examples.
OpenVCL extends loi expressions with math functions. These extensions make
source incompatible with standard VCL when used, but are useful for standalone
OpenVCL projects.
| function | result |
|---|---|
| `abs(x)` | absolute value |
| `exp(x)` | exponential |
| `sin(x)`, `cos(x)`, `tan(x)` | trigonometry from radians |
| `sinh(x)`, `cosh(x)`, `tanh(x)` | hyperbolic trigonometry |
| `asin(x)`, `acos(x)`, `atan(x)`, `atan2(x, y)` | inverse trigonometry |
| `pow(x, y)` | x raised to y |
| `log(x)`, `log10(x)` | logarithms |
| `sqrt(x)` | square root |
| `pi()` | pi |

Example:

    loi sin(45 * (pi()/1.8e2))

Function names are case-sensitive.
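
For reference, the `loi` example expression can be checked outside OpenVCL. The Python sketch below evaluates the same arithmetic in double precision and shows the single-precision bit pattern of the result. It does not reproduce OpenVCL's own evaluator, whose internal precision and rounding are not described here; the helper name `float32_bits` is hypothetical.

```python
import math
import struct


def float32_bits(value: float) -> str:
    """Return the IEEE 754 single-precision bit pattern of value as hex."""
    return struct.pack(">f", value).hex()


# Same arithmetic as the loi example: 45 degrees converted to radians.
value = math.sin(45 * (math.pi / 1.8e2))
print(value, float32_bits(value))
```

The result is the sine of 45 degrees, roughly 0.7071, which is the value the immediate should carry.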

- If an alias contains a register-field declaration and the argument slot does not support fields, OpenVCL rejects it.
- `I` may not be used as an alias for integer registers.
- Float expressions are evaluated for `loi`; GAS does not handle float-expression immediates.
- Old-syntax field access by suffixing aliases, for example `srcx`, is not supported. Prefer new syntax such as `src[x]`.
- Selecting a specific simplification branch, for example `mula d,s,t`, limits simplification to that operand and does not expand back to the full VCL simplification set.

GNU gasp has been removed from newer binutils. Use MASP as a compatible
replacement by installing `masp` on your `PATH` and passing:

    ./openvcl -g --gasp masp input.vcl -o output.vsm

You may also pass a full path to `--gasp`.
Build and run the unit/integration suite:

    cmake --build test/build --target openvcl_unit_tests -j8
    ctest --test-dir test/build --output-on-failure