ps2dev/openvcl

 
 


OpenVCL

OpenVCL is a free VCL preprocessor for PlayStation 2 VU programs. It reads VCL-style source, performs register allocation and scheduling, and emits standard VSM/DSM-style output that can be assembled by the PS2 toolchain.

The project was originally written by Jesper Svennevid and Daniel Collin. This repository is currently being modernized around ps2gl compatibility, correct VU scheduling, and measurable VSM cost analysis.

Francisco Javier Trujillo Mata is the current main contributor and maintainer, recovering and extending the project after a long period without active development.

License

OpenVCL is licensed under AFL v2.0. See LICENSE.

Background

OpenVCL has been built from public VCL documentation and VCL source examples. No proprietary binary has been reverse engineered.

VU Command Line is a trademark of Sony Computer Entertainment. VCL is the abbreviated name for VU Command Line.

Build

make openvcl

Install into $PS2DEV/bin when PS2DEV is set:

make install

Or install into another prefix:

PREFIX=/usr/local make install

BSD users may need to use gmake.

Basic Usage

Compile VCL to VSM:

./openvcl input.vcl -o output.vsm

Read from stdin and write to stdout:

./openvcl < input.vcl > output.vsm

Run MASP as the gasp replacement:

./openvcl -g --gasp masp input.vcl -o output.vsm

Show command-line help:

./openvcl -h

Useful options:

| option | purpose |
| --- | --- |
| -c | emit nearly original source as comments |
| -C | disable code reduction |
| -d | emit dumb/unscheduled-style code |
| -e | disable generated [E] bits |
| -f | disable generated .align directives |
| -g | run gasp or --gasp before VCL processing |
| -G | run the C preprocessor before VCL processing |
| -I<path> | include path for gasp/MASP |
| -K | keep preprocessor temporary files |
| -L | globally disable loop code generation |
| -m | generate .mpg and DMA tags automatically |
| -n | enable new syntax |
| -o <file> | output filename |
| -t <n> | optimizer timeout |
| -u <text> | unique label-generation string |
| --gasp <name> | run a specific gasp-compatible preprocessor |
| --cpp <name> | run a specific C preprocessor |
| --bthres <n> | dynamic branch visit threshold |
| --show-reg-alloc | print register allocation information |
| --cost | analyze scheduled .vsm cost |
| --cost-json | analyze scheduled .vsm cost as JSON |
| --cost-loop <label>=<n> | weight a block by expected iterations |
| --cost-loop-preset ps2gl | apply known ps2gl hot-loop weights |
| --cost-compare <baseline> | compare scheduled .vsm cost against a baseline |
| --cost-compare-json <baseline> | compare scheduled .vsm cost as JSON |
| --cost-compare-markdown <baseline> | compare scheduled .vsm cost as a Markdown table |
| --cost-compare-list-markdown | read baseline/candidate VSM pairs and emit one Markdown table |
| --cost-compare-list-check <metric> | fail if any listed candidate is slower than its baseline |
| --dump-instruction-info | print the VU instruction metadata table |
| --dump-instruction-info-json | print the VU instruction metadata table as JSON |
| --dump-schedule-info | print generic ready-scheduler issue slots |
| --dump-schedule-info-json | print generic ready-scheduler issue slots as JSON |
| --enable-generic-software-pipelining | enable safe generic software-pipeline rewrites, currently the default |
| --disable-generic-software-pipelining | disable generic software-pipeline rewrites for comparison/debugging |
| --strict-schedule-slots | emit from the typed scheduler slot model without legacy lookahead pairing |

-M, -P, and -Z are accepted for VCL command-line compatibility.

VSM Cost Analysis

OpenVCL can also analyze already scheduled .vsm files. This works for both OpenVCL-generated VSM and SCE/reference VSM files.

Human-readable report:

./openvcl --cost shader.vsm

JSON report:

./openvcl --cost-json shader.vsm

The JSON report includes label_order and cost_by_label so tools can read a shader by its VSM labels instead of reverse-engineering the raw block list:

{
  "label_order": ["init_lid", "xform_loop_lid", "done_lid"],
  "cost_by_label": {
    "xform_loop_lid": {
      "affine_role": "loop",
      "static_cycles": 25,
      "estimated_cycles": 25,
      "weighted_estimated_cycles": 2500
    }
  }
}
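As a rough illustration of how a tool might consume this report, the sketch below walks label_order and looks each label up in cost_by_label. It relies only on the fields shown in the excerpt above; the summarize helper is a hypothetical name, and a real report may contain more fields and more labels than this sample.

```python
import json

# Sample report using only the fields shown above; a real report would
# come from `./openvcl --cost-json shader.vsm`.
SAMPLE = """
{
  "label_order": ["init_lid", "xform_loop_lid", "done_lid"],
  "cost_by_label": {
    "xform_loop_lid": {
      "affine_role": "loop",
      "static_cycles": 25,
      "estimated_cycles": 25,
      "weighted_estimated_cycles": 2500
    }
  }
}
"""


def summarize(report):
    """Return (label, estimated, weighted) rows in VSM label order."""
    rows = []
    costs = report.get("cost_by_label", {})
    for label in report.get("label_order", []):
        entry = costs.get(label)
        if entry is None:
            continue  # label listed, but no cost entry in this excerpt
        rows.append((label,
                     entry["estimated_cycles"],
                     entry["weighted_estimated_cycles"]))
    return rows


rows = summarize(json.loads(SAMPLE))
for label, estimated, weighted in rows:
    print(f"{label}: estimated={estimated} weighted={weighted}")
```

Iterating over label_order rather than the dictionary keeps the output in the same order as the labels appear in the VSM file.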

Weight hot blocks by expected loop iterations:

./openvcl --cost --cost-loop xform_loop_lid=100 shader.vsm

Apply the ps2gl 100-vertex hot-loop preset:

./openvcl --cost --cost-loop-preset ps2gl shader.vsm

When loop labels are configured, reports also include an affine cost expression:

affine_estimated_cycles: 120 + 25n

The base term is the one-time cost for setup plus teardown. The n term is the cost of the selected loop block(s) for each vertex. Static affine cost uses the scheduled instruction cycles as emitted; estimated affine cost also adds the modeled FDIV/EFU issue stalls and explicit waitq/waitp stalls.
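To make the arithmetic concrete, the following sketch parses a line in the form shown above and evaluates it for a given vertex count. The parse_affine and affine_cycles helpers are hypothetical, and the regular expression assumes the report line looks exactly like the example; the real report may format it differently.

```python
import re


def parse_affine(line):
    """Parse 'name: BASE + LOOPn' into (name, base, loop)."""
    match = re.fullmatch(r"\s*(\w+):\s*(\d+)\s*\+\s*(\d+)n\s*", line)
    if match is None:
        raise ValueError(f"unrecognized affine line: {line!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def affine_cycles(base, loop, n):
    """One-time base cost plus the per-vertex loop cost times n."""
    return base + loop * n


name, base, loop = parse_affine("affine_estimated_cycles: 120 + 25n")
total = affine_cycles(base, loop, 100)
print(name, total)  # 120 + 25 * 100 = 2620
```

With n = 100 the loop term (2500 cycles) dominates the base term (120 cycles), which is why the loop-weighted metrics are the ones that expose hot-loop differences.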

The preset recognizes both OpenVCL labels such as xform_loop_lid and SCE optimized main-loop labels such as EXPL_..._xform_loop_lid__MAIN_LOOP. It also maps SCE fast-family adcLoop_done_lid__MAIN_LOOP labels onto xform_loop_lid, so it can be used for side-by-side reference comparisons.
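One way to picture this label mapping is the sketch below. It is not OpenVCL's implementation: the suffix handling, the alias table, and the canonical_label helper are assumptions made for illustration, and the EXPL_demo_ prefix is a made-up stand-in for the elided part of a real SCE label.

```python
MAIN_LOOP_SUFFIX = "__MAIN_LOOP"

# Alias taken from the text above: the SCE fast-family loop label is
# treated as the OpenVCL xform loop.
ALIASES = {"adcLoop_done_lid": "xform_loop_lid"}


def canonical_label(label, known=("xform_loop_lid",)):
    """Map an SCE-style label to a known OpenVCL label where possible."""
    core = label
    if core.endswith(MAIN_LOOP_SUFFIX):
        core = core[:-len(MAIN_LOOP_SUFFIX)]
    if core in ALIASES:
        return ALIASES[core]
    for name in known:
        if core == name or core.endswith("_" + name):
            return name
    return core  # unknown labels pass through unchanged


for raw in ("xform_loop_lid",
            "EXPL_demo_xform_loop_lid__MAIN_LOOP",  # hypothetical prefix
            "adcLoop_done_lid__MAIN_LOOP"):
    print(raw, "->", canonical_label(raw))
```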

Compare a candidate shader against a reference shader:

./openvcl --cost-compare sce_reference.vsm openvcl_candidate.vsm

Emit comparison output for scripts or Markdown reports:

./openvcl --cost-compare-json sce_reference.vsm openvcl_candidate.vsm
./openvcl --cost-compare-markdown sce_reference.vsm openvcl_candidate.vsm

The comparison JSON also includes label_comparisons, keyed by canonical label matching where possible, so SCE optimized loop labels such as EXPL_...__MAIN_LOOP can be compared with their OpenVCL source labels.

Emit one Markdown table for a set of VSM pairs:

./openvcl --cost-compare-list-markdown --cost-loop-preset ps2gl pairs.txt

pairs.txt contains whitespace-separated baseline.vsm candidate.vsm rows. Blank lines and # comments are ignored.
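For scripts that generate or validate such a list, a minimal parser following the rules above might look like this. The file names are placeholders, parse_pairs is a hypothetical helper, and the sketch only treats whole-line # comments as comments because the text does not say whether trailing comments are accepted.

```python
def parse_pairs(text):
    """Return (baseline, candidate) tuples from pairs-file text."""
    pairs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 fields, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


# Placeholder paths, not files from the repository.
EXAMPLE = """\
# baseline            candidate
ref/general.vsm       out/general.vsm

ref/fast.vsm          out/fast.vsm
"""

pairs = parse_pairs(EXAMPLE)
for baseline, candidate in pairs:
    print(baseline, candidate)
```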

Fail when any listed candidate is slower than its own baseline:

./openvcl --cost-compare-list-check weighted-estimated --cost-loop-preset ps2gl pairs.txt

Supported check metrics are static, estimated, weighted-static, weighted-estimated, affine-static-base, affine-static-loop, affine-estimated-base, and affine-estimated-loop. This check is per row: an OpenVCL shader only passes when that specific shader is equal to or faster than its matching SCE/reference VSM.
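A CI wrapper could validate the metric name before invoking the tool. The sketch below only builds the argument list from the options documented above; build_check_command is a hypothetical helper, and the ./openvcl default path is an assumption about where the binary lives.

```python
SUPPORTED_METRICS = (
    "static", "estimated", "weighted-static", "weighted-estimated",
    "affine-static-base", "affine-static-loop",
    "affine-estimated-base", "affine-estimated-loop",
)


def build_check_command(metric, pairs_path, preset=None, openvcl="./openvcl"):
    """Build the argv list for a --cost-compare-list-check run."""
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported check metric: {metric!r}")
    cmd = [openvcl, "--cost-compare-list-check", metric]
    if preset is not None:
        cmd += ["--cost-loop-preset", preset]
    cmd.append(pairs_path)
    return cmd


cmd = build_check_command("weighted-estimated", "pairs.txt", preset="ps2gl")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run. Treating a non-zero exit status as "at least one candidate is slower" is an assumption based on the word "fail" above, so confirm it against the tool before relying on it in CI.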

The report includes:

  • static scheduled cycles;
  • estimated cycles including modeled FDIV/EFU producer issue stalls and explicit waitq/waitp stalls;
  • loop-weighted totals when --cost-loop or --cost-loop-preset is used;
  • affine base + loop*n static and estimated costs for selected loop labels;
  • upper/lower slot usage, paired cycles, NOP slots, and nop-only cycles;
  • per-label block costs;
  • weighted hot-block, idle-slot, estimated-cost, and wait-stall rankings;
  • unknown-instruction and slot-mismatch checks.
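The slot-usage figures can be illustrated with a toy counter. The definitions below (a paired cycle has a non-nop instruction in both slots, a nop-only cycle has none) are assumptions about what the report measures, slot_usage is a hypothetical helper, and the cycle list is a made-up fragment rather than real ps2gl output.

```python
def slot_usage(cycles):
    """Count slot statistics for a list of (upper, lower) mnemonic pairs."""
    stats = {
        "cycles": len(cycles),
        "upper_used": 0,
        "lower_used": 0,
        "paired_cycles": 0,
        "nop_slots": 0,
        "nop_only_cycles": 0,
    }
    for upper, lower in cycles:
        upper_busy = upper.lower() != "nop"
        lower_busy = lower.lower() != "nop"
        stats["upper_used"] += upper_busy
        stats["lower_used"] += lower_busy
        stats["paired_cycles"] += upper_busy and lower_busy
        stats["nop_slots"] += (not upper_busy) + (not lower_busy)
        stats["nop_only_cycles"] += not (upper_busy or lower_busy)
    return stats


# Made-up fragment: one (upper, lower) mnemonic pair per cycle.
CYCLES = [
    ("nop", "lq"),
    ("mulax", "lq"),
    ("madday", "nop"),
    ("nop", "nop"),
    ("maddz", "sq"),
]

stats = slot_usage(CYCLES)
print(stats)
```

In this fragment, 4 of the 10 issue slots are NOPs and one whole cycle is nop-only. Wasted slots of this kind are what the idle-slot ranking is meant to surface.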

Instruction Metadata

OpenVCL exposes its shared VU instruction table for scheduling and tooling work. The text form is useful while inspecting opcodes:

./openvcl --dump-instruction-info

The JSON form is intended for scripts and regression tests:

./openvcl --dump-instruction-info-json

Each row includes the mnemonic, pipe, execution unit, throughput, latency, parser operand pattern, readable parameter summary, short description, implicit resources, memory flags, branch-delay slots, and special bypass notes. This is the canonical table to inspect before adding new parser, cost-analysis, or scheduler rules.

Generated-code helpers should use VuInstructionOpcode/vuInstructionMnemonic instead of spelling raw mnemonics directly in CodeGenerator.cpp. Hand-written software-pipeline paths can then share the same names as the parser, cost analyzer, and metadata dumps.

Scheduler Status

OpenVCL now performs conservative VU scheduling rather than only emitting VCL -d-style output. Current work includes:

  • bounded upper/lower pairing lookahead;
  • latency-gap filling with ready independent instructions;
  • deferred waitq/waitp emission;
  • Q/P, I, MAC, CLIP, ACC, VF/VI, and per-field VF dependency checks;
  • safe movement around selected plain loads/stores;
  • branch padding reuse when an existing pure nop/nop cycle is available;
  • adjacent upper/direct-branch pairing while preserving branch delay slots;
  • deterministic alias allocation for reproducible VSM output;
  • conservative VF lifetime expansion for --LoopCS-marked loop temporaries when register pressure allows, giving the scheduler room to overlap loads;
  • static cost reporting used to compare OpenVCL output with SCE/reference VSM.

The current refactor is consolidating instruction facts into one canonical VU instruction metadata table. src/VuInstructionInfo.* now feeds parser operand construction and cost-analysis opcode classification. The scheduler should migrate onto the same table so resource and barrier rules are not duplicated across the codebase.

Current ps2gl pure-OpenVCL aggregate cost baseline:

| metric | SCE/reference | OpenVCL | delta |
| --- | --- | --- | --- |
| static scheduled cycles | 6308 | 5268 | -1040 |
| estimated cycles | 6820 | 5838 | -982 |
| ps2gl-loop weighted static cycles | 100358 | 334938 | +234580 |
| ps2gl-loop weighted estimated cycles | 100870 | 373227 | +272357 |

The estimated cycles row includes modeled FDIV/EFU producer issue stalls and explicit waitq/waitp stalls. These are static VSM estimates, not measured runtime per draw call. The loop-weighted rows apply --cost-loop-preset ps2gl to the 13 matched ps2gl renderer pairs. They better expose the remaining hot-loop gap, which comes from the SCE/reference prolog/main/epilog software-pipelined loops versus OpenVCL's current generic scheduling and limited generic software-pipeline coverage.
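The deltas in the baseline can be rechecked with a few lines of arithmetic. The sketch below uses only the numbers from the table; the dictionary layout and variable names are illustrative.

```python
# (SCE/reference, OpenVCL) cycle counts copied from the baseline table.
BASELINE = {
    "static scheduled cycles": (6308, 5268),
    "estimated cycles": (6820, 5838),
    "ps2gl-loop weighted static cycles": (100358, 334938),
    "ps2gl-loop weighted estimated cycles": (100870, 373227),
}

deltas = {}
for metric, (reference, openvcl) in BASELINE.items():
    deltas[metric] = openvcl - reference
    ratio = openvcl / reference
    print(f"{metric}: delta={deltas[metric]:+d} ratio={ratio:.2f}")
```

The unweighted rows favor OpenVCL, while the loop-weighted rows come out at roughly 3.3x and 3.7x the reference cost. That contrast is the hot-loop gap described above.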

This baseline uses corrected ACC dependencies for multiply-add/subtract instructions plus safe generic software-pipeline rewrites where loop analysis can prove the cloned prolog/main/drain structure. It also reflects the following scheduling behavior:

  • conservative branch-delay filling handles independent integer instructions immediately before direct branches;
  • standalone branches no longer emit an extra pre-branch bubble once normal read-hazard padding is satisfied;
  • eligible pre-increment stores can move into branch delay slots by adjusting their offsets against the incremented base register;
  • dead VI-only fallthrough integer instructions can fill forward conditional branch delay slots when the taken path overwrites the same VI value before reading it;
  • independent plain stores immediately before loop-counter increments can fill the following branch delay slot when the store does not depend on the updated counter value;
  • load results can feed ftoi* conversions with the same narrow bypass behavior used by the SCE-generated ps2gl ADC setup, while normal load-to-VF consumers still keep the conservative load-use padding.

The ACC rule is intentionally more conservative than in older reports, where the scheduler was allowed to move madd*/msub* instructions across ACC producer chains.

The performance target is per shader, not aggregate. A scheduler change is only complete when every matched ps2gl OpenVCL VSM is equal to or faster than its matching SCE/reference VSM for the selected static and estimated metrics.
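The difference between an aggregate target and a per-shader target is easy to show with made-up numbers. In the sketch below, the shader names, cycle counts, and the failing_shaders helper are all hypothetical and do not come from the ps2gl set.

```python
def failing_shaders(pairs, metrics=("static", "estimated")):
    """List (shader, metric, reference, candidate) rows that regress."""
    failures = []
    for name, reference, candidate in pairs:
        for metric in metrics:
            if candidate[metric] > reference[metric]:
                failures.append(
                    (name, metric, reference[metric], candidate[metric]))
    return failures


# Hypothetical cycle counts: (shader, reference costs, candidate costs).
PAIRS = [
    ("shader_a", {"static": 100, "estimated": 110},
                 {"static": 80, "estimated": 90}),
    ("shader_b", {"static": 50, "estimated": 55},
                 {"static": 52, "estimated": 54}),
]

aggregate_reference = sum(ref["static"] for _, ref, _ in PAIRS)
aggregate_candidate = sum(cand["static"] for _, _, cand in PAIRS)
failures = failing_shaders(PAIRS)

print("aggregate static:", aggregate_reference, "->", aggregate_candidate)
print("failures:", failures)
```

Here the aggregate improves from 150 to 132 static cycles, yet shader_b regresses from 50 to 52. Under the per-shader target the change would not count as complete.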

Roadmap

The generic scheduler is now a descriptor-backed ready-set scheduler over typed basic blocks. The remaining work is to keep moving code emission and loop optimization decisions onto that explicit schedule model:

  • keep VuInstructionInfo as the canonical instruction table for parser, cost analysis, resource descriptors, and scheduler tooling;
  • move remaining default emission-time cycle state into the typed scheduler plan already used by --strict-schedule-slots;
  • improve exact memory aliasing beyond base register and constant offset;
  • extend typed branch-delay-slot metadata before attempting broader branch NOP removal or delay-slot filling in the default path;
  • optimize hot ps2gl loops for estimated block cost and dual-pipe occupancy, guided by --cost-compare and --cost-loop-preset ps2gl;
  • retire the remaining bounded textual lookahead once the typed scheduler and software-pipeline planner match the legacy output on correctness and cost.
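For the memory-aliasing item, the current level of precision can be pictured as a check on base register and constant offset only. The sketch below is an illustration of that idea, not the scheduler's code. MemRef and may_alias are hypothetical names, and it assumes each access covers exactly one quadword and that the base register is not modified between the two accesses.

```python
from typing import NamedTuple


class MemRef(NamedTuple):
    base: str    # integer register holding the address, e.g. "vi01"
    offset: int  # constant offset in quadwords


def may_alias(a, b):
    """Conservative test: True unless the accesses are provably disjoint."""
    if a.base == b.base:
        # Same unmodified base: only identical offsets touch the same qword.
        return a.offset == b.offset
    # Different bases: nothing is known about their values, so assume overlap.
    return True


print(may_alias(MemRef("vi01", 0), MemRef("vi01", 1)))  # same base, different offset
print(may_alias(MemRef("vi01", 2), MemRef("vi01", 2)))  # same address
print(may_alias(MemRef("vi01", 0), MemRef("vi02", 4)))  # unrelated bases
```

Anything more exact, such as tracking known relationships between two base registers, would fall under the roadmap item rather than this baseline check.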

Scheduling changes should preserve Q/P, I, MAC, CLIP, ACC, VF/VI, broadcast field, branch-delay, and memory-ordering correctness. Each new scheduling rule should have a focused unit or integration test, full OpenVCL test coverage, regenerated ps2gl pure-OpenVCL VSMs, and PCSX2 smoke coverage when generated output can affect visible examples.

Expression Solver

OpenVCL extends loi expressions with math functions. Source that uses these extensions is incompatible with standard VCL, but they are useful for standalone OpenVCL projects.

| function | result |
| --- | --- |
| abs(x) | absolute value |
| exp(x) | exponential |
| sin(x), cos(x), tan(x) | trigonometry from radians |
| sinh(x), cosh(x), tanh(x) | hyperbolic trigonometry |
| asin(x), acos(x), atan(x), atan2(x, y) | inverse trigonometry |
| pow(x, y) | x raised to y |
| log(x), log10(x) | logarithms |
| sqrt(x) | square root |
| pi() | pi |

Example:

loi sin(45 * (pi()/1.8e2))

Function names are case-sensitive.
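To see what the example expression should evaluate to, the same arithmetic can be reproduced in Python. This assumes the solver follows ordinary radian semantics and that the result ends up as a 32-bit float immediate. The exact bit pattern OpenVCL emits is not confirmed here.

```python
import math
import struct

# Same expression as the loi example: 45 degrees converted to radians.
value = math.sin(45 * (math.pi / 1.8e2))

# Big-endian single-precision encoding of the result, as hex digits.
bits = struct.pack(">f", value).hex()

print(round(value, 6), bits)
```

The result is sin(45°), about 0.707107, which rounds to the single-precision pattern 3f3504f3.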

Differences From SCE VCL

  • If an alias contains a register-field declaration and the argument slot does not support fields, OpenVCL rejects it.
  • I may not be used as an alias for integer registers.
  • Float expressions are evaluated for loi; GAS does not handle float-expression immediates.
  • Old-syntax field access by suffixing aliases, for example srcx, is not supported. Prefer new syntax such as src[x].
  • Selecting a specific simplification branch, for example mula d,s,t, limits simplification to that operand and does not expand back to the full VCL simplification set.

MASP / GASP

GNU gasp has been removed from newer binutils. Use MASP as a compatible replacement by installing masp on your PATH and passing:

./openvcl -g --gasp masp input.vcl -o output.vsm

You may also pass a full path to --gasp.

Tests

Build and run the unit/integration suite:

cmake --build test/build --target openvcl_unit_tests -j8
ctest --test-dir test/build --output-on-failure
