JNasm Assembler Design

opencode-agent[bot] edited this page May 10, 2026 · 1 revision

JNasm is JNode's custom Java-based x86 assembler. It converts assembly source into native machine code for the boot image.

Overview

JNasm is implemented in Java and lives in builder/src/builder/org/jnode/jnasm/. Because it replaces external assemblers such as NASM or MASM, the entire JNode build stays self-contained. JNasm takes .asm source files and produces native code that becomes part of the boot image. BootImageBuilder invokes it during the build to compile the kernel assembly files.

Key Components

| Class | File | Purpose |
| --- | --- | --- |
| JNAsm | JNAsm.java | Main entry point; orchestrates preprocessor and assembler |
| Preprocessor | preprocessor/Preprocessor.java | Macro processing, include files, symbol substitution |
| Assembler | assembler/Assembler.java | Two-pass assembly engine; label/constant resolution |
| Instruction | assembler/Instruction.java | Represents a single instruction with operands |
| X86Core | assembler/x86/X86Core.java | Core x86 instruction encoding (emitADD, emitMOV, etc.) |
| X86Support | assembler/x86/X86Support.java | Hardware support abstraction and addressing modes |
| PseudoInstructions | assembler/PseudoInstructions.java | Handles pseudo-ops like DB, DW, TIMES |

How It Works

Preprocessing

The preprocessor handles:

  • Single-line macros: %define NAME value substitutions
  • Multi-line macros: .macro / .endm blocks
  • Includes: .include "file.asm" directive
  • Symbol definitions: passed in from the build (e.g., BITS32, JNODE_VERSION)

; Example source with macros
%define KERNEL_BASE 0x100000
mov eax, KERNEL_BASE
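
The single-line substitution step can be sketched in a few lines of Java. This is a minimal illustration, not JNasm's Preprocessor: the class name DefineSketch and the method expand are hypothetical, only %define is handled, and the predefined map stands in for symbols supplied by the build.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not JNasm's Preprocessor class.
final class DefineSketch {
    private static final Pattern DEFINE =
            Pattern.compile("^\\s*%define\\s+(\\w+)\\s+(.*)$");
    private static final Pattern WORD = Pattern.compile("\\w+");

    /** Expands %define macros; 'predefined' plays the role of build-supplied symbols. */
    static String expand(String source, Map<String, String> predefined) {
        Map<String, String> macros = new LinkedHashMap<>(predefined);
        StringBuilder out = new StringBuilder();
        for (String line : source.split("\n")) {
            Matcher def = DEFINE.matcher(line);
            if (def.matches()) {
                // Record the macro; the directive itself produces no output.
                macros.put(def.group(1), def.group(2).trim());
                continue;
            }
            // Replace every word that names a macro with its value.
            Matcher word = WORD.matcher(line);
            StringBuffer expanded = new StringBuffer();
            while (word.find()) {
                String value = macros.getOrDefault(word.group(), word.group());
                word.appendReplacement(expanded, Matcher.quoteReplacement(value));
            }
            word.appendTail(expanded);
            out.append(expanded).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "%define KERNEL_BASE 0x100000\nmov eax, KERNEL_BASE\n";
        System.out.print(expand(src, Collections.<String, String>emptyMap()));
    }
}
```

A real preprocessor would also have to skip comments and string literals and handle multi-line macros and includes; the sketch ignores all of that.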

Two-Pass Assembly

JNasm uses a classic two-pass approach:

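  1. Pass 1: Parse all instructions and collect labels and constants, so that the address of every label (including forward references) is known by the end of the pass
  2. Pass 2: Re-parse the source and emit the actual machine code using the resolved addresses
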
// Assembler.java - performTwoPasses
public void performTwoPasses(Reader reader, NativeStream asm) throws Exception {
    String data = readToString(reader);
    // 1st pass
    ReInit(new StringReader(data));
    setPass(1);
    jnasmInput();
    assemble((int) asm.getBaseAddr());
    // 2nd pass
    setPass(2);
    instructions.clear();
    ReInit(new StringReader(data));
    jnasmInput();
    emit(asm);
}
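
To see why two passes are enough for forward references, here is a toy assembler, independent of JNasm, that understands only labels, nop, and jmp label. The class TwoPassSketch and its methods are hypothetical. The encodings are the standard x86 ones: 0x90 for nop, and 0xE9 followed by a 32-bit displacement relative to the next instruction for a near jmp.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Toy two-pass assembler; hypothetical, unrelated to JNasm's Assembler class.
final class TwoPassSketch {
    static byte[] assemble(String[] lines, int baseAddr) {
        // Pass 1: walk the source and record the address of every label.
        Map<String, Integer> labels = new HashMap<>();
        int addr = baseAddr;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.endsWith(":")) {
                labels.put(line.substring(0, line.length() - 1), addr);
            } else {
                addr += sizeOf(line);
            }
        }
        // Pass 2: emit bytes; every label, including forward ones, is now known.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        addr = baseAddr;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.endsWith(":")) {
                continue;
            }
            addr += sizeOf(line); // addr now points at the next instruction
            if (line.equals("nop")) {
                out.write(0x90);
            } else {
                String target = line.substring(4).trim();
                Integer dest = labels.get(target);
                if (dest == null) {
                    throw new IllegalArgumentException("undefined label: " + target);
                }
                int rel = dest - addr; // displacement relative to next instruction
                out.write(0xE9);
                for (int i = 0; i < 4; i++) {
                    out.write((rel >> (8 * i)) & 0xFF); // little-endian rel32
                }
            }
        }
        return out.toByteArray();
    }

    /** Size in bytes of each supported instruction. */
    static int sizeOf(String line) {
        if (line.equals("nop")) return 1;
        if (line.startsWith("jmp ")) return 5;
        throw new IllegalArgumentException("unsupported: " + line);
    }

    public static void main(String[] args) {
        byte[] code = assemble(new String[] {"jmp end", "nop", "end:", "nop"}, 0x100000);
        for (byte b : code) System.out.printf("%02X ", b);
        System.out.println();
    }
}
```

The jmp in the first line refers to a label defined later. Pass 1 cannot encode it, but it can still measure it (5 bytes), and that is all it needs to place end at the right address for pass 2.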

Instruction Encoding

Each instruction has an emit* method in X86Core that determines the addressing mode of its operands and writes the matching binary encoding to the output stream. The encoding follows the x86 instruction-set specification:

// X86Core.java - emitADD showing addressing mode handling
private void emitADD() {
    int addr = getAddressingMode(2);
    switch (addr) {
        case RR_ADDR:           // Register-Register
            stream.writeADD(getReg(0), getReg(1));
            break;
        case RC_ADDR:           // Register-Constant
            stream.writeADD(getReg(0), getInt(1));
            break;
        case RE_ADDR:           // Register-Memory
            Address ind = getAddress(1);
            stream.writeADD(getReg(0), getRegister(ind.getImg()), ind.disp);
            break;
        case EC_ADDR:           // Memory-Constant
            ind = getAddress(0);
            stream.writeADD(operandSize, getRegister(ind.getImg()), ind.disp, getInt(1));
            break;
        // ... more addressing modes
    }
}
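
For orientation, the bytes behind the register-register and register-constant cases can be worked out by hand from the x86 encoding rules. The sketch below is an assumption-laden illustration, not what JNasm's writeADD necessarily emits: AddEncodingSketch and its methods are hypothetical, only 32-bit registers are covered, and the immediate form always uses the 0x81 /0 encoding even where an assembler could pick the shorter 0x83 /0 form.

```java
// Hypothetical sketch; JNasm's real encoding sits behind X86Core and the stream's writeADD methods.
final class AddEncodingSketch {
    // Register numbers as used in the ModRM byte.
    static final int EAX = 0, ECX = 1, EDX = 2, EBX = 3;

    /** add dst, src for 32-bit registers: opcode 0x01 /r ("ADD r/m32, r32"). */
    static byte[] addRegReg(int dst, int src) {
        // mod=11 (register direct), reg field = source, rm field = destination
        int modrm = 0xC0 | (src << 3) | dst;
        return new byte[] {0x01, (byte) modrm};
    }

    /** add dst, imm32: opcode 0x81 /0 followed by the immediate, little-endian. */
    static byte[] addRegImm32(int dst, int imm) {
        byte[] out = new byte[6];
        out[0] = (byte) 0x81;
        out[1] = (byte) (0xC0 | dst); // mod=11, reg field = 0 (opcode extension), rm=dst
        for (int i = 0; i < 4; i++) {
            out[2 + i] = (byte) (imm >> (8 * i));
        }
        return out;
    }

    public static void main(String[] args) {
        // add eax, ebx encodes as the two bytes 01 D8
        for (byte b : addRegReg(EAX, EBX)) System.out.printf("%02X ", b);
        System.out.println();
    }
}
```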

Addressing Modes

The assembler supports these addressing modes:

  • R: Single register operand
  • RR: Register destination, register source
  • RC: Register destination, immediate constant source
  • RE: Register destination, effective address (memory) source
  • ER: Effective address (memory) destination, register source
  • EC: Effective address (memory) destination, immediate constant source
  • A: Absolute address
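
The mode names read as one letter per operand, which suggests a simple way to picture the classification. The following sketch derives such a name from operand text. It is an illustration under assumptions: AddressingModeSketch, kind, and mode are hypothetical names, only a handful of 32-bit registers are recognised, and the A (absolute address) mode is not modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical classifier; JNasm derives the mode from parsed operands, not from strings.
final class AddressingModeSketch {
    private static final Set<String> REGISTERS = new HashSet<>(Arrays.asList(
            "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"));

    /** Returns 'R' for a register, 'E' for a bracketed memory operand, 'C' otherwise. */
    static char kind(String operand) {
        String op = operand.trim().toLowerCase(Locale.ROOT);
        if (REGISTERS.contains(op)) return 'R';
        if (op.startsWith("[") && op.endsWith("]")) return 'E';
        return 'C';
    }

    /** Concatenates the operand kinds, destination first, e.g. "RE" for add eax, [ebx+8]. */
    static String mode(String... operands) {
        StringBuilder sb = new StringBuilder();
        for (String operand : operands) {
            sb.append(kind(operand));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mode("eax", "[ebx+8]"));
    }
}
```

An emit method can then switch on the resulting mode and reject any combination the instruction does not support.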

Integration with BootImageBuilder

During boot image construction, BootImageBuilder.compileKernel() invokes JNasm:

// BootImageBuilder.java
protected void compileKernel(NativeStream os, AsmSourceInfo sourceInfo) throws BuildException {
    final Map<String, String> symbols = new HashMap<String, String>();
    symbols.put("BITS" + i_bist, "");  // BITS32 or BITS64
    symbols.put("JNODE_VERSION", "'" + version + "'");

    JNAsm.assembler(os, sourceInfo, symbols);
}

The assembled kernel code is embedded in the boot image at the address specified by os.getBaseAddr().

Assembly Syntax

JNasm uses Intel-style syntax (destination operand first) with NASM-style bracketed directives:

; Kernel initialization
[BITS 32]
[ORG 0x100000]

start:
    mov eax, cr0
    or eax, 1
    mov cr0, eax
    
    ; Call function
    call init_paging
    
    ; Jump table entry
    jmp [eax + 4 * ebx]

Supported directives:

  • [BITS 32|64] - Set the target mode, which determines the default operand and address size
  • [ORG addr] - Set origin address
  • [SECTION name] - Define sections
  • %define - Macro definition
  • .include - Include file
  • .macro / .endm - Multi-line macros
  • TIMES n instruction - Repeat instruction n times (for padding)
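
The effect of TIMES on the output can be pictured as plain repetition of an already-encoded item. The helper below is a hypothetical sketch (TimesSketch and times are not JNasm names) that repeats a byte sequence a given number of times, which is how padding with nop (0x90) or zero bytes would come about.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper illustrating the effect of TIMES; not JNasm's PseudoInstructions class.
final class TimesSketch {
    /** Repeats an already-encoded instruction or data item 'count' times. */
    static byte[] times(int count, byte... encoded) {
        if (count < 0) {
            throw new IllegalArgumentException("negative repeat count: " + count);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            out.write(encoded, 0, encoded.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Four nop instructions used as padding.
        System.out.println(times(4, (byte) 0x90).length);
    }
}
```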

Gotchas

  1. Generated parser classes: The actual parser (JNAsm in assembler/gen/) and preprocessor (JNAsmPP in preprocessor/gen/) are generated at build time using JavaCC, whose generated parsers expose the ReInit(...) method seen in performTwoPasses. They are not checked into the repository.

  2. Addressing mode validation: If an instruction is used with an unsupported addressing mode, reportAddressingError() throws an exception indicating the allowed modes.

  3. Pass-dependent behavior: The assembler tracks the current pass to handle forward label references. During pass 1, labels are collected, so a reference to a label defined later cannot be resolved when it is first seen; during pass 2, every label is known and the actual code is emitted.

  4. Register sizing: Mixing incompatible register sizes (e.g., 8-bit AL with 32-bit EAX) triggers IllegalArgumentException with a clear error message.

  5. Symbol resolution: An undefined constant causes UndefinedConstantException while the assembler runs, that is, during the build. Hex constants can be written as 0x... or just hex digits.
