Proposal: add generic parameters to types to distinguish different data structures on different architectures #118

@Lancern

Motivation

The link between the core data types (Capstone, Insn, InsnDetail, etc.) and the architecture-specific data types (those under crate::arch) is not explicit in the type signatures. For example, to get the instruction ID as an x86-specific value, we have to write:

let cs = Capstone::new().x86().build().unwrap();
let insns = cs.disasm_all(CODE, 0x1000).unwrap();
for i in insns.as_ref() {
    let insn_id = unsafe {
        // Neither typing nor documentation mention this!
        std::mem::transmute::<_, X86Insn>(i.id().0)
    };
    // do something with insn_id
}

which is not easy for beginners to discover.

A simple fix is to add a method arch_insn_id that returns the architecture-specific instruction ID wrapped in an enum, just as the InsnDetail::arch_detail method does for instruction details:

pub enum ArchInsnId {
    X86InsnId(X86Insn),
    // other architectures
}

impl<'a> Insn<'a> {
    pub fn arch_insn_id(&self) -> ArchInsnId {
        // ... code
    }
}

This leads to another slightly annoying problem. We have to match against the return value of arch_insn_id to extract the x86-specific instruction ID, even though we already know the architecture. The same problem arises when we call the InsnDetail::arch_detail method or other methods with a similarly typed return value.
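To make the cost of the enum approach concrete, here is a minimal, self-contained sketch. All types in it are hypothetical stand-ins with made-up variants, not the real capstone-rs definitions, and `expect_x86` is an illustrative helper rather than anything the crate provides.

```rust
// Hypothetical stand-ins for the architecture-specific ID enums.
// The variants are invented for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum X86Insn {
    Nop,
    Mov,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArmInsn {
    Add,
    Ldr,
}

// The wrapper enum that an `arch_insn_id` method would return.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchInsnId {
    X86InsnId(X86Insn),
    ArmInsnId(ArmInsn),
}

/// Illustrative helper: a caller that only ever disassembles x86 code
/// still has to handle every other variant, because the return type
/// carries no architecture information.
pub fn expect_x86(id: ArchInsnId) -> Option<X86Insn> {
    match id {
        ArchInsnId::X86InsnId(x) => Some(x),
        // Never expected by an x86-only caller, but the compiler cannot know that.
        _ => None,
    }
}
```

Every x86-only call site ends up with a fallback arm (or an unwrap) of this kind.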

The Proposal

The proposal posted here is only an early (perhaps too early) draft; some details are still missing and need further consideration and discussion.

First of all, we can add a new trait that abstracts a specific architecture:

pub trait Arch {
    type InsnId;
    type InsnDetail;
    // ... other stuff
}

Then, we add a generic parameter to Capstone, Insn and InsnDetail that represents the architecture:

pub struct Capstone<A: Arch> {
    // ... fields
}

pub struct Insn<'a, A: Arch> {
    // ... fields
}

pub struct InsnDetail<'a, A: Arch> {
    // ... fields
}

Then, the methods mentioned in the motivation section can be typed in a more straightforward way:

impl<A: Arch> Capstone<A> {
    pub fn insn_detail<'s, 'i: 's>(
        &'s self, 
        insn: &'i Insn<'_, A>
    ) -> CsResult<InsnDetail<'i, A>> {
        // ... code
    }
}

impl<'a, A: Arch> Insn<'a, A> {
    pub fn id(&self) -> A::InsnId {
        // ... code
    }
}

impl<'a, A: Arch> InsnDetail<'a, A> {
    pub fn arch_detail(&self) -> A::InsnDetail {
        // ... code
    }
}

No more matches, as long as we're targeting a specific architecture. Also, beginners can find corresponding architecture-specific implementations just by looking at the typings. The instruction ID problem can be resolved accordingly.
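As a rough illustration of how the pieces could fit together, the sketch below builds a toy version of the idea without depending on capstone-rs. Several things in it are assumptions made for the example: the `insn_id_from_raw` trait method, the `X86` marker type, the `from_raw` constructor, and the raw ID values 0 to 2, none of which come from the proposal or from Capstone. It also drops the `'a` lifetime for brevity and returns `Option<A::InsnId>` rather than `A::InsnId`, so that an unknown raw value is reported instead of transmuted.

```rust
use std::marker::PhantomData;

/// Abstraction over one architecture, as in the proposal.
pub trait Arch {
    type InsnId: Copy + std::fmt::Debug + PartialEq;

    /// Hypothetical helper: map a raw numeric ID to the typed ID.
    fn insn_id_from_raw(raw: u32) -> Option<Self::InsnId>;
}

// Toy x86 instruction IDs; the variants and numbering are invented.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum X86Insn {
    Invalid,
    Nop,
    Mov,
}

/// Hypothetical marker type selecting the x86 architecture.
pub struct X86;

impl Arch for X86 {
    type InsnId = X86Insn;

    fn insn_id_from_raw(raw: u32) -> Option<X86Insn> {
        match raw {
            0 => Some(X86Insn::Invalid),
            1 => Some(X86Insn::Nop),
            2 => Some(X86Insn::Mov),
            _ => None,
        }
    }
}

/// Toy instruction that remembers its architecture at the type level.
pub struct Insn<A: Arch> {
    raw_id: u32,
    _arch: PhantomData<A>,
}

impl<A: Arch> Insn<A> {
    pub fn from_raw(raw_id: u32) -> Self {
        Insn { raw_id, _arch: PhantomData }
    }

    /// Returns the architecture-specific ID directly; no wrapper enum to match on.
    pub fn id(&self) -> Option<A::InsnId> {
        A::insn_id_from_raw(self.raw_id)
    }
}
```

With these definitions, `Insn::<X86>::from_raw(2).id()` has type `Option<X86Insn>`, so the x86-specific enum shows up in the signature and no `unsafe` is needed at the call site.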

Unresolved Problems

When the target architecture cannot be determined at compile time (for example, when the disassembler is created by the Capstone::new_raw method), the generic parameter cannot be set to a specific architecture. To resolve this, we may need to introduce a DynamicArch type that implements Arch and indicates that the target architecture is determined at runtime.
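One possible shape for such a DynamicArch is sketched below. It is only a guess at a design, not something the proposal specifies: the trait here takes `&self` so that the runtime-selected architecture can be carried as a value, `decode_id` is a hypothetical method name, and the enums and raw ID values are invented for illustration.

```rust
// Toy per-architecture ID enums; variants and numbering are invented.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum X86Insn {
    Nop,
    Mov,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArmInsn {
    Add,
    Ldr,
}

// Wrapper enum used only when the architecture is chosen at runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchInsnId {
    X86(X86Insn),
    Arm(ArmInsn),
}

pub trait Arch {
    type InsnId;

    /// Hypothetical method: decode a raw ID for this architecture.
    fn decode_id(&self, raw: u32) -> Option<Self::InsnId>;
}

/// Statically known architecture: IDs come back as `X86Insn`.
pub struct X86;

impl Arch for X86 {
    type InsnId = X86Insn;

    fn decode_id(&self, raw: u32) -> Option<X86Insn> {
        match raw {
            1 => Some(X86Insn::Nop),
            2 => Some(X86Insn::Mov),
            _ => None,
        }
    }
}

/// Architecture chosen at runtime: IDs fall back to the wrapper enum.
pub enum DynamicArch {
    X86,
    Arm,
}

impl Arch for DynamicArch {
    type InsnId = ArchInsnId;

    fn decode_id(&self, raw: u32) -> Option<ArchInsnId> {
        match self {
            // Reuse the static decoder and wrap its result.
            DynamicArch::X86 => X86.decode_id(raw).map(ArchInsnId::X86),
            DynamicArch::Arm => match raw {
                1 => Some(ArchInsnId::Arm(ArmInsn::Add)),
                2 => Some(ArchInsnId::Arm(ArmInsn::Ldr)),
                _ => None,
            },
        }
    }
}
```

Under this design, callers that know the architecture statically never see the wrapper enum, while runtime-selected callers keep roughly the matching behaviour they have today.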


This proposal may be premature, but I do think it reveals some (possibly minor) problems.
