Chapter 11: Vertex Processing and Transforms

Part 2 — The Rasterization Pipeline | Prerequisites: Chapter 10, Chapter 2 | Difficulty: Intermediate | Language: C++

In the last chapter, we saw the big picture: vertices go in, pixels come out. Now we zoom into the very first stage -- vertex processing. This is where 3D coordinates become 2D screen positions, and it's the foundation everything else is built on.

You already know the math from Chapter 2. Model, View, Projection matrices. Coordinate spaces. Perspective division. This chapter puts that math into code. By the end, you'll have the front half of a software 3D rasterizer: reading 3D vertices and producing clipped, screen-space triangles ready for rasterization.

In This Chapter

  • Vertex attributes: position, normal, texture coordinates, color
  • The Model-View-Projection transform chain in code
  • Implementing the vertex shader (CPU-side)
  • Clipping against the view frustum
  • Perspective divide and the viewport transform
  • Backface culling
  • Homogeneous coordinates and why the fourth component matters
  • Index buffers and vertex sharing

Related Chapters

  • Chapter 2: Coordinate Spaces and the View Pipeline -- the math behind every transform in this chapter
  • Chapter 10: The Graphics Pipeline Overview -- where this stage fits in the pipeline
  • Chapter 12: Rasterization: Turning Triangles into Pixels -- the next stage: turning screen-space triangles into pixels
  • Chapter 15: Your First OpenGL Program -- the GPU does this stage in hardware with vertex shaders

What's in a Vertex?

In Chapter 9, your vertices were simple: an (x, y) position and maybe a color. In 3D, vertices carry more information. A vertex is a bundle of attributes -- named values that describe one point on a surface.

Here's what a typical vertex looks like:

struct Vertex {
    Vec3 position;    // where this point is in 3D (object space)
    Vec3 normal;      // which direction the surface faces here
    Vec2 texcoord;    // where to sample the texture (UV)
    Vec3 color;       // per-vertex color (sometimes used)
};

Position is obvious -- it's where the point is. But the other attributes deserve explanation:

Normal: A unit vector perpendicular to the surface at this vertex. We won't use normals until Chapter 16 (lighting), but the vertex pipeline needs to transform them along with positions. Normals are how the renderer knows which way a surface faces, which determines how light bounces off it.

Texture coordinates (UVs): A 2D coordinate that maps this vertex to a position on a texture image. U goes horizontally (0 to 1, left to right), V goes vertically (0 to 1, bottom to top in OpenGL). We'll use these in Chapter 14.

Color: Sometimes vertices have per-vertex colors instead of (or in addition to) textures. The rasterizer interpolates these across the triangle, giving you smooth color gradients.

Here's the critical insight: every vertex attribute gets interpolated across the triangle during rasterization. Position determines where the triangle is on screen. Every other attribute is just along for the ride -- the rasterizer interpolates it using the same barycentric math, and the fragment shader receives the interpolated values.

Vertex A: position=(0,0,0), color=(1,0,0)    ← red
Vertex B: position=(1,0,0), color=(0,1,0)    ← green
Vertex C: position=(0.5,1,0), color=(0,0,1)  ← blue

Fragment at triangle center:
  color = (1/3, 1/3, 1/3)                    ← grayish (mix of all three)
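The blend above can be sketched in a few lines. This is a minimal illustration, assuming the rasterizer has already produced barycentric weights that sum to 1; the `Color` type and the `interpolate` name are made up for this example and are not part of the chapter's code.

```cpp
struct Color { float r, g, b; };

// Blend three per-vertex colors with barycentric weights.
// Assumes w0 + w1 + w2 == 1 (the rasterizer normally guarantees this).
Color interpolate(Color a, Color b, Color c, float w0, float w1, float w2) {
    return { a.r * w0 + b.r * w1 + c.r * w2,
             a.g * w0 + b.g * w1 + c.g * w2,
             a.b * w0 + b.b * w1 + c.b * w2 };
}
```

With weights of 1/3 each, the red, green, and blue corners blend to (1/3, 1/3, 1/3). With weights (1, 0, 0) the result is exactly vertex A's color. Normals and UVs can be blended the same way, one component at a time.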

The Transform Chain

You learned this in Chapter 2, but let's see it again through the lens of implementation. A vertex's position goes through a chain of matrix multiplications to reach screen space:

Object Space  →  World Space  →  Camera Space  →  Clip Space  →  NDC  →  Screen Space
           Model         View         Projection      /w divide    Viewport

Each transform has a specific purpose:

Model Matrix

Positions your object in the world. If you have a chair model and you want it at position (3, 0, -5), rotated 45 degrees, the model matrix encodes that. Every instance of the chair can use a different model matrix.

View Matrix

Moves everything relative to the camera. Rather than moving the camera, we move the entire world in the opposite direction. If the camera is at position (10, 5, 10) looking toward the origin, the view matrix translates everything by (-10, -5, -10) and rotates to align the camera's forward direction with -Z.

Projection Matrix

The big one. This matrix maps the 3D camera-space view frustum into a normalized cube. It encodes:

  • Field of view: how wide the camera sees
  • Aspect ratio: width/height of the screen
  • Near and far planes: the closest and farthest visible distances

The projection matrix is where perspective happens. Distant objects get smaller. Parallel lines converge. A 3D world becomes a 2D image.

We usually combine all three into an MVP matrix and apply it in one step:

Mat4 mvp = projection * view * model;  // right-to-left: model first, then view, then projection
Vec4 clipPos = mvp * Vec4(vertex.position, 1.0);

What happens if you get the multiplication order wrong? You get garbage. Matrix multiplication is not commutative. projection * view * model means "first apply model, then view, then projection." If you wrote model * view * projection, you'd be projecting first, then viewing, then modeling -- which makes no geometric sense. The rightmost matrix applies first.
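To make the ordering rule concrete, here is a small self-contained sketch that composes a translation and a Y rotation in both orders and applies each result to the same point. The `M4` and `P4` types and the helper names are throwaway stand-ins for this example (row-major for readability), not the chapter's `Mat4`.

```cpp
#include <cmath>

struct M4 { float m[4][4]; };   // m[row][col], illustration only
struct P4 { float x, y, z, w; };

M4 identity4() {
    M4 r = {};
    for (int i = 0; i < 4; i++) r.m[i][i] = 1.0f;
    return r;
}

M4 mul(const M4& a, const M4& b) {
    M4 r = {};
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            for (int k = 0; k < 4; k++)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

P4 apply(const M4& a, P4 p) {
    float v[4] = {p.x, p.y, p.z, p.w}, o[4] = {};
    for (int i = 0; i < 4; i++)
        for (int k = 0; k < 4; k++)
            o[i] += a.m[i][k] * v[k];
    return {o[0], o[1], o[2], o[3]};
}

M4 translation(float tx, float ty, float tz) {
    M4 r = identity4();
    r.m[0][3] = tx; r.m[1][3] = ty; r.m[2][3] = tz;
    return r;
}

M4 rotationY(float a) {
    M4 r = identity4();
    r.m[0][0] = std::cos(a);  r.m[0][2] = std::sin(a);
    r.m[2][0] = -std::sin(a); r.m[2][2] = std::cos(a);
    return r;
}

const float kQuarterTurn = 1.5707963f;  // 90 degrees in radians

// T * R: the rightmost matrix (rotation) applies first, then the translation.
P4 rotateThenMove(P4 p) {
    return apply(mul(translation(3, 0, -5), rotationY(kQuarterTurn)), p);
}

// R * T: the translation applies first, then the rotation swings the
// already-moved point around the world origin.
P4 moveThenRotate(P4 p) {
    return apply(mul(rotationY(kQuarterTurn), translation(3, 0, -5)), p);
}
```

For the point (1, 0, 0, 1), `rotateThenMove` gives roughly (3, 0, -6), while `moveThenRotate` gives roughly (-5, 0, -4). Same two matrices, different order, different result.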


Implementing the Math

Let's build this. We need a small math library -- vectors and matrices. If you already have one from Chapter 9, extend it. If not, here's what we need:

struct Vec2 {
    float x, y;
};

struct Vec3 {
    float x, y, z;
    Vec3 operator+(const Vec3& b) const { return {x+b.x, y+b.y, z+b.z}; }
    Vec3 operator-(const Vec3& b) const { return {x-b.x, y-b.y, z-b.z}; }
    Vec3 operator*(float s) const { return {x*s, y*s, z*s}; }
    float dot(const Vec3& b) const { return x*b.x + y*b.y + z*b.z; }
    Vec3 cross(const Vec3& b) const {
        return {y*b.z - z*b.y, z*b.x - x*b.z, x*b.y - y*b.x};
    }
    Vec3 normalized() const {
        float len = std::sqrt(x*x + y*y + z*z);  // requires <cmath>
        if (len == 0.0f) return {0.0f, 0.0f, 0.0f};  // avoid dividing by zero
        return {x/len, y/len, z/len};
    }
};

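struct Vec4 {
    float x, y, z, w;
    // Default constructor: without it, structs that contain a Vec4
    // (such as VertexOutput below) cannot be default-constructed.
    Vec4() : x(0), y(0), z(0), w(0) {}
    Vec4(Vec3 v, float w) : x(v.x), y(v.y), z(v.z), w(w) {}
    Vec4(float x, float y, float z, float w) : x(x), y(y), z(z), w(w) {}
};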

The 4x4 matrix stores 16 floats in column-major order (matching OpenGL's convention):

struct Mat4 {
    float m[16];  // column-major: m[col*4 + row]

    // Access element at (row, col)
    float& at(int row, int col) { return m[col * 4 + row]; }
    float at(int row, int col) const { return m[col * 4 + row]; }

    // Matrix-vector multiplication
    Vec4 operator*(const Vec4& v) const {
        return Vec4(
            at(0,0)*v.x + at(0,1)*v.y + at(0,2)*v.z + at(0,3)*v.w,
            at(1,0)*v.x + at(1,1)*v.y + at(1,2)*v.z + at(1,3)*v.w,
            at(2,0)*v.x + at(2,1)*v.y + at(2,2)*v.z + at(2,3)*v.w,
            at(3,0)*v.x + at(3,1)*v.y + at(3,2)*v.z + at(3,3)*v.w
        );
    }

    // Matrix-matrix multiplication
    Mat4 operator*(const Mat4& b) const {
        Mat4 result = {};
        for (int col = 0; col < 4; col++)
            for (int row = 0; row < 4; row++)
                for (int k = 0; k < 4; k++)
                    result.at(row, col) += at(row, k) * b.at(k, col);
        return result;
    }

    static Mat4 identity() {
        Mat4 m = {};
        m.at(0,0) = m.at(1,1) = m.at(2,2) = m.at(3,3) = 1.0f;
        return m;
    }
};

Now we can build the three matrices:

Building the Model Matrix

Mat4 translate(float tx, float ty, float tz) {
    Mat4 m = Mat4::identity();
    m.at(0, 3) = tx;
    m.at(1, 3) = ty;
    m.at(2, 3) = tz;
    return m;
}

Mat4 scale(float sx, float sy, float sz) {
    Mat4 m = Mat4::identity();
    m.at(0, 0) = sx;
    m.at(1, 1) = sy;
    m.at(2, 2) = sz;
    return m;
}

Mat4 rotateY(float angle) {
    float c = std::cos(angle), s = std::sin(angle);
    Mat4 m = Mat4::identity();
    m.at(0, 0) = c;  m.at(0, 2) = s;
    m.at(2, 0) = -s; m.at(2, 2) = c;
    return m;
}

// Usage: rotate 45 degrees around Y, then move to (3, 0, -5)
Mat4 model = translate(3.0f, 0.0f, -5.0f) * rotateY(3.14159f / 4.0f);

Building the View Matrix (Look-At)

The classic "look-at" matrix constructs a camera from an eye position, a target, and an up vector:

Mat4 lookAt(Vec3 eye, Vec3 target, Vec3 up) {
    Vec3 forward = (eye - target).normalized();  // camera +Z axis: points from target back to eye, so the camera looks along -Z
    Vec3 right = up.cross(forward).normalized();
    Vec3 newUp = forward.cross(right);

    Mat4 m = Mat4::identity();
    m.at(0, 0) = right.x;   m.at(0, 1) = right.y;   m.at(0, 2) = right.z;
    m.at(1, 0) = newUp.x;   m.at(1, 1) = newUp.y;   m.at(1, 2) = newUp.z;
    m.at(2, 0) = forward.x; m.at(2, 1) = forward.y;  m.at(2, 2) = forward.z;

    m.at(0, 3) = -right.dot(eye);
    m.at(1, 3) = -newUp.dot(eye);
    m.at(2, 3) = -forward.dot(eye);

    return m;
}

Mat4 view = lookAt(
    Vec3{0, 2, 5},    // camera position
    Vec3{0, 0, 0},    // looking at origin
    Vec3{0, 1, 0}     // up is +Y
);
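One way to sanity-check a look-at construction is to push a known point through it and see where it lands. The sketch below rebuilds the same camera basis with free functions and returns a point's camera-space coordinates directly. `V3`, `toCameraSpace`, and the small helpers are names made up for this example, and it assumes `up` is not parallel to the viewing direction.

```cpp
#include <cmath>

struct V3 { float x, y, z; };

V3 sub(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
V3 cross(V3 a, V3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
V3 normalize(V3 v) {
    float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Camera-space coordinates of world point p, using the same basis
// vectors that the look-at matrix stores in its rows.
V3 toCameraSpace(V3 p, V3 eye, V3 target, V3 up) {
    V3 back  = normalize(sub(eye, target));  // camera +Z (away from the target)
    V3 right = normalize(cross(up, back));   // camera +X
    V3 newUp = cross(back, right);           // camera +Y
    V3 d = sub(p, eye);                      // point relative to the camera
    return {dot(right, d), dot(newUp, d), dot(back, d)};
}
```

With the camera at (0, 2, 5) looking at the origin, the eye maps to (0, 0, 0). The target maps to about (0, 0, -5.39): straight ahead on the -Z axis, at the eye-to-target distance.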

Building the Projection Matrix

The perspective projection matrix maps the view frustum into clip space; after the perspective divide, everything visible lands inside the [-1, 1] cube:

Mat4 perspective(float fovY, float aspect, float near, float far) {
    float tanHalfFov = std::tan(fovY / 2.0f);

    Mat4 m = {};
    m.at(0, 0) = 1.0f / (aspect * tanHalfFov);
    m.at(1, 1) = 1.0f / tanHalfFov;
    m.at(2, 2) = -(far + near) / (far - near);
    m.at(2, 3) = -(2.0f * far * near) / (far - near);
    m.at(3, 2) = -1.0f;
    return m;
}

Mat4 proj = perspective(
    1.0472f,    // 60 degrees in radians
    16.0f / 9.0f,  // 16:9 aspect ratio
    0.1f,       // near plane
    100.0f      // far plane
);

What happens if the near plane is 0? The matrix itself stays finite (its denominators are far - near, not near), but depth collapses: with near = 0 the depth terms reduce to z_clip = w_clip, so every point lands at an NDC depth of exactly 1 and the depth buffer can no longer tell near from far. Even a very small near plane (like 0.001) causes severe depth precision problems, which we'll explore in Chapter 13.
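To see how the near plane shapes the depth values, this sketch evaluates the depth row of the perspective matrix above for a single point and performs the divide. `ndcDepth` is a helper written for this example only; it assumes the OpenGL-style convention used in this chapter (camera looks along -Z, NDC depth in [-1, 1]).

```cpp
// NDC depth for a point `dist` units in front of the camera
// (camera-space z = -dist), using the same terms as perspective().
float ndcDepth(float dist, float zNear, float zFar) {
    float zEye  = -dist;
    float zClip = -(zFar + zNear) / (zFar - zNear) * zEye
                  - (2.0f * zFar * zNear) / (zFar - zNear);
    float wClip = -zEye;          // the -1 in row 3 copies -z into w
    return zClip / wClip;         // perspective divide
}
```

With near = 0.1 and far = 100, a point 1 unit away maps to about 0.80 and a point 10 units away to about 0.98, so most of the depth range is already spent close to the camera. With near = 0.001, the point 1 unit away maps to about 0.998. With near = 0, every distance maps to 1.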


The Vertex Shader: Your First Programmable Stage

On a real GPU, vertex processing is done by a vertex shader -- a small program that runs once per vertex. In our software rasterizer, we'll write a C++ function that does the same thing:

struct VertexOutput {
    Vec4 clipPos;     // position in clip space
    Vec3 color;       // per-vertex color (to be interpolated)
    Vec2 texcoord;    // texture coordinates (to be interpolated)
    Vec3 worldNormal; // normal in world space (for lighting later)
};

VertexOutput vertexShader(const Vertex& in, const Mat4& mvp, const Mat4& model) {
    VertexOutput out;
    out.clipPos = mvp * Vec4(in.position, 1.0f);
    out.color = in.color;
    out.texcoord = in.texcoord;

    // Transform normal by the model matrix (ignoring translation)
    // For correct normal transforms with non-uniform scale,
    // you'd use the inverse-transpose of the model matrix
    Vec4 n = model * Vec4(in.normal, 0.0f);  // w=0: direction, not point
    out.worldNormal = Vec3{n.x, n.y, n.z}.normalized();

    return out;
}

A few things to notice:

The position gets the full MVP transform, putting it in clip space. Every other attribute gets whatever transform is appropriate -- normals go to world space (for lighting), UVs pass through unchanged.

Normals use w=0, which means the translation part of the matrix is ignored. A normal is a direction, not a point -- translating a direction doesn't make sense.

The Normal Transform Problem

For uniform scaling and rotation, transforming normals by the model matrix works fine. But for non-uniform scaling (stretching more in one axis than another), the normal can end up pointing in the wrong direction.

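Consider a sphere scaled to be flat like a pancake (scale Y by 0.1). Take a normal that originally pointed diagonally up and outward. On the flattened surface, the correct normal at that spot points almost straight up. But if you just multiply by the model matrix, the normal's Y component shrinks by 0.1 along with the geometry, so it tilts toward the horizontal plane, which is exactly the wrong way.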

The correct transform for normals is the inverse-transpose of the model matrix:

// Correct normal transformation for non-uniform scale:
Mat4 normalMatrix = transpose(inverse(model));
Vec4 n = normalMatrix * Vec4(in.normal, 0.0f);

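The inverse-transpose undoes the skewing effect of non-uniform scaling on directions. For a pure rotation, the inverse-transpose equals the original matrix; for a uniform scale, it is a scalar multiple of the original, which normalization removes. So this formula is safe in every case, as long as the model matrix is invertible.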

In GLSL, you'd write:

vec3 worldNormal = mat3(transpose(inverse(model))) * aNormal;

Computing the inverse on the GPU per-vertex is wasteful. In practice, you compute the normal matrix on the CPU once per object and pass it as a uniform.
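The chapter's code calls `transpose(inverse(model))` without defining those helpers. As one possible approach, the sketch below computes the inverse-transpose of just the upper-left 3x3 block, which is all a direction needs. The `M3` type and the function names are assumptions made for this example. It relies on the identity that the rows of the inverse-transpose are cross products of the original rows, divided by the determinant.

```cpp
#include <cmath>

struct V3 { float x, y, z; };

float dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
V3 cross(V3 a, V3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct M3 { V3 row[3]; };  // rows of the model matrix's upper-left 3x3

// Inverse-transpose via cross products of the rows.
// Assumes the matrix is invertible (determinant != 0).
M3 inverseTranspose(const M3& m) {
    V3 c0 = cross(m.row[1], m.row[2]);
    V3 c1 = cross(m.row[2], m.row[0]);
    V3 c2 = cross(m.row[0], m.row[1]);
    float inv = 1.0f / dot(m.row[0], c0);  // 1 / determinant
    M3 r;
    r.row[0] = {c0.x * inv, c0.y * inv, c0.z * inv};
    r.row[1] = {c1.x * inv, c1.y * inv, c1.z * inv};
    r.row[2] = {c2.x * inv, c2.y * inv, c2.z * inv};
    return r;
}

// Multiply a direction by the normal matrix and renormalize.
V3 transformNormal(const M3& normalMatrix, V3 n) {
    V3 t = {dot(normalMatrix.row[0], n),
            dot(normalMatrix.row[1], n),
            dot(normalMatrix.row[2], n)};
    float len = std::sqrt(dot(t, t));
    return {t.x / len, t.y / len, t.z / len};
}
```

For the pancake scale (1, 0.1, 1), the normal matrix comes out as diag(1, 10, 1). A normal of (0.707, 0.707, 0) becomes roughly (0.10, 0.99, 0), pointing almost straight up as the flattened surface requires.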

The vertex shader outputs everything the fragment shader will need. The rasterizer will interpolate all of these values across the triangle. Anything you don't pass through here won't be available later. This means you must plan ahead -- if your fragment shader needs world-space positions for lighting calculations, the vertex shader must compute and output them.


Clipping

After the vertex shader transforms vertices to clip space, some triangles may extend outside the viewable area. The clipping stage removes the invisible parts.

A vertex is inside the view frustum if its clip-space coordinates satisfy:

-w <= x <= w
-w <= y <= w
-w <= z <= w

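(where w is the fourth component of the clip-space position). If all three vertices of a triangle satisfy these conditions, the triangle is fully inside -- no clipping needed. If all three vertices fail the same inequality (for example, all have x > w), the triangle is fully outside -- discard it. In every other case the triangle may cross the frustum boundary and may need clipping. Note that a triangle can still be partly visible when none of its vertices are inside: a large triangle can cover the whole view with all three corners off screen.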

The Simple Approach: Triangle Rejection

For our software rasterizer, we'll start with a simplified approach: test entire triangles against each frustum plane, and reject triangles that are fully outside. We won't clip partially-visible triangles -- we'll just let them extend past the screen edges and rely on our rasterizer's bounding-box clamping to handle the rest.

This is a common shortcut in software rasterizers. Full clipping (splitting triangles along frustum planes) adds significant complexity, and the screen-space bounding box clamp achieves most of the benefit for free.

bool isInsideFrustum(const Vec4& v) {
    return v.x >= -v.w && v.x <= v.w &&
           v.y >= -v.w && v.y <= v.w &&
           v.z >= -v.w && v.z <= v.w;
}

// Simple rejection: discard triangle if all three vertices are
// on the same wrong side of any clipping plane
bool shouldCullTriangle(const Vec4& a, const Vec4& b, const Vec4& c) {
    // Check each of the 6 frustum planes
    if (a.x > a.w && b.x > b.w && c.x > c.w) return true;  // all right of right plane
    if (a.x < -a.w && b.x < -b.w && c.x < -c.w) return true; // all left of left plane
    if (a.y > a.w && b.y > b.w && c.y > c.w) return true;  // all above top plane
    if (a.y < -a.w && b.y < -b.w && c.y < -c.w) return true; // all below bottom plane
    if (a.z > a.w && b.z > b.w && c.z > c.w) return true;  // all beyond far plane
    if (a.z < -a.w && b.z < -b.w && c.z < -c.w) return true; // all closer than the near plane
    return false;
}

What happens if a triangle spans the near plane? This is the one case where simple rejection isn't enough. A vertex behind the camera (w <= 0) produces nonsense after the perspective divide: a negative w mirrors its x and y, and a w near zero blows the coordinates up. Real pipelines handle this with proper near-plane clipping, which splits the triangle. For our software rasterizer, we'll add a guard that discards any triangle with a vertex whose w is at or below a small epsilon.

Proper Near-Plane Clipping

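For vertices behind the camera, we need actual clipping. Here's the skeleton of a clipper for just the near plane. It classifies the vertices and handles the two easy cases; the partial case is left as a stub:

// Clip a triangle against the near plane (z = -w in clip space)
// A full clipper returns 0, 1, or 2 triangles; this stub returns only 0 or 1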
int clipNearPlane(Vec4 v[3], Vec4 out[6]) {
    // For each vertex, check if it's in front of the near plane
    bool inside[3];
    for (int i = 0; i < 3; i++)
        inside[i] = v[i].w > 0.001f && v[i].z >= -v[i].w;

    int numInside = inside[0] + inside[1] + inside[2];

    if (numInside == 3) {
        // All inside -- keep the triangle as-is
        out[0] = v[0]; out[1] = v[1]; out[2] = v[2];
        return 1;  // 1 triangle
    }
    if (numInside == 0) {
        return 0;  // fully clipped
    }

    // Partial clipping produces 1 or 2 triangles
    // (Full implementation would interpolate along clipped edges)
    // For simplicity, we'll just discard partially-clipped triangles
    // in our first implementation
    return 0;
}

A production rasterizer typically uses an algorithm such as Sutherland-Hodgman, which clips polygons against each frustum plane in sequence. We'll keep things simpler here.
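For reference, here is a sketch of what one Sutherland-Hodgman pass against the near plane could look like. It is not the chapter's implementation: `Clip4`, `clipPolygonNear`, and `clipTriangleNear` are names invented for this example, and only positions are handled. A real version would interpolate color, UVs, and normals with the same `t`.

```cpp
#include <cstddef>
#include <vector>

struct Clip4 { float x, y, z, w; };

// Signed distance to the near plane in clip space: inside when z + w >= 0.
static float nearDist(const Clip4& v) { return v.z + v.w; }

static Clip4 lerp(const Clip4& a, const Clip4& b, float t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t,
            a.z + (b.z - a.z) * t, a.w + (b.w - a.w) * t};
}

// One Sutherland-Hodgman pass: walk the polygon's edges, keep inside
// vertices, and emit an intersection point wherever an edge crosses the plane.
std::vector<Clip4> clipPolygonNear(const std::vector<Clip4>& poly) {
    std::vector<Clip4> out;
    for (std::size_t i = 0; i < poly.size(); i++) {
        const Clip4& cur  = poly[i];
        const Clip4& next = poly[(i + 1) % poly.size()];
        float dc = nearDist(cur), dn = nearDist(next);
        if (dc >= 0.0f) out.push_back(cur);
        if ((dc >= 0.0f) != (dn >= 0.0f)) {
            float t = dc / (dc - dn);  // signs differ, so dc - dn is non-zero
            out.push_back(lerp(cur, next, t));
        }
    }
    return out;
}

// Clip one triangle and fan-triangulate the result.
// Returns a flat list of vertices: 0, 3, or 6 entries (0, 1, or 2 triangles).
std::vector<Clip4> clipTriangleNear(const Clip4& a, const Clip4& b, const Clip4& c) {
    std::vector<Clip4> poly = clipPolygonNear({a, b, c});
    std::vector<Clip4> tris;
    for (std::size_t i = 1; i + 1 < poly.size(); i++) {
        tris.push_back(poly[0]);
        tris.push_back(poly[i]);
        tris.push_back(poly[i + 1]);
    }
    return tris;
}
```

One vertex outside yields a quad (two triangles); two vertices outside yield a single smaller triangle. Because the clipping happens in clip space, before the divide, the intersection is a plain linear interpolation.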


Perspective Divide

After clipping, we perform the perspective divide: dividing x, y, and z by w. This maps clip-space coordinates to Normalized Device Coordinates (NDC):

Vec3 perspectiveDivide(const Vec4& clip) {
    float invW = 1.0f / clip.w;
    return Vec3{clip.x * invW, clip.y * invW, clip.z * invW};
}

In NDC, visible geometry lives in the range [-1, 1] on all three axes. The x and y values map to screen position, while z maps to depth (for the Z-buffer in Chapter 13).

The perspective divide is what makes distant objects smaller. After the projection matrix, a vertex 10 units away has a larger w than a vertex 2 units away. Dividing by a larger w shrinks the x and y coordinates. That's perspective.

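Two vertices at the same camera-space (x, y) = (1, 1), run through the projection
matrix built above and then divided by w (values rounded):
  Near vertex (2 units away):  clip ≈ (0.97, 1.73, 1.80, 2)   → NDC ≈ (0.49, 0.87, 0.90)
  Far vertex (10 units away):  clip ≈ (0.97, 1.73, 9.82, 10)  → NDC ≈ (0.10, 0.17, 0.98)

The far vertex lands much closer to the center of the screen, even though its clip-space x and y are the same. That's perspective projection.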

Viewport Transform

NDC coordinates range from [-1, 1]. Your screen has pixel coordinates like [0, 1920] x [0, 1080]. The viewport transform maps one to the other:

struct Viewport {
    float x, y;           // top-left corner (usually 0, 0)
    float width, height;  // screen dimensions
};

Vec2 viewportTransform(const Vec3& ndc, const Viewport& vp) {
    float screenX = (ndc.x + 1.0f) * 0.5f * vp.width + vp.x;
    float screenY = (1.0f - ndc.y) * 0.5f * vp.height + vp.y;  // flip Y: NDC Y-up, screen Y-down
    return Vec2{screenX, screenY};
}

Note the Y-flip: in NDC, Y points up (mathematical convention). On screen, Y points down (pixel convention, where row 0 is the top of the image). This flip is easy to forget and causes upside-down rendering -- one of the most common beginner bugs.

What happens if you forget to flip Y? Your image renders upside-down. Everything else works perfectly -- colors, depth, textures -- but the whole scene is vertically mirrored. It's a classic "aha" moment when you realize why.


Putting It All Together

Let's assemble the complete vertex processing pipeline. We'll define a triangle, run it through the vertex shader, clip, perspective-divide, and viewport-transform:

#include <cmath>
#include <vector>
#include <cstdio>

// ... (Vec2, Vec3, Vec4, Mat4, Vertex, VertexOutput structs from above)

struct ScreenTriangle {
    Vec2 screenPos[3];    // screen-space positions
    float depth[3];       // NDC z values for depth testing
    Vec3 color[3];        // per-vertex colors (the rasterizer interpolates these)
    Vec2 texcoord[3];     // per-vertex texture coordinates (also interpolated later)
};

std::vector<ScreenTriangle> processVertices(
    const std::vector<Vertex>& vertices,
    const std::vector<int>& indices,  // groups of 3
    const Mat4& model,
    const Mat4& view,
    const Mat4& proj,
    const Viewport& viewport
) {
    Mat4 mvp = proj * view * model;
    std::vector<ScreenTriangle> triangles;

    for (size_t i = 0; i < indices.size(); i += 3) {
        // Step 1: Run vertex shader on each vertex
        VertexOutput v0 = vertexShader(vertices[indices[i]],   mvp, model);
        VertexOutput v1 = vertexShader(vertices[indices[i+1]], mvp, model);
        VertexOutput v2 = vertexShader(vertices[indices[i+2]], mvp, model);

        // Step 2: Frustum culling (reject triangles fully outside)
        if (shouldCullTriangle(v0.clipPos, v1.clipPos, v2.clipPos))
            continue;

        // Step 3: Guard against vertices behind camera
        if (v0.clipPos.w <= 0.001f || v1.clipPos.w <= 0.001f || v2.clipPos.w <= 0.001f)
            continue;

        // Step 4: Perspective divide → NDC
        Vec3 ndc0 = perspectiveDivide(v0.clipPos);
        Vec3 ndc1 = perspectiveDivide(v1.clipPos);
        Vec3 ndc2 = perspectiveDivide(v2.clipPos);

        // Step 5: Viewport transform → screen space
        ScreenTriangle tri;
        tri.screenPos[0] = viewportTransform(ndc0, viewport);
        tri.screenPos[1] = viewportTransform(ndc1, viewport);
        tri.screenPos[2] = viewportTransform(ndc2, viewport);

        tri.depth[0] = ndc0.z;
        tri.depth[1] = ndc1.z;
        tri.depth[2] = ndc2.z;

        tri.color[0] = v0.color;
        tri.color[1] = v1.color;
        tri.color[2] = v2.color;

        tri.texcoord[0] = v0.texcoord;
        tri.texcoord[1] = v1.texcoord;
        tri.texcoord[2] = v2.texcoord;

        triangles.push_back(tri);
    }

    return triangles;
}

Testing with a Cube

Let's define a simple cube and process it:

int main() {
    // A cube: 8 vertices, 12 triangles (2 per face)
    std::vector<Vertex> vertices = {
        // Front face (z = 0.5)
        {{-0.5f, -0.5f,  0.5f}, {0,0,1}, {0,0}, {1,0,0}},
        {{ 0.5f, -0.5f,  0.5f}, {0,0,1}, {1,0}, {0,1,0}},
        {{ 0.5f,  0.5f,  0.5f}, {0,0,1}, {1,1}, {0,0,1}},
        {{-0.5f,  0.5f,  0.5f}, {0,0,1}, {0,1}, {1,1,0}},
        // Back face (z = -0.5)
        {{-0.5f, -0.5f, -0.5f}, {0,0,-1}, {1,0}, {1,0,1}},
        {{ 0.5f, -0.5f, -0.5f}, {0,0,-1}, {0,0}, {0,1,1}},
        {{ 0.5f,  0.5f, -0.5f}, {0,0,-1}, {0,1}, {1,1,1}},
        {{-0.5f,  0.5f, -0.5f}, {0,0,-1}, {1,1}, {0.5,0.5,0.5}},
    };

    std::vector<int> indices = {
        0,1,2, 0,2,3,  // front
        5,4,7, 5,7,6,  // back
        4,0,3, 4,3,7,  // left
        1,5,6, 1,6,2,  // right
        3,2,6, 3,6,7,  // top
        4,5,1, 4,1,0,  // bottom
    };

    Mat4 model = rotateY(0.5f);  // rotate slightly so we see 3D
    Mat4 view = lookAt({0, 1.5f, 3}, {0, 0, 0}, {0, 1, 0});
    Mat4 proj = perspective(1.0472f, 800.0f/600.0f, 0.1f, 100.0f);  // aspect must match the viewport below
    Viewport vp = {0, 0, 800, 600};

    auto tris = processVertices(vertices, indices, model, view, proj, vp);

    printf("Processed %zu screen-space triangles\n", tris.size());
    for (size_t i = 0; i < tris.size(); i++) {
        printf("  Triangle %zu:\n", i);
        for (int v = 0; v < 3; v++) {
            printf("    v%d: screen(%.1f, %.1f) depth=%.4f color=(%.2f,%.2f,%.2f)\n",
                v, tris[i].screenPos[v].x, tris[i].screenPos[v].y,
                tris[i].depth[v],
                tris[i].color[v].x, tris[i].color[v].y, tris[i].color[v].z);
        }
    }
    return 0;
}

When you run this, you should see 12 triangles (6 faces x 2 triangles each) with screen-space coordinates in the [0, 800] x [0, 600] range. Faces nearer the camera have smaller depth values than faces farther away. All 12 triangles appear, including the ones facing away from the camera, because nothing in the pipeline rejects back faces yet.


Backface Culling

There's one more important optimization we should add. A solid cube has 6 faces, but from any viewpoint you can see at most 3 of them. The other 3 face away from the camera. Drawing triangles that face away is wasteful -- they'll be hidden behind the front-facing triangles anyway.

Backface culling discards triangles whose front face points away from the camera. We detect this using the cross product of two edges in screen space:

float edgeCross(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

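bool isBackfacing(const Vec2& v0, const Vec2& v1, const Vec2& v2) {
    // Front faces are wound counter-clockwise as seen on screen.
    // Our viewport transform flips Y, so in Y-down pixel coordinates a
    // counter-clockwise triangle yields a NEGATIVE cross product.
    // Zero or positive means edge-on (degenerate) or clockwise: cull it.
    return edgeCross(v0, v1, v2) >= 0.0f;
}

The sign of the cross product tells you the winding order of the triangle on screen, but which sign means what depends on the direction of the Y axis. With Y pointing up, counter-clockwise vertices give a positive value. Our screen space has Y pointing down, which flips the sign: a triangle that appears counter-clockwise (facing the camera) gives a negative value, and one that appears clockwise (facing away) gives a positive value.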

This is a convention -- OpenGL uses counter-clockwise as the default front face. DirectX uses clockwise. What matters is consistency: define your triangles with a consistent winding and cull the opposite direction.

Add this check after viewport transform:

// After viewport transform, before adding to the output list:
if (isBackfacing(tri.screenPos[0], tri.screenPos[1], tri.screenPos[2]))
    continue;  // skip backfacing triangles

For a cube, this instantly halves the number of triangles you need to rasterize. For a complex mesh with millions of triangles, it's a massive win.

What happens if your winding order is inconsistent? You'll see some faces appear and others vanish seemingly at random. This is one of the most common bugs in graphics programming -- a mesh where some triangles are wound clockwise and others counter-clockwise. It looks like the mesh has holes or random faces missing.
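Because the sign of the edge cross product depends on whether Y points up or down, it is worth checking the convention with concrete numbers before trusting a culling test. This small sketch uses the same formula as `edgeCross`; `P2`, `signedArea2`, and `flipY` are names made up for the example.

```cpp
struct P2 { float x, y; };

// Twice the signed area of triangle (a, b, c); same formula as edgeCross.
float signedArea2(P2 a, P2 b, P2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Mirror a point vertically, the way a viewport transform with a Y-flip
// does for a screen of the given height.
P2 flipY(P2 p, float height) { return {p.x, height - p.y}; }
```

The triangle (0,0), (1,0), (0,1) is counter-clockwise with Y up and gives +1. After flipping each vertex with `flipY(p, 10)`, the same three vertices in the same order give -1. If faces vanish the wrong way round in your renderer, this sign flip is the first thing to check.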


The Importance of Homogeneous Coordinates

You might wonder why we use 4D vectors (Vec4) instead of 3D. Why the w component? This is the homogeneous coordinates system from Chapter 2, and it's worth understanding deeply because it makes the entire pipeline work.

With 3x3 matrices, you can represent rotation and scaling but not translation. Adding a fourth component w lets a 4x4 matrix encode translation too:

For a point: (x, y, z, 1)   → translation affects it
For a direction: (x, y, z, 0)  → translation doesn't affect it

But w does something else that's even more important: it makes perspective projection a linear operation. Without homogeneous coordinates, perspective (divide by distance) is nonlinear and can't be represented as a matrix multiply. With them, the projection matrix encodes perspective as a linear transform, and the nonlinear part is deferred to the perspective divide step (dividing by w).

This separation -- linear projection matrix + nonlinear perspective divide -- is the key that makes the entire pipeline possible. Clipping can happen in the linear clip space (between the matrix multiply and the divide), where the math is simpler.

Why homogeneous coordinates matter:

1. Translation becomes a matrix multiply
2. Perspective becomes a matrix multiply (deferred divide)
3. Clipping math is simpler in clip space (before the divide)
4. All transforms compose into a single matrix multiply
5. Points (w=1) and directions (w=0) are distinguished
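Points 1 and 5 in that list can be seen in a few lines. In a pure translation matrix, the offset lives in the last column, and a matrix-vector multiply scales that column by w. The sketch below writes that multiply out by hand; `H4` and `translateH` are illustrative names, not part of the chapter's library.

```cpp
struct H4 { float x, y, z, w; };

// Result of multiplying a pure translation matrix by a homogeneous vector.
// The translation column (tx, ty, tz) is weighted by w, so it moves
// points (w = 1) and leaves directions (w = 0) untouched.
H4 translateH(H4 v, float tx, float ty, float tz) {
    return {v.x + tx * v.w, v.y + ty * v.w, v.z + tz * v.w, v.w};
}
```

Translating the point (1, 2, 3, 1) by (10, 0, 0) gives (11, 2, 3, 1). The direction (1, 2, 3, 0) comes back unchanged.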

Performance Considerations

In our software rasterizer, vertex processing is relatively cheap compared to rasterization (which touches every pixel). But in GPU applications, vertex processing can become the bottleneck in specific scenarios:

Vertex-limited scenarios:

  • Dense meshes with millions of vertices (tessellated terrain, high-poly characters)
  • Expensive vertex shaders (skeletal animation with many bones, complex displacement)
  • Draw call overhead (CPU spends too long setting up each draw call)

Optimizations:

  • Index buffers: Share vertices between triangles. A cube has 8 unique corner positions but 36 indices (12 triangles x 3 vertices). Without indexing, you'd need 36 vertices with lots of duplication.
  • Vertex cache: GPUs cache recently transformed vertices so shared vertices don't get transformed twice.
  • Level of detail (LOD): Use simpler meshes for distant objects that cover fewer pixels.

For now, our software rasterizer is simple enough that vertex processing won't be the bottleneck. That honor goes to rasterization, which we'll tackle next.


Debugging Vertex Processing

When things look wrong, the vertex processing stage is often the culprit. Here's a systematic debugging approach:

Common Symptoms and Causes

Nothing visible at all:

  • Object is behind the camera (check the view matrix: after the view transform, is the object's Z negative, meaning in front of the camera?)
  • Object is extremely far away and appears as a sub-pixel dot
  • Object is at the origin and the camera is also at the origin (they overlap)
  • Projection matrix has wrong near/far or aspect ratio

Object is inside-out (seeing backfaces):

  • Winding order is wrong -- vertices are specified clockwise instead of counter-clockwise (or vice versa)
  • Model matrix has a negative scale (reflection), which reverses winding order

Object is upside down:

  • Y-axis convention mismatch: some systems use Y-up, others use Y-down
  • Viewport transform forgot to flip Y

Object is distorted or squished:

  • Aspect ratio in projection matrix doesn't match the window aspect ratio
  • Non-uniform scale is being applied accidentally

The Matrix Debugging Technique

When your transforms look wrong, multiply test points through each matrix one at a time and verify the results at each stage:

Vec4 objPos(0, 0, 0, 1);  // object-space origin

Vec4 worldPos = model * objPos;
printf("World: (%.2f, %.2f, %.2f)\n", worldPos.x, worldPos.y, worldPos.z);
// Should be where you placed the object

Vec4 camPos = view * worldPos;
printf("Camera: (%.2f, %.2f, %.2f)\n", camPos.x, camPos.y, camPos.z);
// Z should be negative (OpenGL convention: camera looks along -Z)

Vec4 clipPos = proj * camPos;
printf("Clip: (%.2f, %.2f, %.2f, %.2f)\n", clipPos.x, clipPos.y, clipPos.z, clipPos.w);
// x,y,z should be within [-w, w] for visible points

Vec3 ndc = perspectiveDivide(clipPos);
printf("NDC: (%.2f, %.2f, %.2f)\n", ndc.x, ndc.y, ndc.z);
// Should be within [-1, 1] on all axes

Vec2 screen = viewportTransform(ndc, viewport);
printf("Screen: (%.1f, %.1f)\n", screen.x, screen.y);
// Should be within [0, width] x [0, height]

If any stage produces values outside the expected range, you've found where the problem is. This technique is invaluable and will save you hours of staring at black screens.


How Real Engines Handle Vertex Processing

In a production engine, the vertex processing stage handles much more than basic MVP transforms.

Instanced Rendering

Rather than issuing separate draw calls for 1000 trees with different positions, you submit one draw call with 1000 instances. Each instance gets a different model matrix from a buffer, but shares the same vertex data. The vertex shader can either read the built-in gl_InstanceID to look up its data or, as below, receive the model matrix as a per-instance vertex attribute:

// Instanced vertex shader
layout(location = 0) in vec3 aPos;           // per-vertex position (location assumed)
layout(location = 3) in mat4 instanceModel;  // per-instance model matrix (a mat4 occupies locations 3-6)

uniform mat4 view;
uniform mat4 projection;

void main() {
    gl_Position = projection * view * instanceModel * vec4(aPos, 1.0);
}

This dramatically reduces CPU overhead. Instead of 1000 draw calls (with all the state-setting overhead), you have 1 draw call that renders 1000 objects.

Skinned Meshes

Animated characters use vertex skinning. Each vertex is influenced by one or more bones, and the vertex shader blends the bone transforms based on per-vertex weights:

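// Simplified skeletal animation in vertex shader
// (attribute names and locations are illustrative)
layout(location = 0) in vec3 aPos;
layout(location = 4) in ivec4 boneIndices;  // up to 4 influencing bones
layout(location = 5) in vec4 boneWeights;   // weights, expected to sum to 1

uniform mat4 model, view, projection;
uniform mat4 boneMatrices[64];  // one matrix per bone

void main() {
    mat4 skinMatrix =
        boneMatrices[boneIndices.x] * boneWeights.x +
        boneMatrices[boneIndices.y] * boneWeights.y +
        boneMatrices[boneIndices.z] * boneWeights.z +
        boneMatrices[boneIndices.w] * boneWeights.w;

    gl_Position = projection * view * model * skinMatrix * vec4(aPos, 1.0);
}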

Each vertex is influenced by up to 4 bones (a common limit). The bone matrices are updated on the CPU each frame based on the animation data. We'll explore this thoroughly in Chapter 32.
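The same blend can be sketched on the CPU, which is how a software rasterizer like ours would have to do it. This is a minimal version of linear blend skinning under a few assumptions: `Bone` is a 3x4 affine matrix, weights sum to 1, and all type and function names are invented for this example. Blending the transformed positions gives the same result as blending the matrices first, because the operation is linear.

```cpp
struct Pt { float x, y, z; };
struct Bone { float m[3][4]; };  // 3x3 linear part plus a translation column

// Apply one bone's affine transform to a point.
Pt applyBone(const Bone& b, Pt p) {
    return {b.m[0][0] * p.x + b.m[0][1] * p.y + b.m[0][2] * p.z + b.m[0][3],
            b.m[1][0] * p.x + b.m[1][1] * p.y + b.m[1][2] * p.z + b.m[1][3],
            b.m[2][0] * p.x + b.m[2][1] * p.y + b.m[2][2] * p.z + b.m[2][3]};
}

// Linear blend skinning: weighted sum of the positions each bone produces.
// Unused influence slots should carry a weight of 0.
Pt skin(Pt p, const Bone* bones, const int index[4], const float weight[4]) {
    Pt out = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; i++) {
        Pt q = applyBone(bones[index[i]], p);
        out.x += q.x * weight[i];
        out.y += q.y * weight[i];
        out.z += q.z * weight[i];
    }
    return out;
}
```

A vertex weighted half to a stationary bone and half to a bone translated 2 units along X ends up moved 1 unit along X, which is what lets skin stretch smoothly across a joint.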


Summary

| Concept | Key Takeaway |
| --- | --- |
| Vertex attributes | Position, normal, UV, color -- bundled per vertex, interpolated across triangles |
| MVP transform | Projection (camera→clip) * View (world→camera) * Model (object→world); the rightmost matrix applies first |
| Vertex shader | Runs once per vertex, transforms position, passes attributes through |
| Clipping | Discards or trims triangles outside the view frustum |
| Perspective divide | Divide by w to get NDC. Makes distant things smaller. |
| Viewport transform | NDC [-1,1] → screen pixels [0,width] x [0,height]. Don't forget Y-flip. |
| Backface culling | Discard triangles facing away from camera using winding order |
| Homogeneous coordinates | 4D vectors that make translation and perspective into matrix multiplies |


Index Buffers and Vertex Sharing

In our cube example, we defined 8 vertices but needed 36 index entries (12 triangles x 3). However, there's a subtlety: do we actually share vertices across faces?

For a cube with per-face normals (each face has a different normal), a vertex at a corner can't be shared between adjacent faces -- it needs different normals depending on which face it belongs to. This means we actually need 24 unique vertices (4 per face x 6 faces), not 8.

Corner vertex needs different normals for each face:

     ↑ Normal for top face
     │
     ●── → Normal for right face
    ╱
   ╱
  Normal for front face

  Same position, THREE different vertices (one per face)

This is a common source of confusion. Vertices are shared only when ALL attributes match, not just position. If two triangles meet at a corner but have different normals (hard edge) or different UVs (texture seam), the vertex must be duplicated.

Smooth objects (spheres, organic shapes) can share vertices across triangles because the normals vary smoothly. Hard-edged objects (cubes, mechanical parts) need separate vertices at each sharp edge.

// Hard edge: 2 separate vertices at the same position
Vertex leftFace  = {{-0.5, 0.5, 0.5}, {-1, 0, 0}, {1, 1}, {1,1,1}};  // normal points left
Vertex frontFace = {{-0.5, 0.5, 0.5}, { 0, 0, 1}, {0, 1}, {1,1,1}};  // normal points forward
// Same position, different normals → can't share

// Smooth surface: shared vertex
// Both adjacent triangles use the same vertex with the averaged normal
Vertex shared = {{0, 1, 0}, {0, 1, 0}, {0.5, 1.0}, {1,1,1}};
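The "share only when ALL attributes match" rule can be turned into a small index-buffer builder. The following is a sketch under assumptions: the `Vertex` layout mirrors the initializers above (position, normal, texcoord, color), while `IndexedMesh`, `keyOf`, and `buildIndexed` are hypothetical names. It compares floats for exact equality, which suits data copied from the same source but not values that differ by rounding.

```cpp
#include <map>
#include <tuple>
#include <vector>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Assumed layout, matching the order used in the initializers above.
struct Vertex {
    Vec3 position;
    Vec3 normal;
    Vec2 texcoord;
    Vec3 color;
};

struct IndexedMesh {
    std::vector<Vertex> vertices;  // unique vertices
    std::vector<int> indices;      // one entry per input corner
};

// The key contains every attribute, so two corners merge only if all of them match.
using VertexKey = std::tuple<float, float, float,   // position
                             float, float, float,   // normal
                             float, float,          // texcoord
                             float, float, float>;  // color

VertexKey keyOf(const Vertex& v) {
    return std::make_tuple(v.position.x, v.position.y, v.position.z,
                           v.normal.x, v.normal.y, v.normal.z,
                           v.texcoord.x, v.texcoord.y,
                           v.color.x, v.color.y, v.color.z);
}

// Takes one Vertex per triangle corner and returns unique vertices plus indices.
IndexedMesh buildIndexed(const std::vector<Vertex>& corners) {
    IndexedMesh mesh;
    std::map<VertexKey, int> seen;  // attribute tuple -> index of the unique vertex
    for (const Vertex& v : corners) {
        const VertexKey key = keyOf(v);
        auto it = seen.find(key);
        if (it == seen.end()) {
            it = seen.emplace(key, static_cast<int>(mesh.vertices.size())).first;
            mesh.vertices.push_back(v);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

Feeding it the `leftFace` and `frontFace` corners from above keeps both vertices, because their normals differ even though the positions are identical. Feeding it the same smooth-surface vertex several times stores it once and repeats only the index.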

Loading Mesh Data

For anything beyond a cube, you'll want to load mesh data from a file rather than defining it by hand. The simplest common format is OBJ (Wavefront .obj):

# A simple OBJ file
v -1.0 -1.0  0.0    # vertex 1
v  1.0 -1.0  0.0    # vertex 2
v  0.0  1.0  0.0    # vertex 3

vn 0.0 0.0 1.0      # normal (facing +Z)

vt 0.0 0.0          # texture coord 1
vt 1.0 0.0          # texture coord 2
vt 0.5 1.0          # texture coord 3

f 1/1/1 2/2/1 3/3/1  # face: vertex/texcoord/normal indices

A minimal OBJ loader for our rasterizer:

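#include <cstdio>   // sscanf
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct OBJMesh {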
    std::vector<Vertex> vertices;
    std::vector<int> indices;
};

OBJMesh loadOBJ(const char* filename) {
    std::ifstream file(filename);
    if (!file) return {};  // could not open the file: return an empty mesh
    std::string line;

    std::vector<Vec3> positions;
    std::vector<Vec3> normals;
    std::vector<Vec2> texcoords;
    OBJMesh mesh;

    while (std::getline(file, line)) {
        std::istringstream iss(line);
        std::string prefix;
        iss >> prefix;

        if (prefix == "v") {
            Vec3 p;
            iss >> p.x >> p.y >> p.z;
            positions.push_back(p);
        } else if (prefix == "vn") {
            Vec3 n;
            iss >> n.x >> n.y >> n.z;
            normals.push_back(n);
        } else if (prefix == "vt") {
            Vec2 t;
            iss >> t.x >> t.y;
            texcoords.push_back(t);
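        } else if (prefix == "f") {
            // Parse the three corners of a triangular face. Each corner is
            // "v", "v/vt", "v//vn", or "v/vt/vn"; an index of 0 means "not given".
            Vertex corner[3];
            bool ok = true;
            for (int i = 0; i < 3 && ok; i++) {
                std::string vertStr;
                iss >> vertStr;
                int vi = 0, ti = 0, ni = 0;
                if (sscanf(vertStr.c_str(), "%d/%d/%d", &vi, &ti, &ni) < 2)
                    sscanf(vertStr.c_str(), "%d//%d", &vi, &ni);

                // OBJ is 1-indexed; negative (relative) indices are not supported here
                ok = vi >= 1 && vi <= (int)positions.size();
                if (!ok) break;
                corner[i].position = positions[vi - 1];
                corner[i].texcoord = (ti >= 1 && ti <= (int)texcoords.size()) ? texcoords[ti - 1] : Vec2{0, 0};
                corner[i].normal = (ni >= 1 && ni <= (int)normals.size()) ? normals[ni - 1] : Vec3{0, 1, 0};
                corner[i].color = {1, 1, 1};
            }
            // Append the triangle only if all three corners were valid
            for (int i = 0; ok && i < 3; i++) {
                mesh.indices.push_back((int)mesh.vertices.size());
                mesh.vertices.push_back(corner[i]);
            }
        }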
    }
    return mesh;
}

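This is a simplified loader -- a production OBJ loader handles faces with more than 3 vertices (quads and n-gons), robustly handles missing texture coordinates and normals, reads material libraries (.mtl files), and covers other edge cases. Note that this loader also emits a fresh vertex for every face corner, so nothing is shared yet, even where all attributes match. We'll build a more complete loader in Chapter 48.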
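One of those gaps, faces with more than three vertices, is commonly closed with fan triangulation: keep the first corner fixed and pair it with each consecutive pair of remaining corners. The sketch below uses a hypothetical helper, `fanTriangulate`, that works on the raw corner tokens of an `f` line. It is only valid for convex, planar polygons, and it stops at a `#` token because the sample file above puts comments at the end of data lines.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits one OBJ face line into triangles using a fan around the first corner.
// Returns corner tokens such as "1/1/1", three per triangle.
// Assumes the polygon is convex and planar.
std::vector<std::string> fanTriangulate(const std::string& faceLine) {
    std::istringstream iss(faceLine);
    std::string token;
    iss >> token;  // skip the leading "f"

    std::vector<std::string> corners;
    while (iss >> token) {
        if (token[0] == '#') break;  // the rest of the line is a comment
        corners.push_back(token);
    }

    std::vector<std::string> triangles;
    // Corner 0 is shared by every triangle: (0,1,2), (0,2,3), (0,3,4), ...
    for (std::size_t i = 1; i + 1 < corners.size(); i++) {
        triangles.push_back(corners[0]);
        triangles.push_back(corners[i]);
        triangles.push_back(corners[i + 1]);
    }
    return triangles;
}
```

A quad such as `f 1/1/1 2/2/1 3/3/1 4/4/1` comes back as six tokens describing the triangles (1, 2, 3) and (1, 3, 4). Each token can then go through the same per-corner parsing that the loader already applies.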

With this loader, you can render meshes exported from Blender or downloaded from the internet. The Stanford Bunny, the Utah Teapot, and Suzanne (Blender's monkey head) are classic test meshes available in OBJ format.


Exercises

  1. Matrix debugging: Print out each matrix (model, view, projection) and verify they make sense. The identity matrix should produce vertices that don't move. A translation-only model matrix should shift vertices but not rotate them. Trace a known vertex through the full MVP chain and verify each intermediate result.

  2. Camera movement: Modify the camera position and target in the lookAt call. What happens when the camera is inside the cube? What happens when the camera looks straight down the Y axis? (Hint: the up vector and forward vector become parallel -- what goes wrong?)

  3. Aspect ratio experiment: Change the viewport to non-square dimensions (like 800x200). What happens if you don't update the projection matrix's aspect ratio to match? What visual distortion do you see?

  4. Backface culling toggle: Add a boolean to enable/disable backface culling. Count how many triangles are drawn with and without it. For a cube viewed from one corner, what fraction of triangles get culled?

  5. Multiple objects: Render two cubes side by side using different model matrices. This is how instancing works at a basic level -- same geometry, different transforms. Then render 100 cubes in a grid pattern.

  6. Orthographic projection: Implement an orthographic projection matrix (no perspective, parallel projection). Compare the output to perspective projection. When would orthographic be useful? (Hint: 2D games, CAD applications, shadow mapping.)

Mat4 orthographic(float left, float right, float bottom, float top, float near, float far) {
    Mat4 m = {};
    m.at(0,0) = 2.0f / (right - left);
    m.at(1,1) = 2.0f / (top - bottom);
    m.at(2,2) = -2.0f / (far - near);
    m.at(0,3) = -(right + left) / (right - left);
    m.at(1,3) = -(top + bottom) / (top - bottom);
    m.at(2,3) = -(far + near) / (far - near);
    m.at(3,3) = 1.0f;
    return m;
}

  7. OBJ loader: Implement the simple OBJ loader shown in this chapter and load a mesh from the internet (try the Stanford Bunny: google "stanford bunny obj"). Render it with your software rasterizer. This is a satisfying milestone -- the first time you render a complex 3D model from scratch.

  8. Field of view experiment: Render the same scene with a 30-degree FOV and a 120-degree FOV. How does the perspective distortion change? A narrow FOV compresses depth (telephoto effect). A wide FOV exaggerates it and stretches objects near the edges of the screen (wide-angle effect). Most games use 60-90 degrees.