Commit 7802d83

chore: init docs
1 parent 08598a3 commit 7802d83

10 files changed

Lines changed: 856 additions & 0 deletions

docs/arch/README.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
1+
# Aegis Architecture Overview
2+
3+
Aegis is a parameterized FPGA fabric generator written in Dart using the
4+
ROHD hardware description framework. It outputs synthesizable SystemVerilog
5+
for the entire FPGA, from logic tiles to I/O pads to the configuration
6+
chain. This document describes the silicon architecture: how the fabric is
7+
structured, how tiles connect, and how a bitstream programs the device.
8+
9+
## Device Hierarchy
10+
11+
An Aegis device is organized as a layered hierarchy:
12+
13+
```mermaid
14+
graph TD
15+
FPGA[AegisFPGA]
16+
FPGA --> Loader[FabricConfigLoader]
17+
FPGA --> Clock["ClockTile[0..N]"]
18+
FPGA --> IO[IOFabric]
19+
IO --> IOTile["IOTile[0..P]"]
20+
IO --> SerDes["SerDesTile[0..S]"]
21+
FPGA --> Fabric[LutFabric]
22+
Fabric --> Tiles["Tile[x][y] (LUT, BRAM, or DSP)"]
23+
```
24+
25+
The `LutFabric` is a rectangular grid of tiles. Each standard LUT tile contains a
26+
configurable logic block (CLB) and a routing switchbox. Specialized columns
27+
replace standard LUT tiles with BRAM or DSP tiles at regular intervals.
28+
29+
The `IOFabric` wraps the grid perimeter with I/O pads and SerDes
30+
transceivers. Clock tiles sit outside the fabric and distribute divided
31+
clocks to all tiles.
32+
33+
## Fabric Grid Layout
34+
35+
The grid is `width x height` tiles. Columns are specialized based on
36+
their index:
37+
38+
- **BRAM columns**: placed at every `bramColumnInterval` columns
39+
- **DSP columns**: placed at every `dspColumnInterval` columns, skipping
40+
BRAM positions
41+
- **LUT columns**: all remaining columns
42+
43+
For the Terra 1 device (48 columns x 64 rows, `bramColumnInterval = 16`, `dspColumnInterval = 24`):
44+
45+
| Column Type | Count | Tiles per Column | Total Tiles |
46+
|-------------|-------|------------------|-------------|
47+
| LUT | 45 | 64 | 2,880 |
48+
| BRAM | 2 | 64 | 128 |
49+
| DSP | 1 | 64 | 64 |
50+
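The column rules can be sketched in a few lines of Python. This is only an illustration, not the generator's code: the function names are hypothetical, and the exact placement rule (`x > 0 and x % interval == 0`) is an assumption chosen because it reproduces the Terra 1 counts in the table above.

```python
from collections import Counter


def column_type(x, bram_interval, dsp_interval):
    """Classify column index x. The `x > 0` guard is an assumption."""
    if x > 0 and x % bram_interval == 0:
        return "BRAM"
    # DSP columns skip any position already taken by a BRAM column.
    if x > 0 and x % dsp_interval == 0:
        return "DSP"
    return "LUT"


def column_counts(width, bram_interval, dsp_interval):
    """Count how many columns of each type a fabric of this width has."""
    return Counter(
        column_type(x, bram_interval, dsp_interval) for x in range(width)
    )


# Terra 1: 48 columns, 64 tiles per column.
terra1 = column_counts(48, 16, 24)
tiles = {kind: n * 64 for kind, n in terra1.items()}
```

Under this assumed rule, columns 16 and 32 are BRAM, column 24 is DSP, and the remaining 45 columns are LUT.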
51+
## Carry Chains
52+
53+
Each column has a vertical carry chain running south to north. The bottom
54+
tile in each column receives `carryIn = 0`, and each tile's `carryOut`
55+
feeds the `carryIn` of the tile above it. This enables fast arithmetic
56+
(adders, counters) without routing through the switchbox.
57+
58+
BRAM tiles pass the carry signal through unchanged.
59+
60+
## Edge I/O
61+
62+
The fabric's four edges aggregate tile outputs using wired-OR. Any tile
63+
on an edge can drive the corresponding external output. I/O pads on the
64+
perimeter connect to these edge signals:
65+
66+
- North edge: `width` pads (left to right)
67+
- East edge: `height` pads (top to bottom)
68+
- South edge: `width` pads (left to right)
69+
- West edge: `height` pads (top to bottom)
70+
71+
Total pads = `2 * width + 2 * height` (224 for Terra 1).
72+
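As a rough Python sketch of the pad arithmetic: the helper names are hypothetical, the north/east/south/west enumeration order simply follows the list above and is assumed rather than taken from the generator, and `wired_or` models only the logical OR of whatever drivers share an edge signal.

```python
def total_pads(width, height):
    """Perimeter pad count: one pad per edge tile on each of the four edges."""
    return 2 * width + 2 * height


def edge_pads(width, height):
    """List (edge, position) pairs; the N, E, S, W ordering is an assumption."""
    pads = [("north", x) for x in range(width)]
    pads += [("east", y) for y in range(height)]
    pads += [("south", x) for x in range(width)]
    pads += [("west", y) for y in range(height)]
    return pads


def wired_or(drivers):
    """An edge signal is high if any tile driving it is high."""
    return int(any(drivers))
```

For Terra 1, `total_pads(48, 64)` evaluates to 224, matching the figure above.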
73+
## Configuration
74+
75+
The entire device is programmed through a single serial shift register
76+
chain. Bits are shifted in through the clock tiles, then through I/O
77+
tiles, then SerDes tiles, and finally through the fabric tiles in
78+
row-major order. A `cfgLoad` pulse transfers the shift register contents
79+
to the active configuration registers in parallel.
80+
81+
See [Configuration Chain](configuration.md) for the full protocol.
82+
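The shift-then-load behavior can be illustrated with a small Python model. It is a sketch under assumptions: the class and method names are hypothetical, the direction in which bits move along the chain is assumed, and the tile labels in `chain_order` are illustrative only.

```python
class ConfigChain:
    """Serial shift register with a separate set of active registers."""

    def __init__(self, length):
        self.shadow = [0] * length  # serial shift register
        self.active = [0] * length  # registers that actually drive the fabric

    def shift_in(self, bit):
        # The new bit enters at position 0; older bits move one step along.
        self.shadow = [bit & 1] + self.shadow[:-1]

    def cfg_load(self):
        # A cfgLoad pulse copies the whole shift register in parallel.
        self.active = list(self.shadow)


def chain_order(n_clock, n_io, n_serdes, width, height):
    """Tile labels in the order the chain visits them."""
    order = [f"clock[{i}]" for i in range(n_clock)]
    order += [f"io[{i}]" for i in range(n_io)]
    order += [f"serdes[{i}]" for i in range(n_serdes)]
    # Row-major: every column of row 0, then row 1, and so on.
    order += [f"tile[{x}][{y}]" for y in range(height) for x in range(width)]
    return order


chain = ConfigChain(4)
for bit in [1, 0, 1, 1]:
    chain.shift_in(bit)
before_load = list(chain.active)  # still all zeros: shifting is invisible
chain.cfg_load()
```

The point of the two-register arrangement is that a partially shifted bitstream never reaches the fabric; only the `cfgLoad` pulse makes the new configuration active.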
83+
## Tile Documentation
84+
85+
- [CLB (Configurable Logic Block)](clb.md)
86+
- [Routing](routing.md)
87+
- [BRAM (Block RAM)](bram.md)
88+
- [DSP (Digital Signal Processing)](dsp.md)
89+
- [I/O Pad](io.md)
90+
- [SerDes](serdes.md)
91+
- [Clock Tile](clock.md)
92+
- [Configuration Chain](configuration.md)
93+
94+
## Other Documentation
95+
96+
- [PDK Integration and Tapeout](pdk.md)

docs/arch/bram.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
# Block RAM (BRAM) Tile
2+
3+
BRAM tiles provide on-chip memory distributed across the fabric in
4+
dedicated columns. Each BRAM tile implements a dual-port synchronous RAM
5+
that can be read and written independently from two directions.
6+
7+
## Parameters
8+
9+
| Parameter | Default | Description |
10+
|--------------|---------|-------------------------------|
11+
| Data width | 8 bits | Width of each memory word |
12+
| Address width| 7 bits | Address bus width |
13+
| Depth | 128 | Number of words (2^addrWidth) |
14+
15+
## Ports
16+
17+
The two ports are mapped to the tile's directional routing:
18+
19+
- **Port A**: input from the north, output to the south
20+
- **Port B**: input from the west, output to the east
21+
22+
Data, address, and write-enable signals are packed into the routing
23+
tracks. The packing adapts to the available track width:
24+
25+
```
26+
[effAddrWidth-1 : 0] Address bits
27+
[effAddrWidth+effDataWidth-1 : effAddrWidth] Data bits
28+
[effAddrWidth+effDataWidth] Write-enable (if tracks allow)
29+
```
30+
31+
If the track width is narrower than the full address + data width, the
32+
signals are truncated and zero-extended.
33+
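A possible reading of the packing, sketched in Python. The function name is hypothetical, and the way the effective widths are derived (address bits first, then as many data bits as remain, write-enable read as 0 when no track is left for it) is an assumption; the source only gives the bit positions.

```python
def unpack_port(tracks, track_width, addr_width=7, data_width=8):
    """Split a routing-track word into (address, data, write_enable)."""
    # Assumed rule: address gets priority, data takes what is left.
    eff_addr = min(addr_width, track_width)
    eff_data = min(data_width, track_width - eff_addr)

    addr = tracks & ((1 << eff_addr) - 1)
    data = (tracks >> eff_addr) & ((1 << eff_data) - 1)

    # Write-enable sits just above the data bits, if a track exists there.
    we_pos = eff_addr + eff_data
    we = (tracks >> we_pos) & 1 if track_width > we_pos else 0
    return addr, data, we


# 16 tracks: full 7-bit address, 8-bit data, and write-enable all fit.
wide = unpack_port(0x55 | (0xA3 << 7) | (1 << 15), track_width=16)

# 10 tracks: only 3 data bits fit and there is no room for write-enable.
narrow = unpack_port(0x3FF, track_width=10)
```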
34+
## Read/Write Behavior
35+
36+
**Writes** are synchronous. On the rising clock edge, if the port is
37+
enabled and write-enable is asserted, the data word is stored at the
38+
given address. Both ports can write simultaneously (true dual-port).
39+
40+
**Reads** are asynchronous (combinational). When a port is enabled, the
41+
data at the addressed location is continuously driven onto the output
42+
tracks. When disabled, the output is zero.
43+
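The read/write rules can be modelled behaviorally in Python. This is a sketch rather than the RTL: the class name and interface are hypothetical, and the outcome when both ports write the same address in one cycle is not specified above, so the model arbitrarily lets port B win.

```python
class DualPortBram:
    """Behavioral model: synchronous writes, combinational reads."""

    def __init__(self, addr_width=7, data_width=8, en_a=True, en_b=True):
        self.depth = 1 << addr_width
        self.mask = (1 << data_width) - 1
        self.mem = [0] * self.depth
        self.enable = {"A": en_a, "B": en_b}  # config bits [0] and [1]

    def clock_edge(self, writes):
        """Apply one rising edge. `writes` maps port -> (addr, data, we)."""
        # Ports are applied A then B, so B wins a same-address collision
        # (an assumption of this model, not documented behavior).
        for port in ("A", "B"):
            if port in writes and self.enable[port]:
                addr, data, we = writes[port]
                if we:
                    self.mem[addr % self.depth] = data & self.mask

    def read(self, port, addr):
        """Asynchronous read: a disabled port drives zero."""
        if not self.enable[port]:
            return 0
        return self.mem[addr % self.depth]


ram = DualPortBram()
ram.clock_edge({"A": (3, 0xAB, 1), "B": (5, 0xCD, 1)})
```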
44+
## Carry Chain
45+
46+
BRAM tiles pass the carry signal through unchanged (`carryOut = carryIn`).
47+
They do not consume or generate carry values.
48+
49+
## Configuration
50+
51+
| Bit | Field |
52+
|--------|---------------|
53+
| `[0]` | Port A enable |
54+
| `[1]` | Port B enable |
55+
| `[7:2]`| Reserved |
56+
57+
**Total: 8 bits**

docs/arch/clb.md

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
# Configurable Logic Block (CLB)
2+
3+
The CLB is the fundamental logic element of the Aegis fabric. Each CLB
4+
contains a 4-input lookup table (LUT4), a D flip-flop, and a carry chain
5+
multiplexer (MUXCY). Together these implement arbitrary 4-input
6+
combinational logic with optional registering and fast arithmetic.
7+
8+
## Block Diagram
9+
10+
```mermaid
11+
graph TD
12+
in0 --> LUT4["LUT4 (16-bit truth table)"]
13+
in1 --> LUT4
14+
in2 --> LUT4
15+
in3 --> LUT4
16+
LUT4 -- "lutOut" --> OutMux["Output Mux (cfg 16)"]
17+
LUT4 -- "lutOut" --> FF["D Flip-Flop"]
18+
clk --> FF
19+
FF -- "ffOut" --> OutMux
20+
OutMux --> out
21+
LUT4 -- "propagate (P)" --> MUXCY["MUXCY (cfg 17)"]
22+
carryIn --> MUXCY
23+
in0 -- "generate" --> MUXCY
24+
MUXCY --> carryOut
25+
```
26+
27+
## LUT4
28+
29+
The LUT4 implements any Boolean function of four inputs. It stores a
30+
16-bit truth table in configuration bits `[15:0]`. The output is selected
31+
by using the four inputs as an index into the truth table.
32+
33+
Internally, the LUT is a 4-stage multiplexer tree:
34+
35+
| Stage | Select | Muxes | Operation |
36+
|-------|--------|-------|----------------------------------------|
37+
| 0 | in0 | 8 | `s0[i] = mux(in0, cfg[2i+1], cfg[2i])` |
38+
| 1 | in1 | 4 | `s1[i] = mux(in1, s0[2i+1], s0[2i])` |
39+
| 2 | in2 | 2 | `s2[i] = mux(in2, s1[2i+1], s1[2i])` |
40+
| 3 | in3 | 1 | `out = mux(in3, s2[1], s2[0])` |
41+
42+
The effective computation is `out = cfg[{in3, in2, in1, in0}]`.
43+
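To see that the mux tree and the direct index agree, here is a small Python check. The function names are hypothetical, and `mux(sel, hi, lo)` is assumed to return its second argument when `sel` is 1, which is the argument order consistent with the effective computation stated above.

```python
def mux(sel, hi, lo):
    """2:1 mux: returns `hi` when sel is 1, `lo` when sel is 0."""
    return hi if sel else lo


def lut4_tree(cfg, in0, in1, in2, in3):
    """Evaluate the LUT through the four mux stages from the table."""
    bits = [(cfg >> i) & 1 for i in range(16)]
    s0 = [mux(in0, bits[2 * i + 1], bits[2 * i]) for i in range(8)]
    s1 = [mux(in1, s0[2 * i + 1], s0[2 * i]) for i in range(4)]
    s2 = [mux(in2, s1[2 * i + 1], s1[2 * i]) for i in range(2)]
    return mux(in3, s2[1], s2[0])


def lut4_index(cfg, in0, in1, in2, in3):
    """Evaluate the LUT as cfg[{in3, in2, in1, in0}]."""
    index = (in3 << 3) | (in2 << 2) | (in1 << 1) | in0
    return (cfg >> index) & 1
```

Each stage halves the number of candidate bits, so after four stages exactly one truth-table bit remains.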
44+
### Common Truth Tables
45+
46+
| Function | Truth Table | Notes |
47+
|--------------|-------------|------------------------------|
48+
| 2-input AND | `0x8888` | on in0, in1 |
49+
| 2-input OR | `0xEEEE` | on in0, in1 |
50+
| 2-input XOR | `0x6666` | also used as carry propagate |
51+
| NOT | `0x5555` | inverts in0 |
52+
| Constant 0 | `0x0000` | |
53+
| Constant 1 | `0xFFFF` | |
54+
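The values in this table can be derived mechanically. The following Python sketch uses a hypothetical helper, `truth_table`, that sets bit `{in3, in2, in1, in0}` of the result whenever the given function is true for those inputs.

```python
def truth_table(fn):
    """Build the 16-bit LUT4 config word for a function of (in0..in3)."""
    cfg = 0
    for idx in range(16):
        # Bit k of the index is input k, matching cfg[{in3, in2, in1, in0}].
        in0, in1, in2, in3 = ((idx >> k) & 1 for k in range(4))
        if fn(in0, in1, in2, in3):
            cfg |= 1 << idx
    return cfg


and2 = truth_table(lambda a, b, c, d: a & b)
or2 = truth_table(lambda a, b, c, d: a | b)
xor2 = truth_table(lambda a, b, c, d: a ^ b)
not0 = truth_table(lambda a, b, c, d: 1 - a)
```

Because unused inputs are ignored by the function, the same 4-bit pattern repeats across all four nibbles of each word.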
55+
## D Flip-Flop
56+
57+
The flip-flop captures the LUT output on the rising clock edge. Config
58+
bit `[16]` selects whether the CLB output comes from the flip-flop
59+
(registered) or directly from the LUT (combinational).
60+
61+
- `cfg[16] = 0`: `out = LUT output` (combinational)
62+
- `cfg[16] = 1`: `out = FF output` (registered)
63+
64+
## Carry Chain (MUXCY)
65+
66+
The carry chain enables fast arithmetic by bypassing the general routing
67+
fabric. When carry mode is enabled (config bit `[17] = 1`):
68+
69+
- The LUT output acts as the **propagate** signal (P)
70+
- `carryOut = P ? carryIn : in0`
71+
- `sum = P ^ carryIn` (fast XOR for adder sum bit)
72+
73+
When carry mode is disabled (`cfg[17] = 0`):
74+
- `carryOut = 0`
75+
- The CLB output is the normal LUT/FF output
76+
77+
Carry chains propagate vertically through a column (south to north),
78+
enabling multi-bit adders and counters without consuming routing
79+
resources.
80+
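As an illustration of how these equations form an adder, here is a Python sketch. The function names are hypothetical; it assumes each tile's LUT is programmed as XOR (`0x6666`) with `in0 = a[i]` and `in1 = b[i]`, and that the bottom tile's `carryIn` is 0.

```python
def clb_carry(lut_out, in0, carry_in):
    """One CLB in carry mode: returns (sum, carryOut)."""
    p = lut_out                            # LUT output is the propagate signal
    carry_out = carry_in if p else in0     # MUXCY: P ? carryIn : in0
    return p ^ carry_in, carry_out


def ripple_add(a, b, width):
    """Add two unsigned values using one carry-mode CLB per bit."""
    carry = 0  # carryIn of the bottom tile in the column
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        p = a_i ^ b_i  # what a LUT programmed with 0x6666 would output
        s, carry = clb_carry(p, a_i, carry)
        total |= s << i
    return total, carry
```

When the propagate signal is 0 the two operand bits are equal, so using `in0` as the generate value is enough: the carry out is 1 exactly when both bits are 1.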
81+
## Configuration Bit Layout
82+
83+
| Bits | Width | Field |
84+
|----------|-------|-------------|
85+
| `[15:0]` | 16 | LUT truth table |
86+
| `[16]` | 1 | FF enable (1 = registered) |
87+
| `[17]` | 1 | Carry mode enable |
88+
89+
**Total: 18 bits**
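
A short Python sketch of packing and unpacking this 18-bit word; the helper names and dictionary keys are hypothetical, and only the bit positions come from the table above.

```python
def pack_clb(lut, use_ff, carry):
    """Build the 18-bit CLB config word from its three fields."""
    return (lut & 0xFFFF) | (int(use_ff) << 16) | (int(carry) << 17)


def unpack_clb(cfg):
    """Split an 18-bit CLB config word back into its fields."""
    return {
        "lut": cfg & 0xFFFF,          # bits [15:0]
        "use_ff": (cfg >> 16) & 1,    # bit [16], 1 = registered output
        "carry": (cfg >> 17) & 1,     # bit [17], carry mode enable
    }


# An adder bit: XOR truth table, combinational output, carry mode on.
adder_bit = pack_clb(0x6666, use_ff=False, carry=True)
```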

docs/arch/clock.md

Lines changed: 72 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,72 @@
1+
# Clock Tile
2+
3+
Clock tiles generate divided clock signals from a reference clock and
4+
distribute them to the fabric. Each clock tile provides four independent
5+
outputs, each with its own divider, phase offset, and duty cycle control.
6+
7+
## Outputs
8+
9+
Each of the four outputs can be independently configured:
10+
11+
| Feature | Range / Options |
12+
|-------------|----------------------------------|
13+
| Divider | 1 to 256 (8-bit value N, divides by N+1) |
14+
| Phase | 0, 90, 180, or 270 degrees |
15+
| Duty cycle | 50% toggle or single-cycle pulse |
16+
| Enable | Per-output enable bit |
17+
18+
## Divider
19+
20+
Each output has an 8-bit counter that counts from 0 to the configured
21+
divider value. This divides the reference clock frequency by
22+
`(divider + 1)`, giving a range of divide-by-1 to divide-by-256.
23+
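The divide ratio follows directly from the counter range, as this Python sketch shows. The function names are hypothetical, and the counter model assumes the count wraps to 0 on the cycle after it reaches the configured value.

```python
def divided_frequency(ref_hz, divider_cfg):
    """Output frequency for an 8-bit divider field (divides by N + 1)."""
    if not 0 <= divider_cfg <= 255:
        raise ValueError("divider field is 8 bits wide")
    return ref_hz / (divider_cfg + 1)


def counter_sequence(divider_cfg, cycles):
    """Counter value on each reference clock cycle."""
    count = 0
    seq = []
    for _ in range(cycles):
        seq.append(count)
        # Wrap after reaching the configured value (assumed behavior).
        count = 0 if count == divider_cfg else count + 1
    return seq
```

With a divider value of 3 the counter visits 0, 1, 2, 3 and then repeats, so one output period spans four reference cycles.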
24+
## Phase Control
25+
26+
Phase offset shifts the output clock relative to the reference. The
27+
offset is computed as a fraction of the divider period:
28+
29+
| Phase Select | Offset Cycles |
30+
|--------------|---------------------------|
31+
| `00` (0) | 0 |
32+
| `01` (90) | divider / 4 |
33+
| `10` (180) | divider / 2 |
34+
| `11` (270) | divider / 2 + divider / 4 |
35+
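The table maps to a simple lookup, sketched here in Python. The function name is hypothetical, and truncating integer division is an assumption about how the hardware evaluates `divider / 4` and `divider / 2`.

```python
def phase_offset_cycles(phase_sel, divider):
    """Offset in reference cycles for a 2-bit phase select."""
    offsets = {
        0b00: 0,
        0b01: divider // 4,
        0b10: divider // 2,
        0b11: divider // 2 + divider // 4,
    }
    return offsets[phase_sel]


# A divider value of 8 gives evenly spaced quarter-period offsets.
quarters = [phase_offset_cycles(sel, 8) for sel in range(4)]
```

Under that assumption, divider values that are not multiples of 4 give only approximate phases: for a divider of 7 the 270 degree setting yields `3 + 1 = 4` cycles.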
36+
## Duty Cycle
37+
38+
In **50% duty mode** (`duty = 1`), the output toggles at the midpoint of
39+
each period, producing a symmetric square wave.
40+
41+
In **pulse mode** (`duty = 0`), the output pulses high for one reference
42+
clock cycle at the phase offset point and remains low otherwise.
43+
44+
## Lock Indicator
45+
46+
The `locked` output is asserted when all enabled clock outputs have
47+
completed at least one full division cycle. This can be used for
48+
synchronization or to gate downstream logic until clocks are stable.
49+
50+
## Configuration
51+
52+
| Bits | Field |
53+
|-------------|---------------|
54+
| `[0]` | Global enable |
55+
| `[8:1]` | Divider 0 (divide ratio minus 1) |
56+
| `[16:9]` | Divider 1 (divide ratio minus 1) |
57+
| `[24:17]` | Divider 2 (divide ratio minus 1) |
58+
| `[32:25]` | Divider 3 (divide ratio minus 1) |
59+
| `[34:33]` | Phase 0 |
60+
| `[36:35]` | Phase 1 |
61+
| `[38:37]` | Phase 2 |
62+
| `[40:39]` | Phase 3 |
63+
| `[41]` | Enable 0 |
64+
| `[42]` | Enable 1 |
65+
| `[43]` | Enable 2 |
66+
| `[44]` | Enable 3 |
67+
| `[45]` | Duty 0 |
68+
| `[46]` | Duty 1 |
69+
| `[47]` | Duty 2 |
70+
| `[48]` | Duty 3 |
71+
72+
**Total: 49 bits**
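
The bit layout above can be exercised with a small Python packer. This is a sketch: the function and parameter names are hypothetical, and it assumes each divider field stores the desired divide ratio minus 1, in line with the "divides by N+1" rule.

```python
def pack_clock_cfg(global_enable, ratios, phases, enables, duties):
    """Build the 49-bit clock tile config word.

    ratios: four divide ratios in 1..256; phases: four 2-bit selects;
    enables, duties: four booleans each.
    """
    word = int(global_enable) & 1                       # bit [0]
    for i in range(4):
        if not 1 <= ratios[i] <= 256:
            raise ValueError("divide ratio must be 1..256")
        word |= (ratios[i] - 1) << (1 + 8 * i)          # [8:1] .. [32:25]
        word |= (phases[i] & 0b11) << (33 + 2 * i)      # [34:33] .. [40:39]
        word |= (int(enables[i]) & 1) << (41 + i)       # [41] .. [44]
        word |= (int(duties[i]) & 1) << (45 + i)        # [45] .. [48]
    return word


cfg = pack_clock_cfg(
    True,
    ratios=[1, 2, 4, 256],
    phases=[0, 1, 2, 3],
    enables=[True, True, False, False],
    duties=[True, False, False, False],
)
```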
