diff --git a/source/accessories/amds/firmware/building-and-running-firmware.md b/source/accessories/amds/firmware/building-and-running-firmware.md
new file mode 100644
index 00000000..46914707
--- /dev/null
+++ b/source/accessories/amds/firmware/building-and-running-firmware.md
@@ -0,0 +1,83 @@
+# Building and Running Firmware
+
+This guide provides step-by-step instructions on how to configure, build, and flash the AMDS firmware onto different hardware targets (e.g., AMDS and 2S).
+
+## Prerequisites
+
+- **IDE:** STM32CubeIDE (or your preferred C/C++ IDE configured for ARM Cortex-M development).
+- **Hardware:** ST-Link V2/V3 or equivalent hardware debugger/programmer.
+- **Target Board:** Either an AMDS board or an AMDS-compatible board.
+
+## Multi-Target Firmware Project (Custom Build Configurations)
+
+The firmware is designed to operate on multiple target hardware platforms using a single, unified codebase.
+
+- **Target Definitions**: The firmware uses `TARGET_AMDS` and `TARGET_2S` preprocessor macros to conditionally compile board-specific configurations.
+- **Dynamic Peripheral Assignment**: Depending on the selected target, the system correctly configures the corresponding hardware peripherals. For example, `TARGET_AMDS` utilizes `UART4` and `UART5` for the Daisy Chain RX lines, while `TARGET_2S` relies on `USART6` and `USART1`.
+- **Custom Run Configurations**: You can program either an AMDS or a 2S board without creating separate project branches, simply by switching the target macro through your build/run configurations.
+
+## Step 1: Open the Project
+
+1. Launch STM32CubeIDE.
+2. Go to **File > Open Projects from File System...**
+3. Select the directory containing the firmware source code (`AMDS\Mainboard\Firmware\mainboard\`) and click **Finish**.
+
+## Step 2: Set the Build Configuration (Target Macro)
+
+The firmware uses preprocessor macros to conditionally compile the correct peripheral assignments and active sensor masks for your specific board. Two targets are currently supported:
+
+- `AMDS`: This is the standard AMDS hardware as documented on this website.
+- `2S`: This is a new target that has only two sensor cards on hardware that is not yet publicly released.
+
+```{tip}
+Nearly all users are on AMDS hardware. When in doubt, select the `AMDS` option.
+```
+
+1. Right-click on `mainboard` and go to **Build Configuration > Set Active**, then select **AMDS** or **2S**.
+
+*Note: Alternatively, select the appropriate active configuration from the Build "hammer" dropdown menu.*
+
+## Step 3: Build the Project
+
+1. **Clean** the project to ensure no artifact mix-ups from previous board builds: Go to **Project > Clean...** and select your project.
+2. **Build** the project: Click the **Build** (hammer) icon or go to **Project > Build Project**.
+3. Check the console output to ensure there are no compilation errors and that the build finishes successfully.
+
+```{important}
+If you did not set the build configuration in the previous step, you will see many compilation errors that look like this:
+
+`#error "Please define a target board (TARGET_AMDS or TARGET_2S)!"`
+```
+
+## Step 4: Configure the Run/Debug Settings
+
+1. Connect your ST-Link to your PC and the target board's SWD (Serial Wire Debug) header.
+2. Power on the target board.
+3. In STM32CubeIDE, go to **Run > Debug Configurations...**
+4. Double-click **STM32 Cortex-M C/C++ Application** to create a new configuration.
+5. In the **Main** tab, ensure that **C/C++ Application** points to the `.elf` file for your target: either `AMDS/mainboard.elf` or `2S/mainboard.elf`.
+6. In the **Debugger** tab, ensure the Debug probe is set to **ST-LINK** and the interface is set to **SWD**.
+7. Click **Apply**.
+
+```{image} images/debugger-config-options.svg
+:width: 75%
+```
+
+## Step 5: Flash and Verify
+
+1. Click **Debug** (or **Run**) from the configuration window to flash the firmware.
+2. The IDE will connect to the board, erase the necessary flash sectors, and write the new firmware.
+3. Once flashing is complete, if you are in Debug mode, click the **Resume** (play) button to start execution.
+4. **Verification:** Observe the board's behavior. Depending on your configuration, verify that the active sensor mask matches the target (AMDS enables all 8 channels, `0xFF`; 2S enables a subset, `0x11`) and that UART/DMA streams begin processing as expected.
+
+```{tip}
+For the AMDS board, a good indicator that things are running smoothly is the four LEDs near the MCU blinking in order.
+```
+
+## Switching Between Targets
+
+Because the project shares a single codebase, programming a different target (`AMDS` vs `2S`) is simple:
+
+1. Disconnect the current board and connect the new one.
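+2. Return to **Step 2** and select the other build configuration.
+3. Rebuild (**Step 3**), confirm that the debug configuration points to the new target's `.elf` file (**Step 4**), and flash (**Step 5**).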
diff --git a/source/accessories/amds/firmware/daisy-chain.md b/source/accessories/amds/firmware/daisy-chain.md
new file mode 100644
index 00000000..09dce5c8
--- /dev/null
+++ b/source/accessories/amds/firmware/daisy-chain.md
@@ -0,0 +1,59 @@
+# AMDS Daisy Chain
+
+This document outlines the architecture, setup, and salient details of the AMDS's Daisy Chain capability.
+
+## Overview
+
+The AMDC and AMDS allow up to three AMDS boards to be daisy chained together on each of the AMDC's GPIO ports, as shown below.
+
+```{image} images/daisy-chain.svg
+:width: 100%
+```
+
+Each AMDS can run the same firmware, and does not need to know it is in a daisy chain. To each AMDS, the board "downstream" from it (i.e., the board with a lower number in the image above) appears as `master`.
+
+## Theory of Operation
+
+Upon receiving a `SYNC_ADC` signal, the AMDS performs the following operations:
+
+1. Assert `SYNC_ADC` on its upstream port
+2. Collect and transmit sensor card ADC data as described in the [AMDS firmware article](index.md)
+3. Process data received on its incoming `DATA0` and `DATA1` ports from any upstream AMDS boards
+ - The header byte of each packet is incremented by `0x04`
+ - Data is transmitted to the corresponding downstream port; for example, if the packet arrived via the upstream `DATA0` port, it will go out the downstream `DATA0` port
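+
+As a worked illustration of the header arithmetic above (a sketch that assumes, purely for the example, a packet whose header byte is `0x90` when it leaves its originating board), a header forwarded through `k` downstream AMDS boards reaches the AMDC as:
+
+```{math}
+h_k = h_0 + 4k, \qquad
+h_0 = \mathtt{0x90}: \quad h_1 = \mathtt{0x94}, \quad h_2 = \mathtt{0x98}
+```
+
+Here `h_0` and `k` are names introduced only for this example. With three boards in a chain, a packet passes through at most two other AMDS boards, so `k` is at most 2.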
+
+## Hardware
+
+The UART links between each pair of boards all run at the same baud rate (20 Mbps).
+
+Currently released AMDS hardware relies on a daisy chain adapter board placed between each pair of AMDS boards to add the necessary transceivers. Details on this board can be found in the AMDS git repo's [`AMDS/Accessories/DaisyChainAdapter` directory](https://github.com/Severson-Group/AMDS/tree/develop/Accessories/DaisyChainAdapter).
+
+Custom cabling must be used between AMDS boards to transpose the UART `RX` and `TX` pins.
+
+## Architecture
+
+### Direct Memory Access (DMA) for Receiving Data
+
+To ensure near-zero CPU overhead when receiving incoming UART data, the firmware utilizes DMA streams to receive `DATA0` and `DATA1` data from upstream AMDS boards.
+
+- **Circular Buffers**: Incoming daisy-chain data is placed into `DAISY_RX1_Pool` and `DAISY_RX2_Pool`, both of which are 256-byte circular buffers (`AMDS_RX_BUF_SIZE`). Utilizing a 256-byte size allows for 8-bit integer math to handle wrap-around without complex modulo logic.
+- **Error Recovery**: In high-noise environments, UART hardware errors (Parity, Overrun, Noise, or Frame errors) can cause the hardware to drop the `DMAR` (DMA Receiver) bit, halting the stream. The UART Interrupt Service Routines (ISRs) actively monitor for these flags, clear them, and immediately re-enable the DMA requests to ensure continuous stream operation without resetting the device.
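+
+The wrap-around property mentioned above can be written out explicitly. This is a sketch that assumes the buffer indices are stored as unsigned 8-bit integers (for example C's `uint8_t`); the index names `head` and `tail` are hypothetical and do not come from the firmware:
+
+```{math}
+\text{next}(i) = (i + 1) \bmod 2^{8}, \qquad \text{next}(255) = 0, \qquad
+\text{unread} = (\text{head} - \text{tail}) \bmod 2^{8}
+```
+
+Because unsigned 8-bit arithmetic is already modulo 256, both expressions reduce to a plain increment or subtraction truncated to 8 bits, with no explicit modulo or comparison.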
+
+### Processing Data from Upstream AMDS Devices
+
+Data received from upstream devices is processed immediately after transmitting all data collected from local sensor cards. This is handled by the `process_routing()` function. The timing of this code is carefully optimized to minimize the total transmit time to the AMDC across the entire link.
+
+Implementation details:
+
+- **Collection of Complete Packets**: The code attempts to collect complete 3-byte packets prior to processing. Wait timeouts are implemented.
+- **Dual-Stream Optimization**: If both UART streams have at least a full 3-byte packet ready, the logic processes them completely interleaved. This keeps both hardware TX lines saturated simultaneously.
+- **Single-Stream Optimization**: If only one UART has a 3-byte packet (i.e., a different number of packets are broadcast due to `active_sensor_mask != 0xFF` on an upstream AMDS), the code follows a Single-Stream Fast Path.
+- **Fall-Back, Slow Path**: If a packet gets fragmented across a DMA boundary or becomes misaligned, the system reverts to a 1-byte-at-a-time State Machine (the "Slow Path") to recover the stream.
+- **Thread-safe Invocation**: The AMDS attempts to broadcast all DMA data within a single call to `process_routing()` from the `SYNC_ADC` interrupt context. However, if this times out, the firmware provides a fall-back path: the main `while(1)` loop constantly checks `drv_uart_has_dma_data()` and invokes `process_routing()` in a thread-safe manner if any further data arrives.
+
+## Performance
+
+Daisy chain benchmark testing shows the following complete transmission times from assertion of `SYNC_ADC` to the last bit arriving at the AMDC:
+
+- **24 sensors** (3x AMDS boards, each with 8 sensor cards): `27 us`
+- **6 sensors** (3x 2S boards, each with 2 sensor cards): `13.7 us`
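+
+These figures can be compared with a rough lower bound from UART wire time alone. The sketch below assumes the framing described in the firmware article (1 start bit, 8 data bits, 1 parity bit, 2 stop bits at 20 Mbps), bytes sent back-to-back with no idle gaps, traffic split evenly across `DATA0` and `DATA1`, and, for the 2S case, one 3-byte packet per data line per board:
+
+```{math}
+\begin{aligned}
+t_\text{byte} &= \frac{1 + 8 + 1 + 2}{20\ \text{Mbps}} = 0.6\ \mu\text{s} \\
+t_\text{24 sensors} &\ge 3 \times 4 \times 3 \times t_\text{byte} = 21.6\ \mu\text{s} \\
+t_\text{6 sensors} &\ge 3 \times 1 \times 3 \times t_\text{byte} = 5.4\ \mu\text{s}
+\end{aligned}
+```
+
+Each product is (boards) x (packets per line per board) x (bytes per packet). Under these assumptions, the gap between the bounds and the measured times would be attributable to ADC sampling, SPI reads, and forwarding latency.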
diff --git a/source/accessories/amds/firmware/images/daisy-chain.svg b/source/accessories/amds/firmware/images/daisy-chain.svg
new file mode 100644
index 00000000..fb47f1c7
--- /dev/null
+++ b/source/accessories/amds/firmware/images/daisy-chain.svg
@@ -0,0 +1,502 @@
+
+
+
+
diff --git a/source/accessories/amds/firmware/images/debugger-config-options.svg b/source/accessories/amds/firmware/images/debugger-config-options.svg
new file mode 100644
index 00000000..598f3df9
--- /dev/null
+++ b/source/accessories/amds/firmware/images/debugger-config-options.svg
@@ -0,0 +1,59 @@
+
+
+
+
diff --git a/source/accessories/amds/firmware/images/firmware-arch-interface.svg b/source/accessories/amds/firmware/images/firmware-arch-interface.svg
new file mode 100644
index 00000000..a44bb4e4
--- /dev/null
+++ b/source/accessories/amds/firmware/images/firmware-arch-interface.svg
@@ -0,0 +1,363 @@
+
+
+
+
diff --git a/source/accessories/amds/firmware/images/firmware_arch_interface.svg b/source/accessories/amds/firmware/images/firmware_arch_interface.svg
deleted file mode 100644
index 19dd05c2..00000000
--- a/source/accessories/amds/firmware/images/firmware_arch_interface.svg
+++ /dev/null
@@ -1,212 +0,0 @@
-
-
-
-
diff --git a/source/accessories/amds/firmware/index.md b/source/accessories/amds/firmware/index.md
index 328de436..7929d5a5 100644
--- a/source/accessories/amds/firmware/index.md
+++ b/source/accessories/amds/firmware/index.md
@@ -18,7 +18,7 @@ While the architecture of the AMDS firmware is fairly simple, the I/O interface,
The AMDS firmware is designed to interface to the master controller over three logical wires: one signal from the master, and two data lines to the master. Physically, these signals are all differential pairs for noise immunity.
-
+
#### RX Signal: `SYNC_ADC`
@@ -30,9 +30,9 @@ After all sensorcards have been sampled, the AMDS streams all sampled data back
The two TX signals are controlled by the AMDS and go to the master. These are only used to send ADC sample data to the master. As soon as all ADCs are sampled, the AMDS starts sending the latest data to the master using the two TX wires. Two lanes are used so that the data can be transmitted at twice the speed, thus reducing latency.
-The format of the data sent on the TX signals is UART. This means there is no clock line between the master and AMDS: the interface is completely asynchronous. The UART is configured to run at 25 Mbps. Conceptually, the TX lines are actually two distinct UART devices, each with only one-way communication. Both UARTs are configured as 8-bit data, 2 stop bits, and odd parity.
+The format of the data sent on the TX signals is UART. This means there is no clock line between the master and AMDS: the interface is completely asynchronous. The UART is configured to run at 20 Mbps. Conceptually, the TX lines are actually two distinct UART devices, each with only one-way communication. Both UARTs are configured as 8-bit data, 2 stop bits, and odd parity.
-##### Data Format
+### Data Format
The ADCs on the sensor cards are assumed to be 16-bit devices which are all compatible with each other (i.e. they can be daisy-chained and support equal clock rates). See each sensor card's hardware design files for specs on the specific ADCs which are supported. The 16-bit raw data from the ADCs are packed into bytes which are sent across the `DATA0` and `DATA1` UART lines. `DATA0` is used to send the contents of the first four sensor cards and `DATA1` sends the last four sensor card data. The transmissions happen in parallel between the data lines.
@@ -62,15 +62,50 @@ The message structure is equal between both `DATA0` and `DATA1`. However, each m
| ---- | ---- | ---- |
| 0x93 | MSB of sample 4 | LSB of sample 4|
-_NOTE: there is no full CRC included in the transmission. The simple protocol relies on the parity check in the UART packet. This is not a terribly robust approach, but has worked well is moderate EMI environments._
+```{note}
+There is no full CRC included in the transmission. The simple protocol relies on the parity check in the UART packet. This is not a terribly robust approach, but has worked well in moderate EMI environments.
+```
+
+#### Selective Channel Transmitting
+
+To optimize processing and transmission bandwidth, the system supports disabling unused sensor channels.
+
+**Active Sensor Mask**: The AMDS codebase contains a global variable `active_sensor_mask` that acts as a bitmask where `1 = Active` and `0 = Inactive` for each of the 8 AMDS sensor cards. This variable determines which sensor cards' data are sent to the master each time a `SYNC_ADC` is received by the AMDS. For example,
+
+- `active_sensor_mask = 0xFF;`: AMDS will send all 8 channels
+- `active_sensor_mask = 0x01;`: AMDS will only send sensor card channel 1
+- `active_sensor_mask = 0x06;`: AMDS will only send sensor card channels 2 and 3
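+
+The last example can be checked by expanding the mask in binary. Following the examples above, sensor card channel `c` corresponds to bit `c - 1` of the mask (`c` is a name used only in this illustration):
+
+```{math}
+\mathtt{0x06} = \mathtt{0b00000110} = 2^{1} + 2^{2}
+\;\Rightarrow\; \text{channels } 2 \text{ and } 3 \text{ are active}
+```
+
+By the same rule, a mask enabling only channels 1 and 5 would be `0x11`, since 2^0 + 2^4 = 17.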
+
+```{hint}
+The AMDC platform system default is to send all 8 channels. If a reduced number of channels is being sent, the user must alert the AMDC to this by calling `amdc_set_enabled()`, as the AMDS does not communicate its configuration settings to the master.
+```
+
+#### Optimized Sample-and-Transmit Fast Path
+
+The AMDS firmware has been optimized to minimize time from `SYNC_ADC` until the last bit of data is transmitted to the master. The code path used depends on the value of `active_sensor_mask`:
+
+- `active_sensor_mask != 0xFF`: a generalized function `adc_sample_all_daughtercards()` is used.
+- `active_sensor_mask == 0xFF`: a highly optimized function `adc_sample_and_transmit_fast_path()` is used.
+
+Compared to the generalized `adc_sample_all_daughtercards()` function, `adc_sample_and_transmit_fast_path()` decreases latency by doing the following:
+
+- **Avoid `active_sensor_mask` Conditional Checks**: to remove processor time associated with `if` statements.
+- **Hardware Cycle Counting**: Rather than using `NOP` loops for the 1300 ns ADC wait time, the fast path uses the Cortex-M7 DWT Cycle Counter (`DWT->CYCCNT`) for deterministic waiting.
+- **Instruction Interleaving**: The code optimizes wait states by starting SPI reads, and transmitting UART header bytes (`0x90`) while the CPU is waiting for the SPI RX buffers to fill.
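+
+To illustrate the cycle-counting idea, the 1300 ns wait can be converted to a cycle count. The CPU clock frequency is not stated in this document, so the 216 MHz value below is purely a hypothetical figure for the arithmetic:
+
+```{math}
+N_\text{cycles} = \lceil t_\text{wait} \cdot f_\text{CPU} \rceil
+= \lceil 1300\ \text{ns} \times 216\ \text{MHz} \rceil = \lceil 280.8 \rceil = 281
+```
+
+A wait of this kind would then spin until the cycle counter has advanced by at least that many cycles since the start of the wait; the actual constant depends on the clock configuration used by the firmware.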
+
+Both code paths optimize timing by using the STM32 MCU's UART transmit shift register to queue up two bytes of UART transmit data at a time. This is done by optimizing calls to the inline `drv_uart_putc_fast()` function. When the shift register is empty, the function accepts new data without delay. When the shift register is occupied, the function blocks until it can take the new data byte. If the UART transmit interface is idle, two back-to-back calls can be made to this function without any blocking delay.
+
+### Daisy Chain
+
+The AMDS firmware includes support for up to three AMDS boards to be connected in series into a "daisy chain," allowing for 24 sensor cards' worth of data to be sent to the master. Details of this are provided in [AMDS Daisy Chain](daisy-chain.md).
### Interrupt-Driven Design
-After start-up, the AMDS firmware is completely interrupt driven. This means that all processing occurs within an interrupt context, not the main loop. The interrupt which used to drive the firmware occurs on the rising and falling edges of the `SYNC_ADC` signal.
+After start-up, the AMDS firmware is interrupt driven. This means that all critical processing occurs within an interrupt context, not the main loop. The interrupt which is used to drive the firmware occurs on the rising and falling edges of the `SYNC_ADC` signal.
In the typical flow, the master is operating its PWM output and thus triggering the `SYNC_ADC` ISR periodically. The ADCs on the sensor cards start their conversions and store the latest data in the AMDS memory. Once this is complete, the AMDS sends the data back to the master. Then the AMDS will wait for the next `SYNC_ADC` interrupt.
-### Performance Limitations
+## Performance Limitations
The AMDS firmware design directly affects the operation limits of the `SYNC_ADC` signal. It will continue to work up to some threshold, at which point some ISRs will be missed and the performance will drop. However, the system will not "crash" -- it will continue to work, albeit not as well.
@@ -95,9 +130,11 @@ The channels in the above scope capture show the following signals from top to b
- C2: The `DATA0` line from the AMDS back to the AMDC, showing 12 bytes (4 x 3-Byte packets) of UART data. This is the data for AMDS sensor card channels 1-4.
- C3: The `DATA1` line from the AMDS back to the AMDC, showing 12 bytes (4 x 3-Byte packets) of UART data. This is the data for AMDS sensor card channels 5-8.
-**Note**: The AMDS firmware always assumes all eight sensor cards must be sampled. Even when they are not populated, the firmware timing remains as if all sensor cards were in pairs of daisy chains. This acts to limit the overall sampling throughput.
+```{hint}
+The default value of `active_sensor_mask` will have the AMDS assume that all eight sensor cards must be sampled. Even when they are not populated, the firmware timing remains as if all sensor cards were in pairs of daisy chains. The only way to improve sample throughput when fewer cards are used is to update `active_sensor_mask` as described [above](#selective-channel-transmitting).
+```
-#### Performance Specifications
+### Performance Specifications
Given a control frequency of `Fs` and PWM switching frequency of `Fsw`, the following constraints must be satisfied for the AMDS firmware to perform well:
@@ -106,11 +143,13 @@ Given a control frequency of `Fs` and PWM switching frequency of `Fsw`, the foll
For application with SiC or GaN inverters where `Fsw` is typically much faster than `Fs`, the AMDS firmware works well.
-**Warning:** When `Fs` is close to `Fsw` (i.e. control frequency is equal to PWM frequency), **the current AMDS firmware design will not work well.**
+```{warning}
+When `Fs` is close to `Fsw` (i.e. control frequency is equal to PWM frequency), **the current AMDS firmware design will not work well.**
+```
## Future Improvements
-The AMDS firmware works, albeit with limitations as described above. Some ideas to improve the system are now described:
+The AMDS firmware works, albeit with limitations as described above. Some ideas to improve the system are now listed:
1. The AMDS cannot be configured from the master. Improvements could use an additional TX/RX pair to enable simple register protocol for config. This could be used to set digital filter bandwidths, turn on/off sensor card slots for faster sampling, etc.
@@ -118,4 +157,9 @@ The AMDS firmware works, albeit with limitations as described above. Some ideas
3. There is no robust CRC error detection on the data transmission from the AMDS to the master device, although the UART parity is used. Future improvements could add a footer CRC to ensure the received message at the master is valid. Error correction codes could also be used to further increase the communication robustness in high EMI environments (e.g. SECDED). There is no free lunch: all of these methods would increase the data transmission latency from the AMDS.
-4. There is no need to transmit the data from all eight sensor cards if they are not all populated. Theoretically, a user could run the AMDS interface MUCH faster with fewer sensor cards installed, if changes are made such that only real data acquired from populated sensor cards are transmitted back to the master.
+```{toctree}
+:hidden:
+
+daisy-chain
+building-and-running-firmware
+```
diff --git a/source/accessories/amds/index.md b/source/accessories/amds/index.md
index bc617c53..7ea38e0c 100644
--- a/source/accessories/amds/index.md
+++ b/source/accessories/amds/index.md
@@ -32,8 +32,8 @@ However, it is a complete system which could be interfaced to any other host dev
- Low voltage
- Current
+- Up to 3 AMDS boards daisy chained per AMDC port (up to 24 sensor cards)
- Synchronous sensor sampling to PWM carrier waveform (up to 100 kHz)
-- Data request rate up to 10 kHz to host device
```{toctree}
:hidden:
@@ -42,4 +42,4 @@ amds-in-action/index
firmware/index
mainboard/index
sensor-cards/index
-```
\ No newline at end of file
+```