NPU Configuration Guidelines for TI Neural Network Compiler

This document provides comprehensive guidelines for designing neural network models that are optimally configured for the TI NPU (Neural Processing Unit) accelerator found in devices like F28P55 and F28P65.

Overview
Supported Layer Types
Terminology and Notation
Layer Configuration Constraints
Optimal Design Patterns
Common Pitfalls to Avoid
Model Design Checklist

Overview

The TI NPU accelerator provides hardware acceleration for common neural network operations. However, to fully leverage the NPU, models must conform to specific layer configurations. Layers that don't meet these constraints will fall back to software execution, significantly reducing performance.

Key Principle: Design models with NPU constraints in mind from the start, rather than trying to adapt existing models.

Supported Layer Types

Layer Type	NPU Name	Description
First Convolution	FCONV	Convolution with input channel = 1
Generic Convolution	GCONV	Standard convolution with input channels as multiple of 4
Depth-Wise Convolution	DWCONV	Convolution where groups = input channels
Point-Wise Convolution	PWCONV	1x1 convolution for channel mixing
Point-Wise Conv + Residual	PWCONVRES	1x1 convolution with residual addition
Transposed Convolution	TCONV	Upsampling convolution
Fully-Connected	FC	Dense/Linear layer
Average Pooling	AVGPOOL	Global and non-global average pooling
Max Pooling	MAXPOOL	Maximum pooling

Terminology and Notation

Dimension Notation

Symbol	Meaning	Example
iB	Input bit-width	8 (8-bit quantized)
oB	Output bit-width	8
kB	Kernel/weight bit-width	2, 4, or 8
iH	Input height	Sequence length for 1D
iW	Input width	1 for 1D time series
iC	Input channels	Number of input features
oH	Output height	After convolution/pooling
oW	Output width	After convolution/pooling
oC	Output channels	Number of output features
kH	Kernel height	Convolution kernel size
kW	Kernel width	Convolution kernel size
sH	Stride height	Vertical stride
sW	Stride width	Horizontal stride

Value Notation

Notation	Meaning	Examples
any	Any positive integer	1, 2, 3, ...
m4	Multiples of 4	4, 8, 12, 16, 20, ...
m8	Multiples of 8	8, 16, 24, 32, ...
m1b2e7	Range from 2 to 7	2, 3, 4, 5, 6, 7
m1b8	Minimum 8, any value	8, 9, 10, 11, ...
m1b16	Minimum 16, any value	16, 17, 18, ...

Layer Configuration Constraints

FCONV (First Convolution Layer)

Use when the input has exactly 1 channel (e.g., single-variable time series).

Parameter	Constraint	Notes
Input Channels (iC)	1	Fixed - this defines FCONV
Output Channels (oC)	m4	Must be 4, 8, 12, 16, ...
Kernel Height (kH)	any	Flexible
Kernel Width (kW)	1-8	Maximum 8 for 1D convolutions
Kernel Bit-width (kB)	2, 4, or 8	8-bit most common

Example (PyTorch):

# Good: FCONV with iC=1, oC=8
Conv2d(in_channels=1, out_channels=8, kernel_size=(5, 1))

# Bad: oC=6 is not m4
Conv2d(in_channels=1, out_channels=6, kernel_size=(5, 1))

GCONV (Generic Convolution Layer)

Use for intermediate convolution layers where input channels > 1.

Parameter	Constraint	Notes
Input Channels (iC)	m4	Will be padded to m4 if not
Output Channels (oC)	m4	Must be 4, 8, 12, 16, ...
Kernel Height (kH)	any (if kW=1) or specific (if kW>1)	Can be flexible with certain kW values
Kernel Width (kW)	any (if kH=1) or specific (if kH>1)	Can be flexible with certain kH values
Kernel Bit-width (kB)	2, 4, or 8	8-bit most common

Rule: For 1D convolutions (kW=1), kH can be any value (unlimited). Similar flexibility applies for other configurations where one dimension is constrained.

Example (PyTorch):

# Good: 1D convolution - kW=1 allows any kH
Conv2d(in_channels=16, out_channels=32, kernel_size=(5, 1))    # kH=5, kW=1
Conv2d(in_channels=16, out_channels=32, kernel_size=(100, 1))  # kH=any, kW=1 ✓

# Good: 1D convolution with kH=1
Conv2d(in_channels=16, out_channels=32, kernel_size=(1, 5))    # kH=1, kW=5

DWCONV (Depth-Wise Convolution Layer)

Use for efficient spatial filtering with groups=in_channels.

Parameter	Constraint	Notes
Input Channels (iC)	m4	Must be 4, 8, 12, 16, ...
Output Channels (oC)	m4	Equal to iC for true depthwise
Kernel Height (kH)	any (if kW constrained) or specific (if kW=any)	Can be flexible with certain kW values
Kernel Width (kW)	any (if kH constrained) or specific (if kH=any)	Can be flexible with certain kH values
Groups	iC	Must equal input channels

Rule: Similar to GCONV, one dimension can often be flexible while the other is constrained. Compiler guide shows configurations like kH=9 with kW=1.

Example (PyTorch):

# Good: Depthwise with kW=1, allows various kH
Conv2d(in_channels=16, out_channels=16, kernel_size=(3, 1), groups=16)  # kH=3, kW=1 ✓
Conv2d(in_channels=16, out_channels=16, kernel_size=(9, 1), groups=16)  # kH=9, kW=1 ✓
Conv2d(in_channels=16, out_channels=16, kernel_size=(5, 1), groups=16)  # kH=5, kW=1 ✓

PWCONV (Point-Wise Convolution Layer)

Use for channel mixing after depthwise convolution (1x1 convolution).

Parameter	Constraint	Notes
Input Channels (iC)	m4	Must be 4, 8, 12, 16, ...
Output Channels (oC)	m4	Must be 4, 8, 12, 16, ...
Kernel Size	(1, 1)	Fixed for pointwise
Stride	(1, 1)	Fixed

Example (PyTorch):

# Good: 1x1 conv with m4 channels
Conv2d(in_channels=16, out_channels=32, kernel_size=(1, 1))

FC (Fully-Connected Layer)

Parameter	Constraint (8-bit)	Constraint (4-bit)
Input Features	>= 16	>= 8
Output Features	any	any

Critical: Ensure sufficient input features before FC layer!

Example (PyTorch):

# Good: input features = 64 (from 16 channels * 4 spatial)
AdaptiveAvgPool2d((4, 1))  # With 16 channels -> 64 features
Linear(in_features=64, out_features=num_classes)

# Bad: input features = 4 (below minimum)
AdaptiveAvgPool2d((1, 1))  # With 4 channels -> 4 features
Linear(in_features=4, out_features=num_classes)

MAXPOOL (Max Pooling Layer)

Parameter	Constraint	Notes
Input Channels (iC)	m4	Must be 4, 8, 12, 16, ...
Output Channels (oC)	m4	Same as input
Kernel Height (kH)	1-4 or any	If kW is fixed (1-4), then kH can be any; if kH is fixed, max is 4
Kernel Width (kW)	1-4 or any	If kH is fixed (1-4), then kW can be any; if kW is fixed, max is 4

Rule: At least one dimension must be constrained to 1-4; the other can be flexible (any).

Valid Examples (PyTorch):

# All valid - one dimension is fixed to 1-4
MaxPool2d(kernel_size=(3, 1), stride=(2, 1))  # kH=3, kW=1 ✓
MaxPool2d(kernel_size=(1, 4), stride=(1, 2))  # kH=1, kW=4 ✓
MaxPool2d(kernel_size=(8, 1), stride=(4, 1))  # kH=any, kW=1 ✓ (VALID!)
MaxPool2d(kernel_size=(256, 1), stride=(2, 1))  # kH=any, kW=1 ✓
MaxPool2d(kernel_size=(1, 128), stride=(1, 2))  # kH=1, kW=any ✓

Invalid Examples:

# Invalid - both dimensions exceed 4
MaxPool2d(kernel_size=(8, 8), stride=(4, 4))  # kH=8, kW=8 ✗
MaxPool2d(kernel_size=(128, 2), stride=(2, 1))  # kH=128, kW=2 (kH > 4) ✗

AVGPOOL (Average Pooling Layer)

Global Average Pooling:

Parameter	Constraint	Notes
Input Channels (iC)	m4	Must be 4, 8, 12, 16, ...
Output Size	(1, 1)	Global pooling
Condition	*(iH iW) > 2**	Must have spatial dimensions

Non-Global Average Pooling: Converted to DWCONV internally. Follow DWCONV constraints.

Optimal Design Patterns

Pattern 1: MobileNet-Style Depthwise Separable Convolution

The most efficient pattern for NPU acceleration:

# Depthwise convolution (spatial filtering)
Conv2d(in_channels=16, out_channels=16, kernel_size=(3, 1), groups=16)
BatchNorm2d(16)
ReLU()

# Pointwise convolution (channel mixing)
Conv2d(in_channels=16, out_channels=32, kernel_size=(1, 1))
BatchNorm2d(32)
ReLU()

Pattern 2: Channel Doubling Progression

Efficient channel progression that maintains m4 constraint:

# Start: 1 -> 8 (FCONV)
# Then: 8 -> 16 -> 32 -> 64 (GCONV)
channels = [1, 8, 16, 32, 64]

Pattern 3: Replace Large Kernels with Multiple Small Kernels

Instead of one large kernel, use multiple smaller ones:

# Bad: Single large kernel (kH=9 exceeds limit)
Conv2d(in_channels=16, out_channels=32, kernel_size=(9, 1))

# Good: Two smaller kernels (both kH<=7)
Conv2d(in_channels=16, out_channels=16, kernel_size=(5, 1))
Conv2d(in_channels=16, out_channels=32, kernel_size=(5, 1))

Pattern 4: Ensure Sufficient FC Input

Design pooling to ensure minimum FC input features:

# Good: Ensure >= 16 features for FC
# Option A: More channels
AdaptiveAvgPool2d((1, 1))  # With 16+ channels

# Option B: Larger spatial output
AdaptiveAvgPool2d((4, 1))  # With 4+ channels -> 16+ features

Common Pitfalls to Avoid

1. Asymmetric Kernel Constraints (Not Both Dimensions Large)

Problem: Both kernel dimensions exceed limits simultaneously.

# WRONG - Both dimensions too large
Conv2d(in_channels=16, out_channels=32, kernel_size=(8, 8))   # kH=8, kW=8 ✗
Conv2d(in_channels=16, out_channels=32, kernel_size=(128, 5)) # Both > limits ✗

# CORRECT - One dimension can be large if other is fixed
Conv2d(in_channels=16, out_channels=32, kernel_size=(100, 1)) # kH=any, kW=1 ✓
Conv2d(in_channels=16, out_channels=32, kernel_size=(1, 100)) # kH=1, kW=any ✓

Solution: Ensure at least one dimension is within bounds. For 1D convolutions (kW=1 or kH=1), the other dimension can be flexible.

2. Non-m4 Channel Counts

Problem: Channels not divisible by 4.

# WRONG
Conv2d(in_channels=8, out_channels=12, kernel_size=(3, 1))  # 12 is OK
Conv2d(in_channels=12, out_channels=18, kernel_size=(3, 1))  # 18 is NOT m4!

Solution: Always use channels: 4, 8, 12, 16, 20, 24, 32, 48, 64...

3. Insufficient FC Input Features

Problem: FC layer receives fewer than 16 features (8-bit) or 8 features (4-bit).

# WRONG: Only 4 features to FC
Conv2d(1, 4, kernel_size=(3, 1))
AdaptiveAvgPool2d((1, 1))  # 4 channels * 1 * 1 = 4 features
Linear(4, num_classes)  # FAILS on NPU!

Solution: Increase channels or spatial size before FC.

4. MaxPool with Both Dimensions > 4

Problem: MaxPool with both dimensions exceeding the 1-4 limit.

# WRONG - Both dimensions exceed limit
MaxPool2d(kernel_size=(8, 8), stride=(4, 4))      # kH=8, kW=8 ✗
MaxPool2d(kernel_size=(128, 2), stride=(2, 1))    # kH=128, kW=2 ✗ (kH too large)

# CORRECT - One dimension can be large if other is fixed 1-4
MaxPool2d(kernel_size=(8, 1), stride=(4, 1))      # kH=any, kW=1 ✓
MaxPool2d(kernel_size=(256, 1), stride=(2, 1))    # kH=any, kW=1 ✓
MaxPool2d(kernel_size=(1, 128), stride=(1, 2))    # kH=1, kW=any ✓

Solution: Ensure at least one dimension is 1-4. The other dimension can be larger if the first is fixed within the limit.

Model Design Checklist

Use this checklist when designing or reviewing models for NPU compatibility:

Channel Dimensions

First layer input channels = 1 (for FCONV) OR multiple of 4
All intermediate layer channels are multiples of 4
Output channels of all convolutions are multiples of 4

Kernel Sizes

GCONV: For 1D (kW=1), any kH allowed; for 2D, check constraints per compiler guide
DWCONV: For 1D (kW=1), any kH allowed; for 2D, check constraints per compiler guide
FCONV: Verify kH and kW per compiler guide (examples show kH up to 10)
MaxPool: At least one dimension must be 1-4; other can be any (not both > 4)

Fully-Connected Layers

FC input features >= 16 (for 8-bit weights)
FC input features >= 8 (for 4-bit weights)

Pooling

Global AvgPool has (iH * iW) > 2
Non-global AvgPool follows DWCONV constraints
MaxPool kernels <= 4

Depthwise Convolutions

Groups parameter equals input channels
Both input and output channels are m4

General

No padding requirements exceed NPU support
Stride values are supported for each layer type

Quick Reference Card

Layer Type	kH	kW	iC Constraint	oC Constraint
FCONV	10 or any	1, 4, or any	1	m4
GCONV	any (kW=1) or specific	any (kH=1) or specific	m4 (padded)	m4
DWCONV	any/3/9	any/1/3	m4	m4
PWCONV	1	1	m4	m4
MAXPOOL	1-4 or any	1-4 or any	m4	m4
FC	N/A	N/A	>=16 (8-bit)	any

References

TI Neural Network Compiler for MCUs User's Guide v2.1.0
Section 5: Layer Configurations Supported on the NPU
TI Software Download

Document Version: 1.0 Last Updated: January 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NPU Configuration Guidelines for TI Neural Network Compiler

Table of Contents

Overview

Supported Layer Types

Terminology and Notation

Dimension Notation

Value Notation

Layer Configuration Constraints

FCONV (First Convolution Layer)

GCONV (Generic Convolution Layer)

DWCONV (Depth-Wise Convolution Layer)

PWCONV (Point-Wise Convolution Layer)

FC (Fully-Connected Layer)

MAXPOOL (Max Pooling Layer)

AVGPOOL (Average Pooling Layer)

Optimal Design Patterns

Pattern 1: MobileNet-Style Depthwise Separable Convolution

Pattern 2: Channel Doubling Progression

Pattern 3: Replace Large Kernels with Multiple Small Kernels

Pattern 4: Ensure Sufficient FC Input

Common Pitfalls to Avoid

1. Asymmetric Kernel Constraints (Not Both Dimensions Large)

2. Non-m4 Channel Counts

3. Insufficient FC Input Features

4. MaxPool with Both Dimensions > 4

Model Design Checklist

Channel Dimensions

Kernel Sizes

Fully-Connected Layers

Pooling

Depthwise Convolutions

General

Quick Reference Card

References

FilesExpand file tree

NPU_CONFIGURATION_GUIDELINES.md

Latest commit

History

NPU_CONFIGURATION_GUIDELINES.md

File metadata and controls

NPU Configuration Guidelines for TI Neural Network Compiler

Table of Contents

Overview

Supported Layer Types

Terminology and Notation

Dimension Notation

Value Notation

Layer Configuration Constraints

FCONV (First Convolution Layer)

GCONV (Generic Convolution Layer)

DWCONV (Depth-Wise Convolution Layer)

PWCONV (Point-Wise Convolution Layer)

FC (Fully-Connected Layer)

MAXPOOL (Max Pooling Layer)

AVGPOOL (Average Pooling Layer)

Optimal Design Patterns

Pattern 1: MobileNet-Style Depthwise Separable Convolution

Pattern 2: Channel Doubling Progression

Pattern 3: Replace Large Kernels with Multiple Small Kernels

Pattern 4: Ensure Sufficient FC Input

Common Pitfalls to Avoid

1. Asymmetric Kernel Constraints (Not Both Dimensions Large)

2. Non-m4 Channel Counts

3. Insufficient FC Input Features

4. MaxPool with Both Dimensions > 4

Model Design Checklist

Channel Dimensions

Kernel Sizes

Fully-Connected Layers

Pooling

Depthwise Convolutions

General

Quick Reference Card

References