Elman Network

Introduction

The Elman network is the simplest recurrent neural network (RNN) architecture in tinymind. It extends a standard feed-forward network by adding feedback connections from the hidden layer's output at the previous time step back into the hidden layer at the current time step. This gives the network a form of short-term memory, enabling it to learn temporal patterns in sequential data.

Elman networks are a good fit for embedded tasks where input data has a temporal component but the dependencies are relatively short-range:

Sensor filtering -- smoothing noisy sensor readings by incorporating recent history
Simple sequence prediction -- predicting the next value in a periodic or quasi-periodic signal
Pattern detection -- recognizing short temporal patterns in IMU, ECG, or vibration data
Adaptive control -- adjusting motor control parameters based on recent system behavior

A trainable Elman network (2->3->1) in Q8.8 fixed-point takes just 472 bytes. For inference-only deployment, the footprint drops to 192 bytes.

Architecture

An Elman network has a single hidden layer with recurrent connections of depth 1. At each time step:

The input layer receives the current input values
The hidden layer receives both the input layer output and its own output from the previous time step (via the recurrent layer)
The output layer receives the hidden layer output

         +------------------+
         |  Recurrent Layer |<----+
         | (previous hidden)|     |
         +--------+---------+     |
                  |               |
                  v               |
Input --> [Hidden Layer] ---------+----> Output

The recurrent connection depth is fixed to 1, meaning only the immediately previous time step is fed back. For deeper recurrent connections, use RecurrentNeuralNetwork directly with a custom depth. For gated architectures that can learn longer-term dependencies, see LSTM and GRU Recurrent Networks.
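The data flow described above can be written out as a single time step. The following is a standalone sketch for illustration only, not tinymind's implementation: the `ElmanSketch` struct, its member names, and its weight layout are all hypothetical, and tanh is assumed for both the hidden and output layers.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Standalone illustration of one Elman time step; not tinymind's implementation.
template <std::size_t NumInputs, std::size_t NumHidden, std::size_t NumOutputs>
struct ElmanSketch
{
    // Hypothetical weight layout chosen for readability; everything starts at zero.
    std::array<std::array<double, NumInputs>, NumHidden> inputWeights{};
    std::array<std::array<double, NumHidden>, NumHidden> recurrentWeights{};
    std::array<double, NumHidden> hiddenBias{};
    std::array<std::array<double, NumHidden>, NumOutputs> outputWeights{};
    std::array<double, NumOutputs> outputBias{};

    // Hidden activations from the previous step (the "recurrent layer").
    std::array<double, NumHidden> previousHidden{};

    std::array<double, NumOutputs> step(const std::array<double, NumInputs>& inputs)
    {
        // Hidden layer: current inputs plus the previous hidden activations.
        std::array<double, NumHidden> hidden{};
        for (std::size_t h = 0; h < NumHidden; ++h)
        {
            double sum = hiddenBias[h];
            for (std::size_t i = 0; i < NumInputs; ++i)
                sum += inputWeights[h][i] * inputs[i];
            for (std::size_t r = 0; r < NumHidden; ++r)
                sum += recurrentWeights[h][r] * previousHidden[r];
            hidden[h] = std::tanh(sum);
        }

        // Output layer: sees only the current hidden activations.
        std::array<double, NumOutputs> outputs{};
        for (std::size_t o = 0; o < NumOutputs; ++o)
        {
            double sum = outputBias[o];
            for (std::size_t h = 0; h < NumHidden; ++h)
                sum += outputWeights[o][h] * hidden[h];
            outputs[o] = std::tanh(sum);
        }

        previousHidden = hidden;  // depth 1: keep only the latest hidden state
        return outputs;
    }
};
```

Calling `step` twice with different inputs shows the memory effect: the second result depends on the first input through `previousHidden`, even if the second input is zero.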

Template Parameters

template<
    typename ValueType,
    size_t NumberOfInputs,
    size_t NumberOfNeuronsInHiddenLayer,
    size_t NumberOfOutputs,
    typename TransferFunctionsPolicy,
    bool IsTrainable = true,
    size_t BatchSize = 1,
    outputLayerConfiguration_e OutputLayerConfiguration = FeedForwardOutputLayerConfiguration
>
class ElmanNeuralNetwork

ValueType -- The numeric type for all values and weights (e.g., double, float, or a fixed-point QValue type)
NumberOfInputs -- Number of input neurons
NumberOfNeuronsInHiddenLayer -- Number of neurons in the single hidden layer
NumberOfOutputs -- Number of output neurons
TransferFunctionsPolicy -- Policy class providing activation functions, random weight generation, error calculation, and zero tolerance
IsTrainable -- Set to false for inference-only deployment (about 59% smaller for the Q8.8 2->3->1 configuration: 192 bytes instead of 472)
BatchSize -- Number of samples to accumulate before updating weights (default: 1 for online learning)
OutputLayerConfiguration -- FeedForwardOutputLayerConfiguration (default) or ClassifierOutputLayerConfiguration for softmax output

ElmanNetwork is also available as a backward-compatible alias with the same template parameters.

Example: Floating-Point Elman Network

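This example trains an Elman network on the four-row XOR truth table. Both operands are presented in the same time step, so the task itself does not depend on earlier inputs; it serves as a minimal end-to-end check of the training API rather than a demonstration of temporal memory.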

#include "neuralnet.hpp"
#include "activationFunctions.hpp"

#include <cstdlib>
#include <cstdio>

// Random number generator policy
struct RandomNumberGenerator
{
    static double generateRandomWeight()
    {
        return (static_cast<double>(rand()) / RAND_MAX) * 2.0 - 1.0;
    }
};

// Transfer functions policy
typedef tinymind::FloatingPointTransferFunctions<
    double,
    RandomNumberGenerator,
    tinymind::TanhActivationPolicy,
    tinymind::TanhActivationPolicy> TransferFunctionsType;

// Elman network: 2 inputs, 3 hidden neurons, 1 output
typedef tinymind::ElmanNeuralNetwork<
    double, 2, 3, 1, TransferFunctionsType> ElmanNetworkType;

int main()
{
    srand(42);
    ElmanNetworkType nn;

    // XOR training data
    const double xorInputs[4][2] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    const double xorTargets[4]   = { 0,      1,      1,      0     };

    double inputs[2];
    double target[1];
    double learned[1];

    // Train
    for (int epoch = 0; epoch < 50000; ++epoch)
    {
        for (int pattern = 0; pattern < 4; ++pattern)
        {
            inputs[0] = xorInputs[pattern][0];
            inputs[1] = xorInputs[pattern][1];
            target[0] = xorTargets[pattern];

            nn.feedForward(inputs);
            double error = nn.calculateError(target);

            if (!TransferFunctionsType::isWithinZeroTolerance(error))
            {
                nn.trainNetwork(target);
            }
        }
    }

    // Verify
    for (int pattern = 0; pattern < 4; ++pattern)
    {
        inputs[0] = xorInputs[pattern][0];
        inputs[1] = xorInputs[pattern][1];

        nn.feedForward(inputs);
        nn.getLearnedValues(learned);

        printf("%.0f XOR %.0f = %.4f (expected %.0f)\n",
               inputs[0], inputs[1], learned[0], xorTargets[pattern]);
    }

    return 0;
}
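A task that does depend on the previous input is the temporal XOR, where one bit arrives per time step and the target is that bit XORed with the one before it. A minimal sketch of generating such a stream follows. The `TemporalXorSample` struct and `makeTemporalXor` function are hypothetical helpers, not part of tinymind, and treating the bit before the first one as 0 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One training sample per time step: the current bit and the expected output.
struct TemporalXorSample
{
    double input;
    double target;
};

// Builds targets as bits[t] XOR bits[t-1]. The first bit has no predecessor,
// so it is paired with an assumed previous bit of 0.
std::vector<TemporalXorSample> makeTemporalXor(const std::vector<int>& bits)
{
    std::vector<TemporalXorSample> samples;
    samples.reserve(bits.size());
    int previous = 0;
    for (int bit : bits)
    {
        assert(bit == 0 || bit == 1);
        samples.push_back({static_cast<double>(bit),
                           static_cast<double>(bit ^ previous)});
        previous = bit;
    }
    return samples;
}
```

Each sample would be fed to a network with a single input, one step at a time and in order, so that the hidden state from the previous step is the only place the earlier bit can come from.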

Example: Fixed-Point Elman Network

For embedded deployment without floating-point hardware, use a QValue type:

#include "neuralnet.hpp"
#include "activationFunctions.hpp"
#include "fixedPointTransferFunctions.hpp"

#include <cstdlib>  // rand()

// Q8.8 signed fixed-point: range -128 to ~127.996, resolution 0.00390625
typedef tinymind::QValue<8, 8, true> ValueType;

// Random number generator for fixed-point weights
template<typename VT>
struct RandomNumberGenerator
{
    static VT generateRandomWeight()
    {
        // Raw value in [-256, 255]; about [-1.0, 0.996] if interpreted as raw Q8.8
        const int r = (rand() % 512) - 256;
        return VT(static_cast<typename VT::FullWidthValueType>(r));
    }
};

// Fixed-point transfer functions with tanh activation
typedef tinymind::FixedPointTransferFunctions<
    ValueType,
    RandomNumberGenerator<ValueType>,
    tinymind::TanhActivationPolicy<ValueType>,
    tinymind::TanhActivationPolicy<ValueType>> TransferFunctionsType;

// Elman network: 2 inputs, 3 hidden neurons, 1 output
typedef tinymind::ElmanNeuralNetwork<
    ValueType, 2, 3, 1, TransferFunctionsType> ElmanNetworkType;

The training loop is identical to the floating-point version. The only difference is that input and target values must be constructed as ValueType instances:

ValueType inputs[2];
ValueType target[1];

inputs[0] = ValueType(0);
inputs[1] = ValueType(1);
target[0] = ValueType(1);

nn.feedForward(inputs);
nn.trainNetwork(target);

Fixed-point Elman networks require the tanh lookup table to be compiled in. Add -DTINYMIND_USE_TANH_8_8=1 to your compiler flags for Q8.8.
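The Q8.8 range and resolution quoted earlier in this section can be checked with plain integer arithmetic. The sketch below is standalone and does not use tinymind's QValue; `RawQ8_8`, `toDouble`, and `multiply` are hypothetical names. It also shows that raw values in [-256, 255], the range produced by `(rand() % 512) - 256`, correspond to about [-1.0, 0.996], assuming the value is interpreted as a raw Q8.8 representation.

```cpp
#include <cassert>
#include <cstdint>

// Standalone Q8.8 helpers for illustration; tinymind's QValue has its own API.
using RawQ8_8 = std::int16_t;               // 8 integer bits (incl. sign) + 8 fractional bits
constexpr int kFractionalBits = 8;
constexpr int kOne = 1 << kFractionalBits;  // raw value that represents 1.0

// Interpret a raw 16-bit value as a real number.
constexpr double toDouble(RawQ8_8 raw)
{
    return static_cast<double>(raw) / kOne;
}

// Multiply in 32 bits, then scale back down to 8 fractional bits.
// Truncates toward zero and does not guard against overflow of the result.
inline RawQ8_8 multiply(RawQ8_8 a, RawQ8_8 b)
{
    const std::int32_t wide =
        static_cast<std::int32_t>(a) * static_cast<std::int32_t>(b);
    return static_cast<RawQ8_8>(wide / kOne);
}
```

For instance, `toDouble(1)` is the resolution 0.00390625, and `multiply(384, 128)` computes 1.5 * 0.5 and returns the raw value 192, which is 0.75.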

Inference-Only Deployment

For deploying a pre-trained network on an embedded device, set IsTrainable=false to eliminate all training code and data:

typedef tinymind::ElmanNeuralNetwork<
    ValueType, 2, 3, 1, TransferFunctionsType, false> InferenceElmanType;

This reduces the instance size from 472 bytes to 192 bytes for a Q8.8 (2->3->1) configuration. Weights can be loaded from an external source using the weight setter methods:

InferenceElmanType nn;

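// Load weights from the trained network; neuron, connection, layer, and weight
// are placeholders for the indices and values of your exported model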
nn.setInputLayerWeightForNeuronAndConnection(neuron, connection, weight);
nn.setInputLayerBiasWeightForConnection(connection, weight);
nn.setHiddenLayerWeightForNeuronAndConnection(layer, neuron, connection, weight);
nn.setHiddenLayerBiasNeuronWeightForConnection(layer, connection, weight);

// Run inference
nn.feedForward(inputs);
nn.getLearnedValues(output);

See Weight Import Export and PyTorch Interoperability for details on training in PyTorch and deploying in tinymind.

API Reference

Method	Description
`feedForward(const ValueType* inputs)`	Forward-propagate inputs through the network
`calculateError(const ValueType* targets)`	Compute error between predicted and target outputs
`trainNetwork(const ValueType* targets)`	Back-propagate error and update weights
`getLearnedValues(ValueType* output)`	Retrieve the network's predicted output values
`initializeWeights()`	Re-randomize all connection weights
`getRecurrentLayer()`	Access the recurrent layer (previous hidden state)
`setLearningRate(const ValueType& value)`	Set the learning rate
`setMomentumRate(const ValueType& value)`	Set the momentum rate
`setAccelerationRate(const ValueType& value)`	Set the acceleration rate
`getLearningRate()`	Get the current learning rate
`getMomentumRate()`	Get the current momentum rate
`getAccelerationRate()`	Get the current acceleration rate

When to Use Elman vs LSTM/GRU

	Elman	LSTM	GRU
Memory (Q8.8, 2->3->1)	472 bytes	952 bytes	808 bytes
Gates	None	3 (input, forget, output) plus cell candidate	2 (update, reset) plus candidate
Long-term dependencies	Limited	Strong	Strong
Training complexity	Simple	Higher	Moderate
Best for	Short temporal patterns, simple sequences	Long sequences, complex dependencies	Balance of capability and efficiency

Use Elman when the temporal dependencies in your data are short (1-2 time steps) and memory is at a premium. For longer-range dependencies, LSTM and GRU networks add gating mechanisms that mitigate vanishing gradients, at the cost of additional memory and computation.
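A rough way to see why the gated architectures cost more is to count weights in the textbook formulations, where each bank of hidden units is fed by the inputs, the previous hidden state, and a bias. The sketch below assumes one such bank for Elman, three for GRU, and four for LSTM; the function names are hypothetical, and tinymind's internal layout may differ. The byte figures in the table are instance sizes, which need not scale directly with these counts.

```cpp
#include <cassert>
#include <cstddef>

// Weights feeding one bank of hidden units: inputs + recurrent state + bias.
constexpr std::size_t hiddenBankWeights(std::size_t inputs, std::size_t hidden)
{
    return hidden * (inputs + hidden + 1);
}

// Weights feeding the output layer: hidden activations + bias.
constexpr std::size_t outputWeights(std::size_t hidden, std::size_t outputs)
{
    return outputs * (hidden + 1);
}

// Textbook assumption: banks = 1 for Elman, 3 for GRU, 4 for LSTM.
constexpr std::size_t recurrentWeightCount(std::size_t banks,
                                           std::size_t inputs,
                                           std::size_t hidden,
                                           std::size_t outputs)
{
    return banks * hiddenBankWeights(inputs, hidden)
         + outputWeights(hidden, outputs);
}
```

For the 2->3->1 configuration this gives 22 weights for Elman, 58 for GRU, and 76 for LSTM under these assumptions.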
