Elman Network

Dan McLeran edited this page Apr 2, 2026 · 1 revision

Introduction

The Elman network is the simplest recurrent neural network (RNN) architecture in tinymind. It extends a standard feed-forward network by adding feedback connections from the hidden layer's output at the previous time step back into the hidden layer at the current time step. This gives the network a form of short-term memory, enabling it to learn temporal patterns in sequential data.

Elman networks are a good fit for embedded tasks where input data has a temporal component but the dependencies are relatively short-range:

  • Sensor filtering -- smoothing noisy sensor readings by incorporating recent history
  • Simple sequence prediction -- predicting the next value in a periodic or quasi-periodic signal
  • Pattern detection -- recognizing short temporal patterns in IMU, ECG, or vibration data
  • Adaptive control -- adjusting motor control parameters based on recent system behavior

A trainable Elman network (2->3->1) in Q8.8 fixed-point takes just 472 bytes. For inference-only deployment, the footprint drops to 192 bytes.

Architecture

An Elman network has a single hidden layer with recurrent connections of depth 1. At each time step:

  1. The input layer receives the current input values
  2. The hidden layer receives both the input layer output and its own output from the previous time step (via the recurrent layer)
  3. The output layer receives the hidden layer output
         +------------------+
         |  Recurrent Layer |<----+
         | (previous hidden)|     |
         +--------+---------+     |
                  |               |
                  v               |
Input --> [Hidden Layer] ---------+----> Output

The recurrent connection depth is fixed to 1, meaning only the immediately previous time step is fed back. For deeper recurrent connections, use RecurrentNeuralNetwork directly with a custom depth. For gated architectures that can learn longer-term dependencies, see LSTM and GRU Recurrent Networks.
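
The three steps above can be sketched as a textbook Elman update, h_t = tanh(W_in * x_t + W_rec * h_(t-1) + b). The block below is a standalone illustration and is not tinymind's internal implementation: the names ElmanCellSketch, inputWeights, recurrentWeights, and previousHidden are hypothetical, a fully connected recurrent layer is assumed, and the output layer is left out for brevity.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Illustrative textbook Elman cell; NOT tinymind's internal implementation.
template <std::size_t NumInputs, std::size_t NumHidden>
struct ElmanCellSketch
{
    std::array<std::array<double, NumInputs>, NumHidden> inputWeights{};
    std::array<std::array<double, NumHidden>, NumHidden> recurrentWeights{};
    std::array<double, NumHidden> bias{};
    std::array<double, NumHidden> previousHidden{}; // recurrent layer, depth 1

    // One time step: h_t = tanh(W_in * x_t + W_rec * h_(t-1) + b)
    std::array<double, NumHidden> step(const std::array<double, NumInputs>& x)
    {
        std::array<double, NumHidden> hidden{};
        for (std::size_t n = 0; n < NumHidden; ++n)
        {
            double sum = bias[n];
            for (std::size_t i = 0; i < NumInputs; ++i)
                sum += inputWeights[n][i] * x[i];
            for (std::size_t r = 0; r < NumHidden; ++r)
                sum += recurrentWeights[n][r] * previousHidden[r];
            hidden[n] = std::tanh(sum);
        }
        previousHidden = hidden; // becomes the feedback for the next step
        return hidden;
    }
};

// Usage: feeding the same input twice gives different hidden states,
// because the second step also sees the first step's output.
inline void demoRecurrence()
{
    ElmanCellSketch<1, 1> cell;
    cell.inputWeights[0][0] = 1.0;
    cell.recurrentWeights[0][0] = 0.5;
    const double first = cell.step({1.0})[0];  // tanh(1.0)
    const double second = cell.step({1.0})[0]; // tanh(1.0 + 0.5 * first)
    assert(second > first);
}
```

Copying the hidden output into previousHidden at the end of each step is what "depth 1" means here: only one earlier time step is ever visible to the hidden layer.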

Template Parameters

template<
    typename ValueType,
    size_t NumberOfInputs,
    size_t NumberOfNeuronsInHiddenLayer,
    size_t NumberOfOutputs,
    typename TransferFunctionsPolicy,
    bool IsTrainable = true,
    size_t BatchSize = 1,
    outputLayerConfiguration_e OutputLayerConfiguration = FeedForwardOutputLayerConfiguration
>
class ElmanNeuralNetwork
  • ValueType -- The numeric type for all values and weights (e.g., double, float, or a fixed-point QValue type)
  • NumberOfInputs -- Number of input neurons
  • NumberOfNeuronsInHiddenLayer -- Number of neurons in the single hidden layer
  • NumberOfOutputs -- Number of output neurons
  • TransferFunctionsPolicy -- Policy class providing activation functions, random weight generation, error calculation, and zero tolerance
  • IsTrainable -- Set to false for inference-only deployment (saves ~60% memory)
  • BatchSize -- Number of samples to accumulate before updating weights (default: 1 for online learning)
  • OutputLayerConfiguration -- FeedForwardOutputLayerConfiguration (default) or ClassifierOutputLayerConfiguration for softmax output

ElmanNetwork is also available as a backward-compatible alias with the same template parameters.

Example: Floating-Point Elman Network

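This example trains an Elman network on the two-input XOR problem. Both operands are presented in the same time step, so the targets do not depend on earlier inputs; the example mainly demonstrates the feedForward, calculateError, and trainNetwork calls.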

#include "neuralnet.hpp"
#include "activationFunctions.hpp"

#include <cstdlib>
#include <cstdio>

// Random number generator policy
struct RandomNumberGenerator
{
    static double generateRandomWeight()
    {
        return (static_cast<double>(rand()) / RAND_MAX) * 2.0 - 1.0;
    }
};

// Transfer functions policy
typedef tinymind::FloatingPointTransferFunctions<
    double,
    RandomNumberGenerator,
    tinymind::TanhActivationPolicy<double>,
    tinymind::TanhActivationPolicy<double>> TransferFunctionsType;

// Elman network: 2 inputs, 3 hidden neurons, 1 output
typedef tinymind::ElmanNeuralNetwork<
    double, 2, 3, 1, TransferFunctionsType> ElmanNetworkType;

int main()
{
    srand(42);
    ElmanNetworkType nn;

    // XOR training data
    const double xorInputs[4][2] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    const double xorTargets[4]   = { 0,      1,      1,      0     };

    double inputs[2];
    double target[1];
    double learned[1];

    // Train
    for (int epoch = 0; epoch < 50000; ++epoch)
    {
        for (int pattern = 0; pattern < 4; ++pattern)
        {
            inputs[0] = xorInputs[pattern][0];
            inputs[1] = xorInputs[pattern][1];
            target[0] = xorTargets[pattern];

            nn.feedForward(inputs);
            double error = nn.calculateError(target);

            if (!TransferFunctionsType::isWithinZeroTolerance(error))
            {
                nn.trainNetwork(target);
            }
        }
    }

    // Verify
    for (int pattern = 0; pattern < 4; ++pattern)
    {
        inputs[0] = xorInputs[pattern][0];
        inputs[1] = xorInputs[pattern][1];

        nn.feedForward(inputs);
        nn.getLearnedValues(learned);

        printf("%.0f XOR %.0f = %.4f (expected %.0f)\n",
               inputs[0], inputs[1], learned[0], xorTargets[pattern]);
    }

    return 0;
}
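
The loop above presents both XOR operands in the same time step. A variant that actually exercises the recurrent state feeds one bit per time step and asks for the XOR of the current and previous bits. The sketch below only generates targets for such a sequence; makeTemporalXorTargets is a hypothetical helper, not part of tinymind, and it assumes the bit before the first step is 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of tinymind: builds targets for a temporal
// XOR task in which one bit arrives per time step.
// targets[t] = bits[t] XOR bits[t - 1]; the bit before step 0 is assumed to be 0.
inline std::vector<int> makeTemporalXorTargets(const std::vector<int>& bits)
{
    std::vector<int> targets(bits.size());
    int previous = 0;
    for (std::size_t t = 0; t < bits.size(); ++t)
    {
        assert(bits[t] == 0 || bits[t] == 1);
        targets[t] = bits[t] ^ previous;
        previous = bits[t];
    }
    return targets;
}
```

With such data the network would need one input neuron instead of two, and each target can only be predicted if the hidden layer remembers the previous bit.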

Example: Fixed-Point Elman Network

For embedded deployment without floating-point hardware, use a QValue type:

#include "neuralnet.hpp"
#include "activationFunctions.hpp"
#include "fixedPointTransferFunctions.hpp"

#include <cstdlib>  // rand()

// Q8.8 signed fixed-point: range -128 to ~127.996, resolution 0.00390625
typedef tinymind::QValue<8, 8, true> ValueType;

// Random number generator for fixed-point weights
template<typename VT>
struct RandomNumberGenerator
{
    static VT generateRandomWeight()
    {
        const int r = (rand() % 512) - 256;
        return VT(static_cast<typename VT::FullWidthValueType>(r));
    }
};

// Fixed-point transfer functions with tanh activation
typedef tinymind::FixedPointTransferFunctions<
    ValueType,
    RandomNumberGenerator<ValueType>,
    tinymind::TanhActivationPolicy<ValueType>,
    tinymind::TanhActivationPolicy<ValueType>> TransferFunctionsType;

// Elman network: 2 inputs, 3 hidden neurons, 1 output
typedef tinymind::ElmanNeuralNetwork<
    ValueType, 2, 3, 1, TransferFunctionsType> ElmanNetworkType;

The training loop has the same structure as the floating-point version. The only difference is that input and target values must be constructed as ValueType instances:

ValueType inputs[2];
ValueType target[1];

inputs[0] = ValueType(0);
inputs[1] = ValueType(1);
target[0] = ValueType(1);

nn.feedForward(inputs);
nn.trainNetwork(target);

Fixed-point Elman networks require the tanh lookup table to be compiled in. Add -DTINYMIND_USE_TANH_8_8=1 to your compiler flags for Q8.8.
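
To make the Q8.8 range and resolution quoted above concrete, here is a standalone sketch of the format using a plain 16-bit integer. It is not tinymind's QValue implementation; toQ8_8, fromQ8_8, and kOne are hypothetical names. If the QValue constructor in the example interprets its argument as the raw representation (an assumption), the raw values -256 to 255 produced by the weight generator correspond to roughly -1.0 to 0.996.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Illustrative Q8.8 helpers; NOT tinymind's QValue implementation.
// A Q8.8 value stores (real * 256) in a signed 16-bit integer.
constexpr int kFractionalBits = 8;
constexpr int kOne = 1 << kFractionalBits; // raw 256 represents 1.0

inline std::int16_t toQ8_8(double real)
{
    const long raw = std::lround(real * kOne); // nearest multiple of 1/256
    assert(raw >= INT16_MIN && raw <= INT16_MAX);
    return static_cast<std::int16_t>(raw);
}

inline double fromQ8_8(std::int16_t raw)
{
    return static_cast<double>(raw) / kOne;
}
```

The smallest positive step is fromQ8_8(1), which is 1/256 = 0.00390625, and the largest value is fromQ8_8(INT16_MAX), which is 32767/256 = 127.99609375.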

Inference-Only Deployment

For deploying a pre-trained network on an embedded device, set IsTrainable=false to eliminate all training code and data:

typedef tinymind::ElmanNeuralNetwork<
    ValueType, 2, 3, 1, TransferFunctionsType, false> InferenceElmanType;

This reduces the instance size from 472 bytes to 192 bytes for a Q8.8 (2->3->1) configuration. Weights can be loaded from an external source using the weight setter methods:

InferenceElmanType nn;

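// Load weights from the trained network. neuron, connection, layer, and weight
// are placeholders; call each setter once per weight being loaded.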
nn.setInputLayerWeightForNeuronAndConnection(neuron, connection, weight);
nn.setInputLayerBiasWeightForConnection(connection, weight);
nn.setHiddenLayerWeightForNeuronAndConnection(layer, neuron, connection, weight);
nn.setHiddenLayerBiasNeuronWeightForConnection(layer, connection, weight);

// Run inference
nn.feedForward(inputs);
nn.getLearnedValues(output);

See Weight Import Export and PyTorch Interoperability for details on training in PyTorch and deploying in tinymind.

API Reference

| Method | Description |
| --- | --- |
| feedForward(const ValueType* inputs) | Forward-propagate inputs through the network |
| calculateError(const ValueType* targets) | Compute error between predicted and target outputs |
| trainNetwork(const ValueType* targets) | Back-propagate error and update weights |
| getLearnedValues(ValueType* output) | Retrieve the network's predicted output values |
| initializeWeights() | Re-randomize all connection weights |
| getRecurrentLayer() | Access the recurrent layer (previous hidden state) |
| setLearningRate(const ValueType& value) | Set the learning rate |
| setMomentumRate(const ValueType& value) | Set the momentum rate |
| setAccelerationRate(const ValueType& value) | Set the acceleration rate |
| getLearningRate() | Get the current learning rate |
| getMomentumRate() | Get the current momentum rate |
| getAccelerationRate() | Get the current acceleration rate |

When to Use Elman vs LSTM/GRU

| | Elman | LSTM | GRU |
| --- | --- | --- | --- |
| Memory (Q8.8, 2->3->1) | 472 bytes | 952 bytes | 808 bytes |
| Gates | None | 4 (input, forget, output, cell) | 3 (update, reset, candidate) |
| Long-term dependencies | Limited | Strong | Strong |
| Training complexity | Simple | Higher | Moderate |
| Best for | Short temporal patterns, simple sequences | Long sequences, complex dependencies | Balance of capability and efficiency |

Use Elman when the temporal dependencies in your data are short (1-2 time steps) and memory is at a premium. For longer-range dependencies, LSTM and GRU networks add gating mechanisms that mitigate vanishing gradients, at the cost of additional memory and computation.
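
One way to see why an ungated recurrence favors short dependencies is to feed a single impulse into a one-neuron recurrence and watch its trace fade. The sketch below is a standalone illustration, not tinymind code; impulseResponse is a hypothetical helper, and the single-neuron model h_t = tanh(w * h_(t-1) + x_t) is a simplification of a full hidden layer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative single-neuron recurrence, not tinymind code:
//   h_t = tanh(w * h_(t-1) + x_t)
// Feeds an impulse (x_0 = 1, zeros afterwards) and records |h_t| per step.
inline std::vector<double> impulseResponse(double recurrentWeight, std::size_t steps)
{
    assert(steps > 0);
    std::vector<double> trace;
    trace.reserve(steps);
    double hidden = 0.0;
    for (std::size_t t = 0; t < steps; ++t)
    {
        const double input = (t == 0) ? 1.0 : 0.0;
        hidden = std::tanh(recurrentWeight * hidden + input);
        trace.push_back(std::fabs(hidden));
    }
    return trace;
}
```

Because |tanh(z)| never exceeds |z|, a recurrent weight of magnitude below 1 shrinks the trace at least geometrically: with impulseResponse(0.5, 10), each entry is at most half of the one before it. Gated cells keep a separate state that can be carried forward largely unchanged, which is what the extra memory in the table pays for.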