Elman Network
The Elman network is the simplest recurrent neural network (RNN) architecture in tinymind. It extends a standard feed-forward network by adding feedback connections from the hidden layer's output at the previous time step back into the hidden layer at the current time step. This gives the network a form of short-term memory, enabling it to learn temporal patterns in sequential data.
Elman networks are a good fit for embedded tasks where input data has a temporal component but the dependencies are relatively short-range:
- Sensor filtering -- smoothing noisy sensor readings by incorporating recent history
- Simple sequence prediction -- predicting the next value in a periodic or quasi-periodic signal
- Pattern detection -- recognizing short temporal patterns in IMU, ECG, or vibration data
- Adaptive control -- adjusting motor control parameters based on recent system behavior
A trainable Elman network (2->3->1) in Q8.8 fixed-point takes just 472 bytes. For inference-only deployment, the footprint drops to 192 bytes.
An Elman network has a single hidden layer with recurrent connections of depth 1. At each time step:
- The input layer receives the current input values
- The hidden layer receives both the input layer output and its own output from the previous time step (via the recurrent layer)
- The output layer receives the hidden layer output
```
        +------------------+
        | Recurrent Layer  |<----+
        | (previous hidden)|     |
        +--------+---------+     |
                 |               |
                 v               |
Input --> [Hidden Layer] --------+----> Output
```
The recurrent connection depth is fixed to 1, meaning only the immediately previous time step is fed back. For deeper recurrent connections, use RecurrentNeuralNetwork directly with a custom depth. For gated architectures that can learn longer-term dependencies, see LSTM and GRU Recurrent Networks.
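The per-time-step data flow described above can be sketched in plain C++ without tinymind. This is only an illustration under stated assumptions: `ElmanCellSketch` and its members are hypothetical names, not tinymind identifiers, and it assumes tanh on both the hidden and output layers, one bias per neuron, and a single output.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-alone sketch of an Elman cell; not tinymind code.
template <std::size_t NumInputs, std::size_t NumHidden>
struct ElmanCellSketch
{
    std::array<std::array<double, NumInputs>, NumHidden> inputWeights{};     // input -> hidden
    std::array<std::array<double, NumHidden>, NumHidden> recurrentWeights{}; // previous hidden -> hidden
    std::array<double, NumHidden> hiddenBias{};
    std::array<double, NumHidden> outputWeights{};                           // hidden -> single output
    double outputBias = 0.0;
    std::array<double, NumHidden> previousHidden{};                          // recurrent state, depth 1

    // One time step:
    //   h_t = tanh(W_in * x_t + W_rec * h_{t-1} + b_h)
    //   y_t = tanh(w_out . h_t + b_o)
    double step(const std::array<double, NumInputs>& x)
    {
        std::array<double, NumHidden> hidden{};
        for (std::size_t j = 0; j < NumHidden; ++j)
        {
            double sum = hiddenBias[j];
            for (std::size_t i = 0; i < NumInputs; ++i)
            {
                sum += inputWeights[j][i] * x[i];
            }
            for (std::size_t k = 0; k < NumHidden; ++k)
            {
                sum += recurrentWeights[j][k] * previousHidden[k];
            }
            hidden[j] = std::tanh(sum);
        }

        double out = outputBias;
        for (std::size_t j = 0; j < NumHidden; ++j)
        {
            out += outputWeights[j] * hidden[j];
        }

        previousHidden = hidden; // becomes h_{t-1} on the next call
        return std::tanh(out);
    }
};
```

Because `previousHidden` is carried between calls, feeding the same input twice can produce different outputs depending on what came before. That carried state is the short-term memory the recurrent layer provides.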
```cpp
template<
    typename ValueType,
    size_t NumberOfInputs,
    size_t NumberOfNeuronsInHiddenLayer,
    size_t NumberOfOutputs,
    typename TransferFunctionsPolicy,
    bool IsTrainable = true,
    size_t BatchSize = 1,
    outputLayerConfiguration_e OutputLayerConfiguration = FeedForwardOutputLayerConfiguration
>
class ElmanNeuralNetwork;
```

- ValueType -- The numeric type for all values and weights (e.g., `double`, `float`, or a fixed-point `QValue` type)
- NumberOfInputs -- Number of input neurons
- NumberOfNeuronsInHiddenLayer -- Number of neurons in the single hidden layer
- NumberOfOutputs -- Number of output neurons
- TransferFunctionsPolicy -- Policy class providing activation functions, random weight generation, error calculation, and zero tolerance
- IsTrainable -- Set to `false` for inference-only deployment (saves ~60% memory)
- BatchSize -- Number of samples to accumulate before updating weights (default: 1 for online learning)
- OutputLayerConfiguration -- `FeedForwardOutputLayerConfiguration` (default) or `ClassifierOutputLayerConfiguration` for softmax output
ElmanNetwork is also available as a backward-compatible alias with the same template parameters.
This example trains an Elman network on the XOR function using double-precision values. Both operands are presented in the same time step, so it demonstrates the training API rather than a dependence on previous inputs.
```cpp
#include "neuralnet.hpp"
#include "activationFunctions.hpp"
#include <cstdlib>
#include <cstdio>

// Random number generator policy
struct RandomNumberGenerator
{
    static double generateRandomWeight()
    {
        return (static_cast<double>(rand()) / RAND_MAX) * 2.0 - 1.0;
    }
};

// Transfer functions policy
typedef tinymind::FloatingPointTransferFunctions<
    double,
    RandomNumberGenerator,
    tinymind::TanhActivationPolicy,
    tinymind::TanhActivationPolicy> TransferFunctionsType;

// Elman network: 2 inputs, 3 hidden neurons, 1 output
typedef tinymind::ElmanNeuralNetwork<
    double, 2, 3, 1, TransferFunctionsType> ElmanNetworkType;

int main()
{
    srand(42);
    ElmanNetworkType nn;

    // XOR training data
    const double xorInputs[4][2] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    const double xorTargets[4] = { 0, 1, 1, 0 };
    double inputs[2];
    double target[1];
    double learned[1];

    // Train
    for (int epoch = 0; epoch < 50000; ++epoch)
    {
        for (int pattern = 0; pattern < 4; ++pattern)
        {
            inputs[0] = xorInputs[pattern][0];
            inputs[1] = xorInputs[pattern][1];
            target[0] = xorTargets[pattern];

            nn.feedForward(inputs);
            double error = nn.calculateError(target);
            if (!TransferFunctionsType::isWithinZeroTolerance(error))
            {
                nn.trainNetwork(target);
            }
        }
    }

    // Verify
    for (int pattern = 0; pattern < 4; ++pattern)
    {
        inputs[0] = xorInputs[pattern][0];
        inputs[1] = xorInputs[pattern][1];
        nn.feedForward(inputs);
        nn.getLearnedValues(learned);
        printf("%.0f XOR %.0f = %.4f (expected %.0f)\n",
               inputs[0], inputs[1], learned[0], xorTargets[pattern]);
    }

    return 0;
}
```

For embedded deployment without floating-point hardware, use a QValue type:
```cpp
#include "neuralnet.hpp"
#include "activationFunctions.hpp"
#include "fixedPointTransferFunctions.hpp"
#include <cstdlib> // rand()

// Q8.8 signed fixed-point: range -128 to ~127.996, resolution 0.00390625
typedef tinymind::QValue<8, 8, true> ValueType;

// Random number generator for fixed-point weights
template<typename VT>
struct RandomNumberGenerator
{
    static VT generateRandomWeight()
    {
        // Raw value in [-256, 255], i.e. roughly [-1.0, 1.0) in Q8.8
        const int r = (rand() % 512) - 256;
        return VT(static_cast<typename VT::FullWidthValueType>(r));
    }
};

// Fixed-point transfer functions with tanh activation
typedef tinymind::FixedPointTransferFunctions<
    ValueType,
    RandomNumberGenerator<ValueType>,
    tinymind::TanhActivationPolicy<ValueType>,
    tinymind::TanhActivationPolicy<ValueType>> TransferFunctionsType;

// Elman network: 2 inputs, 3 hidden neurons, 1 output
typedef tinymind::ElmanNeuralNetwork<
    ValueType, 2, 3, 1, TransferFunctionsType> ElmanNetworkType;
```

The training loop is identical to the floating-point version. The only difference is that input and target values must be constructed as ValueType instances:
```cpp
ValueType inputs[2];
ValueType target[1];

inputs[0] = ValueType(0);
inputs[1] = ValueType(1);
target[0] = ValueType(1);

nn.feedForward(inputs);
nn.trainNetwork(target);
```

Fixed-point Elman networks require the tanh lookup table to be compiled in. Add `-DTINYMIND_USE_TANH_8_8=1` to your compiler flags for Q8.8.
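To make the Q8.8 figures quoted above (range -128 to ~127.996, resolution 0.00390625) concrete, here is a small stand-alone sketch of the encoding. It does not use tinymind's `QValue`. The helper names `toQ8_8` and `fromQ8_8` are hypothetical, and round-to-nearest with saturation is a choice made for this sketch, not a statement about how `QValue` rounds or saturates.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers illustrating signed Q8.8; tinymind's QValue has its own API.
constexpr int kFractionalBits = 8;
constexpr int32_t kOne = 1 << kFractionalBits; // raw 256 represents 1.0

// Convert a real number to the nearest Q8.8 raw value,
// saturating at the limits of the 16-bit container.
inline int16_t toQ8_8(double value)
{
    const double scaled = std::round(value * kOne);
    if (scaled > INT16_MAX)
    {
        return INT16_MAX;
    }
    if (scaled < INT16_MIN)
    {
        return INT16_MIN;
    }
    return static_cast<int16_t>(scaled);
}

// Convert a Q8.8 raw value back to a real number.
inline double fromQ8_8(int16_t raw)
{
    return static_cast<double>(raw) / kOne;
}
```

With this encoding, `fromQ8_8(1)` is the resolution (1/256), `fromQ8_8(INT16_MAX)` is 127.99609375, and the raw range [-256, 255] used by the weight generator corresponds to [-1.0, 0.99609375]. One thing worth checking against the QValue constructors: the weight generator passes a raw integer, which only makes sense if a single integer argument is treated as the raw Q8.8 value. If that also holds for `ValueType(1)`, it would encode 1/256 rather than 1.0.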
For deploying a pre-trained network on an embedded device, set IsTrainable=false to eliminate all training code and data:
```cpp
typedef tinymind::ElmanNeuralNetwork<
    ValueType, 2, 3, 1, TransferFunctionsType, false> InferenceElmanType;
```

This reduces the instance size from 472 bytes to 192 bytes for a Q8.8 (2->3->1) configuration. Weights can be loaded from an external source using the weight setter methods:
```cpp
InferenceElmanType nn;

// Load weights from trained network.
// layer, neuron, connection, and weight are placeholders for your own
// indices and values; inputs and output are caller-provided ValueType arrays.
nn.setInputLayerWeightForNeuronAndConnection(neuron, connection, weight);
nn.setInputLayerBiasWeightForConnection(connection, weight);
nn.setHiddenLayerWeightForNeuronAndConnection(layer, neuron, connection, weight);
nn.setHiddenLayerBiasNeuronWeightForConnection(layer, connection, weight);

// Run inference
nn.feedForward(inputs);
nn.getLearnedValues(output);
```

See Weight Import Export and PyTorch Interoperability for details on training in PyTorch and deploying in tinymind.
| Method | Description |
|---|---|
| `feedForward(const ValueType* inputs)` | Forward-propagate inputs through the network |
| `calculateError(const ValueType* targets)` | Compute error between predicted and target outputs |
| `trainNetwork(const ValueType* targets)` | Back-propagate error and update weights |
| `getLearnedValues(ValueType* output)` | Retrieve the network's predicted output values |
| `initializeWeights()` | Re-randomize all connection weights |
| `getRecurrentLayer()` | Access the recurrent layer (previous hidden state) |
| `setLearningRate(const ValueType& value)` | Set the learning rate |
| `setMomentumRate(const ValueType& value)` | Set the momentum rate |
| `setAccelerationRate(const ValueType& value)` | Set the acceleration rate |
| `getLearningRate()` | Get the current learning rate |
| `getMomentumRate()` | Get the current momentum rate |
| `getAccelerationRate()` | Get the current acceleration rate |
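
For readers unfamiliar with what the learning rate and momentum rate usually control, the following is a generic sketch of gradient descent with momentum. It is not taken from tinymind's source: `WeightState` and `updateWeight` are hypothetical names, the formula is the textbook one, and the acceleration rate is deliberately left out because its exact role is not described here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration of learning rate and momentum;
// not tinymind's internal update rule.
struct WeightState
{
    double weight = 0.0;
    double previousDelta = 0.0; // last applied change, reused by the momentum term
};

// Textbook gradient descent with momentum:
//   delta  = -learningRate * gradient + momentumRate * previousDelta
//   weight = weight + delta
inline void updateWeight(WeightState& state,
                         double gradient,
                         double learningRate,
                         double momentumRate)
{
    const double delta = -learningRate * gradient + momentumRate * state.previousDelta;
    state.weight += delta;
    state.previousDelta = delta;
}
```

In this formulation a larger learning rate takes bigger steps along the current gradient, while the momentum rate controls how much of the previous step is carried forward, which tends to smooth out oscillations between consecutive updates.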
| | Elman | LSTM | GRU |
|---|---|---|---|
| Memory (Q8.8, 2->3->1) | 472 bytes | 952 bytes | 808 bytes |
| Gates | None | 4 (input, forget, output, cell) | 3 (update, reset, candidate) |
| Long-term dependencies | Limited | Strong | Strong |
| Training complexity | Simple | Higher | Moderate |
| Best for | Short temporal patterns, simple sequences | Long sequences, complex dependencies | Balance of capability and efficiency |
Use Elman when the temporal dependencies in your data are short (1-2 time steps) and memory is at a premium. For longer-range dependencies, LSTM and GRU networks use gating mechanisms that mitigate the vanishing-gradient problem, at the cost of additional memory and computation.
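
As a concrete picture of a one-step dependency, consider a temporal XOR stream: the network sees one bit per time step, and the target is the XOR of the current bit and the previous one. A purely feed-forward network cannot solve this because the previous bit is not part of its input, whereas a depth-1 recurrent state is in principle enough. The helper below, with the hypothetical name `temporalXorTargets`, only generates such targets. It is a data-preparation sketch and makes no claim about how well any particular network learns them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: targets[t] = bits[t] XOR bits[t-1].
// The stream is treated as if it were preceded by a 0, so targets[0] = bits[0].
inline std::vector<int> temporalXorTargets(const std::vector<int>& bits)
{
    std::vector<int> targets(bits.size());
    int previous = 0;
    for (std::size_t t = 0; t < bits.size(); ++t)
    {
        assert(bits[t] == 0 || bits[t] == 1); // inputs must be single bits
        targets[t] = bits[t] ^ previous;
        previous = bits[t];
    }
    return targets;
}
```

Each (bit, target) pair would then be presented to the network one time step at a time, in order, so that the recurrent layer holds the previous step's hidden state when the next bit arrives.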