Dynamic Node Creation (Ash, 1989)
This page is part of the Constructive Neural Networks learning project.
Dynamic Node Creation in Backpropagation Networks
Dynamic Node Creation (DNC) is a constructive algorithm for artificial neural networks, introduced by Timur Ash in 1989, that automatically determines an appropriate network topology during the training process. Instead of requiring the user to guess the optimal number of hidden nodes beforehand, DNC starts with a minimal network and adds new nodes only when learning progress stagnates. This approach addresses one of the fundamental challenges in neural network design: selecting an architecture that is large enough to solve a problem but small enough to generalize well and train efficiently.
This page provides an overview of the algorithm, its mechanics, and a practical implementation based on Ash's paper, "Dynamic Node Creation in Backpropagation Networks".
The Problem of Network Architecture
When using a standard backpropagation network, a key design decision is the number of hidden layers and the number of nodes within those layers. This choice presents a dilemma:
- Too few nodes: The network may lack the capacity to learn the underlying input-output mapping, resulting in high error.
- Too many nodes: The network may be computationally expensive to train and prone to "overfitting," where it memorizes the training data (including its statistical noise) instead of learning the general relationship. This leads to poor performance on new, unseen data.
Two general strategies have emerged to automate topology selection:
- Pruning: Start with a network that is larger than necessary and remove (prune) redundant nodes or weights during or after training.
- Construction (or Growing): Start with a small network and add new nodes or connections as needed until the problem is solved.
Dynamic Node Creation is a prominent example of the constructive approach.
The Dynamic Node Creation (DNC) Algorithm
The core idea of DNC is to monitor the network's learning progress and add a new hidden node whenever the error curve flattens out, indicating that the network has reached the limit of its current capacity.
The network begins with a minimal architecture, typically a single hidden node. It trains using the standard backpropagation algorithm. During training, the algorithm checks if a new node should be added based on a "trigger condition."
The Trigger Condition
A new hidden node is added when the rate of decrease in the average squared error becomes too slow. The decision is based on several parameters:
| Symbol | Definition |
|---|---|
| $t$ | The current time, measured in training trials (epochs). |
| $t_0$ | The time (epoch) when the last node was added. Initially, $t_0 = 0$. |
| $w$ | The width of the time window (in epochs) over which the error slope is evaluated. |
| $a_t$ | The average squared error per output node at time $t$. |
| $a_{t_0}$ | The average squared error at the time the last node was added. |
| $\Delta_T$ | The "trigger slope": a user-defined threshold. If the normalized rate of error reduction falls below this value, a new node is added. |
A new node is added if both of the following conditions are met:
1. The learning progress has flattened: $\dfrac{a_{t-w} - a_t}{a_{t_0}} < \Delta_T$
2. The network has trained for at least $w$ epochs with the current topology: $t - t_0 \geq w$

The first condition measures the relative drop in error over the last $w$ epochs. By normalizing by $a_{t_0}$, the condition is less sensitive to the absolute magnitude of the error, which is typically high at the beginning of training.
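As a concrete illustration, the trigger test can be written as a small standalone helper. This is a sketch: the function name `should_add_node` and its argument names are illustrative, mirroring the symbols in the table above and the pseudocode below.

```python
def should_add_node(a_t, a_t_minus_w, a_t0, t, t0, w, delta_T):
    """Return True if the DNC trigger condition is met.

    a_t         -- average squared error at the current epoch t
    a_t_minus_w -- average squared error w epochs ago
    a_t0        -- average squared error when the last node was added
    t, t0, w    -- current epoch, epoch of last node addition, window width
    delta_T     -- trigger slope threshold
    """
    if t - t0 < w:                        # condition 2: wait at least w epochs
        return False
    slope = (a_t_minus_w - a_t) / a_t0    # condition 1: normalized error drop
    return slope < delta_T
```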
Stopping Condition
The process of adding nodes is not infinite. Node growth is disabled once the network's performance reaches a desired level of accuracy. This is determined by two user-specified cutoffs:
- $C_a$: The desired cutoff for the average squared error.
- $C_m$: The desired cutoff for the maximum squared error on any single output for any pattern.

Node growth stops when:

$$a_t \leq C_a \quad \text{and} \quad m_t \leq C_m$$

where $m_t$ is the maximum squared error at time $t$. After this point, the network can continue to train with its final architecture to fine-tune its weights.
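Expressed in code, the cut-off is a single comparison. This is a sketch using the same symbols; `growth_finished` is an illustrative name, not part of Ash's paper.

```python
def growth_finished(a_t, m_t, C_a, C_m):
    """True once both error cutoffs are met; after this point no more
    nodes are added, though training of the fixed topology may continue."""
    return a_t <= C_a and m_t <= C_m
```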
Algorithm Pseudocode
```
Initialize network with a minimal number of hidden nodes (e.g., 1)
Initialize DNC parameters: w, Δ_T, C_a, C_m
t_0 = 0
a_t0 = initial_error
node_growth_enabled = True

FOR t = 1 TO max_epochs:
    Train network on all data for one epoch
    Calculate a_t (average squared error)
    Calculate m_t (maximum squared error)

    // Check stopping condition for node growth
    IF (a_t <= C_a AND m_t <= C_m) AND node_growth_enabled:
        node_growth_enabled = False
        PRINT "Target accuracy reached. Disabling node growth."

    // Check trigger condition to add a new node
    IF node_growth_enabled AND (t - t_0 >= w):
        error_drop = error_at(t-w) - a_t
        slope = error_drop / a_t0
        IF slope < Δ_T:
            ADD one new node to the hidden layer
            Re-initialize the optimizer for the new network parameters
            t_0 = t
            a_t0 = a_t

    // Check for final convergence
    IF training_is_sufficiently_complete:
        BREAK
```
PyTorch Implementation Example
Here we provide a practical implementation of the DNC algorithm in PyTorch. The code demonstrates how to build a network that can grow dynamically and how to integrate the trigger logic into the training loop.
The Dynamic Network Module
First, we define a PyTorch nn.Module that contains a method, add_hidden_node(), to dynamically resize its layers. When a node is added, the existing weights are preserved to retain the learned information, and the new weights are initialized to small random values.
```python
import torch
import torch.nn as nn


class DynamicNet(nn.Module):
    """
    A neural network that can dynamically add nodes to its hidden layer.
    """

    def __init__(self, input_size, output_size, initial_hidden_size=1):
        super(DynamicNet, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_size = initial_hidden_size

        self.hidden_layer = nn.Linear(self.input_size, self.hidden_size)
        self.output_layer = nn.Linear(self.hidden_size, self.output_size)
        self.activation = nn.Sigmoid()

        self.initialize_weights()

    def forward(self, x):
        x = self.activation(self.hidden_layer(x))
        x = self.activation(self.output_layer(x))
        return x

    def initialize_weights(self):
        # Initialize weights to small random values as per the paper
        init_range = 0.1666
        for param in self.parameters():
            if param.data.ndimension() >= 2:
                nn.init.uniform_(param.data, -init_range, init_range)
            else:
                nn.init.zeros_(param.data)

    def add_hidden_node(self):
        """Adds a single node to the hidden layer, preserving learned weights."""
        print(f"\n--- Adding a new hidden node. New size: {self.hidden_size + 1} ---\n")

        # Store old weights and biases
        old_hidden_weights = self.hidden_layer.weight.data
        old_hidden_biases = self.hidden_layer.bias.data
        old_output_weights = self.output_layer.weight.data
        old_output_biases = self.output_layer.bias.data

        # Increment hidden layer size and create new layers
        self.hidden_size += 1
        new_hidden_layer = nn.Linear(self.input_size, self.hidden_size)
        new_output_layer = nn.Linear(self.hidden_size, self.output_size)

        # Initialize new layers with small random weights
        init_range = 0.1666
        nn.init.uniform_(new_hidden_layer.weight.data, -init_range, init_range)
        nn.init.zeros_(new_hidden_layer.bias.data)
        nn.init.uniform_(new_output_layer.weight.data, -init_range, init_range)
        nn.init.zeros_(new_output_layer.bias.data)

        # Copy old weights and biases into the new layers so that the
        # information learned so far is retained
        new_hidden_layer.weight.data[:-1, :] = old_hidden_weights
        new_hidden_layer.bias.data[:-1] = old_hidden_biases
        new_output_layer.weight.data[:, :-1] = old_output_weights
        new_output_layer.bias.data[:] = old_output_biases

        # Replace the old layers with the new ones
        self.hidden_layer = new_hidden_layer
        self.output_layer = new_output_layer
```
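The growth step can be sanity-checked in isolation. The snippet below is a usage sketch that assumes the DynamicNet class and imports above, with illustrative sizes for a 2-input, 1-output network; it verifies that the layer shapes grow by one node and that the original hidden node's weights are carried over unchanged.

```python
# Grow a 2-input, 1-output network by one hidden node
net = DynamicNet(input_size=2, output_size=1, initial_hidden_size=1)
print(net.hidden_layer.weight.shape)   # torch.Size([1, 2])
print(net.output_layer.weight.shape)   # torch.Size([1, 1])

old_w = net.hidden_layer.weight.data.clone()
net.add_hidden_node()
print(net.hidden_layer.weight.shape)   # torch.Size([2, 2])
print(net.output_layer.weight.shape)   # torch.Size([1, 2])

# The weights of the original hidden node are preserved
assert torch.allclose(net.hidden_layer.weight.data[:-1, :], old_w)
```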
The Training Loop with DNC Logic
The training function orchestrates the process. It runs a standard backpropagation loop but also includes the logic to check the trigger condition at regular intervals.
```python
import collections

import torch.nn as nn
import torch.optim as optim


def train_dnc(model, X, y, max_epochs=20000):
    # Parameters from the paper
    learning_rate = 0.5
    momentum = 0.9
    w = 1000          # Window width
    delta_T = 0.05    # Trigger slope
    C_a = 0.001       # Desired average squared error
    C_m = 0.01        # Desired maximum squared error

    optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)
    criterion = nn.MSELoss(reduction='none')

    t0 = 0
    a_t0 = None  # set to the error after the first epoch (the "initial error")
    error_history = collections.deque(maxlen=w + 1)
    node_growth_enabled = True

    for t in range(1, max_epochs + 1):
        # Standard training step
        model.train()
        optimizer.zero_grad()
        outputs = model(X)
        loss_per_pattern = criterion(outputs, y)
        a_t = loss_per_pattern.mean()
        m_t = loss_per_pattern.max()
        a_t.backward()
        optimizer.step()

        error_history.append(a_t.item())
        if a_t0 is None:
            a_t0 = a_t.item()

        # Check stopping condition for node growth
        if node_growth_enabled and (a_t.item() <= C_a and m_t.item() <= C_m):
            node_growth_enabled = False
            print(f"--- Node growth disabled at epoch {t} ---")

        # Check node creation condition
        if node_growth_enabled and (t - t0 >= w):
            if len(error_history) > w:
                a_t_minus_w = error_history[0]
                error_drop = a_t_minus_w - a_t.item()
                slope = error_drop / a_t0 if a_t0 > 0 else float('inf')
                if slope < delta_T:
                    model.add_hidden_node()
                    # Reset optimizer with the new parameters
                    optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)
                    # Reset DNC tracking variables
                    t0 = t
                    a_t0 = a_t.item()
                    error_history.clear()

        if t % 1000 == 0:
            print(f"Epoch {t:5d} | Hidden Nodes: {model.hidden_size} | Avg Sq Error: {a_t.item():.6f}")

    print("\n--- Training Finished ---")
```
Results and Significance
In his paper, Ash demonstrated the effectiveness of DNC on several benchmark problems, including Parity (XOR), Symmetry, and Binary Addition. The key findings were:
- Effectiveness: The DNC algorithm successfully found a solution for every problem attempted.
- Minimal Topologies: In many cases, DNC found the known minimal network architecture required to solve the problem. For the 2-input XOR problem (PAR2), it consistently grew from one to two hidden nodes, the minimum required size.
- Efficiency: The computational cost was comparable to, and sometimes even less than, training a standard backpropagation network that was given the correct topology from the start. This suggests that spending the early epochs on smaller networks, which are cheaper to train, can guide the search toward a good solution without extra overall cost.
- Improved Convergence: For some difficult problems, DNC was able to find a solution where standard backpropagation with a fixed, minimal topology failed to converge.
Dynamic Node Creation provides an elegant and effective solution to the problem of network design. By allowing the network to grow into its solution, it removes the burden of trial-and-error from the practitioner and often results in efficient, compact, and well-performing models.
References
- Ash, T. (1989). Dynamic node creation in backpropagation networks. Connection Science, 1(4), 365-375. https://doi.org/10.1080/09540098908915647
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1. MIT Press.