This document provides a rigorous treatment of the theoretical underpinnings behind the Genetic Algorithm (GA) approach to the 0/1 Knapsack Problem implemented in this repository.
The 0/1 Knapsack Problem belongs to the class of NP-complete problems, as established by Richard Karp in his seminal 1972 paper "Reducibility Among Combinatorial Problems."
Instance: A finite set
Question: Is there a subset
The optimization variant — used in this implementation — seeks the subset that maximizes total value subject to the capacity constraint:
| Aspect | Complexity |
|---|---|
| Brute force enumeration | |
| Dynamic programming |
|
| Fully polynomial-time approximation scheme (FPTAS) | |
| This GA implementation |
|
The DP approach is pseudo-polynomial because
The theoretical justification for GAs rests on John Holland's Schema Theorem, which provides a lower bound on the expected number of instances of a schema in the next generation:
Where:
-
$m(H, t)$ — number of instances of schema$H$ at generation$t$ -
$f(H)$ — average fitness of strings matching schema$H$ -
$\bar{f}$ — average fitness of the entire population -
$p_c$ — crossover probability -
$\delta(H)$ — defining length of the schema -
$l$ — chromosome length -
$p_m$ — mutation probability per bit -
$o(H)$ — order of the schema (number of fixed positions)
Implication: Short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations. This is the Building Block Hypothesis — GAs work by assembling short, high-fitness building blocks into complete solutions.
Rudolph (1994) proved that a GA with elitist selection converges to the global optimum with probability 1, given:
- The mutation operator can reach any point in the search space from any other point (ergodicity)
- The best solution found so far is always preserved (elitism)
This implementation satisfies both conditions:
- Bit-flip mutation can transform any binary string into any other
-
$(\mu + \lambda)$ selection ensures the best individual always survives
Therefore, this algorithm provably converges to the optimal knapsack packing given sufficient generations.
The crossover operator in this implementation uses a fixed midpoint cut:
Positional bias: Fixed-point crossover introduces positional bias — genes near the crossover point have higher linkage disruption. For the knapsack problem (where gene order is arbitrary), this is acceptable. For problems with epistatic linkage, uniform crossover would be preferred.
Exploration capacity: Each crossover produces 2 offspring (symmetric cuts), doubling the population before survivor selection. This creates a
The mutation operator is triggered by three conditions:
- Duplicate detection — if the offspring already exists in the population
- Infeasibility — if the offspring violates the weight constraint
-
Stochastic trigger — with probability
$p_m / 100$
This creates an adaptive mutation pressure that increases when the population is losing diversity (many duplicates) or when crossover produces infeasible solutions. This is more sophisticated than standard fixed-rate mutation.
Infeasible solutions are handled through a repair + penalize hybrid:
- Repair: Mutation iteratively flips bits until a feasible solution is found
- Penalize: The fitness function's capacity-proximity term naturally penalizes solutions far from the capacity limit
This avoids the need for explicit penalty functions with tunable coefficients — a common source of brittleness in constrained evolutionary optimization.
The absolute worth function:
Can be simplified to:
Case 1:
This shows that fitness increases as
Case 2:
These solutions are filtered out by the constraint check, so this case doesn't affect selection.
The ratio
- Early generations: high variance in fitness → strong selection pressure → rapid improvement
- Late generations: low variance → weak selection pressure → fine-tuning
This natural annealing of selection pressure is visible in the convergence plots: steep improvement in generations 0–20, gradual refinement in generations 20–100.
SA explores via random perturbation with a temperature-controlled acceptance function. The GA's mutation operator serves a similar role, but operates on a population rather than a single solution. This implicit parallelism gives GAs better exploration of multimodal landscapes.
PSO uses velocity vectors and social/cognitive components for search. GAs achieve similar behavior through crossover (social information sharing) and mutation (individual exploration). For binary problems like knapsack, GA's binary encoding is more natural than PSO's continuous velocity model.
ACO constructs solutions incrementally using pheromone trails. GAs operate on complete solutions, making them better suited for problems where partial solutions don't have meaningful fitness values.
| Operation | Complexity |
|---|---|
| Fitness evaluation |
|
| Crossover |
|
| Mutation |
|
| Duplicate check |
|
| Sorting | |
| Total per generation |
For the default configuration (
Compare to brute force:
- Karp, R. M. (1972). Reducibility Among Combinatorial Problems. In: Complexity of Computer Computations, pp. 85–103.
- Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.
- Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
- Rudolph, G. (1994). Convergence Analysis of Canonical Genetic Algorithms. IEEE Transactions on Neural Networks, 5(1), 96–101.
- Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag.
- Martello, S. & Toth, P. (1990). Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons.
- Eiben, A. E. & Smith, J. E. (2015). Introduction to Evolutionary Computing. Springer, 2nd Edition.
"The measure of intelligence is the ability to change." — Charles Darwin