Course: COMP3311 Applied Cryptography
Project: SHA256 Partial Collision Finding
This project implements a program to find partial collisions in the SHA256 hash function. The goal is to find two distinct strings X and Y such that the first 44 bits of their SHA256 hashes are identical, demonstrating the practical challenges of collision resistance in cryptographic hash functions.
Given the SHA256 hash function H: {0,1}* → {0,1}^256, find strings X and Y satisfying:
- X begins with the student's full name:
LauMingHong... - Y begins with the student ID:
22079217d... - The first 44 bits of H(X) and H(Y) are identical
If H(X) = e3b0c44298fc1c149a..., then H(Y) should start with e3b0c44298f...
Instead of using brute force, this implementation employs a meet-in-the-middle strategy with hash tables for efficient collision detection:
- Hash Table Storage: Maintains separate dictionaries for X-prefixes and Y-prefixes
- Incremental Generation: Generates random suffixes for both name and ID strings
- Collision Detection: Checks for matches across three scenarios:
- Direct collision between current X and Y hashes
- Current X hash matches a previously stored Y hash
- Current Y hash matches a previously stored X hash
- Memory-Time Tradeoff: Stores hash prefixes in dictionaries for O(1) lookup
- Random Suffix Generation: Uses
os.urandom()for cryptographically secure randomness - Prefix Matching: Compares only the first 11 hexadecimal characters (44 bits)
- Python Version: 3.6 or higher
- Operating System: Cross-platform (Windows, macOS, Linux)
import hashlib # Built-in SHA256 implementation
import random # Random number generation
import os # Secure random byte generationAll required libraries are part of Python's standard library - no additional installations needed.
git clone https://github.com/AlanLau9809/2024_COMP3311_CollisionFinding.git
cd 2024_COMP3311_CollisionFindingpython --version
# or
python3 --versionpython COMP3311_Proj.py
# or
python3 COMP3311_Proj.pySimply run the Python script:
python COMP3311_Proj.pyThe program will display:
- Current attempt number (real-time progress)
- Collision detection case when found
- The two strings X and Y that produce the collision
- Their corresponding SHA256 hash values
- Total number of attempts required
Collision found in case 2.
Name with random string: LauMingHong8da95c4129c49980f070e8f9848d18a6bc97bd84ac2be7b0
ID with random string: 22079217d82c932b4592a2ae713b8264548a83b43a3fb2b0cab063bcb
hash_X: 41661280505754d9df5e24019b4eda9ccc7cf4e45f99f74cd74731639a573bc5
hash_Y: 41661280505ebacf5d62cd0791c07910ecefa5b3618af89f422d9f659b6f7160
Verification: Both hashes start with 41661280505, confirming the first 44 bits match.
Current attempt: 1
Current attempt: 2
...
Current attempt: [number]
Collision found in case [1/2/3].
Name with random string: [X string]
ID with random string: [Y string]
hash_X: [SHA256 hash of X]
hash_Y: [SHA256 hash of Y]
Final attempts: [total attempts]
- Generates cryptographically secure random strings
- Length varies between 15-25 characters
- Uses
os.urandom()for security
- Main collision detection algorithm
- Implements meet-in-the-middle strategy
- Maintains hash tables for efficient lookup
- Returns collision pairs when found
hashX_Prefix = hashX[:11] # First 44 bits (11 hex chars)
hashY_Prefix = hashY[:11] # First 44 bits (11 hex chars)The algorithm checks three collision scenarios:
- Direct Match: Current X and Y hashes have matching prefixes
- X-to-Y Lookup: Current X prefix matches stored Y prefix
- Y-to-X Lookup: Current Y prefix matches stored X prefix
- Expected Attempts: ~2^22 ≈ 4.2 million (birthday paradox)
- Actual Performance: Typically finds collision in 10K-100K attempts
- Memory Usage: O(n) where n is number of attempts
- Time Complexity: O(n) average case
- Typical Runtime: 30 seconds to 5 minutes
- Hardware Dependent: Faster on modern CPUs
- Memory Efficient: Stores only hash prefixes, not full strings
Verify results using: https://emn178.github.io/online-tools/sha256.html
import hashlib
# Example verification
x = "LauMingHong4a8b2c9e1f3d7a6b8c2e4f9a1b3c5d7e9f2a4b6c8e1a3b5c7d9e2f4a6b8c"
y = "22079217d9f2e4a6b8c1d3f5a7b9c2e4f6a8b1c3d5f7a9b2c4e6f8a1b3c5d7f9a2b"
hash_x = hashlib.sha256(x.encode()).hexdigest()
hash_y = hashlib.sha256(y.encode()).hexdigest()
print(f"X hash: {hash_x}")
print(f"Y hash: {hash_y}")
print(f"First 44 bits match: {hash_x[:11] == hash_y[:11]}")- Feasibility: Ensures reasonable computation time
- Security Demonstration: Shows practical collision finding
- Birthday Paradox: Expected ~2^22 attempts for 50% success probability
- Efficiency: Much faster than pure brute force
- Memory-Time Tradeoff: Uses storage to reduce computation
- Scalability: Can be parallelized for faster results
- Long Runtime: Normal for cryptographic operations
- Memory Usage: Monitor system resources during execution
- Python Version: Ensure Python 3.6+ for proper
os.urandom()support
- Run on systems with sufficient RAM (4GB+ recommended)
- Close unnecessary applications during execution
- Consider running overnight for guaranteed results
2024_COMP3311_CollisionFinding/
├── COMP3311_Proj.py # Main collision finding program
├── COMP3311 Applied Cryptography Project-v2.pdf # Project specification
└── README.md # This documentation
This project demonstrates:
- Collision Resistance: Practical challenges in hash function security
- Birthday Paradox: Probabilistic collision finding
- Cryptographic Implementation: Real-world hash function analysis
- Algorithm Optimization: Efficient collision detection strategies
This project is submitted as coursework for COMP3311 Applied Cryptography at The Hong Kong Polytechnic University. Reference only.
Note: This implementation is for educational purposes only. The techniques demonstrated should not be used for malicious purposes or to compromise real-world cryptographic systems.
