Name	Name	Last commit message	Last commit date
parent directory ..
examples	examples
src	src
tests	tests
.gitignore	.gitignore
CHANGELOG.md	CHANGELOG.md
README.md	README.md
VERSION	VERSION
VERSION.md	VERSION.md
requirements.txt	requirements.txt

Gossip Protocol

A production-ready implementation of epidemic-style gossip protocols for decentralized state propagation in distributed systems.

Overview

This project provides efficient, fault-tolerant gossip protocols for disseminating information across large-scale distributed systems. Gossip protocols enable eventually consistent state sharing without centralized coordination.

Version: 0.1.0

Features

✅ Push Gossip — Nodes actively push updates to random peers
✅ Pull Gossip — Nodes actively pull updates from random peers
✅ Push-Pull Hybrid — Combines both for faster convergence
✅ Anti-Entropy — Periodic full state reconciliation
✅ Failure Detection — Integrated phi-accrual failure detector
✅ Membership Management — Dynamic join/leave with SWIM protocol
✅ Configurable Fanout — Tune convergence speed vs bandwidth

Architecture

gossip-protocol/
├── src/
│   ├── gossip_node.py        # Core gossip node implementation
│   ├── membership.py          # Cluster membership management (SWIM)
│   ├── failure_detector.py   # Phi-accrual failure detection
│   ├── message.py             # Gossip message types
│   ├── state_store.py         # Versioned state storage (vector clocks)
│   └── config.py              # Configuration
├── examples/
│   ├── distributed_cache.py   # Distributed cache example
│   ├── service_discovery.py   # Service discovery system
│   └── event_bus.py           # Pub/sub event bus
├── tests/
│   ├── test_convergence.py
│   ├── test_failure_detection.py
│   └── test_membership.py
├── docs/
│   ├── ARCHITECTURE.md
│   ├── API.md
│   └── TUNING.md
├── VERSION
└── requirements.txt

Quick Start

Installation

pip install -r requirements.txt

Basic Usage

from gossip_node import GossipNode

# Create a gossip node
node = GossipNode(
    node_id="node1",
    bind_address="127.0.0.1:5001",
    seed_nodes=["127.0.0.1:5002", "127.0.0.1:5003"]
)

# Start the node
node.start()

# Update local state (will be gossiped to cluster)
node.set("service:api", {"host": "10.0.1.5", "port": 8080, "status": "healthy"})

# Read eventually consistent state
value = node.get("service:api")

# Subscribe to state changes
def on_update(key, value):
    print(f"Key {key} updated to {value}")

node.subscribe("service:*", on_update)

Use Cases

1. Service Discovery

# Nodes gossip service endpoints
node.set("service:users", {"host": "10.0.1.5", "port": 8080})
node.set("service:orders", {"host": "10.0.1.6", "port": 8081})

# All nodes eventually see all services
services = node.get_prefix("service:")

2. Distributed Cache

# Cache entries propagate via gossip
cache.set("user:123", user_data, ttl=300)

# Reads hit local state (eventually consistent)
user = cache.get("user:123")

3. Cluster Metadata

# Share configuration across cluster
node.set("config:max_connections", 1000)
node.set("config:timeout", 30)

# All nodes see the same configuration
config = node.get_prefix("config:")

Protocol Details

Push Gossip

Node updates local state
Every gossip interval, select random peers (fanout)
Send state updates to selected peers
Peers merge received updates

Pros: Fast for new updates
Cons: May not reach all nodes if updates stop

Pull Gossip

Every gossip interval, select random peers
Request their state digests
Request full state for missing/stale entries

Pros: Eventually reaches all nodes
Cons: Slower for new updates

Push-Pull (Recommended)

Combines both approaches for optimal convergence:

Push for fast initial propagation
Pull for anti-entropy and completeness

Configuration

config = {
    "gossip_interval": 1000,      # milliseconds between gossip rounds
    "gossip_fanout": 3,            # number of peers to gossip with
    "membership_gossip_interval": 500,  # SWIM membership updates
    "failure_detection_interval": 1000,
    "suspicion_timeout": 5000,    # time before marking node as failed
    "max_transmission_count": 3,   # anti-entropy retransmission
}

Performance Characteristics

Convergence Time: O(log N) gossip rounds to reach all nodes
Message Complexity: O(N log N) messages total
Bandwidth: Configurable via fanout (higher = faster + more bandwidth)
Fault Tolerance: Tolerates up to N-1 failures (eventual recovery)

Failure Detection

Uses Phi-accrual failure detector for adaptive failure detection:

Continuously adapts to network conditions
No hard timeouts (uses statistical likelihood)
Low false positive rate in variable networks

# Check if a node is suspected as failed
if node.is_failed("node2"):
    print("Node2 is likely down")

# Get failure probability
phi = node.get_failure_phi("node2")  # Higher = more likely failed

Membership Management

Implements SWIM (Scalable Weakly-consistent Infection-style Membership):

Nodes join by contacting seed nodes
Periodic ping-req for indirect probing
Graceful leave vs failure detection
Suspicion mechanism to reduce false positives

# Join cluster
node.join(seed_nodes=["127.0.0.1:5001"])

# Get current members
members = node.get_members()

# Leave gracefully
node.leave()

State Semantics

Consistency Model

Eventually consistent: All nodes converge to the same state
Conflict resolution: Last-write-wins with vector clocks
Causality tracking: Vector clocks for happens-before relationships

State Operations

# Set with metadata
node.set("key", "value", metadata={"version": 1, "ttl": 300})

# Conditional update (CAS)
success = node.compare_and_set("key", old_value, new_value)

# Delete with tombstone
node.delete("key")  # Propagates deletion marker

Integration Guide

As a Submodule

git submodule add <repo-url>/gossip-protocol lib/gossip

Network Transport

Default uses UDP multicast. Pluggable transports:

from transport import TCPTransport, UDPTransport

node = GossipNode(
    node_id="node1",
    transport=TCPTransport()  # Or custom transport
)

State Store Backend

Default uses in-memory store. Pluggable backends:

from state_store import RedisStateStore

node = GossipNode(
    node_id="node1",
    state_store=RedisStateStore(redis_url="redis://localhost:6379")
)

Testing

# Unit tests
python -m pytest tests/

# Convergence simulation
python tests/test_convergence.py --nodes 100 --failures 10

# Partition testing
python tests/test_partition.py --partition-duration 30

Tuning Guide

For Low Latency

Increase gossip_fanout (more bandwidth, faster convergence)
Decrease gossip_interval (more CPU, faster updates)

For Large Clusters

Keep fanout low (3-5) for bandwidth efficiency
Increase gossip_interval slightly
Use push-pull to ensure convergence

For Unreliable Networks

Increase max_transmission_count
Tune failure detector threshold (phi)
Consider using TCP transport

See TUNING.md for detailed guidance.

Monitoring

Expose metrics for observability:

metrics = node.get_metrics()
# {
#   "gossip_rounds": 1523,
#   "messages_sent": 4569,
#   "messages_received": 4432,
#   "state_size": 256,
#   "cluster_size": 10,
#   "suspected_failures": 0
# }

Production Considerations

Bandwidth: Monitor and limit gossip traffic in large clusters
State Size: Implement TTL and garbage collection for stale state
Convergence: Monitor convergence time as cluster grows
Security: Add encryption and authentication for production use

Examples

See examples/ for complete examples:

Distributed cache with gossip
Service discovery system
Event bus with pub/sub

Dependencies

Python 3.8+
Optional: msgpack for efficient serialization

Versioning

Current: 0.1.0

Follows Semantic Versioning

References

Standalone Installation

pip install git+https://github.com/navinBRuas/_DistributedSystems#subdirectory=gossip-protocol

Usage

See the sections above and examples/ for usage patterns.

Configuration

See the Configuration section above for gossip settings.

Version

0.1.0 (see VERSION.md)

Changelog

See CHANGELOG.md.

License

MIT License. See repo root LICENSE.

FilesExpand file tree

gossip-protocol

Directory actions

More options

Directory actions

More options

Latest commit

History

gossip-protocol

Folders and files

parent directory

README.md

Gossip Protocol

Overview

Features

Architecture

Quick Start

Installation

Basic Usage

Use Cases

1. Service Discovery

2. Distributed Cache

3. Cluster Metadata

Protocol Details

Push Gossip

Pull Gossip

Push-Pull (Recommended)

Configuration

Performance Characteristics

Failure Detection

Membership Management

State Semantics

Consistency Model

State Operations

Integration Guide

As a Submodule

Network Transport

State Store Backend

Testing

Tuning Guide

For Low Latency

For Large Clusters

For Unreliable Networks

Monitoring

Production Considerations

Examples

Dependencies

Versioning

References

Standalone Installation

Usage

Configuration

Version

Changelog

License