This document introduces a graph propagation-based node correlation analysis algorithm that calculates correlations between nodes in a graph and identifies central nodes through an information propagation mechanism with source identification. The core implementation of the algorithm is in the Gds (Graph Diffusion with Source) class, using the igraph library for graph data structure processing.
The Gds class is the core implementation of the algorithm, containing functions for graph initialization, information propagation, normalization, and central node calculation.
After initialization, the graph G is passed in from outside, and the nodes of the graph have an attribute r_msg that stores the node's correlation messages.
class Gds():
def __init__(self, G):
# Graph initialization coder_msg: Stores node correlation messages, a dictionary in JSON format with source node IDs as keys and correlation degrees as valuesbuffer: Stores message buffers propagated from neighbor nodes, designed to implement parallel transmissionid_nodeid_dict: Mapping from vertex ID to node IDnodeid_id_dict: Mapping from node ID to vertex ID
def __init__(self, G):
self.G = G
self.df_vertexs = self.G.get_vertex_dataframe()
# Create ID mappings
self.id_nodeid_dict = self.df_vertexs.set_index('vertex ID')['node_id'].to_dict()
self.nodeid_id_dict = {v: k for k, v in self.id_nodeid_dict.items()}
# Initialize node attributes
self.G.vs["r_msg"] = json.dumps({})
self.G.vs["buffer"] = json.dumps([])
# Set propagation parameters
self.FADE = 0.3 # Message attenuation coefficient
# Set other parameters based on number of nodesdef add_one_node_ids(self, node_ids):
for node_id in node_ids:
vid = self.nodeid_id_dict[node_id]
node = self.G.vs[vid]
origin_msg = json.loads(self.G.vs[int(vid)]["r_msg"])
# Add source node message
add_msg = {str(node_id): 1}
origin_msg.update(add_msg)
# Merge messages and update
buffer = []
buffer.append(add_msg)
buffer.append(origin_msg)
merged_dict = merge_dicts_with_sum(buffer)
node['r_msg'] = json.dumps(merged_dict)
self.normalize()Specify a series of nodes (node_ids), and set each node's 'r_msg' attribute to {node_id: 1} - node_id: Node identifier - 1: Weight from the source node
def add_one_node_ids(self, node_ids):Each node's message propagates to its neighbor nodes' buffers
def merge_from_buffer(self):Thus, each node's buffer stores messages from all neighbor nodes, forming a message list.
def merge_from_buffer(self):def normalize(self):def show_central(self):
return filtered_dictdef show_nodes(self, node_data):FADE: Message propagation attenuation coefficient, default value is 0.3LIMIT: Correlation filtering threshold, default value is3*(1/number of nodes)MIN_SIZE/MAX_SIZE/DEFAULT_SIZE: Node visualization size parameters
- Initialize graph and parameters
- Add source nodes and set initial messages
- Perform message propagation (single or multiple iterations), propagating messages to buffers
- Merge buffer messages
- Normalize node messages
- Calculate central nodes
- Visualize results
See tests\test.ipynb
- Source Identification: Messages retain the identification of source nodes, allowing tracking of information propagation paths
- Attenuation Mechanism: Messages attenuate during propagation, avoiding excessive influence from distant nodes
- Dynamic Threshold: Dynamically adjusts correlation filtering threshold based on the number of nodes
- Visualization: Provides node importance visualization functionality
- Normalization: Ensures the sum of node messages is 1, facilitating comparison
This algorithm is suitable for social network analysis, fraud detection, recommendation systems, and other scenarios, effectively identifying key nodes and community structures in graphs.