Skip to content

Node2Vec Graph Embeddings

Introduction

Node2Vec is a popular algorithm for generating vector representations of nodes in a graph. It aims to capture the network structure by performing biased random walks on the graph and then applying the Skip-Gram model from natural language processing to generate node embeddings. These embeddings can be used for various downstream tasks such as node classification, link prediction, and clustering.

Mathematical Definition

Given a graph ( G = (V, E) ), where ( V ) is the set of vertices and ( E ) is the set of edges, Node2Vec generates embeddings by optimizing the following objective function:

\[ \max \sum_{u \in V} \sum_{v \in N_S(u)} \log \Pr(v | u) \]
where ( N_S(u) ) is a set of nodes sampled from random walks starting at node ( u ), and ( \Pr(v | u) ) is the conditional probability of node ( v ) given node ( u ).

Conditional Probability

The conditional probability ( \Pr(v | u) ) is defined using the Softmax function:

\[ \Pr(v | u) = \frac{\exp(\mathbf{z}_v \cdot \mathbf{z}_u)}{\sum_{v' \in V} \exp(\mathbf{z}_{v'} \cdot \mathbf{z}_u)} \]
where ( \mathbf{z}_u ) and ( \mathbf{z}_v ) are the embedding vectors of nodes ( u ) and ( v ), respectively.

Random Walks

Node2Vec generates random walks to capture the graph structure. These walks are controlled by two parameters:

  • Return parameter ( p ): Controls the likelihood of returning to the previous node in the walk.
  • In-out parameter ( q ): Controls the likelihood of exploring outward nodes.

Transition Probability

The transition probability from node ( t ) to node ( v ) given the previous node ( s ) is defined as:

\[ \alpha_{pq}(t, v) = 
\begin{cases} 
1/p & \text{if } d_{t,v} = 0 \\
1 & \text{if } d_{t,v} = 1 \\
1/q & \text{if } d_{t,v} = 2 
\end{cases}
\]
where ( d_{t,v} ) is the shortest path distance between nodes ( t ) and ( v ).

Skip-Gram Model

The Skip-Gram model is used to optimize the embeddings by maximizing the probability of observing a node's neighbors in random walks. The objective function is:

\[ \max \sum_{u \in V} \sum_{v \in N_S(u)} \log \Pr(v | u) \]

Example Code

Below is an example code that demonstrates how to generate Node2Vec embeddings using the node2vec Python package.

from euler.graph_api import KnowledgeGraphAPI
api = KnowledgeGraphAPI()

Create more nodes

api.create_node(id='1', label='Person', properties={'name': 'Alice', 'age': 30})
api.create_node(id='2', label='Person', properties={'name': 'Bob', 'age': 35})
api.create_node(id='3', label='Person', properties={'name': 'Charlie', 'age': 25})

Create edges

api.create_edge(id='1-2', source='1', target='2', label='knows', properties={'since': '2020'})
api.create_edge(id='2-3', source='2', target='3', label='knows', properties={'since': '2018'})
api.create_edge(id='3-1', source='3', target='4', label='knows', properties={'since': '2015'})

convert graph into Graph

network_graph = api.graph.to_networkx()

Create Node2Vec Embeddings

api.graph.generate_embeddings(network_graph, method="node2vec", dimensions=64, walk_length=30, num_walks=200, workers=4)
node2vec_embedding = api.graph.get_embedding('1')
print("Node2Vec embedding for node '1':", node2vec_embedding)