Node2Vec Graph Embeddings¶
Introduction¶
Node2Vec is a popular algorithm for generating vector representations of nodes in a graph. It aims to capture the network structure by performing biased random walks on the graph and then applying the Skip-Gram model from natural language processing to generate node embeddings. These embeddings can be used for various downstream tasks such as node classification, link prediction, and clustering.
Mathematical Definition¶
Given a graph ( G = (V, E) ), where ( V ) is the set of vertices and ( E ) is the set of edges, Node2Vec generates embeddings by optimizing the following objective function:
\[ \max \sum_{u \in V} \sum_{v \in N_S(u)} \log \Pr(v | u) \]
Conditional Probability¶
The conditional probability ( \Pr(v | u) ) is defined using the Softmax function:
\[ \Pr(v | u) = \frac{\exp(\mathbf{z}_v \cdot \mathbf{z}_u)}{\sum_{v' \in V} \exp(\mathbf{z}_{v'} \cdot \mathbf{z}_u)} \]
Random Walks¶
Node2Vec generates random walks to capture the graph structure. These walks are controlled by two parameters:
- Return parameter ( p ): Controls the likelihood of returning to the previous node in the walk.
- In-out parameter ( q ): Controls the likelihood of exploring outward nodes.
Transition Probability¶
The transition probability from node ( t ) to node ( v ) given the previous node ( s ) is defined as:
\[ \alpha_{pq}(t, v) =
\begin{cases}
1/p & \text{if } d_{t,v} = 0 \\
1 & \text{if } d_{t,v} = 1 \\
1/q & \text{if } d_{t,v} = 2
\end{cases}
\]
Skip-Gram Model¶
The Skip-Gram model is used to optimize the embeddings by maximizing the probability of observing a node's neighbors in random walks. The objective function is:
\[ \max \sum_{u \in V} \sum_{v \in N_S(u)} \log \Pr(v | u) \]
Example Code¶
Below is an example code that demonstrates how to generate Node2Vec embeddings using the node2vec
Python package.
from euler.graph_api import KnowledgeGraphAPI
api = KnowledgeGraphAPI()
Create more nodes¶
api.create_node(id='1', label='Person', properties={'name': 'Alice', 'age': 30})
api.create_node(id='2', label='Person', properties={'name': 'Bob', 'age': 35})
api.create_node(id='3', label='Person', properties={'name': 'Charlie', 'age': 25})
Create edges¶
api.create_edge(id='1-2', source='1', target='2', label='knows', properties={'since': '2020'})
api.create_edge(id='2-3', source='2', target='3', label='knows', properties={'since': '2018'})
api.create_edge(id='3-1', source='3', target='4', label='knows', properties={'since': '2015'})
convert graph into Graph ¶
network_graph = api.graph.to_networkx()
Create Node2Vec Embeddings¶
api.graph.generate_embeddings(network_graph, method="node2vec", dimensions=64, walk_length=30, num_walks=200, workers=4)
node2vec_embedding = api.graph.get_embedding('1')
print("Node2Vec embedding for node '1':", node2vec_embedding)