Limits of representational power and generalization error bounds for graph neural networks

One current trend in the study of graph neural networks is the analysis of how such architectures work: comparison with kernel methods, and assessment of their complexity and generalization ability. All of this helps to identify the weak points of existing models and leaves room for new ones.





The paper investigates two problems related to graph neural networks. First, the authors give examples of graphs that differ in structure but are indistinguishable both for simple and for more powerful GNNs. Second, they derive a bound on the generalization error of graph neural networks that is tighter than VC-dimension-based bounds.





Introduction

Graph neural networks are models that work directly with graphs and make it possible to take the graph structure into account. A typical GNN consists of a small number of layers that are applied sequentially, updating the vertex representations at each step. Examples of popular architectures: GCN, GraphSAGE, GAT, GIN.









The process of updating vertex embeddings for any GNN architecture can be summarized by two formulas:





a_v^{t+1} = AGG\left(h_w^t: w \in \mathcal{N}\left(v\right)\right) \\ h_v^{t+1} = COMBINE\left(h_v^t, a_v^{t+1}\right),

where AGG is usually a permutation-invariant function (sum, mean, max, etc.), and COMBINE is a function that combines the vertex's own representation with the aggregated representation of its neighbors.
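To make the two formulas above concrete, here is a minimal sketch of a single message-passing step in NumPy. The choice of a sum aggregator and a ReLU-based COMBINE is purely illustrative and is not tied to any particular architecture from the paper.

```python
import numpy as np

def message_passing_step(h, adj, W_self, W_agg):
    """One AGG/COMBINE step: h_v <- ReLU(W_self h_v + W_agg * sum_{u in N(v)} h_u)."""
    a = adj @ h                                           # AGG: permutation-invariant sum over neighbors
    return np.maximum(0.0, h @ W_self.T + a @ W_agg.T)    # COMBINE: mix own state and neighborhood

# toy graph: a path 0 - 1 - 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))                               # initial vertex features of dimension 4
W_self, W_agg = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

h = message_passing_step(h, adj, W_self, W_agg)           # embeddings after one layer
print(h.shape)                                            # (3, 4)
```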





Computation tree for two-layer GNN using the example of node A. Source: https://eng.uber.com/uber-eats-graph-learning/

More advanced architectures can also take additional information into account, such as edge features, angles between edges, and so on.





The paper considers a class of GNNs for the graph classification problem. These models are structured as follows:





  1. First, vertex embeddings are computed using L steps of graph convolutions





  2. Then the vertex embeddings are aggregated into a single graph representation by a readout function (e.g., sum, mean, max); a minimal sketch of the whole scheme is given right after this list
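One possible reading of this two-stage scheme in code, with an illustrative tanh-based convolution, a configurable readout and a linear classifier on top; none of these specific choices are prescribed by the paper.

```python
import numpy as np

def gnn_layer(h, adj, W1, W2):
    # one graph-convolution step: combine a vertex's state with the sum of its neighbors' states
    return np.tanh(h @ W1.T + (adj @ h) @ W2.T)

def classify_graph(x, adj, weights, beta, readout="mean"):
    """Stage 1: several message-passing steps; stage 2: readout + linear classifier."""
    h = x
    for W1, W2 in weights:
        h = gnn_layer(h, adj, W1, W2)
    if readout == "sum":
        h_G = h.sum(axis=0)
    elif readout == "max":
        h_G = h.max(axis=0)
    else:
        h_G = h.mean(axis=0)           # default: mean readout
    return float(h_G @ beta)           # score > 0 -> class 1, otherwise class 0

rng = np.random.default_rng(1)
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)   # a triangle
x = rng.normal(size=(3, 4))
weights = [(rng.normal(size=(4, 4)), rng.normal(size=(4, 4))) for _ in range(2)]  # L = 2 layers
beta = rng.normal(size=4)
print(classify_graph(x, adj, weights, beta))
```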









The following classes of GNNs are considered:





  • Locally unordered GNNs (LU-GNN): the neighbors of a vertex are treated as an unordered set. This class includes GCN, GraphSAGE, GAT, GIN





  • CPNGNN: a GNN with consistent port numbering, in which every vertex numbers its incident edges from 1 to d, where d is the degree of the vertex (port numbering); a toy example of such a numbering is sketched right after this list





  • DimeNet: a GNN that uses 3D geometric information, in particular distances and angles between edges
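To illustrate what a port numbering is, the sketch below builds one possible numbering for an undirected graph: each vertex labels its incident edges 1..deg(v). This is only a toy construction, not the specific numbering used in the paper's counterexamples.

```python
from collections import defaultdict

def port_numbering(edges):
    """Return a dict (u, v) -> the port number that vertex u assigns to its edge towards v.

    The induced pairing p(v, i) = (u, j), meaning "port i of v is attached to port j of u",
    is an involution by construction, which is what "consistent" means for CPNGNN.
    """
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    ports = {}
    for u in neighbors:
        for i, v in enumerate(sorted(neighbors[u]), start=1):
            ports[(u, v)] = i          # u reaches v through its i-th port
    return ports

# a 4-cycle: 0 - 1 - 2 - 3 - 0
ports = port_numbering([(0, 1), (1, 2), (2, 3), (3, 0)])
print(ports[(1, 0)], ports[(1, 2)])    # vertex 1 has degree 2 and uses ports 1 and 2
```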





LU-GNN

The authors construct two structurally different graphs, G and \underline{G}, that no LU-GNN can tell apart: the computation trees of the corresponding vertices coincide, so the vertex embeddings, and hence the output of any permutation-invariant readout function, are identical. CPNGNN, on the other hand, can distinguish G from \underline{G} with a suitable port numbering.
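The classic illustration of this effect (not the authors' exact pair, but the same idea) is two graphs with identical vertex features: two disjoint 4-cycles versus a single 8-cycle. Every vertex has degree 2 and the same feature, so the computation trees coincide and any LU-GNN with a permutation-invariant readout maps both graphs to the same embedding. A quick numeric check with random weights:

```python
import numpy as np

def lu_gnn_embed(adj, x, weights):
    """A generic LU-GNN: sum aggregation, tanh-linear COMBINE, mean readout."""
    h = x
    for W1, W2 in weights:
        h = np.tanh(h @ W1.T + (adj @ h) @ W2.T)
    return h.mean(axis=0)                           # permutation-invariant readout

def cycle(n, offset=0):
    return [(offset + i, offset + (i + 1) % n) for i in range(n)]

def to_adj(edges, n):
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

A1 = to_adj(cycle(4) + cycle(4, offset=4), 8)       # two disjoint 4-cycles, 8 vertices
A2 = to_adj(cycle(8), 8)                            # one 8-cycle, 8 vertices
x = np.ones((8, 3))                                 # identical vertex features

rng = np.random.default_rng(0)
weights = [(rng.normal(size=(3, 3)), rng.normal(size=(3, 3))) for _ in range(3)]

print(np.allclose(lu_gnn_embed(A1, x, weights), lu_gnn_embed(A2, x, weights)))  # True
```

Whatever the weights and the number of layers, the two graph embeddings coincide, even though the graphs differ in girth, diameter and the number of cycles.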





CPNGNN

However, despite this extra “power”, CPNGNN has limits of its own.





From S8 and S4 (a cycle of length 8 and two disjoint cycles of length 4) the authors build graphs G1 and G2 by picking vertex attributes and a consistent port numbering so that the corresponding vertices receive identical embeddings; as a result, CPNGNN with any permutation-invariant readout function produces the same representation for both graphs. Consequently, it cannot compute graph properties such as girth, circumference, diameter or the number of cycles.





So CPNGNN cannot distinguish G2 from G1. DimeNet, however, copes with this pair: it takes angles into account, and, for example, the angle \angle A_1B_1C_1 differs from the angle \angle \underline{A}_1 \underline{B}_1 \underline{C}_1.
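DimeNet's extra power comes precisely from such geometric features. A minimal sketch of how the angle at the middle vertex of a path A-B-C is computed from 3D coordinates (the coordinates are made up for illustration: three consecutive corners of a unit square versus the interior angle of a regular octagon):

```python
import numpy as np

def angle(a, b, c):
    """Angle ABC (at vertex b), in radians, from 3D coordinates."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

square = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])]
print(np.degrees(angle(*square)))      # 90.0 degrees at a corner of a square
print(np.degrees(np.pi * 3 / 4))       # 135.0 degrees at a corner of a regular octagon
```

A model whose messages depend on such angles can therefore separate graphs that purely local, angle-blind models confuse.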





DimeNet

DimeNet, in turn, cannot distinguish graph G4 from G3: all the pairwise distances and angles available to the model coincide for these graphs, so the messages, and hence the final representations, are identical. At the same time, G4 and G3 are built on top of S4 and S8 and differ in the same structural properties, so on this pair DimeNet is no better than CPNGNN was on S4 and S8: it cannot compute girth, circumference, the number of cycles and so on.





A more powerful GNN

So each of the considered architectures has pairs of graphs it cannot tell apart. The authors therefore discuss how the models can be strengthened by combining the ideas above.





They propose a GNN that:





  1. uses directional geometric information (angles between edges), as in DimeNet





  2. augments each message m_{uv}^{\left(l\right)} with additional features \Phi_{uv}: \underline{m}_{uv}^{\left(l\right)} = \underline{f}\left(m_{uv}^{\left(l\right)}, \Phi_{uv}\right)





  3. uses port numbering: each neighbor of a vertex v is described by a pair \left(c_v\left(i\right), t_{i,v}\right), where c_v\left(i\right) is the i-th neighbor of v and t_{i,v} is the corresponding port number.

    The vertex update then takes the form (a sketch of steps 2 and 3 is given below):

    h_{v}^{\left( l + 1 \right)} = f \left( h_{v}^{\left( l \right)}, \underline{m}_{c_v\left( 1 \right)v}^{\left( l \right )}, t_{1, v}, ..., \underline{m}_{c_v\left( d \left( v \right ) \right)v}^{\left( l \right )}, t_{ d \left( v \right ), v} \right )





  4. aggregates the final vertex embeddings with a readout function into a graph representation









Such a model combines the strengths of the architectures discussed above.
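A minimal sketch of steps 2 and 3, with made-up linear parametrizations of \underline{f} and f and toy geometric features \Phi_{uv}; only the overall structure of the update is meant to follow the description above, not the authors' exact formulation. In the description the update f receives the augmented messages in port order; here a port-indexed sum stands in for that.

```python
import numpy as np

def augment_message(m_uv, phi_uv, W_m, W_phi):
    """Step 2: fold the geometric features phi_uv into the raw message m_uv."""
    return np.tanh(W_m @ m_uv + W_phi @ phi_uv)

def update_node(h_v, messages, ports, W_h, W_in, w_t):
    """Step 3: combine h_v with the (augmented message, port number) pairs of its neighbors."""
    out = W_h @ h_v
    for m, t in zip(messages, ports):
        out = out + W_in @ m + w_t * t    # the port number t enters the update explicitly
    return np.tanh(out)

rng = np.random.default_rng(0)
dim = 4
h_v = rng.normal(size=dim)
raw_messages = [rng.normal(size=dim) for _ in range(2)]   # messages from two neighbors
phis = [rng.normal(size=dim) for _ in range(2)]           # e.g. encoded angles / distances
ports = [1, 2]                                            # port numbers of the two neighbors

W_m, W_phi, W_h, W_in = (rng.normal(size=(dim, dim)) for _ in range(4))
w_t = rng.normal(size=dim)

messages = [augment_message(m, p, W_m, W_phi) for m, p in zip(raw_messages, phis)]
print(update_node(h_v, messages, ports, W_h, W_in, w_t).shape)   # (4,)
```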





Now for the second part of the paper, the generalization bound: the authors consider an LU-GNN in which the vertex representations are updated as





h_v^{l + 1} = \phi \left( W_1x_v + W_2 \rho \left( \sum_{u \in \mathcal{N} \left( v \right)} g\left( h_u^l \right)\right) \right),

where \phi,\ g,\ \rho are nonlinearities, x_v is the feature vector of vertex v, and the following assumptions hold: \rho \left(0\right) = 0,\ \forall v:  \lVert x_v \rVert_2 \le B_x,\ \forall x \in \mathbb{R}^r: \lVert \phi \left( x \right ) \rVert_{\infty} \le b < \infty,\ \phi\left( 0 \right ) = 0,\ g\left( 0 \right ) = 0. In addition, \phi,\ g,\ \rho are Lipschitz with constants C_{\phi},\ C_{g},\ C_{\rho}, and \lVert W_1 \rVert_2 \le B_1,\ \lVert W_2 \rVert_2 \le B_2. The parameters W_1,\ W_2,\ \phi,\ g,\ \rho are shared across all layers and vertices of the GNN.

The vertex embeddings obtained after the last step are averaged into a graph representation, to which a binary classifier with parameter vector \beta is applied, where \lVert \beta \rVert_2 \le B_{\beta}.
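To make the assumptions tangible, here is a sketch of this model with tanh standing in for \phi, g and \rho (tanh is 1-Lipschitz, bounded and zero at zero, so it satisfies the constraints above), shared W_1 and W_2 across layers, a mean readout and a sigmoid on top of the linear classifier so that the output lands in [0, 1]; the initial embeddings are taken to be zero. All of these concrete choices are assumptions for illustration.

```python
import numpy as np

def gnn_score(adj, x, W1, W2, beta, num_layers):
    """f(G): num_layers update steps with shared weights, mean readout, classifier beta."""
    h = np.zeros((x.shape[0], W1.shape[0]))                    # h^0 = 0, so g(h^0) = 0
    for _ in range(num_layers):
        # h_v <- phi(W1 x_v + W2 rho(sum_{u in N(v)} g(h_u))), with phi = g = rho = tanh
        h = np.tanh(x @ W1.T + np.tanh(adj @ np.tanh(h)) @ W2.T)
    h_G = h.mean(axis=0)                                       # readout: average the embeddings
    return 1.0 / (1.0 + np.exp(-beta @ h_G))                   # f(G) in [0, 1]

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
x = rng.normal(size=(3, 5))                                    # 3 vertices, features of dim 5
W1, W2 = rng.normal(size=(4, 5)), rng.normal(size=(4, 4))      # embeddings of dimension r = 4
beta = rng.normal(size=4)
print(gnn_score(adj, x, W1, W2, beta, num_layers=3))
```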





Let f\left(G\right) denote the output of the GNN for graph G and y \in \{0, 1\} the true label. The quantity p\left(f\left(G\right), y\right) = y\left(2f\left(G\right) - 1\right) + \left(1 - y\right)\left(1 - 2 f\left(G\right)\right) is the classification margin; the graph is misclassified exactly when p\left(f\left(G\right), y\right) < 0.





The loss is a truncated (ramp) version of the margin loss. With a = -p\left(f\left(G\right), y\right) and \mathbb{I}\left[\cdot\right] the indicator function, it is defined as:





loss_{\gamma}\left(a\right) = \mathbb{I}\left[a > 0\right] + \left(1 + \frac{a}{\gamma}\right)\mathbb{I}\left[a \in \left[-\gamma, 0\right]\right].

The empirical risk of a GNN f on a sample \{G_j, y_j\}_{j=1}^m is then:





\hat{\mathcal{R}}\left(f\right) = \frac{1}{m} \sum_{j=1}^m loss_{\gamma}\left(-p\left(f\left(G_j\right), y_j\right)\right)
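A small sketch of the margin p, the truncated loss loss_\gamma and the empirical risk \hat{\mathcal{R}} exactly as defined above; the model outputs f(G_j) are taken here as given numbers in [0, 1] rather than recomputed from a network.

```python
import numpy as np

def margin(f_G, y):
    """p(f(G), y) = y(2 f(G) - 1) + (1 - y)(1 - 2 f(G)); negative means a mistake."""
    return y * (2 * f_G - 1) + (1 - y) * (1 - 2 * f_G)

def ramp_loss(a, gamma):
    """loss_gamma(a) = I[a > 0] + (1 + a / gamma) * I[a in [-gamma, 0]]."""
    if a > 0:
        return 1.0
    if a >= -gamma:
        return 1.0 + a / gamma
    return 0.0

def empirical_risk(scores, labels, gamma):
    """hat{R}(f) = (1/m) * sum_j loss_gamma(-p(f(G_j), y_j))."""
    return float(np.mean([ramp_loss(-margin(f, y), gamma) for f, y in zip(scores, labels)]))

scores = [0.9, 0.6, 0.4, 0.1]     # hypothetical f(G_j) for m = 4 graphs
labels = [1, 1, 0, 0]
print(empirical_risk(scores, labels, gamma=0.5))   # -> 0.3
```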

The goal is to bound the gap between the expected and the empirical risk. For this, the authors bound the Rademacher complexity of the GNN class. The key observation is that the computation of a vertex embedding unrolls into a tree (a GNN, in this sense, resembles a recurrent network), which makes it possible to adapt the techniques used earlier to analyze the generalization of RNNs.





The proof follows the standard scheme for such bounds and rests on two ingredients:


  • a bound on how much the vertex embeddings, and hence the output, change when the weights are perturbed,


  • a covering-number argument (an \epsilon-net over the weight matrices)





The resulting bound turns out to be very close to the analogous bound for RNNs:





Comparison with RNN, similarity is visible

Here \mathcal{C} is the “percolation complexity”: \mathcal{C} = C_{\phi} C_g C_{\rho} B_2; r is the dimension of the vertex embeddings, d is the maximum vertex degree (the branching factor), m is the number of graphs in the training sample, L is the number of layers, and \gamma is the margin.





For comparison, the generalization bound based on the VC dimension scales as \tilde{\mathcal{O}}\left(r^3 N/\sqrt{m}\right), where N is the number of vertices, i.e. it grows with the size of the graphs.





Note that the bound is obtained for a GNN of a specific, fairly simple form (in particular, with an averaging readout function and shared weights). Its strength is that it depends on local characteristics of the graphs, such as the maximum degree and the depth, rather than on the total number of vertices.





How informative the bound is depends on the interplay of these quantities: roughly speaking, the larger the product of the percolation complexity and the maximum degree, the stronger the dependence on the number of layers, while the dependence on the sample size is always of order 1/\sqrt{m}.





Proofs and further details can be found in the original paper or in a talk by one of the authors.







