Learning Graph Filters for Spectral GNNs via Newton Interpolation-526互联

概
符号说明
Motivation
NewtonNet
代码

Xu J., Dai E., Luo D>, Zhang X. and Wang S. Learning graph filters for spectral gnns via newton interpolation. 2023.

概

令谱图网络的多项式系数按照牛顿插值的方式训练.

符号说明

\(\mathcal{V} = \{v_1, \ldots, v_N\}\), node set;
\(\mathcal{E}\), edge set;
\(\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \in \mathbb{R}^{N \times F}\), node feature matrix;
\(\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{X})\), undirected graph;
\(\mathbf{A}\), adjacency matrix;
\(\mathbf{D}\), diagonal matrix with \(\mathbf{D}_{ii} = \sum_{j} \mathbf{A}_{ij}\);
\(\mathbf{L} = \mathbf{D - \mathbf{A}}\), Laplacian matrix

Motivation

假设 Laplacian matrix 的特征分解为:

\[\mathbf{L} = \mathbf{U\Lambda U}^T, \]
其中 \(0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N\) 为 \(\Lambda\) 的对角线元素 (即 \(\mathbf{L}\) 的特征值).
作者首先分析了不同频率的重要性差异: 对于同质图, 低频信息更有用, 而对于异质图, 高频信息更加有用.
一般来说, 我们通常根据 Homophily Ratio

\[ h(G) = \frac{|\{(s, t) \in \mathcal{E}: y_s = y_t\}|}{|\mathcal{E}|} \]
来判断一个图的性质.
由于随机加边的"平衡图" (这里平衡图指的是每个类的结点数目是相同的) 的 \(\mathbb{E}[h(G)] = 1/C\), 所以, 我们可以认为, 当 \(h > 1 / C\) 的时候, 这个图是更偏向同质图的, 反之是偏向异质图的.
定义

\[\bar{d}_{in} = \frac{1}{|\mathcal{P}_{in}|} \sum_{(s, t) \in \mathcal{P}_{in}} (x_s - x_t)^2, \\ \bar{d}_{out} = \frac{1}{|\mathcal{P}_{out}|} \sum_{(s, t) \in \mathcal{P}_{out}} (x_s - x_t)^2, \]
其中 (注意, 允许 self-loop)

\[\mathcal{P}_{in} := \{(s, t) | y_s = y_t\}, \\ \mathcal{P}_{out} := \{(s, t) | y_s \not= y_t\}. \]
故 \(\bar{d}_{in}, \bar{d}_{out}\) 分别描述了类内平均距离和类间平均距离. 我们定义二者之差为:

\[\Delta \bar{d} = \bar{d}_{out} - \bar{d}_{in}. \]
现在假设有两个滤波 \(g_1, g_2\) 满足:

\[g_i(\mathbf{L}) = \mathbf{U} g_i(\Lambda) \mathbf{U}^T, \quad i=1,2, \]
且

\[g_1(\lambda_i) < g_2 (\lambda_i), \quad 1 \le i \le m, g_1(\lambda_i) > g_2 (\lambda_i), \quad m+1 \le i \le N, \\ \|g_1(\Lambda)\|_F = \|g_2(\Lambda)\|. \]
显然, \(g_1, g_2\) 分别侧重高频和低频信号. 令

\[\mathbf{x}^{(1)} = g_1(\mathbf{L})\mathbf{x}, \quad \mathbf{x}^{(2)} = g_2(\mathbf{L})\mathbf{x}, \]
则有 \(\mathbf{x}^{(1)}\) 更侧重高频, \(\mathbf{x}^{(2)}\) 更侧重低频.
则, 在一定条件下:
1. 当 \(h > 1 / C\) (即偏同质图):
  \[ \mathbb{E}_{\mathbf{x}}[\Delta \bar{d}^{(1)}] < \mathbb{E}_{\mathbf{x}}[\Delta \bar{d}^{(2)}]. \]
2. 当 \(h < 1 / C\) (即偏异质图):
  \[ \mathbb{E}_{\mathbf{x}}[\Delta \bar{d}^{(1)}] > \mathbb{E}_{\mathbf{x}}[\Delta \bar{d}^{(2)}]. \]
其实是很直观的一个结论.
不过, 需要声明一点的是, \(\mathbb{E}_{\mathbf{x}}\) 的操作使得这个结论有那么一点一般 (但也挺有意思的).
其次, 作者在文中采用的是 normalzied Laplacian matrix 去证明 (但实际上是有问题的).

NewtonNet

说实话, 后面的设计和前面的 motivation 并没有太强的联系, 后面就是利用牛顿插值来限制多项式基的系数的学习:

\[\mathbf{Z} = \sum_{k=0}^K \bigg \{ \hat{g}_t [q_0, \cdots, q_k] \prod_{i=0}^{k-1} (\mathbf{L} - q_i \mathbf{I}) \bigg\} f_{\theta}(\mathbf{X}). \]
其中 \(\hat{g}_t[q_0, \cdots, q_k]\) 是根据牛顿插值的迭代公式得到的: