Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning-526互联

概
符号说明
Laplacian smoothing
代码

Li Q., Han Z. and Wu X. Deeper insights into graph convolutional networks for semi-supervised learning. AAAI, 2018.

概

本文分析了 GCN 的实际上就是一种 Smoothing, 但是如果层数过多就会导致 over-smoothing.

符号说明

\(\mathcal{G = (V, E)}\), 图;
\(|\mathcal{V}| = n\);
\(A \in \mathbb{R}^{n \times n}\), 邻接矩阵;
\(L = D - A\), Laplacian matrix;
\(L_{sym} := D^{-1/2} L D^{-1/2}\);
\(L_{rw} := D^{-1} L\);
\(X = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n]^T \in \mathbb{R}^{n \times c}\), feature matrix;

Laplacian smoothing

GCN 的每一层实际上就是如下的一个过程:

\[Y = \tilde{D}^{-1/2} \tilde{A}^{-1/2} \tilde{D}^{-1/2} X, \]
这里 \(\tilde{A} = A + I, \tilde{D} = \sum_j \tilde{A}_{ij}\).
在图论中, Laplacian smoothing 被定义为:

\[\hat{Y} = X - \gamma \tilde{D}^{-1} \tilde{L} X = (I - \gamma \tilde{D}^{-1} \tilde{L}) X = \Big( (1 - \gamma) I + \tilde{D}^{-1} \tilde{A} \Big) X. \]
倘若 \(\gamma=1\), 则其退化为 \(\tilde{D}^{-1} \tilde{A}X\), 这是在一般的 rankdom walk 的机制下推导出来的 Laplacian smoothing.
自然地, 在对称的 Laplacian 矩阵 \(\tilde{L}_{sym}\) 的框架下, 我们就得到了一般的 GCN 的方式.
总的来说, 我们可以认为, 相较于普通的 MLP, GCN 更具优势的点在于它能够将相似的点的信息聚在一起, 从而获得更好的判别性质.
但是, 倘若我们不断地叠加 GCN 就会产生 over-smoothing 的现象, 这可以通过如下的定理中理解.

Theorem 1. 如果图 \(\mathcal{G}\) 不包含 bipartite 成分, 则对于任意的 \(\mathbf{w} \in \mathbb{R}^n, \alpha \in (0, 1]\) 有

\[\lim_{l \rightarrow +\infty} (I - \alpha L_{rw})^l \mathbf{w} = [\mathbf{1}^{(1)}, \mathbf{1}^{2}, \ldots, \mathbf{1}^{(k)}] \bm{\rho}_1, \\ \lim_{l \rightarrow +\infty} (I - \alpha L_{sym})^l \mathbf{w} = D^{1/2}[\mathbf{1}^{(1)}, \mathbf{1}^{2}, \ldots, \mathbf{1}^{(k)}] \bm{\rho}_2. \\ \]

这里, 我们假设 \(\mathcal{G}\) 有 \(k\) 个连通的部分, 并令:

\[\mathbf{1}_j^{(i)} = \left \{ \begin{array}{ll} 1 & v_j \in C_i, \\ 0 & v_j \not \in C_i. \\ \end{array} \right . \]

proof:

只需证明 \((I - \alpha L_{rw}), (I - \alpha L_{sym})\) 的特征值在 \((-1, 1]\), 且特征值 1 所对应的特征向量为 \(\mathbf{1}, D^{1/2} \mathbf{1}\).

注: 文中说 \(I - \alpha L_{sym}\) 的特征值为 \(D^{-1/2}\mathbf{1}\), 这似乎是错的, 因为

\[\begin{array}{ll} (I - \alpha L_{sym}) D^{1/2} \mathbf{1} &= D^{1/2} \mathbf{1} - \alpha L_{sym} D^{1/2} \mathbf{1} \\ &= D^{1/2} \mathbf{1} - \alpha D^{-1/2} L D^{-1/2} D^{1/2} \mathbf{1} \\ &= D^{1/2} \mathbf{1} - \alpha (I - D^{-1/2} A D^{-1/2}) D^{1/2} \mathbf{1} \\ &= D^{1/2} \mathbf{1} - \alpha (D^{1/2}\mathbf{1} - D^{-1/2} A\mathbf{1}) \\ &= D^{1/2} \mathbf{1} - \alpha (D^{1/2}\mathbf{1} - D^{-1/2} D\mathbf{1}) \\ &= D^{1/2} \mathbf{1} - \alpha 0 \\ &= D^{1/2} \mathbf{1}. \end{array} \]

\(D^{-1/2}\mathbf{1}\) 是如何证明出来的?