Given data vectors $\{\boldsymbol{x}_i\}$ and indicator of association strength $w_{ij} \geq 0$ for data pair $(\boldsymbol{x}_i,\boldsymbol{x}_j)$ $(1 \leq i < j \leq n )$, a wide variety of graph embedding methods employ vector-valued neural networks $\boldsymbol{f}_{\text{NN}}:\mathbb{R}^p \to \mathbb{R}^K$, so that the Inner Product Similarity (IPS) $\langle \boldsymbol{y}_i,\boldsymbol{y}_j\rangle$ of the obtained feature vectors $\boldsymbol{y}_i:=\boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i)$ approximates $w_{ij}$; namely, $w_{ij} \approx h_{\text{IPS}}(\boldsymbol{x}_i,\boldsymbol{x}_j) := \langle \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i) , \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_j) \rangle$. Okuno, Hada, and Shimodaira (ICML2018) Theorem 5.1 proves that the IPS approximates any PD kernel $\mu_{\text{PD}}(\boldsymbol{x}_i,\boldsymbol{x}_j)$ if the neural network size and $K$ are sufficiently large. However, IPS cannot approximate non-PD similarities; Okuno and Shimodaira (ICML2018 TADGM-WS) and its extension (Okuno, Kim, and Shimodaira, AISTATS2019, to appear) propose a novel Shifted IPS (SIPS) $ h_{\text{SIPS}}(\boldsymbol{x}_i,\boldsymbol{x}_j) := \langle \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i) , \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_j) \rangle + u_{\text{NN}}(\boldsymbol{x}_i) + u_{\text{NN}}(\boldsymbol{x}_j)$, by incorporating another neural network $u_{\text{NN}}:\mathbb{R}^p \to \mathbb{R}^K$. SIPS is capable of approximating not only PD kernels but also Conditionally PD (CPD) kernels, including negative Poincaré distance and negative Wasserstein distance (they are CPD but not PD; IPS cannot approximate them!) used in recent graph embedding methods such as Poincaré embedding (Nickel and Kiela, NIPS2017).

In the following, (i) a CPD kernel $\mu_{\text{CPD}}(s\boldsymbol{e}_1,t\boldsymbol{e}_2)$ is plotted on $(s,t)$-plane along with two orthogonal directions $\boldsymbol{e}_1,\boldsymbol{e}_2 \in \mathbb{R}^5$. This kernel is approximated by (ii) existing IPS and (iii) proposed SIPS, with two-layer neural networks with $1,000$ hidden-units and ReLU activations.

(i) True
(ii) existing IPS
(iii) proposed SIPS

Error rate for SIPS to approximate general CPD similarities is also evaluated in Okuno, Kim, and Shimodaira (AISTATS2019, to appear).