Given data vectors $\{\boldsymbol{x}_i\}$ and indicator of association strength $w_{ij} \geq 0$ for data pair $(\boldsymbol{x}_i,\boldsymbol{x}_j)$ $(1 \leq i < j \leq n )$, a wide variety of graph embedding methods employ vector-valued neural networks $\boldsymbol{f}_{\text{NN}}:\mathbb{R}^p \to \mathbb{R}^K$, so that the Inner Product Similarity (IPS) $\langle \boldsymbol{y}_i,\boldsymbol{y}_j\rangle$ of the obtained feature vectors $\boldsymbol{y}_i:=\boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i)$ approximates $w_{ij}$; namely, $w_{ij} \approx h_{\text{IPS}}(\boldsymbol{x}_i,\boldsymbol{x}_j) := \langle \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i) , \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_j) \rangle$. Okuno, Hada, and Shimodaira (ICML2018) Theorem 5.1 proves that the IPS approximates any PD kernel $\mu_{\text{PD}}(\boldsymbol{x}_i,\boldsymbol{x}_j)$ if the neural network size and $K$ are sufficiently large. However, IPS cannot approximate non-PD similarities; Okuno and Shimodaira (ICML2018 TADGM-WS) and its extension (Okuno, Kim, and Shimodaira, AISTATS2019, to appear) propose a novel Shifted IPS (SIPS) $h_{\text{SIPS}}(\boldsymbol{x}_i,\boldsymbol{x}_j) := \langle \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i) , \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_j) \rangle + u_{\text{NN}}(\boldsymbol{x}_i) + u_{\text{NN}}(\boldsymbol{x}_j)$, by incorporating another neural network $u_{\text{NN}}:\mathbb{R}^p \to \mathbb{R}^K$. SIPS is capable of approximating not only PD kernels but also Conditionally PD (CPD) kernels, including negative Poincaré distance and negative Wasserstein distance (they are CPD but not PD; IPS cannot approximate them!) used in recent graph embedding methods such as Poincaré embedding (Nickel and Kiela, NIPS2017).

In the following, (i) a CPD kernel $\mu_{\text{CPD}}(s\boldsymbol{e}_1,t\boldsymbol{e}_2)$ is plotted on $(s,t)$-plane along with two orthogonal directions $\boldsymbol{e}_1,\boldsymbol{e}_2 \in \mathbb{R}^5$. This kernel is approximated by (ii) existing IPS and (iii) proposed SIPS, with two-layer neural networks with $1,000$ hidden-units and ReLU activations.

Error rate for SIPS to approximate general CPD similarities is also evaluated in Okuno, Kim, and Shimodaira (AISTATS2019, to appear).