Given data vectors $\{\boldsymbol{x}_i\}$ and association strengths $w_{ij} \geq 0$ for each data pair $(\boldsymbol{x}_i,\boldsymbol{x}_j)$ $(1 \leq i < j \leq n)$, a wide variety of graph embedding methods employ a vector-valued neural network $\boldsymbol{f}_{\text{NN}}:\mathbb{R}^p \to \mathbb{R}^K$ so that the Inner Product Similarity (IPS) $\langle \boldsymbol{y}_i,\boldsymbol{y}_j\rangle$ of the resulting feature vectors $\boldsymbol{y}_i:=\boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i)$ approximates $w_{ij}$; namely,
$w_{ij} \approx h_{\text{IPS}}(\boldsymbol{x}_i,\boldsymbol{x}_j)
:=
\langle \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i)
,
\boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_j)
\rangle$.
Theorem 5.1 of **Okuno, Hada, and Shimodaira (ICML2018)** proves that IPS can approximate any positive definite (PD) kernel $\mu_{\text{PD}}(\boldsymbol{x}_i,\boldsymbol{x}_j)$ if the neural network size and the output dimension $K$ are sufficiently large.
However, IPS cannot approximate non-PD similarities.
To overcome this limitation, **Okuno and Shimodaira (ICML2018 TADGM-WS)** and its extension **Okuno, Kim, and Shimodaira (AISTATS2019, to appear)** propose the *Shifted IPS (SIPS)*
$
h_{\text{SIPS}}(\boldsymbol{x}_i,\boldsymbol{x}_j)
:=
\langle \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i)
,
\boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_j)
\rangle
+
u_{\text{NN}}(\boldsymbol{x}_i)
+
u_{\text{NN}}(\boldsymbol{x}_j)$,
by incorporating another, scalar-valued neural network $u_{\text{NN}}:\mathbb{R}^p \to \mathbb{R}$. SIPS can approximate not only PD kernels but also conditionally PD (CPD) kernels. These include the negative Poincaré distance and the negative Wasserstein distance, which are CPD but not PD and therefore cannot be approximated by IPS; they are used in recent graph embedding methods such as Poincaré embedding (Nickel and Kiela, NIPS2017).
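
The difference between the two similarities can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration rather than code from the cited papers: the helper names (`make_mlp`, `ips`, `sips`), the network sizes, and the random weights are hypothetical, and the shift network is taken to be scalar-valued because its outputs are added to a scalar inner product. The second half uses the negative squared Euclidean distance, a standard CPD kernel chosen here only as an example, which happens to have an exact SIPS form with $\boldsymbol{f}(\boldsymbol{x})=\sqrt{2}\,\boldsymbol{x}$ and $u(\boldsymbol{x})=-\|\boldsymbol{x}\|^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, hidden = 5, 3, 16  # illustrative sizes, not taken from the papers


def make_mlp(out_dim):
    """Return a randomly initialised two-layer ReLU network R^p -> R^out_dim."""
    W1 = rng.normal(size=(p, hidden))
    b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=(hidden, out_dim))

    def net(X):
        return np.maximum(X @ W1 + b1, 0.0) @ W2

    return net


f_nn = make_mlp(K)  # feature map f_NN: R^p -> R^K
u_nn = make_mlp(1)  # scalar shift u_NN: R^p -> R (assumed scalar-valued)


def ips(F):
    """Pairwise inner product similarity <f_i, f_j> for all rows of F."""
    return F @ F.T


def sips(F, u):
    """Pairwise shifted IPS: <f_i, f_j> + u_i + u_j."""
    u = np.asarray(u).reshape(-1)
    return F @ F.T + u[:, None] + u[None, :]


X = rng.normal(size=(8, p))
G_ips = ips(f_nn(X))
G_sips = sips(f_nn(X), u_nn(X))

# Example CPD target: -||x_i - x_j||^2 = 2<x_i, x_j> - ||x_i||^2 - ||x_j||^2,
# which is exactly a SIPS with f(x) = sqrt(2) x and u(x) = -||x||^2.
sq_norms = np.sum(X**2, axis=1)
neg_sq_dist = -(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T)
G_exact = sips(np.sqrt(2.0) * X, -sq_norms)

# An IPS Gram matrix is always positive semidefinite, whereas the target has
# a zero diagonal and therefore at least one negative eigenvalue.
min_eig_ips = np.linalg.eigvalsh(G_ips).min()
min_eig_target = np.linalg.eigvalsh(neg_sq_dist).min()
print(np.allclose(G_exact, neg_sq_dist), min_eig_ips, min_eig_target)
```

Because any IPS Gram matrix is positive semidefinite, no choice of $\boldsymbol{f}_{\text{NN}}$ can reproduce a target matrix with a negative eigenvalue exactly; the additive shifts are what remove this restriction in the sketch above.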

In the following, (i) a CPD kernel $\mu_{\text{CPD}}(s\boldsymbol{e}_1,t\boldsymbol{e}_2)$ is plotted on the $(s,t)$-plane along two orthogonal directions $\boldsymbol{e}_1,\boldsymbol{e}_2 \in \mathbb{R}^5$. This kernel is then approximated by (ii) the existing IPS and (iii) the proposed SIPS, each using two-layer neural networks with $1,000$ hidden units and ReLU activations.
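
As a rough sketch of how a surface such as (i) could be tabulated, the snippet below evaluates a kernel on an $(s,t)$ grid along two orthogonal unit vectors in $\mathbb{R}^5$. The specific kernel is not stated here, so the code assumes the negative Euclidean distance (a known CPD kernel) purely as a stand-in; the function name `mu_cpd`, the grid range, and the resolution are likewise hypothetical choices.

```python
import numpy as np

p = 5
e1 = np.zeros(p)
e1[0] = 1.0  # first direction
e2 = np.zeros(p)
e2[1] = 1.0  # second direction, orthogonal to e1


def mu_cpd(x, y):
    """Stand-in CPD kernel (an assumption): negative Euclidean distance."""
    return -np.linalg.norm(x - y, axis=-1)


# Grid over the (s, t)-plane; range and resolution are illustrative.
s = np.linspace(-2.0, 2.0, 41)
t = np.linspace(-2.0, 2.0, 41)
S, T = np.meshgrid(s, t, indexing="ij")

# Points s*e1 and t*e2 for every grid cell, each of shape (41, 41, 5).
X = S[..., None] * e1
Y = T[..., None] * e2

# Z[i, j] = mu_cpd(s_i * e1, t_j * e2); this is the surface one would plot.
Z = mu_cpd(X, Y)
print(Z.shape, Z.min(), Z.max())
```

With orthonormal directions, this stand-in kernel reduces to $-\sqrt{s^2+t^2}$ on the grid, which gives a simple check on the tabulated values before any network is fitted to them.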

The error rate of SIPS in approximating general CPD similarities is also evaluated in **Okuno, Kim, and Shimodaira (AISTATS2019, to appear)**.