Given data vectors $\{\boldsymbol{x}_i\}_{i=1}^{n} \subset \mathbb{R}^p$ and nonnegative weights $w_{ij} \geq 0$ indicating the association strength of each pair $(\boldsymbol{x}_i,\boldsymbol{x}_j)$ $(1 \leq i < j \leq n)$, a wide variety of graph embedding methods employ a vector-valued neural network $\boldsymbol{f}_{\text{NN}}:\mathbb{R}^p \to \mathbb{R}^K$ so that the Inner Product Similarity (IPS) $\langle \boldsymbol{y}_i,\boldsymbol{y}_j\rangle$ of the obtained feature vectors $\boldsymbol{y}_i:=\boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i)$ approximates $w_{ij}$; namely, $w_{ij} \approx h_{\text{IPS}}(\boldsymbol{x}_i,\boldsymbol{x}_j) := \langle \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i) , \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_j) \rangle$. Theorem 5.1 of Okuno, Hada, and Shimodaira (ICML 2018) proves that IPS approximates any positive-definite (PD) kernel $\mu_{\text{PD}}(\boldsymbol{x}_i,\boldsymbol{x}_j)$ if the neural network size and the dimension $K$ are sufficiently large. However, IPS cannot approximate non-PD similarities. Okuno and Shimodaira (ICML 2018 TADGM workshop) and its extension (Okuno, Kim, and Shimodaira, AISTATS 2019) therefore propose the Shifted IPS (SIPS) $h_{\text{SIPS}}(\boldsymbol{x}_i,\boldsymbol{x}_j) := \langle \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_i) , \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}_j) \rangle + u_{\text{NN}}(\boldsymbol{x}_i) + u_{\text{NN}}(\boldsymbol{x}_j)$, which incorporates an additional scalar-valued neural network $u_{\text{NN}}:\mathbb{R}^p \to \mathbb{R}$. SIPS can approximate not only PD kernels but also conditionally PD (CPD) kernels, such as the negative Poincaré distance and the negative Wasserstein distance used in recent graph embedding methods such as Poincaré embedding (Nickel and Kiela, NIPS 2017); these kernels are CPD but not PD, so IPS cannot approximate them.
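
To make the two models concrete, here is a minimal PyTorch sketch of IPS and SIPS; the two-layer architecture, the hidden width, and all names are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class SIPS(nn.Module):
    """h_SIPS(x, x') = <f_NN(x), f_NN(x')> + u_NN(x) + u_NN(x')."""

    def __init__(self, p: int, K: int, hidden: int = 1000):
        super().__init__()
        # f_NN : R^p -> R^K produces the feature vectors y_i.
        self.f_nn = nn.Sequential(
            nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, K)
        )
        # u_NN : R^p -> R is the scalar shift; dropping both u_NN terms
        # in forward() recovers plain IPS.
        self.u_nn = nn.Sequential(
            nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        ips = (self.f_nn(x1) * self.f_nn(x2)).sum(dim=-1)  # <y, y'>
        return ips + self.u_nn(x1).squeeze(-1) + self.u_nn(x2).squeeze(-1)
```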

In the following figure, (i) a CPD kernel $\mu_{\text{CPD}}(s\boldsymbol{e}_1,t\boldsymbol{e}_2)$ is plotted on the $(s,t)$-plane along two orthogonal directions $\boldsymbol{e}_1,\boldsymbol{e}_2 \in \mathbb{R}^5$. This kernel is approximated by (ii) the existing IPS and (iii) the proposed SIPS, each using two-layer neural networks with $1{,}000$ hidden units and ReLU activations.

[Figure: (i) the true CPD kernel, (ii) its approximation by the existing IPS, and (iii) its approximation by the proposed SIPS.]
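
Reusing the `SIPS` class sketched above, the fitting behind such a plot can be illustrated as follows; the target kernel (the negative Euclidean distance, a standard example of a CPD but non-PD kernel), the batch size, and the optimizer settings are all assumptions for illustration, not the experiment's actual configuration.

```python
# Hedged sketch: fit SIPS to a CPD (but non-PD) target kernel by least squares.
model = SIPS(p=5, K=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    x1 = torch.randn(256, 5)  # random probe points in R^5
    x2 = torch.randn(256, 5)
    target = -torch.linalg.norm(x1 - x2, dim=-1)  # negative Euclidean distance
    loss = ((model(x1, x2) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```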

The approximation error rate of SIPS for general CPD similarities is also evaluated in Okuno, Kim, and Shimodaira (AISTATS 2019).

Kim, Okuno, Fukui, and Shimodaira (IJCAI 2019) propose the Weighted Inner Product Similarity (WIPS) $\langle \boldsymbol{f}_{\text{NN}}(\boldsymbol{x}) ,\boldsymbol{f}_{\text{NN}}(\boldsymbol{x}') \rangle_{\boldsymbol{\lambda}}$, where $\langle \boldsymbol{y},\boldsymbol{y}'\rangle_{\boldsymbol{\lambda}}:=\sum_{k=1}^{K}\lambda_k y_k y_k'$ is the weighted inner product and $\boldsymbol{\lambda}=(\lambda_1,\lambda_2,\ldots,\lambda_K) \in \mathbb{R}^K$ is a weight vector to be estimated. WIPS can approximate general similarities $g(\boldsymbol{x},\boldsymbol{x}')$, thereby overcoming the limitation of SIPS. A key point is that the weight vector takes real values, including negative ones. Intuitively, the weight $\lambda_k$ and the $k$-th entry of $\boldsymbol{f}_{\text{NN}}(\boldsymbol{x})$ capture an eigenvalue and the corresponding eigenfunction of the kernel $g(\boldsymbol{x},\boldsymbol{x}')$.
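
A minimal sketch of WIPS follows, under the same illustrative architecture assumptions as above; `lam` is the trainable weight vector $\boldsymbol{\lambda}$, initialized here to all ones (a hypothetical choice), and it is free to become negative during training, mirroring the negative eigenvalues of an indefinite kernel.

```python
class WIPS(nn.Module):
    """h_WIPS(x, x') = sum_k lambda_k * f_NN(x)_k * f_NN(x')_k."""

    def __init__(self, p: int, K: int, hidden: int = 1000):
        super().__init__()
        self.f_nn = nn.Sequential(
            nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, K)
        )
        self.lam = nn.Parameter(torch.ones(K))  # learned jointly with f_NN

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        # Weighted inner product <y, y'>_lambda = sum_k lambda_k y_k y'_k.
        return (self.lam * self.f_nn(x1) * self.f_nn(x2)).sum(dim=-1)
```

Note that fixing $\boldsymbol{\lambda}=(1,\ldots,1)$ recovers plain IPS, which is one way to see that WIPS generalizes it.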

[Figure: Representation capabilities of similarities.]

Extensive experiments on a large-scale network dataset show that WIPS outperforms the other similarity models, including SIPS.