Types of data vectors (e.g., images, tags) are called "views". We consider multi-view data, where each vector $\boldsymbol{x}_i$ belongs to exactly one of $D \in \mathbb{N}$ views; we denote the view of $\boldsymbol{x}_i$ by $d_i \in \{1,2,\ldots,D\}$. The dimension $p_{d_i}$ of $\boldsymbol{x}_i$ depends on its view $d_i$, and the strength of association between $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$ is expressed by a weight $w_{ij} \geq 0$. Usual graph embedding (corresponding to $D=1$), which transforms $1$-view feature vectors by a single vector-valued function $\boldsymbol{f}: \mathbb{R}^{p} \to \mathbb{R}^K$, cannot be applied to the multi-view setting, since the dimension $p_{d_i}$ of $\boldsymbol{x}_i$ depends on the view.
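To make the setting concrete, here is a minimal sketch of such a dataset in NumPy: the view dimensions, feature vectors, and weight matrix are all hypothetical toy values, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: D = 2 views (say, images and tags) with
# view-specific dimensions p_1 = 4 and p_2 = 3.
p = {1: 4, 2: 3}
x = [rng.normal(size=p[1]),   # x_1, with view d_1 = 1
     rng.normal(size=p[2]),   # x_2, with view d_2 = 2
     rng.normal(size=p[1])]   # x_3, with view d_3 = 1
d = [1, 2, 1]

# Non-negative association weights w_ij (symmetric here; w_ij = 0 means
# no observed association between x_i and x_j).
W = np.array([[0., 2., 1.],
              [2., 0., 0.],
              [1., 0., 0.]])
```

Note that `x[0]` and `x[1]` have different lengths, which is exactly why a single transformation $\boldsymbol{f}$ with a fixed input dimension cannot process all vectors.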

**Okuno et al. (ICML 2018)** propose *Probabilistic Multi-view Graph Embedding (PMvGE)*, a simple framework for multi-view feature learning with many-to-many associations, which generalizes various existing multi-view methods.
PMvGE is built on a very simple idea:
it prepares $D$ different transformations $\boldsymbol{f}^{(d)}: \mathbb{R}^{p_{d}} \to \mathbb{R}^K$ $(d=1,2,\ldots,D)$, one for each of the $D$ views.
The remaining procedures are the same as those of usual graph embedding.
While existing multi-view feature learning techniques can handle only one of many-to-many associations or non-linear transformations, PMvGE handles both simultaneously.
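The core idea can be sketched as follows: one transformation per view, all mapping into a common $\mathbb{R}^K$, where similarity is measured by the inner product. The one-hidden-layer nets below use random, untrained weights and toy dimensions; they are stand-ins for the learned transformations, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 2                      # shared embedding dimension
p = {1: 4, 2: 3}           # view-specific input dimensions (toy values)

def make_mlp(p_d, K, hidden=8):
    """One-hidden-layer net f^(d): R^{p_d} -> R^K with random weights
    (untrained; a stand-in for a learned transformation)."""
    W1 = rng.normal(size=(hidden, p_d))
    W2 = rng.normal(size=(K, hidden))
    return lambda x: W2 @ np.tanh(W1 @ x)

f = {d: make_mlp(p[d], K) for d in p}    # one transformation per view

x1, d1 = rng.normal(size=p[1]), 1        # e.g. an image feature vector
x2, d2 = rng.normal(size=p[2]), 2        # e.g. a tag feature vector

y1, y2 = f[d1](x1), f[d2](x2)            # both live in the same R^K
similarity = float(y1 @ y2)              # inner product in the shared space
```

Because every $\boldsymbol{f}^{(d)}$ outputs a vector in the same $K$-dimensional space, vectors from different views become directly comparable after transformation.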

Our likelihood-based estimator enables efficient computation of non-linear transformations (e.g., neural networks) of data vectors on large-scale datasets via minibatch SGD, and numerical experiments illustrate that PMvGE outperforms existing multi-view methods.
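A hedged sketch of such likelihood-based minibatch training: here we assume a Poisson-type objective $\sum_{(i,j)} [\, w_{ij}\, s_{ij} - \exp(s_{ij}) \,]$ with $s_{ij} = \langle \boldsymbol{f}^{(d_i)}(\boldsymbol{x}_i), \boldsymbol{f}^{(d_j)}(\boldsymbol{x}_j)\rangle$, and use linear transformations $\boldsymbol{f}^{(d)}(\boldsymbol{x}) = A_d \boldsymbol{x}$ so the gradients fit in a few lines. Both choices are plausible instantiations for illustration; the paper's exact model and architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: D = 2 views, linear maps f^(d)(x) = A_d x (a neural network
# would replace these in practice; this is only a sketch).
K, p = 2, {1: 4, 2: 3}
A = {v: 0.1 * rng.normal(size=(K, p[v])) for v in p}

# Toy data: n vectors with random views and random association weights.
n = 20
d = rng.integers(1, 3, size=n)
X = [rng.normal(size=p[v]) for v in d]
pairs = [(i, j, float(rng.integers(0, 3)))        # (i, j, w_ij) triples
         for i in range(n) for j in range(i + 1, n)]

def step(batch, lr=0.001):
    """One SGD ascent step on the assumed Poisson-type log-likelihood
    sum_{(i,j)} [ w_ij * s_ij - exp(s_ij) ]."""
    grads = {v: np.zeros_like(A[v]) for v in A}
    for i, j, w in batch:
        yi, yj = A[d[i]] @ X[i], A[d[j]] @ X[j]
        s = float(yi @ yj)
        g = w - np.exp(s)                 # d/ds of the log-likelihood term
        grads[d[i]] += g * np.outer(yj, X[i])
        grads[d[j]] += g * np.outer(yi, X[j])
    for v in A:
        A[v] += lr * grads[v]

for epoch in range(3):
    idx = rng.permutation(len(pairs))
    for start in range(0, len(pairs), 32):        # minibatches of 32 pairs
        step([pairs[k] for k in idx[start:start + 32]])
```

Each step touches only a minibatch of pairs, so the cost per update is independent of the total number of associations, which is what makes the approach scale to large graphs.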