Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Uncertainty in Artificial Intelligence, 2020.

@inproceedings{song2020sliced,
  title={Sliced score matching: A scalable approach to density and score estimation},
  author={Song, Yang and Garg, Sahaj and Shi, Jiaxin and Ermon, Stefano},
  booktitle={Uncertainty in Artificial Intelligence},
  pages={574--584},
  year={2020},
  organization={PMLR}
}

PDF: song20a.pdf (mlr.press)

Background

In short, we want to estimate the probability distribution of the data.

We want to find the parameters of a model such that the model distribution is close to the data distribution. The model represents a parametrized probability distribution over the data, which we call the model distribution.

Our dataset contains $N$ samples, where $x_i$ denotes the $i$-th data point. From the family of models with parameter space $\Theta$, we want to find a single distribution $p_\theta$ with $\theta \in \Theta$ by minimizing the distance between $p_{data}$ and $p_\theta$; once fitted, we can generate new samples from $p_\theta$.
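
As a concrete toy example (an illustrative sketch, not from the paper): fitting a one-dimensional Gaussian model $p_\theta$ by minimizing the average negative log-likelihood of the samples, which for a fixed dataset is equivalent to minimizing the KL divergence between $p_{data}$ and $p_\theta$.

```python
import numpy as np
from scipy.optimize import minimize

# Toy dataset: N samples x_i drawn from some unknown data distribution.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)

def neg_log_likelihood(theta, x):
    """Average negative log-likelihood of a Gaussian model p_theta."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)  # parametrize sigma > 0
    return np.mean(0.5 * ((x - mu) / sigma) ** 2 + log_sigma
                   + 0.5 * np.log(2 * np.pi))

# Minimizing D_KL(p_data || p_theta) over theta amounts to maximizing
# the expected log-likelihood, estimated on the samples.
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(data,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)  # close to 2.0 and 0.5

# Once theta is fitted, we can generate samples from p_theta:
samples = rng.normal(mu_hat, sigma_hat, size=10)
```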

BUT the data distribution is very complex for high-dimensional data.

We will start from a Gaussian distribution, which can be viewed as a computational graph with 2 layers: the data points as inputs, and a single unit that outputs the density of the probability distribution of those points.
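
As a minimal sketch of that two-layer graph (the helper name here is hypothetical):

```python
import numpy as np

def gaussian_density(x, mu=0.0, sigma=1.0):
    """Single output 'unit' computing the Gaussian pdf of a data point x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))

print(gaussian_density(0.0))  # 1/sqrt(2*pi) ~ 0.3989
```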

This distribution is too simple to model high-dimensional data, so we need to add more layers and build a deeper computational graph, i.e. a neural network, to model the probability distribution $p_\theta$, where $\theta$ denotes the weights of the network.
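
For illustration, a minimal sketch of such a deeper graph: a small multilayer perceptron $f_\theta: \mathbb{R}^d \to \mathbb{R}$ in PyTorch (the architecture is an arbitrary assumption, not one from the paper):

```python
import torch
import torch.nn as nn

# A deeper computational graph f_theta: R^d -> R, where theta are the
# network weights. The architecture below is an arbitrary illustration.
d = 32
f_theta = nn.Sequential(
    nn.Linear(d, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
)

x = torch.randn(16, d)  # a batch of 16 high-dimensional inputs
out = f_theta(x)        # shape (16, 1): one scalar per input
```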

BUT, how do we build a deep neural network that models a probability distribution? A neural network converts a high-dimensional input into a simple one-dimensional output $f_\theta(x)$. However, $f_\theta(x)$ might not be positive everywhere, which means we cannot directly use the network output as a probability density.
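
A standard remedy, and the setting that motivates score matching, is the energy-based construction $p_\theta(x) = e^{f_\theta(x)} / Z_\theta$: the exponential makes the density positive by design, but the normalizing constant $Z_\theta$ is intractable for deep networks. The score $\nabla_x \log p_\theta(x) = \nabla_x f_\theta(x)$ does not involve $Z_\theta$ at all, which is exactly what score matching exploits. A minimal sketch (same kind of hypothetical network as above):

```python
import torch
import torch.nn as nn

# f_theta(x) can be negative, so it is not a density itself. Define
#   p_theta(x) = exp(f_theta(x)) / Z_theta,
# which is positive by construction. Z_theta is intractable, but the
# score grad_x log p_theta(x) = grad_x f_theta(x) never touches it,
# so we can compute it with autograd.
d = 32
f_theta = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 1))

x = torch.randn(16, d, requires_grad=True)
log_p_unnormalized = f_theta(x).sum()  # sum over the batch for autograd
score = torch.autograd.grad(log_p_unnormalized, x)[0]  # shape (16, d)
```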