Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole.
Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
@inproceedings{song2021scorebased,
author = {Yang Song and
Jascha Sohl{-}Dickstein and
Diederik P. Kingma and
Abhishek Kumar and
Stefano Ermon and
Ben Poole},
title = {Score-Based Generative Modeling through Stochastic Differential Equations},
booktitle = {9th International Conference on Learning Representations, {ICLR}},
year = {2021},
}
TL;DR:
One remarkable property of score-based generative models is that they allow us to control the generation process in a principled way.
Given an unconditional score-based model that generates images of dogs and cats, how can we generate only images of dogs?
Suppose we are given a forward model, which here is simply an image classifier that predicts a label $y$ from an image $x$. We want to specify a control signal in the form of a target label $y$, for example setting the target label to "dog". We then hope to sample from the conditional distribution of $x$ given $y$, which contains images of dogs only. This conditional distribution is called the inverse distribution, because it can be viewed as a probabilistic inversion of the forward model.
So, how can we obtain this inverse distribution?
The standard approach is to leverage Bayes' rule. In Bayes' rule, we have the unconditional distribution $p(\mathbf{x})$ and we are given the forward model $p(\mathbf{y} \vert \mathbf{x})$, but we do not know the denominator $p(\mathbf{y})$, which is the normalizing constant of the inverse distribution.
$$ p(\mathbf{x} \vert \mathbf{y}) = \frac{p(\mathbf{x}) p(\mathbf{y} \vert \mathbf{x})}{p(\mathbf{y})} $$
As in [Song et al. UAI 2019], we can use score functions to bypass this difficulty: Bayes' rule for score functions follows directly by taking the logarithm of both sides and then the gradient with respect to $\mathbf{x}$.
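Spelled out, the logarithm of Bayes' rule reads

$$ \log p(\mathbf{x} \vert \mathbf{y}) = \log p(\mathbf{x}) + \log p(\mathbf{y} \vert \mathbf{x}) - \log p(\mathbf{y}), $$

and differentiating with respect to $\mathbf{x}$ gives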
$$ \nabla_\mathbf{x} \log p(\mathbf{x} \vert \mathbf{y}) = \nabla_\mathbf{x} \log p(\mathbf{x}) + \nabla_\mathbf{x} \log p(\mathbf{y} \vert \mathbf{x}) - \nabla_\mathbf{x} \log p(\mathbf{y}) $$
Since $p(\mathbf{y})$ does not depend on $\mathbf{x}$, the last term vanishes: $\nabla_\mathbf{x} \log p(\mathbf{y}) = 0$. The score function of the inverse distribution therefore reduces to a simple sum of two terms: $\nabla_\mathbf{x} \log p(\mathbf{x})$ is the unconditional score function, which can be estimated by training an unconditional score model $\nabla_\mathbf{x} \log p(\mathbf{x}) \approx s_\theta(\mathbf{x})$, and the second term $\nabla_\mathbf{x} \log p(\mathbf{y} \vert \mathbf{x})$ is the gradient of the log forward model.
$$ \nabla_\mathbf{x} \log p(\mathbf{x} \vert \mathbf{y}) = \nabla_\mathbf{x} \log p(\mathbf{x}) + \nabla_\mathbf{x} \log p(\mathbf{y} \vert \mathbf{x}) $$
In the image domain, the forward model is an image classifier, and the gradient of its log class probability with respect to the input image can be computed easily through backpropagation.
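As a concrete illustration, here is a minimal PyTorch-style sketch of this computation. The names `score_model` and `classifier` are placeholders for a pretrained unconditional score model and an image classifier; in the paper's full setting both would additionally be conditioned on the noise level (time), which is omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def conditional_score(x, y, score_model, classifier):
    """Estimate the conditional score grad_x log p(x|y) as
    grad_x log p(x) + grad_x log p(y|x)  (Bayes' rule for scores)."""
    # Unconditional score from the pretrained score model: s_theta(x) ~ grad_x log p(x).
    with torch.no_grad():
        uncond_score = score_model(x)

    # Gradient of the log forward model, grad_x log p(y|x),
    # computed by backpropagating through the classifier.
    x_in = x.detach().requires_grad_(True)
    logits = classifier(x_in)                           # (batch, num_classes), assumed output
    log_probs = F.log_softmax(logits, dim=-1)           # log p(y|x) for every class
    selected = log_probs[torch.arange(x.shape[0]), y]   # pick the target labels y
    classifier_grad = torch.autograd.grad(selected.sum(), x_in)[0]

    return uncond_score + classifier_grad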
With this approach, we can combine different forward models with exactly the same score model: we only have to train the unconditional score model once, and we can repurpose it for various conditional generation applications simply by switching the forward model.
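For instance, the `conditional_score` helper sketched above can be dropped into a plain unadjusted Langevin sampler. This is a simplified sketch rather than the paper's predictor-corrector or reverse-SDE samplers, and the step size and step count are purely illustrative.

```python
import torch

def langevin_sample(y, score_model, classifier, shape=(16, 3, 32, 32),
                    n_steps=1000, step_size=1e-4):
    """Unadjusted Langevin dynamics driven by the conditional score.
    `y` is a tensor of target labels with one entry per sample in the batch."""
    x = torch.randn(shape)  # initialize from pure noise
    for _ in range(n_steps):
        score = conditional_score(x, y, score_model, classifier)
        noise = torch.randn_like(x)
        # Langevin update: x <- x + eps * score + sqrt(2 * eps) * z
        x = x + step_size * score + (2 * step_size) ** 0.5 * noise
    return x
```

In this sketch, generating a batch of dog images amounts to something like `langevin_sample(torch.full((16,), dog_label), score_model, classifier)`, where `dog_label` is a hypothetical class index; swapping in a different `classifier` changes the control signal while `score_model` stays untouched.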