Yang Song, Conor Durkan, Iain Murray, and Stefano Ermon.

Maximum likelihood training of score-based diffusion models. In Advances in Neural Information Processing Systems, 2021.

@article{song2021maximum,
  title={Maximum likelihood training of score-based diffusion models},
  author={Song, Yang and Durkan, Conor and Murray, Iain and Ermon, Stefano},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  pages={1415--1428},
  year={2021}
}

TL;DR: Choosing the weighting function $\lambda(t) = \sigma(t)^2$ turns the score matching objective into an upper bound on $\mathrm{KL}(p_{\text{data}} \,\|\, p_\theta)$, so minimizing it performs (approximate) maximum likelihood training of score-based diffusion models.

Introduction

Efficient Maximum Likelihood Training

As mentioned in [Song et al. ICLR 2021], the weighting function $\lambda(t)$ can be chosen in a theoretically principled way. In particular, it can be chosen so that minimizing the score matching objective (approximately) maximizes the likelihood of the data.

The key observation is a connection between the Kullback–Leibler (KL) divergence and the score matching objective: with the likelihood weighting $\lambda(t) = \sigma(t)^2$, the weighted score matching loss upper-bounds the KL divergence from the data distribution to the model distribution.

$$ \mathrm{KL}(p_{\text{data}} \,\|\, p_\theta) \leq \frac{T}{2} \mathbb{E}_{t \sim \text{Uniform}[0, T]} \left[ \sigma(t)^2 \, \mathbb{E}_{p_t(\mathbf{x})} \left[ \| \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) - s_\theta(\mathbf{x}, t) \|_2^2 \right] \right] + \mathrm{KL}(p_T \,\|\, \pi) $$

Here $\pi$ is the prior distribution at the terminal time $T$. The second term does not depend on $\theta$, so minimizing the weighted score matching loss minimizes an upper bound on the KL divergence, and hence maximizes a lower bound on the log-likelihood.
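To make the weighting concrete, below is a minimal sketch (in PyTorch) of a denoising score matching loss with the likelihood weighting $\lambda(t) = \sigma(t)^2$. This is an illustrative sketch assuming a VE-type SDE with a Gaussian perturbation kernel, not the paper's reference implementation; `score_model`, `marginal_std`, and `diffusion_coeff` are hypothetical stand-ins for the score network, the perturbation-kernel standard deviation, and the SDE's diffusion coefficient $\sigma(t)$.

```python
import torch

def likelihood_weighted_dsm_loss(score_model, x0, marginal_std, diffusion_coeff,
                                 T=1.0, t_eps=1e-5):
    """Denoising score matching loss with the likelihood weighting
    lambda(t) = sigma(t)^2 from the bound above.

    Hypothetical callables (assumptions, not the paper's API):
      score_model(x, t)  -> s_theta(x, t), the estimated score of p_t
      marginal_std(t)    -> std of the perturbation kernel p_t(x | x0)
      diffusion_coeff(t) -> sigma(t), diffusion coefficient of the forward SDE
    """
    batch = x0.shape[0]
    # t ~ Uniform[t_eps, T]; t_eps avoids numerical blow-up near t = 0.
    t = torch.rand(batch, device=x0.device) * (T - t_eps) + t_eps

    std = marginal_std(t).view(-1, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    xt = x0 + std * noise  # sample from p_t(x | x0) for a VE-type SDE

    # The score of the Gaussian perturbation kernel is -noise / std, so the
    # intractable true score in the bound can be replaced by this target.
    residual = score_model(xt, t) + noise / std

    weight = diffusion_coeff(t) ** 2  # the likelihood weighting sigma(t)^2
    per_example = weight * residual.flatten(start_dim=1).pow(2).sum(dim=1)
    return 0.5 * per_example.mean()
```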

Because this score matching objective is efficient to optimize, it provides an efficient route to (approximate) maximum likelihood training of score-based diffusion models. Using this likelihood weighting, the paper reports improved log-likelihoods on several benchmark datasets.
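As a usage sketch, the weighting only requires $\sigma(t)$ in closed form. Below is a hypothetical instantiation with the geometric noise schedule of the VE SDE from [Song et al. ICLR 2021]; the values of `sigma_min` and `sigma_max`, and the `score_model`/`optimizer` objects, are illustrative assumptions.

```python
import math

# Geometric schedule: sigma(t) = sigma_min * (sigma_max / sigma_min)**t, so
# the squared diffusion coefficient is d[sigma(t)^2]/dt
#   = 2 * sigma(t)^2 * log(sigma_max / sigma_min).
sigma_min, sigma_max = 0.01, 50.0

def marginal_std(t):
    # Std of the perturbation kernel p_t(x | x0), approximating sigma(0) ~ 0.
    return sigma_min * (sigma_max / sigma_min) ** t

def diffusion_coeff(t):
    # sigma(t) in the bound above; its square is the likelihood weighting.
    return marginal_std(t) * math.sqrt(2.0 * math.log(sigma_max / sigma_min))

# One hypothetical training step (score_model, optimizer, x0 assumed to exist):
# loss = likelihood_weighted_dsm_loss(score_model, x0, marginal_std, diffusion_coeff)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```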