reparameterization trick

1 VAE example

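Suppose our loss involves a stochastic term: computing it requires sampling from a Gaussian distribution whose parameters are produced by our network. Backpropagation cannot pass a gradient through that sampling step, because a random draw is not a differentiable function of the distribution's parameters.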

In this situation, the computation graph might look like:

    f
    ^
    |
    |
    z <--- stochastic node
    ^
   / \
  /   \
sig    mu

Then, the computation of \(f\) proceeds as:

compute \(\mu\) according to the model parameters
compute \(\sigma\) according to the model parameters
sample \(z\) from \(\mathcal{N}(\mu, \sigma^2)\)
compute \(f(z)\)
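
The four steps above can be sketched in plain Python. This is only an illustration under assumptions: the one-weight `encoder`, the weights `w_mu` and `w_log_sigma`, and the choice \(f(z) = z^2\) are hypothetical stand-ins, not part of any particular VAE.

```python
import math
import random

def encoder(x, w_mu, w_log_sigma):
    """Hypothetical one-weight 'network' mapping an input x to (mu, sigma)."""
    mu = w_mu * x
    sigma = math.exp(w_log_sigma * x)  # exp keeps sigma positive
    return mu, sigma

def f(z):
    """Hypothetical downstream function; any differentiable f would do."""
    return z ** 2

def forward_direct(x, w_mu, w_log_sigma, rng):
    mu, sigma = encoder(x, w_mu, w_log_sigma)  # steps 1 and 2
    z = rng.gauss(mu, sigma)                   # step 3: draw z ~ N(mu, sigma^2)
    return f(z)                                # step 4

rng = random.Random(0)
value = forward_direct(x=1.0, w_mu=0.5, w_log_sigma=-0.3, rng=rng)
```

Here `z` leaves the sampler as a plain number. Nothing in the computation expresses `z` as a differentiable function of `mu` and `sigma`, which is the obstacle the diagram marks as the stochastic node.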

Now we are in trouble, because we can't backpropagate through the stochastic node \(z\) to reach \(\mu\) and \(\sigma\), and therefore the model parameters behind them. With the reparameterization trick, we move the sampling step off the critical path of backpropagation:

       f
       ^
       |
       |
       z
    __ ^ __
   /   |   \
  /    |    \
sig   mu   epsilon <--- stochastic node

Here, \(\epsilon \sim \mathcal{N}(0,1)\) is noise drawn independently of the model parameters.

Then, the procedure for computing the output at \(f\) is:

compute \(\sigma\) according to the model parameters
compute \(\mu\) according to the model parameters
sample \(\epsilon\sim \mathcal{N}(0,1)\)
compute \(z=\mu + \sigma \odot \epsilon\). Note that this is equivalent to sampling \(z\) from \(\mathcal{N}(\mu, \sigma^2)\), where \(\sigma\) is the standard deviation
compute \(f(z)\)
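
A minimal sketch of this procedure in plain Python, with the gradients written out by hand so that no autodiff library is needed. The choice \(f(z) = z^2\), the fixed values of `mu` and `sigma`, and all function names are assumptions made for illustration. For this particular \(f\), the expected value is \(\mu^2 + \sigma^2\), so the averaged gradients can be compared against \(2\mu\) and \(2\sigma\).

```python
import random

def f(z):
    """Hypothetical downstream function, chosen so the gradients are easy to check."""
    return z ** 2

def df_dz(z):
    return 2.0 * z

def reparam_sample_and_grads(mu, sigma, rng):
    eps = rng.gauss(0.0, 1.0)   # the only stochastic step
    z = mu + sigma * eps        # deterministic in mu and sigma
    g = df_dz(z)
    # Chain rule: dz/dmu = 1 and dz/dsigma = eps.
    return f(z), g, g * eps

def estimate_grads(mu, sigma, n_samples, seed=0):
    """Average the per-sample gradients over n_samples draws of eps."""
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        _, g_mu, g_sigma = reparam_sample_and_grads(mu, sigma, rng)
        grad_mu += g_mu
        grad_sigma += g_sigma
    return grad_mu / n_samples, grad_sigma / n_samples

grad_mu, grad_sigma = estimate_grads(mu=1.5, sigma=0.8, n_samples=200_000)
```

With these values the estimates should land close to \(2\mu = 3.0\) and \(2\sigma = 1.6\), up to Monte Carlo noise. In practice an autodiff framework would compute the same per-sample gradients automatically once \(z\) is written as \(\mu + \sigma \odot \epsilon\).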

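Now the gradient can flow from \(f\) back to \(\sigma\) and \(\mu\), because \(z\) is a deterministic, differentiable function of both. The stochastic node \(\epsilon\) has no parameters, so no gradient needs to pass through it.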
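
To make the gradient path explicit, the chain rule can be applied to \(z=\mu + \sigma \odot \epsilon\) for a fixed draw of \(\epsilon\). This is a sketch that assumes \(f\) is differentiable in \(z\) and that all products are elementwise:

\[
\frac{\partial f}{\partial \mu} = \frac{\partial f}{\partial z} \odot \frac{\partial z}{\partial \mu} = \frac{\partial f}{\partial z},
\qquad
\frac{\partial f}{\partial \sigma} = \frac{\partial f}{\partial z} \odot \frac{\partial z}{\partial \sigma} = \frac{\partial f}{\partial z} \odot \epsilon
\]

Both expressions depend on \(\epsilon\) only as a constant, so averaging them over several draws of \(\epsilon\) gives a Monte Carlo estimate of the gradient of the expected value of \(f\).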
