What is the Wasserstein distance (Earth Mover’s Distance) and why is it preferred over Jensen-Shannon divergence for GANs?

Machine Learning — Hard


Key points

  • The Wasserstein distance (Earth Mover’s Distance) measures the minimum cost of transporting probability mass to turn one distribution into the other
  • Because it reflects how far apart the distributions are, it provides meaningful, non-vanishing gradients for GAN training
  • Jensen-Shannon divergence saturates at a constant (log 2) whenever the two distributions have non-overlapping support, so its gradient vanishes — a common situation early in GAN training
  • Training a critic against the Wasserstein distance (as in WGAN) therefore yields more stable optimization and a loss that tracks sample quality
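The contrast is easy to see numerically. The sketch below (assuming NumPy and SciPy are available; `js_divergence` is a small helper written here for illustration, not a library function) places two point masses a distance `theta` apart on a grid: the Wasserstein distance grows linearly with `theta`, while the JS divergence stays pinned at log 2 ≈ 0.693 for every non-overlapping pair — exactly the saturation that starves a GAN of gradient signal.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions (in nats)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)  # mixture distribution
    kl = lambda a, b: np.sum(np.where(a > 0, a * np.log((a + eps) / (b + eps)), 0.0))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two point masses on a 10-bin grid, separated by a gap `theta`
# (disjoint support, like a generator far from the data manifold).
support = np.arange(10)
for theta in (1, 3, 6):
    p = np.zeros(10); p[0] = 1.0        # all mass at position 0
    q = np.zeros(10); q[theta] = 1.0    # all mass at position theta
    w = wasserstein_distance(support, support, u_weights=p, v_weights=q)
    js = js_divergence(p, q)
    print(f"theta={theta}: W1={w:.2f}, JS={js:.4f}")
# theta=1: W1=1.00, JS=0.6931
# theta=3: W1=3.00, JS=0.6931
# theta=6: W1=6.00, JS=0.6931
```

W1 keeps decreasing as the distributions move closer, so a critic trained on it still produces a useful gradient; JS is flat at log 2 until the supports overlap.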
