Michalis K. Titsias, Magnus Rattray, and Neil D. Lawrence, “Markov chain Monte Carlo algorithms for Gaussian processes,” Bayesian Time Series Models, David Barber, A. Taylan Cemgil, and Silvia Chiappa, eds., Cambridge: Cambridge University Press, 2011, pp. 295–316.

Estimate latent function

$f(\bm{x})$

Observations

$y_i = f_i + \epsilon_i$

Joint distribution is

$p(\bm{y},\bm{f}) = p(\bm{y}|\bm{f}) p(\bm{f})$

Applying Bayes’ rule, the posterior over $\bm{f}$ is

$p(\bm{f}|\bm{y}) = \frac{p(\bm{y}|\bm{f})p(\bm{f})}{\int p(\bm{y}|\bm{f})p(\bm{f})\,{\rm d}\bm{f}}$
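When the likelihood $p(\bm{y}|\bm{f})$ is not Gaussian, the normalizing integral above generally has no closed form, which is one reason to sample from $p(\bm{f}|\bm{y})$ instead. The following is a minimal sketch of one generic sampler, elliptical slice sampling, assuming a zero-mean GP prior $\mathcal{N}(\bm{0}, K)$. It is not the algorithm proposed in the cited chapter. The names `elliptical_slice_step`, `sample_posterior`, and `log_lik` are hypothetical, chosen for illustration.

```python
import numpy as np

def elliptical_slice_step(f, log_lik, chol_K, rng):
    """One elliptical slice sampling update targeting p(f | y) ∝ p(y | f) N(f | 0, K).

    Assumes a zero-mean GP prior; chol_K is the lower Cholesky factor of K.
    """
    nu = chol_K @ rng.standard_normal(f.shape[0])  # auxiliary draw from the prior
    log_y = log_lik(f) - rng.exponential()         # slice level: log p(y|f) + log U
    theta = rng.uniform(0.0, 2.0 * np.pi)
    theta_min, theta_max = theta - 2.0 * np.pi, theta
    while True:
        # Candidate on the ellipse through the current state f and the prior draw nu.
        f_new = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_new) > log_y:
            return f_new
        # Shrink the angle bracket towards theta = 0 (the current state) and retry.
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)

def sample_posterior(log_lik, K, n_samples, seed=0):
    """Run the sampler from f = 0 and return an (n_samples, n) array of states."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    chol_K = np.linalg.cholesky(K + 1e-10 * np.eye(n))  # small jitter for stability
    f = np.zeros(n)
    samples = np.empty((n_samples, n))
    for t in range(n_samples):
        f = elliptical_slice_step(f, log_lik, chol_K, rng)
        samples[t] = f
    return samples
```

Here `log_lik` is any function returning $\log p(\bm{y}|\bm{f})$ up to an additive constant. Early samples would normally be discarded as burn-in before estimating posterior quantities.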

Predict the function values $\bm{f}_*$ at unseen inputs $\bm{X}_*$

$\textcolor{blue}{p(\bm{f}_*|\bm{y})} = \int p(\bm{f}_*|\bm{f}) p(\bm{f}|\bm{y})\,{\rm d}\bm{f}$

where $p(\bm{f}_*|\bm{f})$ is the conditional GP prior given by,

$p(\bm{f}_*|\bm{f}) = \mathcal{N}(\bm{f}_*|\circ,\circ)$
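For a zero-mean GP prior, the two arguments left open above take the standard Gaussian conditioning form: mean $K_{*f} K_{ff}^{-1} \bm{f}$ and covariance $K_{**} - K_{*f} K_{ff}^{-1} K_{f*}$. The sketch below computes both. The squared-exponential kernel, the zero mean, and the jitter term are assumptions made for illustration, since the notes do not fix a kernel, and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d).

    An assumed kernel choice; any positive-definite kernel could be used instead.
    """
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def conditional_gp_prior(X, X_star, f, jitter=1e-8):
    """Mean and covariance of p(f_* | f) under a zero-mean GP prior."""
    K_ff = rbf_kernel(X, X) + jitter * np.eye(len(X))  # jitter for numerical stability
    K_sf = rbf_kernel(X_star, X)
    K_ss = rbf_kernel(X_star, X_star)
    L = np.linalg.cholesky(K_ff)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))  # K_ff^{-1} f
    mean = K_sf @ alpha
    V = np.linalg.solve(L, K_sf.T)                       # L^{-1} K_fs
    cov = K_ss - V.T @ V                                 # K_ss - K_sf K_ff^{-1} K_fs
    return mean, cov
```

A quick sanity check is to set `X_star = X`: the conditional mean should then reproduce `f` and the covariance should be close to zero, because the function values at those inputs are already known.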

The predictive distribution of $\bm{y}_*$ corresponding to $\bm{f}_*$ is

$\textcolor{red}{p(\bm{y}_*|\bm{y})} = \int p(\bm{y}_*|\bm{f}_*) \textcolor{blue}{p(\bm{f}_*|\bm{y})} \,{\rm d}\bm{f}_*$
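If $p(\bm{f}|\bm{y})$ is represented by MCMC samples $\bm{f}^{(t)}$, $t = 1, \dots, T$, the integral over $\bm{f}$ can be approximated by the average $\frac{1}{T}\sum_t p(\bm{f}_*|\bm{f}^{(t)})$, a mixture of Gaussians. The sketch below computes the mean and covariance of that mixture. It assumes the conditional mean is linear in $\bm{f}$, written $A\bm{f}$ with $A = K_{*f} K_{ff}^{-1}$, and that the conditional covariance does not depend on $\bm{f}$, as holds for a zero-mean GP prior. The optional `noise_var` argument additionally assumes Gaussian observation noise to obtain moments for $\bm{y}_*$. All names are hypothetical.

```python
import numpy as np

def mc_predictive_moments(A, cond_cov, f_samples, noise_var=0.0):
    """Monte Carlo estimate of the predictive mean and covariance.

    A         : (n_star, n) matrix mapping f to the conditional mean of f_*.
    cond_cov  : (n_star, n_star) covariance of p(f_* | f).
    f_samples : (T, n) posterior samples of f, e.g. from an MCMC run.
    noise_var : assumed Gaussian noise variance; 0.0 gives the moments of f_*.
    """
    cond_means = f_samples @ A.T            # (T, n_star), one conditional mean per sample
    mean = cond_means.mean(axis=0)
    centered = cond_means - mean
    # Law of total covariance: within-component part plus spread of the means.
    cov = cond_cov + centered.T @ centered / f_samples.shape[0]
    cov = cov + noise_var * np.eye(cov.shape[0])
    return mean, cov
```

The second term of the covariance captures the posterior uncertainty in $\bm{f}$ that the samples carry. It vanishes only if all samples coincide.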

In a mainstream machine learning application involving large datasets and where fast inference is required, deterministic methods are usually preferred simply because they are faster.
In contrast, in applications related to scientific questions that need to be carefully addressed by carrying out a statistical data analysis, MCMC is preferred.