In this post I summarize everything about GPs. Individual posts will be spun off once I have gathered enough material, insights, and thoughts on a topic.

At the end, I link all other individual posts that are about GPs or that mention them.

# Summary of GPstuff document v4.6

## From prior to posterior predictive

Bayesian inference for GPs starts from the full model specification:

Observation model: $\bm{y}|\bm{f},\phi \sim \Pi_{i=1}^n p(y_i|f_i,\phi)$

GP prior: $f(\bm{x})|\theta \sim \mathcal{GP}\left(m(\bm{x}), k(\bm{x},\bm{x}'|\theta)\right)$

hyperprior: $\vartheta \triangleq [\theta,\phi] \sim p(\theta)p(\phi)$

The latent function value $f(\bm{x})$ at fixed $\bm{x}$ is called a latent variable.

Any set of function values $\bm{f} \triangleq [f_1,f_2,\dots]^T$ has a multivariate Gaussian distribution

$p(\bm{f}|\bm{X},\theta) = N(\bm{f}|\bm{0},\bm{K}_{f,f})$
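As a concrete illustration, here is a minimal NumPy sketch (my own, not from the GPstuff document; the squared-exponential kernel and its hyperparameter values are arbitrary choices) that builds $\bm{K}_{f,f}$ on a grid of inputs and draws one sample of $\bm{f}$ from the prior:

```python
import numpy as np

def sq_exp_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x'|theta) = s^2 exp(-(x - x')^2 / (2 l^2))."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 50)
K = sq_exp_kernel(X, X)                      # K_{f,f}
jitter = 1e-9 * np.eye(len(X))               # tiny diagonal term for numerical stability
f = rng.multivariate_normal(np.zeros(len(X)), K + jitter)  # one draw f ~ N(0, K_{f,f})
```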

To predict the values $\tilde{\bm{f}}$ at new input locations $\tilde{\bm{X}}$, write the joint distribution

$\begin{bmatrix} \bm{f}\\ \tilde{\bm{f}} \end{bmatrix}| \bm{X}, \tilde{\bm{X}}, \theta \sim N\left(\bm{0}, \begin{bmatrix} K_{f,f} & K_{f,\tilde{f}}\\ K_{\tilde{f},f} & K_{\tilde{f},\tilde{f}} \end{bmatrix}\right)$

The conditional distribution of $\tilde{\bm{f}}$ given $\bm{f}$ is

$\tilde{\bm{f}} | \bm{f},\bm{X},\tilde{\bm{X}}, \theta \sim N(\bm{K}_{\tilde{f},f}\bm{K}_{f,f}^{-1}\bm{f},\, \bm{K}_{\tilde{f},\tilde{f}}-\bm{K}_{\tilde{f},f}\bm{K}_{f,f}^{-1}\bm{K}_{f,\tilde{f}})$

So the conditional distribution of the latent function $f(\tilde{\bm{x}})$ is also a GP with

• conditional mean function: $\textcolor{green}{\mathbb{E}_{\tilde{\bm{f}}|\bm{f},\theta,\phi}[f(\tilde{\bm{x}})]} = k(\tilde{\bm{x}},\bm{X}|\theta) \bm{K}_{f,f}^{-1} \bm{f}$
• conditional covariance function: $\textcolor{green}{\text{Cov}_{\tilde{\bm{f}}|\bm{f},\theta,\phi}[f(\tilde{\bm{x}})]} = k(\tilde{\bm{x}},\tilde{\bm{x}}') - k(\tilde{\bm{x}},\bm{X}|\theta)\bm{K}_{f,f}^{-1}k(\bm{X},\tilde{\bm{x}}'|\theta)$ (I am not sure whether the notation on the left-hand side is correct.)
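The noise-free conditioning above can be checked numerically. A sketch with made-up inputs and an assumed squared-exponential kernel; note that the conditional mean interpolates the known values $\bm{f}$ and the conditional variance vanishes at the training inputs:

```python
import numpy as np

def sq_exp_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

X = np.array([0.0, 1.0, 2.0])      # training inputs
f = np.array([0.5, -0.3, 0.8])     # known latent values at X (noise-free)
Xs = np.linspace(0.0, 2.0, 21)     # new inputs \tilde{X}

K_ff = sq_exp_kernel(X, X) + 1e-9 * np.eye(len(X))   # K_{f,f} (+ jitter)
K_sf = sq_exp_kernel(Xs, X)                          # K_{~f,f}
K_ss = sq_exp_kernel(Xs, Xs)                         # K_{~f,~f}

# Conditional mean K_sf K_ff^{-1} f and covariance K_ss - K_sf K_ff^{-1} K_fs
cond_mean = K_sf @ np.linalg.solve(K_ff, f)
cond_cov = K_ss - K_sf @ np.linalg.solve(K_ff, K_sf.T)
```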

The first inference step is to form the conditional posterior of the latent variables $\bm{f}$ given the parameters $\vartheta$ ($\mathcal{D}\triangleq\{\bm{X},\bm{y}\}$). (For now, assume this posterior is available; it depends on the choice or design of the observation model. How to compute it is discussed later; in practice, every model except the classical GP requires approximation methods.)

$p(\bm{f}|\mathcal{D},\theta,\phi) = \frac{ \overbrace{p(\bm{y}|\bm{f},\phi)}^\text{observation model} \overbrace{p(\bm{f}|\bm{X},\theta)}^\text{GP prior} }{ \int p(\bm{y}|\bm{f},\phi) p(\bm{f}|\bm{X},\theta) d\bm{f} \textcolor{green}{\triangleq p(\bm{y}|\bm{X},\vartheta)}} \tag{8}$

After this, we can marginalize over the parameters $\vartheta$ to obtain the marginal posterior distribution for the latent variables $\bm{f}$

$p(\bm{f}|\mathcal{D}) = \int \overbrace{p(\bm{f}|\mathcal{D},\theta,\phi)}^\text{see above} \overbrace{p(\theta,\phi|\mathcal{D})}^\text{parameter posterior} d\theta d\phi$
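This integral is rarely analytical; GPstuff approximates it with, e.g., MCMC or CCD samples of the parameters, turning the marginal posterior into a mixture of conditional posteriors. The mixture-of-means idea can be sketched as follows (the lengthscale values are stand-ins for actual posterior draws, and the Gaussian-noise conditional mean is used for simplicity):

```python
import numpy as np

def sq_exp_kernel(X1, X2, lengthscale):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

X = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, -0.3, 0.8])
sigma2 = 0.1                                   # fixed noise variance (assumption)
theta_samples = np.array([0.5, 1.0, 1.5])      # stand-ins for posterior draws of the lengthscale

# p(f|D) ~= (1/S) sum_s p(f|D, theta_s): here we just average the conditional means
cond_means = []
for ell in theta_samples:
    K = sq_exp_kernel(X, X, ell)
    cond_means.append(K @ np.linalg.solve(K + sigma2 * np.eye(len(X)), y))
marginal_mean = np.mean(cond_means, axis=0)
```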

The conditional posterior predictive distribution $p(\tilde{f}|\mathcal{D},\vartheta,\tilde{\bm{x}})$ can be evaluated exactly or approximated. (Again, how to compute it is discussed later; for now assume it is available.)

$\begin{aligned} p(\tilde{f}|\mathcal{D},\vartheta,\tilde{\bm{x}}) &= \int p(\tilde{f},\bm{f}|\mathcal{D},\vartheta,\tilde{\bm{x}}) \, d\bm{f} \\ &= \int \overbrace{ p(\tilde{f}|\bm{f},\bm{X},\vartheta,\tilde{\bm{x}}) }^\text{from the GP prior} \cdot \overbrace{ p(\bm{f}|\mathcal{D},\vartheta) }^\text{from Bayes' theorem} \, d\bm{f} \end{aligned}$

(Note that the second factor does not depend on $\tilde{\bm{x}}$.)

The posterior predictive distribution $p(\tilde{f}|\mathcal{D},\tilde{\bm{x}})$ is obtained by marginalizing out the parameters $\vartheta$ from $p(\tilde{f}|\mathcal{D},\vartheta,\tilde{\bm{x}})$.

The joint posterior predictive distribution $p(\tilde{\bm{y}}|\mathcal{D},\theta,\phi,\tilde{\bm{x}})$ requires integrating the observation model over $p(\tilde{\bm{f}}|\mathcal{D},\vartheta,\tilde{\bm{x}})$. (Usually not needed.)

The marginal predictive distribution for an individual $\tilde{y}_i$ is

$p(\tilde{y}_i|\mathcal{D},\tilde{\bm{x}}_i,\theta,\phi) = \int p(\tilde{y}_i|\tilde{f}_i,\phi) p(\tilde{f}_i|\mathcal{D},\tilde{\bm{x}}_i,\theta,\phi) \, d\tilde{f}_i$
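For a non-Gaussian likelihood this one-dimensional integral is typically done numerically. Below is a sketch for a Bernoulli-probit observation model using Gauss-Hermite quadrature (the likelihood choice and the predictive moments `m`, `v` are illustrative assumptions, not from the GPstuff document); for the probit case the integral has the known closed form $\Phi(m/\sqrt{1+v})$, which the quadrature should reproduce.

```python
import numpy as np
from math import erf, sqrt, pi

def probit(f):
    """Bernoulli-probit likelihood p(y = 1 | f) = Phi(f)."""
    return 0.5 * (1.0 + erf(f / sqrt(2.0)))

def predictive_prob(m, v, n_quad=32):
    """Approximate p(y=1|D,x) = int Phi(f) N(f|m, v) df with Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    f = m + sqrt(2.0 * v) * nodes              # change of variables to the standard GH form
    vals = np.array([probit(fi) for fi in f])
    return float(np.sum(weights * vals) / sqrt(pi))

p = predictive_prob(m=0.7, v=0.4)              # illustrative predictive latent moments
```

For the Gaussian observation model this integral is analytical: the predictive is simply Gaussian with the latent variance inflated by $\sigma^2$.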

If the parameters are considered fixed, using the GP's marginalization and conditioning properties (everything stays Gaussian), we can evaluate the posterior predictive mean $m_p(\tilde{f}|\mathcal{D},\theta,\phi)$ from the conditional mean $\color{green}\mathbb{E}_{\tilde{\bm{f}}|\bm{f},\theta,\phi}[\tilde{\bm{f}}]$ (where $\tilde{\bm{f}} \triangleq f(\tilde{\bm{x}})$, derived above) by marginalizing out the latent variables $\bm{f}$,

$m_p(\tilde{f}|\mathcal{D},\theta,\phi) = \int \textcolor{green}{\mathbb{E}_{\tilde{\bm{f}}|\bm{f},\theta,\phi}[\tilde{\bm{f}}]}\, p(\bm{f}|\mathcal{D},\theta,\phi) d\bm{f} \xlongequal{\text{substitute and keep only the terms in } \bm{f}} k(\tilde{\bm{x}},\bm{X}|\theta) \bm{K}_{f,f}^{-1} \mathbb{E}_{\bm{f}|\mathcal{D},\theta,\phi}[\bm{f}] \tag{11}$

The posterior predictive covariance between any set of latent variables $\tilde{\bm{f}}$ is (this step uses the law of total covariance; see Wikipedia: Law of total covariance)

$\begin{aligned} \text{Cov}_{\tilde{\bm{f}}|\mathcal{D},\theta,\phi} [\tilde{\bm{f}}] &= \mathbb{E}_{\bm{f}|\mathcal{D},\theta,\phi}\left[ \textcolor{green}{\text{Cov}_{\tilde{\bm{f}}|\bm{f},\theta,\phi}[\tilde{\bm{f}}]} \right] + \text{Cov}_{\bm{f}|\mathcal{D},\theta,\phi}\left[ \textcolor{green}{\mathbb{E}_{\tilde{\bm{f}}|\bm{f},\theta,\phi}[\tilde{\bm{f}}]} \right] \\ &= \overbrace{\textcolor{green}{\text{Cov}_{\tilde{\bm{f}}|\bm{f},\theta,\phi}[\tilde{\bm{f}}]}}^{\text{independent of } \bm{f}} + \underbrace{\text{Cov}_{\bm{f}|\mathcal{D},\theta,\phi}\left[ k(\tilde{\bm{x}},\bm{X}|\theta) \bm{K}_{f,f}^{-1} \bm{f} \right]}_{k(\tilde{\bm{x}},\bm{X}|\theta)\bm{K}_{f,f}^{-1}\text{Cov}_{\bm{f}|\mathcal{D},\theta,\phi}[\bm{f}]\bm{K}_{f,f}^{-1}k(\bm{X},\tilde{\bm{x}}'|\theta)} \end{aligned}$

Then, the posterior predictive covariance function $k_p(\tilde{\bm{x}},\tilde{\bm{x}}'|\mathcal{D},\theta,\phi)$ is

$k_p(\tilde{\bm{x}},\tilde{\bm{x}}'|\mathcal{D},\theta,\phi) = k(\tilde{\bm{x}},\tilde{\bm{x}}'|\theta) - k(\tilde{\bm{x}},\bm{X}|\theta)\left( \bm{K}_{f,f}^{-1}-\bm{K}_{f,f}^{-1}\text{Cov}_{\bm{f}|\mathcal{D},\theta,\phi}[\bm{f}]\bm{K}_{f,f}^{-1} \right) k(\bm{X},\tilde{\bm{x}}'|\theta) \tag{13}$
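Eq. (13) can be sanity-checked numerically for the Gaussian observation model, where the posterior covariance of the training latents is available in closed form: plugging it into Eq. (13) must reproduce the direct predictive covariance. A sketch with arbitrary inputs and an assumed squared-exponential kernel:

```python
import numpy as np

def sq_exp_kernel(X1, X2, ell=1.0):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell**2)

X = np.array([0.0, 0.7, 1.5, 2.2])     # training inputs
Xs = np.array([0.3, 1.0, 1.9])         # test inputs
sigma2 = 0.2
n = len(X)

K = sq_exp_kernel(X, X) + 1e-10 * np.eye(n)
Ks = sq_exp_kernel(Xs, X)
Kss = sq_exp_kernel(Xs, Xs)

# Posterior covariance of the training latents f under Gaussian noise
Cov_f = K - K @ np.linalg.solve(K + sigma2 * np.eye(n), K)

# Eq. (13): k_p = K_ss - K_s (K^{-1} - K^{-1} Cov_f K^{-1}) K_s^T
Kinv = np.linalg.inv(K)                # fine for a toy example; prefer Cholesky in practice
kp = Kss - Ks @ (Kinv - Kinv @ Cov_f @ Kinv) @ Ks.T

# Direct predictive covariance: K_ss - K_s (K + sigma^2 I)^{-1} K_s^T
kp_direct = Kss - Ks @ np.linalg.solve(K + sigma2 * np.eye(n), Ks.T)
```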

So, even if the exact posterior $p(\tilde{f}|\mathcal{D},\theta,\phi)$ is not available in closed form, its mean and covariance can still be evaluated from Eqs. (11) and (13), as long as we can compute (or approximate) the posterior mean and covariance of $\bm{f}$.

## From latents $\bm{f}$ to observations $\bm{y}$

With a Gaussian observation model, the Bayesian inference above becomes fully analytical:

Gaussian observation model: $y_i \sim N(f_i,\sigma^2)$

$p(y_i|f_i,\theta,\overbrace{\phi}^{\text{includes } \sigma^2}) = N(y_i|f_i,\sigma^2)$

Marginal likelihood $p(\bm{y}|\bm{X},\theta,\sigma^2)$ is

$p(\bm{y}|\bm{X},\theta,\sigma^2) = N(\bm{y}|\bm{0},\bm{K}_{f,f}+\sigma^2\bm{I})$

The conditional posterior of the latent variables $\bm{f}$ now has an analytical solution (obtained by completing the square; Bishop's book or the GPML book has the details):

$\bm{f}|\mathcal{D},\theta,\phi \sim N( \bm{K}_{f,f}(\bm{K}_{f,f}+\sigma^2\bm{I})^{-1}\bm{y},\quad \bm{K}_{f,f}-\bm{K}_{f,f}(\bm{K}_{f,f}+\sigma^2\bm{I})^{-1}\bm{K}_{f,f} ) \tag{15}$

Since the conditional posterior of $\bm{f}$ is Gaussian, the posterior process is still a GP, whose mean and covariance functions are obtained from Eqs. (11) and (13):

$\tilde{f}|\mathcal{D},\theta,\phi \sim \mathcal{GP}\left(m_p(\tilde{\bm{x}}),\quad k_p(\tilde{\bm{x}},\tilde{\bm{x}}')\right) \tag{16}$
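Putting Eqs. (15) and (16) together, here is a complete GP-regression sketch (synthetic data, my own kernel and hyperparameter choices, not GPstuff code) using a Cholesky factorization for numerical stability, including the log marginal likelihood $\log N(\bm{y}|\bm{0},\bm{K}_{f,f}+\sigma^2\bm{I})$:

```python
import numpy as np

def sq_exp_kernel(X1, X2, ell=1.0, s2=1.0):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return s2 * np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(1)
X = np.linspace(0.0, 5.0, 30)                    # training inputs
sigma2 = 0.05                                    # noise variance
y = np.sin(X) + np.sqrt(sigma2) * rng.standard_normal(len(X))
Xs = np.linspace(0.0, 5.0, 100)                  # test inputs

K = sq_exp_kernel(X, X)
Ks = sq_exp_kernel(Xs, X)
Kss = sq_exp_kernel(Xs, Xs)
Ky = K + sigma2 * np.eye(len(X))                 # K_{f,f} + sigma^2 I

L = np.linalg.cholesky(Ky)                       # stable alternative to a direct inverse
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

mean = Ks @ alpha                                # predictive mean, Eq. (16)
V = np.linalg.solve(L, Ks.T)
cov = Kss - V.T @ V                              # predictive covariance, Eq. (16)

# Log marginal likelihood log p(y|X, theta, sigma^2) = log N(y|0, Ky)
log_ml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * len(X) * np.log(2 * np.pi)
```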

# Learning Materials

A Practical Guide to Gaussian Processes

• Incisive, concise explanation of the principles
• Practical usage advice
• Advice on initialization
• Its real value only becomes apparent after extensive hands-on use

A Visual Exploration of Gaussian Processes

• Instant, interactive visualizations
• The explanations are not as good as those in the previous link

Zoubin Ghahramani, “A Tutorial on Gaussian Processes (or Why I Don’t Use SVMs)”, 2011.