INLA methodology

There are various methodologies for the computational implementation of Bayesian inference: simulation methods such as MCMC (Markov chain Monte Carlo) and approximate methods such as VB (variational Bayes), each with its own challenges. INLA (Integrated Nested Laplace Approximation), in contrast, is a deterministic approximate approach, developed by Rue, Martino and Chopin (2009) and expanded upon by Lindgren and Rue (2015), Rue et al. (2017) and Bakka et al. (2018). It allows for Bayesian inference in a class of structured additive models, termed latent Gaussian models (LGMs). The INLA method enables the calculation of joint posterior distributions, the marginal posterior distributions of each parameter and hyperparameter, as well as combinations of these and the posterior predictive distributions.

At the core of INLA is the Laplace approximation applied to the conditional probability distribution of the latent field. This requires the latent structure to be a Gaussian Markov Random Field (GMRF), which can be linked to latent Gaussian models (Lindgren, Rue and Lindström, 2011). Many models can nevertheless be rewritten so that their structure matches that of an LGM.

Laplace Approximation

The Laplace approximation of a density function $f(x)$ is obtained by taking logarithms and carrying out a second-order Taylor series expansion around the mode of the function:

$$ \begin{array}{rl} \int_X f(x)\,dx & =\int_X\exp[\log(f(x))]\,dx\\ &\approx \int_X \exp\left( \log(f(x_0)) + (x-x_0)\left.\frac{\partial \log(f(x))}{\partial x}\right\vert_{x=x_0} + \frac{(x-x_0)^2}{2}\left.\frac{\partial^2 \log(f(x))}{\partial x^2}\right\vert_{x=x_0}\right)dx, \end{array} $$

where the function $f(x)$ will be evaluated at the mode, $\left.f(x)\right\vert_{x=x_0}$, such that

$$x_0=\left\{x:\frac{\partial f(x)}{\partial x}=0 \;\wedge\; \frac{\partial^2 f(x)}{\partial x^2} < 0 \right\}.$$

That is, the function is evaluated where the first derivative vanishes, so the first-order term of the Taylor series expansion drops out. Moreover, since the second derivative of $\log(f(x))$ is negative at the mode, if we express the second-order term through the positive quantity

$$\sigma^2=-\left.\frac{1}{\partial^2 \log(f(x))/\partial x^2}\right\vert_{x=x_0},$$

then we can express the Laplace approximation as the kernel of a Gaussian function:

$$\int_Xf(x)dx\approx f(x_0)\cdot \int_X \exp\left(-\frac{(x-x_0)^2}{2\sigma^2}\right)dx.$$
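
Since the remaining integral is that of a Gaussian kernel, the approximation has the closed form $f(x_0)\sqrt{2\pi\sigma^2}$. As a minimal numerical sketch of the idea (the target function, the optimiser and the finite-difference step are arbitrary illustrative choices, not part of the INLA machinery), the following Python code applies it to the unnormalised Gamma kernel $f(x)=x^{4}e^{-x}$, whose exact integral is $\Gamma(5)=24$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Unnormalised Gamma(a = 5, rate = 1) kernel: f(x) = x^(a-1) * exp(-x).
# Its exact integral over (0, inf) is Gamma(5) = 24.
a = 5.0

def log_f(x):
    return (a - 1.0) * np.log(x) - x

# 1. Find the mode x0 of log f (where the first derivative vanishes).
res = minimize_scalar(lambda x: -log_f(x), bounds=(1e-6, 50.0), method="bounded")
x0 = res.x

# 2. Second derivative of log f at x0 by central finite differences.
h = 1e-4
d2 = (log_f(x0 + h) - 2.0 * log_f(x0) + log_f(x0 - h)) / h**2

# 3. Gaussian-kernel variance: sigma^2 = -1 / d2 (d2 < 0 at a maximum).
sigma2 = -1.0 / d2

# Laplace approximation: f(x0) * sqrt(2 * pi * sigma^2).
laplace = np.exp(log_f(x0)) * np.sqrt(2.0 * np.pi * sigma2)
exact, _ = quad(lambda x: np.exp(log_f(x)), 0.0, np.inf)
print(f"Laplace: {laplace:.4f}, numerical: {exact:.4f}")  # ~23.51 vs 24.00
```

The approximation (about 23.5) is already close to the exact value, and it is exact when $f$ itself is Gaussian; this is the sense in which the accuracy of INLA rests on the near-Gaussianity of the densities it approximates.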

Gaussian Markov Random Field

A Gaussian Markov Random Field (GMRF) is a Gaussian Field (GF) with Markov properties. Formally, a random vector $\mathbf{x}\in \mathbb{R}^n$ is said to be a GMRF with respect to a graph $\mathcal{G}=(\mathcal{V}, \mathcal{E})$, with mean $\boldsymbol\mu$ and symmetric positive definite precision matrix $\mathbf{Q}>0$, if its density has the following structure

$$\pi(\mathbf{x})=(2\pi)^{-n/2}|\mathbf{Q}|^{1/2}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^T\mathbf{Q}(\mathbf{x}-\boldsymbol\mu)\right),$$

and

$$(\forall i\neq j) \left\lbrace Q_{ij}\neq 0 \iff \{i,j\}\in \mathcal{E}\right\rbrace.$$

If the precision matrix $\mathbf{Q}$ is completely dense, then the graph $\mathcal{G}$ is fully connected. This implies that any normal distribution with a symmetric positive definite (SPD) covariance matrix is a GMRF, and vice versa.

When $\mathbf{Q}$ is sparse, however, the properties of GMRFs become genuinely useful and can be exploited computationally. A particularly useful property is the interpretation of the conditional distributions of the elements of a GMRF.

Suppose $\mathbf{x}$ is a GMRF with respect to a graph $\mathcal{G}=(\mathcal{V}, \mathcal{E})$, with mean $\boldsymbol\mu$ and SPD precision matrix $\mathbf{Q}>0$, then

$$\begin{array}{rcl} \text{E}(x_i|\mathbf{x}_{-i}) & = & \mu_i - \frac{1}{Q_{ii}}\sum_{j:j\sim i}Q_{ij}(x_j-\mu_j),\\ \text{Prec}(x_i|\mathbf{x}_{-i}) & = & Q_{ii},\\ \text{Corr}(x_i,x_j|\mathbf{x}_{-ij}) & = & -\frac{Q_{ij}}{\sqrt{Q_{ii}Q_{jj}}},\quad i\neq j.\\ \end{array} $$

The diagonal elements of $\mathbf{Q}$ are the conditional precisions of $x_i$ given $\mathbf{x}_{-i}$, while the off-diagonal elements, up to a scaling factor, are the conditional correlations between $x_i$ and $x_j$ given $\mathbf{x}_{-ij}$.
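
As a quick numerical check of these identities, the following Python sketch builds a small, hand-made tridiagonal precision matrix (a chain graph, so only consecutive nodes are neighbours; all values are invented for the illustration) and compares the GMRF formulas with brute-force conditioning of the corresponding multivariate normal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tridiagonal precision on 5 nodes: Q_ij != 0 iff |i - j| <= 1,
# i.e. only consecutive nodes are neighbours in the graph.
n = 5
Q = np.diag(np.full(n, 2.0)) \
    + np.diag(np.full(n - 1, -0.9), 1) \
    + np.diag(np.full(n - 1, -0.9), -1)
mu = np.zeros(n)
x = rng.normal(size=n)   # an arbitrary configuration to condition on
i = 2                    # condition x_i on all other components

# GMRF formulas: conditional mean and precision are read off Q directly.
mean_gmrf = mu[i] - (Q[i, :] @ (x - mu) - Q[i, i] * (x[i] - mu[i])) / Q[i, i]
prec_gmrf = Q[i, i]

# Brute-force check via the usual multivariate-normal conditioning on
# the covariance matrix Sigma = Q^{-1}. (The conditional-correlation
# identity can be verified in the same way, conditioning on two indices.)
Sigma = np.linalg.inv(Q)
idx = [j for j in range(n) if j != i]
S_io = Sigma[i, idx]
S_oo = np.linalg.inv(Sigma[np.ix_(idx, idx)])
mean_mvn = mu[i] + S_io @ S_oo @ (x[idx] - mu[idx])
var_mvn = Sigma[i, i] - S_io @ S_oo @ S_io

print(np.isclose(mean_gmrf, mean_mvn))       # True
print(np.isclose(prec_gmrf, 1.0 / var_mvn))  # True
```

The point of the formulas is that every conditional quantity is read directly off the sparse $\mathbf{Q}$, with no matrix inversion; the dense computation above is there only to verify them.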

Gaussian Latent Fields

The structure on which INLA is based can be summarised in the following hierarchical model:

$$\begin{array}{rcl} \mathbf{y}|\mathbf{\mathcal{X}},\boldsymbol\theta_1 & \sim & \prod_{i=1}^{n}\pi(y_i|\mathcal{X}_i,\boldsymbol\theta_1),\\ \mathbf{\mathcal{X}}|\boldsymbol\theta_2 & \sim & N(\mathbf{0},\mathbf{Q}_{\mathbf{\mathcal{X}}}^{-1}(\boldsymbol\theta_2)),\\ \boldsymbol\theta=\{\boldsymbol\theta_1,\boldsymbol\theta_2\} & \sim & \pi(\boldsymbol\theta),\\ \end{array} $$

where $\mathbf{y}|\mathbf{\mathcal{X}},\boldsymbol\theta_1$ is the data (or likelihood) level, in which $\mathbf{\mathcal{X}}$ are the elements of the latent field and $\boldsymbol\theta_1$ are the hyperparameters of the likelihood. The elements of the latent field $\mathbf{\mathcal{X}}|\boldsymbol\theta_2$ follow a GMRF with mean $\mathbf{0}$, whose structure is encoded in the precision matrix $\mathbf{Q}_{\mathbf{\mathcal{X}}}(\boldsymbol\theta_2)$, where $\boldsymbol\theta_2$ are the hyperparameters of the latent field. Finally, the last level concerns the distribution of the hyperparameters of the model $(\boldsymbol\theta)$, comprising both those of the likelihood $(\boldsymbol\theta_1)$ and those of the latent field $(\boldsymbol\theta_2)$.
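
To make the three levels concrete, consider, as an illustrative instance (not one of the worked examples in the references), Poisson counts with a latent first-order autoregressive effect:

$$\begin{array}{rcl} y_i|\eta_i & \sim & \text{Poisson}(\exp(\eta_i)), \quad \eta_i=\beta_0+u_i,\\ \mathbf{u}|\rho,\tau & \sim & N\left(\mathbf{0},\mathbf{Q}^{-1}(\rho,\tau)\right),\\ \boldsymbol\theta=(\rho,\tau) & \sim & \pi(\boldsymbol\theta). \end{array}$$

Here the Poisson likelihood contributes no hyperparameters ($\boldsymbol\theta_1=\emptyset$), the latent field $\mathbf{\mathcal{X}}=(\beta_0,u_1,\dots,u_n)$ is Gaussian with a sparse (tridiagonal) AR(1) precision matrix, and $\boldsymbol\theta_2=(\rho,\tau)$ collects the autocorrelation and precision of the effect.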

The second level is the latent Gaussian field, which is what makes the model a latent Gaussian model (LGM). LGMs are a broad class of models built on Gaussian processes, covering time series, spatial models, iid random effects, cluster random effects, etc. The Gaussian field with the above structure can therefore also be formulated through the linear predictor of the model as

$$ \boldsymbol\eta=\beta_0\mathbf{1} + \mathbf{X}\boldsymbol\beta + \sum_{k=1}^K f_k(\mathbf{u}_k), $$

where $(\beta_0, \boldsymbol\beta)$ are the parameters associated with the linear effects, while $\{f_k\}$ are the unknown functions of the random effects $\mathbf{U}=\{\mathbf{u}_k\}$, which can have very different structures: iid, random walks, Besag, SPDE (Stochastic Partial Differential Equation) models, etc.
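
For instance (the covariates and effects here are invented for the illustration), a model with one linear covariate, a temporal trend and an unstructured group effect corresponds to $K=2$:

$$\eta_i=\beta_0+\beta_1 x_i + f_1(t_i) + f_2(s_i),$$

where $f_1$ could be a random walk over the time index $t_i$ and $f_2$ an iid effect over the group label $s_i$.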

Based on the above expression, we can reformulate the predictor in matrix terms as $\boldsymbol\eta=\sum_j\mathbf{A}_j\mathbf{u}_j$, where each effect $(\mathbf{u}_j)$ is linked to the predictor of the observations $(\boldsymbol\eta)$ through a projection matrix $(\mathbf{A}_j)$. Each projection matrix contains the weights associated with its effect: the values of the explanatory variables for linear effects, smoothing weights, or the weights associated with SPDEs. That is, we can rewrite it as

$$ \boldsymbol\eta=\left( \begin{array}{c} \eta_1\\ \vdots \\ \eta_n \end{array}\right)=\left(\mathbf{A}_1\,\vert\, \mathbf{A}_2\,\vert\, \cdots\,\vert\, \mathbf{A}_J\right) \left( \begin{array}{c} \mathbf{u}_1\\ \hline \mathbf{u}_2 \\ \hline \vdots \\ \hline \mathbf{u}_J \end{array}\right), $$

where each effect $\mathbf{u}_j$, $j\in \{1,...,J\}$, is related to its corresponding projection matrix $\mathbf{A}_j$, linking the effects with the linear predictor $\boldsymbol\eta=(\eta_1,...,\eta_n)$.
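
As a sketch of this construction (using scipy.sparse; the dimensions, covariate values and group labels are invented for the illustration), the following Python code stacks the projection matrix of a linear effect and that of an iid group effect, and applies the result to the stacked latent vector:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)

# Invented dimensions: n = 4 observations and J = 2 effects:
# u1 = (beta_0, beta_1) for the linear part, u2 = a 3-level iid group effect.
n = 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + covariate
A1 = sp.csr_matrix(X)                                  # n x 2 projector

group = np.array([0, 1, 1, 2])                         # group of each observation
A2 = sp.csr_matrix((np.ones(n), (np.arange(n), group)), shape=(n, 3))

# Stacked projector (A1 | A2) acting on the stacked latent vector u.
A = sp.hstack([A1, A2]).tocsr()                        # n x (2 + 3)
u = np.concatenate([[0.5, -1.0],                       # (beta_0, beta_1)
                    rng.normal(size=3)])               # group effects
eta = A @ u                                            # linear predictor
print(eta.shape, eta)                                  # (4,) and its values
```

Keeping each $\mathbf{A}_j$ sparse means the whole predictor is a single sparse matrix-vector product, which is exactly what the GMRF machinery above exploits.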

Key Articles

  1. Bakka, H., Rue, H., Fuglstad, G.-A., Riebler, A. I., Bolin, D., Illian, J., Krainski, E., Simpson, D. P., & Lindgren, F. K. (2018). Spatial modelling with INLA: A review. WIREs Computational Statistics. https://doi.org/10.1002/wics.1443
  2. Lindgren, F., Rue, H., & Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 73(4). https://doi.org/10.1111/j.1467-9868.2011.00777.x
  3. Lindgren, F., & Rue, H. (2015). Bayesian spatial modelling with R-INLA. Journal of Statistical Software, 63(19). https://doi.org/10.18637/jss.v063.i19
  4. Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 71(2). https://doi.org/10.1111/j.1467-9868.2008.00700.x
  5. Rue, H., Riebler, A., Sørbye, S., Illian, J., Simpson, D., & Lindgren, F. (2017). Bayesian Computing with INLA: A Review. Annual Review of Statistics and Its Application, 4(1), 395-421. https://doi.org/10.1146/annurev-statistics-060116-054045

Books

  1. Blangiardo, M., & Cameletti, M. (2015). Spatial and Spatio-temporal Bayesian Models with R-INLA. Wiley. https://doi.org/10.1002/9781118950203
  2. Gómez-Rubio, V. (2020). Bayesian Inference with INLA. Chapman & Hall/CRC Press. https://doi.org/10.1201/9781315175584
  3. Moraga, P. (2019). Geospatial Health Data. Chapman and Hall/CRC. https://doi.org/10.1201/9780429341823
  4. Krainski, E., Gómez-Rubio, V., Bakka, H., Lenzi, A., Castro-Camilo, D., Simpson, D., Lindgren, F., & Rue, H. (2018). Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA. Chapman & Hall/CRC. https://doi.org/10.1201/9780429031892
  5. Rue, H., & Held, L. (2005). Gaussian Markov Random Fields. Chapman and Hall/CRC. https://doi.org/10.1201/9780203492024
  6. Wang, X., Yue, Y. R., & Faraway, J. J. (2018). Bayesian Regression Modeling with INLA. Chapman & Hall.