Motivation
The reason that BAT exists is that nearly any Bayesian analysis these days is too complicated to be handled analytically. To address typical questions like
- What is known about a single parameter taking into account the uncertainty on all other parameters?
- How are parameters correlated?
one needs to be able to compute and visualize 1D and 2D marginal distributions; cf. Marginalization. These are defined as integrals over the posterior; for example the 2D marginal distribution is
\begin{align} \label{eq:int-marginal} P(\theta_1, \theta_2 \cond D) = \int \prod_{i \ne 1,2} \rmdx{\theta_i} P(\vecth \cond D). \end{align}
To do model comparison, one has to compute the evidence, that is the integral over all parameters
\begin{equation} Z = \int \rmdx{\vecth} P(D \cond \vecth, M) \, P(\vecth \cond M). \end{equation}
Therefore Bayesian inference in practice requires good integration techniques. These methods are bundled in the class BCIntegrate, with the exception of the Markov chain code in BCEngineMCMC; see also the Structure of the Code.
For low-dimensional problems, deterministic integration methods are usually the fastest and most robust, but in higher dimensions, Monte Carlo techniques are the most efficient tools known. Depending on the task, BAT defaults to deterministic methods in low dimensions (\(d \le 2\) for marginalization, \(d \le 3\) for integration) and to Monte Carlo above that.
Marginalization
The main use case for BAT is to estimate and visualize the marginal distributions of the posterior. Given a model `m`, all marginal distributions are estimated as histograms with a single call.
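A minimal sketch, assuming `m` is an instance of a user model derived from BCModel:

```cpp
// estimate all 1D and 2D marginals with the default method
m.MarginalizeAll();
```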
The individual distributions can be accessed using `unsigned` or `std::string` as keys.
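For example (a sketch; the parameter names "x" and "y" are placeholders for whatever the model defines):

```cpp
BCH1D h1 = m.GetMarginalized(0);        // 1D marginal by parameter index
BCH2D h2 = m.GetMarginalized("x", "y"); // 2D marginal by parameter names
```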
The underlying ROOT histogram can also be accessed directly.
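A sketch of the corresponding calls (the index arguments are placeholders):

```cpp
TH1* th1 = m.GetMarginalizedHistogram(0);    // raw 1D histogram
TH2* th2 = m.GetMarginalizedHistogram(0, 1); // raw 2D histogram
```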
In case of many (nuisance) parameters, it may be useful to restrict which histograms are stored because the number of 2D marginals grows like \(n(n-1)/2\) with the number of parameters \(n\). This can be achieved either for all 1D or 2D marginal distributions or for individual combinations.
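For example, a sketch (the parameter names "x" and "y" are again placeholders):

```cpp
// turn off all 1D and 2D histograms ...
m.SetFlagFillHistograms(false);
// ... then re-enable the 2D marginal for one pair of parameters
m.SetFillHistogramParPar("x", "y", true);
```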
- See also
- BCEngineMCMC::SetFillHistogramParPar, BCEngineMCMC::SetFillHistogramParObs, BCEngineMCMC::SetFillHistogramObsObs, BCEngineMCMC::SetFlagFillHistograms
The method for marginalizing can be selected as follows.
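A sketch of the call:

```cpp
m.SetMarginalizationMethod(method);
m.MarginalizeAll();
```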
where `method` is an enum in BCIntegrate::BCMarginalizationMethod:
BCMarginalizationMethod | Details |
---|---|
kMargMetropolis | Metropolis algorithm; see Markov chain Monte Carlo. |
kMargMonteCarlo | Sample mean integration in each histogram bin. Least efficient method. |
kMargGrid | Evaluate target at each bin center. Most efficient for 1D and 2D. |
kMargDefault | Use kMargGrid for \(d \leq 2\), else kMargMetropolis. |
The availability of marginalization methods can be queried at runtime using BCIntegrate::CheckMarginalizationAvailability.
In case any pre- or postprocessing needs to happen to set up data structures, we provide the hooks `virtual void BCIntegrate::MarginalizePreprocess` and `virtual void BCIntegrate::MarginalizePostprocess`. They are empty by default and can be overridden in a user model.
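A sketch of how a user model could use the hooks; the class name `MyModel` is a placeholder:

```cpp
#include <BAT/BCModel.h>

class MyModel : public BCModel
{
public:
    using BCModel::BCModel;

    // mandatory: the log likelihood (trivial placeholder here)
    double LogLikelihood(const std::vector<double>& pars)
    { return 0.; }

    // hook called before the marginalization starts
    void MarginalizePreprocess()
    { /* e.g. allocate lookup tables used by LogLikelihood */ }

    // hook called after the marginalization has finished
    void MarginalizePostprocess()
    { /* e.g. release the lookup tables again */ }
};
```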
Evidence
In BAT terminology, the evidence or normalization constant of a model `m` is computed with the default method as follows.
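A minimal sketch:

```cpp
// estimate the evidence Z with the default integration method
double Z = m.Normalize();
```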
Once the normalization has been determined, BCModel::LogProbability returns the normalized value of the posterior, as opposed to the unnormalized result from BCModel::LogProbabilityNN.
- Note
- Internally only BCModel::LogProbabilityNN is called when integrating or optimizing because it only requires the user to override BCModel::LogLikelihood and to set a prior.
BCModel::Normalize returns the evidence on the linear scale. Choose the method of integration explicitly as follows.
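A sketch of the call:

```cpp
m.SetIntegrationMethod(method);
m.Normalize();
```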
where `method` is an enum in BCIntegrate::BCIntegrationMethod:
BCIntegrationMethod | Details |
---|---|
kIntMonteCarlo | Sample mean. Usually least efficient. |
kIntCuba | Use a method from Cuba. |
kIntGrid | Approximate Riemann sum over a dense grid for \(d \leq 3 \). |
kIntLaplace | Laplace approximation. Only approximately valid for peaked unimodal integrands. Incorrect if the distribution is heavy-tailed or if the mode is near a boundary. Very fast. No uncertainty estimate. Only method implemented on the log scale. |
kIntDefault | Use Cuba if available. Else use kIntGrid for \(d \leq 3 \) and kIntMonteCarlo for \(d > 3 \). |
General termination criteria for all integration methods except kIntLaplace are the desired absolute precision \(\epsilon_a\) and relative precision \(\epsilon_r\), and the minimum and maximum number of iterations.
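They can be set as in this sketch (the values are arbitrary):

```cpp
m.SetAbsolutePrecision(1e-6);
m.SetRelativePrecision(1e-8);
m.SetNIterationsMin(1000);
m.SetNIterationsMax(1000000);
```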
The integration terminates if
\begin{equation} |Z-\hat{Z}| \leq \max(\epsilon_a, \epsilon_r Z) \end{equation}
where \(\hat{Z}\) is the current estimate of the evidence.
Rescaling
In case the log likelihood is very small or very large, going to the linear scale may take it to exactly zero or infinity in finite precision. This often happens in practice if the log likelihood is a sum of `N` terms. For example, assume each term is 0.5. Then \(\exp(0.5 N) = \infty\) on the computer for \(N \geq 1420\) using double precision. To avoid this problem, you can rescale the log likelihood by manually subtracting the value at the mode inside LogLikelihood.
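A sketch of this rescaling in a user model like the hypothetical `MyModel` above; `logAtMode` is a hypothetical member holding the (approximate) log likelihood at the mode:

```cpp
double MyModel::LogLikelihood(const std::vector<double>& pars)
{
    double logL = 0.;
    // ... accumulate the log-likelihood terms into logL ...

    // shift so that the value at the mode is roughly zero;
    // this multiplies the evidence by the constant exp(-logAtMode)
    return logL - logAtMode;
}
```

Note that the evidence computed afterwards has to be multiplied by \(\exp(\texttt{logAtMode})\) to undo the rescaling.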
Another option, if its requirements are satisfied, is to use the Laplace method, which is naturally implemented on the log scale. While the standard interface to all integration methods via BCIntegrate::Normalize() always transforms to the linear scale, calling BCIntegrate::IntegrateLaplace() directly returns the evidence on the log scale.
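For example:

```cpp
// log of the evidence, without transforming to the linear scale
double logZ = m.IntegrateLaplace();
```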
Cuba
The Cuba package itself has four different integration methods. Cuba is an external dependency; see the installation instructions on how to build BAT with Cuba support. If Cuba is available, select the Cuba method as follows.
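A sketch of the calls:

```cpp
m.SetIntegrationMethod(BCIntegrate::kIntCuba);
m.SetCubaIntegrationMethod(method);
m.Normalize();
```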
where `method` is an enum in BCIntegrate::BCCubaMethod:
BCCubaMethod | Details |
---|---|
kCubaVegas | VEGAS algorithm by Lepage. |
kCubaSuave | Suave algorithm. |
kCubaDivonne | Divonne algorithm. |
kCubaCuhre | Cuhre algorithm. Essentially quadrature in higher dimensions. Suffers from the curse of dimensionality but most efficient and robust in low dimensions. |
kCubaDefault | For \(d=1\), use VEGAS; for \(d=2,3\), use Cuhre; and for \(d>3\), use Divonne. |
- Note
- Cuba evaluates the posterior in parallel by default. If the posterior is not thread-safe (see Multithreading and Thread Safety), it is recommended to set the environment variable CUBACORES to 1.
Each Cuba method comes with various parameter values to set. We have adopted the default values from the example that ships with Cuba, but they are by no means optimal for every problem. Please experiment and consult the Cuba manual that comes with the Cuba source code from http://www.feynarts.de/cuba/. All options are accessible in the namespace BCCubaOptions. An example with bogus values follows.
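A sketch for the Suave method; the option values are deliberately meaningless and the accessor names are assumptions to check against the reference guide:

```cpp
// fetch, modify, and set back the options for the Suave method
BCCubaOptions::Suave o = m.GetCubaSuaveOptions();
o.flatness = 5;
o.nnew = 5000;
o.nmin = 15;
m.SetCubaOptions(o);
m.IntegrateCuba(BCIntegrate::kCubaSuave);
```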
Slices
To get a quick visualization of a complicated posterior, a slice, or projection, may be preferable to a full marginalization. That is, all parameters except one [two] are held fixed (instead of being integrated over as in marginalization) and the posterior is evaluated on a regular grid over the remaining parameter[s]. In other words, one computes the conditional distribution
\begin{equation} P(\theta_1 \cond \vecth_{\backslash 1}, D) = \frac{P(\vecth \cond D)} {P(\vecth_{\backslash 1} \cond D)} . \end{equation}
The slice is returned as a one- [two-] dimensional histogram. This is achieved with the various variants of BCIntegrate::GetSlice.
In a posterior where all parameters are independent, the marginal and conditional distributions coincide because
\begin{equation} P(\theta_1 \cond \vecth_{\backslash 1}, D) = \frac{P(\vecth_{\backslash 1} \cond D) P(\theta_1 \cond D)} {P(\vecth_{\backslash 1} \cond D)}= P(\theta_1 \cond D) . \end{equation}
- Note
- Independence rarely holds in practice, so marginalization is still useful.
For example, in a Gaussian posterior with independent parameters, we first find the mode and use these parameter values to find the 1D conditional distribution of the first parameter. In this example, the result is just a Gaussian. Here is how to find the result in general, for non-Gaussian or non-independent posteriors:
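A sketch; the parameter name "x" is a placeholder, and the exact GetSlice signature is an assumption that may differ between BAT versions:

```cpp
// condition all other parameters on the global mode
m.FindMode();

unsigned nIterations; // filled with the number of target evaluations
double logMaxVal;     // filled with the log of the maximum found
TH1* slice = m.GetSlice("x", nIterations, logMaxVal, m.GetBestFitParameters());
```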