Week 6
Topic: Introduction to Bayesian Neural Networks
Keynote Speaker: Wang Ma
Time: Sept 2, 20:00 - 21:30
Venue: Room 341, School of Business
Tencent Meeting: #907-2153-6929
Compendium
Introduction
This tutorial provides a detailed overview of Bayesian Neural Networks (BNNs), covering core principles such as Bayesian inference, posterior estimation, and variational inference. BNNs integrate uncertainty estimates into neural network predictions, which makes them well suited to tasks involving noisy or uncertain data.
1. Bayesian Inference
In Bayesian inference, the goal is to estimate the posterior distribution \( p(\theta | D) \), given the prior distribution \( p(\theta) \) and likelihood \( p(D | \theta) \).
The main equation governing Bayesian inference is:
\[ p(\theta | D) = \frac{p(D | \theta) p(\theta)}{p(D)} \]
Where:
- \( p(\theta) \): prior distribution
- \( p(D | \theta) \): likelihood of the data given parameters
- \( p(\theta | D) \): posterior distribution
- \( p(D) \): marginal likelihood (model evidence)
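To make Bayes' rule concrete, here is a minimal sketch (not drawn from the slides) using a conjugate Beta-Bernoulli model, where the posterior is available in closed form:
```python
import numpy as np

# Hypothetical illustration of Bayes' rule with a conjugate Beta-Bernoulli model.
# Prior: theta ~ Beta(alpha, beta); likelihood: each observation ~ Bernoulli(theta).
alpha_prior, beta_prior = 2.0, 2.0          # prior pseudo-counts
data = np.array([1, 0, 1, 1, 1, 0, 1, 1])   # observed outcomes D

# Conjugacy makes the posterior another Beta distribution:
# p(theta | D) = Beta(alpha + #successes, beta + #failures).
alpha_post = alpha_prior + data.sum()
beta_post = beta_prior + len(data) - data.sum()

posterior_mean = alpha_post / (alpha_post + beta_post)
print(f"Posterior: Beta({alpha_post:.0f}, {beta_post:.0f}), mean = {posterior_mean:.3f}")
```
In a BNN the parameters are network weights rather than a single probability, so no such closed-form update exists; this is what motivates the approximate methods in Sections 3 and 4.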
2. Bayesian Neural Networks (BNNs)
BNNs combine neural networks with Bayesian inference, allowing for uncertainty quantification in predictions. Standard deep learning seeks point estimates of the parameters, whereas BNNs infer a distribution over the weights. Predictions are then made by marginalizing over the posterior (a Monte Carlo sketch of this average follows the list below):
\[ p(y^* | x^*, D) = \int p(y^* | x^*, \theta)\, p(\theta | D)\, d\theta \]
Advantages of BNNs:
- Robust predictions with uncertainty estimates.
- Ability to detect out-of-distribution (OOD) and adversarial examples.
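In practice the predictive integral is approximated by averaging over weight samples. The sketch below assumes a hypothetical `sample_posterior_weights()` that draws one weight sample from an approximate posterior and a `net(x, theta)` that returns class probabilities; neither name comes from the slides.
```python
import torch

def predictive_distribution(net, sample_posterior_weights, x, num_samples=100):
    """Monte Carlo estimate of p(y* | x*, D) ≈ (1/S) sum_s p(y* | x*, theta_s)."""
    probs = []
    for _ in range(num_samples):
        theta_s = sample_posterior_weights()      # theta_s ~ q(theta) ≈ p(theta | D)
        probs.append(net(x, theta_s))             # p(y | x, theta_s)
    return torch.stack(probs).mean(dim=0)         # average over weight samples
```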
3. Approximate Inference in BNNs
Since exact inference in BNNs is intractable, we rely on approximate methods such as variational inference (VI). In VI, the goal is to approximate the posterior \( p(\theta | D) \) with a simpler distribution \( q(\theta) \) by minimizing the Kullback-Leibler (KL) divergence:
\[ KL(q(\theta) || p(\theta | D)) \]
Key Steps in Approximate Inference:
- Choose a tractable family for the approximate posterior \( q(\theta) \), such as a mean-field Gaussian (sketched after this list).
- Fit the variational parameters by minimizing a divergence between \( q(\theta) \) and the true posterior.
- Approximate predictions and other posterior expectations with Monte Carlo samples from \( q(\theta) \).
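As an illustration of the first step, the following is a minimal sketch of a mean-field Gaussian approximate posterior over a flattened weight vector, using the reparameterization trick so that samples stay differentiable with respect to the variational parameters (the class name and shapes are assumptions, not from the slides):
```python
import torch

class MeanFieldGaussian(torch.nn.Module):
    """Factorized Gaussian q(theta) with learnable means and scales."""
    def __init__(self, num_weights):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.zeros(num_weights))            # variational means
        self.rho = torch.nn.Parameter(torch.full((num_weights,), -3.0))   # unconstrained scales

    @property
    def sigma(self):
        # softplus keeps each standard deviation positive during optimization
        return torch.nn.functional.softplus(self.rho)

    def sample(self):
        # Reparameterization trick: theta = mu + sigma * eps, eps ~ N(0, I)
        eps = torch.randn_like(self.mu)
        return self.mu + self.sigma * eps

    def log_prob(self, theta):
        # Log density of the factorized Gaussian q(theta)
        return torch.distributions.Normal(self.mu, self.sigma).log_prob(theta).sum()
```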
4. Variational Inference (VI)
Variational inference transforms inference into an optimization problem: maximizing the evidence lower bound (ELBO), which is equivalent to minimizing the KL divergence above:
\[ \text{ELBO} = \mathbb{E}_{q(\theta)}[\log p(D | \theta)] - KL(q(\theta) || p(\theta)) \]
The ELBO decomposes into two terms (a one-sample estimate is sketched after this list):
- Data-fitting term \( \mathbb{E}_{q(\theta)}[\log p(D | \theta)] \): encourages the model to explain the data well.
- KL regularizer \( KL(q(\theta) || p(\theta)) \): keeps the approximate posterior close to the prior.
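Putting the two terms together, a one-sample Monte Carlo estimate of the negative ELBO can be sketched as follows, reusing the `MeanFieldGaussian` posterior above and assuming a standard normal prior and a user-supplied `log_likelihood(theta, x, y)` (all illustrative assumptions, not prescribed by the slides):
```python
import torch

def negative_elbo(q, log_likelihood, x, y):
    theta = q.sample()                                        # theta ~ q(theta)
    prior = torch.distributions.Normal(torch.zeros_like(q.mu),
                                       torch.ones_like(q.mu))
    log_q = q.log_prob(theta)                                 # log q(theta)
    log_p = prior.log_prob(theta).sum()                       # log p(theta)
    data_fit = log_likelihood(theta, x, y)                    # 1-sample estimate of E_q[log p(D | theta)]
    kl_estimate = log_q - log_p                               # 1-sample estimate of KL(q || p)
    return -(data_fit - kl_estimate)                          # minimize this to maximize the ELBO
```
Minimizing this quantity with a standard optimizer is equivalent to maximizing the ELBO; this is the general recipe behind Bayes-by-Backprop-style training (Blundell et al., 2015).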
5. Applications and Case Studies
Case Study 1: Bayesian Optimization
This example illustrates how Bayesian optimization uses a probabilistic surrogate model to optimize expensive black-box functions. The acquisition function balances exploration and exploitation.
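As a sketch of one common acquisition function (the slides do not prescribe a specific one), expected improvement can be computed from the surrogate model's posterior mean and standard deviation at candidate points:
```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Expected improvement over the incumbent f_best, for maximization."""
    sigma = np.maximum(sigma, 1e-9)            # avoid division by zero
    z = (mu - f_best - xi) / sigma
    # Exploitation (large mu) and exploration (large sigma) both raise the score.
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```
Candidate points are ranked by this score and the maximizer is evaluated next; a larger `xi` pushes the search toward exploration.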
Case Study 2: Detecting Adversarial Examples
BNNs help detect adversarial examples by leveraging uncertainty measures. Adversarial examples are treated as OOD data, and BNNs exhibit higher uncertainty in their predictions for such cases.
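One simple way to operationalize this is to score each input by the predictive entropy of its Monte Carlo-averaged class probabilities and flag high-entropy inputs. The sketch below reuses the output of the earlier `predictive_distribution` helper, and the threshold is an arbitrary placeholder rather than a value from the talk:
```python
import torch

def predictive_entropy(probs, eps=1e-12):
    # probs: averaged class probabilities p(y | x, D) from Monte Carlo sampling
    return -(probs * (probs + eps).log()).sum(dim=-1)

def flag_as_suspicious(probs, threshold=1.0):
    # Inputs with unusually high predictive entropy are treated as potential
    # out-of-distribution or adversarial examples.
    return predictive_entropy(probs) > threshold
```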
Conclusion
Bayesian Neural Networks provide a powerful framework for incorporating uncertainty into neural networks, leading to more robust and interpretable models. Approximate inference techniques like variational inference enable scalable learning with BNNs, despite the intractability of exact solutions.
Material
Presentation Slides (Thanks to Dr. Yingzhen Li): http://yingzhenli.net/home/pdf/ProbAI2022_vi_bnn_tutorial.pdf
References
- Dr. Yingzhen Li's talk on BNNs
- Dr. Yingzhen Li's notes on BNNs
- Arbel, J., et al. (2023). A Primer on Bayesian Neural Networks: Review and Debates.
- Blundell, C., et al. (2015). Weight Uncertainty in Neural Networks. ICML.
- Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. ICML.
- Kendall, A., & Gal, Y. (2017). What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? NeurIPS.