Week 6
Topic: Introduction to Bayesian Neural Networks
Keynote Speaker: Wang Ma
Time: Sept 2, 20:00 - 21:30
Venue: Room 341, School of Business
Tencent Meeting: #907-2153-6929
Compendium
Introduction
This tutorial provides a detailed overview of Bayesian Neural Networks (BNNs), covering core principles such as Bayesian inference, posterior estimation, and variational inference. BNNs integrate uncertainty estimates into neural network predictions, which makes them well suited to tasks involving noisy or uncertain data.
1. Bayesian Inference
In Bayesian inference, the goal is to estimate the posterior distribution \( p(\theta | D) \), given the prior distribution \( p(\theta) \) and likelihood \( p(D | \theta) \).
The main equation governing Bayesian inference is:
\[ p(\theta | D) = \frac{p(D | \theta) p(\theta)}{p(D)} \]
Where:
- \( p(\theta) \): prior distribution
- \( p(D | \theta) \): likelihood of the data given parameters
- \( p(\theta | D) \): posterior distribution
- \( p(D) \): marginal likelihood (model evidence)
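To make Bayes' rule concrete, here is a minimal sketch (not drawn from the slides) using a conjugate Beta-Bernoulli model, where the posterior is available in closed form:
```python
import numpy as np

# Hypothetical illustration of Bayes' rule with a conjugate Beta-Bernoulli model.
# Prior: theta ~ Beta(alpha, beta); likelihood: each observation ~ Bernoulli(theta).
alpha_prior, beta_prior = 2.0, 2.0          # prior pseudo-counts
data = np.array([1, 0, 1, 1, 1, 0, 1, 1])   # observed outcomes D

# Conjugacy makes the posterior another Beta distribution:
# p(theta | D) = Beta(alpha + #successes, beta + #failures).
alpha_post = alpha_prior + data.sum()
beta_post = beta_prior + len(data) - data.sum()

posterior_mean = alpha_post / (alpha_post + beta_post)
print(f"Posterior: Beta({alpha_post:.0f}, {beta_post:.0f}), mean = {posterior_mean:.3f}")
```
In a BNN the parameters are network weights rather than a single probability, so no such closed-form update exists; this is what motivates the approximate methods in Sections 3 and 4.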
2. Bayesian Neural Networks (BNNs)
BNNs combine neural networks with Bayesian inference, allowing for uncertainty quantification in predictions. Standard deep learning seeks point estimates of the parameters, whereas BNNs infer a distribution over the weights. Predictions are then made by marginalizing over the posterior (a Monte Carlo sketch of this average follows the list below):
\[ p(y^* | x^*, D) = \int p(y^* | x^*, \theta)\, p(\theta | D)\, d\theta \]
Advantages of BNNs:
- Robust predictions with uncertainty estimates.
- Ability to detect out-of-distribution (OOD) and adversarial examples.
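In practice the predictive integral is approximated by averaging over weight samples. The sketch below assumes a hypothetical `sample_posterior_weights()` that draws one weight sample from an approximate posterior and a `net(x, theta)` that returns class probabilities; neither name comes from the slides.
```python
import torch

def predictive_distribution(net, sample_posterior_weights, x, num_samples=100):
    """Monte Carlo estimate of p(y* | x*, D) ≈ (1/S) sum_s p(y* | x*, theta_s)."""
    probs = []
    for _ in range(num_samples):
        theta_s = sample_posterior_weights()      # theta_s ~ q(theta) ≈ p(theta | D)
        probs.append(net(x, theta_s))             # p(y | x, theta_s)
    return torch.stack(probs).mean(dim=0)         # average over weight samples
```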
3. Approximate Inference in BNNs
Since exact inference in BNNs is intractable, we rely on approximate methods such as variational inference (VI). In VI, the goal is to approximate the posterior \( p(\theta | D) \) with a simpler distribution \( q(\theta) \) by minimizing the Kullback-Leibler (KL) divergence:
\[ KL(q(\theta) || p(\theta | D)) \]
Key Steps in Approximate Inference:
- Choose a tractable family for the approximate posterior \( q(\theta) \), such as a mean-field Gaussian (sketched after this list).
- Fit the variational parameters by minimizing a divergence between \( q(\theta) \) and the true posterior.
- Approximate predictions and other posterior expectations with Monte Carlo samples from \( q(\theta) \).
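As an illustration of the first step, the following is a minimal sketch of a mean-field Gaussian approximate posterior over a flattened weight vector, using the reparameterization trick so that samples stay differentiable with respect to the variational parameters (the class name and shapes are assumptions, not from the slides):
```python
import torch

class MeanFieldGaussian(torch.nn.Module):
    """Factorized Gaussian q(theta) with learnable means and scales."""
    def __init__(self, num_weights):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.zeros(num_weights))            # variational means
        self.rho = torch.nn.Parameter(torch.full((num_weights,), -3.0))   # unconstrained scales

    @property
    def sigma(self):
        # softplus keeps each standard deviation positive during optimization
        return torch.nn.functional.softplus(self.rho)

    def sample(self):
        # Reparameterization trick: theta = mu + sigma * eps, eps ~ N(0, I)
        eps = torch.randn_like(self.mu)
        return self.mu + self.sigma * eps

    def log_prob(self, theta):
        # Log density of the factorized Gaussian q(theta)
        return torch.distributions.Normal(self.mu, self.sigma).log_prob(theta).sum()
```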
4. Variational Inference (VI)
Variational inference transforms inference into an optimization problem: maximizing the evidence lower bound (ELBO), which is equivalent to minimizing the KL divergence above:
\[ \text{ELBO} = \mathbb{E}_{q(\theta)}[\log p(D | \theta)] - KL(q(\theta) || p(\theta)) \]
The ELBO decomposes into two terms (a one-sample estimate is sketched after this list):
- Data-fitting term \( \mathbb{E}_{q(\theta)}[\log p(D | \theta)] \): encourages the model to explain the data well.
- KL regularizer \( KL(q(\theta) || p(\theta)) \): keeps the approximate posterior close to the prior.
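Putting the two terms together, a one-sample Monte Carlo estimate of the negative ELBO can be sketched as follows, reusing the `MeanFieldGaussian` posterior above and assuming a standard normal prior and a user-supplied `log_likelihood(theta, x, y)` (all illustrative assumptions, not prescribed by the slides):
```python
import torch

def negative_elbo(q, log_likelihood, x, y):
    theta = q.sample()                                        # theta ~ q(theta)
    prior = torch.distributions.Normal(torch.zeros_like(q.mu),
                                       torch.ones_like(q.mu))
    log_q = q.log_prob(theta)                                 # log q(theta)
    log_p = prior.log_prob(theta).sum()                       # log p(theta)
    data_fit = log_likelihood(theta, x, y)                    # 1-sample estimate of E_q[log p(D | theta)]
    kl_estimate = log_q - log_p                               # 1-sample estimate of KL(q || p)
    return -(data_fit - kl_estimate)                          # minimize this to maximize the ELBO
```
Minimizing this quantity with a standard optimizer is equivalent to maximizing the ELBO; this is the general recipe behind Bayes-by-Backprop-style training (Blundell et al., 2015).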
5. Applications and Case Studies
Case Study 1: Bayesian Optimization
This example illustrates how Bayesian optimization uses a probabilistic surrogate model to optimize expensive black-box functions. The acquisition function balances exploration and exploitation.
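As a sketch of one common acquisition function (the slides do not prescribe a specific one), expected improvement can be computed from the surrogate model's posterior mean and standard deviation at candidate points:
```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Expected improvement over the incumbent f_best, for maximization."""
    sigma = np.maximum(sigma, 1e-9)            # avoid division by zero
    z = (mu - f_best - xi) / sigma
    # Exploitation (large mu) and exploration (large sigma) both raise the score.
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```
Candidate points are ranked by this score and the maximizer is evaluated next; a larger `xi` pushes the search toward exploration.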
Case Study 2: Detecting Adversarial Examples
BNNs help detect adversarial examples by leveraging uncertainty measures. Adversarial examples are treated as OOD data, and BNNs exhibit higher uncertainty in their predictions for such cases.
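One simple way to operationalize this is to score each input by the predictive entropy of its Monte Carlo-averaged class probabilities and flag high-entropy inputs. The sketch below reuses the output of the earlier `predictive_distribution` helper, and the threshold is an arbitrary placeholder rather than a value from the talk:
```python
import torch

def predictive_entropy(probs, eps=1e-12):
    # probs: averaged class probabilities p(y | x, D) from Monte Carlo sampling
    return -(probs * (probs + eps).log()).sum(dim=-1)

def flag_as_suspicious(probs, threshold=1.0):
    # Inputs with unusually high predictive entropy are treated as potential
    # out-of-distribution or adversarial examples.
    return predictive_entropy(probs) > threshold
```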
Conclusion
Bayesian Neural Networks provide a powerful framework for incorporating uncertainty into neural networks, leading to more robust and interpretable models. Approximate inference techniques like variational inference enable scalable learning with BNNs, despite the intractability of exact solutions.
Material
Presentation Slides (Thanks to Dr. Yingzhen Li): http://yingzhenli.net/home/pdf/ProbAI2022_vi_bnn_tutorial.pdf
References
- Dr. Yingzhen Li's talk on BNNs
- Dr. Yingzhen Li's notes on BNNs
- Arbel, J., et al. (2023). A Primer on Bayesian Neural Networks: Review and Debates.
- Blundell, C., et al. (2015). Weight Uncertainty in Neural Networks. ICML.
- Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. ICML.
- Kendall, A., & Gal, Y. (2017). What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? NeurIPS.