Week 1
Topic: Bayesian Optimization
Keynote Speaker: Wang Ma
Time: Jul 22, 20:00 - 21:30
Venue: Room 314, School of Business
Tencent Meeting: #907-2153-6929
Compendium
Introduction
Bayesian Optimization (BO) is a machine-learning-based method for optimizing expensive, noisy, black-box functions for which gradient information is unavailable. It is best suited to problems of moderate dimension (typically fewer than 20) and is used in applications such as hyperparameter optimization, algorithm tuning, and materials discovery.
1. Problem Setup
BO aims to solve:
\[\hat{x} = \arg\max_{x \in A} f(x)\]
where \( f \) is continuous and expensive to evaluate, its derivatives are unavailable, and observations of it may be noisy.
2. Bayesian Optimization Algorithm
- Assume a Bayesian prior on \( f \) (typically a Gaussian Process; see Section 3).
- Iteratively, until the evaluation budget is exhausted:
  - Maximize the acquisition function to find the next point to evaluate.
  - Evaluate \( f \) at this point.
  - Update the posterior distribution on \( f \) with the new observation.

A minimal code sketch of this loop is given below.
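The following sketch (not from the talk) makes the loop concrete: a 1-D toy objective, a zero-mean GP with a fixed RBF kernel, and Expected Improvement maximized over a dense candidate grid. The objective, hyperparameters, and all names here are illustrative assumptions.

```python
# Minimal BO loop sketch: toy 1-D objective, RBF-kernel GP with fixed
# hyperparameters, EI maximized over a candidate grid. Illustrative only.
import numpy as np
from scipy.stats import norm

def f(x):
    """Toy 'expensive' black-box objective (illustrative)."""
    return -np.sin(3 * x) - x**2 + 0.7 * x

def rbf(a, b, length=0.3, var=1.0):
    """Squared-exponential covariance k(a, b) for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Closed-form GP posterior mean and std at candidate points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    cov = rbf(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks)
    return mu, np.sqrt(np.clip(np.diag(cov), 1e-12, None))

def expected_improvement(mu, sigma, best):
    """EI for maximization: E[max(f - best, 0)] under the posterior."""
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, size=3)      # initial design
y = f(X)
Xs = np.linspace(-1.0, 2.0, 500)        # candidate grid over A

for _ in range(10):                     # evaluation budget
    mu, sigma = gp_posterior(X, y, Xs)
    x_next = Xs[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.append(X, x_next)            # evaluate f and update the data
    y = np.append(y, f(x_next))

print("best x:", X[np.argmax(y)], "best f(x):", y.max())
```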
3. Gaussian Process Model
A Gaussian Process (GP) models \( f \) with a mean function \( m(x) \) and a covariance function \( k(x, x') \):
- Mean: \[m(x) = \mathbb{E}[f(x)]\]
- Covariance: \[k(x, x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))]\]
Common Covariance Functions
- Squared Exponential
- Matérn
- Rational Quadratic
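For reference, here is a sketch of these three kernels written as functions of the distance \( r = \lVert x - x' \rVert \); the length scales, variances, and the Matérn smoothness parameter \( \nu \) are illustrative defaults, not values from the talk.

```python
# Sketches of the three covariance functions listed above, as functions
# of the distance r = |x - x'|. All hyperparameter values are illustrative.
import numpy as np
from scipy.special import gamma, kv

def squared_exponential(r, length=1.0, var=1.0):
    """k(r) = var * exp(-r^2 / (2 * length^2)); infinitely smooth samples."""
    return var * np.exp(-0.5 * (r / length) ** 2)

def matern(r, length=1.0, var=1.0, nu=2.5):
    """General Matern kernel via the modified Bessel function K_nu."""
    r = np.maximum(r, 1e-12)            # avoid the 0^0 singularity at r = 0
    s = np.sqrt(2 * nu) * r / length
    return var * (2 ** (1 - nu) / gamma(nu)) * s ** nu * kv(nu, s)

def rational_quadratic(r, length=1.0, var=1.0, alpha=1.0):
    """Scale mixture of squared exponentials with mixing rate alpha."""
    return var * (1 + r ** 2 / (2 * alpha * length ** 2)) ** (-alpha)
```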
Prediction
With observations \( f = [f(x_1), f(x_2), \ldots, f(x_t)] \) at points \( X \), the observed values and the value \( f^* = f(x^*) \) at a new point \( x^* \) are jointly normal (under a zero prior mean):
\[\begin{bmatrix} f \\ f^* \end{bmatrix} \sim \mathcal{N} \left( 0, \begin{bmatrix} K[X, X] & K[X, x^*] \\ K[x^*, X] & K[x^*, x^*] \end{bmatrix} \right)\]
By the conditioning property of the multivariate normal distribution, the conditional distribution \( \mathbb{P}(f^* \mid f) \) is Gaussian with a closed-form mean and variance.
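Concretely, assuming the zero prior mean used above and noise-free observations, conditioning yields
\[\mathbb{E}[f^* \mid f] = K[x^*, X]\, K[X, X]^{-1} f, \qquad \operatorname{Var}[f^* \mid f] = K[x^*, x^*] - K[x^*, X]\, K[X, X]^{-1} K[X, x^*].\]
With noisy observations, \( K[X, X] \) is replaced by \( K[X, X] + \sigma_n^2 I \).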
4. Acquisition Functions
Acquisition functions balance exploration and exploitation when selecting the next evaluation point; sketches of the common choices appear after the list below.
Common Acquisition Functions
- Upper Confidence Bound (UCB)
- Probability of Improvement (PI)
- Expected Improvement (EI)
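The sketch below (illustrative, not from the slides) writes each of these in terms of the GP posterior mean \( \mu(x) \) and standard deviation \( \sigma(x) \) at a candidate point, under a maximization convention with incumbent value `best`; the exploration parameters `beta` and `xi` are assumed defaults.

```python
# Sketches of the three common acquisition functions above, in terms of
# the GP posterior mean mu and standard deviation sigma (maximization).
import numpy as np
from scipy.stats import norm

def ucb(mu, sigma, beta=2.0):
    """Upper Confidence Bound: optimistic value mu + beta * sigma."""
    return mu + beta * sigma

def probability_of_improvement(mu, sigma, best, xi=0.01):
    """PI: posterior probability that f(x) exceeds best + xi."""
    return norm.cdf((mu - best - xi) / sigma)

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI: E[max(f(x) - best - xi, 0)] under the GP posterior."""
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```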
Parallel Acquisition Functions
- Parallel Expected Improvement (PEI)
- Knowledge Gradient (KG)
Conclusion
Bayesian Optimization efficiently optimizes expensive, noisy black-box functions by combining a Gaussian Process surrogate with acquisition functions that balance exploration and exploitation. It is broadly applicable wherever function evaluations are scarce or costly, from hyperparameter tuning to materials discovery.
Material
Presentation Slides: https://wma17.github.io/24summer/docs/pdfs/Week1_BayesOPT.pdf
References
- The UDL book's further reading on Bayesian Optimization
- Prof. Peter Frazier's talk about BO
- Louis Tiao's introduction to Knowledge Gradient
- V. Nguyen, "Bayesian Optimization for Accelerating Hyper-Parameter Tuning," 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Sardinia, Italy, 2019, pp. 302-305, doi: 10.1109/AIKE.2019.00060.
- Frazier, P. I. (2018). A Tutorial on Bayesian Optimization. arXiv preprint arXiv:1807.02811
- Wang, J., Clark, S. C., Liu, E., & Frazier, P. I. (2016). Parallel Bayesian global optimization of expensive functions. arXiv preprint arXiv:1602.05149.