Week 2

Topic: Gradient Flow

Keynote Speaker: Yicheng Wu

Time: Jul 29, 20:00 - 21:30

Venue: Room 341, School of Business

Tencent Meeting: #907-2153-6929

Compendium

Introduction

Suppose $\phi\in\mathbb{R}^D$ and $L(\phi):\mathbb{R}^D\to\mathbb{R}$ is smooth. A gradient flow is a smooth curve $\phi(t):\mathbb{R}\to\mathbb{R}^D$ such that

$$\frac{d\phi}{dt}=-\frac{\partial L}{\partial\phi}$$
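
As a concrete illustration (not part of the original notes), take the quadratic loss $L(\phi)=\tfrac{1}{2}\|\phi\|^2$, for which the gradient flow ODE can be solved in closed form: $$\frac{d\phi}{dt}=-\phi \quad\Longrightarrow\quad \phi(t)=\phi(0)\,e^{-t},$$ so the flow decays exponentially toward the minimizer $\phi^\star=\mathbf{0}$.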

1. Problem Setup

Suppose now that $I$ pairs of independent sample points $\{\mathbf{x}_{i},y_{i}\}_{i=1}^{I}$ have been obtained. A model $f[\mathbf{x},\phi]$ needs to be used to fit the observed data.
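
For instance, with a least-squares criterion (one common choice, relevant to the linear-regression discussion in Section 3 below) the fitting problem amounts to minimizing $$L(\phi)=\sum_{i=1}^{I}\bigl(f[\mathbf{x}_{i},\phi]-y_{i}\bigr)^{2}$$ over the parameters $\phi$.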

2. Gradient Descent Algorithms

  • Gradient Descent Variants
    • Stochastic Gradient Descent
      • Mini-batch Stochastic Gradient Descent
  • Momentum Algorithm
    • Standard Momentum Algorithm
    • Nesterov Accelerated Gradient
  • Adaptive Subgradient Method
    • Adagrad
    • Adadelta
    • RMSprop
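
For reference, here is a minimal NumPy sketch of three of the update rules listed above (plain SGD, standard momentum, and Adagrad). The function names and hyperparameter values are chosen for illustration only and are not taken from the talk.

```python
import numpy as np

def sgd_step(phi, grad, lr=0.1):
    # Plain (stochastic) gradient descent: phi <- phi - lr * grad
    return phi - lr * grad

def momentum_step(phi, grad, v, lr=0.1, beta=0.9):
    # Standard momentum: accumulate an exponentially decaying velocity,
    # then move the parameters along that velocity.
    v = beta * v - lr * grad
    return phi + v, v

def adagrad_step(phi, grad, accum, lr=0.1, eps=1e-8):
    # Adagrad: per-coordinate step sizes shrink as squared gradients accumulate.
    accum = accum + grad ** 2
    return phi - lr * grad / (np.sqrt(accum) + eps), accum

# Tiny demo on L(phi) = 0.5 * ||phi||^2, whose gradient is phi itself.
phi = np.array([2.0, -3.0])
v = np.zeros_like(phi)
for _ in range(200):
    grad = phi                      # gradient of the quadratic loss
    phi, v = momentum_step(phi, grad, v)
print(phi)                          # approaches the minimizer at the origin
```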

3. Gradient Flow in Linear Regression

Gradient Descent Update Rule: $$\phi_{t+1}=\phi_t-\alpha\cdot\frac{\partial L}{\partial\phi},$$ where $\phi_t$ represents the parameters at time $t$ and $\alpha$ is termed the learning rate. Rearranging gives $$\frac{\phi_{t+1}-\phi_t}{\alpha}=-\frac{\partial L}{\partial\phi},$$ and when an infinitesimally small learning rate $\alpha$ is employed, the left-hand side becomes a time derivative: $$\frac{d\phi}{dt}=-\frac{\partial L}{\partial\phi}.$$ This ordinary differential equation (ODE) is known as gradient flow.
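
To make the "infinitesimally small learning rate" limit concrete, the sketch below (a minimal NumPy example with invented data and hyperparameters, not code from the talk) runs discrete gradient descent on a one-dimensional least-squares problem and compares the result with the closed-form solution of the corresponding gradient-flow ODE.

```python
import numpy as np

# Toy 1-D linear regression data (illustrative values only).
x = np.array([0.5, 1.0, 1.5, 2.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

# Least-squares loss L(phi) = 0.5 * sum((phi * x_i - y_i)^2)
# has gradient dL/dphi = a * phi - b with:
a = np.sum(x ** 2)
b = np.sum(x * y)
phi_star = b / a              # minimizer of L

alpha = 1e-3                  # small learning rate
phi0 = 0.0
steps = 2000

# Discrete gradient descent: phi_{t+1} = phi_t - alpha * dL/dphi.
phi = phi0
for _ in range(steps):
    phi -= alpha * (a * phi - b)

# Exact gradient-flow solution phi(t) = phi* + (phi0 - phi*) * exp(-a t),
# evaluated at the continuous time t = steps * alpha covered by the iterates.
t = steps * alpha
phi_flow = phi_star + (phi0 - phi_star) * np.exp(-a * t)

print(f"gradient descent : {phi:.6f}")
print(f"gradient flow    : {phi_flow:.6f}")
print(f"minimizer        : {phi_star:.6f}")
```

With a small $\alpha$, the gradient descent iterate after $k$ steps closely tracks the gradient-flow trajectory at time $t=k\alpha$, which is exactly the limiting argument above.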
