Week 3 - Neural Tangent Kernel

Topic: Neural Tangent Kernel

Keynote Speaker: Xunjian Li

Time: Aug 5, 20:00 - 21:30

Venue: Room 338, School of Business

Tencent Meeting: #907-2153-6929

Compendium

The Neural Tangent Kernel (NTK) is a central tool for understanding the behavior of neural networks, particularly in the infinite-width limit. As the width of a network grows, its training dynamics under gradient descent are increasingly well approximated by those of a model that is linear in its parameters, which yields closed-form expressions for the training dynamics and the network's predictions. The NTK itself is the kernel induced by the gradients of the network output with respect to its parameters, linking neural networks to classical kernel methods. Key topics include gradient flow, initialization strategies, and both empirical and analytical formulations of the NTK for shallow and deep networks. This framework provides insight into the trainability and convergence properties of neural networks, bridging the theoretical and practical sides of machine learning.
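As a concrete illustration of the empirical NTK, the sketch below computes a kernel entry for a pair of inputs as the inner product of the parameter gradients of the network output. This is a minimal sketch in JAX; the two-layer ReLU network, its initialization scaling, and the width of 1024 are illustrative assumptions, not taken from the talk.

```python
import jax
import jax.numpy as jnp

# Illustrative two-layer ReLU network with scalar output (assumed architecture,
# with standard 1/sqrt(fan-in) scaling folded into the weights at initialization).
def init_params(key, d_in, width):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (width, d_in)) / jnp.sqrt(d_in),
        "W2": jax.random.normal(k2, (width,)) / jnp.sqrt(width),
    }

def f(params, x):
    # f(x) = W2 . relu(W1 x)
    return params["W2"] @ jax.nn.relu(params["W1"] @ x)

def empirical_ntk(params, x1, x2):
    # K(x1, x2) = <df/dtheta(x1), df/dtheta(x2)>, summed over all parameter arrays.
    g1 = jax.grad(f)(params, x1)
    g2 = jax.grad(f)(params, x2)
    leaves1 = jax.tree_util.tree_leaves(g1)
    leaves2 = jax.tree_util.tree_leaves(g2)
    return sum(jnp.vdot(a, b) for a, b in zip(leaves1, leaves2))

key = jax.random.PRNGKey(0)
params = init_params(key, d_in=4, width=1024)
x1 = jnp.ones(4)
x2 = jnp.arange(4.0)
print(empirical_ntk(params, x1, x2))
```

For wide networks this kernel stays nearly constant during training (see Liu et al., 2020, in the references below), which is exactly what licenses the linearized, closed-form description of the training dynamics.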

Material

Presentation Slides: https://wma17.github.io/24summer/docs/pdfs/Week3_NTK.pdf

References

  1. Cho, Y., & Saul, L. (2009). Kernel methods for deep learning. Advances in Neural Information Processing Systems, 22.
  2. Golikov, E., Pokonechnyy, E., & Korviakov, V. (2022). Neural tangent kernel: A survey. arXiv preprint arXiv:2208.13614.
  3. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1026-1034).
  4. Lee, J., Xiao, L., Schoenholz, S., Bahri, Y., Novak, R., Sohl-Dickstein, J., & Pennington, J. (2019). Wide neural networks of any depth evolve as linear models under gradient descent. Advances in Neural Information Processing Systems, 32.
  5. Liu, C., Zhu, L., & Belkin, M. (2020). On the linearity of large non-linear models: When and why the tangent kernel is constant. Advances in Neural Information Processing Systems, 33, 15954-15964.
  6. Neal, R. M. (1996). Priors for infinite networks. Bayesian Learning for Neural Networks, 29-53.