Gradients of Functions of Large Matrices

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Abstract

Tuning scientific and probabilistic machine learning models – for example, partial differential equations, Gaussian processes, or Bayesian neural networks – often relies on evaluating functions of matrices whose size grows with the data set or the number of parameters. While the state of the art for evaluating these quantities is almost always based on Lanczos and Arnoldi iterations, the present work is the first to explain how to differentiate these workhorses of numerical linear algebra efficiently. To get there, we derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code can compete with Diffrax when differentiating PDEs and with GPyTorch when selecting Gaussian process models, and that it beats standard factorisation methods for calibrating Bayesian neural networks. All this is achieved without any problem-specific code optimisation. Find the code at https://github.com/pnkraemer/experiments-lanczos-adjoints and install the library with pip install matfree.
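
For context on what differentiating a Lanczos iteration means in practice, the sketch below runs a plain Lanczos tridiagonalisation in JAX on a small, made-up parametrised operator and differentiates a single-probe Lanczos-quadrature estimate of the log-determinant with ordinary reverse-mode autodiff. The operator matvec, the parameter theta, and the probe vector are illustrative assumptions; this is neither the paper's adjoint method nor the matfree API, but the naive backward pass that such adjoint systems are designed to replace.

import jax
import jax.numpy as jnp


def lanczos(matvec, vec, num_iters):
    # Plain Lanczos tridiagonalisation of a symmetric operator given only via matvec.
    v = vec / jnp.linalg.norm(vec)
    v_prev = jnp.zeros_like(v)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(num_iters):
        w = matvec(v) - beta * v_prev
        alpha = jnp.dot(w, v)
        w = w - alpha * v
        beta = jnp.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        v_prev, v = v, w / beta
    # Diagonal and off-diagonal entries of the tridiagonal matrix T.
    return jnp.stack(alphas), jnp.stack(betas[:-1])


def logdet_estimate(theta, probe, num_iters=10):
    # Hypothetical parametrised SPD operator A(theta); purely for illustration.
    def matvec(x):
        return theta * x + 0.1 * jnp.roll(x, 1) + 0.1 * jnp.roll(x, -1)

    alphas, offdiag = lanczos(matvec, probe, num_iters)
    tri = jnp.diag(alphas) + jnp.diag(offdiag, 1) + jnp.diag(offdiag, -1)
    eigvals, eigvecs = jnp.linalg.eigh(tri)
    # Lanczos quadrature: probe^T log(A) probe ~ ||probe||^2 * e1^T log(T) e1.
    # With a standard-normal probe, this is a single-sample estimate of log det A.
    quad = jnp.sum(eigvecs[0, :] ** 2 * jnp.log(eigvals))
    return jnp.dot(probe, probe) * quad


key = jax.random.PRNGKey(0)
probe = jax.random.normal(key, (100,))
# Reverse mode unrolls the iteration and stores every Krylov vector; the paper's
# adjoint systems are what make this gradient cheap for large matrices.
value, grad = jax.value_and_grad(logdet_estimate)(2.0, probe)
print(value, grad)

The memory cost of storing the unrolled iteration in this naive version is exactly the bottleneck that a dedicated adjoint system avoids.
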
Original language: English
Title of host publication: Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Number of pages: 35
Publication date: 2024
Publication status: Published - 2024
Event: 38th Conference on Neural Information Processing Systems - Vancouver, Canada
Duration: 10 Dec 2024 - 15 Dec 2024

Conference

Conference: 38th Conference on Neural Information Processing Systems
Country/Territory: Canada
City: Vancouver
Period: 10/12/2024 - 15/12/2024
