Notes on Adjoint Methods

MIT Notes

  • output (solution) \(\vec{x}\): a vector of \(M\) unknowns
  • set of \(M\) equations parameterized by \(P\) variables \(\vec{p}\). These \(\vec{p}\) could be:
    • design parameters
    • control variables
    • decision parameters
  • compute \(g(\vec{x}, \vec{p})\). \(g\) might be the loss function of a neural network comparing a prediction \(\vec{x}\) to data
  • Also want the gradient \(\frac{dg}{d\vec{p}}\). Adjoint methods give an efficient way to evaluate \(\frac{dg}{d\vec{p}}\), with cost independent of \(P\) and comparable to the cost of solving for \(\vec{x}\) once (see the sketch after this list).
  • In neural networks, adjoint methods are called backpropagation; in automatic differentiation, they are known as reverse-mode differentiation.
  • So \(\frac{dg}{d\vec{p}}\)
    • measures the sensitivity of the answer to \(\vec{p}\)
    • indicates a useful search direction for optimizing \(g\), i.e. for picking the \(\vec{p}\) that produce some desired result
  • shape or topology optimization: \(\vec{p}\) controls the shape and placement of blobs of material, a.k.a. inverse design. \(P\) can be hundreds, thousands, or even millions.
  • Take \(A\vec{x}=\vec{b}\), where \(A\) and \(\vec{b}\) are real and depend in some way on \(\vec{p}\). Evaluate the gradient directly via the chain rule, where subscripts indicate partial derivatives:
    $$ \frac{dg}{d\vec{p}}=g_{\vec{p}}+g_{\vec{x}}\vec{x}_{\vec{p}} $$
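To spell out where the cost savings come from (a standard manipulation, sketched briefly): differentiating \(A\vec{x}=\vec{b}\) with respect to a single parameter \(p_i\) gives \(A\,\vec{x}_{p_i} = \vec{b}_{p_i} - A_{p_i}\vec{x}\), so forming \(\vec{x}_{\vec{p}}\) column by column costs \(P\) solves with \(A\). Regrouping \(g_{\vec{x}}\vec{x}_{\vec{p}} = (g_{\vec{x}}A^{-1})(\vec{b}_{\vec{p}} - A_{\vec{p}}\vec{x})\) instead needs a single solve with \(A^T\) for an adjoint vector \(\lambda\) (here the \(i\)-th column of \(\vec{b}_{\vec{p}} - A_{\vec{p}}\vec{x}\) means \(\vec{b}_{p_i} - A_{p_i}\vec{x}\)):

$$ \frac{dg}{d\vec{p}} = g_{\vec{p}} + \lambda^T\left(\vec{b}_{\vec{p}} - A_{\vec{p}}\vec{x}\right), \qquad A^T\lambda = g_{\vec{x}}^T $$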
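A minimal NumPy sketch of the same point, using an invented parameterization \(A(\vec{p}) = A_0 + \sum_i p_i A_i\), \(\vec{b}(\vec{p}) = B\vec{p}\), and objective \(g(\vec{x}) = \vec{x}^T\vec{x}\) (all chosen purely for illustration): the adjoint gradient needs only one extra transposed solve, independent of \(P\), and can be checked against a finite difference.

```python
# Illustrative sketch: adjoint gradient for A(p) x = b(p) with g(x) = x.x.
# The particular A0, Ai, B, and g are made up for this example.
import numpy as np

M, P = 50, 200
rng = np.random.default_rng(0)

A0 = np.eye(M) * 10.0                         # well-conditioned base matrix
Ai = rng.standard_normal((P, M, M)) * 0.01    # dA/dp_i (constant in this parameterization)
B = rng.standard_normal((M, P))               # db/dp (also constant)
p = rng.standard_normal(P)

def solve_x(p):
    """Assemble A(p), b(p) and solve for x."""
    A = A0 + np.tensordot(p, Ai, axes=1)      # A(p) = A0 + sum_i p_i * Ai[i]
    b = B @ p                                 # b(p) = B p
    return A, b, np.linalg.solve(A, b)

def g(x):                                     # illustrative objective, no explicit p-dependence
    return x @ x

# Adjoint gradient: dg/dp_i = lambda^T (db/dp_i - dA/dp_i x), with A^T lambda = (dg/dx)^T.
A, b, x = solve_x(p)
g_x = 2.0 * x                                 # dg/dx for g(x) = x.x
lam = np.linalg.solve(A.T, g_x)               # one transposed solve, independent of P
grad = B.T @ lam - np.einsum('imn,n,m->i', Ai, x, lam)

# Check one component against a finite difference.
i, h = 3, 1e-6
pp = p.copy(); pp[i] += h
_, _, x2 = solve_x(pp)
print(grad[i], (g(x2) - g(x)) / h)            # the two numbers should agree closely
```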