The transpose is used for computing (first order) derivatives with respect to the initial condition.

Nonlinear function ("model") $\mathcal{M}$ of the initial state $\mathbf{x}_0$ (state at time $t_0$), giving the state $\mathbf{x}$ at time $t$:
$$\mathbf{x} = \mathcal{M}(\mathbf{x}_0)$$
$\mathbf{x}_0$ and $\mathbf{x}$ are vectors with $n$ and $m$ components, respectively $\mathbf{x}_0 = (x_{01}, \dots, x_{0n})^T$ and $\mathbf{x} = (x_1, \dots, x_m)^T$. $\mathcal{M}$ is then a vector function.
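As a concrete illustration (a hypothetical toy model, not taken from the text), the following sketch defines a two-component nonlinear model $\mathcal{M}$ and evaluates it at an initial state:

```python
import numpy as np

# Hypothetical two-component nonlinear model M: x0 -> x,
# used only as an illustration of a vector function of the initial state.
def model(x0):
    return np.array([x0[0] * x0[1], np.sin(x0[0]) + x0[1] ** 2])

x0 = np.array([1.0, 2.0])   # initial state (state at time t0)
x = model(x0)               # state at time t
```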
Let $\mathbf{M}$ be the Jacobian matrix of the vector function $\mathcal{M}$, containing its first derivatives with respect to the initial variables:
$$M_{ij} = \frac{\partial x_i}{\partial x_{0j}}$$
Since $\mathcal{M}$ is a nonlinear function, its derivatives depend on the initial state, $\mathbf{x}_0$. This dependence is not indicated hereafter for simplicity. As a matrix, $\mathbf{M}$ is:
$$\mathbf{M} = \begin{pmatrix} \dfrac{\partial x_1}{\partial x_{01}} & \cdots & \dfrac{\partial x_1}{\partial x_{0n}} \\ \vdots & & \vdots \\ \dfrac{\partial x_m}{\partial x_{01}} & \cdots & \dfrac{\partial x_m}{\partial x_{0n}} \end{pmatrix}$$
The first order variation is then obtained as a row-by-column product:
$$\delta\mathbf{x} = \mathbf{M}\,\delta\mathbf{x}_0$$
The Jacobian matrix $\mathbf{M}$ is also called the tangent linear operator: it is applied linearly to variations of the initial state (tangent vectors), and it depends on the initial state because $\mathcal{M}$ is nonlinear.
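The tangent linear relation $\delta\mathbf{x} = \mathbf{M}\,\delta\mathbf{x}_0$ can be checked numerically. The sketch below assumes the same hypothetical two-component model together with its analytic Jacobian, and compares the tangent linear variation with the full nonlinear difference for a small $\delta\mathbf{x}_0$:

```python
import numpy as np

# Same illustrative two-component model as before (an assumption for
# demonstration, not the author's model).
def model(x0):
    return np.array([x0[0] * x0[1], np.sin(x0[0]) + x0[1] ** 2])

def jacobian(x0):
    # Analytic Jacobian M_ij = d x_i / d x_0j, evaluated at x0.
    return np.array([[x0[1],         x0[0]],
                     [np.cos(x0[0]), 2.0 * x0[1]]])

x0 = np.array([1.0, 2.0])
dx0 = 1e-6 * np.array([0.3, -0.7])       # small variation of the initial state
dx_tl = jacobian(x0) @ dx0               # tangent linear: delta_x = M delta_x0
dx_nl = model(x0 + dx0) - model(x0)      # nonlinear difference
# The two agree to first order in |delta_x0|.
```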
Now consider a scalar function $J$ of the state at time $t$: $J = J(\mathbf{x})$. Its first variation is obtained by multiplying the row vector obtained by transposing its gradient by the vector $\delta\mathbf{x}$, which can be expressed as above:
$$\delta J = (\nabla_{\mathbf{x}} J)^T\,\delta\mathbf{x} = (\nabla_{\mathbf{x}} J)^T\,\mathbf{M}\,\delta\mathbf{x}_0$$
$J$, through $\mathcal{M}$, is a composed function of the initial state: $J = J(\mathcal{M}(\mathbf{x}_0))$. So its first variation can also be expressed by means of its gradient with respect to the initial condition:
$$\delta J = (\nabla_{\mathbf{x}_0} J)^T\,\delta\mathbf{x}_0$$
By equating the two expressions of $\delta J$, one obtains:
$$(\nabla_{\mathbf{x}_0} J)^T = (\nabla_{\mathbf{x}} J)^T\,\mathbf{M}$$
By taking the transpose of this expression:
$$\nabla_{\mathbf{x}_0} J = \mathbf{M}^T\,\nabla_{\mathbf{x}} J$$
So the transpose of the Jacobian matrix, the transpose operator, is applied linearly to the gradient with respect to the final-time variables, to give the gradient with respect to the initial-time variables.

The transpose operator is sometimes called the "adjoint" operator, though the two are not exactly the same, because the adjoint operator depends on the definition of a scalar product. The gradients are then "adjoint vectors": note that if the state components have physical dimensions, then the tangent vectors have those same dimensions, while the components of the adjoint vectors have physical dimensions that are the inverse (apart from the possible physical dimensions of $J$) of those of the corresponding tangent or state components.
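The adjoint relation $\nabla_{\mathbf{x}_0} J = \mathbf{M}^T\,\nabla_{\mathbf{x}} J$ can likewise be verified numerically. This sketch again assumes the hypothetical toy model, and takes $J(\mathbf{x}) = \frac{1}{2}|\mathbf{x}|^2$ (so that $\nabla_{\mathbf{x}} J = \mathbf{x}$), comparing the gradient obtained through the transposed Jacobian with finite differences of $J$ with respect to the initial state:

```python
import numpy as np

# Illustrative model and its analytic Jacobian (assumed, for demonstration only).
def model(x0):
    return np.array([x0[0] * x0[1], np.sin(x0[0]) + x0[1] ** 2])

def jacobian(x0):
    return np.array([[x0[1],         x0[0]],
                     [np.cos(x0[0]), 2.0 * x0[1]]])

# Scalar function of the final state: J(x) = 0.5 |x|^2, so grad_x J = x.
def J(x):
    return 0.5 * np.sum(x ** 2)

x0 = np.array([1.0, 2.0])
x = model(x0)

# Adjoint step: the transposed Jacobian maps the gradient at final time
# to the gradient with respect to the initial condition.
grad_x0 = jacobian(x0).T @ x

# Finite-difference check of each component of grad_{x0} J.
eps = 1e-7
fd = np.array([(J(model(x0 + eps * e)) - J(model(x0))) / eps
               for e in np.eye(2)])
```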

Francesco Uboldi 2014,2015,2016,2017