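- Preface
- Basic conventions
- Derivative of a scalar with respect to a vector
  - Basic rules
  - Formulas
- Derivative of a scalar with respect to a matrix
  - Basic rules
  - Formulas
- Afterword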
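The previous post, Matrix Derivatives (1), settled the layout question for derivatives, which is the most basic groundwork of matrix calculus. This post moves on to the core of the subject: the basic differentiation rules and the basic formulas.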
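Basic conventions

This post covers only derivatives of a scalar with respect to a vector or a matrix. Vectors are column vectors by default.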
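Derivative of a scalar with respect to a vector

Basic rules

Constant:

$$
\frac{\partial c_0}{\partial x} = 0^{n\times 1}
$$

Here $0^{n\times 1}$ denotes the $n \times 1$ zero vector. The constant rule is immediate, so no proof is given.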
Linearity:

$$
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x} = c_1\frac{\partial f}{\partial x} + c_2\frac{\partial g}{\partial x}
$$

Proof:

$$
\begin{aligned}
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x}
&= \begin{bmatrix}
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_1} \\
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_2} \\
\vdots \\
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_n}
\end{bmatrix} \\
&= \begin{bmatrix}
c_1\frac{\partial f(x)}{\partial x_1} \\
c_1\frac{\partial f(x)}{\partial x_2} \\
\vdots \\
c_1\frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
+ \begin{bmatrix}
c_2\frac{\partial g(x)}{\partial x_1} \\
c_2\frac{\partial g(x)}{\partial x_2} \\
\vdots \\
c_2\frac{\partial g(x)}{\partial x_n}
\end{bmatrix} \\
&= c_1\frac{\partial f}{\partial x} + c_2\frac{\partial g}{\partial x}
\end{aligned}
$$

Addition and subtraction need no separate treatment: they are the special cases $c_1 = 1$, $c_2 = \pm 1$, they behave exactly as for ordinary functions, and they are just as easy to prove.
Product:

$$
\frac{\partial (f(x)g(x))}{\partial x} = \frac{\partial f(x)}{\partial x}g(x) + f(x)\frac{\partial g(x)}{\partial x}
$$

Proof:

$$
\begin{aligned}
\frac{\partial (f(x)g(x))}{\partial x}
&= \begin{bmatrix}
\frac{\partial (fg)}{\partial x_1} \\
\frac{\partial (fg)}{\partial x_2} \\
\vdots \\
\frac{\partial (fg)}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix}
\frac{\partial f}{\partial x_1}g + f\frac{\partial g}{\partial x_1} \\
\frac{\partial f}{\partial x_2}g + f\frac{\partial g}{\partial x_2} \\
\vdots \\
\frac{\partial f}{\partial x_n}g + f\frac{\partial g}{\partial x_n}
\end{bmatrix} \\
&= \begin{bmatrix}
\frac{\partial f}{\partial x_1} \\
\frac{\partial f}{\partial x_2} \\
\vdots \\
\frac{\partial f}{\partial x_n}
\end{bmatrix} g
+ f \begin{bmatrix}
\frac{\partial g}{\partial x_1} \\
\frac{\partial g}{\partial x_2} \\
\vdots \\
\frac{\partial g}{\partial x_n}
\end{bmatrix} \\
&= \frac{\partial f(x)}{\partial x}g(x) + f(x)\frac{\partial g(x)}{\partial x}
\end{aligned}
$$
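The product rule can be sanity-checked numerically. The sketch below is only an illustration: the functions $f(x) = a^Tx$ and $g(x) = x^Tx$, the values of `a` and `x`, and the helper `numeric_grad` are assumptions chosen for the example, not part of the derivation above. It compares the right-hand side of the rule with a central-difference gradient of $f(x)g(x)$.

```python
# Numerical check of the product rule for scalar functions of a vector.
# f(x) = a^T x and g(x) = x^T x are illustrative choices (assumptions).

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def numeric_grad(func, x, h=1e-6):
    """Central-difference gradient of a scalar function at the point x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad

a = [1.0, -2.0, 0.5]
x = [0.3, 1.2, -0.7]

def f(v):
    return dot(a, v)   # gradient is a

def g(v):
    return dot(v, v)   # gradient is 2v

# Right-hand side of the product rule: (df/dx) * g + f * (dg/dx)
analytic = [a[i] * g(x) + f(x) * 2 * x[i] for i in range(len(x))]
numeric = numeric_grad(lambda v: f(v) * g(v), x)

max_err = max(abs(p - q) for p, q in zip(analytic, numeric))
print("largest componentwise difference:", max_err)
```

The difference should be at the level of floating-point noise, which is what one expects if the rule holds component by component.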
Quotient (for $g(x) \neq 0$):

$$
\frac{\partial \frac{f(x)}{g(x)}}{\partial x} = \frac{\frac{\partial f(x)}{\partial x}g(x) - f(x)\frac{\partial g(x)}{\partial x}}{g(x)^2}
$$

The proof follows the same steps as the product rule. The only difference is that each component uses the scalar quotient rule for $\partial (f/g)/\partial x_i$ instead of the scalar product rule for $\partial (fg)/\partial x_i$, so it is omitted here.
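For readers who want the omitted step spelled out, here is a sketch of the componentwise argument, assuming $g(x) \neq 0$. Each component is the ordinary scalar quotient rule, and stacking the components gives the vector form:

$$
\frac{\partial}{\partial x_i}\left(\frac{f}{g}\right)
= \frac{\frac{\partial f}{\partial x_i}\,g - f\,\frac{\partial g}{\partial x_i}}{g^2},
\qquad i = 1, \dots, n
$$

$$
\frac{\partial \frac{f}{g}}{\partial x}
= \frac{1}{g^2}\left(
\begin{bmatrix}
\frac{\partial f}{\partial x_1} \\
\vdots \\
\frac{\partial f}{\partial x_n}
\end{bmatrix} g
- f \begin{bmatrix}
\frac{\partial g}{\partial x_1} \\
\vdots \\
\frac{\partial g}{\partial x_n}
\end{bmatrix}
\right)
= \frac{\frac{\partial f}{\partial x}\,g - f\,\frac{\partial g}{\partial x}}{g^2}
$$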
Formulas

Formula 1 (with $a$ a constant $n \times 1$ vector):

$$
\frac{\partial a^Tx}{\partial x} = \frac{\partial x^Ta}{\partial x} = a
$$

Proof:

$$
\frac{\partial a^Tx}{\partial x}
= \frac{\partial x^Ta}{\partial x}
= \frac{\partial (a_1x_1 + a_2x_2 + \dots + a_nx_n)}{\partial x}
= \begin{bmatrix}
a_1 \\
a_2 \\
\vdots \\
a_n
\end{bmatrix}
= a
$$
Formula 2:

$$
\frac{\partial (x^Tx)}{\partial x} = 2x, \qquad
\frac{\partial (x^Tx)}{\partial x^T} = 2x^T
$$

Proof:

$$
\frac{\partial (x^Tx)}{\partial x}
= \frac{\partial (x_1^2 + x_2^2 + \dots + x_n^2)}{\partial x}
= \begin{bmatrix}
2x_1 \\
2x_2 \\
\vdots \\
2x_n
\end{bmatrix}
= 2x
$$
Formula 3 (with $A$ a constant $n \times n$ matrix):

$$
\frac{\partial (x^TAx)}{\partial x} = Ax + A^Tx
$$

Proof. Write $x^TAx = (A^Tx)^Tx$ and expand:

$$
\begin{aligned}
\frac{\partial (x^TAx)}{\partial x}
&= \frac{\partial}{\partial x}\left(
\begin{bmatrix}
a_{11}x_1 + a_{21}x_2 + \dots + a_{n1}x_n \\
a_{12}x_1 + a_{22}x_2 + \dots + a_{n2}x_n \\
\vdots \\
a_{1n}x_1 + a_{2n}x_2 + \dots + a_{nn}x_n
\end{bmatrix}^T x \right) \\
&= \frac{\partial}{\partial x}\big(
a_{11}x_1^2 + a_{21}x_2x_1 + \dots + a_{n1}x_nx_1 \\
&\qquad\quad + a_{12}x_1x_2 + a_{22}x_2^2 + \dots + a_{n2}x_nx_2 \\
&\qquad\quad + \dots \\
&\qquad\quad + a_{1n}x_1x_n + a_{2n}x_2x_n + \dots + a_{nn}x_n^2 \big) \\
&= \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n + a_{11}x_1 + a_{21}x_2 + \dots + a_{n1}x_n \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n + a_{12}x_1 + a_{22}x_2 + \dots + a_{n2}x_n \\
\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n + a_{1n}x_1 + a_{2n}x_2 + \dots + a_{nn}x_n
\end{bmatrix} \\
&= \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\
\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n
\end{bmatrix}
+ \begin{bmatrix}
a_{11}x_1 + a_{21}x_2 + \dots + a_{n1}x_n \\
a_{12}x_1 + a_{22}x_2 + \dots + a_{n2}x_n \\
\vdots \\
a_{1n}x_1 + a_{2n}x_2 + \dots + a_{nn}x_n
\end{bmatrix} \\
&= Ax + A^Tx
\end{aligned}
$$
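A quick numerical check of Formula 3 is sketched below. The matrix `A`, the point `x`, and the helper names are arbitrary illustrative choices. `A` is deliberately non-symmetric, so the check also shows that the shortcut $2Ax$, which is valid only when $A = A^T$, does not match the finite-difference gradient here.

```python
# Finite-difference check of d(x^T A x)/dx = A x + A^T x.
# A and x are arbitrary illustrative values; A is intentionally non-symmetric.

def quad_form(A, x):
    """Return the scalar x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numeric_grad(func, x, h=1e-5):
    """Central-difference gradient of a scalar function at the point x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad

A = [[2.0, -1.0, 0.5],
     [3.0, 0.0, 1.0],
     [-2.0, 4.0, 1.5]]
x = [0.3, 1.2, -0.7]
n = len(x)

Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
ATx = [sum(A[j][i] * x[j] for j in range(n)) for i in range(n)]

analytic = [Ax[i] + ATx[i] for i in range(n)]   # A x + A^T x
symmetric_only = [2 * v for v in Ax]            # 2 A x, valid only if A = A^T
numeric = numeric_grad(lambda v: quad_form(A, v), x)

max_err = max(abs(p - q) for p, q in zip(analytic, numeric))
sym_gap = max(abs(p - q) for p, q in zip(symmetric_only, numeric))
print("error of A x + A^T x:", max_err)
print("error of 2 A x      :", sym_gap)
```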
Formula 4 (with $a$ and $b$ constant $n \times 1$ vectors):

$$
\frac{\partial (a^Txx^Tb)}{\partial x} = ab^Tx + ba^Tx
$$

Proof. Since $a^Tx = x^Ta$ and $x^Tb = b^Tx$ are scalars, the expression is a quadratic form $x^TAx$ with $A = ab^T$, and Formula 3 applies:

$$
\frac{\partial (a^Txx^Tb)}{\partial x}
= \frac{\partial (x^Tab^Tx)}{\partial x}
= ab^Tx + (ab^T)^Tx
= ab^Tx + ba^Tx
$$
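As a hedged illustration of Formula 4, the sketch below evaluates $ab^Tx + ba^Tx$ componentwise as $a_i(b^Tx) + b_i(a^Tx)$ and compares it with a central-difference gradient. The vectors `a`, `b`, `x` and the helper names are made up for the example.

```python
# Finite-difference check of d(a^T x x^T b)/dx = a b^T x + b a^T x.
# a, b and x are arbitrary illustrative values.

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def numeric_grad(func, x, h=1e-5):
    """Central-difference gradient of a scalar function at the point x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad

a = [1.0, -2.0, 0.5]
b = [0.4, 3.0, -1.0]
x = [0.3, 1.2, -0.7]

def func(v):
    return dot(a, v) * dot(v, b)   # the scalar a^T x x^T b

# (a b^T x)_i = a_i * (b^T x) and (b a^T x)_i = b_i * (a^T x)
analytic = [a[i] * dot(b, x) + b[i] * dot(a, x) for i in range(len(x))]
numeric = numeric_grad(func, x)

max_err = max(abs(p - q) for p, q in zip(analytic, numeric))
print("largest componentwise difference:", max_err)
```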
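Derivative of a scalar with respect to a matrix

Basic rules

Constant (with $X$ an $m \times n$ matrix):

$$
\frac{\partial c_0}{\partial X} = 0^{m\times n}
$$

Here $0^{m\times n}$ denotes the $m \times n$ zero matrix. The constant rule is immediate, so no proof is given.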
Linearity:

$$
\frac{\partial (c_1 f(X) + c_2 g(X))}{\partial X} = c_1\frac{\partial f(X)}{\partial X} + c_2\frac{\partial g(X)}{\partial X}
$$

The proof is the same as for the linearity rule in the vector case.
Product:

$$
\frac{\partial (f(X)g(X))}{\partial X} = \frac{\partial f(X)}{\partial X}g(X) + f(X)\frac{\partial g(X)}{\partial X}
$$

The proof is the same as for the product rule in the vector case.
Quotient (for $g(X) \neq 0$):

$$
\frac{\partial \frac{f(X)}{g(X)}}{\partial X} = \frac{\frac{\partial f(X)}{\partial X}g(X) - f(X)\frac{\partial g(X)}{\partial X}}{g(X)^2}
$$

The proof is the same as for the quotient rule in the vector case.
Formulas

Formula 1 (with $X$ of size $m \times n$, $a$ of size $m \times 1$, and $b$ of size $n \times 1$):

$$
\frac{\partial a^TXb}{\partial X} = ab^T
$$

Proof:

$$
\begin{aligned}
a^TXb ={}& a_1b_1x_{11} + a_2b_1x_{21} + \dots + a_mb_1x_{m1} \\
&+ a_1b_2x_{12} + a_2b_2x_{22} + \dots + a_mb_2x_{m2} \\
&+ \dots \\
&+ a_1b_nx_{1n} + a_2b_nx_{2n} + \dots + a_mb_nx_{mn}
\end{aligned}
$$

The coefficient of $x_{ij}$ is $a_ib_j$, so

$$
\frac{\partial a^TXb}{\partial X}
= \begin{bmatrix}
a_1b_1 & a_1b_2 & \dots & a_1b_n \\
a_2b_1 & a_2b_2 & \dots & a_2b_n \\
\vdots & \vdots & \ddots & \vdots \\
a_mb_1 & a_mb_2 & \dots & a_mb_n
\end{bmatrix}
= ab^T
$$
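The entrywise reading of Formula 1 (the derivative with respect to $x_{ij}$ is $a_ib_j$) can be checked with finite differences. This is a minimal sketch: the $2 \times 3$ matrix `X`, the vectors `a` and `b`, and the helper `numeric_grad_matrix` are illustrative assumptions.

```python
# Finite-difference check of d(a^T X b)/dX = a b^T for a 2x3 matrix X.
# X, a and b are arbitrary illustrative values.

def bilinear(X, a, b):
    """Return the scalar a^T X b."""
    return sum(a[i] * X[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def numeric_grad_matrix(func, X, h=1e-6):
    """Central-difference derivative of a scalar function w.r.t. each entry of X."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G

X = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.75]]
a = [2.0, -3.0]          # m = 2 entries
b = [1.0, 0.5, -2.0]     # n = 3 entries

analytic = [[a[i] * b[j] for j in range(len(b))] for i in range(len(a))]  # a b^T
numeric = numeric_grad_matrix(lambda M: bilinear(M, a, b), X)

max_err = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(len(a)) for j in range(len(b)))
print("largest entrywise difference:", max_err)
```

The result has the same $m \times n$ shape as `X`, which matches the layout used for the matrix rules above.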
Formula 2 (with $X$ of size $m \times n$, $a$ of size $n \times 1$, and $b$ of size $m \times 1$):

$$
\frac{\partial a^TX^Tb}{\partial X} = ba^T
$$

Proof:

$$
\begin{aligned}
a^TX^Tb ={}& a_1b_1x_{11} + a_2b_1x_{12} + \dots + a_nb_1x_{1n} \\
&+ a_1b_2x_{21} + a_2b_2x_{22} + \dots + a_nb_2x_{2n} \\
&+ \dots \\
&+ a_1b_mx_{m1} + a_2b_mx_{m2} + \dots + a_nb_mx_{mn}
\end{aligned}
$$

The coefficient of $x_{ji}$ is $a_ib_j$, so

$$
\frac{\partial a^TX^Tb}{\partial X}
= \begin{bmatrix}
a_1b_1 & a_2b_1 & \dots & a_nb_1 \\
a_1b_2 & a_2b_2 & \dots & a_nb_2 \\
\vdots & \vdots & \ddots & \vdots \\
a_1b_m & a_2b_m & \dots & a_nb_m
\end{bmatrix}
= ba^T
$$
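One way to read Formula 2 is through the identity $a^TX^Tb = b^TXa$: a scalar equals its own transpose, so Formula 1 with the roles of $a$ and $b$ swapped gives $ba^T$. The sketch below checks both the identity and the gradient numerically. All values and helper names are illustrative assumptions.

```python
# Check that a^T X^T b equals b^T X a, and that d(a^T X^T b)/dX = b a^T.
# X is 2x3, a has 3 entries, b has 2 entries; all values are illustrative.

def scalar_aXTb(X, a, b):
    """Return a^T X^T b, i.e. the sum over i, j of a_i * X[j][i] * b_j."""
    return sum(a[i] * X[j][i] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def scalar_bXa(X, a, b):
    """Return b^T X a, which should equal a^T X^T b."""
    return sum(b[j] * X[j][i] * a[i]
               for j in range(len(b)) for i in range(len(a)))

def numeric_grad_matrix(func, X, h=1e-6):
    """Central-difference derivative of a scalar function w.r.t. each entry of X."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for r in range(m):
        for c in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[r][c] += h
            Xm[r][c] -= h
            G[r][c] = (func(Xp) - func(Xm)) / (2 * h)
    return G

X = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.75]]
a = [1.0, 0.5, -2.0]     # n = 3 entries
b = [2.0, -3.0]          # m = 2 entries

identity_gap = abs(scalar_aXTb(X, a, b) - scalar_bXa(X, a, b))

analytic = [[b[j] * a[i] for i in range(len(a))] for j in range(len(b))]  # b a^T
numeric = numeric_grad_matrix(lambda M: scalar_aXTb(M, a, b), X)

max_err = max(abs(analytic[j][i] - numeric[j][i])
              for j in range(len(b)) for i in range(len(a)))
print("identity gap:", identity_gap)
print("largest entrywise difference:", max_err)
```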
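Formula 3 (with $X$ of size $m \times n$, and $a$, $b$ of size $m \times 1$):

$$
\frac{\partial a^TXX^Tb}{\partial X} = ab^TX + ba^TX
$$

The proof is similar to that of Formula 3 for the derivative of a scalar with respect to a vector, but writing out the full expansion of $a^TXX^Tb$ is very tedious, so it is omitted here.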
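The omitted expansion becomes manageable in index notation. The following is a sketch rather than the author's own proof. The indices $i, j$ run over rows and $k$ over columns of $X$, and $(p, q)$ is the entry being differentiated:

$$
a^TXX^Tb = \sum_{i}\sum_{j}\sum_{k} a_i\, x_{ik}\, x_{jk}\, b_j
$$

Applying the scalar product rule to $x_{ik}x_{jk}$, only the terms with $(i,k) = (p,q)$ or $(j,k) = (p,q)$ survive:

$$
\begin{aligned}
\frac{\partial (a^TXX^Tb)}{\partial x_{pq}}
&= \sum_{j} a_p\, x_{jq}\, b_j + \sum_{i} a_i\, x_{iq}\, b_p \\
&= a_p\,(b^TX)_q + b_p\,(a^TX)_q \\
&= (ab^TX)_{pq} + (ba^TX)_{pq}
\end{aligned}
$$

Collecting the entries over all $(p, q)$ gives $ab^TX + ba^TX$.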
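Afterword

This post was a real chore to write; typing out the proofs in KaTeX was sheer torture. The next post will cover the properties of the matrix trace.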