Linear Algebra: Matrix Calculus (2), Basic Rules and Formulas for Derivatives of Scalar Functions

RuiH.AI, published 2021-11-09 12:11:08

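Linear Algebra: Matrix Calculus (2), Basic Rules and Formulas
  • Preface
  • Conventions
  • Derivative of a Scalar with Respect to a Vector
    • Basic Rules
    • Formulas
  • Derivative of a Scalar with Respect to a Matrix
    • Basic Rules
    • Formulas
  • Afterword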

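Preface

The previous post, Matrix Calculus (1), settled the question of layout conventions and covered the most basic method of differentiation: computing the derivative entry by entry. This post moves on to the core of matrix calculus: the basic differentiation rules and the basic formulas.

Conventions

This post covers only derivatives of a scalar with respect to a vector or a matrix. Vectors are column vectors by default.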

Derivative of a Scalar with Respect to a Vector

Basic Rules

Derivative of a constant:

$$
\frac{\partial c_0}{\partial x} = \mathbf{0}_{n \times 1}
$$

Every partial derivative of a constant is zero, so this case is not proved here.

Linear combination:

$$
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x} = c_1 \frac{\partial f}{\partial x} + c_2 \frac{\partial g}{\partial x}
$$

Proof:

$$
\begin{aligned}
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x}
&= \begin{bmatrix}
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_1} \\
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_2} \\
\vdots \\
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_n}
\end{bmatrix} \\
&= \begin{bmatrix}
c_1 \frac{\partial f(x)}{\partial x_1} \\
c_1 \frac{\partial f(x)}{\partial x_2} \\
\vdots \\
c_1 \frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
+ \begin{bmatrix}
c_2 \frac{\partial g(x)}{\partial x_1} \\
c_2 \frac{\partial g(x)}{\partial x_2} \\
\vdots \\
c_2 \frac{\partial g(x)}{\partial x_n}
\end{bmatrix} \\
&= c_1 \frac{\partial f}{\partial x} + c_2 \frac{\partial g}{\partial x}
\end{aligned}
$$

Sums and differences are not treated separately: they behave exactly as in ordinary differentiation and are just as easy to prove.

Product:

$$
\frac{\partial (f(x) g(x))}{\partial x} = \frac{\partial f(x)}{\partial x} g(x) + f(x) \frac{\partial g(x)}{\partial x}
$$

Proof:

$$
\begin{aligned}
\frac{\partial (f(x) g(x))}{\partial x}
&= \begin{bmatrix}
\frac{\partial (fg)}{\partial x_1} \\
\frac{\partial (fg)}{\partial x_2} \\
\vdots \\
\frac{\partial (fg)}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix}
\frac{\partial f}{\partial x_1} g + f \frac{\partial g}{\partial x_1} \\
\frac{\partial f}{\partial x_2} g + f \frac{\partial g}{\partial x_2} \\
\vdots \\
\frac{\partial f}{\partial x_n} g + f \frac{\partial g}{\partial x_n}
\end{bmatrix} \\
&= \begin{bmatrix}
\frac{\partial f}{\partial x_1} \\
\frac{\partial f}{\partial x_2} \\
\vdots \\
\frac{\partial f}{\partial x_n}
\end{bmatrix} g
+ f \begin{bmatrix}
\frac{\partial g}{\partial x_1} \\
\frac{\partial g}{\partial x_2} \\
\vdots \\
\frac{\partial g}{\partial x_n}
\end{bmatrix} \\
&= \frac{\partial f(x)}{\partial x} g(x) + f(x) \frac{\partial g(x)}{\partial x}
\end{aligned}
$$
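The product rule above can be checked numerically. The following is a minimal sketch using only the Python standard library; the functions `f` and `g`, the test point `x0`, and the helper `numerical_gradient` are illustrative assumptions and do not come from the original post.

```python
import math

def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

# Hypothetical example functions, chosen only for illustration.
def f(x):
    return sum(xi * xi for xi in x)        # f(x) = x^T x

def g(x):
    return math.exp(x[0]) + x[1] * x[2]    # g(x) = exp(x_1) + x_2 * x_3

def grad_f(x):
    return [2 * xi for xi in x]            # hand-derived gradient of f

def grad_g(x):
    return [math.exp(x[0]), x[2], x[1]]    # hand-derived gradient of g

x0 = [0.5, -1.0, 2.0]

# Right-hand side of the product rule: (df/dx) * g + f * (dg/dx), entry by entry.
product_rule = [df * g(x0) + f(x0) * dg
                for df, dg in zip(grad_f(x0), grad_g(x0))]

# Left-hand side: differentiate the product f(x) * g(x) numerically.
numeric = numerical_gradient(lambda x: f(x) * g(x), x0)

max_error = max(abs(p - q) for p, q in zip(product_rule, numeric))
print(max_error)
```

Both sides are lists with one entry per component of `x`, matching the column-vector layout used in this post; `max_error` should be close to zero, limited only by finite-difference error.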

Quotient (for $g(x) \neq 0$):

$$
\frac{\partial \frac{f(x)}{g(x)}}{\partial x} = \frac{\frac{\partial f(x)}{\partial x} g(x) - f(x) \frac{\partial g(x)}{\partial x}}{g(x)^2}
$$

The proof follows the same steps as the product rule; the only difference is the component-wise form of $\partial (f/g)/\partial x_i$ compared with $\partial (fg)/\partial x_i$, so it is omitted here.
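For readers who want the omitted steps, the argument can be sketched as follows, assuming $g(x) \neq 0$: apply the ordinary scalar quotient rule to each component and then stack the components into a column.

```latex
\[
\frac{\partial}{\partial x_i}\left(\frac{f}{g}\right)
= \frac{\frac{\partial f}{\partial x_i}\, g - f\, \frac{\partial g}{\partial x_i}}{g^2},
\qquad i = 1, \dots, n
\]
\[
\frac{\partial (f/g)}{\partial x}
= \frac{1}{g^2}
\begin{bmatrix}
\frac{\partial f}{\partial x_1} g - f \frac{\partial g}{\partial x_1} \\
\vdots \\
\frac{\partial f}{\partial x_n} g - f \frac{\partial g}{\partial x_n}
\end{bmatrix}
= \frac{\frac{\partial f}{\partial x}\, g - f\, \frac{\partial g}{\partial x}}{g^2}
\]
```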

Formulas

Formula 1:

$$
\frac{\partial a^T x}{\partial x} = \frac{\partial x^T a}{\partial x} = a
$$

Proof: since $a^T x = x^T a = a_1 x_1 + a_2 x_2 + \dots + a_n x_n$,

$$
\frac{\partial a^T x}{\partial x} = \frac{\partial x^T a}{\partial x}
= \frac{\partial (a_1 x_1 + a_2 x_2 + \dots + a_n x_n)}{\partial x}
= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = a
$$
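As a quick sanity check of Formula 1, the sketch below compares a finite-difference gradient of $a^T x$ with $a$ itself. The vectors `a` and `x0` and the helpers `dot` and `numerical_gradient` are made-up illustrations, not part of the original post.

```python
def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

# Hypothetical test data.
a = [2.0, -1.0, 0.5]
x0 = [1.0, 3.0, -2.0]

# Numerical gradient of the scalar a^T x with respect to x.
grad = numerical_gradient(lambda x: dot(a, x), x0)

# Formula 1 predicts that the gradient equals a, independent of x0.
max_error = max(abs(gi - ai) for gi, ai in zip(grad, a))
print(max_error)
```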

Formula 2:

$$
\frac{\partial (x^T x)}{\partial x} = 2x, \qquad
\frac{\partial (x^T x)}{\partial x^T} = 2x^T
$$

Proof:

$$
\frac{\partial (x^T x)}{\partial x}
= \frac{\partial (x_1^2 + x_2^2 + \dots + x_n^2)}{\partial x}
= \begin{bmatrix} 2x_1 \\ 2x_2 \\ \vdots \\ 2x_n \end{bmatrix} = 2x
$$

Differentiating with respect to $x^T$ arranges the same partial derivatives as a row, which gives $2x^T$.

Formula 3:

$$
\frac{\partial (x^T A x)}{\partial x} = Ax + A^T x
$$

Proof: write $x^T A x = (A^T x)^T x$ and expand.

$$
\begin{aligned}
x^T A x
&= \begin{bmatrix}
a_{11}x_1 + a_{21}x_2 + \dots + a_{n1}x_n \\
a_{12}x_1 + a_{22}x_2 + \dots + a_{n2}x_n \\
\vdots \\
a_{1n}x_1 + a_{2n}x_2 + \dots + a_{nn}x_n
\end{bmatrix}^T x \\
&= a_{11}x_1^2 + a_{21}x_2x_1 + \dots + a_{n1}x_nx_1 \\
&\quad + a_{12}x_1x_2 + a_{22}x_2^2 + \dots + a_{n2}x_nx_2 \\
&\quad + \dots \\
&\quad + a_{1n}x_1x_n + a_{2n}x_2x_n + \dots + a_{nn}x_n^2
\end{aligned}
$$

Differentiating with respect to each $x_k$ and collecting terms:

$$
\begin{aligned}
\frac{\partial (x^T A x)}{\partial x}
&= \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n + a_{11}x_1 + a_{21}x_2 + \dots + a_{n1}x_n \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n + a_{12}x_1 + a_{22}x_2 + \dots + a_{n2}x_n \\
\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n + a_{1n}x_1 + a_{2n}x_2 + \dots + a_{nn}x_n
\end{bmatrix} \\
&= \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\
\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n
\end{bmatrix}
+ \begin{bmatrix}
a_{11}x_1 + a_{21}x_2 + \dots + a_{n1}x_n \\
a_{12}x_1 + a_{22}x_2 + \dots + a_{n2}x_n \\
\vdots \\
a_{1n}x_1 + a_{2n}x_2 + \dots + a_{nn}x_n
\end{bmatrix} \\
&= Ax + A^T x
\end{aligned}
$$
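The quadratic-form result $Ax + A^Tx$ can be verified numerically with a deliberately non-symmetric matrix, so that $Ax$ and $A^Tx$ differ. This is a minimal sketch; the matrix `A`, the point `x0`, and all helper names are assumptions made for illustration. Setting $A = I$ recovers the $2x$ result for $x^Tx$.

```python
def dot(u, v):
    """Inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def mat_vec(A, v):
    """Matrix-vector product A v, with A stored as a list of rows."""
    return [dot(row, v) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def quad_form(A, x):
    """Return the scalar x^T A x."""
    return dot(x, mat_vec(A, x))

def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

# Hypothetical, intentionally non-symmetric test matrix.
A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, -2.0]]
x0 = [1.0, -2.0, 0.5]

# Analytic gradient: A x + A^T x.
analytic = [p + q for p, q in zip(mat_vec(A, x0), mat_vec(transpose(A), x0))]
numeric = numerical_gradient(lambda x: quad_form(A, x), x0)
max_error = max(abs(p - q) for p, q in zip(analytic, numeric))

# Special case A = I: the gradient of x^T x should be 2x.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
numeric_identity = numerical_gradient(lambda x: quad_form(identity, x), x0)
max_error_identity = max(abs(g - 2 * xi) for g, xi in zip(numeric_identity, x0))

print(max_error, max_error_identity)
```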

Formula 4:

$$
\frac{\partial (a^T x x^T b)}{\partial x} = a b^T x + b a^T x
$$

Proof: since $a^T x = x^T a$ and $x^T b = b^T x$ are scalars, the product can be rewritten as a quadratic form with matrix $ab^T$, and Formula 3 applies:

$$
\frac{\partial (a^T x x^T b)}{\partial x}
= \frac{\partial (x^T a b^T x)}{\partial x}
= a b^T x + (a b^T)^T x
= a b^T x + b a^T x
$$
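Formula 4 can also be checked by finite differences. In the sketch below, $ab^Tx$ is evaluated as $a\,(b^Tx)$ and $ba^Tx$ as $b\,(a^Tx)$, which avoids forming the outer products. The vectors `a`, `b`, `x0` and the helpers are illustrative assumptions.

```python
def dot(u, v):
    """Inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

# Hypothetical test vectors.
a = [1.0, -2.0, 0.5]
b = [3.0, 0.0, -1.0]
x0 = [0.5, 1.5, -1.0]

# Scalar function a^T x x^T b = (a^T x) * (b^T x).
def func(x):
    return dot(a, x) * dot(b, x)

# Analytic gradient: a (b^T x) + b (a^T x).
ax = dot(a, x0)
bx = dot(b, x0)
analytic = [ai * bx + bi * ax for ai, bi in zip(a, b)]

numeric = numerical_gradient(func, x0)
max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
print(max_error)
```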

Derivative of a Scalar with Respect to a Matrix

Basic Rules

Derivative of a constant:

$$
\frac{\partial c_0}{\partial X} = \mathbf{0}_{m \times n}
$$

As in the vector case, this is immediate and is not proved here.

Linear combination:

$$
\frac{\partial (c_1 f(X) + c_2 g(X))}{\partial X} = c_1 \frac{\partial f(X)}{\partial X} + c_2 \frac{\partial g(X)}{\partial X}
$$

The proof is the same as for the linear combination rule in the scalar-by-vector case.

Product:

$$
\frac{\partial (f(X) g(X))}{\partial X} = \frac{\partial f(X)}{\partial X} g(X) + f(X) \frac{\partial g(X)}{\partial X}
$$

The proof is the same as for the product rule in the scalar-by-vector case.

Quotient (for $g(X) \neq 0$):

$$
\frac{\partial \frac{f(X)}{g(X)}}{\partial X} = \frac{\frac{\partial f(X)}{\partial X} g(X) - f(X) \frac{\partial g(X)}{\partial X}}{g(X)^2}
$$

The proof is the same as for the quotient rule in the scalar-by-vector case.

Formulas

Formula 1:

$$
\frac{\partial a^T X b}{\partial X} = a b^T
$$

Proof (written for a square $n \times n$ matrix $X$; the $m \times n$ case is identical):

$$
\begin{aligned}
a^T X b &= a_1 b_1 x_{11} + a_2 b_1 x_{21} + \dots + a_n b_1 x_{n1} \\
&\quad + a_1 b_2 x_{12} + a_2 b_2 x_{22} + \dots + a_n b_2 x_{n2} \\
&\quad + \dots \\
&\quad + a_1 b_n x_{1n} + a_2 b_n x_{2n} + \dots + a_n b_n x_{nn}
\end{aligned}
$$

The coefficient of $x_{ij}$ is $a_i b_j$, so

$$
\frac{\partial a^T X b}{\partial X}
= \begin{bmatrix}
a_1 b_1 & a_1 b_2 & \dots & a_1 b_n \\
a_2 b_1 & a_2 b_2 & \dots & a_2 b_n \\
\vdots & \vdots & \ddots & \vdots \\
a_n b_1 & a_n b_2 & \dots & a_n b_n
\end{bmatrix}
= a b^T
$$
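The result $\partial (a^TXb)/\partial X = ab^T$ can be checked by perturbing one entry of $X$ at a time. The sketch below uses a non-square $2 \times 3$ matrix to illustrate that the gradient has the same shape as $X$; the test data and the helpers `bilinear` and `numerical_matrix_gradient` are assumptions for illustration only.

```python
def bilinear(a, X, b):
    """Return the scalar a^T X b, with X stored as a list of rows."""
    return sum(a[i] * X[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def numerical_matrix_gradient(func, X, h=1e-6):
    """Central-difference estimate of d func / d X, one entry at a time."""
    rows, cols = len(X), len(X[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            grad[i][j] = (func(X_plus) - func(X_minus)) / (2 * h)
    return grad

# Hypothetical test data: a has m = 2 entries, b has n = 3, X is 2 x 3.
a = [1.0, -2.0]
b = [0.5, 3.0, -1.0]
X0 = [[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0]]

# Analytic gradient: the outer product a b^T, with entry (i, j) equal to a_i * b_j.
outer = [[ai * bj for bj in b] for ai in a]

numeric = numerical_matrix_gradient(lambda X: bilinear(a, X, b), X0)
max_error = max(abs(numeric[i][j] - outer[i][j])
                for i in range(2) for j in range(3))
print(max_error)
```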

Formula 2:

$$
\frac{\partial a^T X^T b}{\partial X} = b a^T
$$

Proof (again written for a square $n \times n$ matrix $X$):

$$
\begin{aligned}
a^T X^T b &= a_1 b_1 x_{11} + a_2 b_1 x_{12} + \dots + a_n b_1 x_{1n} \\
&\quad + a_1 b_2 x_{21} + a_2 b_2 x_{22} + \dots + a_n b_2 x_{2n} \\
&\quad + \dots \\
&\quad + a_1 b_n x_{n1} + a_2 b_n x_{n2} + \dots + a_n b_n x_{nn}
\end{aligned}
$$

The coefficient of $x_{ij}$ is $b_i a_j$, so

$$
\frac{\partial a^T X^T b}{\partial X}
= \begin{bmatrix}
a_1 b_1 & a_2 b_1 & \dots & a_n b_1 \\
a_1 b_2 & a_2 b_2 & \dots & a_n b_2 \\
\vdots & \vdots & \ddots & \vdots \\
a_1 b_n & a_2 b_n & \dots & a_n b_n
\end{bmatrix}
= b a^T
$$
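A shorter route to the same result, offered as an alternative sketch rather than the author's argument: a scalar equals its own transpose, so the expression reduces to the bilinear form $u^TXv$ with $u = b$ and $v = a$, whose derivative with respect to $X$ is $uv^T$.

```latex
\[
a^T X^T b = \left(a^T X^T b\right)^T = b^T X a
\]
\[
\frac{\partial\, a^T X^T b}{\partial X}
= \frac{\partial\, b^T X a}{\partial X}
= b a^T
\]
```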

Formula 3:

$$
\frac{\partial a^T X X^T b}{\partial X} = a b^T X + b a^T X
$$

The proof proceeds like that of Formula 3 in the scalar-by-vector case, but expanding $a^T X X^T b$ entry by entry is very tedious, so it is omitted here.
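The omitted expansion is manageable in index notation. The following is a sketch under the assumption that $X$ is $m \times n$ and $a$, $b$ are $m \times 1$; the index names $i, j, k, p, q$ are introduced here for illustration.

```latex
\[
a^T X X^T b = \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{n} a_i \, x_{ik} \, x_{jk} \, b_j
\]
% Differentiate with respect to x_{pq}: one contribution from the factor
% x_{ik} (i = p, k = q) and one from the factor x_{jk} (j = p, k = q).
\[
\frac{\partial\, a^T X X^T b}{\partial x_{pq}}
= a_p \sum_{j=1}^{m} b_j x_{jq} + b_p \sum_{i=1}^{m} a_i x_{iq}
= a_p \left(b^T X\right)_q + b_p \left(a^T X\right)_q
\]
\[
= \left(a b^T X\right)_{pq} + \left(b a^T X\right)_{pq}
\]
```

Collecting the entries over all $p, q$ gives $ab^TX + ba^TX$, in agreement with the formula.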

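Afterword

This post was tedious to write, and typesetting the proofs in KaTeX was downright painful. The next post will cover the properties of the matrix trace.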
