- 引言
- 场景介绍
- 样本均值与样本方差
- 样本均值与样本方差的矩阵表示
- 样本均值的矩阵表达
- 样本方差的矩阵表达
- 中心矩阵的性质
上一节介绍了维数灾难,随着维数增加对计算量与样本稀疏性造成的影响。本节将介绍降维的预备知识——对样本均值和样本方差进行矩阵表示。
场景介绍已知数据集合 X \mathcal X X由 N N N个样本构成: X = { x ( 1 ) , x ( 2 ) , ⋯ , x ( N ) } \mathcal X = \{x^{(1)},x^{(2)},\cdots,x^{(N)}\} X={x(1),x(2),⋯,x(N)} 任意样本 x ( i ) ( i = 1 , 2 , ⋯ , N ) x^{(i)}(i=1,2,\cdots,N) x(i)(i=1,2,⋯,N)均包含 p p p维特征: x ( i ) = ( x 1 ( i ) x 2 ( i ) ⋮ x p ( i ) ) x ( i ) ∈ R p , i = 1 , 2 , ⋯ , N x^{(i)} = \begin{pmatrix}x_1^{(i)} \\ x_2^{(i)} \\ \vdots \\ x_p^{(i)}\end{pmatrix} \quad x^{(i)} \in \mathbb R^p,i=1,2,\cdots,N x(i)=⎝ ⎛x1(i)x2(i)⋮xp(i)⎠ ⎞x(i)∈Rp,i=1,2,⋯,N 至此,数据集合 X \mathcal X X可表示为 N × p N \times p N×p的矩阵形式: X = ( x ( 1 ) , x ( 2 ) , ⋯ , x ( N ) ) T = ( x 1 ( 1 ) , x 2 ( 1 ) , ⋯ , x p ( 1 ) x 1 ( 2 ) , x 2 ( 2 ) , ⋯ , x p ( 2 ) ⋮ x 1 ( N ) , x 2 ( N ) , ⋯ , x p ( N ) ) N × p \mathcal X = \begin{pmatrix}x^{(1)},x^{(2)},\cdots,x^{(N)}\end{pmatrix}^{T} = \begin{pmatrix} x_1^{(1)},x_2^{(1)},\cdots,x_p^{(1)} \\ x_1^{(2)},x_2^{(2)},\cdots,x_p^{(2)} \\ \vdots \\ x_1^{(N)},x_2^{(N)},\cdots,x_p^{(N)}\end{pmatrix}_{N \times p} X=(x(1),x(2),⋯,x(N))T=⎝ ⎛x1(1),x2(1),⋯,xp(1)x1(2),x2(2),⋯,xp(2)⋮x1(N),x2(N),⋯,xp(N)⎠ ⎞N×p
样本均值与样本方差假设数据集合 X \mathcal X X是 一维数据集合:
- 其样本均值(Sample Mean) X ˉ \bar {\mathcal X} Xˉ表示具体如下: X ˉ = 1 N ∑ i = 1 N x ( i ) \bar {\mathcal X} = \frac{1}{N} \sum_{i=1}^{N} x^{(i)} Xˉ=N1i=1∑Nx(i)
- 样本方差(Sample Convariance) S \mathcal S S具体表示如下: S = 1 N ∑ i = 1 N ( x ( i ) − X ˉ ) 2 \mathcal S = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \bar{\mathcal X})^2 S=N1i=1∑N(x(i)−Xˉ)2
对应地,如果数据集合 X \mathcal X X中的样本 x ( i ) ( i = 1 , 2 , ⋯ , N ) x^{(i)}(i=1,2,\cdots,N) x(i)(i=1,2,⋯,N)是 p p p维向量。这样一个集合,它的 样本均值和样本方差如何表示呢?
-
样本均值: X ˉ = 1 N ∑ i = 1 N x ( i ) \bar {\mathcal X} = \frac{1}{N} \sum_{i=1}^{N} x^{(i)} Xˉ=N1i=1∑Nx(i)
-
样本方差: S = 1 N ∑ i = 1 N ( x ( i ) − X ˉ ) ( x ( i ) − X ˉ ) T \mathcal S = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \bar {\mathcal X})(x^{(i)} - \bar {\mathcal X})^{T} S=N1i=1∑N(x(i)−Xˉ)(x(i)−Xˉ)T
由于 x ( i ) x^{(i)} x(i)是一个 p p p维向量,因而样本均值 X ˉ \bar {\mathcal X} Xˉ同样是一个 p p p维向量形式: X ˉ = ( 1 N ∑ i = 1 N x 1 ( i ) 1 N ∑ i = 1 N x 2 ( i ) ⋮ 1 N ∑ i = 1 N x p ( i ) ) p × 1 \bar {\mathcal X} = \begin{pmatrix} \frac{1}{N} \sum_{i=1}^{N} x_1^{(i)} \\ \frac{1}{N} \sum_{i=1}^{N} x_2^{(i)} \\ \vdots \\ \frac{1}{N} \sum_{i=1}^{N} x_p^{(i)}\end{pmatrix}_{p \times 1} Xˉ=⎝ ⎛N1∑i=1Nx1(i)N1∑i=1Nx2(i)⋮N1∑i=1Nxp(i)⎠ ⎞p×1
对应的样本方差是一个 p × p p\times p p×p的方阵形式: S p × p = 1 N ∑ i = 1 N ( x ( i ) − X ˉ ) p × 1 ⋅ ( x ( i ) − X ˉ ) 1 × p T \mathcal S_{p \times p} = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \bar {\mathcal X})_{p \times 1} \cdot (x^{(i)} - \bar {\mathcal X})_{1 \times p}^{T} Sp×p=N1i=1∑N(x(i)−Xˉ)p×1⋅(x(i)−Xˉ)1×pT
样本均值与样本方差的矩阵表示 样本均值的矩阵表达将样本均值公式展开,写成如下向量乘法形式:
X
ˉ
=
1
N
∑
i
=
1
N
x
(
i
)
=
1
N
(
x
(
1
)
,
x
(
2
)
,
⋯
,
x
(
N
)
)
p
×
N
⋅
(
1
1
⋮
1
)
N
×
1
\bar {\mathcal X} = \frac{1}{N} \sum_{i=1}^N x^{(i)} = \frac{1}{N}\begin{pmatrix}x^{(1)},x^{(2)},\cdots,x^{(N)}\end{pmatrix}_{p \times N} \cdot \begin{pmatrix}1 \\ 1\\ \vdots \\1\end{pmatrix}_{N \times 1}
Xˉ=N1i=1∑Nx(i)=N1(x(1),x(2),⋯,x(N))p×N⋅⎝
⎛11⋮1⎠
⎞N×1 定义如下符号: 元素均为1的
N
N
N维列向量。
(
1
1
⋮
1
)
N
×
1
=
I
N
\begin{pmatrix}1 \\ 1\\ \vdots \\1\end{pmatrix}_{N \times 1} = \mathcal I_N
⎝
⎛11⋮1⎠
⎞N×1=IN 样本均值
X
ˉ
\bar {\mathcal X}
Xˉ可表示为如下形式:
X
ˉ
=
1
N
(
x
(
1
)
,
x
(
2
)
,
⋯
,
x
(
N
)
)
⋅
I
N
=
1
N
X
T
⋅
I
N
\begin{aligned} \bar{\mathcal X} & = \frac{1}{N} \begin{pmatrix}x^{(1)},x^{(2)},\cdots,x^{(N)}\end{pmatrix} \cdot \mathcal I_N \\ & = \frac{1}{N} \mathcal X^{T} \cdot \mathcal I_N \end{aligned}
Xˉ=N1(x(1),x(2),⋯,x(N))⋅IN=N1XT⋅IN
同样本均值,将样本方差公式进行展开:
S
=
1
N
∑
i
=
1
N
(
x
(
i
)
−
X
ˉ
)
(
x
(
i
)
−
X
ˉ
)
T
=
1
N
[
(
x
(
1
)
−
X
ˉ
)
(
x
(
1
)
−
X
ˉ
)
T
+
⋯
+
(
x
(
N
)
−
X
ˉ
)
(
x
(
N
)
−
X
ˉ
)
T
]
\begin{aligned} \mathcal S & = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \bar {\mathcal X})(x^{(i)} - \bar {\mathcal X})^{T} \\ & = \frac{1}{N} \left[ (x^{(1)} - \bar{\mathcal X})(x^{(1)} - \bar{\mathcal X})^{T} + \cdots + (x^{(N)} - \bar{\mathcal X})(x^{(N)} - \bar{\mathcal X})^{T}\right] \end{aligned}
S=N1i=1∑N(x(i)−Xˉ)(x(i)−Xˉ)T=N1[(x(1)−Xˉ)(x(1)−Xˉ)T+⋯+(x(N)−Xˉ)(x(N)−Xˉ)T] 将上述中括号中的元素表示为向量乘积的形式:
S
=
1
N
(
x
(
1
)
−
X
ˉ
,
⋯
,
x
(
N
)
−
X
ˉ
)
⋅
[
(
x
(
1
)
−
X
ˉ
)
T
(
x
(
2
)
−
X
ˉ
)
T
⋮
(
x
(
N
)
−
X
ˉ
)
T
]
\mathcal S = \frac{1}{N} \left(x^{(1)} - \bar {\mathcal X} , \cdots, x^{(N)} - \bar{\mathcal X}\right) \cdot \begin{bmatrix} \left(x^{(1)} - \bar {\mathcal X}\right)^{T} \\ \left(x^{(2)} - \bar {\mathcal X}\right)^{T} \\ \vdots \\ \left(x^{(N)} - \bar {\mathcal X}\right)^{T}\end{bmatrix}
S=N1(x(1)−Xˉ,⋯,x(N)−Xˉ)⋅⎣
⎡(x(1)−Xˉ)T(x(2)−Xˉ)T⋮(x(N)−Xˉ)T⎦
⎤ 其中,
(
x
(
1
)
−
X
ˉ
,
⋯
,
x
(
N
)
−
X
ˉ
)
\left(x^{(1)} - \bar {\mathcal X} , \cdots, x^{(N)} - \bar{\mathcal X}\right)
(x(1)−Xˉ,⋯,x(N)−Xˉ)可拆成两个行向量的差值:
(
x
(
1
)
−
X
ˉ
,
⋯
,
x
(
N
)
−
X
ˉ
)
=
(
x
(
1
)
,
⋯
,
x
(
N
)
)
−
(
X
ˉ
,
⋯
,
X
ˉ
)
=
X
T
−
X
ˉ
⋅
(
1
,
1
,
⋯
,
1
)
1
×
N
=
X
T
−
X
ˉ
⋅
I
N
T
\begin{aligned} \left(x^{(1)} - \bar {\mathcal X} , \cdots, x^{(N)} - \bar{\mathcal X}\right) & = \left(x^{(1)}, \cdots ,x^{(N)}\right) - \left(\bar {\mathcal X},\cdots, \bar {\mathcal X}\right) \\ & = \mathcal X^{T} - \bar {\mathcal X} \cdot (1,1, \cdots,1)_{1 \times N} \\ & = \mathcal X^{T} - \bar {\mathcal X} \cdot \mathcal I_N^{T} \end{aligned}
(x(1)−Xˉ,⋯,x(N)−Xˉ)=(x(1),⋯,x(N))−(Xˉ,⋯,Xˉ)=XT−Xˉ⋅(1,1,⋯,1)1×N=XT−Xˉ⋅INT 将样本均值
X
ˉ
\bar {\mathcal X}
Xˉ的矩阵表示结果代入上式,并将
X
T
\mathcal X^{T}
XT提出来: 注意该位置提取公因式的时候,
X
T
\mathcal X^{T}
XT对应的公因式结果是
E
N
\mathcal E_N
EN而不是单纯的1。其中
E
N
\mathcal E_N
EN是N维单位矩阵。即:
X
T
E
N
=
X
T
\mathcal X^{T}\mathcal E_N = \mathcal X^{T}
XTEN=XT
X T − 1 N X T ⋅ I N ⋅ I N T = X T ( E N − 1 N I N I N T ) \mathcal X^{T} - \frac{1}{N} \mathcal X^{T}\cdot \mathcal I_N \cdot \mathcal I_N^{T} = \mathcal X^{T} \left(\mathcal E_N - \frac{1}{N} \mathcal I_N\mathcal I_N^{T}\right) XT−N1XT⋅IN⋅INT=XT(EN−N1ININT)
由于 [ ( x ( 1 ) − X ˉ ) T ( x ( 2 ) − X ˉ ) T ⋮ ( x ( N ) − X ˉ ) T ] \begin{bmatrix} \left(x^{(1)} - \bar {\mathcal X}\right)^{T} \\ \left(x^{(2)} - \bar {\mathcal X}\right)^{T} \\ \vdots \\ \left(x^{(N)} - \bar {\mathcal X}\right)^{T}\end{bmatrix} ⎣ ⎡(x(1)−Xˉ)T(x(2)−Xˉ)T⋮(x(N)−Xˉ)T⎦ ⎤是 ( x ( 1 ) − X ˉ , ⋯ , x ( N ) − X ˉ ) \left(x^{(1)} - \bar {\mathcal X} , \cdots, x^{(N)} - \bar{\mathcal X}\right) (x(1)−Xˉ,⋯,x(N)−Xˉ)的转置形式,则有: [ ( x ( 1 ) − X ˉ ) T ( x ( 2 ) − X ˉ ) T ⋮ ( x ( N ) − X ˉ ) T ] = [ X T ( E N − 1 N I N I N T ) ] T = ( E N − 1 N I N I N T ) T ⋅ X \begin{bmatrix} \left(x^{(1)} - \bar {\mathcal X}\right)^{T} \\ \left(x^{(2)} - \bar {\mathcal X}\right)^{T} \\ \vdots \\ \left(x^{(N)} - \bar {\mathcal X}\right)^{T}\end{bmatrix} = \left[\mathcal X^{T} \left(\mathcal E_N - \frac{1}{N} \mathcal I_N\mathcal I_N^{T}\right)\right]^{T} = \left(\mathcal E_N - \frac{1}{N} \mathcal I_N\mathcal I_N^{T}\right)^{T} \cdot \mathcal X ⎣ ⎡(x(1)−Xˉ)T(x(2)−Xˉ)T⋮(x(N)−Xˉ)T⎦ ⎤=[XT(EN−N1ININT)]T=(EN−N1ININT)T⋅X
至此,样本方差 S \mathcal S S的矩阵结果表示如下: S = 1 N X T ( E N − 1 N I N I N T ) ⋅ ( E N − 1 N I N I N T ) T ⋅ X \mathcal S = \frac{1}{N} \mathcal X^{T} (\mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T}) \cdot (\mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T})^{T} \cdot \mathcal X S=N1XT(EN−N1ININT)⋅(EN−N1ININT)T⋅X
我们将 E N − 1 N I N I N T \mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T} EN−N1ININT称作中心矩阵(Centering Matrix),用 H N \mathcal H_N HN表示。至此,样本方差 S \mathcal S S的矩阵表示可化简为: S = 1 N X T H ⋅ H T X \mathcal S = \frac{1}{N} \mathcal X^{T} \mathcal H \cdot \mathcal H^{T} \mathcal X S=N1XTH⋅HTX
中心矩阵的性质中心矩阵的物理意义相当于对样本空间中的样本点进行平移,平移至样本空间的原点位置。
示例: 已知
K
\mathcal K
K是由两个2维样本组成的样本集合,具体表示如下:
K
=
(
1
2
3
4
)
\mathcal K = \begin{pmatrix} 1 \quad 2 \\ 3 \quad 4 \end{pmatrix}
K=(1234) 此时,数据集合中仅包含两个样本:
A
:
(
1
,
3
)
\mathcal A:(1,3)
A:(1,3),
B
:
(
2
,
4
)
\mathcal B:(2,4)
B:(2,4)。其对应的中心矩阵
H
2
\mathcal H_2
H2表示如下:
H
2
=
E
2
−
1
2
I
2
⋅
I
2
T
=
(
1
0
0
1
)
−
1
2
(
1
1
1
1
)
=
(
1
2
−
1
2
−
1
2
1
2
)
\begin{aligned} \mathcal H_2 & = \mathcal E_2 - \frac{1}{2} \mathcal I_2 \cdot \mathcal I_2^{T} \\ & = \begin{pmatrix}1 \quad 0 \\ 0 \quad 1\end{pmatrix} - \frac{1}{2}\begin{pmatrix}1 \quad 1 \\ 1 \quad 1\end{pmatrix} \\ & = \begin{pmatrix}\frac{1}{2} \quad -\frac{1}{2} \\ -\frac{1}{2} \quad \frac{1}{2}\end{pmatrix} \end{aligned}
H2=E2−21I2⋅I2T=(1001)−21(1111)=(21−21−2121) 此时,样本
K
\mathcal K
K左乘中心矩阵
H
2
\mathcal H_2
H2,具体结果如下:
K
⋅
H
2
=
(
−
1
2
1
2
−
1
2
1
2
)
\mathcal K \cdot \mathcal H_2 = \begin{pmatrix}-\frac{1}{2} \quad \frac{1}{2} \\ -\frac{1}{2} \quad \frac{1}{2}\end{pmatrix}
K⋅H2=(−2121−2121) 此时得到两个新的样本点:
A
′
:
(
−
1
2
,
−
1
2
)
\mathcal A':(-\frac{1}{2},-\frac{1}{2})
A′:(−21,−21),
B
′
:
(
1
2
,
1
2
)
\mathcal B':(\frac{1}{2},\frac{1}{2})
B′:(21,21)。原始样本点与新样本点在样本空间中的表示如下: 其中橙色点是原始样本点;蓝色点是左乘中心矩阵后的新样本点。可以观察得到相比于原始样本点,新样本点向原点位置偏移,但样本点之间的相对位置没有变化。
观察中心矩阵 H \mathcal H H: H N × N = E N − 1 N I N I N T \mathcal H_{N \times N} = \mathcal E_N - \frac{1}{N} \mathcal I_N\mathcal I_N^{T} HN×N=EN−N1ININT
-
中心矩阵的转置 H T \mathcal H^{T} HT:
由于
E \mathcal E E是单位矩阵,因此有:
E T = E \mathcal E^{T} = \mathcal E ET=E H T = ( E N − 1 N I N I N T ) T = ( E N ) T − 1 N ⋅ ( I N T ) T ⋅ I N T = E N − 1 N I N I N T = H \begin{aligned} \mathcal H^{T} & = (\mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T})^{T} \\ & = (\mathcal E_N)^{T} - \frac{1}{N} \cdot (\mathcal I_N^{T})^{T} \cdot \mathcal I_N^{T} \\ & = \mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T} \\ & = \mathcal H \end{aligned} HT=(EN−N1ININT)T=(EN)T−N1⋅(INT)T⋅INT=EN−N1ININT=H 由此可知,中心矩阵 H \mathcal H H是一个对称矩阵; -
中心矩阵的平方 H 2 \mathcal H^2 H2: H 2 = H ⋅ H = ( E N − 1 N I N I N T ) ⋅ ( E N − 1 N I N I N T ) = E N − 2 N I N I N T + 1 N 2 I N I N T ⋅ I N I N T \begin{aligned} \mathcal H^2 & = \mathcal H \cdot \mathcal H \\ & = (\mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T})\cdot (\mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T}) \\ & = \mathcal E_N - \frac{2}{N}\mathcal I_N\mathcal I_N^{T} + \frac{1}{N^2} \mathcal I_N\mathcal I_N^{T}\cdot \mathcal I_N\mathcal I_N^{T} \end{aligned} H2=H⋅H=(EN−N1ININT)⋅(EN−N1ININT)=EN−N2ININT+N21ININT⋅ININT 因为 I N I N T \mathcal I_N\mathcal I_N^{T} ININT它的结果是一个各元素均为1的 N × N N\times N N×N的方阵,因而 I N I N T ⋅ I N I N T \mathcal I_N\mathcal I_N^{T}\cdot \mathcal I_N\mathcal I_N^{T} ININT⋅ININT结果是一个各元素均为 N N N的 N × N N \times N N×N的方阵。即: I N I N T ⋅ I N I N T = N ⋅ I N I N T \mathcal I_N\mathcal I_N^{T}\cdot \mathcal I_N\mathcal I_N^{T} = N \cdot \mathcal I_N\mathcal I_N^{T} ININT⋅ININT=N⋅ININT 将该结果带入上式: H 2 = E N − 2 N I N I N T + 1 N I N I N T = E N − 1 N I N I N T = H \begin{aligned} \mathcal H^2 & = \mathcal E_N - \frac{2}{N}\mathcal I_N\mathcal I_N^{T} + \frac{1}{N}\mathcal I_N\mathcal I_N^{T} \\ & = \mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T} \\ & = \mathcal H \end{aligned} H2=EN−N2ININT+N1ININT=EN−N1ININT=H
综上,因而有: H 2 = H ⋅ H = H T ⋅ H = H ⋅ H T = H \mathcal H^2 = \mathcal H \cdot \mathcal H = \mathcal H^{T} \cdot \mathcal H = \mathcal H \cdot \mathcal H^{T} = \mathcal H H2=H⋅H=HT⋅H=H⋅HT=H
回顾样本方差的矩阵表示,可以继续化简为如下形式: S = 1 N X T H ⋅ H T X = 1 N X T H ⋅ X \begin{aligned} \mathcal S & = \frac{1}{N} \mathcal X^{T} \mathcal H \cdot \mathcal H^{T}\mathcal X \\ & = \frac{1}{N} \mathcal X^{T} \mathcal H \cdot \mathcal X \end{aligned} S=N1XTH⋅HTX=N1XTH⋅X
至此,我们介绍了样本均值与样本方差的矩阵表达,以及中心矩阵的相关性质。下一节将介绍PCA降维。
相关参考: 机器学习-降维2-样本均值&样本方差的矩阵表示