Table of Contents
- Biased and Unbiased Estimators
- Introduction
- Misconceptions about Biased and Unbiased Estimators
- Biased and Unbiased Estimation of Gaussian Distribution Parameters
- Review
- Derivation
- Why $\sigma_{MLE}^2$ is Biased
Biased and Unbiased Estimators

Introduction

As introduced in 机器学习笔记——极大似然估计与最大后验概率估计, a sample set $\mathcal X = \{ x^{(1)}, x^{(2)}, \dots, x^{(N)} \}$ containing $N$ samples can be understood as a collection of samples $x^{(i)}$ generated from a probability model $P(\mathcal X \mid \theta)$ with some fixed parameter $\theta$.

Once the probability model $P(\mathcal X \mid \theta)$ is determined, its samples can never be exhausted (it can generate samples without limit). Conversely, if we want to estimate the parameter $\theta$ of $P(\mathcal X \mid \theta)$, we can only rely on a finite set of samples. Biased and unbiased estimators are defined accordingly:

Given an estimate $\hat \theta$ obtained from finite samples, ask whether there is any systematic error between it and the true parameter $\theta$ of the model $P(\mathcal X \mid \theta)$, i.e., whether the expectation of the estimate, $\mathbb E[\hat \theta]$, equals the true parameter $\theta$:
- If the two are equal, $\hat \theta$ is called an unbiased estimator of $\theta$; we also say $\hat\theta$ is unbiased: $\mathbb E[\hat\theta] = \theta$
- Otherwise, it is called a biased estimator: $\mathbb E[\hat\theta] \neq \theta$
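Equivalently, the comparison can be packaged into a single bias term; this standard formulation (stated here for convenience, not spelled out in the original) is:
$$\text{Bias}(\hat\theta) = \mathbb E[\hat\theta] - \theta, \qquad \text{Bias}(\hat\theta) = 0 \iff \hat\theta \text{ is unbiased}$$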
Unbiasedness essentially means that the estimate of a statistic carries no systematic error. The error of a statistical inference consists of two kinds, systematic error and random error:
- No matter what method we use, the resulting $\hat \theta$ always differs from the true parameter $\theta$ of $P(\mathcal X \mid \theta)$. Since the samples can never be exhausted, this error cannot be eliminated.
- But if these deviations average out to zero in expectation, the estimator carries only random error and no systematic error.

In other words, $\hat \theta$ and $\mathbb E[\hat \theta]$ are two different things.
Example: given a data set $\mathcal X = \left\{x^{(1)}, x^{(2)}, \dots, x^{(N)} \right\}$ of $N$ samples, each drawn independently from a uniform distribution on $(0, 100)$. First compute the sample mean $\theta$ of the data set $\mathcal X$:
$$\theta = \frac{1}{N} \sum_{i=1}^N x^{(i)}$$
Next, repeatedly draw a batch of samples $X$ from $\mathcal X$ (returning them to the pool after each draw) and take the mean of each batch as $\hat\theta$. As a random variable, $\hat\theta$ accumulates more realizations as the number of draws grows. Finally, take the expectation $\mathbb E[\hat\theta]$ of these realizations and compare $\mathbb E[\hat\theta]$ with $\theta$. The code is as follows:

Since the samples are independent and identically distributed, the expectation is replaced by the mean; the mean is computed via incremental updates, so the evolution of the expectation can be observed as sampling proceeds.
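For reference, the incremental update used in the loop below is the standard streaming-average recurrence (my notation, with $s_k$ denoting the mean of the $k$-th batch):
$$u_k = u_{k-1} + \frac{1}{k}\left(s_k - u_{k-1}\right), \qquad u_0 = 0$$
After $k$ batches, $u_k$ equals the plain average of $s_1, \dots, s_k$, without storing all previous batch means.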
```python
import random
import matplotlib.pyplot as plt

random.seed(1)

# Data set: 10000 samples drawn uniformly from (0, 100), rounded to 2 decimals.
a = [round(random.uniform(0, 100), 2) for _ in range(10000)]
mean_a = round(sum(a) / len(a), 4)   # sample mean of the whole data set

plt.figure(figsize=(15, 4))
sampling_times = 10000
# Blue horizontal line: the sample mean of the data set.
plt.plot(list(range(sampling_times)), [mean_a] * sampling_times)

sampling_num = 88   # size of each batch drawn from the data set
u = 0               # running mean over all batch means seen so far
k = 1               # number of batches processed
for i in range(sampling_times):
    s = random.sample(a, sampling_num)     # draw one batch (the pool is restored each iteration)
    sampling_res = sum(s) / len(s)         # batch mean: one realization of theta-hat
    u = u + (1 / k) * (sampling_res - u)   # incremental update of the running mean
    k += 1
    plt.scatter(i, u, c="#ff7f0e", s=2)    # orange dot: current estimate of E[theta-hat]
plt.show()
```
The resulting figure is as follows: the blue line is the sample mean of the data set, and the orange points show how the expectation of the random variable $X$ evolves as the number of draws of $X$ increases. Observing the figure, the expectation is unstable at first, then stabilizes as the number of draws grows and converges to the mean of the data set $\mathcal X$.
Misconceptions about Biased and Unbiased Estimators

The following common claims are all misconceptions:
- an unbiased estimator always gives a correct, error-free estimate;
- an unbiased estimator always exists;
- an unbiased estimator is always better than a biased one;
- an unbiased estimator is always a good estimator.

Unbiasedness is indeed a desirable property of an estimator, but in practice its actual value must be judged problem by problem.
Biased and Unbiased Estimation of Gaussian Distribution Parameters

Review

The previous section introduced using maximum likelihood estimation to compute the optimal parameters of a Gaussian distribution. Take the one-dimensional Gaussian as an example: the samples $x^{(i)} \ (i=1,2,\dots,N)$ of a data set $\mathcal X$ containing $N$ samples follow a one-dimensional Gaussian distribution and are mutually independent (where $\mu, \sigma$ are the mean and standard deviation of the distribution):
$$x^{(i)} \overset{\text{iid}}{\sim} \mathcal N(\mu,\sigma^2)$$
Since the samples are independent and identically distributed, each sample satisfies
$$\mathbb E[x^{(i)}] = \mu$$
Viewing $\mathcal X$ as a sample set generated from the probability model $P(\mathcal X;\theta)$ with parameter $\theta$, we have
$$\theta = (\mu,\sigma)$$
Maximum likelihood estimation, $\theta_{MLE} = \mathop{\arg\max}\limits_{\theta} \log P(\mathcal X;\theta)$, yields the optimal parameters $\mu_{MLE}, \sigma_{MLE}^2$:
$$\begin{aligned} \mu_{MLE} & = \frac{1}{N}\sum_{i=1}^N x^{(i)} \\ \sigma_{MLE}^2 & = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \mu_{MLE})^2 \end{aligned}$$
Below we examine whether $\mu_{MLE}$ and $\sigma_{MLE}^2$ are biased or unbiased estimators.
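To make the two estimators concrete, here is a minimal sketch of computing them (using numpy, which the original post does not use; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 5.0, 2.0, 1000
x = rng.normal(mu, sigma, size=N)         # N iid samples from N(mu, sigma^2)

mu_mle = x.mean()                         # mu_MLE = (1/N) * sum(x_i)
sigma2_mle = ((x - mu_mle) ** 2).mean()   # sigma^2_MLE = (1/N) * sum((x_i - mu_MLE)^2)
print(mu_mle, sigma2_mle)                 # close to 5.0 and 4.0 at this sample size
```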
Derivation

First examine whether $\mu_{MLE}$ is an unbiased estimator. Based on the description of unbiasedness above, it suffices to show:
$$\mathbb E[\mu_{MLE}] = \mu$$
The proof is as follows (on moving the expectation inside the formula, see 数学期望的性质):
$$\begin{aligned} \mathbb E[\mu_{MLE}] & = \mathbb E \left[\frac{1}{N} \sum_{i=1}^N x^{(i)} \right] \\ & = \frac{1}{N} \sum_{i=1}^N \mathbb E[x^{(i)}] \end{aligned}$$
Since $\mathbb E[x^{(i)}] = \mu$, and $\mu$ is a parameter of the probability model $P(\mathcal X;\theta)$ and therefore does not depend on $i$, this simplifies further:
$$\begin{aligned} & = \frac{1}{N} \sum_{i=1}^N \mu \\ & = \frac{1}{N} \cdot N \cdot \mu \\ & = \mu \end{aligned}$$
Thus $\mathbb E[\mu_{MLE}] = \mu$, which proves that the maximum likelihood estimate of the parameter $\mu$ is unbiased.
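As a quick numerical sanity check (a sketch of my own, assuming numpy), averaging $\mu_{MLE}$ over many independently generated data sets should land near the true $\mu$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, N, trials = 5.0, 2.0, 50, 100_000

# One mu_MLE per trial: the mean of each row of a (trials x N) sample matrix.
mu_mle = rng.normal(mu, sigma, size=(trials, N)).mean(axis=1)
print(mu_mle.mean())   # approximately 5.0, consistent with E[mu_MLE] = mu
```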
Next, examine whether $\sigma_{MLE}^2$ is unbiased, i.e. whether:
$$\mathbb E[\sigma_{MLE}^2] \overset{\text{?}}{=} \sigma^2$$
The derivation is as follows. First transform $\sigma_{MLE}^2$:
$$\begin{aligned} \sigma_{MLE}^2 & = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \mu_{MLE})^2 \\ & = \frac{1}{N} \sum_{i=1}^N \left\{ [x^{(i)}]^2 - 2 \cdot x^{(i)} \cdot \mu_{MLE} + \mu_{MLE}^2 \right\} \\ & = \frac{1}{N}\sum_{i=1}^N [x^{(i)}]^2 - \frac{1}{N} \sum_{i=1}^N 2 \cdot x^{(i)} \cdot \mu_{MLE} + \frac{1}{N} \sum_{i=1}^N \mu_{MLE}^2 \end{aligned}$$
- The second term: $\mu_{MLE}$ likewise does not depend on $i$, so
$$\begin{aligned} \frac{1}{N} \sum_{i=1}^N 2 \cdot x^{(i)} \cdot \mu_{MLE} & = 2 \cdot \left(\frac{1}{N} \sum_{i=1}^N x^{(i)} \right) \cdot \mu_{MLE} \\ & = 2 \cdot \mu_{MLE} \cdot \mu_{MLE} \\ & = 2 \cdot \mu_{MLE}^2 \end{aligned}$$
- The third term:
$$\frac{1}{N} \sum_{i=1}^N \mu_{MLE}^2 = \frac{1}{N} \cdot N \cdot \mu_{MLE}^2 = \mu_{MLE}^2$$
- Combining the three terms (this identity is checked numerically in the sketch right after this list):
$$\begin{aligned} \sigma_{MLE}^2 & = \frac{1}{N}\sum_{i=1}^N [x^{(i)}]^2 - 2 \cdot \mu_{MLE}^2 + \mu_{MLE}^2 \\ & = \frac{1}{N}\sum_{i=1}^N [x^{(i)}]^2 - \mu_{MLE}^2 \end{aligned}$$
Based on this derivation, transform $\mathbb E[\sigma_{MLE}^2]$ as follows (a small trick is used here: add and subtract $\mu^2$):
$$\begin{aligned} \mathbb E[\sigma_{MLE}^2] & = \mathbb E \left\{ \frac{1}{N}\sum_{i=1}^N [x^{(i)}]^2 - \mu_{MLE}^2 \right\} \\ & = \mathbb E \left\{ \frac{1}{N}\sum_{i=1}^N [x^{(i)}]^2 - \mu^2 + \mu^2 - \mu_{MLE}^2 \right\} \\ & = \mathbb E \left\{ \left[\frac{1}{N}\sum_{i=1}^N (x^{(i)})^2 - \mu^2 \right] - (\mu_{MLE}^2 - \mu^2) \right\} \\ & = \mathbb E \left[ \frac{1}{N}\sum_{i=1}^N (x^{(i)})^2 - \mu^2 \right] - \mathbb E[\mu_{MLE}^2 - \mu^2] \end{aligned}$$
- The first term: $\mu^2$ is a constant, so $\mathbb E[\mu^2] = \mu^2$; note that
$$\mathbb E \left\{[x^{(i)}]^2 \right\} - \mu^2 = \mathbb E \left\{[x^{(i)}]^2 \right\} - (\mathbb E[x^{(i)}])^2 = \text{Var}(x^{(i)}),$$
the variance of $x^{(i)}$. Hence
$$\begin{aligned} \mathbb E \left[ \frac{1}{N}\sum_{i=1}^N (x^{(i)})^2 - \mu^2 \right] & = \frac{1}{N}\sum_{i=1}^N \left\{\mathbb E \left[(x^{(i)})^2\right] - \mathbb E[\mu^2] \right\} \\ & = \frac{1}{N}\sum_{i=1}^N \left\{\mathbb E \left[(x^{(i)})^2\right] - \mu^2 \right\} \\ & = \frac{1}{N}\sum_{i=1}^N \text{Var}[x^{(i)}] \\ & = \sigma^2 \end{aligned}$$
- The second term: by the unbiasedness shown above, $\mathbb E[\mu_{MLE}] = \mu$, so $\mu$ can be replaced by $\mathbb E[\mu_{MLE}]$:
$$\begin{aligned} \mathbb E[\mu_{MLE}^2 - \mu^2] & = \mathbb E[\mu_{MLE}^2] - \mathbb E[\mu^2] \\ & = \mathbb E[\mu_{MLE}^2] - \mu^2 \\ & = \mathbb E[\mu_{MLE}^2] - (\mathbb E[\mu_{MLE}])^2 \\ & = \text{Var}(\mu_{MLE}) \end{aligned}$$
So how should $\text{Var}(\mu_{MLE})$ be understood? Expand it, using the scaling (coefficient) property of variance:
$$\begin{aligned} \text{Var} \left[\frac{1}{N}\sum_{i=1}^N x^{(i)} \right] & = \mathbb E \left[ \left(\frac{1}{N}\sum_{i=1}^N x^{(i)} \right)^2 \right] - \left(\mathbb E \left[\frac{1}{N}\sum_{i=1}^N x^{(i)} \right]\right)^2 \\ & = \frac{1}{N^2} \mathbb E \left[ \left(\sum_{i=1}^N x^{(i)} \right)^2 \right] - \left(\frac{1}{N} \mathbb E \left[\sum_{i=1}^N x^{(i)} \right] \right)^2 \\ & = \frac{1}{N^2} \mathbb E \left[\left(\sum_{i=1}^N x^{(i)}\right)^2\right] - \frac{1}{N^2} \left(\mathbb E \left[\sum_{i=1}^N x^{(i)}\right]\right)^2 \\ & = \frac{1}{N^2} \left[\mathbb E \left[\left(\sum_{i=1}^N x^{(i)}\right)^2\right] - \left(\mathbb E \left[\sum_{i=1}^N x^{(i)}\right]\right)^2\right] \\ & = \frac{1}{N^2} \text{Var} \left(\sum_{i=1}^N x^{(i)} \right) \end{aligned}$$
Next we use the additivity of variance: if the random variables are mutually independent (so that $\mathbb E[XY] = \mathbb E[X]\,\mathbb E[Y]$), then:
$$\begin{aligned} \text{Var}(X+Y) & = \mathbb E[(X+Y)^2] - [\mathbb E(X+Y)]^2 \\ & = \mathbb E[X^2 + 2XY + Y^2] - (\mathbb E[X]+\mathbb E[Y])^2 \\ & = \mathbb E[X^2] + 2\cdot\mathbb E[X]\mathbb E[Y] + \mathbb E[Y^2] - \left((\mathbb E[X])^2 + (\mathbb E[Y])^2 + 2\cdot\mathbb E[X]\mathbb E[Y]\right) \\ & = \left(\mathbb E[X^2] - (\mathbb E[X])^2\right) + \left(\mathbb E[Y^2] - (\mathbb E[Y])^2\right) \\ & = \text{Var}(X) + \text{Var}(Y) \end{aligned}$$
Therefore:
$$\frac{1}{N^2} \text{Var} \left[\sum_{i=1}^N x^{(i)} \right] = \frac{1}{N^2}\sum_{i=1}^N \text{Var}[x^{(i)}]$$
Returning to the original expression, $\text{Var}(\mu_{MLE})$:
$$\begin{aligned} \text{Var}[\mu_{MLE}] & = \text{Var} \left[ \frac{1}{N}\sum_{i=1}^N x^{(i)} \right] \\ & = \frac{1}{N^2}\sum_{i=1}^N \text{Var}(x^{(i)}) \\ & = \frac{1}{N^2} \cdot N \cdot \sigma^2 \\ & = \frac{1}{N}\sigma^2 \end{aligned}$$
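The $\sigma^2/N$ shrinkage is easy to confirm empirically (a sketch of my own, assuming numpy): the variance of $\mu_{MLE}$ across many independent data sets should be about $\sigma^2/N$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, N, trials = 0.0, 3.0, 25, 200_000

# One mu_MLE per trial; its spread across trials estimates Var(mu_MLE).
mu_mle = rng.normal(mu, sigma, size=(trials, N)).mean(axis=1)
print(mu_mle.var())   # approximately sigma^2 / N = 9 / 25 = 0.36
```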
A personal aside on how to read $\begin{aligned}\frac{1}{N}\sum_{i=1}^N \text{Var}[x^{(i)}]\end{aligned}$: here $\text{Var}[x^{(i)}]$ denotes the variance of the random variable $x^{(i)}$, not the squared deviation $(x^{(i)} - \bar{x})^2$ of a single observed value. Because the samples are independent and identically distributed, every $x^{(i)}$ has the same variance $\sigma^2$, so the direct computation (as used in the video) is correct:
$$\begin{aligned} \frac{1}{N}\sum_{i=1}^N \text{Var}[x^{(i)}] & = \frac{1}{N}\cdot N \cdot \sigma^2 \\ & = \sigma^2 \end{aligned}$$
Back to the main thread.
Therefore, the original expression = first term − second term:
$$\sigma^2 - \frac{1}{N} \sigma^2 = \frac{N-1}{N} \sigma^2$$
In summary:
$$\mathbb E[\sigma_{MLE}^2] = \frac{N-1}{N} \sigma^2$$
We find that the expectation of $\sigma_{MLE}^2$ obtained by maximum likelihood estimation does not equal $\sigma^2$, so $\sigma_{MLE}^2$ is a biased estimator. What, then, is the actual unbiased estimator of $\sigma^2$?
$$\hat \sigma^2 = \frac{1}{N-1} \sum_{i=1}^N (x^{(i)} - \mu_{MLE})^2$$
Since $\hat \sigma^2 = \frac{N}{N-1} \sigma_{MLE}^2$, applying the result above for $\sigma_{MLE}^2$ gives:
$$\mathbb E[\hat \sigma^2] = \frac{N}{N-1} \cdot \mathbb E[\sigma_{MLE}^2] = \frac{N}{N-1} \cdot \frac{N-1}{N} \sigma^2 = \sigma^2$$
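Both results can be checked empirically (a sketch of my own, assuming numpy; np.var with ddof=0 is the $1/N$ form and ddof=1 the $1/(N-1)$ form):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, N, trials = 0.0, 2.0, 10, 200_000

x = rng.normal(mu, sigma, size=(trials, N))
print(x.var(axis=1, ddof=0).mean())   # about (N-1)/N * sigma^2 = 3.6
print(x.var(axis=1, ddof=1).mean())   # about sigma^2 = 4.0
```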
Why $\sigma_{MLE}^2$ is Biased

Recall the estimate of $\sigma_{MLE}^2$ under maximum likelihood estimation:
$$\sigma_{MLE}^2 = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \mu_{MLE})^2$$
By contrast, the corresponding quantity computed with the true mean $\mu$,
$$\tilde \sigma^2 = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \mu)^2,$$
satisfies $\mathbb E[\tilde \sigma^2] = \sigma^2$ and is unbiased. Compare $\mu$ and $\mu_{MLE}$:
- $\mu$ is a parameter describing a particular Gaussian distribution; once the distribution is fixed, the value of $\mu$ objectively exists and is determined;
- $\mu_{MLE}$ is the estimate of $\mu$ produced by maximum likelihood estimation, i.e. $\mu_{MLE} \approx \mu$.
As long as $\mu_{MLE}$ and $\mu$ differ, $\sigma_{MLE}^2$ is biased. One point worth noting: unbiasedness means the expectation of the estimate is unbiased, while any actual estimate still deviates from the truth. In theory, to obtain $\mu$ itself we would have to take out every sample of the probability model and estimate from all of them; but if we really could take out all the samples, it would no longer be called estimation, it would be certainty.
As mentioned above, once a probability model is determined, its samples can never be exhausted, so we can only work in the other direction and estimate the model from finite samples.
For this reason, nowhere in the derivations above was $\mu_{MLE}$ directly replaced by $\mu$; everything was carried out via $\mathbb E[\mu_{MLE}] = \mu$.
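The role of $\mu$ versus $\mu_{MLE}$ can also be seen numerically (a sketch of my own, assuming numpy): centering on the true mean keeps the $1/N$ estimator unbiased, while centering on the fitted mean undershoots by the factor $(N-1)/N$:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, N, trials = 0.0, 2.0, 10, 200_000

x = rng.normal(mu, sigma, size=(trials, N))
centered_true = ((x - mu) ** 2).mean(axis=1)                             # uses mu
centered_mle = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)   # uses mu_MLE
print(centered_true.mean())   # about sigma^2 = 4.0
print(centered_mle.mean())    # about (N-1)/N * sigma^2 = 3.6
```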
References:
无偏估计 - 百度百科
机器学习-白板推导系列(二)-数学基础