According to the loopy BP algorithm, the procedure actually computes two message operators:

$$\Delta_{i \rightarrow j}\left(t, x_{j}\right)=\mathrm{const}+\max_{\mathbf{x}\backslash x_j}\left[ f_{\text{out}}\left(z_{i}, y_{i}\right)+\sum_{r \neq j} \Delta_{i \leftarrow r}\left(t, x_{r}\right)\right] \tag{1}$$

and

$$\Delta_{i \leftarrow j}\left(t+1, x_{j}\right)=\mathrm{const}+f_{\mathrm{in}}\left(x_{j}, q_{j}\right)+\sum_{\ell \neq i} \Delta_{\ell \rightarrow j}\left(t, x_{j}\right) \tag{2}$$
Next, these computations are simplified through Gaussian and quadratic approximations.
Define:

$$\begin{aligned} \widehat{x}_{j}(t) &:=\underset{x_{j}}{\arg \max }\, \Delta_{j}\left(t, x_{j}\right) \\ \widehat{x}_{i \leftarrow j}(t) &:=\underset{x_{j}}{\arg \max }\, \Delta_{i \leftarrow j}\left(t, x_{j}\right) \\ \frac{1}{\tau_{j}^{x}(t)} &:=-\left.\frac{\partial^{2}}{\partial x_{j}^{2}} \Delta_{j}\left(t, x_{j}\right)\right|_{x_{j}=\widehat{x}_{j}(t)} \\ \frac{1}{\tau_{i \leftarrow j}^{x}(t)} &:=-\left.\frac{\partial^{2}}{\partial x_{j}^{2}} \Delta_{i \leftarrow j}\left(t, x_{j}\right)\right|_{x_{j}=\widehat{x}_{i \leftarrow j}(t)} . \end{aligned}$$

Then, Taylor-expanding the incoming messages of form (2) around $\widehat{x}_{i \leftarrow r}(t)$ gives:

$$\Delta_{i \leftarrow r}\left(t, x_{r}\right) \approx \Delta_{i \leftarrow r}\left(t, \widehat{x}_{i \leftarrow r}(t)\right)-\frac{1}{2 \tau_{r}^{x}(t)}\left(x_{r}-\widehat{x}_{i \leftarrow r}(t)\right)^{2} \tag{3}$$

Note that $\widehat{x}_{i \leftarrow r}(t)$ was obtained at iteration $t$, so it is a constant when we derive iteration $t+1$.
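Expansion (3) replaces each incoming message by a quadratic centered at its maximizer. As a quick numerical illustration (the toy message below is an arbitrary choice of mine, not from the derivation):

```python
import numpy as np

# Toy message Delta(x) = -cosh(x - 1): maximizer x_hat = 1, curvature 1/tau = cosh(0) = 1.
delta = lambda x: -np.cosh(x - 1.0)
x_hat, inv_tau = 1.0, 1.0
quad = lambda x: delta(x_hat) - 0.5 * inv_tau * (x - x_hat) ** 2

# Near the maximizer the quadratic surrogate is accurate to fourth order in the offset.
for dx in (0.01, 0.1):
    assert abs(delta(x_hat + dx) - quad(x_hat + dx)) < dx**4
```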
Substituting (3) into (1) yields:
$$\Delta_{i \rightarrow j}\left(t, x_{j}\right) = \max _{z_{i}}\left[f_{\text{out}}\left(z_{i}, y_{i}\right)-\frac{1}{2 \tau_{i \rightarrow j}^{p}(t)}\left(z_{i}-\widehat{p}_{i \rightarrow j}(t)-a_{i j} x_{j}\right)^{2}\right] +\mathrm{const} \tag{4}$$
where

$$\widehat{p}_{i \rightarrow j}(t) :=\sum_{r \neq j} a_{i r} \widehat{x}_{i \leftarrow r}(t), \qquad \tau_{i \rightarrow j}^{p}(t) :=\sum_{r \neq j}\left|a_{i r}\right|^{2} \tau_{r}^{x}(t)$$
To derive (4): the maximization in (1) is an optimization problem subject to the constraint $z_{i}=a_{i j} x_{j}+\sum_{r \neq j} a_{i r} x_{r}$, whose closed-form solution can be found via Lagrange multipliers, yielding (4).
We define:

$$H\left(\widehat{p}, y, \tau^{p}\right):=\max _{z}\left[f_{\text{out}}(z, y)-\frac{1}{2 \tau^{p}}(z-\widehat{p})^{2}\right]$$

so that (4) can be rewritten as:

$$\Delta_{i \rightarrow j}\left(t, x_{j}\right) =H\left(\widehat{p}_{i \rightarrow j}(t)+a_{i j} x_{j},\, y_{i},\, \tau_{i \rightarrow j}^{p}(t)\right)+\mathrm{const} \tag{5}$$
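To make $H$ concrete, consider an AWGN output channel with $f_{\text{out}}(z,y)=-(y-z)^2/(2\sigma^2)$; this channel is an illustrative assumption of mine, not part of the derivation above. The inner maximization is then quadratic in $z$ and has a closed form, which a brute-force grid search confirms:

```python
import numpy as np

def H_awgn(p_hat, y, tau_p, sigma2):
    """H(p_hat, y, tau_p) = max_z [ -(y-z)^2/(2 sigma2) - (z - p_hat)^2/(2 tau_p) ].
    For a quadratic f_out the maximizer is a precision-weighted average."""
    z_star = (tau_p * y + sigma2 * p_hat) / (tau_p + sigma2)
    val = -(y - z_star) ** 2 / (2 * sigma2) - (z_star - p_hat) ** 2 / (2 * tau_p)
    return val, z_star

# Numerical check: grid maximization agrees with the closed form.
p_hat, y, tau_p, sigma2 = 0.3, 1.2, 0.5, 0.8
z = np.linspace(-5, 5, 200001)
obj = -(y - z) ** 2 / (2 * sigma2) - (z - p_hat) ** 2 / (2 * tau_p)
val, z_star = H_awgn(p_hat, y, tau_p, sigma2)
assert abs(z[np.argmax(obj)] - z_star) < 1e-3
assert abs(obj.max() - val) < 1e-6
```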
Next define $\widehat{p}_{i}(t):=\sum_{j} a_{i j} \widehat{x}_{i \leftarrow j}(t)$ and $\tau_{i}^{p}(t):=\sum_{j}\left|a_{i j}\right|^{2} \tau_{j}^{x}(t)$, so that

$$\begin{aligned} \widehat{p}_{i \rightarrow j}(t) &=\widehat{p}_{i}(t)-a_{i j} \widehat{x}_{i \leftarrow j}(t) \\ \tau_{i \rightarrow j}^{p}(t) &=\tau_{i}^{p}(t)-a_{i j}^{2} \tau_{j}^{x}(t) \end{aligned}$$

Then (5) can be further written as:

$$\Delta_{i \rightarrow j}\left(t, x_{j}\right) \approx H\left(\widehat{p}_{i}(t)+a_{i j}\left(x_{j}-\widehat{x}_{j}(t)\right),\, y_{i},\, \tau_{i}^{p}(t)\right)+\mathrm{const}$$

Here we have dropped terms of order $O(a_{i j}^{2})$, which in effect sets $\tau_{i \rightarrow j}^{p}(t)=\tau_{i}^{p}(t)$.
Further define:

$$\begin{aligned} \widehat{s}_{i}(t) &:=g_{\text{out}}\left(t, \widehat{p}_{i}(t), y_{i}, \tau_{i}^{p}(t)\right) \\ \tau_{i}^{s}(t) &:=-\frac{\partial}{\partial \widehat{p}} g_{\text{out}}\left(t, \widehat{p}_{i}(t), y_{i}, \tau_{i}^{p}(t)\right) \end{aligned} \tag{6}$$

where

$$g_{\text{out}}\left(\widehat{p}, y, \tau^{p}\right):=\frac{\partial}{\partial \widehat{p}} H\left(\widehat{p}, y, \tau^{p}\right)$$

It is now clear that $\widehat{s}_i(t)$ and $\tau_{i}^{s}(t)$ in (6) are the first derivative and the negated second derivative of $H$ at $\widehat{p}_i(t)$, so we can Taylor-expand once more:
$$\begin{aligned} \Delta_{i \rightarrow j}\left(t, x_{j}\right) &\approx \mathrm{const} +\widehat{s}_{i}(t) a_{i j}\left(x_{j}-\widehat{x}_{j}(t)\right)-\frac{\tau_{i}^{s}(t)}{2} a_{i j}^{2}\left(x_{j}-\widehat{x}_{j}(t)\right)^{2} \\ &= \mathrm{const}+\left[\widehat{s}_{i}(t) a_{i j}+a_{i j}^{2} \tau_{i}^{s}(t) \widehat{x}_{j}(t)\right] x_{j} -\frac{\tau_{i}^{s}(t)}{2} a_{i j}^{2} x_{j}^{2} \end{aligned} \tag{7}$$
Now, substituting (7) into (2), (2) can also be approximated as:

$$\Delta_{i \leftarrow j}\left(t+1, x_{j}\right) \approx \mathrm{const} +f_{\mathrm{in}}\left(x_{j}, q_{j}\right)-\frac{1}{2 \tau_{i \leftarrow j}^{r}(t)}\left(\widehat{r}_{i \leftarrow j}(t)-x_{j}\right)^{2}$$

where

$$\frac{1}{\tau_{i \leftarrow j}^{r}(t)}=\sum_{\ell \neq i} a_{\ell j}^{2} \tau_{\ell}^{s}(t)$$

and

$$\begin{aligned} \widehat{r}_{i \leftarrow j}(t) &=\tau_{i \leftarrow j}^{r}(t) \sum_{\ell \neq i}\left[\widehat{s}_{\ell}(t) a_{\ell j}+a_{\ell j}^{2} \tau_{\ell}^{s}(t) \widehat{x}_{j}(t)\right] \\ &=\widehat{x}_{j}(t)+\tau_{i \leftarrow j}^{r}(t) \sum_{\ell \neq i} \widehat{s}_{\ell}(t) a_{\ell j} \end{aligned}$$

If we now define:

$$g_{\text{in}}\left(\widehat{r}, q, \tau^{r}\right):=\underset{x}{\arg \max }\left[f_{\text{in}}(x, q)-\frac{1}{2 \tau^{r}}(\widehat{r}-x)^{2}\right]$$

then:

$$\widehat{x}_{i \leftarrow j}(t+1) \approx g_{\text{in}}\left(\widehat{r}_{i \leftarrow j}(t), q_{j}, \tau_{i \leftarrow j}^{r}(t)\right) \tag{8}$$
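As a concrete example of $g_{\text{in}}$ (an assumption of mine, not part of the derivation): for a Laplacian prior $f_{\text{in}}(x,q)=-\lambda|x|$ (the Lasso/MAP case), the arg max defining $g_{\text{in}}$ is the classic soft-thresholding operator, which a grid search over the objective confirms:

```python
import numpy as np

def g_in_soft(r_hat, lam, tau_r):
    """g_in(r_hat, q, tau_r) = argmax_x [ -lam*|x| - (r_hat - x)^2/(2 tau_r) ],
    i.e. soft-thresholding of r_hat at level lam * tau_r."""
    return np.sign(r_hat) * max(abs(r_hat) - lam * tau_r, 0.0)

# Grid check of the argmax against the closed form.
lam, tau_r, r_hat = 0.7, 0.4, 1.1
x = np.linspace(-4, 4, 160001)
obj = -lam * np.abs(x) - (r_hat - x) ** 2 / (2 * tau_r)
assert abs(x[np.argmax(obj)] - g_in_soft(r_hat, lam, tau_r)) < 1e-3
```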
Further define:

$$\begin{aligned} \tau_{j}^{r}(t)&:=\left[\sum_{i}\left|a_{i j}\right|^{2} \tau_{i}^{s}(t)\right]^{-1} \\ \widehat{r}_{j}(t)&:=\widehat{x}_{j}(t)+\tau_{j}^{r}(t) \sum_{i} a_{i j} \widehat{s}_{i}(t) \end{aligned}$$

We then have the approximations:

$$\begin{aligned} \tau_{i \leftarrow j}^{r}(t) & \approx \tau_{j}^{r}(t) \\ \widehat{r}_{i \leftarrow j}(t) & \approx \widehat{x}_{j}(t)+\tau_{j}^{r}(t) \sum_{\ell \neq i} \widehat{s}_{\ell}(t) a_{\ell j} =\widehat{r}_{j}(t)-\tau_{j}^{r}(t) a_{i j} \widehat{s}_{i}(t) \end{aligned}$$

where, as before, terms of order $O(a_{i j}^{2})$ have been dropped. Using these approximations, (8) becomes:

$$\begin{aligned} \widehat{x}_{i \leftarrow j}(t+1) &\stackrel{(a)}{\approx} g_{\text{in}}\left(\widehat{r}_{j}(t)-a_{i j} \widehat{s}_{i}(t) \tau_{j}^{r}(t),\, q_{j},\, \tau_{j}^{r}(t)\right) \\ &\stackrel{(b)}{\approx} \widehat{x}_{j}(t+1)-a_{i j} \widehat{s}_{i}(t) D_{j}(t+1) \end{aligned} \tag{9}$$

Here (a) directly substitutes the approximations above, while (b) is a first-order Taylor expansion, with

$$\begin{aligned} \widehat{x}_{j}(t+1) &:=g_{\text{in}}\left(\widehat{r}_{j}(t), q_{j}, \tau_{j}^{r}(t)\right) \\ D_{j}(t+1) &:=\tau_{j}^{r}(t) \frac{\partial}{\partial \widehat{r}} g_{\text{in}}\left(\widehat{r}_{j}(t), q_{j}, \tau_{j}^{r}(t)\right) \end{aligned}$$

Moreover:

$$\begin{aligned} D_{j}(t+1) &\stackrel{(a)}{\approx} \tau_{j}^{r}(t) \frac{\partial}{\partial \widehat{r}}\left(\Gamma f_{\mathrm{in}}\left(\cdot, q_{j}\right)\right)\left(\widehat{r}_{j}(t), \tau_{j}^{r}(t)\right) \\ &\stackrel{(b)}{=} \frac{\tau_{j}^{r}(t)}{1-\tau_{j}^{r}(t) f_{\mathrm{in}}^{\prime \prime}\left(\widehat{x}_{j}(t+1), q_{j}\right)} \\ &\stackrel{(c)}{\approx}\left[-\frac{\partial^{2}}{\partial x_{j}^{2}} \Delta_{i \leftarrow j}\left(t+1, \widehat{x}_{j}(t+1)\right)\right]^{-1} \\ &\stackrel{(d)}{\approx} \tau_{j}^{x}(t+1) \end{aligned}$$

Substituting the above into (9) and into the definition of $\widehat{p}_{i}(t)$, we get:

$$\widehat{p}_{i}(t)=\sum_{j} a_{i j} \widehat{x}_{j}(t)-\tau_{i}^{p}(t) \widehat{s}_{i}(t-1)$$
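Step (b) above, $D_j = \tau^r/(1-\tau^r f_{\mathrm{in}}'')$, can be sanity-checked on a smooth prior. With an assumed, purely illustrative Gaussian-prior input $f_{\text{in}}(x,q)=-\lambda x^2/2$, $g_{\text{in}}$ is linear and both sides of (b) agree:

```python
def g_in_gauss(r_hat, lam, tau_r):
    """argmax_x [ -lam*x^2/2 - (r_hat - x)^2/(2 tau_r) ] for a Gaussian prior."""
    return r_hat / (1.0 + lam * tau_r)

lam, tau_r, r = 0.9, 0.6, 0.35
eps = 1e-6
# D = tau_r * d g_in / d r_hat, computed by central finite differences:
D_fd = tau_r * (g_in_gauss(r + eps, lam, tau_r) - g_in_gauss(r - eps, lam, tau_r)) / (2 * eps)
# Step (b): D = tau_r / (1 - tau_r * f_in''), with f_in'' = -lam here.
D_formula = tau_r / (1.0 - tau_r * (-lam))
assert abs(D_fd - D_formula) < 1e-8
```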
This completes the derivation. A few points to summarize:
- The core task of the whole GAMP derivation is to reduce every operation, as far as possible, to computations on node quantities such as $x_j$ and $p_i$, avoiding the edge quantities $x_{i\leftarrow j}$ and $p_{i\leftarrow j}$, which dramatically lowers the computational complexity.
- The method is to repeatedly assume that the entries of the matrix $\mathbf{A}$ are identically distributed Gaussians (so each individual entry is small), which justifies the successive Taylor-expansion approximations.
The final algorithm flow can thus be summarized as follows:
- Initialization.
- Compute $\tau_i^p(t)$ and $\hat{p}_i(t)$, then $\hat{s}_i(t)$ and $\tau_i^s(t)$, which determine $\Delta_{i \rightarrow j}(t, x_{j})$; in the GAMP algorithm this ultimately shows up in the computation of $\hat{r}_j$.
- Compute $\tau_j^r$ and $\hat{r}_j$, which are used for the final computation of $\hat{x}_j$.
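The summarized steps can be assembled into a minimal max-sum GAMP sketch. The block below assumes an AWGN output channel ($g_{\text{out}}=(y-\hat p)/(\sigma^2+\tau^p)$) and a Laplacian input prior ($g_{\text{in}}$ = soft threshold); the channel/prior choice and all variable names are my own illustrative assumptions, not a reference implementation.

```python
import numpy as np

def gamp_lasso(A, y, lam, sigma2, n_iter=100):
    """Max-sum GAMP sketch for max_x [ -||y - A x||^2/(2*sigma2) - lam*||x||_1 ]."""
    m, n = A.shape
    A2 = A ** 2
    x_hat = np.zeros(n)
    tau_x = np.ones(n)
    s_hat = np.zeros(m)
    for _ in range(n_iter):
        # Output linear step with the correction term:
        # p_hat_i(t) = sum_j a_ij x_hat_j(t) - tau_p_i(t) * s_hat_i(t-1)
        tau_p = A2 @ tau_x
        p_hat = A @ x_hat - tau_p * s_hat
        # Output nonlinear step: s_hat = g_out, tau_s = -d g_out / d p_hat (AWGN forms)
        s_hat = (y - p_hat) / (sigma2 + tau_p)
        tau_s = 1.0 / (sigma2 + tau_p)
        # Input linear step: tau_r_j = [sum_i a_ij^2 tau_s_i]^-1,
        # r_hat_j = x_hat_j + tau_r_j * sum_i a_ij s_hat_i
        tau_r = 1.0 / (A2.T @ tau_s)
        r_hat = x_hat + tau_r * (A.T @ s_hat)
        # Input nonlinear step: x_hat = g_in (soft threshold),
        # tau_x = tau_r * d g_in / d r_hat
        thr = lam * tau_r
        x_hat = np.sign(r_hat) * np.maximum(np.abs(r_hat) - thr, 0.0)
        tau_x = tau_r * (np.abs(r_hat) > thr)
    return x_hat

# Tiny synthetic sparse-recovery demo.
rng = np.random.default_rng(0)
m, n = 80, 40
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[[3, 17, 30]] = [2.0, -1.5, 1.0]
y = A @ x0 + 0.01 * rng.standard_normal(m)
x_hat = gamp_lasso(A, y, lam=0.02, sigma2=1e-4)
```

Note how only node quantities (`x_hat`, `tau_x`, `p_hat`, `s_hat`, `r_hat`) appear, exactly as the summary above emphasizes; no edge-indexed messages are stored.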
Sum-product (corresponding to MMSE estimation): $\widehat{\mathbf{x}}^{\mathrm{mmse}}:=\mathbb{E}[\mathbf{x} \mid \mathbf{y}, \mathbf{q}]$
To derive the MMSE version of GAMP, we first need the following theorem:
If a random variable $U$ has the conditional probability density

$$p_{U \mid V}(u \mid v):=\frac{1}{Z(v)} \exp \left(\phi_{0}(u)+u v\right)$$

then:

$$\begin{aligned} \frac{\partial}{\partial v} \log Z(v) &=\mathbb{E}(U \mid V=v) \\ \frac{\partial^{2}}{\partial v^{2}} \log Z(v) &=\frac{\partial}{\partial v} \mathbb{E}(U \mid V=v) =\operatorname{var}(U \mid V=v) \end{aligned}$$

This is a standard property of exponential families.

The messages being passed now become

$$\Delta_{i \rightarrow j}\left(t, x_{j}\right)=\log \mathbb{E}\left(p_{Y \mid Z}\left(y_{i}\mid z_{i}\right) \mid x_{j}\right)+\mathrm{const} \tag{10}$$

and

$$\Delta_{i \leftarrow j}\left(x_{j}\right)=\mathrm{const}+\log p_{X \mid Q}\left(x_{j} \mid q_{j}\right)+\sum_{\ell \neq i} \Delta_{\ell \rightarrow j}\left(x_{j}\right)$$

That is, the messages now carry estimates of the likelihood of $x$. Accordingly,

$$p_{i \leftarrow r}\left(x_{r}\right) \propto \exp \Delta_{i \leftarrow r}\left(t, x_{r}\right)$$

and the quantities we need to define become:

$$\begin{aligned} \widehat{x}_{j}(t) &:=\mathbb{E}\left[x_{j} \mid \Delta_{j}(t, \cdot)\right] \\ \widehat{x}_{i \leftarrow j}(t) &:=\mathbb{E}\left[x_{j} \mid \Delta_{i \leftarrow j}(t, \cdot)\right] \\ \tau_{j}^{x}(t) &:=\operatorname{var}\left[x_{j} \mid \Delta_{j}(t, \cdot)\right] \\ \tau_{i \leftarrow j}^{x}(t) &:=\operatorname{var}\left[x_{j} \mid \Delta_{i \leftarrow j}(t, \cdot)\right], \end{aligned}$$

where $\widehat{x}$ and $\tau$ are, respectively, the mean and variance of the variable under the density

$$p_{\Delta}(x) \propto \exp (\Delta(x))$$

By the central limit theorem, given $x_j$, the conditional distribution of $z_i = \mathbf{a}_i^{\mathsf{T}}\mathbf{x}$ should be Gaussian:

$$z_{i} \sim \mathcal{N}\left(\widehat{p}_{i \rightarrow j}(t)+a_{ij}x_j,\; \tau_{i \rightarrow j}^{p}(t)\right)$$

where

$$\begin{aligned} \widehat{p}_{i \rightarrow j}(t) &:=\sum_{r \neq j} a_{i r} \widehat{x}_{i \leftarrow r}(t) \\ \tau_{i \rightarrow j}^{p}(t) &:=\sum_{r \neq j}\left|a_{i r}\right|^{2} \tau_{r}^{x}(t) \end{aligned}$$

(By the central limit theorem, a sum of independent random variables has mean equal to the sum of the means and variance equal to the sum of the variances.) Therefore,

$$\Delta_{i \rightarrow j}\left(t, x_{j}\right) \approx H\left(\widehat{p}_{i \rightarrow j}(t)+a_{ij}x_j,\; y_{i},\; \tau_{i \rightarrow j}^{p}(t)\right) \tag{11}$$

where

$$H\left(\widehat{p}, y, \tau^{p}\right):=\log \mathbb{E}\left[p_{Y \mid Z}(y \mid z) \mid \widehat{p}, \tau^{p}\right]$$

with $z \sim \mathcal{N}(\widehat{p}, \tau^{p})$. What this step computes is: for each candidate value of $x_j$, the expected conditional probability of $y$ under the corresponding distribution of $z$ (which depends on that value of $x_j$). Clearly, the larger this expectation, the more plausible that value of $x_j$; thus (11) is simply a central-limit-theorem approximation of (10). Since the expectation over $z$ is exactly a marginalization, the above can be rewritten as:

$$H\left(\widehat{p}, y, \tau^{p}\right)=\log p\left(y \mid \widehat{p}, \tau^{p}\right)+\mathrm{const}$$
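The exponential-family identities above are easy to verify numerically: pick any base term $\phi_0$, compute $Z(v)$ by quadrature, and compare derivatives of $\log Z$ with the quadrature mean and variance (the $\phi_0$ below is an arbitrary choice of mine for illustration):

```python
import numpy as np

u = np.linspace(-20.0, 20.0, 400001)
du = u[1] - u[0]
phi0 = lambda t: -t**2 / 2 - 0.1 * t**4   # an arbitrary log-concave base term

def moments(v):
    """E[U|V=v] and var(U|V=v) for p(u|v) ∝ exp(phi0(u) + u*v), by quadrature."""
    w = np.exp(phi0(u) + u * v)
    w /= w.sum() * du
    mean = (u * w).sum() * du
    var = ((u - mean) ** 2 * w).sum() * du
    return mean, var

def logZ(v):
    return np.log(np.exp(phi0(u) + u * v).sum() * du)

v, eps = 0.7, 1e-4
# d/dv log Z(v) = E[U|V=v]
assert abs((logZ(v + eps) - logZ(v - eps)) / (2 * eps) - moments(v)[0]) < 1e-6
# d/dv E[U|V=v] = var(U|V=v)
assert abs((moments(v + eps)[0] - moments(v - eps)[0]) / (2 * eps) - moments(v)[1]) < 1e-6
```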
Note that the form of (11) is highly consistent with (5); it can therefore likewise be Taylor-expanded to second order as in (7). However, because $H$ is now different, its derivative becomes:

$$g_{\text{out}}\left(\widehat{p}, y, \tau^{p}\right):=\frac{\partial}{\partial \widehat{p}} \log p\left(y \mid \widehat{p}, \tau^{p}\right)$$

which evaluates to:

$$g_{\text{out}}\left(\widehat{p}, y, \tau^{p}\right)=\frac{1}{\tau^{p}}\left(\widehat{z}^{0}-\widehat{p}\right), \quad \widehat{z}^{0}:=\mathbb{E}\left(z \mid \widehat{p}, y, \tau^{p}\right)$$
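The identity $g_{\text{out}}=(\widehat{z}^{0}-\widehat{p})/\tau^{p}$ can be checked by direct quadrature. The AWGN likelihood below is an assumed example of mine (for AWGN both expressions also reduce to $(y-\hat p)/(\sigma^2+\tau^p)$), but the check itself is channel-agnostic:

```python
import numpy as np

sigma2, p_hat, tau_p, y = 0.5, 0.2, 0.8, 1.4
z = np.linspace(-15.0, 15.0, 300001)
dz = z[1] - z[0]

def log_p_y(p):
    """log p(y | p_hat, tau_p) = log ∫ p(y|z) N(z; p, tau_p) dz, AWGN p(y|z)."""
    w = np.exp(-(y - z) ** 2 / (2 * sigma2)) * np.exp(-(z - p) ** 2 / (2 * tau_p))
    return np.log(w.sum() * dz)

# z_hat0 = E[z | p_hat, y, tau_p] under the posterior ∝ p(y|z) N(z; p_hat, tau_p)
w = np.exp(-(y - z) ** 2 / (2 * sigma2)) * np.exp(-(z - p_hat) ** 2 / (2 * tau_p))
z_hat0 = (z * w).sum() / w.sum()

eps = 1e-5
g_out = (log_p_y(p_hat + eps) - log_p_y(p_hat - eps)) / (2 * eps)
assert abs(g_out - (z_hat0 - p_hat) / tau_p) < 1e-6
# AWGN special case: both reduce to (y - p_hat)/(sigma2 + tau_p).
assert abs(g_out - (y - p_hat) / (sigma2 + tau_p)) < 1e-6
```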
Approximating $\Delta_{i \leftarrow j}(t+1, x_j)$ in the same way (substituting the quadratic expansion of $\Delta_{i \rightarrow j}$ into (2)), we again obtain:

$$\Delta_{i \leftarrow j}\left(t+1, x_{j}\right) \approx \mathrm{const} +f_{\mathrm{in}}\left(x_{j}, q_{j}\right)-\frac{1}{2 \tau_{i \leftarrow j}^{r}(t)}\left(\widehat{r}_{i \leftarrow j}(t)-x_{j}\right)^{2}$$

Therefore:

$$\Delta_{i \leftarrow j}(t+1,x_j) \approx F_{\mathrm{in}}\left(x_{j}, \widehat{r}_{i \leftarrow j}(t), q_{j}, \tau_{i \leftarrow j}^{r}(t)\right),$$

where:

$$F_{\text{in}}\left(x, \widehat{r}, q, \tau^{r}\right):=f_{\text{in}}(x, q)-\frac{1}{2 \tau^{r}}(\widehat{r}-x)^{2} \tag{12}$$

Notice that, apart from the prior term, the right-hand side of (12) is the exponent of a Gaussian conditional density; hence:

$$\widehat{x}_{i \leftarrow j}(t+1) :=\mathbb{E}\left[x_{j} \mid \Delta_{i \leftarrow j}(t+1, \cdot)\right]\approx g_{\text{in}}\left(\widehat{r}_{i \leftarrow j}(t), q_{j}, \tau_{i \leftarrow j}^{r}(t)\right)$$

where

$$g_{\text{in}}\left(\widehat{r}, q, \tau^{r}\right):=\mathbb{E}[X \mid \widehat{R}=\widehat{r}, Q=q]$$

is the conditional expectation, given $\widehat{R}$ and $Q$, of a random variable $X$ observed through

$$\widehat{R}=X+V, \quad V \sim \mathcal{N}\left(0, \tau^{r}\right)$$
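For the sum-product $g_{\text{in}}$, a quick check with an assumed Gaussian prior $X \sim \mathcal{N}(0,\sigma_x^2)$ (my illustrative choice): the conditional expectation $\mathbb{E}[X \mid \widehat{R}=\hat r]$ has the familiar linear-MMSE form, which quadrature over the posterior $\propto p_X(x)\,\mathcal{N}(\hat r; x, \tau^r)$ reproduces:

```python
import numpy as np

sigma_x2, tau_r, r_hat = 1.5, 0.6, 0.9
x = np.linspace(-15.0, 15.0, 300001)
# posterior ∝ prior(x) * N(r_hat; x, tau_r)
w = np.exp(-x**2 / (2 * sigma_x2)) * np.exp(-(r_hat - x) ** 2 / (2 * tau_r))
g_in = (x * w).sum() / w.sum()                   # E[X | R_hat = r_hat] by quadrature
lmmse = sigma_x2 / (sigma_x2 + tau_r) * r_hat    # closed-form linear MMSE
assert abs(g_in - lmmse) < 1e-8
```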
The remainder of the derivation then proceeds exactly as in the MAP case.