B417科研笔记

Published 2021-09-22 20:19:07

According to the loopy BP algorithm, the procedure computes two message updates:

$$\Delta_{i \rightarrow j}(t, x_{j})=\mathrm{const}+\max _{\mathbf{x}\backslash x_j}\Big[f_{\text {out}}\left(z_{i}, y_{i}\right)+\sum_{r \neq j} \Delta_{i \leftarrow r}\left(t, x_{r}\right)\Big]\tag{1}$$

and

$$\Delta_{i \leftarrow j}\left(t+1, x_{j}\right)=\mathrm{const}+f_{\mathrm{in}}\left(x_{j}, q_{j}\right)+\sum_{\ell \neq i} \Delta_{\ell \rightarrow j}\left(t, x_{j}\right)\tag{2}$$

The next step is to simplify these computations via Gaussian and quadratic approximations.

Define:

$$\begin{aligned} \widehat{x}_{j}(t) &:=\underset{x_{j}}{\arg \max }\, \Delta_{j}\left(t, x_{j}\right) \\ \widehat{x}_{i \leftarrow j}(t) &:=\underset{x_{j}}{\arg \max }\, \Delta_{i \leftarrow j}\left(t, x_{j}\right) \\ \frac{1}{\tau_{j}^{x}(t)} &:=-\left.\frac{\partial^{2}}{\partial x_{j}^{2}} \Delta_{j}\left(t, x_{j}\right)\right|_{x_{j}=\widehat{x}_{j}(t)} \\ \frac{1}{\tau_{i \leftarrow j}^{x}(t)} &:=-\left.\frac{\partial^{2}}{\partial x_{j}^{2}} \Delta_{i \leftarrow j}\left(t, x_{j}\right)\right|_{x_{j}=\widehat{x}_{i \leftarrow j}(t)} . \end{aligned}$$

Then a second-order Taylor expansion of (2) about $\widehat{x}_{i \leftarrow r}(t)$ gives

$$\Delta_{i \leftarrow r}\left(t, x_{r}\right) \approx \Delta_{i \leftarrow r}\left(t, \widehat{x}_{i \leftarrow r}(t)\right)-\frac{1}{2 \tau_{r}^{x}(t)}\left(x_{r}-\widehat{x}_{i \leftarrow r}(t)\right)^{2} \tag{3}$$

Note that $\widehat{x}_{i \leftarrow r}(t)$ is obtained at iteration $t$, so it is a constant when we derive iteration $t+1$.
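The quadratic summary above can be made concrete with a small numeric sketch (the function name and the grid-based method are my own illustration, not from the text): it recovers the maximizer and the negative inverse curvature of a message, which for a Gaussian log-density are exactly its mean and variance.

```python
import numpy as np

# Minimal sketch: summarize a smooth concave message Delta(x) by its maximizer
# x_hat and the curvature 1/tau := -Delta''(x_hat), as in the definitions above.
def quadratic_summary(delta, x_grid):
    """Return (x_hat, tau) for a message evaluated on a dense grid."""
    i = np.argmax(delta(x_grid))
    x_hat = x_grid[i]
    h = x_grid[1] - x_grid[0]
    # central finite difference for the second derivative at the maximizer
    d2 = (delta(x_hat + h) - 2 * delta(x_hat) + delta(x_hat - h)) / h ** 2
    return x_hat, -1.0 / d2

# Example: a Gaussian log-density N(1, 0.5); the summary recovers
# mean 1 and variance 0.5.
delta = lambda x: -(x - 1.0) ** 2 / (2 * 0.5)
x_hat, tau = quadratic_summary(delta, np.linspace(-4, 4, 100001))
print(x_hat, tau)
```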

Substituting (3) into (1) yields:

$$\Delta_{i \rightarrow j}\left(t, x_{j}\right) = \max _{z_{i}}\left[f_{\text {out}}\left(z_{i}, y_{i}\right)-\frac{1}{2 \tau_{i \rightarrow j}^{p}(t)}\left(z_{i}-\widehat{p}_{i \rightarrow j}(t)-a_{i j} x_{j}\right)^{2}\right]+\mathrm{const}\tag{4}$$

where

$$\widehat{p}_{i \rightarrow j}(t) :=\sum_{r \neq j} a_{i r} \widehat{x}_{i \leftarrow r}(t), \qquad \tau_{i \rightarrow j}^{p}(t) :=\sum_{r \neq j}\left|a_{i r}\right|^{2} \tau_{r}^{x}(t)$$

To derive (4), view the maximization as an optimization problem under the constraint $z_{i}=a_{i j} x_{j}+\sum_{r \neq j} a_{i r} x_{r}$; the method of Lagrange multipliers then gives a closed-form solution, which yields (4).

Define

$$H\left(\widehat{p}, y, \tau^{p}\right):=\max _{z}\left[f_{\text {out}}(z, y)-\frac{1}{2 \tau^{p}}(z-\widehat{p})^{2}\right]$$

so that (4) can be written as

$$\Delta_{i \rightarrow j}\left(t, x_{j}\right) =H\left(\widehat{p}_{i \rightarrow j}(t)+a_{i j} x_{j},\, y_{i},\, \tau_{i \rightarrow j}^{p}(t)\right)+\mathrm{const}\tag{5}$$

Further define $\widehat{p}_{i}(t):=\sum_{j} a_{i j} \widehat{x}_{i \leftarrow j}(t)$ and $\tau_{i}^{p}(t)=\sum_{j}\left|a_{i j}\right|^{2} \tau_{j}^{x}(t)$, so that

$$\begin{aligned} \widehat{p}_{i \rightarrow j}(t) &=\widehat{p}_{i}(t)-a_{i j} \widehat{x}_{i \leftarrow j}(t) \\ \tau_{i \rightarrow j}^{p}(t) &=\tau_{i}^{p}(t)-a_{i j}^{2} \tau_{j}^{x}(t) \end{aligned}$$

Then (5) can be further written as

$$\Delta_{i \rightarrow j}\left(t, x_{j}\right) \approx H\left(\widehat{p}_{i}(t)+a_{i j}\left(x_{j}-\widehat{x}_{j}\right),\, y_{i},\, \tau_{i}^{p}(t)\right)+\mathrm{const}$$

Here we have dropped terms of order $O\left(a_{i j}^{2}\right)$, which is what allows replacing $\tau_{i \rightarrow j}^{p}(t)$ by $\tau_{i}^{p}(t)$.
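In matrix form the node-level quantities above are just two matrix-vector products, which is precisely where the complexity saving comes from. A minimal numpy sketch (shapes and variable names are illustrative, not from the text):

```python
import numpy as np

# Vectorized sketch of p_hat and tau_p over all measurements i at once.
rng = np.random.default_rng(0)
m, n = 4, 6
A = rng.normal(size=(m, n)) / np.sqrt(n)
x_hat = rng.normal(size=n)          # current estimates x̂_j(t)
tau_x = rng.uniform(0.05, 0.2, n)   # current variances τ_j^x(t)

p_hat = A @ x_hat                   # p̂_i(t)   = Σ_j a_ij x̂_j(t)
tau_p = (A ** 2) @ tau_x            # τ_i^p(t) = Σ_j |a_ij|² τ_j^x(t)

# The per-edge quantities drop one term, e.g. for edge (i, j) = (0, 2):
i, j = 0, 2
p_hat_ij = p_hat[i] - A[i, j] * x_hat[j]
tau_p_ij = tau_p[i] - A[i, j] ** 2 * tau_x[j]
print(p_hat_ij, tau_p_ij)
```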

Next define

$$\begin{aligned} \widehat{s}_{i}(t) &=g_{\text {out}}\left(t, \widehat{p}_{i}(t), y_{i}, \tau_{i}^{p}(t)\right) \\ \tau_{i}^{s}(t) &=-\frac{\partial}{\partial \widehat{p}} g_{\text {out}}\left(t, \widehat{p}_{i}(t), y_{i}, \tau_{i}^{p}(t)\right) \end{aligned}\tag{6}$$

where

$$g_{\text {out}}\left(\widehat{p}, y, \tau^{p}\right):=\frac{\partial}{\partial \widehat{p}} H\left(\widehat{p}, y, \tau^{p}\right)$$

The meaning is now clear: $\widehat{s}_i(t)$ and $\tau_{i}^{s}(t)$ in (6) are, respectively, the first derivative and (negative) second derivative of $H$ at $\widehat{p}_i(t)$, so we can Taylor-expand once more:

$$\begin{aligned} \Delta_{i \rightarrow j}\left(t, x_{j}\right) &\approx \mathrm{const} +\widehat{s}_{i}(t) a_{i j}\left(x_{j}-\widehat{x}_{j}(t)\right)-\frac{\tau_{i}^{s}(t)}{2} a_{i j}^{2}\left(x_{j}-\widehat{x}_{j}(t)\right)^{2} \\ &= \mathrm{const}+\left[\widehat{s}_{i}(t) a_{i j}+a_{i j}^{2} \tau_{i}^{s}(t) \widehat{x}_{j}(t)\right] x_{j} -\frac{\tau_{i}^{s}(t)}{2} a_{i j}^{2} x_{j}^{2} \end{aligned}\tag{7}$$
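For a concrete output channel, $g_{\text{out}}$ and $\tau^s$ have closed forms. A hedged sketch assuming an AWGN channel, $f_{\text{out}}(z, y) = -(y-z)^2/(2\tau_w)$ (this channel choice is my own illustration, not from the text):

```python
import numpy as np

# Assumed AWGN channel: with f_out(z, y) = -(y - z)^2 / (2 * tau_w), the inner
# maximization in H gives H(p_hat) = -(y - p_hat)^2 / (2 * (tau_w + tau_p)), so
#   g_out = (y - p_hat) / (tau_p + tau_w)   and   tau_s = 1 / (tau_p + tau_w).
def g_out_awgn(p_hat, y, tau_p, tau_w):
    return (y - p_hat) / (tau_p + tau_w), 1.0 / (tau_p + tau_w)

# Cross-check the first derivative of H by brute-force maximization on a grid.
def H_brute(p_hat, y, tau_p, tau_w):
    z = np.linspace(-10.0, 10.0, 200001)
    f_out = -(y - z) ** 2 / (2 * tau_w)
    return np.max(f_out - (z - p_hat) ** 2 / (2 * tau_p))

eps = 1e-4
num = (H_brute(2.0 + eps, 1.0, 0.5, 0.2) - H_brute(2.0 - eps, 1.0, 0.5, 0.2)) / (2 * eps)
s, tau_s = g_out_awgn(2.0, 1.0, 0.5, 0.2)
print(num, s)  # the numerical and closed-form derivatives should agree closely
```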

Substituting (7) into (2), message (2) can likewise be approximated as

$$\Delta_{i \leftarrow j}\left(t+1, x_{j}\right) \approx \mathrm{const} +f_{\mathrm{in}}\left(x_{j}, q_{j}\right)-\frac{1}{2 \tau_{i \leftarrow j}^{r}(t)}\left(\widehat{r}_{i \leftarrow j}(t)-x_{j}\right)^{2}$$

where

$$\frac{1}{\tau_{i \leftarrow j}^{r}(t)}=\sum_{\ell \neq i} a_{\ell j}^{2} \tau_{\ell}^{s}(t)$$

and

$$\begin{aligned} \widehat{r}_{i \leftarrow j}(t) &=\tau_{i \leftarrow j}^{r}(t) \sum_{\ell \neq i}\left[\widehat{s}_{\ell}(t) a_{\ell j}+a_{\ell j}^{2} \tau_{\ell}^{s}(t) \widehat{x}_{j}(t)\right] \\ &=\widehat{x}_{j}(t)+\tau_{i \leftarrow j}^{r}(t) \sum_{\ell \neq i} \widehat{s}_{\ell}(t) a_{\ell j} \end{aligned}$$

If we now define

$$g_{\text {in}}\left(\widehat{r}, q, \tau^{r}\right):=\underset{x}{\arg \max }\left[f_{\text {in}}(x, q)-\frac{1}{2 \tau^{r}}(\widehat{r}-x)^{2}\right]$$

then

$$\widehat{x}_{i \leftarrow j}(t+1) \approx g_{\text {in}}\left(\widehat{r}_{i \leftarrow j}(t), q_{j}, \tau_{i \leftarrow j}^{r}(t)\right)\tag{8}$$
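The scalar function $g_{\text{in}}$ also has closed forms for common penalties. As an illustration (the prior is my own choice, not from the text), a Laplacian penalty $f_{\text{in}}(x, q) = -\lambda|x|$ turns $g_{\text{in}}$ into the soft-thresholding operator familiar from LASSO:

```python
import numpy as np

# Assumed Laplacian prior: with f_in(x, q) = -lam * |x|, the maximization that
# defines g_in is solved by soft thresholding:
#   g_in(r_hat, tau_r) = sign(r_hat) * max(|r_hat| - lam * tau_r, 0)
def g_in_soft_threshold(r_hat, tau_r, lam):
    return np.sign(r_hat) * np.maximum(np.abs(r_hat) - lam * tau_r, 0.0)

# Brute-force check of the argmax on a grid for one (r_hat, tau_r) pair.
r_hat, tau_r, lam = 1.3, 0.4, 2.0
x = np.linspace(-5, 5, 400001)
x_star = x[np.argmax(-lam * np.abs(x) - (r_hat - x) ** 2 / (2 * tau_r))]
print(x_star, g_in_soft_threshold(r_hat, tau_r, lam))
```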

Define further

$$\begin{aligned} \tau_{j}^{r}(t)&=\left[\sum_{i}\left|a_{i j}\right|^{2} \tau_{i}^{s}(t)\right]^{-1} \\ \widehat{r}_{j}(t)&=\widehat{x}_{j}(t)+\tau_{j}^{r}(t) \sum_{i} a_{i j} \widehat{s}_{i}(t) \end{aligned}$$

Then the following approximate relations hold:

$$\begin{aligned} \tau_{i \leftarrow j}^{r}(t) & \approx \tau_{j}^{r}(t) \\ \widehat{r}_{i \leftarrow j}(t) & \approx \widehat{x}_{j}(t)+\tau_{j}^{r}(t) \sum_{\ell \neq i} \widehat{s}_{\ell}(t) a_{\ell j} =\widehat{r}_{j}(t)-\tau_{j}^{r}(t) a_{i j} \widehat{s}_{i}(t) \end{aligned}$$

where terms of order $O\left(a_{i j}^{2}\right)$ have again been dropped. Using this approximation, (8) becomes

$$\begin{aligned} \widehat{x}_{i \leftarrow j}(t+1) &\stackrel{(a)}{\approx} g_{\mathrm{in}}\left(\widehat{r}_{j}(t)-a_{i j} \widehat{s}_{i}(t) \tau_{j}^{r}(t),\, q_{j},\, \tau_{j}^{r}(t)\right) \\ &\stackrel{(b)}{\approx} \widehat{x}_{j}(t+1)-a_{i j} \widehat{s}_{i}(t) D_{j}(t+1) \end{aligned}\tag{9}$$

Step (a) substitutes the approximate relations directly, while step (b) is a first-order Taylor expansion, with

$$\begin{aligned} \widehat{x}_{j}(t+1) &:=g_{\text {in}}\left(\widehat{r}_{j}(t), q_{j}, \tau_{j}^{r}(t)\right) \\ D_{j}(t+1) &:=\tau_{j}^{r}(t) \frac{\partial}{\partial \widehat{r}} g_{\text {in}}\left(\widehat{r}_{j}(t), q_{j}, \tau_{j}^{r}(t)\right) \end{aligned}$$

Moreover,

$$\begin{aligned} D_{j}(t+1) &\stackrel{(a)}{\approx} \tau_{j}^{r}(t) \frac{\partial}{\partial \widehat{r}}\left(\Gamma f_{\mathrm{in}}\left(\cdot, q_{j}\right)\right)\left(\widehat{r}_{j}(t), \tau_{j}^{r}(t)\right) \\ &\stackrel{(b)}{=} \frac{\tau_{j}^{r}(t)}{1-\tau_{j}^{r}(t) f_{\mathrm{in}}^{\prime \prime}\left(\widehat{x}_{j}(t+1), q_{j}\right)} \\ &\stackrel{(c)}{\approx}\left[-\frac{\partial^{2}}{\partial x_{j}^{2}} \Delta_{i \leftarrow j}\left(t+1, \widehat{x}_{j}(t+1)\right)\right]^{-1} \\ &\stackrel{(d)}{\approx} \tau_{j}^{x}(t+1) \end{aligned}$$

Substituting the above into (9) and into the definition of $\widehat{p}_{i}(t)$, we obtain

$$\widehat{p}_{i}(t)=\sum_{j} a_{i j} \widehat{x}_{j}(t)-\tau_{i}^{p}(t) \widehat{s}_{i}(t-1)$$
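The node-level quantities $\tau_j^r$ and $\widehat{r}_j$, and the recovery of a per-edge message from them, can be sketched in a few numpy lines (all names illustrative, not from the text):

```python
import numpy as np

# Vectorized sketch of tau_r and r_hat, plus the per-edge recovery above.
rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_hat = rng.normal(size=n)          # x̂_j(t)
s_hat = rng.normal(size=m)          # ŝ_i(t)
tau_s = rng.uniform(1.0, 2.0, m)    # τ_i^s(t)

tau_r = 1.0 / ((A ** 2).T @ tau_s)  # τ_j^r(t) = [Σ_i |a_ij|² τ_i^s(t)]⁻¹
r_hat = x_hat + tau_r * (A.T @ s_hat)

# Per-edge message recovered from the node quantities:
i, j = 2, 1
r_hat_ij = r_hat[j] - tau_r[j] * A[i, j] * s_hat[i]
# ...which matches the explicit leave-one-out sum built with τ_j^r(t):
loo = x_hat[j] + tau_r[j] * sum(A[l, j] * s_hat[l] for l in range(m) if l != i)
print(abs(r_hat_ij - loo))
```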

This completes the derivation. A few points to summarize:

  • The core task of the whole GAMP derivation is to turn every operation into computations over node quantities such as $x_j$ and $p_i$, avoiding the edge quantities $x_{i\leftarrow j}$ and $p_{i\leftarrow j}$, which dramatically reduces the computational complexity.
  • The method is to repeatedly assume that the entries of the matrix $\mathbf{A}$ are identically distributed Gaussians, which justifies the successive Taylor-expansion approximations.

The final algorithm flow can thus be simplified (figure: the simplified GAMP algorithm flow). We can organize the algorithm flow as follows:

  • Initialization.
  • Compute $\tau_i^p(t)$ and $\hat{p}_i(t)$, then $\hat{s}_i$ and $\tau_i^s(t)$, which determine $\Delta_{i \rightarrow j}(t, x_{j})$; in the GAMP algorithm this is ultimately reflected in the computation of $\hat{r}_j$.
  • Compute $\tau_j^r$ and $\hat{r}_j$, which are used in the final computation of $\hat{x}_j$.
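Putting the steps together, here is a minimal sketch of the whole simplified loop, under the assumption of an AWGN output channel and a Gaussian prior (both my own illustrative choices, not from the text), so that every scalar function has a closed form. At any fixed point, the mean updates satisfy the normal equations of the exact linear MMSE estimate, which gives us something to check against:

```python
import numpy as np

# Simplified GAMP loop for the assumed model y = A x + w, x ~ N(0, tau0 I),
# w ~ N(0, tau_w I).  g_out and g_in are the AWGN / Gaussian closed forms.
rng = np.random.default_rng(2)
m, n = 120, 80
tau0, tau_w = 1.0, 0.1
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = rng.normal(size=n) * np.sqrt(tau0)
y = A @ x_true + rng.normal(size=m) * np.sqrt(tau_w)

x_hat = np.zeros(n)
tau_x = np.full(n, tau0)
s_hat = np.zeros(m)
A2 = A ** 2

for t in range(100):
    # output side
    tau_p = A2 @ tau_x
    p_hat = A @ x_hat - tau_p * s_hat       # Onsager-corrected p̂_i(t)
    s_hat = (y - p_hat) / (tau_p + tau_w)   # g_out for AWGN
    tau_s = 1.0 / (tau_p + tau_w)
    # input side
    tau_r = 1.0 / (A2.T @ tau_s)
    r_hat = x_hat + tau_r * (A.T @ s_hat)
    x_hat = tau0 * r_hat / (tau0 + tau_r)   # g_in: Gaussian posterior mean
    tau_x = tau0 * tau_r / (tau0 + tau_r)

# exact linear MMSE estimate for comparison
x_mmse = tau0 * A.T @ np.linalg.solve(tau0 * A @ A.T + tau_w * np.eye(m), y)
print(np.max(np.abs(x_hat - x_mmse)))
```

For this all-Gaussian instance the fixed point coincides with ridge regression, so the printed discrepancy should be tiny after convergence.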

Sum-product (corresponding to MMSE estimation):

$$\widehat{\mathbf{x}}^{\mathrm{mmse}}:=\mathbb{E}[\mathbf{x} \mid \mathbf{y}, \mathbf{q}]$$

To derive the MMSE version of GAMP, we first need the following result:

If a random variable $U$ has the conditional density

$$p_{U \mid V}(u \mid v):=\frac{1}{Z(v)} \exp \left(\phi_{0}(u)+u v\right)$$

then

$$\begin{aligned} \frac{\partial}{\partial v} \log Z(v) &=\mathbb{E}(U \mid V=v) \\ \frac{\partial^{2}}{\partial v^{2}} \log Z(v) &=\frac{\partial}{\partial v} \mathbb{E}(U \mid V=v) =\operatorname{var}(U \mid V=v) \end{aligned}$$

This is a standard property of exponential families. The transmitted messages now become

$$\Delta_{i \rightarrow j}\left(t, x_{j}\right)=\log \mathbb{E}\left(p_{Y \mid Z}\left(y_{i}, z_{i}\right) \mid x_{j}\right)+\mathrm{const}\tag{10}$$

and

$$\Delta_{i \leftarrow j}\left(x_{j}\right)=\mathrm{const}+\log p_{X \mid Q}\left(x_{j} \mid q_{j}\right)+\sum_{\ell \neq i} \Delta_{\ell \rightarrow j}\left(x_{j}\right)$$

That is, the messages now carry estimates of the likelihood of $x$. Accordingly,

$$p_{i \leftarrow r}\left(x_{r}\right) \propto \exp \Delta_{i \leftarrow r}\left(t, x_{r}\right)$$

The variables we need to define also change to

$$\begin{aligned} \widehat{x}_{j}(t) &:=\mathbb{E}\left[x_{j} \mid \Delta_{j}(t, \cdot)\right] \\ \widehat{x}_{i \leftarrow j}(t) &:=\mathbb{E}\left[x_{j} \mid \Delta_{i \leftarrow j}(t, \cdot)\right] \\ \tau_{j}^{x}(t) &:=\operatorname{var}\left[x_{j} \mid \Delta_{j}(t, \cdot)\right] \\ \tau_{i \leftarrow j}^{x}(t) &:=\operatorname{var}\left[x_{j} \mid \Delta_{i \leftarrow j}(t, \cdot)\right], \end{aligned}$$

where $\widehat{x}$ and $\tau$ can be read as the mean and variance of the variable under the density

$$p_{\Delta}(x) \propto \exp (\Delta(x))$$

By the central limit theorem, given $x_j$, the conditional distribution of $z_i = \mathbf{a}_i^{T}\mathbf{x}$ is approximately Gaussian:

$$z_{i} \sim \mathcal{N}\left(\widehat{p}_{i \rightarrow j}(t)+a_{i j} x_{j},\ \tau_{i \rightarrow j}^{p}(t)\right)$$

where

$$\widehat{p}_{i \rightarrow j}(t) :=\sum_{r \neq j} a_{i r} \widehat{x}_{i \leftarrow r}(t), \qquad \tau_{i \rightarrow j}^{p}(t) :=\sum_{r \neq j}\left|a_{i r}\right|^{2} \tau_{r}^{x}(t)$$

(For a sum of independent random variables, the mean is the sum of the means and the variance is the sum of the variances.) Therefore,

$$\Delta_{i \rightarrow j}\left(t, x_{j}\right) \approx H\left(\widehat{p}_{i \rightarrow j}(t)+a_{i j} x_{j},\ y_{i},\ \tau_{i \rightarrow j}^{p}(t)\right)\tag{11}$$

where

$$H\left(\widehat{p}, y, \tau^{p}\right):=\log \mathbb{E}\left[p_{Y \mid Z}(y \mid z) \mid \widehat{p}, \tau^{p}\right]$$

This step computes, for each candidate value of $x_j$, the expected conditional probability of $y$ under the distribution of $z$ induced by that value of $x_j$ (which depends on the specific value of $x_j$); clearly, the larger this expectation, the more plausible that value of $x_j$. Thus (11) is simply (10) after one central-limit-theorem approximation. By Bayes' rule, the above can be transformed into:

$$H\left(\widehat{p}, y, \tau^{p}\right):=\log p\left(z \mid \widehat{p}, y, \tau^{p}\right)+\mathrm{const}$$
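The exponential-family identity stated at the top of this derivation can be checked numerically. A sketch with the assumed choice $\phi_0(u) = -u^2/2$ (so $U \mid V{=}v \sim \mathcal{N}(v, 1)$, and the identities say $\partial_v \log Z(v) = v$ and $\partial_v^2 \log Z(v) = 1$):

```python
import numpy as np

# Numeric check of the exponential-family identity with phi_0(u) = -u^2 / 2.
u = np.linspace(-12, 12, 200001)
du = u[1] - u[0]

def log_Z(v):
    # quadrature approximation of log ∫ exp(phi_0(u) + u v) du
    return np.log(np.sum(np.exp(-u ** 2 / 2 + u * v)) * du)

v, eps = 0.7, 1e-4
mean = (log_Z(v + eps) - log_Z(v - eps)) / (2 * eps)            # ≈ E[U | V=v]
var = (log_Z(v + eps) - 2 * log_Z(v) + log_Z(v - eps)) / eps ** 2  # ≈ var
print(mean, var)
```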

Note that the form of (11) is highly consistent with (5); it can therefore likewise be Taylor-expanded to second order as in (7). However, because the $H$ function is now different, its first and second derivatives change. The first derivative becomes

$$g_{\text {out}}\left(\widehat{p}, y, \tau^{p}\right):=\frac{\partial}{\partial \widehat{p}} \log p\left(y \mid \widehat{p}, \tau^{p}\right)$$

which yields

$$g_{\text {out}}\left(\widehat{p}, y, \tau^{p}\right)=\frac{1}{\tau^{p}}\left(\widehat{z}^{0}-\widehat{p}\right), \qquad \widehat{z}^{0}:=\mathbb{E}\left(z \mid \widehat{p}, y, \tau^{p}\right)$$

Approximating the input-side message $\Delta_{i \leftarrow j}(t+1, x_j)$ in the same way, we again obtain

$$\Delta_{i \leftarrow j}\left(t+1, x_{j}\right) \approx \mathrm{const}+f_{\mathrm{in}}\left(x_{j}, q_{j}\right)-\frac{1}{2 \tau_{i \leftarrow j}^{r}(t)}\left(\widehat{r}_{i \leftarrow j}(t)-x_{j}\right)^{2}$$

Hence

$$\Delta_{i \leftarrow j}(t,x_j) \approx F_{\mathrm{in}}\left(x_{j}, \widehat{r}_{i \leftarrow j}(t), q_{j}, \tau_{i \leftarrow j}^{r}(t)\right)$$

where

$$F_{\text {in}}\left(x, \widehat{r}, q, \tau^{r}\right):=f_{\text {in}}(x, q)-\frac{1}{2 \tau^{r}}(\widehat{r}-x)^{2}\tag{12}$$

Note that the right-hand side of (12) is in fact the exponent of a density with a Gaussian conditional term, so

$$\widehat{x}_{i \leftarrow j}(t) :=\mathbb{E}\left[x_{j} \mid \Delta_{i \leftarrow j}(t, \cdot)\right]\approx g_{\text {in}}\left(\widehat{r}_{i \leftarrow j}(t), q_{j}, \tau_{i \leftarrow j}^{r}(t)\right)$$

where

$$g_{\text {in}}\left(\widehat{r}, q, \tau^{r}\right):=\mathbb{E}[X \mid \widehat{R}=\widehat{r}, Q=q]$$

is the conditional expectation, given $\widehat{R}$ and $Q$, for a random variable $X$ observed through

$$\widehat{R}=X+V, \qquad V \sim \mathcal{N}\left(0, \tau^{r}\right)$$
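As a worked instance of this sum-product $g_{\text{in}}$ (the prior is my own illustrative choice, not from the text), take a Bernoulli-Gaussian prior $X = B\,G$ with $B \sim \mathrm{Bern}(\rho)$ and $G \sim \mathcal{N}(0, \tau_0)$. The posterior mean given $\widehat{R} = X + V$ has a closed form and can be sanity-checked by Monte Carlo:

```python
import numpy as np

# Assumed Bernoulli-Gaussian prior: X = B * G, B ~ Bern(rho), G ~ N(0, tau0),
# observed through R = X + V with V ~ N(0, tau_r).  The posterior mean is
# E[X | r] = P(B=1 | r) * tau0 * r / (tau0 + tau_r).
def g_in_bg(r_hat, tau_r, rho, tau0):
    def norm_pdf(x, var):
        return np.exp(-x ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    num = rho * norm_pdf(r_hat, tau0 + tau_r)
    pi = num / (num + (1 - rho) * norm_pdf(r_hat, tau_r))  # P(B = 1 | r_hat)
    return pi * tau0 * r_hat / (tau0 + tau_r)

# Monte Carlo sanity check: estimate E[X | R ≈ r_target] by binning samples.
rng = np.random.default_rng(3)
rho, tau0, tau_r, r_target = 0.3, 1.0, 0.2, 0.8
N = 2_000_000
x = rng.binomial(1, rho, N) * rng.normal(0.0, np.sqrt(tau0), N)
r = x + rng.normal(0.0, np.sqrt(tau_r), N)
mask = np.abs(r - r_target) < 0.01
print(x[mask].mean(), g_in_bg(r_target, tau_r, rho, tau0))
```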

From here on, the derivation proceeds exactly as in the MAP case.
