Plonky = Plonk + Halo + Rescue

1. 引言

由mir protocol团队实现的Plonky，代码库为：

https://github.com/mir-protocol/plonky

Plonky为mir团队基于 “Plonk+Halo+Rescue” 实现的 fast recursive argument 原型，其核心组成为：

Plonk的permutation argument 【需open多个polynomial commitments at a challenge point x x x，以及open其中一个at an additional point y y y。】
Halo的polynomial commitment scheme 【可将多个opening reduce为一个opening，但是Verifier需做大量昂贵的multi-scalar multiplication，借鉴Plonk中的circuit model 思想，引入了Elliptic curve gate。】
Rescue的high-arity circuit model 【解决了在recursive circuit中的另一个性能瓶颈来自于——生成Fiat-Shamir challenge。】

通过将以上三者结合，Plonky可recursively verify an argument using around 2 14 2^{14} 214 gates，尽管它还没有准备好真正使用，但是在6核笔记本上，Plonky构建resursive proofs仅需约9秒钟，具有令人欣喜的性能。

2. Batch opening polynomial commitments

Verify Plonk argument的过程中，需open多个polynomial commitments at a challenge point x x x，同时open 其中的一个at an additional point y y y。在Halo论文中，介绍了将多个openings reduce为一个opening的算法实现，但是，该算法要求在Verifier端为每个opeing额外增加大量的计算，这种对Verifier的算力要求在实际应用中应尽量避免。

Plonky中，借鉴了 Multipoint, multipolynomial batched openings from inner-product arguments 中的batch polynomial opening算法：令 G G G为a squence of random generators， X = ( x 0 , x 1 , x 2 , ⋯ ) X=(x^0,x^1,x^2,\cdots) X=(x0,x1,x2,⋯)， Y = ( y 0 , y 1 , y 2 , ⋯ ) Y=(y^0,y^1,y^2,\cdots) Y=(y0,y1,y2,⋯)。

Prover：发送每个polynomial commitment c ( f i ) = < f i , G > c(f_i)= c(fi)=，及所宣称的evaluation值 z i = < f i , X > , w i = < f i , Y > z_i=,w_i= zi=,wi=。
Verifier：给Prover发送random challenge α , β ∈ F \alpha,\beta\in\mathbb{F} α,β∈F，并计算 Z = ∑ i α i z i , W = ∑ i α i w i , c ( F ) = ∑ i α i c ( f i ) Z=\sum_i\alpha^iz_i,W=\sum_i\alpha^iw_i,c(F)=\sum_i\alpha^ic(f_i) Z=∑iαizi,W=∑iαiwi,c(F)=∑iαic(fi)，其中 F = ∑ i α i f i F=\sum_i\alpha^if_i F=∑iαifi。
此时转为由Prover证明 F ( x ) = Z , F ( y ) = W F(x)=Z,F(y)=W F(x)=Z,F(y)=W，with negligible loss of soundness，reduce为证明 < F , X + β Y > = Z + β W =Z+\beta W =Z+βW。
Prover：采用inner product argument来证明 < F , X + β Y > = Z + β W =Z+\beta W =Z+βW即可。
Verifier：需要计算 < s , G > 和 < s , X + β Y > ，其中 s s s在Halo论文中做了定义，实际即为inner product argument中每一轮的challenge组合。 < s , G > 的计算可采用Halo中的算法，而 < s , X + β Y > = < s , X > + β < s , Y > = g ( X , u i ) + β g ( Y , u i ) =+\beta=g(X,u_i)+\beta g(Y,u_i) =+β=g(X,ui)+βg(Y,ui)，从而转换为多项式证明。

3. Halo中的瓶颈：curve multiplication

在Halo中，Verifier在验证polynomial opening时，需计算： Q = ∑ j = 1 k ( [ u j 2 ] L j ) + P ′ + ∑ j = 1 k ( [ u j − 2 ] R j ) Q=\sum_{j=1}^{k}([u_j^2]L_j)+P'+\sum_{j=1}^{k}([u_j^{-2}]R_j) Q=∑j=1k([uj2]Lj)+P′+∑j=1k([uj−2]Rj) 其中， L j , R j L_j,R_j Lj,Rj由Prover提供， u j ∈ { 0 , 1 } λ u_j\in\{0,1\}^{\lambda} uj∈{0,1}λ为random challenge。

每个curve multiplication [ r ] P [r]P [r]P若通过简单的double-and-add算法来计算，需要 λ \lambda λ个addition和 λ \lambda λ个doubling运算。

Halo中采用了类似的算法，但是每次以2个bit为单位进行计算，加法对应为 { P , − P , ϕ ( P ) , ϕ ( − P ) } \{P,-P,\phi(P),\phi(-P)\} {P,−P,ϕ(P),ϕ(−P)}中的一个，其中 ϕ \phi ϕ为计算起来很简单的自同态。相比于简单的double-and-add，Halo中所需的运算reduce为 λ / 2 \lambda/2 λ/2个addition和 λ / 2 \lambda/2 λ/2个doubling运算。

同时，也可将以上 Q Q Q的计算公式看成是某种形式的multi-scalar multiplication运算，其具有 2 k 2k 2k个terms，最后再加上 P ′ P' P′。通过简单的、circuit-friendly MSM（multi-scalar multiplication）优化，可同时为所有terms进行doubling运算（具体参看simultaneous squaring）。从而将计算开销降为 k λ k\lambda kλ个加法和 λ / 2 \lambda/2 λ/2个doubling运算，或者降为 ( k + 1 / 2 ) λ (k+1/2)\lambda (k+1/2)λ个group operations。

4. 推广Plonk中的circuit model

标准的Plonk circuit model 为：

每个gate具有3根wire—— a , b , c a,b,c a,b,c
为每个gate赋予constraint的形式为： q L a + q R b + q O c + q M a b + q C = 0 q_La+q_Rb+q_Oc+q_Mab+q_C=0 qLa+qRb+qOc+qMab+qC=0，其中 q q q为gate的配置参数，如对于multiplicative constraint a b = c ab=c ab=c，需配置 q M = 1 , q C = − 1 q_M=1,q_C=-1 qM=1,qC=−1，其它 q q q值均为0。

Plonk的这种circuit model非常简单友好，但是会导致相当大的gate数量。如，对于curve operations，需要大概7个gate，具体取决于the curve and completeness assumptions。但是，幸亏Plonk的basic scheme非常灵活，可通过多种方式来减少gate数量：

方法1：采用higher-arity gate。如若想以1个gate来表示1个curve operation，可设置其gate arity为6。对于Prover来说，其每个gate的计算开销将增加，但将大大减少特定operation的gate数量。
方法2：不应局限于单个的constraint。初看，增加多个constraints，Prover将需要进行更多的FFT运算。令 d d d为任一constraint的最高degree，若将每个polynomial的degree 提升至最大的 d d d，则所有constraint相关的arithmetic可以point-value的形式进行，而不需要额外的FFT运算。
方法3：并不是所有的wires都需要routed。“Advice” wires对于宣称的逆运算或中间结果有用。Advice wires并不会对Plonk的permutation argument 的degree产生影响。Plonk的permutation argument通常贡献了整个Plonk-based construction中的最高degree polynomial。
方法4：除challenge point x x x之外，Plonk还会open每个polynomial at a “shifted” point g x gx gx，因此，constraints可operate on the wires of the “next gate” in addition to the “local gate”。这是 TurboPlonk 的主要发现。在此基础之上，为了减少Plonk permutation argument中“copying” wires的数量，Plonky额外增加了shifted openings。

5. Elliptic curve gates

在之前介绍的MSM（multi-scalar multiplication）中，大多数步骤中都包含了conditionally negating a point、conditionally applying the endomorphism ϕ ( ( x , y ) ) = ( ζ x , y ) \phi((x,y))=(\zeta x,y) ϕ((x,y))=(ζx,y)，然后将该修改后的point加到累加器accumulator中。整个过程可表示为： x 1 ← ( 1 + ( ζ − 1 ) r h i g h ) x , x_1\leftarrow (1+(\zeta-1)r_{high})x, x1←(1+(ζ−1)rhigh)x, y 1 ← ( 2 r l o w − 1 ) y , y_1\leftarrow (2r_{low}-1)y, y1←(2rlow−1)y, ( x 3 , y 3 ) ← ( x 1 , y 1 ) + ( x 2 , y 2 ) . (x_3,y_3)\leftarrow (x_1,y_1)+(x_2,y_2). (x3,y3)←(x1,y1)+(x2,y2). 其中， ( x , y ) (x,y) (x,y)为the point being multiplied， ( x 1 , y 1 ) (x_1,y_1) (x1,y1)为the point to be added to the accumulator， ( x 2 , y 2 ) (x_2,y_2) (x2,y2)为the old state of the accumulator， ( x 3 , y 3 ) (x_3,y_3) (x3,y3)为相应的updated state， r h i g h , r l o w r_{high},r_{low} rhigh,rlow为two consecutive bits of the scalar。

5.1 明确的公式描述

affine坐标系下的Weierstrass curve，其incomplete加法运算可表示为：（不考虑 x 1 = x 2 x_1=x_2 x1=x2等特殊情况） i n v = 1 / ( x 1 − x 2 ) , inv=1/(x_1-x_2), inv=1/(x1−x2), λ = ( y 1 − y 2 ) ⋅ i n v , \lambda = (y_1-y_2)\cdot inv, λ=(y1−y2)⋅inv, x 3 = λ 2 − x 1 − x 2 , x_3=\lambda^2-x_1-x_2, x3=λ2−x1−x2, y 3 = λ ( x 1 − x 3 ) − y 1 . y_3=\lambda(x_1-x_3)-y_1. y3=λ(x1−x3)−y1.

将以上计算arithmetize化的一种简单方法是为中间结果（如 λ \lambda λ）引入advice wire。特别地，相应的gate可以如下方式定义：

Routed wires有： x , y , x 2 , y 2 , x 3 , y 3 , r l o w , r h i g h x,y,x_2,y_2,x_3,y_3,r_{low},r_{high} x,y,x2,y2,x3,y3,rlow,rhigh
Advice wires有： x 1 , y 1 , i n v , λ x_1,y_1,inv,\lambda x1,y1,inv,λ
Contraints有： r l o w ( r l o w − 1 ) = 0 , r_{low}(r_{low}-1)=0, rlow(rlow−1)=0, r h i g h ( r h i g h − 1 ) = 0 , r_{high}(r_{high}-1)=0, rhigh(rhigh−1)=0, x 1 = ( 1 + ( ζ − 1 ) r h i g h ) x , x_1=(1+(\zeta-1)r_{high})x, x1=(1+(ζ−1)rhigh)x, y 1 = ( 2 r l o w − 1 ) y , y_1=(2r_{low}-1)y, y1=(2rlow−1)y, ( x 1 − x 2 ) ⋅ i n v = 1 , (x_1-x_2)\cdot inv=1, (x1−x2)⋅inv=1, λ = ( y 1 − y 2 ) ⋅ i n v , \lambda = (y_1-y_2)\cdot inv, λ=(y1−y2)⋅inv, x 3 = λ 2 − x 1 − x 2 , x_3=\lambda^2-x_1-x_2, x3=λ2−x1−x2, y 3 = λ ( x 1 − x 3 ) − y 1 . y_3=\lambda(x_1-x_3)-y_1. y3=λ(x1−x3)−y1.

这看起来可能有很多constraints，但low-degree constraints 对性能影响不大，因为它们只需要在单个challenge point进行check。

5.2 特殊情况下的公式描述

5.1中针对的是 x 1 ≠ x 2 x_1\neq x_2 x1=x2的情况。在Halo的MSM（multi-scalar multiplication）中指出，恶意的Prover通过在相邻的两轮中发送相同的 L j L_j Lj值即可打破 x 1 ≠ x 2 x_1\neq x_2 x1=x2的假设。因此，从安全角度来说，我们必须能验证确认 x 1 ≠ x 2 x_1\neq x_2 x1=x2 ——》通过 ( x 1 − x 2 ) ⋅ i n v = 1 (x_1-x_2)\cdot inv=1 (x1−x2)⋅inv=1 constraint即可实现。

而对于honest Prover，每一轮的 L j L_j Lj都是独立随机的，因为其嵌入了随机group element [ l j ] H [l_j]H [lj]H。对于honest Prover， P ( x 1 = x 2 ) P(x_1=x_2) P(x1=x2)的概率为 2 − ∣ F ∣ 2^{-|F|} 2−∣F∣，因此在实际协议实现时，可简单地禁用该情况，由此引入的completeness error为negligible的。

5.3 进一步优化

进一步可在如下维度进行优化：

实际上，可不需要如此多的advice wires，因为每根wire都要求Prover计算一个polynomial commitment并增加argument length。 x 1 , y 1 , λ x_1,y_1,\lambda x1,y1,λ可直接内联进去。
进一步地，可通过constrain the accumulator wires of the “next” gate，来将accumulator ( x 2 , y 2 ) (x_2,y_2) (x2,y2)和 ( x 3 , y 3 ) (x_3,y_3) (x3,y3) 结合起来。
最后，Plonky的实际实现中，仅使用一个单独的gate就可表达以上curve operations，同时可验证每个scalar的decomposition。

6. 使用Rescue来生成challenge

在recursive circuit中的另一个性能瓶颈来自于——生成Fiat-Shamir challenge。为了减少生成Fiat-Shamir challenge的开销，根据 Daira Hopwood的建议，Plonky中采用duplex with Rescue作为底层的permutation实现。

令 M = ( A B C D ) M=\begin{pmatrix} A & B\\ C & D \end{pmatrix} M=(ACBD) 为MDS矩阵，单轮的width-2 Rescue permutation可定义为： s t e p i , 1 ( [ x 1 x 2 ] ) = M [ x 1 1 / α x 2 1 / α ] + [ r i , 1 r i , 2 ] step_{i,1}\begin{pmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \end{pmatrix}=M\begin{bmatrix} x_1^{1/\alpha}\\ x_2^{1/\alpha} \end{bmatrix} + \begin{bmatrix} r_{i,1}\\ r_{i,2} \end{bmatrix} stepi,1([x1x2])=M[x11/αx21/α]+[ri,1ri,2] s t e p i , 2 ( [ x 1 x 2 ] ) = M [ x 1 α x 2 α ] + [ r i , 3 r i , 4 ] step_{i,2}\begin{pmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \end{pmatrix}=M\begin{bmatrix} x_1^{\alpha}\\ x_2^{\alpha} \end{bmatrix} + \begin{bmatrix} r_{i,3}\\ r_{i,4} \end{bmatrix} stepi,2([x1x2])=M[x1αx2α]+[ri,3ri,4] r o u n d i = s t e p i , 2 ∘ s t e p i , 1 round_i=step_{i,2}\circ step_{i,1} roundi=stepi,2∘stepi,1 其中 r i , 1 ⋯ r i , 4 r_{i,1}\cdots r_{i,4} ri,1⋯ri,4 为round constants。

deterministically计算 α \alpha α-th root是expensive的，可由Prover通过增加advice wire y 1 , y 2 y_1,y_2 y1,y2的方式来提供。令 z 1 , z 2 z_1,z_2 z1,z2 为round function的输出，则相应的gate可定义为：

Routed wire有： x 1 , x 2 , z 1 , z 2 x_1,x_2,z_1,z_2 x1,x2,z1,z2
Advice wire有： y 1 , y 2 y_1,y_2 y1,y2
Contraints有： x 1 = y 1 α x_1=y_1^{\alpha} x1=y1α x 2 = y 2 α x_2=y_2^{\alpha} x2=y2α z 1 = A ( A y 1 + B y 2 + r i , 1 ) α + B ( C y 1 + D y 2 + r i , 2 ) α + r i , 3 z_1=A(Ay_1+By_2+r_{i,1})^{\alpha}+B(Cy_1+Dy_2+r_{i,2})^{\alpha}+r_{i,3} z1=A(Ay1+By2+ri,1)α+B(Cy1+Dy2+ri,2)α+ri,3 z 2 = C ( A y 1 + B y 2 + r i , 1 ) α + D ( C y 1 + D y 2 + r i , 2 ) α + r i , 4 z_2=C(Ay_1+By_2+r_{i,1})^{\alpha}+D(Cy_1+Dy_2+r_{i,2})^{\alpha}+r_{i,4} z2=C(Ay1+By2+ri,1)α+D(Cy1+Dy2+ri,2)α+ri,4

尽管有可能在单个gate中实现多轮Resuce运算，但是实际上我们会限制每个gate中constants的数量。在单轮Resuce运算中，已包含4个constants r i , 1 , ⋯ , r i , 4 r_{i,1},\cdots,r_{i,4} ri,1,⋯,ri,4，若再增加更多的constants，意味着需要对更多的preprocessed polynomial 进行challenge open操作。（adding more would mean more preprocessed polynomials which must be opened at a challenge point。）

为了进一步优化，可在Rescue gate中插入不需要constant的gate，如curve operation gate。这样，每个Rescue gate可利用其邻居的unused constant slots。同时，作为备选方案之一，也可使用不需要配置那么多constant的其它key schedule。

7. 统一的constraint set

迄今为止，已定义了不同gate类型的constraint表示，但是，在Plonk中只设定了一组constraint。因此，需要借助“0”或“1”的“filter”来将以上constraint 组合起来，以明确特定gate的特定constraint。Plonky中统一的constraint set是将这些contraint简单的进行filter求和。

最简单的实现方式是为每种gate类型定义一个constant polynomial，如is_resuce_gate定义为is_rescue_gate(g^i)=1 if and only if gate i is a Rescue gate。同时，也可以有各种自定义的gate类型，但是，实际为了减少constant polynomial的数量，会控制gate类型的数量。

Plonky中以binary tree的形式来管理gates：在这里插入图片描述同时，为tree 的每一层额外引入constant polynomial C i C_i Ci，根据其在tree的左端还是右端，分别设置其值为0或1。如某arithmetic gate的path表示为 1001 1001 1001，则其constraint对应的filter为 C 1 ( x ) ( 1 − C 2 ( x ) ) ( 1 − C 3 ( x ) ) C 4 ( x ) C_1(x)(1-C_2(x))(1-C_3(x))C_4(x) C1(x)(1−C2(x))(1−C3(x))C4(x)。

不同的gate在binary tree中有不同的depth，主要原因为：

smaller depth对应为lower-degree filter polynomial。为了使最大degree尽可能小，某些具有higer-degree constraint的gate，应对其赋予lower-degree filter。
任何不在filter polynomial中使用的constant，这些constant都可用于gate configuration。如某arithmetic gate的filter中仅包含了 C 1 , ⋯ , C 4 C_1,\cdots,C_4 C1,⋯,C4，则其可使用 C 5 , C 6 C_5,C_6 C5,C6来定义其arithmetic type。

8. 未来的优化方向

当前Plonky在专注于优化recursive circuit size的同时，也致力于提升Plonky中primitives的性能。 Plonky采用纯Rust语言编写，Plonky希望在X86系统上借助 carry chain optimizations 来加速，但是Rust compiler当前并不支持。

Plonky中的proving time主要由multi-scalar multiplication占据，Plonky中的multi-scalar multiplication实现采用 Yao算法的一种变种。其性能要优于Pippinger算法，特别是对于包含variable-base MSM的IPA reduction场景。

Daira Hopwood也提供了另一种潜在的优化方案，相比于applying the endomorphism zero or one times based on a bit of the scalar，可apply it zero or one or two times based on a base-3 limb of the scalar。这可将iteration数量由64降为50，同时可保证其一一对应关系，尽管相应的证明可能更复杂。

参考资料

[1] mir团队blog Fast recursive arguments based on Plonk and Halo [2] https://github.com/mir-protocol/plonky [3] arkworks讨论 Plonk、TurboPLONK、PLOOKUP实现等

Plonky = Plonk + Halo + Rescue

最近更新

热门博客

[ 申请 ]友情链接：