Monday, July 13, 2015

Modular Hardware Architecture for Somewhat Homomorphic Function Evaluation

This post is based on the paper

This paper proposes an instruction-set hardware architecture for all building blocks required in polynomial ring based fully homomorphic schemes and uses the hardware architecture to instantiate the somewhat homomorphic encryption scheme YASHE on a large Virtex 7 FPGA. The building blocks present in the architecture are sufficiently generic to allow implementation of other homomorphic schemes based on RLWE or NTRU.

System Setup
The computations are performed in a modular polynomial rings of the form R = Z[x]/(f(x)) where f(x) is a monic irreducible polynomial Φd(x) of degree n. To utilize single instruction multiple data (SIMD) operations f(x) is a cyclotomic polynomial. The plaintext space is R2.

The used parameter supports homomorphic evaluations of SIMON-64/128 (having a multiplicative depth of 44); in particular d = 65535 (and thus the degree of f(x) is n= 215), log2(q) = 1228 and the discrete Gaussian distribution with parameter is 8. The chosen parameter set meets at least 128-bit security and SIMD in 2048 slots.

High-level optimizations

Two main operations in the YASHE scheme are the YASHE.Add for homomorphic addition and the YASHE.Mult for homomorphic multiplication. While YASHE.Add is relatively simple, YASHE.Mult is very costly since one first needs to compute c1 · c2 over the integers, then scale the result by t/q and round, before mapping back into the ring Rq. For the polynomial multiplication of such a large degree n = 215,  an NTT is used over other polynomial multiplication algorithms thanks to its almost linear time complexity. Moreover to tackle the problem of long integer arithmetic CRT is used to split the coefficients into 30-bit small coefficients. Application of the CRT helps to achieve parallelization and to utilize the small DSP multipliers available in FPGAs.

High-level Architecture

The instruction-set architecture shown in the figure has three main components: a computer, a HE-coprocessor and an external memory. The HE-coprocessor is implemented on an FPGA and supports NTT of polynomials, coefficient wise additions, subtractions and multiplications, computation of the residues using the CRT, computation of the coefficients modulo a large modulus from the CRT-residues, and scaling of the coefficients. The computer works in master-mode and instructs the HE-coprocessor to perform the required operations. Since the FPGA has a limited amount of internal memory in the form of block RAMs, only few residue polynomials can be kept during a computation. A large external memory is used to store the set of cipher-text polynomials.

The HE-coprocessor has a set of parallel processors, where each parallel processor is composed of several cores. The parallel processors are independent of each other and supports a high degree of parallelism and scalability. Since the architecture is very large, special care is taken to reduce the routing complexity. For example, a routing-friendly parallel NTT algorithm is used in the HE-coprocessor.    

Experimental Results

The designed HE-coprocessor with 8 parallel processors, each processor having 16 parallel cores, 16 small-CRT units and 2 large-CRT units was compiled for the Xilinx Virtex-7 XC7V1140T-2 FPGA, the largest device of the Virtex-7 FPGA family. The architecture consumes 23 % of the available registers, 50 % of LUTs, 53 % of DSP slices, and 38 % of block RAMs. This implementation evaluates SIMON-64/128 in approximately 157.7s (at 143MHz) and it processes 2048 cipher-texts at once giving a relative time of only 77ms per block. This is 26.6 times faster than the leading software implementation on a 4-core Intel Core-i7 processor running at 3.4GHz. The timing values do not consider the overhead of data transfer between the external memory and the HE-coprocessor.

 Future Work

1. The architecture does not implement the required interface between the external memory and the FPGA-based HE-coprocessor. The cost of data transfer is very important considering the vast amount of data exchange during various operations; and hence the interface should be properly designed. It should be noted that most of the memory access can be performed in parallel with the computation using a ping-pong scheme of two sets of block RAMs in the FPGA. The FPGA and the master-computer operate on these two sets alternatively between two consecutive instructions: when the FPGA operates on the first set, the master-computer operates on the second, and vice versa. Since only 38% of the available block RAMs are utilized in the proposed architecture, such a ping-pong styled memory access seems practical in order to reduce the cost of memory access.

2. The computation time can be reduced by performing low-level optimizations in the architecture. For example, it might be possible to increase the operating frequency beyond 200 MHz by optimizing the long critical paths that are present in the architecture. The cycle requirement of the NTT algorithm can also be reduced. Moreover, since the implemented architecture consumes only ~50% of the available resources, it might be possible to put more number of parallel processors in the FPGA. This will require area reduction of each processor and a better placement-routing optimization.

3. The results are obtained from a single FPGA. Since the HE-coprocessor is composed of independent parallel processors, several FPGAs can be used to scale down the computation time. 

Sujoy Sinha Roy

Monday, July 6, 2015

Cryptanalysis of the Co-ACD Assumption

The HEAT project has a new public output:

Homomorphic cryptography (at the core of the HEAT project) allows to securely delegate computation over encrypted data and is a very active research area. At ACM-CCS 2014, a top-tier conference on computer and communications security, a new scheme claimed to be the "most efficient of those that support an additive homomorphic property" was proposed by Cheon, Lee and Seo.

Understanding the security of the homomorphic cryptographic schemes is therefore of utmost importance, especially to select the most efficient and secure systems in HEAT. In this paper that will appear at CRYPTO 2015, a top-tier conference in cryptography, we show that the latter scheme is completely insecure. We present new lattice-based attacks that are effectively devastating for the proposed constructions. More precisely, we show that the parameters proposed by Cheon et al. and originally aiming at 128-bit security can be broken in a matter of seconds. And while it is possible to select parameters outside of the range in which our attacks run in polynomial time, they have to be so large as to render the proposed constructions severely uncompetitive (e.g. our asymptotic estimates indicate that 128 bits of security against our attacks require a modulus of at least 400,000 bits).

Friday, July 3, 2015

Modular Hardware Architecture for Somewhat Homomorphic Function Evaluation

The HEAT project has a new public output:

This paper reports on a hardware architecture implementing all building blocks used in polynomial ring based fully homomorphic schemes. As an example, the YASHE encryption scheme is implemented on top of this architecture and the SIMON-64/128 block cipher was executed in the encrypted domain.

The building blocks are integrated in an instruction-set coprocessor, which can be controlled by a computer for evaluating arbitrary functions (up to the multiplicative depth 44 and 128-bit security level).

This implementation evaluates SIMON-64/128 in approximately 157.7s (at 143MHz) and it processes 2048 ciphertexts at once giving a relative time of only 77ms per block. This is 26.6 times faster than the leading software implementation on a 4-core Intel Core-i7 processor running at 3.4GHz.

Wednesday, July 1, 2015

Yet Another Somewhat Homomorphic Encryption (YASHE) Scheme

In this blog post we outline the main operations in the YASHE fully homomorphic encryption (FHE)  scheme. For the exact details see the paper introducing YASHE

López-Alt, Tromer, and Vaikuntanathan proposed a (multi-key) FHE scheme based on the work by Stehlé and Steinfeld in which a provably secure version of NTRUEncrypt is presented with security based on standard problems in ideal lattices. The FHE scheme from López-Alt, Tromer, and Vaikuntanathan needs to make an additional assumption relating to the uniformity of the public key, the so-called decisional small polynomial ratio (DSPR) assumption, to allow homomorphic operations and remain semantically secure. Brakerski introduced an approach to limit the noise growth during homomorphic operations via a tensoring technique.

In this paper, Yet Another Somewhat Homomorphic Encryption (YASHE) scheme is introduced which incorporates the best of these techniques. YASHE avoids the DSPR assumption by using the techniques described by Brakerski and construct a new fully homomorphic encryption scheme from the Stehlé and Steinfeld version based on standard lattice assumptions and a circular security assumption.

Besides this theoretical advantage, YASHE has other attractive properties
  1. This new scheme is scale-invariant: this means it avoids the modulus-switching technique of Brakerski, Gentry and Vaikuntanathan.
  2. The ciphertext consists of only a single ring element (as in this paper) as opposed to the two or more ring elements for schemes based purely on the ring learning witherrors (RLWE) assumption.  
In the following I will describe the more practical variant of YASHE (denoted YASHE’ in the paper). Note that this practical variant YASHE' does need the DSPR assumption.

In order to show how YASHE works we need to fix some parameters. Selecting secure parameters is a difficult task by itself for which tools have been generated (see for instance our previous blog post). For this post I simply assume correct and secure parameters have been chosen (but see the paper for some example parameters). Given the security parameter $\lambda$, fix a positive integer $d$ and modulus $q$ that determine $R={\mathbb Z}[X]/(\Phi_d(X))$, and $t$ with $1< t < q$, and distributions $\chi_{\mathrm{key}}$, $\chi_{\mathrm{err}}$ on $R$.

The message space is  $R/tR=({\mathbb Z}[X]/(\Phi_d(X)))/(t({\mathbb Z}[X]/(\Phi_d(X)))).$

The function $P_{w, q}$ is a generalization of the PowersofTwo and $D_{w, q}$ is a generalization of BitDecomp from this paper. Instead of a radix-2 representation these functions use a radix-$w$ system. These function take a single ring element and output $\ell_{w,q} = \lfloor\log_w(q)\rfloor + 2$ ring elements. The choice of $w$ is important since it allows for a trade-off between efficiency and error growth.

 Key generation. KeyGen $(d,q,t,\chi_{\mathrm{key}}, \chi_{\mathrm{err}}, w)$
  1. Sample $f', g \leftarrow \chi_{\mathrm{key}}$ and let $f = [tf' + 1]_q $.
  2. If $f$ is not invertible modulo $q$, choose a new $f'$.
  3. Compute the inverse $f^{-1} \in R$ of $f$ modulo $q$ and set $h = [tgf^{-1}]_q$.
  4. Sample $\vec{e},\vec{s}\leftarrow \chi_{\mathrm{err}}^{\ell_{w,q}}$, compute $\vec{\gamma} = [P_{w,q}(f) + \vec{e} + h \cdot \vec{s}]_q \in R^{\ell_{w,q}}$.
  5. Output $({\sf pk}, {\sf sk}, {\sf evk}) = (h,f,\vec{\gamma})$.
 Encryption. Encrypt $(m, {\sf pk})$
  1. For a message $m+tR$, choose $[m]_t$ as its representative.
  2. Sample $s,e \leftarrow \chi_{\mathrm{err}}$.
  3. Output the ciphertext $c = [\lfloor q/t \rfloor [m]_t + e + {\sf pk}\cdot s]_q \in R.$
Decryption. Decrypt $(c, {\sf sk})$
  1. To decrypt a ciphertext $c$, compute  $m = \left[\left\lfloor \frac{t}{q}\cdot[{\sf sk}\cdot c]_q \right\rceil \right]_t \in R.$
Key switching. KeySwitch $(\tilde{c}_{\sf Mult}, {\sf evk})$
  1. Output the ciphertext $[\langle D_{w,q}(\tilde{c}_{\sf Mult}) , {\sf evk} \rangle]_q$.
The key switching algorithm transforms an intermediate encryption into a ciphertext that can be decrypted with $f$  itself. The evaluation key is ${\sf evk} = [P_{w,q}(f) + \vec{e} + h \cdot \vec{s}]_q$, where $\vec{e}, \vec{s} \leftarrow \chi_{\mathrm{err}}^{\ell_{w,q}}$ are vectors of polynomials sampled from the error distribution $\chi_{\mathrm{err}}$. This key is a vector of quasi-encryptions of the secret key $f$ under its corresponding  public key. It is required for the homomorphic multiplication operation and is therefore made public. This means, we need to make a circular security assumption, namely that the scheme is still secure even given that ${\sf evk}$ is publicly known.

Homomorphic addition. Add $(c_1,c_2)$
Given two ciphertexts $c_1, c_2\in R$, which encrypt two messages $m_1, m_2$, their sum modulo $q$, $c_{\sf Add} = [c_1 + c_2]_q$, encrypts the sum of the messages modulo $t$, $[m_1 + m_2]_t$.

Homomorphic multiplication. Mult $(c_1,c_2, {\sf evk})$
Output the ciphertext
$$c_{{\sf Mult}} = \textrm{KeySwitch} (\tilde{c}_{\sf Mult}, {\sf evk}), \mbox{ where } \tilde{c}_{\sf Mult} = \left[\left\lfloor\frac{t}{q} \cdot c_1 \cdot c_2\right\rceil\right]_q.$$

YASHE in practice
This practical version of YASHE has been used in order to ensure privacy of sensitive medical data. In this work it is shown how to privately conduct predictive analysis tasks on encrypted data using homomorphic encryption. As a proof of concept a working implementation of a prediction service running in the cloud, which takes as input private encrypted health data, and returns the probability for suffering cardiovascular disease in encrypted form. Since the cloud service uses homomorphic encryption, it makes this prediction while handling only encrypted data, learning nothing about the submitted confidential medical data.

In a recent paper in the framework of the HEAT project, a modular hardware architecture for somewhat homomorphic function evaluation using YASHE is presented. In another recent publication, YASHE is implemented to investigate the potential of FPGAs.