Tuesday, December 20, 2016

Privacy-friendly Forecasting for the Smart Grid using Homomorphic Encryption and the Group Method of Data Handling

Machine learning has become increasingly popular in recent years. Every day new applications appear on the market, such as augmented reality, pattern recognition, intelligent search, personal recommendations, forecasting and so on. Big companies like Google, Amazon and Apple use comprehensive statistical models to improve their business. They rely on immense amounts of data that clients provide through different on-line services. This information can be very privacy sensitive, and its unapproved disclosure can cause immediate damage.
The question then arises: how can private data be encrypted and, at the same time, operated on by machine learning algorithms? In our recent paper we tried to answer this question for the load forecasting problem in the smart grid using homomorphic encryption.

The idea behind smart-grid technology is very simple. It assumes that client housing and a service provider are equipped with communication devices called smart meters. These transmit current information about load consumption, weather conditions, financial data, the structure of the local utility grid, and so on. A service provider can apply statistical techniques in order to forecast energy demand for the next period of time. Given such predictions, clients can control their consumption more efficiently, and public utilities can optimise production and logistics costs to lower prices or avoid blackouts.

One popular class of prediction algorithms is based on artificial neural networks (ANNs). ANNs compute so-called activation functions; in practice it is common to use a sigmoid, where the logistic function $t \mapsto 1/(1 + e^{-t})$ is a popular choice. Due to its highly non-linear nature, computing such a sigmoid function homomorphically is far from practical. To overcome that issue one can substitute the sigmoid by a polynomial function, as was done recently in the papers of Xie et al. and Dowlin et al. for pattern recognition purposes. But there exists an older statistical approach built especially for prediction purposes by the Soviet computer scientist Alexey Ivakhnenko in 1970. It is called the Group Method of Data Handling (GMDH).

GMDH takes a history of the system outputs $\{x_i\}$ and constructs a truncated Wiener series to approximate the next output value

$a_0 + \sum_{i=1}^{n_0} a_i x_i + \sum_{i=1}^{n_0} \sum_{j=i}^{n_0} a_{ij} x_i x_j + \sum_{i=1}^{n_0} \sum_{j=i}^{n_0} \sum_{k=j}^{n_0} a_{ijk} x_i x_j x_k + \dots$

In the original paper this polynomial is called a Kolmogorov-Gabor polynomial and it can be nicely illustrated in a neural network manner

where each node has only two input edges and its own activation function described by a quadratic polynomial over real numbers

$\nu_{ij} : \textbf{R}^2 \rightarrow \textbf{R}: (x, y) \mapsto b_{ij0} + b_{ij1} x + b_{ij2} y + b_{ij3} xy + b_{ij4} x^2 + b_{ij5} y^2.$

The learning phase determines the polynomial coefficients and the wiring of the nodes, while the number of layers and the number of nodes per layer are fixed in advance. In practice, one solves a linear regression problem at each node and decides whether that node is among the "best" ones in its layer in terms of error performance. As a result, a GMDH neural network can be expressed as an arithmetic circuit whose multiplicative depth equals the number of layers. This construction is a perfect match for homomorphic evaluation.
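The circuit view can be sketched in a few lines of Python. This is an illustrative toy only (the wiring and coefficients below are made up, not taken from the paper): each node evaluates the quadratic polynomial $\nu_{ij}$ on two outputs of the previous layer, so a network with $L$ layers is an arithmetic circuit of multiplicative depth $L$.

```python
# Toy GMDH-style network evaluation. Each node computes a quadratic
# polynomial of two inputs, so evaluating L layers homomorphically
# costs multiplicative depth L.

def node(b, x, y):
    """Quadratic GMDH node: b0 + b1*x + b2*y + b3*x*y + b4*x^2 + b5*y^2."""
    b0, b1, b2, b3, b4, b5 = b
    return b0 + b1*x + b2*y + b3*x*y + b4*x*x + b5*y*y

def gmdh_forward(layers, inputs):
    """Evaluate a GMDH network.

    layers: list of layers; each layer is a list of (i, j, coeffs)
            triples wiring a node to two outputs of the previous layer.
    """
    values = list(inputs)
    for layer in layers:
        values = [node(b, values[i], values[j]) for (i, j, b) in layer]
    return values[0]  # single output node of the last layer

# Toy 2-layer network on 3 inputs (illustrative coefficients only).
net = [
    [(0, 1, (0.1, 0.5, 0.5, 0.0, 0.0, 0.0)),   # near-linear node
     (1, 2, (0.0, 0.0, 0.0, 1.0, 0.0, 0.0))],  # pure product node
    [(0, 1, (0.0, 1.0, 1.0, 0.0, 0.0, 0.0))],  # summing output node
]
print(gmdh_forward(net, [1.0, 2.0, 3.0]))
```

Tracing the evaluation shows only additions, multiplications by known constants, and one product of two intermediate values per layer, exactly the operations an SHE scheme supports.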

To perform experiments on real data we used a source provided by the Irish government covering over 5000 households and commercial buildings. It turns out that the prediction performance of ANNs for individual households is quite low, but aggregating a few houses makes the forecasting error decrease drastically. Thus we considered the aggregated consumption of combinations of 10 houses, which corresponds to small apartment complexes where privacy issues still make sense.

Finally, we applied the FV SHE scheme on top of the forecasting algorithm, together with the fixed-point representation of real numbers recently studied by Costache et al. The simplest GMDH network, with 4 layers, was chosen, as it shows reasonable error performance. The resulting algorithm forecasts the load demand for the next 30 minutes with a relative error around 20%, which is comparable to modern ANNs. Computing one prediction takes 90 seconds in sequential mode, or 4 seconds in the parallel implementation, on an average laptop equipped with an Intel Core i5-3427U CPU (running at 1.80GHz).

Please check http://eprint.iacr.org/2016/1117 for further details.

Monday, November 28, 2016

Interval arithmetic and practical lattice reduction

IALatRed in a Nutshell

Lattice reduction is fundamental in computational number theory and in computer science, especially in cryptography. The celebrated Lenstra–Lenstra–Lovász reduction algorithm (called LLL or L3) has been improved in many ways through the past decades and remains one of the central tools for reducing lattice bases. In particular, its floating-point variants — where the long-integer arithmetic required by Gram–Schmidt orthogonalization is replaced by floating-point arithmetic — are now the fastest known. Yet, the running time of these floating-point versions is mostly determined by the precision needed to perform sound computations: theoretical lower bounds are large whereas the precision actually needed on average is much lower. IALatRed is a proof-of-concept of an adaptive precision version of LLL, of its variant Potential-LLL, and of the BKZ reduction. In these algorithms, floating-point arithmetic is replaced by Interval Arithmetic. The certification property of interval arithmetic enables runtime detection of precision defects in numerical computations and, accordingly, makes it possible to run the reduction algorithms with guaranteed, nearly optimal precision. As such, these adaptive reduction algorithms run faster than the state-of-the-art implementations, while still being provable.

A brief history of lattice reduction

Lattices are defined as additive discrete subgroups of $\mathbb{R}^n$, i.e. the integer span $L(\mathbf{b_1}, \ldots, \mathbf{b_d}) = \bigoplus_{i=1}^d \mathbb{Z} \mathbf{b_i}$ of a linearly independent family of vectors $\mathbf{b_1}, \ldots, \mathbf{b_d}$ in $\mathbb{R}^n$. Such a family is called a basis of the lattice, and is not unique. Nevertheless, all the bases of a given lattice have the same number of elements, $d$, which is called the *dimension* of the lattice. Among the infinite number of different bases of an $n$-dimensional lattice with $n\geq 2$, some have interesting properties, such as having reasonably small vectors and low orthogonality defect. Finding such reduced bases is the goal of *lattice reduction* theory and has been crucial in several fields of computer science and mathematics, especially in cryptology. For instance, lattices have been used to break many public-key cryptosystems in the last decades, including knapsack cryptosystems or RSA in specific settings thanks to Coppersmith's method.

The problem of finding good bases goes back to the early works of Lagrange and Gauss on dimension-two lattices, and the introduction of what is now called the Gauss algorithm, even though it was first described by Lagrange. This procedure can be seen as a 2-dimensional extension of the well-known Euclid algorithm for computing the greatest common divisor of two integers. In 1850, Hermite published the first reduction algorithm for arbitrary dimension. A century later, in 1982, Lenstra, Lenstra and Lovász designed the LLL algorithm, with the polynomial factorization problem as an application, after the celebrated work of Lenstra on integer programming. This algorithm is a milestone in the history of lattice reduction algorithms, being the first whose running time is polynomial in arbitrary dimension. This work has been improved in multiple ways, among others by Kaltofen, Schnorr, and Gama and Nguyen, decreasing the time complexity or improving the quality of the reduction.


A bird's eye view on Interval Arithmetic

Interval arithmetic is a representation of reals by intervals — whose endpoints are floating-point numbers — that contain them. Arithmetic operations, in particular the basic operations $+$, $-$, $\times$ , $\div$ can be redefined in this context. The main interest of this representation lies in its certification property:  if real numbers are represented by intervals, the interval resulting from the evaluation of an algebraic expression contains the exact value of the evaluated expression.
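As a minimal sketch of these redefined operations (ignoring directed rounding, which a real interval library handles by rounding lower endpoints down and upper endpoints up), the four basic operations can be written as:

```python
# Toy interval type with float endpoints, illustrating the certification
# property: the result interval always contains the exact real value.
# A real implementation would also control the rounding mode of each
# endpoint computation; this sketch ignores that.

class Interval:
    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):
        ps = [self.lo*o.lo, self.lo*o.hi, self.hi*o.lo, self.hi*o.hi]
        return Interval(min(ps), max(ps))

    def __truediv__(self, o):
        assert not (o.lo <= 0 <= o.hi), "division by interval containing 0"
        return self * Interval(1/o.hi, 1/o.lo)

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

x = Interval(1.0, 1.1)
y = Interval(2.0, 2.1)
print(x * y - x)   # contains the exact value of x*y - x for any
                   # x in [1.0, 1.1], y in [2.0, 2.1]
```

When the width of a result interval grows too large, the computation has lost precision, and this is exactly the run-time signal an adaptive-precision algorithm can act on.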

If the birth of lattice reduction is well-dated, that of Interval Arithmetic is not. Some authors attribute its introduction to R. Moore in his 1962 PhD thesis. Others date it back to a 1958 article of T. Sunaga describing an algebraic interpretation of the lattice of real intervals, or even earlier, to a proposal in the 1931 PhD thesis of R.C. Young at Cambridge. Nonetheless its development and industrial applications had to wait until the 1980s and the momentum of U. Kulisch in Karlsruhe, leading IBM to develop a specific instruction set and a compiler natively integrating Interval Arithmetic. Its main asset — calculating directly on sets — is nowadays used to deterministically determine the global extrema of a continuous function, or to determine the zeroes of a function and prove their existence. Another application of Interval Arithmetic is the ability to detect a lack of precision at run-time of numerical algorithms, thanks to the guarantees it provides on computations.

Mixing the two of them: IALatRed

This last application can lead to the design of adaptive precision numerical algorithms. In particular, we transformed the celebrated LLL algorithm into an adaptive precision version, leading to a provable reduction algorithm which in some cases even runs faster than the state-of-the-art fplll of D. Stehlé et al. We also applied this technique to a stronger variant of LLL, called Potential-LLL, first introduced by Fontein et al., with similar results, as well as to the stronger BKZ algorithm.

Wednesday, July 20, 2016

Full RNS Implementation of Fan Vercauteren scheme

In a paper (https://eprint.iacr.org/2016/510), Jean-Claude Bajard, Julien Eynard, Anwar Hasan and Vincent Zucca explain how to perform the division and rounding steps occurring during decryption and homomorphic multiplication procedures of scale invariant schemes such as FV and YASHE without using any multi-precision library. 

Such schemes, based on RLWE, usually lie in a power-of-two cyclotomic ring $\mathcal{R} = \mathbb{Z}[X]/(X^n+1)$ for efficiency reasons. In the FV scheme, a ciphertext $\texttt{c} = (\texttt{c}_0, \texttt{c}_1)$ is a polynomial of degree 1 with coefficients in $\mathcal{R}_q = \mathbb{Z}_q[X]/(X^n+1)$. Each $\texttt{c}_i$ is then represented as a polynomial of degree $n-1$ with coefficients in $\mathbb{Z}_q$; $q$ is usually chosen as a product of several small prime moduli $q_i$. This allows one to represent the integer coefficients of the $\texttt{c}_i$'s, through the Chinese Remainder Theorem (CRT), by their residues modulo the smaller $q_i$'s. However, the use of the Residue Number System (a.k.a. non-positional CRT representation) remains hardly compatible with procedures involving coefficient-wise division and rounding, which require costly comparisons to be performed. Thus, until now, one had to invert the CRT representation and then use multi-precision arithmetic to perform these steps.

The techniques presented in this paper allow these steps to be performed directly in RNS, avoiding a costly conversion between RNS and the positional system and the use of a multi-precision library. Compared with the FV-NFLlib implementation (https://github.com/CryptoExperts/FV-NFLlib), this enables speed-ups from a factor of 5 to 20 for decryption and from 2 to 4 for homomorphic multiplication. It may also lead to future implementations on highly parallel accelerator cards such as GPUs. The paper will be presented at SAC (http://www.engr.mun.ca/~sac2016/), in St John's, Canada, in August; this blog post briefly presents it.

A plaintext $\texttt{m}$ is an element of $\mathcal{R}_t = \mathbb{Z}_t[X]/(X^n+1)$; after encryption it becomes a ciphertext $\texttt{c} = (\texttt{c}_0,\texttt{c}_1)\in (\mathcal{R}_q)^2$ with $t < q$. To recover the message, the decryption procedure computes $[\lfloor t/q(\texttt{c}_0+\texttt{c}_1 \texttt{s}) \rceil]_t$ with $\texttt{s}$ the secret key.

There are two difficulties in performing this computation in RNS. First, $\texttt{c}'=t(\texttt{c}_0+\texttt{c}_1 \texttt{s})$ lies in $\mathcal{R}_q$ and is represented in the "native" RNS base $\mathcal{B}_q = \{q_1,\ldots,q_k\}$ with $q = q_1\cdots q_k$; thus, in base $\mathcal{B}_q$, it is impossible to perform the division by $q$. Second, computing a rounding is unnatural in a non-positional system like RNS.

To overcome the first difficulty we need to convert the elements from base $\mathcal{B}_q$ to another base coprime to $\mathcal{B}_q$. As we only need the result modulo $t$, we can use the single-modulus base $\{ t \}$, as long as $t$ and $q$ are coprime (usually $t$ is a power of two while $q$ is odd). To achieve this conversion we use a fast base conversion, which may add a multiple of $q$ called the $q$-overflow. After the division by $q$ in the new base, $q$ is cancelled and only an error term $\tau$ remains.

To solve the second difficulty, we compute a flooring instead of the rounding, using the equality $\lfloor \frac{x}{q}\rfloor = \frac{x-|x|_q}{q}$. The computation of the residue modulo $q$ is achieved by a Montgomery reduction, which fits well in RNS. Of course, as flooring and rounding are not the same, an error may occur; in such a case we simply consider this error as part of the previous one, $\tau$.
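The flooring trick can be checked on plain integers. The sketch below is illustrative only (it uses a naive per-channel modular inverse rather than the Montgomery reduction of the paper): since $x - |x|_q$ is an exact multiple of $q$, the division by $q$ becomes a multiplication by $q^{-1}$ in every residue channel of a base coprime to $q$.

```python
# The exact-division trick behind RNS flooring: for integers x and q,
# floor(x/q) = (x - (x mod q)) / q, and the right-hand division is
# exact, so it can be carried out channel-wise as multiplication by
# q^{-1} modulo each member of an RNS base coprime to q.

def floor_div_via_residue(x, q):
    return (x - x % q) // q   # exact integer division

def floor_div_rns(x, q, base):
    """Residues of floor(x/q) in a toy RNS base coprime to q."""
    r = x % q                                   # the residue |x|_q
    return [((x - r) % m) * pow(q, -1, m) % m   # multiply by q^{-1} mod m
            for m in base]

q, x = 97, 12345
print(floor_div_via_residue(x, q))        # 127
print(floor_div_rns(x, q, [5, 7, 11]))    # residues of 127 mod 5, 7, 11
```

In the actual scheme the residue $|x|_q$ is never reconstructed explicitly; the Montgomery reduction produces it directly within the RNS representation.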

At this step we need to cancel the error $\tau$, due to the conversion and the flooring, but also the inherent noise in the ciphertext, in order to recover the message. To do this, we add another modulus $\gamma$, coprime to $q$, to the base $\{t\}$: we multiply $\texttt{c}'$ by $\gamma$ in base $\mathcal{B}_q$ and run the computations explained above in base $\{\gamma, t\}$ instead of $\{t\}$. Then, for a well-suited $\gamma$, we are able to cancel the noise in the residue modulo $t$ by subtracting from it the centered residue modulo $\gamma$. Finally, we recover $[\texttt{m}]_t$.

Homomorphic multiplication

To perform a homomorphic multiplication of two ciphertexts $\texttt{c}^{(1)}$ and $\texttt{c}^{(2)}$, one first has to scale the product $\texttt{c}^{(1)}\times\texttt{c}^{(2)}$ (which is a degree-two polynomial with coefficients in $\mathcal{R}_q$) by $t/q$ and round the result coefficient-wise, as for decryption. Then one has to relinearize this degree-two polynomial to get back a degree-one polynomial. But, to limit the noise growth during this step, this requires decomposing the coefficients of an element of $\mathcal{R}_q$ in some radix base $\omega$. Sadly, this is precisely the kind of thing we cannot do in a non-positional number representation.

One should understand that the product $\texttt{c}^{(1)}\times\texttt{c}^{(2)}$ has to be computed over $\mathbb{Z}$; because we then scale down by $t/q$, the coefficients of the result fit back in $\mathcal{R}_q$. Instead of computing the product over $\mathbb{Z}$, we use a second RNS base $\mathcal{B}'$, coprime to $\mathcal{B}_q$ and of the same size, so as not to lose any information.

Then we use a fast conversion to convert $\texttt{c}^{(1)}$ and $\texttt{c}^{(2)}$ from $\mathcal{B}_q$ to $\mathcal{B}'$. We manage to reduce the error in base $\mathcal{B}'$ (due to the $q$-overflow) in order to limit its growth during the product, and then we perform the product in both bases.

Next, we use a fast conversion to convert the residues of the product from base $\mathcal{B}_q$ to base $\mathcal{B}'$. Then, as for decryption, we perform a flooring using a Montgomery reduction instead of a rounding, and we get $\lfloor t/q (\texttt{c}^{(1)}\times\texttt{c}^{(2)})\rfloor$ in base $\mathcal{B}'$. We consider the error due to the different conversions as part of the noise.

For the next step, we convert the result back to base $\mathcal{B}_q$, this time using an exact base conversion (more costly), because we cannot tolerate overflows from the base $\mathcal{B}'$. To overcome the last difficulty, we have managed to decompose the elements of $\mathcal{R}_q$ in a way very similar to the original decomposition, but one that can be performed in RNS. The end of the homomorphic multiplication procedure remains the same; we just replace the original decomposition function by our own.

Impact on the noise

Of course, using all these tricks adds some errors, which become part of the noise and thus make it bigger. One could wonder if, and how, the multiplicative depth has been affected. Indeed, our methods modify the different bounds on the noise for the homomorphic multiplication, and also the decryption bound. Nevertheless, by running an analysis with our new bounds, we have shown that the multiplicative depth remains unchanged from that of the original scheme for all sets of parameters but one.

Thursday, July 7, 2016

WHEAT 2016: The Pre-history of Lattice-based Cryptanalysis

On the second day of the Workshop HEAT, Antoine Joux gave an invited talk on the history of cryptanalytic applications of  lattice-basis reduction algorithms.

A lattice is a discrete subgroup of $\mathbb{R}^n$. Equivalently, it is the set of integer-linear combinations of $\mathbb{R}$-linearly independent vectors in $\mathbb{R}^n$. Finding a short basis in $\mathbb{R}^2$ is relatively easy, and Gauss' lattice-basis reduction algorithm is effective for this purpose. As the speaker emphasised, things become more complicated when we move to higher dimensions. He then went on to discuss the details of the famous LLL algorithm, invented by Lenstra, Lenstra and Lovász in 1982, for lattice-basis reduction in arbitrary dimensions. LLL is a polynomial-time algorithm but returns a short vector within an exponential approximation factor of the length of the shortest vector in the lattice. This algorithm, which performs well in practice, may be viewed as a combination of the Gauss lattice reduction and the Gram-Schmidt orthogonalisation methods.

An early application of lattice-basis reduction is the cryptanalysis of knapsack-based cryptosystems. The knapsack problem, a.k.a. the subset-sum problem, is given integers $a_1, \ldots, a_n$, and $S$, find $\epsilon_1,\ldots,\epsilon_n \in \{0,1\}$ such that 
\[S = \overset{n}{\underset{i=1}{\sum}} \epsilon_i \cdot a_ i.\]
This problem is NP-hard, though some cases are easy, and it served as a basis for cryptosystems such as the Merkle-Hellman cryptosystem. The basic idea is to hide an easy knapsack in a hard-looking one; the easy case is a super-increasing knapsack, where $a_i > \sum_{j=1}^{i-1} a_j$. However, at CRYPTO 1982, Shamir broke this cryptosystem using lattice-basis reduction. Other works, starting with that of Lagarias-Odlyzko in 1985, led to improved attacks on low-density knapsacks.
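A short sketch of why the super-increasing case is easy: scanning from the largest element, each $\epsilon_i$ can be decided greedily, since $a_i$ must be taken whenever $a_i \leq S$ (the remaining smaller elements cannot sum to $S$ on their own). Merkle-Hellman disguises such an easy knapsack behind a modular multiplication, and it is this disguise that the lattice attack strips away.

```python
# Greedy solver for a super-increasing knapsack: because each a_i
# exceeds the sum of all previous elements, the 0/1 solution is unique
# and can be read off from the largest element downwards.

def solve_superincreasing(a, S):
    eps = [0] * len(a)
    for i in reversed(range(len(a))):
        if a[i] <= S:       # a_i must be in the sum, since
            eps[i] = 1      # a_0 + ... + a_{i-1} < a_i <= S
            S -= a[i]
    return eps if S == 0 else None

a = [2, 3, 7, 15, 31]            # each a_i > sum of the previous a_j
S = 2 + 7 + 31                   # = 40
print(solve_superincreasing(a, S))   # [1, 0, 1, 0, 1]
```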

The speaker also briefly spoke about the applications of lattice-basis reduction to the problems of

  • reconstructing small linear dependencies: given a sequence $(x_1,\ldots,x_n)$ of real numbers, find small coefficients $v_i$ such that $\sum v_i\cdot x_i =0$,
  • constructing polynomials for use in the Number Field Sieve for solving the Discrete Logarithm Problem, and
  • finding small roots of univariate and bivariate integer polynomials modulo composite numbers: Coppersmith's attack.

Tuesday, July 5, 2016

WHEAT Workshop 2016, Paris

The second HEAT workshop on Fully Homomorphic Encryption (otherwise known as WHEAT) kicked off this morning in (disappointingly cloudy) Paris.

The first talk of the day was Martin Albrecht's invited talk on solving short-secret LWE instances. This is joint work with Rachel Player and Sam Scott. He provided a clear summary of the state of the art in terms of the algorithms available, and thus set the scene for the rest of the day. Attacks using parameter choices inspired by HElib and the LP model were discussed and, where possible, estimated running times were provided. A partial conclusion was that the latter model should not be assumed. For a more detailed read, see https://eprint.iacr.org/2015/046.pdf.

Also discussed was RLWE security in the FHE case, in a talk given by Guillaume Bonnoron. This is joint work with Caroline Fontaine. They present an attack on specific FHE parameter choices, and use the FV scheme because they wish to avoid a modulus-switch operation (see for example https://eprint.iacr.org/2015/889.pdf for a description). The conclusion of this talk is that their attack does work, but FHE and RLWE are not broken, because the attack does not scale up: a bigger parameter $n$ leads to bigger errors. Whilst they took one day to complete the attack, versus a previous 3.5 days (https://eprint.iacr.org/2015/176.pdf), the talk's final conclusion was that more cryptanalysis is needed.

The second invited talk of the afternoon was Leo Ducas', on joint work with Martin Albrecht and Shi Bai. This discussed the subfield lattice attack on "overstretched NTRU assumptions"; for the full technical read, see https://eprint.iacr.org/2016/127.pdf. The attack is based on using a subfield of the field we are working in: they assume the latter is a power-of-two cyclotomic, but claim it can be done using any Galois number field. We map our instances down using the (partial) norm map in order to solve a (hopefully) easier problem in the subfield. Given the right conditions, we can then map the solution we find back up and recover a short enough solution to carry through the attack. The practicality grows with the modulus $q$, hence the "overstretched" condition. Although this does not seem (yet?) to affect the NTRU schemes used in practice, the talk's conclusion was to drop the NTRU assumption altogether.

Monday, July 4, 2016

Fixed Point Arithmetic

In a paper (http://eprint.iacr.org/2016/250) to be presented at SAC (http://www.engr.mun.ca/~sac2016/), in St John's, Canada, in August, HEAT researchers Ana Costache, Nigel P. Smart, Srinivas Vivek and Adrian Waller describe how to perform homomorphic operations on fixed-point numbers. The work builds upon earlier work of Dowlin et al. (http://research.microsoft.com/pubs/258435/ManualHEv2.pdf). In this blog post we present a quick overview of the technique.

To understand the technique you need to appreciate that all known efficient Somewhat Homomorphic Encryption (SHE) schemes are homomorphic over a given ring of polynomials; in particular, the set of polynomials modulo both a prime $p$ (the so-called plaintext modulus) and a polynomial $F(X)$. Given such a plaintext space we can embed integers up to a bounded size within the plaintext space, in a redundant manner, as follows.

Suppose we have two integers, say $x=127$ and $y=78$. We write these as polynomials in {\em balanced base} $B$, for $B=3$. This means we write the integers in base $B$, but we allow plus or minus signs in the representation, so the coefficients have absolute value at most $(B-1)/2$. Thus we have
$$  127 \equiv 3^5-3^4 -3^3 -3^2+1 = X^5-X^4-X^3-X^2+1$$
$$  78 \equiv 3^4-3 = X^4-X.$$
When we add and multiply two such polynomials, and then substitute back in $X=3$, we will obtain the same result as multiplying and adding the original integers.  So for example
\begin{eqnarray*}
  (X^5-X^4-X^3&-&X^2+1)(X^4-X)  \\
    &=& X^9-X^8-X^7-2 \cdot X^6+X^5+2 \cdot X^4+X^3-X.
\end{eqnarray*}
Notice how both the degree and the coefficient sizes have increased. If the resulting polynomial can still be represented in the plaintext space, i.e. the prime $p$ and the degree of $F(X)$ are large enough, then we can map the polynomial operation to the homomorphic operation. Thus assuming $p$ and $F(X)$ are {\em big enough} we can perform operations on integers; even though the homomorphic encryption scheme only supports operations on polynomials modulo $(p,F(X))$.

To encode a fixed-point number we then simply encode it as the ratio of an integer numerator and a power-of-$B$ denominator; so, for example, $4.7037 \approx 127/3^3$. We can then apply operations on these fractional numbers, as long as we keep track of which denominator we are working with, by simply operating on the numerators. Thus each encrypted fixed-point number is represented by an encrypted integer (the numerator) and a public power of three (the denominator).

Whilst this representation was known before, and was given in the paper of Dowlin et al., a key issue was that one needed to know, for a given homomorphic operation, the requisite sizes of $p$ and $F(X)$ to ensure correctness. It is this latter problem which our paper addresses: we show how to obtain lower bounds on the plaintext size needed to support a given homomorphic calculation. In addition, we show that a technique of Dowlin et al., which appears at first sight to be more efficient, turns out to be exactly the same as the above encoding method for fixed-point numbers.

Fourier Transforms in the Encrypted Domain

Thales UK has identified Synthetic Aperture Radar (SAR) as a promising candidate for the application of Homomorphic Encryption (HE). It is proposed that the data collected from SAR can be protected throughout its processing up to the formation of image products. SAR data can be processed in many ways to produce different types of images; the simplest of which is the Single Look Complex Image. The main challenge of implementing the Single Look Complex Image algorithm in the encrypted domain is that of implementing the Fourier Transform.

Discrete Fourier Transform

Consider the Discrete Fourier Transform (DFT) for a sequence of complex numbers $x_0, x_1, \ldots, x_{N-1}$:
$X_k = \sum_{n=0}^{N-1} x_n \cdot e^{ -\frac{2\pi i k n} {N} } , \quad k=0,\ldots, N-1$
This is a linear transform; it requires only the summation of sequence samples and multiplication by scalar values $e^{-\frac{2\pi i k n}{N} }$ (sometimes called twiddle factors). We usually implement this via a Fast Fourier Transform (FFT) which reduces the computational complexity by the use of a few mathematical tricks.

Paillier Cryptosystem

The Paillier cryptosystem [1] encrypts plaintext messages $m\in \mathbb{Z}_n$, where $n$ is the product of two large prime numbers, to a ciphertext $c\in \mathbb{Z}^*_{n^2}$. Let $\text{Enc}$ and $\text{Dec}$ denote the encryption and decryption processes respectively. It can be shown that the Paillier cryptosystem satisfies the two following properties:
1)      For two plaintext messages $m_1, m_2\in \mathbb{Z}_n$
                  $\text{Dec}( \text{Enc}(m_1) \cdot \text{Enc}(m_2) \text{ mod } n^2)$ $ = m_1 + m_2 \text{ mod } n$.
2)      For a plaintext $m$ and any constant integer $k$
$\text{Dec} ( \text{Enc}(m)^k \text{ mod } n^2 ) = km \text{ mod } n$.

Properties 1 and 2 allow us to add any two plaintexts and multiply any plaintext by an integer constant in the encrypted domain; these operations are sufficient to implement the Fourier Transform (DFT/FFT).
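The two properties are easy to verify on a toy instance. The sketch below uses tiny, completely insecure primes and the common parameter choice $g = n+1$; it is a textbook Paillier implementation for illustration, not production code.

```python
# Toy Paillier instance checking the two homomorphic properties used
# for the encrypted DFT. Parameters are far too small to be secure.
import math
import random

p, q = 103, 107                      # toy primes; a real n has thousands of bits
n = p * q
n2 = n * n
g = n + 1                            # common choice of generator
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)

def L(u):
    """The function L(u) = (u - 1) / n used in decryption."""
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # precomputed decryption constant

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):
    return L(pow(c, lam, n2)) * mu % n

m1, m2, k = 1234, 5678, 11
assert dec(enc(m1) * enc(m2) % n2) == (m1 + m2) % n   # property 1
assert dec(pow(enc(m1), k, n2)) == k * m1 % n         # property 2
print("both homomorphic properties hold")
```

Since the DFT consists only of such additions and scalar multiplications, every line of a DFT can be mapped onto ciphertext multiplications and exponentiations modulo $n^2$.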

The Fourier Transform under Paillier Encryption

There are two major challenges in the implementation of a Fourier Transform under Paillier encryption:   
1)      Paillier encrypts and allows operations only on integer values modulo $n$. We must convert our sequence samples $x_0, x_1, \ldots, x_{N-1}$, and twiddle factors $e^{-\frac{2\pi ikn}{N} }$, into integers modulo $n$. We achieve this by multiplying by two appropriately chosen scale factors; one for the sequence samples and one for the twiddle factors. These scale factors must be dealt with throughout the Fourier Transform.
2)      The homomorphic addition and scalar multiplication operations result in plaintext values modulo $n$. That is, if we wish to sum two plaintext values $m_1, m_2$ in the encrypted domain we must ensure the resulting plaintext is no greater than $n-1$. Otherwise, we can say nothing about the magnitude of $m_1 + m_2$. Since we wish to perform many additions and scalar multiplications throughout the Fourier Transform, we require $n$ to be large enough to cope with the growth from all of these operations.

These two challenges have been discussed in [2]. Challenge 1 means that we need to deal with the quantisation scale factors when performing operations. The analysis of challenge 2 in [2] results in the following lower bound for the word length $n_p$ of the modulus $n$ after radix-2 FFT operations:
$n_p \geq v + (v-2) n_2 + n_1 + 3$

where
$v = \log_2 (N)$,
$n_1 = b_1-1$, where $b_1$ is the number of bits of precision in the sequence samples, and
$n_2 = \left\lceil b_2 - \dfrac{v}{2} + \dfrac{1}{2} \right\rceil$, where $b_2$ is the hardware register size in an equivalent plaintext implementation of the Fourier Transform. Equivalently, $n_2$ is the number of bits of precision required in the twiddle factors to ensure the same precision in the result of the Fourier Transform as a standard (in-clear) hardware implementation.

Typical SAR applications perform multiple FFT operations, as well as other multiplications. For example, the Single Look Complex Image processing employs four 2-dimensional FFTs and 3 Hadamard products. Using the inequality above for one (1-dimensional) FFT operation, our analysis of the Single Look Complex Image processing results in a lower bound: 
$n_p \geq 8v + 8(v-2) n_2 + n_1 + 3(n_3-1) + 24$
for the modulus word length, where $n_3$ is the word length associated with each of the inputs to the Hadamard products. We expect values of the order $v=13$, $n_1 = 7$, $n_2=10$, $n_3=16$, giving a lower bound of $n_p\geq 1060$. It is likely that the modulus word length is required to be larger than this bound for security reasons so there is plenty of scope to perform Single Look Complex Image processing.
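The two bounds are straightforward to evaluate; the sketch below (with hypothetical function names) reproduces the $n_p \geq 1060$ figure quoted above from the stated parameter values.

```python
# Word-length lower bounds from the text: one radix-2 FFT, and the full
# Single Look Complex Image processing (four 2-D FFTs plus three
# Hadamard products).

def fft_bound(v, n1, n2):
    """Lower bound on n_p for one radix-2 FFT."""
    return v + (v - 2) * n2 + n1 + 3

def slc_bound(v, n1, n2, n3):
    """Lower bound on n_p for Single Look Complex Image processing."""
    return 8 * v + 8 * (v - 2) * n2 + n1 + 3 * (n3 - 1) + 24

# Typical values quoted in the text: v=13, n1=7, n2=10, n3=16.
print(slc_bound(v=13, n1=7, n2=10, n3=16))   # 1060
```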


[1]  Paillier, "Public-Key Cryptosystems Based on Composite Degree Residuosity Classes", EUROCRYPT, Springer. pp. 223–238 (1999).
[2]  Bianchi, T., Piva, A., Barni, M., "On the Implementation of the Discrete Fourier Transform in the Encrypted Domain", IEEE Transactions on Information Forensics and Security, 86–97 (2009).

Wednesday, April 27, 2016

Tightness of the error bound in Ring-LWE

About two months ago we posted a short paper on eprint sharing some thoughts on the error size used in Ring-LWE, an increasingly important hard problem proposed by Lyubashevsky, Peikert and Regev [LPR] as a building block for cryptography. Besides resistance against quantum computers, one of the main features of Ring-LWE for HEAT is that it enables homomorphic encryption.

Very briefly, in Ring-LWE one fixes a degree $n$ number field $K$ with ring of integers $R$ and discriminant $\Delta$. One also fixes an integer modulus $q$ and an error parameter $r \in \mathbb{R}_{> 0}$, which we think of as depending on $n$ only. We write $R_q$ and $R_q^\vee$ for the reduction of $R$ and its codifferent (or `dual') modulo $q$. The Ring-LWE problem is about guessing a secret $\mathbf{s} \in R_q^\vee$ from a number of samples of the form $(\mathbf{a}, \mathbf{a} \cdot \mathbf{s} + \mathbf{e} )$, where $\mathbf{a} \in R_q$ is random and $\mathbf{e}$ is sampled from the distribution on $(K \otimes \mathbb{R})/q\mathbb{Z}$ that one obtains by pulling back a spherical Gaussian of width $r$ under the canonical embedding $K \hookrightarrow \mathbb{R}^n$.

In a recent series of papers Elias, Lauter, Ozman and Stange [ELOS], and Chen, Lauter and Stange [CLS1], [CLS2] investigated whether the existence of a homomorphism $\phi : R_q \rightarrow S$ to some small ring $S$ can be exploited to leak information about the secret $\mathbf{s}$. For instance if $R_q$ is of the form $\mathbb{Z}[x]/(q,f(x))$ for some $f(x) \in \mathbb{Z}[x]$ satisfying $f(1) = 0$ mod $q$, then $\phi$ could be evaluation at $1$: $$ \phi : R_q \rightarrow \mathbb{F}_q: \mathbf{a} \mapsto \mathbf{a}(1).$$ Although this idea is attractive, the concrete vulnerable instances proposed by [ELOS], [CLS1], [CLS2] do not follow the original set-up from [LPR]. Instead they
  1. sample the secrets from the ring $R_q$, rather than its dual $R_q^\vee$,
  2. use an error parameter of the form $|\Delta|^{1/2n}\cdot r$, rather than just $r$.
It is not entirely clear to us what the implications of 1. are, but at least one should scale up the error parameter in order to compensate for the relative sparseness of $R$, as is indeed done in 2. However, the scalar $|\Delta|^{1/2n}$ is chosen from a Polynomial-LWE point of view and is not adapted to the way the Ring-LWE errors are generated. Indeed, the canonical embedding may be very skew, so even though things will be okay 'on average', there could exist a basis of $R$ with respect to which some of the coordinates of $\mathbf{e}$ are negligible. This in turn leads to exact linear equations in the secret $\mathbf{s}$, obtained by mere rounding, which allow for recovery of the secret using elementary linear algebra.
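The final step can be made concrete with a toy example (hypothetical numbers, and over $\mathbb{Z}_q$ rather than an actual number field): once rounding has removed the error from $n$ samples, the secret follows from Gaussian elimination mod $q$.

```python
q = 97           # toy prime modulus
s = [5, 42, 13]  # the secret to be recovered

def solve_mod_q(A, b, q):
    """Gaussian elimination over F_q (q prime) for a square system."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % q)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, q)  # modular inverse (Python 3.8+)
        M[col] = [x * inv % q for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % q for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

# Three samples in which the relevant error coordinate rounded to zero,
# so each b_i = <a_i, s> mod q holds exactly (hypothetical values).
A = [[1, 2, 3],
     [0, 1, 4],
     [5, 6, 1]]
b = [sum(x * y for x, y in zip(row, s)) % q for row in A]
recovered = solve_mod_q(A, b, q)
```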

In a previous paper (reported upon in a blogpost below) we showed that the families from [ELOS] indeed suffer from this skewness, and are therefore easy to break. To some extent this also seems to apply to the examples from [CLS1] and [CLS2], as is discussed in the last section of our current paper, where we make a more systematic analysis of the skewness phenomenon. Our main contributions are as follows:
  • We first give an informal motivation for why the most natural choice of scalar is $|\Delta|^{1/n}$, rather than $|\Delta|^{1/2n}$, if one wants to set up a non-dual version of Ring-LWE as in 1.
  • Next we provide for any $\varepsilon > 0$ a family of number fields in which non-dual Ring-LWE can again be broken using elementary linear algebra as soon as the errors are scaled up by $|\Delta|^{(1 - \varepsilon)/n}$ only.
  • Finally we show that the actual (i.e., dual) Ring-LWE problem is also easily broken for the same families, as soon as the errors proposed by [LPR] are scaled down by $|\Delta|^{\varepsilon/n}$.
So our conclusions strongly suggest that the problems with the examples from [ELOS] are related to 2., rather than 1. They also provide a tightness statement on the error size proposed by [LPR], at least from a discriminant point of view. Immediate research questions that arise are whether there exist weak instances of non-dual Ring-LWE when set up with the scalar $|\Delta|^{1/n}$, and, more specifically, whether there exist attacks involving a homomorphism $\phi : R_q \rightarrow S$ that apply in this setting, or to the actual Ring-LWE problem as introduced in [LPR].
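The covolume heuristic behind the first item can be made explicit (an informal sketch only): under the canonical embedding $\sigma : K \hookrightarrow \mathbb{R}^n$ one has $$\operatorname{covol}(\sigma(R)) = |\Delta|^{1/2}, \qquad \operatorname{covol}(\sigma(R^\vee)) = |\Delta|^{-1/2},$$ so as a lattice $R$ is $|\Delta|$ times sparser than $R^\vee$. Distributing this volume ratio evenly over the $n$ coordinates widens each coordinate by a factor of $|\Delta|^{1/n}$, which is why the errors should be scaled up by $|\Delta|^{1/n}$, rather than $|\Delta|^{1/2n}$, when the secrets are moved from $R_q^\vee$ to $R_q$.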

About a month ago Peikert posted a paper sharing similar (but more detailed) thoughts on the vulnerability of the Ring-LWE instantiations described in [ELOS], [CLS1] and [CLS2].

Wouter Castryck, Ilia Iliashenko, Frederik Vercauteren

Wednesday, March 30, 2016

CryptoExperts is hiring a Post-Doc Researcher

CryptoExperts (Paris, France) is looking for an excellent postdoctoral fellow to work on the design, the analysis, and the implementation of homomorphic encryption and lattice-based cryptography, who will be primarily involved in the HEAT project.

In this position, the candidate will also be involved in the French CryptoComp project on homomorphic encryption, which aims at developing a fully automated “source to source” compiler that converts any function into a “secure” version of this function able to run on encrypted data.

In both projects, challenging industrial use-cases will ground the research in practical issues. The candidate will be able to carry out research according to their own interests (theoretical and/or practical), and will be offered multiple travel opportunities and interactions with scientists all over Europe (including the HEAT partners). Participation in the open-source libraries on homomorphic encryption and proof-of-concept demonstrators, the main outcomes of the HEAT project, will be expected.

Applicants should have a Ph.D. in a relevant subject or equivalent, a strong publication record, and an interest in practical protocols. Implementation and coding experience are required.

The position is fully funded for two years (with a flexible starting date). Review of applications will begin immediately and continue until the position has been filled. Knowledge of French is not mandatory. To apply, please send an e-mail to jobs (at) cryptoexperts.com with the following documents:
  • Curriculum vitae, with publication list
  • Reference letters

Saturday, February 6, 2016

NIST Draft on Post Quantum Cryptography

On February 4th 2016, the National Institute of Standards and Technology (NIST) released a draft entitled Report on Post-Quantum Cryptography. This draft presents NIST's current understanding about the status of quantum computers and post-quantum cryptography, and its initial plan to move forward.

In particular, the NIST has decided that the time has come to prepare for the transition to quantum-resistant cryptography. Some striking facts reported in the draft are:

  • "researchers working on building a quantum computer have estimated that it is likely that a quantum computer capable of breaking RSA-2048 in a matter of hours could be built by 2030 for a budget of about a billion dollars" (p.6)
  • "transitioning from 112 to 128 bits of security is perhaps less urgent than transitioning from existing cryptosystems to post-quantum cryptosystems." (p.6)

This engagement of the NIST to standardize post-quantum cryptography is a true opportunity for everyone to react before practical quantum computing arrives. The NIST plan is as follows:

  1. Feb. 3 2016 - Mar. 9 2016: public comment period for the draft Report on Post-Quantum Cryptography.
  2. 2016: release of a draft on the evaluation criteria for quantum-resistant public key cryptography, followed by a public comment period.
  3. until late 2017: the NIST will accept proposals for quantum-resistant public key encryption, digital signatures, and key exchange algorithms.
  4. 2018 + 3 to 5 years: public scrutiny of the candidates to be standardized.
The NIST emphasizes that this process should not be considered a competition. The institute sees its role as the manager of a transparent process engaging with the cryptographic community, which will eventually reach a consensus on cryptographic standards to be endorsed by industry and other standards organizations around the world.

The HEAT project focuses on bringing fully homomorphic encryption one step further, into products and standards. Obviously, this cannot be, and is not, orthogonal to the quantum-resistant cryptography process. In particular, most fully homomorphic encryption schemes belong to one of the main families of post-quantum cryptographic primitives: lattice-based cryptography. As a consequence, all the research that is, and will be, done on fully homomorphic encryption goes hand in hand with the process initiated by the NIST.

In particular, the Security analysis and parameter recommendations outcome will directly feed the research on the analysis of post-quantum primitives. Our open-source software and hardware libraries will include algorithms that will help better understand the performance potential of lattice-based cryptography. A member of the HEAT consortium is also involved in ISO standardization, as the main editor of the upcoming WG 2 standard on Homomorphic Encryption, that is, encryption that supports computing over encrypted data. A workshop on Cryptographic Standards and Evaluations (AWACS 2016) will be organized by CryptoExperts, on behalf of the ECRYPT-CSA project, on May 8, 2016, and will include talks and a panel discussion on present and near-future trends in crypto standards (with an emphasis on the post-quantum rush).

Last but not least, many members of the HEAT consortium are actively involved in the research on post-quantum primitives such as key-exchange or digital signatures.

Monday, January 11, 2016

Satellite Applications of Homomorphic Encryption

Thales UK has been investigating the use of Homomorphic Encryption (HE) in satellite scenarios. In particular, our assessment has focused on the data processing of imaging products, one of the many types of signal processing applications which could have been chosen.

It is relatively simple to find signal processing satellite applications where HE could, in theory at least, provide real benefit. In particular, it could provide end-to-end security of data from the sensor on the satellite to the end user, while allowing the potentially complex and specialised signal processing to be performed on shared infrastructure (which is less trusted from the end user’s point of view). The real challenge has been to find such applications that are potentially practical in the near term, using Somewhat Homomorphic Encryption (SHE).

Consider the Sentinel 2 Multi Spectral Imager, a representative example of an earth observation satellite (in this case designed for environmental monitoring). For HE to provide sensor-to-end-user security, all the signal processing algorithms would have to be amenable to HE. It turns out that many of them potentially are, in particular the deconvolution algorithms that use Fourier Transforms. However, other parts of the processing are more difficult to deal with in HE. For example, processing to deal with imperfections such as:
  • saturated pixels
  • no data pixels and partially corrected pixels (crosstalk correction)
  • defective pixels
These are very simple algorithms, but they involve many conditional operators (<, >, etc., to test pixel values) that would be expensive when implemented on HE-encrypted data.
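To see why, consider a plaintext sketch of such a correction pass (the thresholds and the fill rule are hypothetical, not Sentinel 2’s actual processing); every branch below is a comparison that is trivial on clear data but costly on HE ciphertexts:

```python
def correct_pixels(pixels, neighbour_means, sat=4095, no_data=0):
    """Clamp saturated pixels and fill no-data pixels from their
    neighbourhood mean. Each branch is a comparison on a pixel value."""
    out = []
    for p, m in zip(pixels, neighbour_means):
        if p >= sat:          # '>' test: cheap in the clear, expensive under HE
            out.append(sat)
        elif p == no_data:    # equality test: likewise expensive homomorphically
            out.append(round(m))
        else:
            out.append(p)
    return out
```

For example, `correct_pixels([0, 100, 5000], [150, 150, 150])` fills the no-data pixel, passes the valid one through, and clamps the saturated one, returning `[150, 100, 4095]`.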

A much more promising candidate is Synthetic Aperture Radar (SAR) (e.g. Envisat). SAR is used to provide high precision, and often 3D, imaging for applications such as sea ice and glacier monitoring, vegetation coverage analysis, disaster monitoring and traffic surveillance. Being a radio-based technique, it is not affected by weather or the lack of daylight. The raw data collected by the satellite consists of a series of radar echoes, and these must be processed to form an image. On examining the details of this processing, it can be seen that the core functionality consists of a series of:
  • Fourier Transforms (and their inverses)
  • Hadamard products of matrices
These can be implemented, in theory at least, with a low multiplicative depth, and without the need for conditional operators. This is therefore a good candidate application for SHE, but much work needs to be done to demonstrate its practicality.
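As a plaintext sketch of this pipeline (a naive $O(n^2)$ DFT on toy data, not production SAR code): range compression is matched filtering, i.e. a forward transform, one Hadamard product against the conjugate spectrum of the reference chirp, and an inverse transform.

```python
import cmath

def dft(x, inverse=False):
    """Naive DFT/IDFT. In the encrypted domain this is a linear map
    over known constants, so it costs no ciphertext-ciphertext products."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(xk * cmath.exp(sign * 2j * cmath.pi * i * k / n)
               for k, xk in enumerate(x)) for i in range(n)]
    return [v / n for v in out] if inverse else out

def range_compress(echo, chirp):
    """Matched filter: the only elementwise (Hadamard) product is one
    multiplication per frequency bin, i.e. multiplicative depth 1."""
    E, C = dft(echo), dft(chirp)
    return dft([e * c.conjugate() for e, c in zip(E, C)], inverse=True)

# A reference chirp circularly delayed by 2 samples: the compressed
# output peaks at index 2, revealing the target's range.
chirp = [1, 1, -1, 1]
echo = [chirp[(k - 2) % 4] for k in range(4)]
out = range_compress(echo, chirp)
```

The DFT stages are linear, so under SHE they consume no multiplicative depth at all; only the single Hadamard product does, which is what makes this pipeline so much friendlier to SHE than the conditional-heavy pixel corrections above.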