Wednesday, April 29, 2015

Homomorphic Encryption At EUROCRYPT 2015

This is a blog post from the Eurocrypt 2015 conference held in Sofia, Bulgaria, this week. There were a number of talks related to homomorphic encryption, which we discuss below.

Kristin Lauter: Practical Applications of Homomorphic Encryption

Kristin gave one of the three invited talks, focusing on topics closely related to the HEAT project. She first described the Secure Genome Analysis competition run last month, which was sponsored by the US National Institutes of Health (NIH). The idea behind the competition was to encourage research on practical computations on encrypted genomic data. The competition involved both FHE and MPC technology, and attracted eleven submissions. The FHE-related submissions came from Kristin's group at Microsoft, as well as from IBM, Stanford/MIT, UCI and Tsukuba.

The competition raised a lot of excitement in the bioinformatics community and in popular science journals. The reason for this excitement is the confluence of interest in privacy protection and in genomic research data, the latter of which on its own poses a significant risk in relation to personal data. The competition also encouraged interdisciplinary work between Computer Science, Mathematics, Bioinformatics and Policy.

The surge in research in the area has resulted from the drop in the cost of sequencing a human genome. This was estimated at 3 billion US dollars per person in 2000, but now costs around 1000 USD per person. The drop in cost opens up new applications in personalized medicine, forensic work, ancestry, anthropology and other areas. Yet it brings to the fore the need for large-scale privacy-preserving analysis, since the privacy risks affect not only the people being sequenced but also their families.

Kristin went on to discuss the various secure GWAS (Genome Wide Association Study) scenarios that the competition covered. The two main settings were a single data owner (e.g. one patient and one hospital) versus multiple, mutually distrusting data owners. The talk focused on the single data owner case, as the other case is really better suited to MPC than FHE.

There were two challenges in the single data owner case. The first challenge was related to large-scale statistical studies. Two hundred cases were taken from the PGP (Personal Genome Project) at Harvard Medical School, along with 200 randomized controls, and various statistical operations were performed on them. It turned out that the required computation could be handled by the additively homomorphic Paillier system alone. However, the somewhat homomorphic LWE-based system had a faster evaluation time than the Paillier scheme, and was hence quite competitive.
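
As a reminder of why an additively homomorphic scheme suffices for this kind of counting statistic, here is a toy sketch (my own illustration, with tiny and completely insecure parameters) of Paillier-style additive homomorphism: multiplying ciphertexts adds the underlying plaintext counts.

```python
from math import gcd

# Toy Paillier: n = p*q with tiny primes; completely insecure, for illustration only.
p, q = 1009, 1013
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lambda = lcm(p-1, q-1)
mu = pow(lam, -1, n)                           # modular inverse (Python 3.8+); valid since g = n + 1

def encrypt(m, r):
    # c = (n+1)^m * r^n mod n^2, with r coprime to n (fixed here only for the toy example)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) / n
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Each record contributes an encrypted 0/1 value (e.g. "carries the variant at this locus");
# the untrusted aggregator multiplies ciphertexts, which adds the plaintexts.
cases = [1, 0, 1, 1, 0, 1]
cts = [encrypt(b, r) for b, r in zip(cases, range(2, 2 + len(cases)))]
aggregate = 1
for c in cts:
    aggregate = (aggregate * c) % n2
assert decrypt(aggregate) == sum(cases)        # homomorphic sum of the counts
```

Counting-style statistics reduce to sums of this form, which is why an additively homomorphic scheme sufficed for the first challenge.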

In the second challenge two individuals were randomly selected from the PGP, and 5000, 10000 and 100000 locations in the genome sequence were also selected. The competitors' task was to compute the edit distance and Hamming distance of the two sequences at the given locations. Despite the complexity of the task (the edit distance has a relatively deep circuit), the processing times for the LWE-based schemes were measured in seconds as opposed to minutes.
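
For reference, here are plain (unencrypted) versions of the two functions; the code is my own sketch. The point is structural: Hamming distance is a shallow sum of comparisons, whereas the dynamic-programming edit distance chains minimums and so yields a comparatively deep circuit when evaluated homomorphically.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ:
    under encryption this is a sum of XORs, a very shallow circuit."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a, b):
    """Classic dynamic-programming (Wagner-Fischer) edit distance:
    the chained minimums make the corresponding circuit comparatively deep."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

print(hamming_distance("ACGT", "ACGA"))   # 1
print(edit_distance("ACGT", "AGT"))       # 1
```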

It turned out that much of the work in these challenges was not directly cryptographic. There were issues related to cleaning the data, how the data was encoded, time versus memory trade-offs in the algorithms used, as well as parameter selection (related to the function and data size, as opposed to security).
   
Kristin then discussed another example, which was demonstrated at a recent American Association for the Advancement of Science (AAAS) meeting. In this application encrypted measurements of an individual's weight, age, height, blood pressure, etc. were transmitted to the cloud, and the cloud then produced an encryption of the likelihood of the individual having a heart attack. The overall computation took less than 0.2 seconds in the cloud; however, the cost of sending the ciphertexts to the cloud was far greater, due to the large ciphertext sizes. In some sense this was a silly example, as the function could be computed by the individuals themselves. However, there are scenarios where the function is private, and this example demonstrates that something can be done in such situations.

Kristin ended with a number of open research problems: reducing the storage and transmission costs, which often dominated the computation in the examples considered in the talk; analysing the new hard problems, such as RLWE, upon which most schemes depend; and scaling the computations up to real-life sizes by (for example) considering larger amounts of data or deeper circuits.

Ciphers for MPC and FHE:  Martin Albrecht, Christian Rechberger, Thomas Schneider, Tyge Tiessen and Michael Zohner

Earlier in the conference this paper was presented, which tries to reduce the storage requirements by using a hybrid FHE scheme $(HE_{pk}(k), Enc_k(m))$. The key problem is to design a symmetric encryption scheme $Enc_k$ which is easy to evaluate homomorphically under a homomorphic encryption scheme; this is a new design requirement for such symmetric primitives. The main difference from the standard setting is that in the homomorphic environment XOR gates are much cheaper than AND gates (a situation which holds in MPC applications as well as FHE ones), whereas cipher designers usually assume the two costs are equivalent. Thus the question arises as to what an efficient cipher would look like if such linear operations were essentially for free.
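
Schematically (this is the standard transciphering idea rather than anything specific to this paper), the user sends the pair $(HE_{pk}(k), Enc_k(m))$ and the server homomorphically evaluates the symmetric decryption circuit to obtain a homomorphic encryption of $m$:
$$ \mathrm{Eval}_{pk}\big(\mathrm{Dec}_{(\cdot)}(Enc_k(m));\ HE_{pk}(k)\big) = HE_{pk}(m), $$
so the bandwidth from the user is dominated by the compact symmetric ciphertext, at the cost of evaluating $\mathrm{Dec}$ homomorphically once on the server side.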

The paper considered a number of possible metrics for such ciphers, for example: i) the number of ANDs per bit of encrypted text, ii) the multiplicative depth, and iii) the total number of ANDs per encryption. Existing schemes such as Noekeon, Fantomas and Robin meet these design criteria, but they were introduced with side-channel resistance in mind; for FHE applications we need something more extreme.

The authors decided to pick an SPN design with very small S-boxes. However, not all state bits are covered by an S-box in each substitution layer. The parameters of the scheme are the block size $n$, the number of S-boxes $m$ used in each substitution layer, the key size $k$, and the data complexity $d$ (the allowed data complexity of attacks). From these values they estimate the required number of rounds $r$.

For the S-boxes the authors selected a 3-bit to 3-bit S-box consisting of three AND gates and having multiplicative depth one. The affine layers are selected by choosing a uniformly random binary matrix $A$ and vector $b$ and computing
$$ Ax+b. $$
To produce NUMS (Nothing Up My Sleeve) matrices, the Grain LFSR is used to generate them.
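
For concreteness, a 3-bit S-box with exactly three AND gates and multiplicative depth one can be written over GF(2) as follows; this matches the S-box used in the paper as far as I recall, but treat the exact formula as illustrative.

```python
def sbox3(a, b, c):
    """3-bit S-box over GF(2): three AND gates, multiplicative depth one.
    (Believed to match the paper's S-box; shown here for illustration.)"""
    return (a ^ (b & c),           # AND #1
            a ^ b ^ (a & c),       # AND #2
            a ^ b ^ c ^ (a & b))   # AND #3

# Sanity check: the map is a permutation of {0,1}^3.
outputs = {sbox3((x >> 2) & 1, (x >> 1) & 1, x & 1) for x in range(8)}
assert len(outputs) == 8
```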
 
The talk went on to discuss security against differential, algebraic and other forms of cryptanalysis, arriving at example parameters of $n=256$, $m=63$, $r=12$ and $d=2^{128}$, with a total AND depth of twelve (compared to $40$ for AES).
 
Fully Homomorphic Encryption over the Integers Revisited: Jung Hee Cheon and Damien Stehlé

This paper examined the security of integer-based FHE/SHE schemes. These are derived from the original DGHV scheme and their security is based on the AGCD problem. There are two key issues with schemes based on AGCD: first, there is no known reduction from a classical lattice problem (unlike for the LWE-based schemes), and second, the schemes need an additional sparse subset sum assumption for bootstrapping.

This talk addressed both of these problems. It showed a reduction from LWE to AGCD, and hence AGCD is no easier than worst-case lattice problems. It then showed that the cost estimates of various previous orthogonal lattice attacks on these schemes were overly pessimistic. Finally, it gave an integer-based scheme which was as secure as LWE, had smaller ciphertexts than previous schemes, and did not require an additional assumption for bootstrapping.

The new scheme was similar to previous schemes in that encryption involved sampling a subset of encryptions of zero and then adding these onto the message. The message itself was embedded in the upper bits of the ciphertext (as in the usual scale-invariant schemes). Finally, multiplication was performed using the usual BitDecomp and PowersOfTwo tricks used in other schemes.
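
As a reminder (the identity is standard rather than specific to this paper), the point of BitDecomp and PowersOfTwo is that decomposing the ciphertext into bits and expanding the secret key by powers of two preserves inner products:
$$ \langle \mathrm{BitDecomp}(c), \mathrm{PowersOfTwo}(s) \rangle = \langle c, s \rangle \pmod{q}, $$
where $\mathrm{BitDecomp}(c)$ lists the bits of each entry of $c$ and $\mathrm{PowersOfTwo}(s)$ lists each entry of $s$ multiplied by $1, 2, 4, \dots$; since the left-hand vector has only $0/1$ entries, the noise growth of homomorphic multiplication stays under control.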

Leo Ducas and Daniele Micciancio: FHEW: Bootstrapping in less than a second.

Whilst HElib (see below) can bootstrap in about six minutes, amortized over many ciphertexts using SIMD operations, this work refreshes a single ciphertext, with no amortization, in 0.61 seconds on a 3GHz machine.

The key idea is to provide a new, cheap homomorphic NAND gate together with a simpler refreshing procedure that exploits the ring structure; the cost of the homomorphic NAND operation is relatively small. This is combined with the general bootstrapping approach based on the ring variant of the GSW scheme and the approach of Alperin-Sheriff and Peikert from last year, implementing mod-$q$ arithmetic "in the exponent". The decryption circuit itself is implemented as an accumulator, which is very well suited to the GSW scheme. To do reduction mod $q$, the group $(\mathbb{Z}_q,+)$ is embedded into the $q$-th roots of unity.

For sample parameters the authors have a bootstrapping key size of 1.3 GB. The main computational issue is computing fast Fourier transforms (for which they used the FFTW library).

Bootstrapping for HElib: Shai Halevi and Victor Shoup.

The talk started by discussing the three generations of FHE schemes (as typified by the original Gentry scheme, the BGV scheme and the GSW scheme). The slow noise growth in third generation schemes such as GSW is however incompatible  with ciphertext packing.  Thus in terms of efficiency there is a fork in current schemes. One either uses packing and large noise (as in BGV style schemes) or small noise and no packing (as in GSW style schemes). The purpose of this talk was to focus on schemes which support packing.

There are very few bootstrapping implementations, and none (up to now) utilize packing. The paper starts from the techniques of GHS12 (the "Better Bootstrapping" paper), which focuses on $p=2$ and $q=2^e+1$. The technique requires one to encrypt the secret key modulo the plaintext space $2^{e+1}$. One then needs to extract the most significant bit; this is the non-linear part of the procedure.

The problem in GHS12 was applying the procedure to packed ciphertexts. The idea is to first move the ciphertext coefficients into the plaintext slots (a linear operation), then apply the non-linear bit-extraction step, and finally reverse the linear step. Alperin-Sheriff and Peikert showed how to perform these linear steps more efficiently. This paper generalises the GHS12 technique to arbitrary $p$, not just $p=2$, and then specializes the AP linear techniques to make them more efficient.

Computationally the expensive part is the non-linear step; the hardest parts to implement, however, are the linear ones. The authors presented a technique to implement matrix-vector multiplication in a SIMD manner. This method combines rotations (slow, but adding little noise) with multiplications by constants (fast, but adding a lot of noise). To get the algebra to work they needed to impose constraints on $m$ and $p$ (the ring dimension and the plaintext modulus).
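
To illustrate how rotations and multiplications by constants combine into a matrix-vector product over packed slots, here is a plaintext sketch of the well-known "diagonal method" (the function names and the unencrypted lists standing in for ciphertext slots are my own; the actual HElib routines are considerably more involved):

```python
def rotate(v, k):
    """Cyclic rotation of the slot vector, standing in for a homomorphic rotation."""
    return v[k:] + v[:k]

def mat_vec_diagonal(A, v):
    """Compute A*v using only slot-wise products and rotations:
    A*v = sum_i diag_i(A) . rotate(v, i),
    where diag_i(A)[j] = A[j][(j + i) % n] is the i-th generalized diagonal."""
    n = len(v)
    result = [0] * n
    for i in range(n):
        diag = [A[j][(j + i) % n] for j in range(n)]   # "multiply by constant" vector
        rot = rotate(v, i)                             # "rotate the ciphertext slots"
        result = [r + d * x for r, d, x in zip(result, diag, rot)]
    return result

# Quick check against an ordinary matrix-vector product.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v = [1, 0, 2]
assert mat_vec_diagonal(A, v) == [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
```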

For small $p$, bootstrapping takes 12 levels; for larger $p$ this can go up to 24 (or even 48) levels. One can put extension field elements in the slots, not just integers. Single-thread performance ranges between 5 minutes and 5 hours. For AES-style parameters the authors have a depth-24 evaluation, of which 11 levels are consumed by bootstrapping, allowing enough for a round of AES to be done between recryptions. AES with bootstrapping thus allows much smaller parameters than were needed before. The new implementation without bootstrapping can evaluate 120 blocks in 245 seconds, whilst the method with bootstrapping can evaluate 180 blocks in 1050 seconds.

Thursday, April 23, 2015

A short history of lattice reduction in cryptography: from direct applications to FHE.

After the discovery of the LLL lattice reduction algorithm in 1982, it quickly became a handy tool for cryptanalysts. In particular, it was used as a weapon of choice against knapsack-based cryptosystems. More generally, at that time lattice reduction was mostly used to find linear relations between integers or vectors of integers. This is a very direct application, but it already showed the power of the LLL algorithm as a cryptographic tool. Interestingly, in order to analyze the performance of lattice reduction in this context, it was pretty standard to assume that a lattice reduction oracle was available and to prove that a given system could be broken with such an oracle. The underlying heuristic assumption was that, in practice, the behavior of LLL closely approximated the reduction oracle. This heuristic was motivated by the fact that, for the lattice dimensions that were tractable at the time, the output of LLL was usually good enough to make attacks developed in the reduction-oracle setting practical.

In 1996, the discovery of Coppersmith's small root algorithms was a major change that greatly renewed the interest of cryptanalysts in lattice reduction. After this breakthrough, the technique was used in a variety of cryptanalyses, allowing some special, speed-optimized constructions of RSA keys to be broken, or side-channel leakage of partial key information to be leveraged into full-fledged attacks. More surprisingly, it was also used constructively, as a tool to understand the security of RSA-OAEP. A very strong property of Coppersmith's small root algorithms is that it is no longer necessary to assume the existence of a reduction oracle: LLL itself can provably be used in the method.

Dropping the reduction oracle was very important progress. Indeed, around the same time, it became clear that lattice reduction remains a difficult problem. From a complexity-theoretic viewpoint, Ajtai showed that the problem of finding a shortest non-zero lattice vector is NP-hard under randomized reductions. From a constructive cryptographic viewpoint, the first cryptosystems based on the hardness of lattice-related problems started to appear. Since then, lattice-based cryptography has undergone steady progress. A major success of the field was Gentry's discovery of the first fully homomorphic encryption scheme, based on ideal lattices.

Friday, April 17, 2015

Case Study: Automated Detection of Organized Crime

Automated Detection of Organized Crime (ADOC)

Europol’s 2013 Organised Crime Threat Assessment (OCTA), published April 2013, reveals that organised crime is becoming increasingly diverse in its methods, group structures, and impact on society. A new criminal landscape is emerging, marked increasingly by highly mobile and flexible groups operating in multiple jurisdictions and criminal sectors.
Organised crime is a multi-billion euro business in Europe and it is growing. The expansion of Internet and mobile technologies, the proliferation of illicit trafficking routes, as well as opportunities offered by the global economic crisis, have all contributed to the development of a more potent threat from organised crime,
says Rob Wainwright, Director of Europol.

Tools supporting systematic environmental scanning for weak signals (searching, fusing and interpreting data from different sources such as databases of personal information and public web sites) already exist and are increasingly being used by intelligence services. Such systems typically have access to many large personal databases.

EU Data Protection

The most important instruments governing data privacy in Europe (which the HEAT architecture will fully comply with) include
- the Council of Europe (CoE) Convention on Human Rights and Fundamental Freedoms (ECHR) – namely art. 8;
- the Charter of Fundamental Rights of the European Union (CFREU) – namely art. 7 and 8;
- the CoE Convention No. 108 on the protection of individuals with regard to automatic processing
of personal data (Convention No. 108) and;
- the Recommendation No. R (87) 15 of the Committee of Ministers to member states regulating the use of personal data in the police sector.

On the one hand, disregarding these constraints not only exposes police investigators and other OC fighters to legal sanctions and understandable public protest, but may also jeopardise legal cases against members of OC, who often benefit from excellent legal assistance.
On the other hand, law enforcement agencies need the detection of weak OC signals resulting from environmental scanning in order to fight existing and emerging OC effectively.
The abovementioned legal instruments protecting data privacy therefore severely handicap investigators in their attempts to anticipate, detect and fight OC, which in turn threatens personal freedom: the protection of citizens' rights makes it more difficult to protect citizens' rights.

Solving the Problem with an FHE-based Architecture


The basic principle consists in using FHE/SHE to encrypt databases while enabling data aggregation in the Cloud for authorized users. Authorized users of the ADOC system are law enforcement entities (EU police forces, financial/tax authorities, etc.). A trusted authority (a.k.a. a "judge") owns a master private key sk and performs on-demand decryptions. The architecture supports the following high-level functionalities (a rough structural sketch follows the list):
Database enrollment. A user holding a number of databases containing personal data encrypts these databases under a system-wide FHE public key pk and publishes the encrypted databases in the Cloud.
Data aggregation. A user performs a data aggregation algorithm homomorphically, possibly across multiple encrypted databases, and obtains an encrypted result.
Authorized decryptions. The trusted authority is solicited by the user to decrypt the aggregation result. Prior to decryption, the user provides evidence that the performed aggregation is authorized, e.g. by providing a verifiable transcript of the computation. The trusted authority may provide the necessary material for the user to verify that the decryption was performed correctly.
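
To make the division of roles concrete, here is a minimal structural sketch (the class and method names are my own invention, not part of the HEAT architecture, and the cryptographic operations are stubbed out rather than implemented):

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class EncryptedDatabase:
    name: str
    ciphertexts: List[Any]          # rows encrypted under the system-wide FHE key pk

class DataOwner:
    def enroll(self, pk: Any, rows: List[Any], name: str) -> EncryptedDatabase:
        """Database enrollment: encrypt each row under pk and publish to the Cloud."""
        encrypt = lambda row: ("ct", pk, row)   # placeholder for real FHE encryption
        return EncryptedDatabase(name, [encrypt(r) for r in rows])

class Analyst:
    def aggregate(self, dbs: List[EncryptedDatabase],
                  query: Callable[[List[Any]], Any]):
        """Data aggregation: evaluate `query` homomorphically over encrypted rows,
        returning an encrypted result plus a transcript of what was computed."""
        transcript = {"databases": [db.name for db in dbs], "query": query.__name__}
        encrypted_result = query([ct for db in dbs for ct in db.ciphertexts])
        return encrypted_result, transcript

class TrustedAuthority:
    def __init__(self, sk: Any):
        self.sk = sk
    def authorized_decrypt(self, encrypted_result: Any, transcript: dict) -> Any:
        """Authorized decryption: check the transcript shows an authorized
        aggregation before decrypting with the master secret key sk."""
        if not self.is_authorized(transcript):
            raise PermissionError("aggregation not authorized")
        return ("pt", self.sk, encrypted_result)  # placeholder for real FHE decryption
    def is_authorized(self, transcript: dict) -> bool:
        return True  # policy check stub
```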


Tuesday, April 7, 2015

University of Luxembourg: Postdoc in Cryptography

The University of Luxembourg is looking for a Postdoc in Cryptography, with a fixed-term contract of 3 years.

You will work on a new project on Fully Homomorphic Encryption (FHE). The goal is to improve existing FHE schemes, and possibly design and implement new ones.

You should have a PhD in cryptography. Experience with FHE is a plus but not a necessity.

We offer a personal work space at the University, a highly competitive salary, and a dynamic and multicultural environment.

To apply: http://emea3.mrted.ly/n139

Please send your application online by May 15th, 2015. Applications will be considered on receipt; therefore applying before the deadline is encouraged.

Contact: Jean-Sebastien Coron
jean-sebastien.coron (at) uni.lu

Closing Date for Applications: 2015-05-15

Monday, April 6, 2015

The BGV scheme

The BGV scheme aims to achieve fully homomorphic encryption without bootstrapping. The key technical tool used in doing so is modulus switching, which is applied repeatedly in order to keep the noise magnitude roughly constant by switching to a smaller modulus, thus eliminating the need for bootstrapping.

The scheme is defined over a ring $R$: usually the ring of integers modulo an odd integer $q$ in the LWE instantiation, or a polynomial ring over the integers modulo both an irreducible polynomial $f(x)$ and an odd integer $q$ in the RLWE one. In the GHS variant, $f(x)$ is always chosen to be a cyclotomic polynomial $\Phi_{m}(x)$ for some $m$ coprime to $q$.

A ciphertext $c \in R^{n}$ is said to encrypt a message $m$ if it satisfies

$m = [[<c,s>]_{q}]_{2}$,

where $s$ is the secret key. In the RLWE version, this becomes $m=[[<c,s> \mod f(x)]_{q}]_{2}$. Here $[\cdot]_{q}$ denotes modular reduction into the interval $(-q/2, q/2]$.

$[<c,s>]_{q}$ is called the noise associated with $c$ under $s$, and $c$ decrypts correctly as long as the noise remains below $q/2$. For two ciphertexts each with noise of magnitude at most $B$, addition results in noise of magnitude at most $2B$, whereas multiplication roughly squares the noise.

Modulus switching allows the noise magnitude to be kept roughly constant, while the modulus size decreases with each switch, as does the remaining homomorphic capacity of the scheme. It allows us to go from a ciphertext $c$ with respect to a modulus $q$ to a ciphertext $c'$ with respect to a smaller modulus $p$ by simply scaling by $p/q$ and rounding appropriately. Most importantly, correctness is preserved:

$[<c,s>]_{q}=[<c',s>]_{p} \mod 2$.
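
More precisely (this is the standard formulation; the exact constant in the bound is glossed over), $c'$ is taken to be the integer vector closest to $(p/q)\,c$ subject to $c' \equiv c \pmod 2$, and then
$$ [<c',s>]_{p} = [<c,s>]_{q} \bmod 2, \qquad \big|[<c',s>]_{p}\big| \lesssim \frac{p}{q}\,\big|[<c,s>]_{q}\big| + \ell_{1}(s), $$
so the noise shrinks by roughly a factor $p/q$, up to a small additive term depending on the size of the secret key.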

This reduces the magnitude of the noise without having to resort to bootstrapping and without needing to know the secret key $s$. Note, however, that it only decreases the absolute magnitude of the noise; the noise-to-modulus ratio remains unchanged. To see why the absolute magnitude matters, suppose the modulus $q$ satisfies $q \approx x^{k}$ and we multiply modulus-$q$ ciphertexts with noise of magnitude $\approx x$. After only four levels of multiplication the noise magnitude reaches $x^{16}$, and another multiplication brings it to $x^{32}$. In this way the noise ceiling is reached after only about $\log k$ multiplications.

To allow for more multiplications, we take a ladder of gradually decreasing moduli $q_{0} > q_{1} > \dots > q_{L}$ such that $q_{i} \approx q/x^{i}$ (with $q$ as above and $L \approx k$). After each multiplication we switch down to the next modulus: multiplying two mod-$q_{0}$ ciphertexts of noise $x$ gives a ciphertext of noise $x^{2}$, and switching down to $q_{1} \approx q_{0}/x$ reduces the noise back down to $x$. This approach allows for roughly $k$ levels of homomorphic multiplication, as opposed to $\log k$.
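
The following toy calculation (tracking only exponents of $x$; nothing here is scheme-specific, and the numbers are illustrative) shows the gap between the two strategies:

```python
# Track noise and modulus as powers of a base x: noise = x**noise_exp, modulus = x**mod_exp.
k = 30  # modulus q ~ x**k

def levels_without_switching(k):
    """Each multiplication squares the noise; decryption fails once noise ~ modulus."""
    noise_exp, levels = 1, 0
    while 2 * noise_exp < k:        # the next multiplication is still decryptable
        noise_exp *= 2
        levels += 1
    return levels

def levels_with_switching(k):
    """After each multiplication, switch to a modulus smaller by a factor x,
    scaling the noise back down from x**2 to x."""
    mod_exp, levels = k, 0
    while mod_exp - 1 > 2:          # the next modulus must still exceed the x**2 noise
        mod_exp -= 1                # modulus ladder: q_i ~ q / x**i
        levels += 1                 # noise returns to ~x after every switch
    return levels

print(levels_without_switching(k))  # about log2(k) levels, e.g. 4 for k = 30
print(levels_with_switching(k))     # about k levels, e.g. 27 for k = 30
```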

Of course, once the last level $q_{L}$ is reached, modulus switching cannot be performed anymore. Bootstrapping is then used as an optimisation to allow for an indefinite number of homomorphic operations.

Another interesting optimisation proposed is batching. The idea is to pack multiple plaintexts into one ciphertext, so that homomorphic operations can be evaluated on multiple inputs efficiently. This relies on the Chinese Remainder Theorem: use the ring $R = \mathbb{Z}[x]/(f)$ for some irreducible polynomial $f \in \mathbb{Z}[x]$, and take plaintexts to be elements of $R_{p}=R/(p)$. If we pick the parameters carefully, $f(x)$ factors modulo $p$, which by the Chinese Remainder Theorem gives the isomorphism $R_{p} \cong R_{\mathfrak{p}_{1}} \times \dots \times R_{\mathfrak{p}_{l}}$, where each $\mathfrak{p}_{i}$ is an ideal of $R$ corresponding to a factor of $f(x) \bmod p$. Performing one operation in $R_{p}$ then means performing many operations in parallel, one in each factor $R_{\mathfrak{p}_{i}}$.
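
A tiny numerical illustration of the slot structure (the parameters $f(x) = x^{4}+1$ and $p=17$ are chosen purely for illustration): $f$ splits into linear factors modulo $p$, so each plaintext polynomial corresponds to four independent slots given by evaluation at the roots of $f$ modulo $p$, and one ring multiplication acts slot-wise.

```python
p = 17
roots = [r for r in range(p) if (pow(r, 4, p) + 1) % p == 0]   # roots of x^4 + 1 modulo 17
assert roots == [2, 8, 9, 15]

def poly_mul_mod(a, b):
    """Multiply two degree-<4 polynomials in Z_p[x]/(x^4 + 1), using x^4 = -1."""
    c = [0] * 8
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % p
    return [(c[i] - c[i + 4]) % p for i in range(4)]

def to_slots(a):
    """The slot map: evaluate the plaintext polynomial at each root of f modulo p."""
    return [sum(ai * pow(r, i, p) for i, ai in enumerate(a)) % p for r in roots]

a = [1, 2, 0, 5]   # 1 + 2x + 5x^3
b = [3, 0, 7, 1]   # 3 + 7x^2 + x^3
# One multiplication in R_p acts as four independent slot-wise multiplications.
assert to_slots(poly_mul_mod(a, b)) == [(x * y) % p for x, y in zip(to_slots(a), to_slots(b))]
```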