In a paper (https://eprint.iacr.org/2016/510), Jean-Claude Bajard, Julien Eynard, Anwar Hasan and Vincent Zucca explain how to perform the division and rounding steps occurring during decryption and homomorphic multiplication procedures of scale invariant schemes such as FV and YASHE without using any multi-precision library.
Such schemes, based on RLWE, usually lie in a power-of-two cyclotomic ring $\mathcal{R} = \mathbb{Z}[X]/(X^n+1)$ for efficiency reasons. In FV scheme, a ciphertext $\texttt{c} = (\texttt{c}_0, \texttt{c}_1)$ is a polynomial of degree 1 with coefficients in $\mathcal{R}_q = \mathbb{Z}_q[X]/(X^n+1)$. Each $\texttt{c}_i$ is then represented as a polynomial of degree $n-1$ with coefficients in $\mathbb{Z}_q$, $q$ is usually chosen as a product of several small prime moduli $q_i$. This allows to represent the integer coefficients of the $\texttt{c}_i$'s, through the Chinese Reminder Theorem (CRT), by their residues modulo the smaller $q_i$'s. However the use of the Residue Number System (a.k.a non-positional CRT representation) remains hardly compatible with procedures involving coefficient-wise division and rounding which require costly comparisons to be performed. Thus, until now, one had to invert the CRT representation and then use multi-precision arithmetic to perform these steps.
The techniques presented in this paper allow to perform these steps directly in RNS permiting to avoid a costly conversion between RNS and positional system and the use of multi-precision library. Compared with the FV-NFLlib implementation (https://github.com/CryptoExperts/FV-NFLlib) it enables speed-ups from a factor 5 to 20 for decryption and from 2 to 4 for homomorphic multiplication. It may also lead to future implementations on highly parallel accelerator cards such as GPU. This paper will be presented at SAC (http://www.engr.mun.ca/~sac2016/), in St John's Canada, in August, this blog post briefly presents it.
Decryption
A plaintext $\texttt{m}$ is an element of $\mathcal{R}_t = \mathbb{Z}_t[X]/(X^n+1)$, after being encrypted it becomes a ciphertext $\texttt{c} = (\texttt{c}_0,\texttt{c}_1)\in (\mathcal{R}_q)^2$ with $t < q$. To recover the message the decryption procedure consists in computing $[\lfloor t/q(\texttt{c}_0+\texttt{c}_1 \texttt{s}) \rceil]_t$ with $\texttt{s}$ the secret key.
There are two difficulties to perform this computation in RNS. First $\texttt{c}'=t(\texttt{c}_0+\texttt{c}_1 \texttt{s})$ lies in $\mathcal{R}_q$ and is represented in the "native" RNS base $\mathcal{B}_q = \{q_1,....,q_k\}$ with $q = q_1...q_k$. Thus in base $\mathcal{B}_q$, it is impossible to perform the division by $q$. Second computing a rounding is unnatural in a non-positional systeme like RNS.
To overcome the first difficulty we need to convert the elements in base $\mathcal{B}_q$ to another base coprime to $\mathcal{B}_q$. As we only need the result modulo $t$ we can use the single moduli base $\{ t \}$, as long as $t$ and $q$ are coprime (usually $t$ is a power of two while $q$ is odd). To achieve this conversion we use a fast base conversion which adds a multiple of $q$ called $q$-overflow. After the division by $q$ in the new base, $q$ is canceled and it only remains an error term $\tau$.
To solve the second difficulty, we choose to compute a flooring, using the equality $\lfloor \frac{x}{q}\rfloor = \frac{x-|x|_q}{q}$ instead of the rounding. The computation of the residue modulo $q$ is achieved by using a Montgomery reduction, which fits well in RNS. Of course, as flooring and rounding are not the same, an error may occur. In such case we simply consider this error as part of the previous one $\tau$.
At this step we need to cancel, the error $\tau$ due to the conversion and the flooring but also the inherent noise in the ciphertext to recover the message. To do this, we add an other moduli $\gamma$, coprime to $q$ to the base $\{t\}$, we multiply $\texttt{c}'$ by $\gamma$ in base $\mathcal{B}_q$ and we run the computations explained above in base $\{\gamma, t\}$ instead of $\{t\}$. Then, for a well-suited $\gamma$, we are able to cancel the noise in the residue modulo $t$ by subtracting the centered residue modulo $\gamma$ to it. Finally we are able to recover $[\texttt{m}]_t$.
To solve the second difficulty, we choose to compute a flooring, using the equality $\lfloor \frac{x}{q}\rfloor = \frac{x-|x|_q}{q}$ instead of the rounding. The computation of the residue modulo $q$ is achieved by using a Montgomery reduction, which fits well in RNS. Of course, as flooring and rounding are not the same, an error may occur. In such case we simply consider this error as part of the previous one $\tau$.
At this step we need to cancel, the error $\tau$ due to the conversion and the flooring but also the inherent noise in the ciphertext to recover the message. To do this, we add an other moduli $\gamma$, coprime to $q$ to the base $\{t\}$, we multiply $\texttt{c}'$ by $\gamma$ in base $\mathcal{B}_q$ and we run the computations explained above in base $\{\gamma, t\}$ instead of $\{t\}$. Then, for a well-suited $\gamma$, we are able to cancel the noise in the residue modulo $t$ by subtracting the centered residue modulo $\gamma$ to it. Finally we are able to recover $[\texttt{m}]_t$.
Homomorphic multiplication
To perform an homomorphic multiplication of two ciphertexts $\texttt{c}^{(1)}$ and $\texttt{c}^{(2)}$, you first have to scale the product $\texttt{c}^{(1)}\times\texttt{c}^{(2)}$ (which is a degree two polynomial with coefficients in $\mathcal{R}_q$) by $t/q$ and to round the result coefficient-wise as for the decryption. Then one has to relinearize this degree two polynomial to get back a degree one polynomial. But, to limit the noise growth during this step, it requires to decompose the coefficients of an element of $\mathcal{R}_q$ in some radix base $\omega$. Sadly, this is precisely the kind of thing we cannot do in a non-positional number representation.
One should understand that the product $\texttt{c}^{(1)}\times\texttt{c}^{(2)}$ should be lifted in $\mathbb{Z}$ and because we scale down by $t/q$ the coefficients of the product fits back in $\mathcal{R}_q$. Instead of making the product computation in $\mathbb{Z}$, we use a second RNS base $\mathcal{B}'$ coprime to $\mathcal{B}_q$ of same size to not loose any information.
Then we use a fast conversion to convert $\texttt{c}^{(1)}$ and $\texttt{c}^{(2)}$ from $\mathcal{B}_q$ to $\mathcal{B}'$. We manage to reduce the error in base $\mathcal{B}'$ (due to the $q$-overflow) in order to reduce its growth during the product and then we perform the product in both bases.
To perform an homomorphic multiplication of two ciphertexts $\texttt{c}^{(1)}$ and $\texttt{c}^{(2)}$, you first have to scale the product $\texttt{c}^{(1)}\times\texttt{c}^{(2)}$ (which is a degree two polynomial with coefficients in $\mathcal{R}_q$) by $t/q$ and to round the result coefficient-wise as for the decryption. Then one has to relinearize this degree two polynomial to get back a degree one polynomial. But, to limit the noise growth during this step, it requires to decompose the coefficients of an element of $\mathcal{R}_q$ in some radix base $\omega$. Sadly, this is precisely the kind of thing we cannot do in a non-positional number representation.
One should understand that the product $\texttt{c}^{(1)}\times\texttt{c}^{(2)}$ should be lifted in $\mathbb{Z}$ and because we scale down by $t/q$ the coefficients of the product fits back in $\mathcal{R}_q$. Instead of making the product computation in $\mathbb{Z}$, we use a second RNS base $\mathcal{B}'$ coprime to $\mathcal{B}_q$ of same size to not loose any information.
Then we use a fast conversion to convert $\texttt{c}^{(1)}$ and $\texttt{c}^{(2)}$ from $\mathcal{B}_q$ to $\mathcal{B}'$. We manage to reduce the error in base $\mathcal{B}'$ (due to the $q$-overflow) in order to reduce its growth during the product and then we perform the product in both bases.
Next, we use a fast conversion to convert the residues of the product in base $\mathcal{B}_q$ to base $\mathcal{B}'$. Then, as for decryption, we perform a flooring using Montgomery reduction instead of a rounding and we get $\lfloor t/q (\texttt{c}^{(1)}\times\texttt{c}^{(2)})\rfloor$ in base $\mathcal{B}'$. We consider the error due to the different conversions as part of the noise.
For the next step, we convert the result back in base $\mathcal{B}_q$, but this time using an exact base conversion (more costly), because we cannot tolerate overflows from the base $\mathcal{B}'$. To overcome the last difficulty we have managed to decompose the elements belonging to $\mathcal{R}_q$ in a very similar way from the original, however we can perform this one in RNS. The end of the homomorphic multiplication procedure remains the same, we just replace the original decomposition function by our own.
Impact on the noise
Of course, using all this tricks add some errors which become part of the noise and thus make it bigger. One could wonder if, and how, our multiplicative depth has been affected. Indeed our methods modify the different bounds on the noise for the homomorphic multiplication and also the decryption bound. Nevertheless, by running an analysis with our new bounds, we have shown that our multiplicative depth remains unchanged from the one of the original scheme, for all the sets of parameters but one.