## Costs of arithmetic over various rings.

In the field of Real numbers, we can evaluate a rational function (in fact, any algebraic function) to n bits in O(n log n log log n) time. In the ring Z_{2^n}, we can likewise evaluate a rational function in O(n log n log log n) time. In the field Z_p, where p is an n-bit prime, however, the best we can do is O(n (log n)^2 log log n) -- slower by a factor of log n. Why?

The obvious answer is that the multiplicative inverse -- the hard step in computing a rational function -- can be computed via a Newton iteration (also known as p-adic lifting and Hensel lifting) over the Real numbers and Z_{2^n}, whereas inversion modulo a prime requires use of a GCD algorithm in some form. Inversion modulo a large prime, rational reconstruction (over the Real numbers or over Z_m), and computing elementary functions are all problems which "live" in the domain of continued fractions instead of polynomials, and they pay the price of an extra factor of log n for that generality.

I'm not altogether satisfied with this explanation; I get the feeling that there must be something more subtle involved. In the case of computing algebraic functions, note that this is easy over the Real numbers but hard modulo a large composite -- in fact, RSA depends upon this -- but of greater significance, even determining if a root exists is hard. The same applies somewhat to inversion: You can determine if an inverse exists over the Real numbers trivially (is the value non-zero?), over Z_{2^n} trivially (is the value odd?), but over Z_m you need to determine relative primality. That said, determining if an inverse exists modulo a prime is also trivial, so this is not merely a case of "solving a problem is at least as hard as determining if a solution exists", but my intuition suggests that there's something along those lines involved.

While this is a rather indistinct train of thought, it comes out of a very concrete problem. I'm looking at algorithms for arithmetic of RSA-sized values as part of my work towards a side channel free cryptographic library, and there's a curious dilemma: Using the FFT, arithmetic modulo 2^n + 1 is twice as fast as arithmetic modulo 2^n, so I'd prefer to use the first ring when performing Montgomery multiplication; but the initial setup requires computing an inverse over the ring, which is much faster for the second ring. I think the best approach might be to work with one ring for private key operations and the other for public key operations, due to the difference in the relative importance of overhead costs.