| 1 | Multiprecision maths library |
| 2 | |
| 3 | |
| 4 | Catacomb comes with a fairly reasonable multiprecision maths |
| 5 | library. It's not stunningly fast, and doesn't do everything, |
| 6 | but it seems good for the sorts of things needed in |
| 7 | cryptographic algorithms (which, for example, GMP isn't[1]). |
| 8 | |
| 9 | There are basically three layers to the system: |
| 10 | |
| 11 | * The low-level `mpx' layer, which implements some basic |
| 12 | arithmetic but is fairly unpleasant to actually use. |
| 13 | |
| 14 | * A high-level `mp' layer, which provides a useful `mp' |
| 15 | multiprecision type and implements useful arithmetic on |
| 16 | them. |
| 17 | |
| 18 | * Everything else. A collection of miscellaneous routines for |
| 19 | performing particular operations efficiently or simply. |
| 20 | |
| 21 | [1] In particular, GMP is truly rotten at converting numbers |
| 22 | into well-defined binary formats. It has its own format, |
| 23 | but it's not defined in the manual, has changed once before, |
| 24 | probably doesn't conform to any standards in the |
| 25 | cryptographic community, and can only be written to or read |
| 26 | from a file stream, so you can't use it in memory (for |
| 27 | example to encrypt it). Basically, it's a right pain. |
| 28 | |
| 29 | |
| 30 | The low-level interface |
| 31 | |
| 32 | The low level is described in `mpx.h'. It includes everything |
| 33 | else it needs. |
| 34 | |
| 35 | An integer is represented in an array of words. Each word has |
| 36 | type `mpw' (multi precision word). The least significant word |
| 37 | comes first. An array is described by its `base' -- the address |
| 38 | of the first element -- and its `limit' -- the address *just |
| 39 | after* the last element. Thus, the length of the array is |
| 40 | `limit - base'. This representation is particularly convenient |
| 41 | for many things. An old version used base and length, which was |
| 42 | a nuisance. |
| 43 | |
| 44 | The largest value which can be represented in a word is |
| 45 | MPW_MAX. The number of bits used in the representation of a |
| 46 | word is MPW_BITS. Actually, you might be able to use more bits |
| 47 | and store larger numbers, but everything will go wrong if you |
| 48 | try. |
| 49 | |
| 50 | The macro call MPW(x) returns the low MPW_BITS bits of x. It's |
| 51 | very useful in low-level arithmetic. |
| 52 | |
| 53 | The call MPWS(n) tells you the `sizeof' an array of n words. |
| 54 | It's also very useful. Try not to get MPW and MPWS confused |
| 55 | because they mean very different things. |
| 56 | |
| 57 | The actual low-level macros and functions are documented |
| 58 | relatively well. They are given source and destination arrays |
| 59 | for things to be read and written to. Usually the sources |
| 60 | mustn't overlap the destinations, and destinations certainly |
| 61 | mustn't overlap other destinations; sometimes this restriction |
| 62 | is lifted, where it would be both convenient and easy to do so. |
| 63 | Sources can overlap each other without problems, because they're |
| 64 | not written to. |
| 65 | |
| 66 | The difficult parts are the Karatsuba functions. Karatsuba |
| 67 | and Ofman discovered a neat recursive way to multiply large |
| 68 | numbers. It requires quite a bit of workspace, and adds some |
| 69 | overhead, but has lower asymptotic complexity than the standard |
| 70 | multiply algorithm. Usually, Karatsuba functions get used |
| 71 | automatically by the rest of the library when appropriate and |
| 72 | you don't have to care. |
| 73 | |
| 74 | |
| 75 | The high-level interface |
| 76 | |
| 77 | The high-level invents a type, `mp'. It's a structured type, |
| 78 | with the following members: |
| 79 | |
| 80 | mpw *v, *vl Base and limit of the word array. |
| 81 | size_t sz Number of words actually allocated to the array. |
| 82 | unsigned f Various flags. |
| 83 | unsigned ref Number of references to the integer. |
| 84 | |
| 85 | The flags are interesting. Here's what they mean: |
| 86 | |
| 87 | MP_NEG The number is negative |
| 88 | MP_BURN Burn the number after using it |
| 89 | MP_CONST The number mustn't be modified or freed |
| 90 | MP_UNDEF The number contains no interesting value |
| 91 | MP_DESTROYED The number has been freed once already |
| 92 | |
| 93 | The last one is handy for catching bugs. MP_NEG is obvious -- |
| 94 | Catacomb uses a signed-magnitude representation because things |
| 95 | are usually easier that way. MP_CONST is set on various |
| 96 | constants provided for general convenience so they don't get |
| 97 | destroyed by accident. MP_UNDEF is used to avoid copying |
| 98 | useless data. |
| 99 | |
| 100 | MP_BURN marks a `secret' number. Secret numbers are overwritten |
| 101 | with zeroes (`burnt') when they're freed. Secrecy is a viral |
| 102 | concept: anything calculated from a secret number is also |
| 103 | secret, so they get burnt too. |
| 104 | |
| 105 | Most of the time, you refer to numbers by their addresses. Lots |
| 106 | of `mp *' pointers get used. You hardly ever see a real `mp', |
| 107 | just pointers to them. |
| 108 | |
| 109 | There are a collection of utility functions for keeping the |
| 110 | system running -- for creating new numbers without interesting |
| 111 | values, for throwing old ones away, for extending the amount of |
| 112 | memory they use, and so on. |
| 113 | |
| 114 | Throwing numbers away is simple. You pass the pointer to the |
| 115 | number to `mp_drop'. If someone else still has a reference to |
| 116 | the number, it stays where it is, but remembers that you're not |
| 117 | interested in it any more. If nobody's interested then the |
| 118 | number is disposed of. |
| 119 | |
| 120 | You can add a reference using MP_COPY. It returns the new |
| 121 | reference. The new reference is the same as the old one, but |
| 122 | it's a useful fiction to pretend that it's different. |
| 123 | |
| 124 | Real arithmetic happens in a fairly uniform way. There's an |
| 125 | important point to make about `destinations', though. Just |
| 126 | about all of the arithmetic functions take `destination' |
| 127 | arguments (one for each result), with the idea that, at least |
| 128 | conceptually, they store the results there. What actually |
| 129 | happens is that a reference is lost to whatever the destination |
| 130 | used to refer to, and a reference to the new result is gained. |
| 131 | The address might be different, and then again it might not. |
| 132 | Destinations can be the same as sources -- that works fine. |
| 133 | Finally, there's the magic value `MP_NEW'. MP_NEW is a special |
| 134 | constant which, as a destination, means `create a new place and |
| 135 | put the answer there'. |
| 136 | |
| 137 | The idea is that, sometimes it helps that memory is already |
| 138 | allocated for a destination, so it doesn't have to be obtained |
| 139 | specially, and, even more often, you want to throw away old |
| 140 | intermediate results while you're calculating new ones. |
| 141 | |
| 142 | Here are some examples of how to do it right: |
| 143 | |
| 144 | x = mp_add(MP_NEW, y, z); |
| 145 | Assumes x didn't have a useful value before. |
| 146 | Afterwards it refers to the sum of y and z. y and z |
| 147 | themselves are unchanged. |
| 148 | |
| 149 | y = mp_add(y, y, z); |
| 150 | Add z to y. Now y is the sum of z and the old y. z is |
| 151 | unchanged. |
| 152 | |
| 153 | q = MP_NEW; |
| 154 | mp_div(&q, &x, x, y); |
| 155 | Sets q (a new value) equal to x / y, and puts the |
| 156 | remainder back in x. y is unchanged. |
| 157 | |
| 158 | z = mp_sub(z, y, z); |
| 159 | Subtract z from y, putting the result back in z. y is |
| 160 | unchanged. |
| 161 | |
| 162 | x = mp_mul(y, y, z); |
| 163 | Multiply y by z, putting the result in x, but making y |
| 164 | no longer useful. |
| 165 | |
| 166 | Here are some examples of how to do it wrong: |
| 167 | |
| 168 | mp_mul(y, y, z); |
| 169 | mp_mul might choose somewhere other than y's storage to |
| 170 | write its result. (In fact, the current version |
| 171 | certainly will.) So y is trashed because there are no |
| 172 | references to it any more, and the result of the |
| 173 | multiplication is lost. |
| 174 | |
| 175 | x = mp_mul(y, y, z); |
| 176 | x = mp_add(x, x, y); |
| 177 | Whoops. We stored the result in x, but y still got |
| 178 | trashed and might contain anything. Adding it to x is a |
| 179 | bad move. |
| 180 | |
| 181 | It's not hard once you get the hang of it, although it might |
| 182 | feel a bit weird to start with. |
| 183 | |
| 184 | |
| 185 | Modular multiplication and exponentiation |
| 186 | |
| 187 | Lots of public-key crypto uses modular multiplication and |
| 188 | exponentiation operations. Doing them efficiently is very |
| 189 | important. Multiplication on its own is easy, and Catacomb's |
| 190 | multiplication algorithms are fairly good (though not up with |
| 191 | the very best). Doing the modular reduction afterwards is the |
| 192 | tricky bit. |
| 193 | |
| 194 | There are three approaches to modular reduction in Catacomb. |
| 195 | All of them have their good and bad points. |
| 196 | |
| 197 | 1. Use mp_div |
| 198 | |
| 199 | For a one-off calculation, this is a good idea. Division is a |
| 200 | slow operation -- that's a shame, but life's like that. But |
| 201 | when you're doing a one-shot operation, that's the best you can |
| 202 | manage. |
| 203 | |
| 204 | 2. Use Barrett reduction |
| 205 | |
| 206 | Barrett reduction is a system whereby you do a little |
| 207 | precomputation up-front (one division, in fact) and are then |
| 208 | able to perform modular reductions without resorting to |
| 209 | expensive division operations. To use it, you initialize a |
| 210 | `context' with the modulus, reduce some numbers, and then |
| 211 | destroy the context when you've finished. |
| 212 | |
| 213 | Conceptually, Barrett reduction is simple. You give it x and it |
| 214 | gives you back x mod m. The mathematics behind it is fairly |
| 215 | clever, though -- the real manual has a page of explanation for |
| 216 | how it works. |
| 217 | |
| 218 | Barrett reduction works on any positive modulus m. It will do a |
| 219 | modular reduction on any integer x which is at most twice as |
| 220 | long as m (in words). x can be larger than m^2, but it mustn't |
| 221 | be more than twice as long. |
| 222 | |
| 223 | 3. Use Montgomery reduction |
| 224 | |
| 225 | Montgomery reduction is a little more clever. It does a little |
| 226 | more work up-front than Barrett reduction, and understanding the |
| 227 | benefits is a little harder. However, it's actually slightly |
| 228 | more efficient for general use. There's also a disadvantage |
| 229 | that Montgomery reduction only works when the modulus is *odd*. |
| 230 | It all goes horribly wrong with an even modulus. Don't try it. |
| 231 | |
| 232 | I'm not going to explain how Montgomery reduction actually |
| 233 | works. But I am going to explain how to use it. |
| 234 | |
| 235 | The precomputation for Montgomery reduction, performed when you |
| 236 | initialize a context, computes a value called R, and also R^2 |
| 237 | (both mod m). (It also computes some other stuff, but that's |
| 238 | not important for this discussion.) Note that R is dependent on |
| 239 | the modulus m. |
| 240 | |
| 241 | A Montgomery reduction takes a number x, which should be less |
| 242 | than m^2 (it can actually be a bit larger) and computes the |
| 243 | value x R^{-1} mod m. That doesn't sound very useful, does it? |
| 244 | |
| 245 | Things start looking more hopeful when you multiply your inputs |
| 246 | by R. (There's a clever way of doing that: see below.) To |
| 247 | compute xy mod m, you can first calculate xR and yR, multiply |
| 248 | them together to get xyR^2, and do a Montgomery reduction to get |
| 249 | xyR^2 R^{-1} mod m. Then the R^{-1} cancels one of the Rs and |
| 250 | you end up with xyR mod m. Essentially, you do your entire |
| 251 | calculation with an extra factor of R all the way through it. |
| 252 | |
| 253 | Removing the factor of R at the end is easy. You just run the |
| 254 | Montgomery reduction algorithm again -- the R^{-1} cancels the R |
| 255 | and the answer pops out the end. Getting the factor of R in the |
| 256 | first place is easy -- you multiply by R^2 and do a reduction. |
| 257 | |
| 258 | The multiply-and-reduce thing is so common that there's a |
| 259 | dedicated function which does them both in one go. |
| 260 | |
| 261 | Here's a simple example of multiplying two numbers mod m: |
| 262 | |
| 263 | mpmont mm; |
| 264 | mp *ar, *br, *p; |
| 265 | mpmont_create(&mm, m); |
| 266 | ar = mpmont_mul(MP_NEW, a, m.r2); /* aR mod m */ |
| 267 | br = mpmont_mul(MP_NEW, b, m.r2); /* bR mod m */ |
| 268 | p = mpmont_mul(&mm, MP_NEW, ar, br); /* abR mod m */ |
| 269 | p = mpmont_reduce(&mm, p, p); /* ab mod m */ |
| 270 | mpmont_destroy(&mm); |
| 271 | mp_drop(ar); |
| 272 | mp_drop(br); |
| 273 | |
| 274 | Actually, it's not a very clever example. Here's a better way, |
| 275 | for such a simple calculation: |
| 276 | |
| 277 | mpmont mm; |
| 278 | mp *p; |
| 279 | mpmont_create(&mm, m); |
| 280 | p = mpmont_mul(&mm, MP_NEW, a, b); /* abR^{-1} mod m */ |
| 281 | p = mpmont_mul(&mm, p, p, mm.r2); /* ab mod m */ |
| 282 | mpmont_destroy(&mm); |
| 283 | |
| 284 | Of course, for a single multiply-and-reduce, you're better off |
| 285 | just using |
| 286 | |
| 287 | mp *p = mp_mul(MP_NEW, a, b); |
| 288 | mp_div(0, &p, p, m); |
| 289 | |
| 290 | the traditional division method. |
| 291 | |
| 292 | Both Barrett and Montgomery reduction methods have modular |
| 293 | exponentiation functions. If m is a message, n is an RSA |
| 294 | modulus, and e is the public exponent, I can do an RSA |
| 295 | encryption simply by doing |
| 296 | |
| 297 | mpmont mm; |
| 298 | mpmont_create(&mm, n); /* RSA modulus is odd */ |
| 299 | c = mpmont_exp(&mm, MP_NEW, m, e); |
| 300 | mpmont_destroy(&mm); |
| 301 | |
| 302 | (Yes, it's well worth creating a Montgomery context for a |
| 303 | full-blown exponentiation, unless someone's deliberately gone |
| 304 | out of their way to make it easy for you by choosing a very |
| 305 | small public exponent.) |
| 306 | |
| 307 | RSA decryption is fully covered by the library, because doing it |
| 308 | properly is complicated and difficult since it involves playing |
| 309 | with blinding factors and the Chinese Remainder Theorem. |
| 310 | |
| 311 | (There's also a useful RSA parameter recovery system which can |
| 312 | determine all the useful bits of information for efficient RSA |
| 313 | decryption given a very small amount of information. For |
| 314 | example, given the modulus and both exponents, it can recover |
| 315 | the two secret factors. This is, by the way, a good reason not |
| 316 | to share a modulus among many users.) |
| 317 | |
| 318 | |
| 319 | Finding prime numbers |
| 320 | |
| 321 | Prime numbers are important too. Finding them is not ever-so |
| 322 | easy. There's a huge variety of useful properties which are |
| 323 | needed, and it's basically impossible to cover everything. |
| 324 | |
| 325 | Catacomb has two useful facilities. There's a fast sequential- |
| 326 | search filtering system called `pfilt', and a good (but |
| 327 | probabilistic) primality tester which implements the Rabin- |
| 328 | Miller test. |
| 329 | |
| 330 | Over the top of this is a configurable plug-in-appropriate-bits |
| 331 | system called `pgen' which ties everything together. You're |
| 332 | much better off using `pgen' than grovelling about with the |
| 333 | filter and Rabin-Miller tests by hand. The low-level details |
| 334 | are much better used to create new `pgen' components. |
| 335 | |
| 336 | |
| 337 | Conclusion |
| 338 | |
| 339 | This basically concludes the tour of Catacomb's multiprecision |
| 340 | maths library. I hope it's provided enough background that the |
| 341 | comments start making sense. |
| 342 | |
| 343 | -- [mdw] |
| 344 | |
| 345 | \f |
| 346 | Local variables: |
| 347 | mode: text |
| 348 | End: |