759513d1 |
1 | Multiprecision maths library |
2 | |
3 | |
4 | Catacomb comes with a fairly reasonable multiprecision maths |
5 | library. It's not stunningly fast, and doesn't do everything, |
6 | but it seems good for the sorts of things needed in |
7 | cryptographic algorithms (which, for example, GMP isn't[1]). |
8 | |
9 | There are basically three layers to the system: |
10 | |
11 | * The low-level `mpx' layer, which implements some basic |
12 | arithmetic but is fairly unpleasant to actually use. |
13 | |
14 | * A high-level `mp' layer, which provides a useful `mp' |
15 | multiprecision type and implements useful arithmetic on |
16 | them. |
17 | |
18 | * Everything else. A collection of miscellaneous routines for |
19 | performing particular operations efficiently or simply. |
20 | |
21 | [1] In particular, GMP is truly rotten at converting numbers |
22 | into well-defined binary formats. It has its own format, |
23 | but it's not defined in the manual, has changed once before, |
24 | probably doesn't conform to any standards in the |
25 | cryptographic community, and can only be written to or read |
26 | from a file stream, so you can't use it in memory (for |
27 | example to encrypt it). Basically, it's a right pain. |
28 | |
29 | |
30 | The low-level interface |
31 | |
32 | The low level is described in `mpx.h'. It includes everything |
33 | else it needs. |
34 | |
35 | An integer is represented in an array of words. Each word has |
36 | type `mpw' (multi precision word). The least significant word |
37 | comes first. An array is described by its `base' -- the address |
38 | of the first element -- and its `limit' -- the address *just |
39 | after* the last element. Thus, the length of the array is |
40 | `limit - base'. This representation is particularly convenient |
41 | for many things. An old version used base and length, which was |
42 | a nuisance. |
43 | |
44 | The largest value which can be represented in a word is |
45 | MPW_MAX. The number of bits used in the representation of a |
46 | word is MPW_BITS. Actually, you might be able to use more bits |
47 | and store larger numbers, but everything will go wrong if you |
48 | try. |
49 | |
50 | The macro call MPW(x) returns the low MPW_BITS bits of x. It's |
51 | very useful in low-level arithmetic. |
52 | |
53 | The call MPWS(n) tells you the `sizeof' an array of n words. |
54 | It's also very useful. Try not to get MPW and MPWS confused |
55 | because they mean very different things. |
56 | |
57 | The actual low-level macros and functions are documented |
58 | relatively well. They are given source and destination arrays |
59 | for things to be read and written to. Usually the sources |
60 | mustn't overlap the destinations, and destinations certainly |
61 | mustn't overlap other destinations; sometimes this restriction |
62 | is lifted, where it would be both convenient and easy to do so. |
63 | Sources can overlap each other without problems, because they're |
64 | not written to. |
65 | |
66 | The difficult parts are the Karatsuba functions. Karatsuba |
67 | and Ofman discovered a neat recursive way to multiply large |
68 | numbers. It requires quite a bit of workspace, and adds some |
69 | overhead, but has lower asymptotic complexity than the standard |
70 | multiply algorithm. Usually, Karatsuba functions get used |
71 | automatically by the rest of the library when appropriate and |
72 | you don't have to care. |
73 | |
74 | |
75 | The high-level interface |
76 | |
77 | The high-level invents a type, `mp'. It's a structured type, |
78 | with the following members: |
79 | |
80 | mpw *v, *vl Base and limit of the word array. |
81 | size_t sz Number of words actually allocated to the array. |
82 | unsigned f Various flags. |
83 | unsigned ref Number of references to the integer. |
84 | |
85 | The flags are interesting. Here's what they mean: |
86 | |
87 | MP_NEG The number is negative |
88 | MP_BURN Burn the number after using it |
89 | MP_CONST The number mustn't be modified or freed |
90 | MP_UNDEF The number contains no interesting value |
91 | MP_DESTROYED The number has been freed once already |
92 | |
93 | The last one is handy for catching bugs. MP_NEG is obvious -- |
94 | Catacomb uses a signed-magnitude representation because things |
95 | are usually easier that way. MP_CONST is set on various |
96 | constants provided for general convenience so they don't get |
97 | destroyed by accident. MP_UNDEF is used to avoid copying |
98 | useless data. |
99 | |
100 | MP_BURN marks a `secret' number. Secret numbers are overwritten |
101 | with zeroes (`burnt') when they're freed. Secrecy is a viral |
102 | concept: anything calculated from a secret number is also |
103 | secret, so they get burnt too. |
104 | |
105 | Most of the time, you refer to numbers by their addresses. Lots |
106 | of `mp *' pointers get used. You hardly ever see a real `mp', |
107 | just pointers to them. |
108 | |
109 | There are a collection of utility functions for keeping the |
110 | system running -- for creating new numbers without interesting |
111 | values, for throwing old ones away, for extending the amount of |
112 | memory they use, and so on. |
113 | |
114 | Throwing numbers away is simple. You pass the pointer to the |
115 | number to `mp_drop'. If someone else still has a reference to |
116 | the number, it stays where it is, but remembers that you're not |
117 | interested in it any more. If nobody's interested then the |
118 | number is disposed of. |
119 | |
120 | You can add a reference using MP_COPY. It returns the new |
121 | reference. The new reference is the same as the old one, but |
122 | it's a useful fiction to pretend that it's different. |
123 | |
124 | Real arithmetic happens in a fairly uniform way. There's an |
125 | important point to make about `destinations', though. Just |
126 | about all of the arithmetic functions take `destination' |
127 | arguments (one for each result), with the idea that, at least |
128 | conceptually, they store the results there. What actually |
129 | happens is that a reference is lost to whatever the destination |
130 | used to refer to, and a reference to the new result is gained. |
131 | The address might be different, and then again it might not. |
4a2da2d1 |
132 | Destinations can be the same as sources -- that works fine. |
759513d1 |
133 | Finally, there's the magic value `MP_NEW'. MP_NEW is a special |
134 | constant which, as a destination, means `create a new place and |
135 | put the answer there'. |
136 | |
137 | The idea is that, sometimes it helps that memory is already |
138 | allocated for a destination, so it doesn't have to be obtained |
139 | specially, and, even more often, you want to throw away old |
140 | intermediate results while you're calculating new ones. |
141 | |
142 | Here are some examples of how to do it right: |
143 | |
144 | x = mp_add(MP_NEW, y, z); |
145 | Assumes x didn't have a useful value before. |
146 | Afterwards it refers to the sum of y and z. y and z |
147 | themselves are unchanged. |
148 | |
149 | y = mp_add(y, y, z); |
150 | Add z to y. Now y is the sum of z and the old y. z is |
151 | unchanged. |
152 | |
153 | q = MP_NEW; |
154 | mp_div(&q, &x, x, y); |
155 | Sets q (a new value) equal to x / y, and puts the |
156 | remainder back in x. y is unchanged. |
157 | |
158 | z = mp_sub(z, y, z); |
159 | Subtract z from y, putting the result back in z. y is |
160 | unchanged. |
161 | |
162 | x = mp_mul(y, y, z); |
163 | Multiply y by z, putting the result in x, but making y |
164 | no longer useful. |
165 | |
4a2da2d1 |
166 | Here are some examples of how to do it wrong: |
759513d1 |
167 | |
168 | mp_mul(y, y, z); |
4a2da2d1 |
169 | mp_mul might choose somewhere other than y's storage to |
759513d1 |
170 | write its result. (In fact, the current version |
171 | certainly will.) So y is trashed because there are no |
172 | references to it any more, and the result of the |
173 | multiplication is lost. |
174 | |
175 | x = mp_mul(y, y, z); |
176 | x = mp_add(x, x, y); |
177 | Whoops. We stored the result in x, but y still got |
178 | trashed and might contain anything. Adding it to x is a |
179 | bad move. |
180 | |
181 | It's not hard once you get the hang of it, although it might |
182 | feel a bit weird to start with. |
183 | |
184 | |
185 | Modular multiplication and exponentiation |
186 | |
187 | Lots of public-key crypto uses modular multiplication and |
6540d069 |
188 | exponentiation operations. Doing them efficiently is very |
759513d1 |
189 | important. Multiplication on its own is easy, and Catacomb's |
190 | multiplication algorithms are fairly good (though not up with |
191 | the very best). Doing the modular reduction afterwards is the |
192 | tricky bit. |
193 | |
194 | There are three approaches to modular reduction in Catacomb. |
195 | All of them have their good and bad points. |
196 | |
197 | 1. Use mp_div |
198 | |
199 | For a one-off calculation, this is a good idea. Division is a |
200 | slow operation -- that's a shame, but life's like that. But |
201 | when you're doing a one-shot operation, that's the best you can |
202 | manage. |
203 | |
204 | 2. Use Barrett reduction |
205 | |
206 | Barrett reduction is a system whereby you do a little |
207 | precomputation up-front (one division, in fact) and are then |
208 | able to perform modular reductions without resorting to |
209 | expensive division operations. To use it, you initialize a |
210 | `context' with the modulus, reduce some numbers, and then |
211 | destroy the context when you've finished. |
212 | |
213 | Conceptually, Barrett reduction is simple. You give it x and it |
214 | gives you back x mod m. The mathematics behind it is fairly |
215 | clever, though -- the real manual has a page of explanation for |
216 | how it works. |
217 | |
218 | Barrett reduction works on any positive modulus m. It will do a |
219 | modular reduction on any integer x which is at most twice as |
220 | long as m (in words). x can be larger than m^2, but it mustn't |
221 | be more than twice as long. |
222 | |
223 | 3. Use Montgomery reduction |
224 | |
225 | Montgomery reduction is a little more clever. It does a little |
226 | more work up-front than Barrett reduction, and understanding the |
227 | benefits is a little harder. However, it's actually slightly |
228 | more efficient for general use. There's also a disadvantage |
229 | that Montgomery reduction only works when the modulus is *odd*. |
230 | It all goes horribly wrong with an even modulus. Don't try it. |
231 | |
232 | I'm not going to explain how Montgomery reduction actually |
233 | works. But I am going to explain how to use it. |
234 | |
235 | The precomputation for Montgomery reduction, performed when you |
236 | initialize a context, computes a value called R, and also R^2 |
237 | (both mod m). (It also computes some other stuff, but that's |
238 | not important for this discussion.) Note that R is dependent on |
239 | the modulus m. |
240 | |
241 | A Montgomery reduction takes a number x, which should be less |
242 | than m^2 (it can actually be a bit larger) and computes the |
243 | value x R^{-1} mod m. That doesn't sound very useful, does it? |
244 | |
245 | Things start looking more hopeful when you multiply your inputs |
45c0fd36 |
246 | by R. (There's a clever way of doing that: see below.) To |
759513d1 |
247 | compute xy mod m, you can first calculate xR and yR, multiply |
248 | them together to get xyR^2, and do a Montgomery reduction to get |
249 | xyR^2 R^{-1} mod m. Then the R^{-1} cancels one of the Rs and |
250 | you end up with xyR mod m. Essentially, you do your entire |
251 | calculation with an extra factor of R all the way through it. |
252 | |
253 | Removing the factor of R at the end is easy. You just run the |
254 | Montgomery reduction algorithm again -- the R^{-1} cancels the R |
255 | and the answer pops out the end. Getting the factor of R in the |
256 | first place is easy -- you multiply by R^2 and do a reduction. |
257 | |
258 | The multiply-and-reduce thing is so common that there's a |
259 | dedicated function which does them both in one go. |
260 | |
261 | Here's a simple example of multiplying two numbers mod m: |
262 | |
263 | mpmont mm; |
264 | mp *ar, *br, *p; |
265 | mpmont_create(&mm, m); |
4a2da2d1 |
266 | ar = mpmont_mul(MP_NEW, a, m.r2); /* aR mod m */ |
267 | br = mpmont_mul(MP_NEW, b, m.r2); /* bR mod m */ |
759513d1 |
268 | p = mpmont_mul(&mm, MP_NEW, ar, br); /* abR mod m */ |
269 | p = mpmont_reduce(&mm, p, p); /* ab mod m */ |
270 | mpmont_destroy(&mm); |
271 | mp_drop(ar); |
272 | mp_drop(br); |
273 | |
274 | Actually, it's not a very clever example. Here's a better way, |
275 | for such a simple calculation: |
276 | |
277 | mpmont mm; |
278 | mp *p; |
279 | mpmont_create(&mm, m); |
280 | p = mpmont_mul(&mm, MP_NEW, a, b); /* abR^{-1} mod m */ |
281 | p = mpmont_mul(&mm, p, p, mm.r2); /* ab mod m */ |
282 | mpmont_destroy(&mm); |
283 | |
284 | Of course, for a single multiply-and-reduce, you're better off |
285 | just using |
286 | |
287 | mp *p = mp_mul(MP_NEW, a, b); |
288 | mp_div(0, &p, p, m); |
289 | |
290 | the traditional division method. |
291 | |
292 | Both Barrett and Montgomery reduction methods have modular |
293 | exponentiation functions. If m is a message, n is an RSA |
294 | modulus, and e is the public exponent, I can do an RSA |
295 | encryption simply by doing |
296 | |
297 | mpmont mm; |
298 | mpmont_create(&mm, n); /* RSA modulus is odd */ |
299 | c = mpmont_exp(&mm, MP_NEW, m, e); |
300 | mpmont_destroy(&mm); |
301 | |
302 | (Yes, it's well worth creating a Montgomery context for a |
303 | full-blown exponentiation, unless someone's deliberately gone |
304 | out of their way to make it easy for you by choosing a very |
305 | small public exponent.) |
306 | |
0ca05eea |
307 | RSA decryption is fully covered by the library, because doing it |
308 | properly is complicated and difficult since it involves playing |
309 | with blinding factors and the Chinese Remainder Theorem. |
310 | |
311 | (There's also a useful RSA parameter recovery system which can |
312 | determine all the useful bits of information for efficient RSA |
313 | decryption given a very small amount of information. For |
314 | example, given the modulus and both exponents, it can recover |
315 | the two secret factors. This is, by the way, a good reason not |
316 | to share a modulus among many users.) |
317 | |
759513d1 |
318 | |
319 | Finding prime numbers |
320 | |
321 | Prime numbers are important too. Finding them is not ever-so |
4a2da2d1 |
322 | easy. There's a huge variety of useful properties which are |
0ca05eea |
323 | needed, and it's basically impossible to cover everything. |
759513d1 |
324 | |
325 | Catacomb has two useful facilities. There's a fast sequential- |
4a2da2d1 |
326 | search filtering system called `pfilt', and a good (but |
759513d1 |
327 | probabilistic) primality tester which implements the Rabin- |
328 | Miller test. |
329 | |
4a2da2d1 |
330 | Over the top of this is a configurable plug-in-appropriate-bits |
331 | system called `pgen' which ties everything together. You're |
0ca05eea |
332 | much better off using `pgen' than grovelling about with the |
333 | filter and Rabin-Miller tests by hand. The low-level details |
334 | are much better used to create new `pgen' components. |
759513d1 |
335 | |
336 | |
337 | Conclusion |
338 | |
339 | This basically concludes the tour of Catacomb's multiprecision |
340 | maths library. I hope it's provided enough background that the |
341 | comments start making sense. |
342 | |
ee62fa16 |
343 | -- [mdw] |
759513d1 |
344 | |
345 | \f |
346 | Local variables: |
347 | mode: text |
348 | End: |