How to represent arbitrary-precision integers?
4 23 Jun 2015 21:34 by u/NotSurvivingLife
Perhaps this may be better off in /v/compsci or something, and indeed I shall crosslink this post there.
I'm wondering: what are the different ways of encoding arbitrary-precision integers, and how do they compare w.r.t. efficiency.
The ways I know of:
- Encode it in unary, with
nones followed by a zero. This takesn+1bits to encode. So10(base 10) gets encoded as1111 1111 11 0. - Encode the number of digits in the above scheme, followed by the digits of the number in base 2. This takes
2*ceil(log2(n+1)) + 1digits to encode, or 1 bit for zero. So10(base 10) gets encoded as1111 0 1010. (Or potentially with the number encoded LSB-first instead. Same difference) - Encode the number of digits with zeros, followed by the number MSB-first. (Effectively you reuse the implied 1 at the highest bit of the number - similar to how floats work.) Note that in this scheme
0has to be represented specially, by1, for instance. So10gets encoded as0000 1010. This takes2*ceil(log2(n+1))digits to encode, or 1 bit for 0. Again, MSB-first versus LSB-first, same difference. - Encode the number of digits for the number of digits with zeros, followed by the number of digits MSB-first, followed by the number MSB-first. Again, with
0represented specially, for instance with1. This takesceil(log2(n+1)) + 2*ceil(log2(ceil(log2(n+1))+1))digits to encode, or1bit for0. - Note that you can iterate the above as many times as you wish.
- Encode the number as groups of
idigits, for some value ofi >= 1, where all ones means "the number is done" else it is a digit of the number base2^i-1. Note thati=1is equivalent to unary, above. So, for example, withi=210(base 10) gets encoded as01 00 01 11(101, base 3, then stop. Again, LSB or MSB first, same difference.) This takesi*(ceil(log_(2^i-1)(n) + 1) + 1)bits to encode, or i for zero.
I am sure there are more potential encodings, however. And it would not surprise me if I messed up on counting the numbers of bits required. Note that encoding 2 is always beat or tied by encoding 3.
3 comments
1 u/nefreat 24 Jun 2015 02:16
I would recommend you look at Java's BigInteger. Internally it's an array of ints.
0 u/rdnetto 24 Jun 2015 16:27
Pretty much this. There are a bunch of encodings that are theoretically possible, but in practice you can't beat an array of ints for performance on a COTS CPU. (Involve GPUs or FPGAs and it's a completely different story.)
1 u/isosum 25 Jun 2015 04:44
One format I've seen in several places (the MIDI file format, for example) is base-128 with the high bit set on all bytes except the last, i.e. (in hex):
The nice thing about this is a number always encodes to a multiple of 8 bits - reading and writing data tends to be much easier when everything is byte-aligned.