The myth of the computer’s math prowess runs so deep that it’s even built into the name. The verb “compute” comes originally from a Latin root that means “to think,” but during the last 400 years the English word “computation” has become almost synonymous with doing arithmetic.
The less we know about computers, the more likely we are to think of them as giant math machines — a belief that leads to excessive trust in computers’ mathematical abilities, despite their potential for making fundamental errors. As with almost every kind of computer problem, these errors are a result of the decisions made by programmers seeking to find the best combination of low cost, high speed and accuracy of results.
Such trade-offs are impossible to avoid, but it’s important for both the application developer and the user to be aware that such decisions are being made and to be confident that the right mix is being applied to the problem at hand.
There is a growing hazard that as on-board math coprocessors become a standard feature of mainstream chips such as the Intel 486 and Motorola 68040, software-development tools will treat the decisions of coprocessor designers as a default approach to handling math for all applications.
Although coprocessors reflect carefully considered standards and are designed to yield accurate results with reasonable handling of special cases, no single approach should be unthinkingly assumed to be best for all applications.
Why can’t computers pump numbers around their innards as easily as they transport streams of characters during word processing? To begin with, people usually work with powers of 10 (1, 10, 100 and so on) whereas computers work with a “binary” (two-value) system based on powers of two (1, 2, 4, 8 and so on).
This difference isn’t a problem when counting up whole numbers of things: for example, the number 10 in base 10 (one 10 plus zero 1s) is a longer but numerically identical 1010 in base 2 (one 8 plus zero 4s plus one 2 plus zero 1s).
The difference in bases makes life much more difficult, however, when trying to represent fractions: for example, a simple and common decimal value such as one-tenth (0.1) cannot be precisely represented by any finite number of digits to the right of a base-two binary point.
The value 0.1 can be approximated by a binary fraction such as 0.000110011 (1/16 plus 1/32 plus 1/256 plus 1/512), but this still leaves an error of almost 0.4 percent.
Additional digits can be used to make the error as small as desired, but the binary system can never make the error disappear completely.
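A minimal sketch in Python (not part of the original discussion) makes this concrete, using exact rational arithmetic to measure the gap between 0.1 and the nine-bit binary fraction above:

```python
# Approximate decimal 0.1 with the nine-bit binary fraction
# 0.000110011 (1/16 + 1/32 + 1/256 + 1/512) and measure the error.
from fractions import Fraction

approx = Fraction(1, 16) + Fraction(1, 32) + Fraction(1, 256) + Fraction(1, 512)
error = abs(Fraction(1, 10) - approx) / Fraction(1, 10)

print(approx)        # 51/512, i.e. 0.099609375
print(float(error))  # 0.00390625, an error of about 0.4 percent
```

Adding more binary digits shrinks the error but, as noted, never eliminates it.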
This binary system wasn’t adopted just to be difficult. In 1947, mathematician Norbert Wiener showed that for any known method of storing complex structures of values, a system based on storing groups of only two possible low-level values would have the lowest cost per unit of information stored.
This led to the modern architecture based on BInary digITS (bits) of data, arranged for convenience in eight-bit bytes, and in words whose size depends on what’s convenient for a particular machine — typically 16 bits on the 80286 and earlier mainstream processors, and 32 bits on most modern designs.
But 32 bits are not enough for counting up the things around us. With 32 positions that can each hold two values, we can count over a range from zero to one less than the 32nd power of 2: that is, over a range from zero to 4,294,967,295, or only 4 billion and change.
Therefore, it’s common for a programming language to provide an integer data type with a size of 64 bits, able to handle values with a high end of more than 18.4 quintillion (or 18.4 billion billion).
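Both ceilings are easy to verify; a quick sketch in Python:

```python
# Unsigned range of an n-bit word runs from zero to 2**n - 1.
for bits in (32, 64):
    print(f"{bits}-bit top: {2**bits - 1:,}")
# 32-bit top: 4,294,967,295
# 64-bit top: 18,446,744,073,709,551,615
```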
With almost 20 quintillion electronic fingers to count on, a great many things can be done without the complexity of working with binary fractions. For example, large dollar amounts can be multiplied by 1,000 to give a figure in units of 0.1 cent, a precision usually considered adequate for most financial transactions.
In the real world, a 64-bit integer can represent the average distance to Pluto in units of 0.0001 inch.
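The 0.1-cent bookkeeping described above can be sketched as plain integer arithmetic; the prices here are hypothetical:

```python
# Fixed-point money: hold amounts as integer mils (1 mil = 0.1 cent),
# so sums are exact integer additions with no binary-fraction drift.
price_mils = 19_990                  # $19.99
tax_mils = 1_599                     # 8 percent tax, rounded to the mil
total_mils = price_mils + tax_mils
dollars, mils = divmod(total_mils, 1_000)
print(f"${dollars}.{mils:03d}")      # $21.589
```

Only at the final display step is the integer converted back to dollars and cents.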
In situations where a value is known to have a certain range, a programmer can spread the available precision across that range to get the most accurate results possible.
For example, a temperature sensor in Minnesota might need to measure values ranging from minus 80 to plus 100 degrees Fahrenheit: a programmer could take the actual value, add 80 and multiply by 20 million to produce a value that uses most of the available bits in a 32-bit word.
Before such a scaling operation, a “jitter” of one bit would represent an error of almost 1 percent; after scaling, the error from a one-bit jitter would be far too small for concern, much less than the probable margin for error in the temperature sensor itself.
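The Minnesota example can be sketched as follows; the offset and factor are the ones from the text, while the helper names `encode` and `decode` are hypothetical:

```python
# Scaled integer: shift -80..+100 degrees F by 80, then multiply by
# 20 million so readings span most of an unsigned 32-bit word.
OFFSET, SCALE = 80, 20_000_000

def encode(temp_f):
    return round((temp_f + OFFSET) * SCALE)

def decode(raw):
    return raw / SCALE - OFFSET

print(encode(-80))           # 0, the bottom of the range
print(encode(100))           # 3600000000, still below 2**32
print(decode(encode(72.5)))  # 72.5
```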
These are three of the least-complex ways of representing real-world numbers in a computer. Simple integers are the electronic equivalent of counting on your fingers. Integers with a multiplication factor (also known as fixed-point numbers) are the equivalent of using fingers to represent fractional units, then working with those units as if they were whole numbers.
Integers with an offset and a multiplication factor (or scaled integers) involve more complex setup calculations than the other integer models, but also make the best possible use of the available precision.
But all of the integer approaches break down in cases where numbers take on a huge and unpredictable range of values. Scientific calculations might work with values as large as the number of atoms of gas in the sun (say “billion” six times, then take a thousand of those), or values as small as the weight in pounds of a single electron (say “one billionth” three times and divide by 1,000).
To handle such a range using integer techniques would require a word size of almost 290 bits, which would be impractical: On-chip registers must provide the space, transfers from memory to processor must move the data, and both the cost of hardware and the time to produce results would get worse.
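The 290-bit figure follows from the ratio of those two extremes; a quick check:

```python
# Spanning 10**57 down to 10**-30 with one integer scale means covering
# a ratio of 10**87; each factor of 10 costs log2(10), about 3.32 bits.
import math

bits_needed = math.log2(10) * (57 + 30)
print(round(bits_needed))    # 289, in line with "almost 290 bits"
```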
The floating point
Such problems are typically solved with floating-point procedures, which can be executed in software or built into coprocessor hardware. Floating-point numbers are represented by two components: an exponent, which gives the approximate range of the value in terms of some power of a base value, and a mantissa (sometimes called the significand) that multiplies the exponent to give the more precise overall value.
For example, the decimal value 1,234 has a mantissa of 1.234 and an exponent of 3: The value of the mantissa is multiplied by the exponent’s power of the base (in this case, by the third power of 10).
For any given number of bits to be used overall, a decision must be made as to how many should be allocated to the exponent and how many to the mantissa. With more bits of exponent, we can handle a larger range of values, but with less precision. With more bits of mantissa, we can be more precise, but over a smaller range.
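The split can be seen directly in Python, whose `math.frexp` breaks a float into a significand and exponent (here the base is 2, not the 10 of the decimal example above):

```python
# Every float is significand * 2**exponent; math.frexp recovers the
# pair, normalizing the significand into the range [0.5, 1).
import math

m, e = math.frexp(1234.0)
print(m, e)        # 0.6025390625 11
print(m * 2**e)    # 1234.0
```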
The prevailing standard for floating-point computation is ANSI/IEEE 754-1985, where “1985” denotes the (surprisingly recent) year of adoption. This standard is the basis of both the Intel math coprocessors and the Standard Apple Numerics Environment (Motorola coprocessors augmented by proprietary software) on Macintosh systems.
This standard defines a “long real” number format using 64 bits: one bit for the sign of the number (plus or minus), 11 bits of exponent (enough to cover decimal exponents from roughly -307 to +308) and 52 bits of significand.
Assuming that the significand will always be at least 1 but less than 2 (remember, this is base 2), the first bit of the significand can be assumed to have a value of 1 and need not actually be stored. This format can represent a range of values from 4.19 * 10^-307 up to 1.67 * 10^308.
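The 1-11-52 layout can be inspected by reinterpreting a double's bytes as a 64-bit integer; a sketch using Python's struct module:

```python
# Unpack a 64-bit double into its sign bit, 11 biased-exponent bits
# and 52 stored significand bits (the leading 1 is implied, not stored).
import struct

raw = struct.unpack(">Q", struct.pack(">d", -0.1))[0]
sign = raw >> 63
biased_exp = (raw >> 52) & 0x7FF
stored_significand = raw & ((1 << 52) - 1)

print(sign)               # 1: the value is negative
print(biased_exp - 1023)  # -4: 0.1 is 1.6 * 2**-4
```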
Beyond the 64 bits that a value is supposed to use in the outside world, IEEE-standard floating-point hardware uses an additional 16 bits (making 80 bits in all) for “temporary real” numbers.
This additional precision, if used for intermediate calculations, can greatly reduce the vulnerability of your results to cumulative errors such as those that occur when numbers are rounded up or down.
Retaining this precision, however, requires one of two things: either 80-bit values must be moved between processor and memory, which is time-consuming, or the calculation must be managed so that intermediate values stay in on-chip registers with a minimum of transfers. The latter demands extremely sophisticated global knowledge of algorithms, beyond the capability of most compilers.
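The effect of cumulative rounding is easy to demonstrate. In the sketch below, Python's `math.fsum` plays the role of a wider intermediate accumulator (a software analogue of keeping extra bits mid-calculation, not actual 80-bit hardware):

```python
# Ten additions of the inexact binary 0.1 accumulate a visible error;
# a higher-precision summation of the same values does not.
import math

print(sum([0.1] * 10))         # 0.9999999999999999
print(math.fsum([0.1] * 10))   # 1.0
```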
IEEE 754 is not universally loved. It has been criticized as giving too much space for range and not enough for precision. Tom Ochs, president of Structured Scientific Software in Albany, Ore., has calculated that the estimated volume of the universe — measured in units of quantum space, believed to be the smallest fundamental unit of volume — is only about 6.8 * 10^177, smaller than the maximum IEEE value by a factor of 10^130 (say “billion” 14 times and multiply by 10,000).
At the same time, there is a gap between adjacent IEEE values — for example, between -4.19 * 10^-307 and 4.19 * 10^-307. That gap may seem small, but across the volume of the universe there is only one IEEE number available for every block of 3.7 * 10^158 pockets of space. By this argument, based on things that we might actually want to count, the IEEE standard makes a bad trade-off between range and precision.
We are always free to solve math problems on a computer in precisely the way that we would solve them by hand: treating numbers not as patterns of bits but as strings of decimal digits, and doing the equivalent of long division or multiplication in the same way that we would do it with pencil and paper. This eliminates any problems of arbitrary limits on either range or precision.
Binary-coded decimal (BCD), in which each digit of a number is represented separately rather than converting the entire number to binary (with attendant errors), is a traditional approach to business computations that do not allow any room for errors.
The REXX procedures language (included as a utility in OS/2) and many dialects of the LISP programming language also support arbitrary-precision arithmetic, as do symbolic tools such as Soft Warehouse Inc.’s Derive, Waterloo Maple Software’s Maple and Wolfram Research Inc.’s Mathematica.
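Python's standard library offers the same two escapes: the decimal module works digit by digit in the spirit of BCD, and fractions keeps exact ratios of integers. A brief sketch:

```python
# Decimal arithmetic keeps digits, not bits, so 0.1 is represented
# exactly; Fraction goes further and keeps exact ratios of integers.
from decimal import Decimal
from fractions import Fraction

print(Decimal("0.1") + Decimal("0.2"))   # 0.3, with no binary residue
print(Fraction(1, 3) + Fraction(1, 6))   # 1/2, exactly
```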
These techniques reduce raw computational speed, but they also increase productivity in getting the program written and confidence in the correctness of results.