Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

MATH256 Numerical Methods

Part 2 Errors and rounding

2 Errors and rounding

Numerical Methods (5 Parts)

Main Message in one sentence:

If all numbers are allowed to have a fixed number of digits (throwing away the rest)

e.g. x = 0.x1x2 ...x10 × 10n for 10 digits, what happens to subsequence operations or the accuracy?

In Part 1 (or chapter 1), we encountered the situation in which we needed to break a loop to obtain an approximate value for an infinite series. We used the condition

|tn+1 | < 0.5 × 10-10 × |Sn|, (*)

where Sn  is the partial sum up to term n and tn+1  is the first term omitted. In this chapter, we will examine errors and approximations in more detail.  In the process, we will see the justification for (*).

2.1 Computer storage and correct rounding

Any nonzero x e R can be written in scientific notation; that is

x = a × 10p ,    where   0.1 < |a| < 1 and p e Z.

The first nonzero digit of a always appears immediately after the decimal point.  For example, 0.01206 =    0.1206 × 10-1 ,

203.002 =    0.203002 × 103 ,

and

_π = _0.31415 . . .  × 101 .

Computers store numerical values in a similar way, using a fixed number of significant digits.  Consequently,

most operations are subject to rounding error. With the default settings, Maple uses ten digits, so a typical number in Maple (omitting exponent)

= 0.a1 a2 . . . a10

= a1  × 10-1 + a2  × 10-2 + . . . + a10  × 10-10 .

Here the digits aj  are integers between 0 and 9 inclusive, except a1 , which cannot be zero (unless = 0 exactly). We care about how an exact and nearby number a = 0.a1 a2 . . . a10 , a11 , . . . a50 . . . is approximated by by rounding. The correctly rounded approximation to a is the most accurate approximation stopping at a10. The true value for a then satisfies

|a _ | < 5 × 10-11 = 0.5 × 10-10 .

Note that numbers ending in 5 can be equidistantly rounded in two wayss, though traditional rounding chooses one way.

For example, to round the 11-digit number a = 0.12345678905 to only 10 digits, we can take

= 0.1234567890   or = 0.1234567891

with |a _ | = 5 × 10-11  in both cases. This is called a tie. The elementary rule ‘5 rounds up’ is usually not

used to resolve these because always rounding up can cause errors to accumulate.  By default, Maple resolves

ties by requiring that the final stored digit is even.  Each error caused by a tie then has a 50% chance of being positive and a 50% chance of being negative, so they are unlikely to accumulate. Of course, numbers not ending in 5 never encounter such a tie situations e.g. a = 0.12345678914 → a = 0.1234567891 with no other choices.

Remarks:

● We may visualise why the digit 5 is special in rounding from below

│(10) │ _ _ _ 9 _ _ _ 8 _ _ _ 7 _ _ _ 6 _ _ _ [5] _ _ _ 4 _ _ _ 3 _ _ _ 2 _ _ _ 1 _ _ _ │(0) │

●  If x and y are stored exactly (and y 0), then the result of x · y , xy and x/y will be correctly

rounded, as will ,x and ,y.  However, this applies for a single operation only.  Maintaining correct

rounding throughout a long calculation is more or less impossible.

●  In other words, rounding error cannot be avoided, so we must accept that the last digit in a result is likely to be wrong (perhaps more in long calculations).

● Although it is not our intention to dig deep into the process of how to implement these elementary operations,  it  is  not  hard to see that one  must take care of the components p first and then operate between the decimal numbers a (often called mantissa) as below.  Assume p2  2 p1  for x1 = a1  × 10p1 , x2 = a2  × 10p2 :

→ x2 · x1 = (a2  × 10p2 -p1  · x1 ) × 10p1   s 3  × 10p˜3 ,

→ x1 /x2 = (a1 /a2 ) × 10-(p2 -p1 ) s 4  × 10p˜4 ,

→ x1  × x2 = (a1  × a2 ) × 10p1 +p2   s 5  × 10p˜5 ,

where possibly p˜4 = _(p2 _p1 ), 1 _ (p2 _p1 ) and 5 = p1+p2 , p1+p2 _ 1  since  0.1 < |a1 /a2 | < 10 and |a1 × a2 | < 1.

● The number of digits stored by Maple can be increased by changing the parameter Digits. Remember: this controls the accuracy to which numbers are stored at each stage of a calculation. It does not guarantee the number of correct significant figures in the final answer.

●  Rounding error places a bound on accuracy. Trying to beat this (e.g. by including more terms in a series) will never succeed.

Example

Suppose that x = 0.13, and consider the two approximations

cos(x) s 1 _ + and    cos(x) s 1 _ + _ .

There are two sources of error here:  rounding error and the truncation of an infinite series.  Using ten digit

arithmetic, we obtain

cos(x) s 0.9915619004   and    cos(x) s 0.9915618937.

The second approximation is more accurate, so we have made an improvement by including an extra term

in the series.  In fact, the second result is correctly rounded.  Adding more terms cannot further improve the

accuracy unless we also decrease rounding error by including more digits.

2.2 Absolute and relative error

Notation: For any exact quantity x, represents an approximate value. The absolute error is given by

Ex = _ x.                      The significance of absolute error depends on the scale of the numbers in

● A 5mm error in measuring out a football pitch is not important.

● A 5mm error in eye surgery is likely to be disastrous.

To take the magnitudes of x into account, we can use the relative error :

question.

εx =

_ x

x x

The percentage error is 100 × |εx|.  Relative and percentage error are undefined if x = 0.

Remark: In general, the sign of the error (absolute or relative) does not matter, only the magnitude is important.

2.2.1 Example: decimal places

This is a measure of absolute error; it takes no account of the magnitude of the numbers we are dealing

with. We must define exactly what we mean when we say an approximation is accurate to n decimal places.

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●    ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Attempt 1: Count matching digits after the decimal point.

Counterexample: x = 1.00001

1 = 1.00200

2 = 0.99991.

Clearly, 1  approximates x to two places, but the more accurate approximation is 2 , in which none of the digits after the decimal point are the same as those of x.

Attempt 2: Round to n decimal places, then check if the results are the same.

This fixes the above problem; 2 = 1.000 to three places and 0.9999 to four, so 2 = x to 3 d.p.

Counterexample: x = 1.530

1 = 1.535

2 = 1.525.

Here we have a tie (see section 2.1) : 1  and 2  are equidistant from x.  It makes no sense to claim that

either approximation is ‘better.’  It also makes no sense to allow the number of correct d.p. to be influenced

by the mechanism used for resolving ties.

Attempt 3: Consider the difference between x and .  If |x _ | < 10-n , then there are n zeros after the decimal point and is accurate to n places.

Counterexample: x = 1.560

= 1.55001

Clearly, is accurate to one place, but x _ = 0.00999, which has two zeros after the decimal. To prevent this, we require the n + 1th digit after the decimal point in _ x to be less than 5, so that it does not round up.  Hence (finally), we have the following definition.

To state that is an approximation to x, accurate to n decimal places, means that

|x _ | < 5 × 10-(n+1) = × 10-n .

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●    ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

2.2.2 Example: significant figures

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●    ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

This is a measure of relative error. Again, we must make a precise definition. We begin by writing x using scientific notation, that is

x = a × 10p ,    where   0.1 < |a| < 1 and p e Z.

Since the first nonzero digit in a comes immediately after the decimal point, n decimal places in a is equivalent to n significant figures in x. Thus, if

= × 10p

then approximates x to n significant figures if and only if

|a _ | < × 10-n .                                                              (*)


a _ 10p × (a _ ) x _

a    =      10p  × a      =    x    ,

so (*) yields

│ × |a| < × 10-n .

Finally, since |a| < 1, we have the following:

If < × 10-n then approximates x to n significant figures.


● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●    ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Important remark: This is a sucient but not necessary condition.  It is stronger than necessary because we have dropped |a| from the left-hand side.  If n is the largest integer for which it holds, then we definitely have n s.f. and we may have n + 1 (a could be as small as 0.1). We definitely do not have n + 2 s.f.

2.2.3 Example approximation of e

●  If we approximate e using

˜e = 2721

then

εe =           =            _ 1 s _0.405 × 10-7 .

The test shows that we have 7 s.f. but we may have 8. We don’t because

e s 2.718281828

whereas

˜e s 2.718281718.


●  If x = 1 and = 1.049, then

εx = = 0.049 < 0.5 × 10-1 .

The test shows that we have 1 s.f. but we may have 2 (we do).

●  If x = 1 and = 1.051, then

εx = = 0.051 < 0.5 × 100 .

The test shows that we may have 1 s.f. (we do).

●  If ε > 0.5, then the largest possible value for n is actually _1, so we definitely don’t have 1 s.f. An approximation with a relative error greater than 0.5 is largely useless.

2.2.4 Series summation condition

Consider a series in which the sum up to term n is Sn, and tn+1  is the first term omitted.

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●    ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Let x = Sn+1  and = Sn. Then

x _ Sn+1 _ Sn

=

x             Sn+1

tn+1

=

Sn + tn+1

tn+1 1

Sn          1 + (tn+1/Sn) .

If |tn+1| < 0.5 × 10-10 × |Sn| then the last term is very close to 1, so Sn+1 and Sn are (almost certainly) the same to 10 s.f.


● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●    ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

This explains the condition we use for breaking a loop in a series summation: nothing is achieved by including tn+1  in the sum.


Remarks: (i) Careless use of this condition can produce incorrect results (see Practice Problem 3.2 for

example).  However, it is a good practical choice in many cases.

(ii) The above study on Sn  and and Sn+1  is not a complete analysis, because it does not ensure that the remainder |Rn| = |(n+1 tk| is less than 0.5 × 10-10 .  It is not trivial to give a general formula.

2.3 Cancellation

Rounding error alone is not usually problematic, but certain operations can magnify existing errors, which is

much more dangerous. Consider the example shown in quadratic.mw.  Here, the quadratic equation

t2 _ 42t + 1 = 0

is solved using the standard formula, with the result

t1 = 41.97617696   and   t2 = 0.02382304.

The second root has only seven significant digits; three have been lost. A few such operations will produce

results that are completely wrong, so we need to investigate what has happened.

2.4 Error propagation

We can use the fact that εx = /x _ 1, and so

= (1 + εx)x,

to study how errors propagate through calculations.  For a given operation f, we need to find an expression

for the relative error in f(); that is

f() _ f(x)

εf  =        f(x)

in terms of x and εx. Note that we are not concerned here with the error generated by computing f (which should be small).  Rather, we are concerned with the effect that f has on the pre-existing error in . As we will see, this can be significant.