Accurate and fast calculations for floating point numbers using the example of the sine function. Part 2: libm

I continue the series of articles on working with floating point. In the first article I gave a small mathematical introduction and showed the simplest and most obvious way to calculate the sine with examples of programs with different "pitfalls". Today's article will be slightly different in style. There will be no practice here, but we will dig deeper into the math and get into the holy of holies - the standard library code. I will also give an answer to the question at the end of the first article. So let's go.



Some math again



Obviously, the faster the Taylor series decreases in modulus, the fewer terms are needed to achieve the required accuracy. And so it seems that the result will be more accurate (below this will be discussed in more detail). For comparison, for example, take a term of the seventh degree of the Taylor series (x77!) at x=1.0 and x=0.1... The expression values โ€‹โ€‹will be1.198โ‹…10โˆ’4 and 1.198โ‹…10โˆ’11respectively. A huge difference, isn't it? So let's try to find a way to reduce the upper limit of the sine function calculation interval.



Series expansion around given values



In order to understand this method, we need to return to the first year of the institute and remember the definitions of the Taylor series ( wiki ). In a nutshell: knowing the function and its derivatives at some point, you can find the values โ€‹โ€‹of the function in the vicinity of this point by expanding into a Taylor series. For the sine function, this means the following

sinโก(x0+ฮ”x)โ‰ˆsinโก(x0)+sinโ€ฒโก(x0)ฮ”x1!+sinโ€ณโก(x0)ฮ”x22!+sinโ€ดโก(x0)ฮ”x33!+โ‹ฏ



What does this approach give us from a practical point of view? Imagine that we have an interval from0 before ฯ€/2... Let us choose 10 linearly distributed points on this interval (the choice is not optimal):x0=0, x1=ฯ€/20, x2=2ฯ€/20, xi=iโ‹…ฯ€/20... For each point, calculate the plate with the sine and its derivatives at this point. Now you can modify the function so that when getting the valuex the function takes the closest value xi and lays out in a row around the point xi, not around zero (ฮ”x=xโˆ’xi).



Using trigonometric transformations



If we go back even further, to the senior classes of school, then we can recall one very important formula:

sinโก(x0+ฮ”x)=sinโก(x0)cosโก(ฮ”x)+cosโก(x0)sinโก(ฮ”x)



And then everything is the same as in the previous paragraph. We select points inside the interval, calculate the sine and cosine for them, and when calling the sine function, we look for the nearest one and, using the formula above, calculate the sine using a small valueฮ”x...

Think about which of these two methods is better to choose, but for now we will move from math to practical calculations.



Distribution property of multiplication in the floating point world



I had to ask the Internet for advice, what is it called aโ‹…(b+c)=aโ‹…b+aโ‹…c... It turns out to be a distributive property. Let's go back to the question I asked at the end of the first part. Namely, why mathematically equivalent expressionsaโ‹…(b+c) and aโ‹…b+aโ‹…ccan give different results in floating point calculations? The easiest way to illustrate this is with an example. Let's take a hypothetical system that works with floating point numbers in decimal format with 4 digits of precision. Let's pretend thatb=1.000E0, c=1.234E-2, and a=5.678E-2... First, let's take an expression with brackets and calculate it step by step, remembering to round at each step:

1)b+c=1.000E0+1.234E-2=1.012E0

2) aโ‹…(b+c)=5.678E-2ร—1.012E0=5.746E-2

Received response y=5.746E-2

Now let's calculate the second expression in the same way step by step:

aโ‹…b+aโ‹…c=5.678E-2ร—1.000E0+5.678E-2ร—1.234E-2=5.678E-2+7.007E-4=5.748E-2

Received response y=5.748E-2

The true answer is 0.0574806652.



As you can see, the answer obtained in the second case is much closer to the true one than in the first. If we explain this on fingers, then imagine that when in the first case we add to 1.0 the numberc=1.234E-2we just throw out the last two digits. They are no more. In the second case, the discarding occurs at the very end, after the multiplication. Those. in the second case, the multiplication operation (s) is more accurate.



It seems that you can finish on this, but take a closer look at the first method and tell me what the result of the calculation will beb+cโˆ’b... And ... we have a way to round floating point numbers! Don't miss this example. Give yourself time to figure it out. Rounding of numbers will be very intensively used by us later in this and subsequent articles.



Let's notice one more feature of this expression. Imagine that the 4-digit precision of the variable is not enough for us. What to do? And here we already have the answer - to represent the number in the formb+cand store it in memory as the sum of two digits. And, accordingly, carry out operations (for example multiplication) separately for both terms. This technique is described in more detail in the article Adding two floating point numbers without loss of precision .



In the previous article, I also wrote that the methodaโ‹…(b+c)there is one unpleasant feature. And it is as follows. Numberc always truncated at the last significant digit of a number b... This means that regardless of the numberc, if a b+cโ‰ b, then an error in the last sign is always possible even for small c... This is not allowed in the approach in the next chapter.



How it works using the GNU library as an example



How is it? Have you chosen which of the two methods described at the beginning of the article did you choose for the accurate calculation of the sine? Whichever method you choose, both are correct. Moreover, they are absolutely identical. Believe me, check it out. Below I will use school formulas. They are easier to explain.



Armed with the knowledge gained in the previous article and in this article, you can easily understand the code of the standard library. Let's open the s_sin.c file and find the __sin function there :

image

Its code is quite simple. It is easy to understand that it calls a different set of functions depending on the limits of the input variable. In this article, we will discuss the 218-224 code section for angles 2 ^ -26 <| x | <0.855469. You can see that in this section of the code, the do_sin (x, 0) function is called. We will dwell on this function in more detail:



image



  1. , dx=0 .
  2. 129-130 , abs(x)<0.126, .. x , . , , , .
  3. 136-137. , . x 2 . u x. , 0.345678. u=0.34, 0.005678.
  4. 140-142. ( s ) ( c ) x . , cos(x)=1-c, 1.0, (. ), .
  5. 143. u. , u=0.34 34. sin(u)=sn+ssn, cos(u)=cs+ccs. sn cs โ€” ยซยป u, ssn ccs โ€” .
  6. 144-145. sin(u+x)=(sn+ssn)*(1-c)+(cs+ccs)*s. , , 144-145. โ€” .


In fact, I have described only the simplest part of calculating the sine in this way. There is a lot of mathematics left behind. For example, how do you calculate the size of a table and the elements in it? Where did the magic numbers 0.126 and 0.855469 come from? When to chop off the computation by the Taylor's number? Corrections to the coefficients of the Taylor series to refine the result.



All this, of course, is interesting, but, objectively, the presented method has many disadvantages: it is necessary to calculate the sine (s) and cosine (c) simultaneously, which requires twice as many calculations of the Taylor series 1 . Multiplication by tabular values, as we can see, is also not free. Also, storing a table of 3520 bytes in RAM is, of course, not a problem, but accessing it (even in the cache) can be expensive.



Therefore, in the next part we will try to get rid of the plate and calculate the sine in the interval [0.126, 0.855469] directly, but more accurately than in the first chapter.



Before ending - a question of quick wits. The number big in this example is 52776558133248 = 3 * 2 44 . Where did such a number come from, not, for example, 2 45 ? I will formulate the question more precisely. Why is the number 3 * 2 N optimal when rounding numbers , and not, for example, 2 N + 1 ? Another question, which N should you choose to round a number to an integer?



1 It is worth noting that a significant advantage for this approach can appear when the sine and cosine are calculated simultaneously from the same angle. The second function can be calculated for almost free.



All Articles