Instead of an introduction
This article gives an example of manually optimizing a performance-critical section of application code for budget STM32 microcontrollers. The optimized code runs 5 or more times faster than the library function.
Square roots are often needed in applications. The standard C library provides the sqrt function, which operates on floating-point numbers:
double sqrt (double num);
long double sqrtl (long double num);
Microcontrollers work primarily with integers and usually have no floating-point registers.
In practice, repeated "integer <=> floating-point" conversions cost both speed and accuracy (Example 1).
Example 1: Loss of Precision in Forward and Backward Conversions
//
uint32_t L1 = 169;
uint32_t L2 = 168;
//
uint32_t r1 = ( uint32_t )sqrt( ( double ) L1 );
uint32_t r2 = ( uint32_t )sqrt( ( double ) L2 );
//
L1 = r1*r1; // r1 = 13
L2 = r2*r2; // r2 = 12: sqrt(168) = 12.96... is truncated
//
// L1 = 169, same as the original 169
// L2 = 144 instead of the original 168, a 14% error
Problem statement
Improve the accuracy of the integer square root by rounding the result to the nearest integer.
If possible, also improve performance.
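The rounding goal can be stated in integer terms. r is the integer nearest to √L when r - 0.5 ≤ √L < r + 0.5. For r ≥ 1 this is the same as (2r - 1)² ≤ 4L < (2r + 1)². A tie is impossible, because 4L is even and (2r + 1)² is odd.

Below is a minimal sketch of such a check. It could be used to test a square-root function exhaustively. The name is_nearest_root is hypothetical, and the sketch assumes 64-bit integer arithmetic is available.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: true if r is the integer nearest to sqrt(L).
// Uses 64 bits so that 4 * L and (2r + 1)^2 cannot overflow.
static bool is_nearest_root( uint32_t L, uint32_t r )
{
    uint64_t four_L = 4u * ( uint64_t ) L;
    uint64_t upper  = ( uint64_t ) ( 2u * r + 1u ) * ( 2u * r + 1u );

    if ( four_L >= upper )      // sqrt(L) >= r + 0.5: r is too small
        return false;
    if ( r == 0 )               // r = 0 has no lower bound to check
        return true;

    uint64_t lower = ( uint64_t ) ( 2u * r - 1u ) * ( 2u * r - 1u );
    return four_L >= lower;     // sqrt(L) >= r - 0.5
}
```

For example, is_nearest_root(168, 13) holds and is_nearest_root(168, 12) does not, which is exactly the error in Example 1. At the top of the range, the nearest root of 0xFFFFFFFF is 65536, one more than a uint16_t can hold.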
Solution
Write a custom function, for example sqrt_fpu, built on the standard sqrt (Example 2).
Example 2: Calculating an integer root using the sqrt_fpu algorithm
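#include <math.h>
#include <stdint.h>

uint16_t sqrt_fpu ( uint32_t L )
{
    if ( L < 2 )
        return ( uint16_t ) L;            // sqrt(0) = 0, sqrt(1) = 1
    double f_rslt = sqrt( ( double ) L );
    uint32_t rslt = ( uint32_t ) f_rslt;  // truncate toward zero
    if ( !( f_rslt - ( double ) rslt < .5 ) )
        rslt++;                           // fractional part >= 0.5: round up
    // Note: for L >= 4294901761 the rounded root is 65536,
    // which wraps to 0 in the uint16_t return value
    return ( uint16_t ) rslt;
}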
Advantages of sqrt_fpu:
- compact code;
- the required accuracy is achieved.
Disadvantages of sqrt_fpu:
- lower performance due to the extra call and the additional floating-point operations;
- no obvious way to speed it up at the user level.
Below, two alternatives to sqrt_fpu that use only integer arithmetic are considered.
1. The odd-number method:
"The square of a natural number n equals the sum of the first n odd numbers."
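The identity is easy to check numerically: for example, 1 + 3 + 5 + 7 = 16 = 4². The helper name sum_first_odd is hypothetical and only illustrates the statement.

```c
#include <stdint.h>

// Hypothetical helper: 1 + 3 + 5 + ... + (2n - 1), the sum of the first n odd numbers.
static uint32_t sum_first_odd( uint32_t n )
{
    uint32_t sum = 0;
    for ( uint32_t k = 0; k < n; k++ )
        sum += 2u * k + 1u;     // the (k + 1)-th odd number
    return sum;                 // equals n * n (no overflow for n < 65536)
}
```

sqrt_odd runs this sum in reverse. It subtracts successive odd numbers from L and counts how many fit.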
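The sqrt_odd function implements this method (Example 3).
Example 3: Calculating an integer root using the sqrt_odd algorithm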
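#include <stdint.h>

uint16_t sqrt_odd ( uint32_t L )
{
    if ( L < 2 )
        return ( uint16_t ) L;
    // rslt = 1 already accounts for the first odd number, 1.
    // div must be 32-bit: as a uint16_t it wraps around past 65535
    // and breaks the result for large L (from about 2^30 up)
    uint32_t div = 1;
    uint16_t rslt = 1;
    while ( 1 )
    {
        div += 2;                 // next odd number: 3, 5, 7, ...
        if ( div >= L )
            return rslt;          // next odd number does not fit: floor(sqrt(L))
        L -= div, rslt++;
    }
}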
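The loop subtracts the odd numbers 3, 5, 7, ... from L in turn. The first odd number, 1, is accounted for by the initial rslt = 1. As a result, the loop makes as many passes as the value of the root.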
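Advantages of sqrt_odd:
- it uses only addition, subtraction and comparison, with no division and no floating point.
Disadvantages of sqrt_odd:
- the result is rounded down rather than to the nearest integer;
- the running time grows with the root of the argument, so large arguments are slow (Figure 1).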
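One way to see why the running time grows is to count the loop passes. The sketch below is a hypothetical instrumented copy of the sqrt_odd loop, using the 32-bit div, and the name sqrt_odd_passes is made up for illustration. It returns the number of passes, which equals the rounded-down root itself. An argument near 10^6 therefore already costs about a thousand passes.

```c
#include <stdint.h>

// Hypothetical instrumented copy of the sqrt_odd loop:
// returns how many times the loop body runs for argument L.
static uint32_t sqrt_odd_passes( uint32_t L )
{
    if ( L < 2 )
        return 0;
    uint32_t div = 1, passes = 0;
    while ( 1 )
    {
        passes++;
        div += 2;
        if ( div >= L )
            return passes;
        L -= div;
    }
}
```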
Figure 1: sqrt_odd
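2. Newton's method:
Newton's iterative formula:
Rj = ( N / Ri + Ri ) / 2
where N is the number whose root is sought, Ri is the current approximation and Rj is the next one.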
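Here is a worked example for N = 168. It uses integer division, so each quotient is truncated. It starts from R0 = N / 2, as sqrt_new in Example 4 does.

```latex
\begin{aligned}
R_0 &= \lfloor 168 / 2 \rfloor = 84\\
R_1 &= \lfloor (\lfloor 168 / 84 \rfloor + 84) / 2 \rfloor = \lfloor (2 + 84) / 2 \rfloor = 43\\
R_2 &= \lfloor (3 + 43) / 2 \rfloor = 23\\
R_3 &= \lfloor (7 + 23) / 2 \rfloor = 15\\
R_4 &= \lfloor (11 + 15) / 2 \rfloor = 13\\
R_5 &= \lfloor (12 + 13) / 2 \rfloor = 12\\
R_6 &= \lfloor (14 + 12) / 2 \rfloor = 13 > R_5
\end{aligned}
```

The approximations stop decreasing at R6, so the result is R5 = 12, which is the root rounded down. The nearest integer would be 13, since √168 ≈ 12.96.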
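The sqrt_new function implements this method (Example 4).
Example 4: Calculating an integer root using the sqrt_new algorithm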
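#include <stdint.h>

uint16_t sqrt_new ( uint32_t L )
{
    if ( L < 2 )
        return ( uint16_t ) L;
    uint32_t rslt, div;
    rslt = L;        // smallest approximation found so far
    div = L / 2;     // starting approximation
    while ( 1 )
    {
        div = ( L / div + div ) / 2;   // Newton step in integer arithmetic
        if ( rslt > div )
            rslt = div;                // still decreasing: keep iterating
        else
            return ( uint16_t ) rslt;  // stopped decreasing: floor(sqrt(L))
    }
}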
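Because sqrt_new starts from L / 2, its first steps on a large argument only roughly halve the estimate instead of converging quickly. Counting the steps makes this visible. The sketch below is a hypothetical instrumented copy of the sqrt_new loop, and the name newton_steps is made up for illustration.

```c
#include <stdint.h>

// Hypothetical instrumented copy of the sqrt_new loop:
// returns how many Newton steps run before the approximation stops decreasing.
static unsigned newton_steps( uint32_t L )
{
    if ( L < 2 )
        return 0;
    uint32_t rslt = L, div = L / 2;   // same start as sqrt_new
    unsigned steps = 0;
    while ( 1 )
    {
        div = ( L / div + div ) / 2;
        steps++;
        if ( rslt > div )
            rslt = div;
        else
            return steps;
    }
}
```

For L = 168 the loop takes 6 steps, while for L = 0xFFFFFFFF it takes more than 15. sqrt_evn (Example 5) avoids this slow start by choosing the starting value from the magnitude of L.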
Figure 2 compares sqrt_new with sqrt_fpu.
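Advantages of sqrt_new:
- integer arithmetic only, with no floating point;
- few iterations are needed once the approximation is close to the root.
Disadvantages of sqrt_new:
- the result is rounded down rather than to the nearest integer.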
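sqrt_new returns floor(√L). Since L and r are integers, a rounded-down root r can be turned into the nearest integer with one extra check. L ≥ (r + 0.5)² = r² + r + 0.25 holds exactly when L - r² > r.

The sketch below applies that correction. It is a swapped-in technique, not the one the article uses, and the helper name round_floor_root is hypothetical. It returns uint32_t because the nearest root of values near 0xFFFFFFFF is 65536, which does not fit in uint16_t.

```c
#include <stdint.h>

// Hypothetical helper: given r = floor(sqrt(L)), return the integer nearest to sqrt(L).
// L - r*r lies in [0, 2r], so it fits in 32 bits for any uint32_t L.
static uint32_t round_floor_root( uint32_t L, uint32_t r )
{
    uint32_t rem = L - r * r;          // how far L lies above r^2
    return ( rem > r ) ? r + 1 : r;    // rem > r  <=>  L >= (r + 0.5)^2
}
```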
sqrt_new (Figure 2): […]
Figure 2: sqrt_new (!)
(!) […]
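Ways to improve sqrt_new:
- choose the starting approximation from the magnitude of the argument instead of starting from L / 2;
- round the result to the nearest integer.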
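Optimizing method 2: sqrt_evn (Example 5).
sqrt_evn was checked over the entire input range [ 0… 0xFFFFFFFF ].
On 2- to 5-digit numbers, sqrt_evn is ~40% faster than sqrt_new.
In the range [ 1… 10 000 000 ], sqrt_evn […] 2-3 […].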
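The results for sqrt_evn are shown in Figure 3.
Figure 3: sqrt_evn
The code of sqrt_evn is given in Example 5.
Example 5: Calculating an integer root using the sqrt_evn algorithm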
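#include <stdint.h>

uint16_t sqrt_evn ( uint32_t L )
{
    if ( L < 2 )
        return ( uint16_t ) L;
    uint32_t div;
    uint32_t rslt;
    uint32_t temp;
    // Pick the starting approximation div by testing groups of bits of L,
    // i.e. a binary search over the magnitude of the argument
    if ( L & 0xFFFF0000L )
        if ( L & 0xFF000000L )
            if ( L & 0xF0000000L )
                if ( L & 0xE0000000L )
                    div = 43771;
                else
                    div = 22250;
            else
                if ( L & 0x0C000000L )
                    div = 11310;
                else
                    div = 5749;
        else
            if ( L & 0x00F00000L )
                if ( L & 0x00C00000L )
                    div = 2923;
                else
                    div = 1486;
            else
                if ( L & 0x000C0000L )
                    div = 755;
                else
                    div = 384;
    else
        if ( L & 0xFF00L )
            if ( L & 0xF000L )
                if ( L & 0xC000L )
                    div = 195;
                else
                    div = 99;
            else
                if ( L & 0x0C00L )
                    div = 50;
                else
                    div = 25;
        else
            if ( L & 0xF0L )
                if ( L & 0x80L )
                    div = 13;
                else
                    div = 7;
            else
                div = 3;
    rslt = L;
    while ( 1 )
    {
        // Newton step with the halving rounded up:
        // div = ceil( ( L / div + div ) / 2 )
        temp = L / div;
        temp += div;
        div = temp >> 1;
        div += temp & 1;
        if ( rslt > div )
            rslt = div;               // still decreasing: keep iterating
        else
        {
            // If L == rslt * ( rslt - 1 ), then L < ( rslt - 0.5 )^2,
            // so the nearest integer is rslt - 1
            if ( L / rslt == rslt - 1 && L % rslt == 0 )
                rslt--;
            // For L close to 0xFFFFFFFF the rounded root is 65536,
            // which wraps to 0 in uint16_t (as in sqrt_fpu)
            return ( uint16_t ) rslt;
        }
    }
}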
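In sqrt_evn, the nested if statements choose the starting value div by testing groups of bits of L. The Newton loop therefore starts near the root instead of at L / 2 as in sqrt_new. For the smallest arguments, the starting values are [ 3, 7, 13, 25 ].
The halving in each step is rounded up (div += temp & 1). The final check lowers the result by one in the single case L == rslt * ( rslt - 1 ), where rounding up would overshoot the nearest integer.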
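For comparison, a common alternative way to get a starting value is a power of two derived from the bit length of L. If L has n significant bits, then L < 2^n, so 2^ceil(n/2) is never below √L.

The sketch below combines that start with the usual rounded-down integer Newton loop. It is not the article's sqrt_evn, and the names bit_length and isqrt_bitlen are hypothetical. Its result is floor(√L), so it would still need a rounding step to meet the problem statement.

```c
#include <stdint.h>

// Hypothetical helper: number of significant bits in L (0 for L == 0).
// A plain loop is used instead of compiler-specific builtins.
static unsigned bit_length( uint32_t L )
{
    unsigned n = 0;
    while ( L )
    {
        n++;
        L >>= 1;
    }
    return n;
}

// Hypothetical floor(sqrt(L)) with a bit-length starting value.
static uint16_t isqrt_bitlen( uint32_t L )
{
    if ( L < 2 )
        return ( uint16_t ) L;
    // L < 2^n, so x = 2^ceil(n / 2) >= sqrt(L): the steps below only decrease x
    uint32_t x = ( uint32_t ) 1 << ( ( bit_length( L ) + 1 ) / 2 );
    while ( 1 )
    {
        uint32_t y = ( x + L / x ) / 2;   // integer Newton step
        if ( y >= x )
            return ( uint16_t ) x;        // stopped decreasing: x = floor(sqrt(L))
        x = y;
    }
}
```

For L = 1 000 000 this start is 1024, close to the root 1000, and the loop finishes after two steps.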
Test setup:
- board: STM32F0308-DISCO, MCU STM32F030R8T6
- IDE: STM32CubeIDE
- adapter: USB-UART PL2303HX
Settings:
- clock: CPU 48 MHz, UART (RS485) 9600 bit/s
- build configuration: Release
- linker options: MCU GCC Linker: Miscellaneous: -u _printf_float
The functions sqrt_fpu, sqrt_new and sqrt_evn were compared.
Each of the 3 functions was called 100 000 times (Figure 4).
Figure 4:
[…] (Figure 4).
[…] (Figure 5).
Figure 5:
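Figure 6 shows the number of calls per 1 second.
sqrt_fpu makes 19 531 calls per second and sqrt_evn makes 147 059, so sqrt_evn is ~7.5 times faster than sqrt_fpu (147 059 / 19 531 ≈ 7.53).
Figure 6: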
[…]
At the same time, manual algorithmic optimization can pay off in mass-produced small IoT devices. It allows low-cost microcontroller models to be used, leaving the more powerful models for complex tasks.