On the way to the top: Magma and Grasshopper on Elbrus

More and more articles have recently appeared about how Russian Elbrus processors perform on various tasks. Cryptography has so far stayed in the background, although conflicting claims have surfaced at different times: some pointed to the high capabilities of Elbrus (some GOST algorithm running 9 times better on Elbrus-4C than on an Intel Core i7-2600), others to poor compiler optimization and, as a result, extremely low speed of the implemented algorithms (Grasshopper 100 times slower than on Intel?). I propose to finally find out what Elbrus can do, using two GOST symmetric encryption algorithms as examples.





To keep the article from getting too long, I will assume that the reader has a general idea of processor architectures, including Elbrus. If not, the website of the developer (the MCST company) has an excellent programming guide and a book about the architecture in general; my own acquaintance with Elbrus began with these materials. Note also that modern processors have many different mechanisms and features, so I will only touch on those that, in my opinion, matter for implementing the selected algorithms.





What the Elbrus architecture has to offer

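Elbrus has 6 ALUs (arithmetic-logic units) working in parallel. Starting with architecture version 5, the vector (SIMD) instructions operate on 128-bit registers.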





: 200 (64-) . SIMD , , . , 5 128-.





. β€” APB (Array Prefetch Buffer). , -.





, . , , , , , , .





, x86-64, . : , x86-64 , , , ARM-. , , SIMD , . , , .





, , , . , , ECB, CTR, MGM. AES, x86-64 . , ( ) . , .





: , β€” .









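Processor    Architecture  Cores  Frequency, GHz  L1d         L1i          L2           L3
Elbrus-4C    E2Kv3         4      0.75            4 x 64 KB   4 x 128 KB   4 x 2 MB     -
Elbrus-1C+   E2Kv4         1      0.985           1 x 64 KB   1 x 128 KB   1 x 2 MB     -
Elbrus-8C    E2Kv4         8      1.2             8 x 64 KB   8 x 128 KB   8 x 512 KB   16 MB
Elbrus-8CB   E2Kv5         8      1.55            8 x 64 KB   8 x 128 KB   8 x 512 KB   16 MB
Elbrus-2C3   E2Kv6         2      2               2 x 64 KB   2 x 128 KB   2 x 2 MB     -
Elbrus-16C   E2Kv6         16     2               16 x 64 KB  16 x 128 KB  8 x 1 MB     32 MB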





x86-64 AVX AVX2. 3 . , .





. , , 3 4 6 64- , 5 β€” 4 128- .





. , x86-64, . , , (, , ). : , . intrinsic .





ECB. ( ) , , . , . β€” 1 . , . . , . :





















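Elbrus-4C    116 MB/s   137 MB/s   5.2 cycles/byte
Elbrus-1C+   151 MB/s   179 MB/s   5.2 cycles/byte
Elbrus-8C    185 MB/s   220 MB/s   5.2 cycles/byte
Elbrus-8CB   402 MB/s   520 MB/s   2.8 cycles/byte
Elbrus-2C3   669 MB/s   670 MB/s   2.8 cycles/byte
Elbrus-16C   671 MB/s   672 MB/s   2.8 cycles/byte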





8 E2Kv3/E2Kv4 16 E2Kv5/E2Kv6. ( 6) APB . 6 APB , . , .





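For comparison, an Intel Core i3-7100 @ 3.9 GHz gives 458 MB/s, 8.1 cycles/byte with AVX and 1030 MB/s, 3.6 cycles/byte with AVX2. Per clock, Elbrus is ahead of x86-64 with AVX by a factor of 1.5 (architecture versions 3 and 4) and ahead of x86-64 with AVX2 by a factor of 1.3 (version 5).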
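The throughput and cycles-per-byte figures quoted for the Intel Core i3-7100 are tied together by the clock frequency, and the same relation makes processors with different clocks comparable. Below is a minimal sketch of that arithmetic. It assumes throughput is given in MB/s with 1 MB = 2^20 bytes, a convention that reproduces the quoted numbers; the function names are illustrative and do not come from the article.

```c
#include <assert.h>

/* Cycles spent per processed byte.
 * clock_hz         - core clock in Hz (e.g. 3.9e9 for 3.9 GHz)
 * throughput_mib_s - measured speed in MB/s, assuming 1 MB = 2^20 bytes */
static double cycles_per_byte(double clock_hz, double throughput_mib_s)
{
    assert(clock_hz > 0.0 && throughput_mib_s > 0.0);
    return clock_hz / (throughput_mib_s * 1048576.0);
}

/* How many times fewer cycles per byte implementation A needs than B. */
static double per_clock_advantage(double cpb_a, double cpb_b)
{
    assert(cpb_a > 0.0 && cpb_b > 0.0);
    return cpb_b / cpb_a;
}
```

For example, `cycles_per_byte(3.9e9, 458.0)` evaluates to about 8.1, and `per_clock_advantage(5.2, 8.1)` to roughly 1.5.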





, . , S, , SIMD-, "" LS ( ) 64 ( LS ).
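A common way to implement Grasshopper in software is to merge the S and L transformations into precomputed LS tables: 16 byte positions, 256 byte values each, 16 bytes per entry, which comes to 64 KB. The sketch below shows only the lookup structure. The type and function names are hypothetical, the tables are assumed to be filled elsewhere from the cipher's S-box and linear transform, and this is not the author's code.

```c
#include <assert.h>
#include <stdint.h>

/* One 128-bit block held as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } block128;

/* Hypothetical container: 16 positions x 256 values x 16 bytes = 64 KB. */
typedef struct { block128 e[16][256]; } ls_table;

static_assert(sizeof(ls_table) == 16 * 256 * 16, "LS tables take 64 KB");

/* Combined LS step: XOR together one table entry per input byte.
 * The table contents are assumed to be precomputed by the caller. */
static block128 ls_apply(const ls_table *t, const uint8_t in[16])
{
    block128 acc = { 0, 0 };
    for (int i = 0; i < 16; i++) {
        const block128 *entry = &t->e[i][in[i]];
        acc.lo ^= entry->lo;
        acc.hi ^= entry->hi;
    }
    return acc;
}
```

Each round then costs 16 table loads and 16 128-bit XORs per block, which is why load throughput and L1d capacity matter so much for this algorithm.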





x86-64 AVX- ( 128- , 128 ). Scale-Index-Base-Displacement ( ), Scale 16, β€” 8. L1d (64 ) 4 , ( , x86-64 2 ).





, , . 5 __v2di (. e2kintrin.h ), , 128-.
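As an illustration of working with a 128-bit value as one unit, here is a minimal sketch built on GCC-style vector extensions. It assumes the compiler accepts `__attribute__((vector_size(16)))`. The type name `v2di_sketch` is hypothetical and merely mirrors the two-lane 64-bit layout of `__v2di`; the actual definitions in `e2kintrin.h` are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit vector type: two 64-bit lanes (GCC vector extension). */
typedef long long v2di_sketch __attribute__((vector_size(16)));

static_assert(sizeof(v2di_sketch) == 16, "vector must be 128 bits wide");

/* Unaligned-safe load and store through memcpy. */
static v2di_sketch load128(const uint8_t *p)
{
    v2di_sketch v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void store128(uint8_t *p, v2di_sketch v)
{
    memcpy(p, &v, sizeof v);
}

/* XOR a 16-byte block with a 16-byte round key in one vector operation. */
static void xor_round_key(uint8_t block[16], const uint8_t key[16])
{
    store128(block, load128(block) ^ load128(key));
}
```

On a target with 128-bit registers the compiler can map the XOR to a single instruction; on narrower targets it is split into 64-bit operations with the same result.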





, . , , . - .





, :





















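Elbrus-4C    52 MB/s    69 MB/s    10.4 cycles/byte
Elbrus-1C+   63 MB/s    90 MB/s    10.4 cycles/byte
Elbrus-8C    80 MB/s    110 MB/s   10.4 cycles/byte
Elbrus-8CB   95 MB/s    150 MB/s   9.9 cycles/byte
Elbrus-2C3   170 MB/s   171 MB/s   11 cycles/byte
Elbrus-16C   171 MB/s   172 MB/s   11 cycles/byte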





On an Intel Core i7-6700 @ 4 GHz the result is 170 MB/s, 22.4 cycles/byte. Per clock, Elbrus is thus about 2 times faster.





: 3 . , , .





:





















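Elbrus-4C    78 MB/s    83 MB/s    8.6 cycles/byte
Elbrus-1C+   102 MB/s   108 MB/s   8.7 cycles/byte
Elbrus-8C    126 MB/s   133 MB/s   8.6 cycles/byte
Elbrus-8CB   248 MB/s   291 MB/s   5.1 cycles/byte
Elbrus-2C3   453 MB/s   454 MB/s   4.2 cycles/byte
Elbrus-16C   454 MB/s   455 MB/s   4.2 cycles/byte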





Intel Core i7-6700 @ 4 : 360 /, 10.6 /. , E2Kv3 E2Kv4 , - , x86-64 . 5 128- 2 .





E2Kv6 i7-6700 , ECB . i7-6700 Β« Β», , ECB : β€” Intel Core i7-9700K @ 4.7 β€” 411 /, 10.9 /. , 2.5 .





, , - .





, , 5 : . .





, . , : Intel/AMD . , , .





, .





P.S.

. , . , .





Although I obtained the described results using only open information and the compiler documentation, I want to thank the MCST staff, in particular Alexander Trush, for answering the questions I asked from time to time and, of course, for providing remote access to the new processors.







