Intel Development Center projects in Russia. Intel Integrated Performance Primitives

This is our story about another Intel project made in Russia: the Intel Integrated Performance Primitives (IPP) library, a set of ready-to-use, completely free basic functions for working with images, signals, and arbitrary data, highly optimized for various Intel architectures. How this project originated, how it developed, and what is happening in Intel IPP now is covered in the article under the cut. And we will start, as is customary in a summary, with the present.



Pictured above: the entrance to the IPP floor at the Intel office in Nizhny Novgorod



What is Intel IPP now



So, Intel IPP is a set of building-block functional primitives that can significantly speed up software that processes media and data, thanks to optimizations tailored to particular Intel microarchitectures and platforms, as well as the widest possible use of the Intel SSE and Intel AVX vector instruction sets of all versions.

IPP functions cover four large domains:



  • Image processing;
  • Data compression;
  • Signal processing;
  • Cryptography.


The current Intel IPP 2020 Update 2 contains over 2,500 image processing primitives, 1,300 signal processing primitives, 500 computer vision primitives, and 300 cryptography primitives.



The library is constantly being improved: it is optimized for new platforms, new functionality is added, and old, little-used functionality is inevitably removed.

Intel IPP works on any x86 device running Linux, macOS, Windows, or Android. That is, it supports processors not only from Intel but also from other manufacturers; moreover, IPP runs fast on them too, although, of course, not as fast as on Intel hardware.



At the same time, IPP users need no in-depth knowledge of the platform their IPP-based code will run on, and no extra work on their part: the appropriate optimized version of each function is selected and called automatically.



IPP is written in C using so-called compiler intrinsics. For different usage models there are different versions of the library: a single-threaded one for calls from multiple threads of an external application, and a multi-threaded one that uses efficient internal parallelization.



For those interested in performance on Intel Xeon and Core processors, some benchmarks are available.



IPP is currently available as part of Intel Parallel Studio XE, Intel System Studio, and on its own. And it is absolutely free for personal and commercial use.



Interestingly, cryptography, usually the most "closed" area of our field, is in the case of IPP the most open: it is an open source project available on GitHub.



The groundwork, approach, and API that IPP introduced when it first appeared in signal and image processing can now be called a de facto standard; the same applies to cryptography.



All library components are used by millions of users in hundreds of thousands of applications around the world, including within Intel itself, in various divisions. Intel IPP provides significant acceleration of the OpenCV library. Incidentally, a custom build of Intel IPP containing the functions used by OpenCV, released in 2015, became the first freely available version of IPP.



Intel IPP can be found in image recognition and enhancement applications in all areas, including medicine; in printers, including 3D printers; in digital video surveillance; autonomous vehicles; speech recognition and synthesis; data compression; telecommunications; and games. IPP runs everywhere, from servers to wearables. One could argue that if a time machine were ever invented, its software would certainly use Intel IPP.



History of IPP



It all started with Intel's signal processing (SPL) and image processing (IPL) libraries, developed on Intel's commission at the VNIIEF federal nuclear center in Sarov (we wrote about this in our story about OpenCV).



In 1996 (or 1997, eyewitness accounts differ), a meeting on further plans for developing SPL and IPL was held at Intel headquarters in Santa Clara, with the participation of the American project curators and invited experts from Sarov. Among them were Boris Sabanin, the future architect, inspirer, and head of the IPP team, and Sergey Kirillov, who currently leads the work on IPP cryptography.



The Sarov team brought a list of proposals, one of which was to open the interfaces of the low-level IPL and SPL functions to users: they had already been implemented and optimized anyway, and some users found the IPL data formats inconvenient because they already had their own established image formats. The proposed prototype of the IPP interface, with structures simpler than those of IPL/ISL, was sketched by Boris Sabanin during the discussion, literally on a napkin. At the time, the Russian side's proposal was not rejected, but it did not receive much support either; it sat in the middle of the list with low priority. A couple of years later, however, someone at Intel remembered it (most likely Shin Li, who later became the "evangelist" of Intel IPP), and plans changed.





The 2004 book on Intel IPP by Stewart Taylor, a participant in that historic IPP meeting (then a Stanford graduate newly hired by Intel)



This is how work began on Intel Performance Primitives, which were later renamed Integrated Performance Primitives.



An internal version of IPP, call it 1.0, was created in 1999. It was more of a proof of concept, a prototype to demonstrate the idea's viability. It was never released as a product, but it defined and refined the concept, architecture, and specifications of IPP, which is why the first public version immediately bore the number 2.0 and came out in April 2002.



Until 2009, most of the work on the IPP libraries was carried out under the leadership of Boris Sabanin, who can rightfully be considered the godfather and soul of IPP. He put a great deal of effort into the project and built a team of versatile specialists, but, unfortunately, did not live to see the 20th anniversary of Intel IPP.





The painting "The Bridge to the Sarov Tower" by Boris Sabanin, who was known for more than just IPP (here you can see other paintings by Sabanin, including his self-portrait)



But the project was not limited to the IPL/ISL legacy. Primitives for cryptography and data compression appeared almost immediately. Experiments began in computer vision, which later grew into the collaboration with OpenCV, accelerating its algorithms with primitives from the ippCV domain.



Of course, this was not the only experiment and offshoot in the library's history. IPP kept pace with all of Intel. For example, the fifth version of IPP supported, in addition to x86, the Intel XScale processor (ARM architecture) and Intel Itanium (IA-64)! Over the years, IPP has included components such as realistic rendering, small-matrix operations, data integrity, and video and audio codecs.



If desired, this functionality can still be used today via the IPP Legacy Libraries package, which is available for download.



Moreover, the IPP video codecs later served as the basis for another well-known Intel product, the Intel Media SDK, and the ray tracing work was carried over into the open source Intel Embree project.



Interesting technological experiments around IPP include a sample Windows driver demonstrating that IPP can run in kernel mode, as well as a version of IPP for running on integrated Intel GPUs, written in C for Metal.



It is curious that IPP version numbers first ran in order from 1 to 9, and then releases began to be designated by year: 2017 through 2020.





Intel IPP development team in 2003



Over the lifetime of the IPP family of libraries, more than 100 people have worked on them, in Sarov, Nizhny Novgorod, and Moscow. Today the IPP headquarters is located in Nizhny Novgorod, and it looks very attractive!





IPP floor decoration at Intel



IPP is not a primitive library at all!



Although the name Intel IPP contains the word "primitives", and at first glance there should be no fundamental difficulty in a set of "construction-kit parts" for building high-performance programs, which is what IPP essentially is, the structure of these libraries is far from trivial. Interesting technological solutions were applied to achieve maximum performance and usability.



As already mentioned, IPP contains thousands of functions, and each function has several versions optimized for specific architectures. This leads to huge sizes of the built libraries, which is hardly a selling point for IPP users.

Therefore, IPP is built in a special way. Before compilation, source files containing a large number of simple functions are cut by a special script into many small files, one function per file. These mini-files are then compiled, and not just once but several times: for different architectures, with the corresponding flags. The result is several huge static libraries, one per IPP domain, but each is glued together from very small object files. As a result, when IPP is statically linked into an application, the application grows almost exactly by the size of the IPP functions it actually uses, and not a byte more.



IPP also has a mechanism for generating custom libraries with the existing tools, without having to expose the source code to users. The user selects a list of functions of interest from the header files, after which a script automatically builds, from the huge static libraries, a small dynamic library containing only the necessary functions and their dependencies. To shrink the size further, this dynamic library may include builds of these functions not for all hardware variants but only for a user-specified list of platforms.



When every additional percent of library performance matters, it becomes very important to decide at which level to parallelize the code: inside the library functions, which is good when IPP is called from a single user thread, or outside, at the level of user applications, which is good for multithreaded applications that divide work and data for IPP themselves.



Since applications differ, IPP comes in different flavors too. The IPP package includes two sets of libraries, in 32-bit and 64-bit versions: one is purely single-threaded inside, while the other internally parallelizes a significant number of functions using OpenMP (the exact list of functions is given in the accompanying documentation). In addition, for the image processing libraries there is one more variant, the "Threading Layer": an add-on over single-threaded IPP that uses either OpenMP or Intel TBB to parallelize work externally, over images that are divided into fragments (tiles) for this purpose. The Threading Layer source code is included in the IPP package for those who want maximum control over how their code runs in parallel.



Almost since the inception of IPP, developers have had to worry about the fact that image and signal processing pipelines composed of individual IPP functions run slower than one would like. The explanation is simple: each IPP function call typically loads data into and evicts it from the cache, or even from memory, and this traffic can be far more expensive than the actual computation. The effect is especially noticeable when processing large data: not "big data" in the buzzword sense, but, for example, FullHD images (not to mention 4K).



Simply combining several functions into one inside IPP is no solution here: instead of primitive bricks we would get fancy Tetris pieces that are hard to fit into user applications, and the number of possible combinations would exceed all reasonable limits.



As a result, a C++ add-on over IPP was implemented. It built pipeline graphs, cut images into pieces, and then ran a parallel loop in which each thread executed not a single operation but the entire IPP pipeline on a separate tile; at the end the results were, of course, stitched back together. First a prototype was made, and it showed decent speedups. Then a product called DMIP (deferred-mode image processing) was created. Later, in 2011, at one of the first meetings of the OpenVX standardization committee at Khronos, DMIP was presented and warmly received, given the popularity of graphs among hardware developers. Thus the OpenVX standard came to be based on graph technology. For various reasons OpenVX has not gained sufficient popularity, but the graph paradigm is now supported and developed by the Intel Graph API team. And since the Graph API is included in OpenCV, OpenVINO, and the Movidius SDK, IPP technologies have directly influenced computer vision standards and modern APIs.



IPP - useful links



Once again, here are the most important links from this article.





Intel IPP First Person



Let's give the floor to people who have played an important role in the fate of Intel Performance Primitives over the years.



Vladimir Dudnik, head of the Intel IPP team in 2009-2011

(The text of this quote did not survive in this copy; the remaining fragments mention SIMD instruction sets and the relationship between IPP and MKL.)


(Name not preserved in this copy), OpenCV, IPPCV in 2006-2008

(The text of this quote did not survive in this copy; the remaining fragments mention FFT, OCaml, and the Spiral code generator.)



(Name not preserved in this copy), IPP QA in 2011-2015, IPP in 2017-2020

(The text of this quote did not survive in this copy; the remaining fragments mention working in IPP QA since 2000 and the team's transition from waterfall development to agile & DevOps.)


(Name not preserved in this copy), Image & Signal Processing in Intel IPP, since 2020

(The text of this quote did not survive in this copy; the remaining fragments mention joining the IPP SW team in 2011, work on Resize, WarpAffine, and WarpPerspective, and the publication of IPP Crypto on GitHub in 2018.)


