Boost.Compute, or GPU/CPU parallel computing. Part 1

Introduction



Hello, Habr!



By my standards, I have been writing C++ code for quite a while, but until now I had never run into tasks involving parallel computing. I haven't seen a single article about the Boost.Compute library, so this article will be about it.


Content



  • What is boost.compute
  • Problems connecting boost.compute to the project
  • Introduction to boost.compute
  • Basic compute classes
  • Getting started
  • Conclusion


What is boost.compute



This C++ library provides a simple high-level interface for working with multi-core CPUs and GPU computing devices. It was first added to Boost in version 1.61.0 and is still maintained.



Problems connecting boost.compute to the project



So, I ran into a few problems while using this library. One of them is that the library simply does not work without OpenCL; the compiler reports the following error:



[screenshot: compiler error]



After OpenCL is connected, everything should compile correctly.
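The article doesn't show the exact project settings, so here is a minimal sketch of one way to do it on Windows with MSVC, assuming an OpenCL SDK is installed and OpenCL.lib is visible to the linker (the alternative is to add OpenCL.lib manually under Linker > Input > Additional Dependencies):

// Minimal sketch: tell the MSVC linker to pull in the OpenCL runtime.
// Assumes an OpenCL SDK is installed and OpenCL.lib is on the library search path.
#pragma comment(lib, "OpenCL.lib")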



As for the Boost library itself, it can be downloaded and added to a Visual Studio project using the NuGet package manager.
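For reference, with the Package Manager Console this boils down to a single command, something like Install-Package boost (the exact package id and the matching compiler-specific binary packages are an assumption; check what NuGet offers for your toolset).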



Introduction to boost.compute



After installing all the necessary components, we can look at some simple pieces of code. To get started, it is enough to include the compute header like this:



#include <boost/compute.hpp>
using namespace boost;


It is worth noting that regular STL containers cannot be used with the algorithms in the compute namespace. Instead, the library provides its own containers, which do not conflict with the standard ones. Sample code:



std::vector<float> std_vector(10);
compute::vector<float> compute_vector(std_vector.begin(), std_vector.end(), queue);
// the constructor copies the data from std_vector to the device through the command queue


You can use the copy() function to convert back to std::vector:



compute::copy(compute_vector.begin(), compute_vector.end(), std_vector.begin(), queue);


Basic compute classes



The library includes three auxiliary classes, which are enough to start computing on a video card and/or processor:



  • compute::device (determines which device we will work with)
  • compute::context (an object of this class stores OpenCL resources, including memory buffers)
  • compute::command_queue (provides an interface for sending commands to the computing device)


You can declare this whole thing like this:



auto device = compute::system::default_device();       // pick the default compute device
auto context = compute::context(device);                // create an OpenCL context for that device
auto queue = compute::command_queue(context, device);   // create a command queue for the device


Even with just the first line of the code above, you can make sure everything works as it should by running the following:



std::cout << device.name() << std::endl; 


This gives us the name of the device on which the calculations will run. Result (yours may differ):



[screenshot: device name output]



Getting started



Let's look at the transform() and reduce() functions with an example:



std::vector<float> host_vec = {1, 4, 9};

compute::vector<float> com_vec(host_vec.begin(), host_vec.end(), queue);
// the data could also be copied into an already constructed vector
// with copy()

compute::vector<float> buff_result(host_vec.size(), context);
compute::transform(com_vec.begin(), com_vec.end(), buff_result.begin(), compute::sqrt<float>(), queue);

std::vector<float> transform_result(host_vec.size());
compute::copy(buff_result.begin(), buff_result.end(), transform_result.begin(), queue);

std::cout << "Transforming result: ";
for (size_t i = 0; i < transform_result.size(); i++)
{
	std::cout << transform_result[i] << " ";
}
std::cout << std::endl;

float reduce_result;
compute::reduce(com_vec.begin(), com_vec.end(), &reduce_result, compute::plus<float>(), queue);

std::cout << "Reducing result: " << reduce_result << std::endl;


When you run the above code, you should see the following result:



Transforming result: 1 2 3
Reducing result: 14



I chose these two functions because they show the basics of working with parallel computations without anything superfluous.



So, the transform() function is used to modify an array of data (or two arrays, if two are passed) by applying a single function to every value.



compute::transform(com_vec.begin(), 
   com_vec.end(), 
   buff_result.begin(), 
   compute::sqrt<float>(), 
   queue);


Let's go through the arguments. The first two specify the input range, the third is an iterator to the beginning of the vector that will receive the result, and the fourth says what to do with each element. In the example above we use one of the standard functions, which extracts the square root. Of course, you can also write a custom function; Boost.Compute gives us two ways to do that, but that is material for the next part (if there is one at all). Finally, as the last argument we pass an object of the compute::command_queue class, which I talked about above.
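As a small, hedged preview of that topic: Boost.Compute provides the BOOST_COMPUTE_FUNCTION macro, which generates an OpenCL function from a C-like body. The function name square below is just an illustration, and I'm assuming boost/compute.hpp already pulls the macro in:

// Hypothetical custom function that squares its argument.
// Signature: BOOST_COMPUTE_FUNCTION(return type, name, (arguments), { body });
BOOST_COMPUTE_FUNCTION(float, square, (float x),
{
    return x * x;
});

// It is then passed to transform() in place of compute::sqrt<float>():
compute::transform(com_vec.begin(), com_vec.end(), buff_result.begin(), square, queue);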



The next function is reduce(); things are a little more interesting here. It returns the result of applying the operation passed as the fourth argument to all elements of the vector.



compute::reduce(com_vec.begin(), 
   com_vec.end(), 
   &reduce_result, 
   compute::plus<float>(),
   queue);


Now let me explain with an example: the code above corresponds to the following expression:

1 + 4 + 9

In our case, we get the sum of all the elements in the array, i.e. 14.
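The operation doesn't have to be addition. A minimal sketch, assuming compute::multiplies is available alongside compute::plus in the functional headers: swapping the fourth argument makes the same call compute the product of the elements instead.

float product_result;
// Fold the same vector with multiplication instead of addition: 1 * 4 * 9 = 36.
compute::reduce(com_vec.begin(), com_vec.end(), &product_result, compute::multiplies<float>(), queue);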



Conclusion



Well, that's all. I think this is enough to perform simple operations on large amounts of data. You can now use the basic functionality of the boost.compute library and avoid a few of the mistakes that come up when working with it.



I would be glad to receive feedback. Thank you for your time.



Good luck to all!


