Boost.Compute, or GPU/CPU parallel computing. Part 1

Introduction



Hello, Habr!



By my standards, I have been writing C++ code for quite a while, but until now I had never run into tasks involving parallel computing. I haven't seen a single article about the Boost.Compute library, so this article will be about it.


Content



  • What is boost.compute
  • Problems connecting boost.compute to the project
  • Introduction to boost.compute
  • Basic compute classes
  • Getting started
  • Conclusion


What is boost.compute



This C++ library provides a simple high-level interface for working with multi-core CPUs and GPU computing devices. It was first added to Boost in version 1.61.0 and is still maintained.



Problems connecting boost.compute to the project



So, I ran into a few problems while using this library. One of them is that the library simply does not work without OpenCL; the compiler reports the following error:



[screenshot: compiler error]



After OpenCL is connected, everything should compile correctly.
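The article doesn't show the exact project settings, so here is a minimal sketch of one way to do it on Windows with MSVC, assuming an OpenCL SDK is installed and OpenCL.lib is visible to the linker (the alternative is to add OpenCL.lib manually under Linker > Input > Additional Dependencies):

// Minimal sketch: tell the MSVC linker to pull in the OpenCL runtime.
// Assumes an OpenCL SDK is installed and OpenCL.lib is on the library search path.
#pragma comment(lib, "OpenCL.lib")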



As for the Boost library itself, it can be downloaded and added to a Visual Studio project using the NuGet package manager.
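For reference, with the Package Manager Console this boils down to a single command, something like Install-Package boost (the exact package id and the matching compiler-specific binary packages are an assumption; check what NuGet offers for your toolset).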



Introduction to boost.compute



After installing all the necessary components, we can look at some simple pieces of code. To get started, it is enough to include the compute header like this:



#include <boost/compute.hpp>
using namespace boost;


It is worth noting that regular STL containers cannot be used with the algorithms in the compute namespace. Instead, the library provides its own containers, which do not conflict with the standard ones. Sample code:



std::vector<float> std_vector(10);
compute::vector<float> compute_vector(std_vector.begin(), std_vector.end(), queue);
// the constructor copies the data from std_vector to the device through the command queue


You can use the copy() function to convert back to std::vector:



compute::copy(compute_vector.begin(), compute_vector.end(), std_vector.begin(), queue);


Basic compute classes



The library includes three auxiliary classes, which are enough to start computing on a video card and/or processor:



  • compute::device (determines which device we will work with)
  • compute::context (an object of this class stores OpenCL resources, including memory buffers)
  • compute::command_queue (provides an interface for sending commands to the computing device)


You can declare this whole thing like this:



auto device = compute::system::default_device();       // pick the default compute device
auto context = compute::context(device);                // create an OpenCL context for that device
auto queue = compute::command_queue(context, device);   // create a command queue for the device


Even with just the first line of the code above, you can make sure everything works as it should by running the following:



std::cout << device.name() << std::endl; 


This gives us the name of the device on which the calculations will run. Result (yours may differ):



[screenshot: device name output]



Getting started



Let's look at the transform() and reduce() functions with an example:



std::vector<float> host_vec = {1, 4, 9};

compute::vector<float> com_vec(host_vec.begin(), host_vec.end(), queue);
// the data could also be copied into an already constructed vector
// with copy()

compute::vector<float> buff_result(host_vec.size(), context);
compute::transform(com_vec.begin(), com_vec.end(), buff_result.begin(), compute::sqrt<float>(), queue);

std::vector<float> transform_result(host_vec.size());
compute::copy(buff_result.begin(), buff_result.end(), transform_result.begin(), queue);

std::cout << "Transforming result: ";
for (size_t i = 0; i < transform_result.size(); i++)
{
	std::cout << transform_result[i] << " ";
}
std::cout << std::endl;

float reduce_result;
compute::reduce(com_vec.begin(), com_vec.end(), &reduce_result, compute::plus<float>(), queue);

std::cout << "Reducing result: " << reduce_result << std::endl;


When you run the above code, you should see the following result:



Transforming result: 1 2 3
Reducing result: 14



I chose these two functions because they show the basics of working with parallel computations without anything superfluous.



So, the transform() function is used to modify an array of data (or two arrays, if two are passed) by applying a single function to every value.



compute::transform(com_vec.begin(), 
   com_vec.end(), 
   buff_result.begin(), 
   compute::sqrt<float>(), 
   queue);


Let's go through the arguments. The first two specify the input range, the third is an iterator to the beginning of the vector that will receive the result, and the fourth says what to do with each element. In the example above we use one of the standard functions, which extracts the square root. Of course, you can also write a custom function; Boost.Compute gives us two ways to do that, but that is material for the next part (if there is one at all). Finally, as the last argument we pass an object of the compute::command_queue class, which I talked about above.
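As a small, hedged preview of that topic: Boost.Compute provides the BOOST_COMPUTE_FUNCTION macro, which generates an OpenCL function from a C-like body. The function name square below is just an illustration, and I'm assuming boost/compute.hpp already pulls the macro in:

// Hypothetical custom function that squares its argument.
// Signature: BOOST_COMPUTE_FUNCTION(return type, name, (arguments), { body });
BOOST_COMPUTE_FUNCTION(float, square, (float x),
{
    return x * x;
});

// It is then passed to transform() in place of compute::sqrt<float>():
compute::transform(com_vec.begin(), com_vec.end(), buff_result.begin(), square, queue);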



The next function is reduce(); things are a little more interesting here. It returns the result of applying the operation passed as the fourth argument to all elements of the vector.



compute::reduce(com_vec.begin(), 
   com_vec.end(), 
   &reduce_result, 
   compute::plus<float>(),
   queue);


Now let me explain with an example: the code above corresponds to the following expression:

1 + 4 + 9

In our case, we get the sum of all the elements in the array, i.e. 14.
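The operation doesn't have to be addition. A minimal sketch, assuming compute::multiplies is available alongside compute::plus in the functional headers: swapping the fourth argument makes the same call compute the product of the elements instead.

float product_result;
// Fold the same vector with multiplication instead of addition: 1 * 4 * 9 = 36.
compute::reduce(com_vec.begin(), com_vec.end(), &product_result, compute::multiplies<float>(), queue);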



Conclusion



Well, that's all. I think this is enough to perform simple operations on large amounts of data. You can now use the basic functionality of the boost.compute library and avoid a few of the mistakes that come up when working with it.



I would be glad to receive feedback. Thank you for your time.



Good luck to all!


