Development of a mechanism for parallelizing Python code using Docker containers

The current stage of technological development, including computing, shows steady growth in data volumes and a need for ever more powerful computers. The development of central processors has long relied on increasing the number of transistors on a microprocessor chip. The well-known Moore's law states that the number of transistors on a chip doubles roughly every 24 months, which means the power of computing devices can grow exponentially over a relatively short period of time.



However, in 2003 Moore himself published the paper "No Exponential is Forever: But 'Forever' Can Be Delayed!", in which he admitted that exponential growth of physical quantities cannot continue indefinitely. Only the evolution of transistors and their manufacturing technologies made it possible to extend the law for several more generations.



In 2007, Moore stated that the law would apparently soon cease to hold due to the atomic nature of matter and the speed-of-light limit. Currently, the smallest process node used in production processors is 5 nanometers. Trial samples of 3 nm processors also exist, but their release will not begin until 2021. This suggests that the growth in the number of transistors on a chip will soon stall (until a new material is discovered or the manufacturing process is radically updated).



One solution to this problem is parallel computing. This term refers to a way of organizing computer computations in which programs are developed as a set of interacting computational processes running in parallel (simultaneously).



By synchronization method, parallel computing is divided into two types.



In the first variant, processes interact through shared memory: a separate thread of execution is launched on each processor of the multiprocessor system. All threads belong to one process and exchange data through a memory area shared by that process. The number of threads corresponds to the number of processors. Threads are created either by means of a programming language (for example, Java, C#, C++ since C++11, C since C11) or using libraries. Threads can be created explicitly (for example, in C/C++ using PThreads), declaratively (for example, using the OpenMP library), or automatically by built-in compiler tools (for example, High Performance Fortran). This variant of parallel programming usually requires some form of synchronization primitives (mutexes, semaphores, monitors) to coordinate the threads.
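Since this article targets Python, here is a minimal sketch of the shared-memory model using the standard threading module and a mutex; the worker function and counter are purely illustrative. (Note that CPython's global interpreter lock limits true parallelism for CPU-bound threads, which is part of the motivation for offloading work elsewhere.)

```python
import threading

# Shared state protected by a mutex, mirroring the shared-memory model.
counter = 0
lock = threading.Lock()

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:  # acquire the mutex before touching shared data
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: the lock prevents lost updates
```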



In the second variant, interaction is carried out through message passing. A single-threaded process is started on each processor of the multiprocessor system, and it communicates with processes running on other processors by exchanging messages. Processes are created explicitly by calling the appropriate operating system function, and messages are exchanged using a special library (for example, an implementation of the MPI protocol) or using language facilities (for example, High Performance Fortran, Erlang, or occam).
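In Python terms, the same model can be illustrated with the standard multiprocessing module, where single-threaded processes exchange data only through messages (here a Queue); this is an illustrative sketch, not the MPI implementation mentioned above.

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # Each process owns its memory; data arrives only as messages.
    task = inbox.get()
    outbox.put(sum(task))

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    inbox.put([1, 2, 3, 4])  # send a message to the worker
    print(outbox.get())      # receive the result: 10
    p.join()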



In addition to the two variants described above, a hybrid option is also used: on multiprocessor systems with distributed memory (DM-MIMD), where each node is a shared-memory multiprocessor (SM-MIMD), the following approach can be taken. A multi-threaded process is launched on each node of the system and distributes threads among the processors of that node. Data exchange between threads on a node goes through shared memory, while data exchange between nodes goes through message passing. The number of processes is determined by the number of nodes, and the number of threads by the number of processors on each node. The hybrid method of parallel programming is more complicated (the parallel program has to be rewritten in a special way), but it makes the most efficient use of the hardware resources of each node of the multiprocessor system.
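A compact Python sketch of the hybrid scheme, again with standard-library tools only: each process stands in for a node, threads inside it share memory, and nodes exchange results through messages. The numbers and the toy computation are placeholders.

```python
import threading
from multiprocessing import Process, Queue

def node(node_id, results, n_threads=2):
    # Within a "node": threads share memory (the partial list) ...
    partial = []
    lock = threading.Lock()

    def worker(tid):
        with lock:
            partial.append(tid * node_id)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # ... while nodes exchange data only through messages.
    results.put((node_id, sum(partial)))

if __name__ == "__main__":
    results = Queue()
    procs = [Process(target=node, args=(i, results)) for i in range(1, 4)]
    for p in procs:
        p.start()
    print(sorted(results.get() for _ in procs))
    for p in procs:
        p.join()
```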



In this article, I propose to adapt this hybrid approach to parallelizing computations in Python. The key feature of the work is the use of Docker container technology. The developed framework will have a client-server architecture that includes the following elements.



On the client side:



  1. Serializer: as the name suggests, it serializes functions and their variables (that is, it allows them to be saved to an external device or network and then loaded into memory on the same or another node). Worth highlighting here is the parallel decorator, a wrapper function that lets the serializer be applied to functions of various kinds (a minimal sketch of such a decorator follows this list).
  2. Classes for configuring the server/cluster connection.
  3. Additional language facilities for marking functions to be parallelized.
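As a hedged illustration of the serializer and the parallel decorator (the names and API below are hypothetical; the article does not fix them), one could build on cloudpickle, a library that, unlike the standard pickle, can serialize arbitrary functions together with their closures:

```python
import cloudpickle

def parallel(func):
    """Hypothetical decorator: instead of running the function locally,
    package it and its arguments for shipment to the cluster."""
    def wrapper(*args, **kwargs):
        payload = cloudpickle.dumps((func, args, kwargs))
        return payload  # in the real framework: send the payload to the server
    return wrapper

@parallel
def heavy_task(x, y):
    return x ** y

payload = heavy_task(2, 20)  # bytes, ready to be sent over the network
```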


On the server side:



  1. Deserializer: accordingly, it deserializes the received data (see above).
  2. Executor: a class that runs the deserialized data (functions and their arguments) and also installs the required libraries into the virtual environment of the Python interpreter (a sketch of both follows this list).
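A hedged sketch of the server-side counterpart, again assuming cloudpickle for deserialization and pip for installing dependencies (the class and method names are hypothetical):

```python
import subprocess
import sys

import cloudpickle

class Executor:
    """Hypothetical executor: restores a function and runs it."""

    def install(self, packages):
        # Install required libraries into the interpreter's environment.
        for pkg in packages:
            subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])

    def run(self, payload):
        func, args, kwargs = cloudpickle.loads(payload)  # deserialize
        return func(*args, **kwargs)                     # execute
```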


The general architecture of the system being developed is shown in the figure.



[Figure: general architecture of the developed client-server system]



For communication between the client and the server, either raw sockets or the Twisted framework can be used; interaction with them will be performed through the developed API.
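For example, a minimal socket-based exchange might look like this (length-prefixed messages; the framing is an assumption for illustration, not the framework's actual protocol):

```python
import socket
import struct

def _recv_exact(sock, n):
    # Read exactly n bytes from the socket.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed early")
        buf += chunk
    return buf

def send_payload(host, port, payload):
    """Send a length-prefixed payload and return the server's reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack("!I", len(payload)) + payload)
        size = struct.unpack("!I", _recv_exact(sock, 4))[0]
        return _recv_exact(sock, size)
```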



The implementation of this system assumes the use of Docker technology. This provides convenience and a high speed of software configuration when getting started: it is enough to start a docker-swarm cluster, deploy the Docker image on the selected server, and set the number of replicas.
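Staying in Python, such a deployment could be scripted with the Docker SDK for Python (docker-py); the image name, service name, and replica count below are placeholders, and an initialized swarm is assumed.

```python
import docker
from docker.types import ServiceMode

client = docker.from_env()

# Deploy the (hypothetical) framework image as a swarm service
# with three replicas; assumes `docker swarm init` has been run.
service = client.services.create(
    "example/parallel-worker:latest",
    name="parallel-workers",
    mode=ServiceMode("replicated", replicas=3),
)
```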



Other important advantages of Docker are the creation of a homogeneous computing environment by virtualizing a UNIX-like system (for example, Ubuntu or the lightweight Alpine Linux), as well as swarm mode, which allows multiple containers to run on different servers and the load to be balanced quickly by transferring jobs to free containers.



The developed framework can be used in various fields that require performing large amounts of computation in Python, including machine learning and deep data analysis, as well as for simpler tasks, for example, distributed checking of solutions during programming competitions.


