Speed up Ansible

Sectional turbocharger


It's no secret that with its default settings Ansible is not particularly fast. In this article I will point out several reasons for that and offer a minimal set of settings that may well make your project noticeably faster.



Everything below refers to Ansible 2.9.x, installed into a freshly created virtualenv in whatever way you prefer.



After the installation, create an "ansible.cfg" file next to your playbook: this location lets you ship these settings together with the project, and they will be picked up automatically.
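A quick way to make sure Ansible actually picked up this file, and not some other ansible.cfg lying around, is to ask it directly; both commands below only print information and change nothing:

# show which config file is in effect
ansible --version | grep 'config file'
# list only the settings that differ from the defaults
ansible-config dump --only-changed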



Pipelining



You may already have heard about the need to use pipelining, that is, not copying module files to the filesystem of the target system but passing the Base64-wrapped zip archive straight to the stdin of the Python interpreter; still, the fact remains: this setting is underrated. Unfortunately, some popular Linux distributions used to ship sudo configured rather poorly by default, requiring a tty (terminal) for the command, which is why Ansible keeps this very useful setting disabled by default.



pipelining = True
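If you are not sure whether your targets still suffer from the tty requirement, a quick look at the sudo configuration on a managed host will settle it; the paths below assume the usual sudoers layout:

# run on the managed host: any requiretty hit here will break pipelining for sudo
sudo grep -r requiretty /etc/sudoers /etc/sudoers.d/ 2>/dev/null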


Collecting facts



Did you know that with the default settings Ansible initiates fact gathering for every play, from all hosts that take part in it? If you didn't know, now you do. To avoid this, enable either explicit fact gathering (explicit) or smart mode, in which facts are collected only from hosts that have not appeared in previous plays.

UPD. When copying this, you will have to choose one of these two values.



gathering = smart|explicit
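As a rough sketch of what explicit mode means for your playbooks (the group names here are made up): a play gets no facts unless it asks for them, so you opt in only where facts are actually used:

---
- hosts: webservers        # no facts are collected for this play
  tasks:
    - ping:

- hosts: dbservers         # this play opts in explicitly
  gather_facts: true
  tasks:
    - debug:
        var: ansible_distribution   # an ordinary fact, available only after gathering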


Reusing ssh connections



If you have ever run Ansible in debug mode (the -v option repeated one to nine times), you may have noticed that ssh connections are constantly being established and torn down. There are a couple of subtleties here as well.



You can skip re-establishing an ssh connection at two levels at once: directly in the ssh client, and when transferring files to a managed host from the control machine.

To reuse an open ssh connection, it is enough to pass the right options to the ssh client. It will then do the following: when establishing the first ssh connection it additionally creates a so-called control socket; on subsequent connections it checks whether that socket exists and, if it does, reuses the existing connection. And for all of this to make sense, we also set how long an idle connection should be kept alive. More details can be found in the ssh documentation; in the context of Ansible we simply "forward" the necessary options to the ssh client.



ssh_args = "-o ControlMaster=auto -o ControlPersist=15m"
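For reference, the same behaviour expressed directly in the OpenSSH client config looks roughly like this; Ansible normally manages ControlPath by itself, so here it has to be spelled out by hand:

# the equivalent in ~/.ssh/config terms
Host *
    ControlMaster auto
    ControlPersist 15m
    ControlPath ~/.ssh/cm-%r@%h:%p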


To reuse an already open ssh connection when transferring files to a managed host, it is enough to set one more little-known option, transfer_method. The documentation on it is extremely sparse and misleading, because this option does work! Reading the source code makes it clear what exactly happens: on the managed host the dd command is launched, working directly with the required file.



transfer_method = piped
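Judging by the source, an upload then boils down to roughly the following (the host and paths here are purely illustrative): the file is streamed straight into dd on the remote side instead of going through a separate sftp/scp session:

ssh target.example.com 'dd of=/tmp/uploaded_file bs=65536' < ./local_file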


By the way, this setting also exists in the devel branch and has not gone anywhere.



Do not be afraid of the knife, be afraid of the fork



Another useful setting is forks. It determines the number of worker processes that will simultaneously connect to hosts and execute tasks. Due to the peculiarities of Python as a language, processes rather than threads are used, because Ansible still supports Python 2.7: no asyncio for you, there is nothing here to breed asynchrony with! By default Ansible starts five workers, but if you ask it nicely, it will start more:



forks = 20


Let me warn you right away that there may be difficulties related to the amount of memory available on the control machine. In other words, you can of course set forks = 100500, but who said it would work?
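If you would rather experiment before committing the value to ansible.cfg, the same limit can be set for a single run (site.yml is just a placeholder name):

# -f / --forks overrides the forks setting for this run only
ansible-playbook -f 20 site.yml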



Putting it all together



As a result, the necessary settings in ansible.cfg (INI format) may look like this:



[defaults]
# choose one of the two values: smart or explicit
gathering = smart
forks = 20

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=15m
transfer_method = piped


And if you want to hide all of this in the normal YAML inventory of a healthy person, it might look something like this:



---
all:
  vars:
    ansible_ssh_pipelining: true
    ansible_ssh_transfer_method: piped
    ansible_ssh_args: -o ControlMaster=auto -o ControlPersist=15m


Unfortunately, this will not work for the "gathering = smart/explicit" and "forks = 20" settings: they have no YAML equivalents. We either set them in ansible.cfg or pass them via the environment variables ANSIBLE_GATHERING and ANSIBLE_FORKS.
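In that case a run might look something like this (the inventory and playbook names are placeholders):

ANSIBLE_GATHERING=smart ANSIBLE_FORKS=20 ansible-playbook -i inventory.yml site.yml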



About Mitogen

"What about Mitogen?" an attentive reader will ask. No Mitogen in this article: it is a third-party project rather than a built-in Ansible setting, and comparing it with vanilla Ansible is a topic for a separate story. Here we stay within what stock Ansible can do on its own.



Some of these settings were discovered while reading the source code of the connection plugin with the self-explanatory name "ssh.py". I am sharing the results of that reading in the hope that it will inspire someone else to look at the sources, read them, check the implementation and compare it with the documentation; sooner or later all of this will bring you positive results. Good luck!


