Setting the refspec at git fetch time speeds up the clone step by a factor of 100.
The Engineering Productivity group is responsible for supporting engineers who build and deploy software at Pinterest. Our team supports a range of infrastructure services and often works on large projects, such as migrating all software to Bazel and building a continuous delivery platform called Hermez. We also support the monorepositories, which receive several hundred commits every day, and that is still not the full list of our tasks.
We work hard to make software development and delivery at Pinterest fast and painless. Recently we were reminded once again how much impact even the smallest detail can have: a single Git option dramatically reduced build times in our continuous integration pipelines. To understand how such a small change had such a big effect, we need to share some background on our monorepositories and pipelines.
Monorepositories and pipelines
We have six main repositories at Pinterest: Pinboard, Optimus, Cosmos, Magnus, iOS, and Android. Each is a monorepository that holds a wide range of services, grouped by language. Pinboard is the largest of them and has been maintained since the company was founded. It has over 350k commits and is 20GB in size when fully cloned.
Cloning a monorepository with a lot of code and a long history takes a long time, and our continuous integration pipelines have to do it many times a day. For Pinboard alone, we run more than 60 thousand git pull operations on weekdays. Most of our Jenkins pipeline scripts (written in Groovy) start with a Checkout stage, where we clone the repository that will be built and tested in later stages. This is what a typical Checkout stage looks like:
If you use the Git CLI directly:
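As a rough illustration of such a checkout with the Git CLI, the sketch below performs a shallow fetch with no tags and a depth of 50, as described in this article, but without narrowing the refspec. It is not our actual pipeline code: the throwaway local repository and the branch names feature-a and feature-b are assumptions that stand in for a real remote.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the real remote: a throwaway local repository
# with a master branch and two extra branches.
work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q --allow-empty -m "initial commit"
git -C "$work/upstream" branch feature-a
git -C "$work/upstream" branch feature-b

# Checkout step: shallow fetch, no tags, last 50 commits, no explicit refspec.
git init -q "$work/checkout"
git -C "$work/checkout" remote add origin "file://$work/upstream"
git -C "$work/checkout" fetch -q --no-tags --depth 50 origin
git -C "$work/checkout" checkout -q --detach origin/master

# Every branch of the remote was fetched, not just master.
refs_fetched=$(git -C "$work/checkout" for-each-ref refs/remotes/origin | wc -l | tr -d ' ')
echo "remote-tracking refs fetched: $refs_fetched"
```

In this toy setup the fetch brings down all three branches even though only master is checked out afterwards.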
Even with a shallow clone that fetches no tags and only the last 50 commits, the operation was still not as fast as it could have been. This is because we did not set the refspec option. When it is not set, Git falls back to the default refspec, +refs/heads/*:refs/remotes/origin/*, which fetches every branch. In the case of Pinboard, that means over 2,500 branches are processed.
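To see which refspec a remote uses when none is given, and how many branches it therefore covers, something like the following can be run. This is a minimal sketch against an assumed throwaway repository with three branches, and the names are hypothetical. Against a real remote only the last two git commands are needed.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the real remote, with three branches.
work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q --allow-empty -m "initial commit"
git -C "$work/upstream" branch feature-a
git -C "$work/upstream" branch feature-b

git init -q "$work/checkout"
git -C "$work/checkout" remote add origin "file://$work/upstream"

# The refspec that "git remote add" configures, used when fetch gets none.
default_refspec=$(git -C "$work/checkout" config --get remote.origin.fetch)
echo "default refspec: $default_refspec"

# Number of branches on the remote that this wildcard refspec matches.
branch_count=$(git -C "$work/checkout" ls-remote --heads origin | wc -l | tr -d ' ')
echo "branches on remote: $branch_count"
```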
By simply adding the refspec option and specifying which refs we are interested in (in our case, only master), we limit the fetch to the one branch we need and save a lot of time. Here's what it looks like in our pipeline:
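With the Git CLI, the same idea can be sketched by passing an explicit refspec to git fetch. The example is an approximation under stated assumptions, not our Jenkins configuration: the local throwaway repository and its extra branches are hypothetical, while the refspec for master, the depth of 50 and the skipped tags follow the description in this article.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the real remote, with three branches.
work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q --allow-empty -m "initial commit"
git -C "$work/upstream" branch feature-a
git -C "$work/upstream" branch feature-b

git init -q "$work/checkout"
git -C "$work/checkout" remote add origin "file://$work/upstream"

# Same shallow, tag-less fetch, but limited to master by an explicit refspec.
git -C "$work/checkout" fetch -q --no-tags --depth 50 origin \
    '+refs/heads/master:refs/remotes/origin/master'
git -C "$work/checkout" checkout -q --detach origin/master

# Only origin/master exists locally; the other branches were never processed.
refs_fetched=$(git -C "$work/checkout" for-each-ref refs/remotes/origin | wc -l | tr -d ' ')
echo "remote-tracking refs fetched: $refs_fetched"
```

Here a single remote-tracking ref is created instead of one per branch, which is where the savings come from on a remote with thousands of branches.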
A simple one-line change reduced the cloning time by roughly a factor of 100 and, as a result, significantly reduced the build time. Cloning time for Pinboard, our largest repository, dropped from 40 minutes to 30 seconds. This shows that sometimes even the smallest effort makes a huge difference.