... We were developing Gardenscapes. It still bore traces of the old Gardenscapes for Windows: it wasn't even a Match-3 yet, but a hidden object game. And no one could have imagined the heights the game would reach.
And then one fine day ...
How it all began
When accessing the repository, we saw the following message:
“This repository has been disabled. Access to this repository has been disabled by GitHub staff due to excessive use of resources, in violation of our Terms of Service. Please contact support to restore access to this repository. Read here to learn more about decreasing the size of your repository.”
As you may have guessed, we use GitHub to host our git repositories. And so, suddenly and without declaring war, GitHub blocked our repository for exceeding the maximum allowed size. The exact figure was not given on their website. At the time of the block, the .git folder was approximately 25 GB in size. (Note 2020: the limits are now higher, and the GitHub site explicitly states that the size of a repository should not exceed 100 GB.)
How did we manage to make such a large repository? The reason is clear: we store binary files in it. It is written everywhere that it is not recommended to do this, but it is much easier for us. We want the game to be launched from the repository immediately, without additional effort. Therefore, we commit graphics and other game resources to the repository.
But this is not so bad. An important lesson we learned from this whole story: never
Fight for history
So, nothing works for anyone. We told the team that they would have to work locally for a day, but without trying too hard, otherwise they would have to resolve conflicts later (everyone was very upset and immediately left for tea). And we began to think about what to do. It was clear that a new repository was needed, but what should be committed there? The easy way is to commit the current state of all branches. But we didn't like that much, because the change history would be lost, everyone's favorite git blame command would break, and everything would go haywire. Therefore, we decided to do this: erase the history of binary files and keep the history of text files.
Step 1. Delete the history of binaries
We had a complete local copy of the repository. The first thing we found was the excellent BFG Repo-Cleaner utility. It is very simple and very fast, and the name is good too.
An example execution scenario:
java -jar bfg.jar --delete-files "*.{pvrtc,webp,png,jpeg,fla,swl,swf,pbi,bin,mask,ods,ogv,ogg,ttf,mp4}" path_to_repository
The parameter lists all the binary file extensions we could come up with. Information about files with these extensions is deleted from every commit in the history. The utility is smart: when it deletes a file's history, it keeps the most recent version of the file, and that version remains in the most recent commit on the branch. We also wanted to delete the history of exe and dll files, but the utility gave an error. Apparently, for some reason, a pattern of the form *.exe is not accepted. However, if you name a file explicitly, for example gardenscapes.exe, everything works. (Note 2020: the bug may have been fixed already.)
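Before choosing the extensions, it helps to see which file types actually take up space in the history. Below is a minimal sketch of how that could be checked, not the procedure we used at the time. The function name sum_by_extension is hypothetical, and it assumes input lines of the form "type size path", which is what git cat-file --batch-check prints with the format shown in the comment. The sizes are uncompressed object sizes, so they only indicate relative weight.

```shell
#!/bin/sh
# Hypothetical helper: sum blob sizes per file extension.
# Reads lines of the form "<type> <size> <path>" on stdin and prints
# "<total bytes> <extension>", largest first.
#
# Assumed usage inside a repository:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
#     sum_by_extension
sum_by_extension() {
  awk '
    $1 == "blob" {
      size = $2
      # Everything after the second space is the path (it may contain spaces).
      path = $0
      sub(/^[^ ]+ [^ ]+ /, "", path)
      # Keep only the file name, then take the text after its last dot.
      n = split(path, parts, "/")
      name = parts[n]
      if (name ~ /\.[^.]+$/) {
        ext = name
        sub(/^.*\./, "", ext)
      } else {
        ext = "(none)"
      }
      total[ext] += size
    }
    END {
      for (e in total) printf "%.0f %s\n", total[e], e
    }
  ' | sort -rn
}
```

The extensions at the top of the output are the natural candidates for the --delete-files pattern.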
Step 2. Compress the repository
After the first step, the repository is still large. The reason is the way git works: we removed only the references to the files, while the file objects themselves are still stored.
To physically delete the files, you need to run the git gc command, namely:
git reflog expire --expire=now --all
and then:
git gc --prune=now --aggressive
This is the sequence of commands recommended by the author of the utility. The gc step really does take a long time. In addition, with the default repository settings the git client ran out of memory before completing the operation, so we had to do some ritual dancing with a tambourine to get it through. (Note 2020: at that time we had a 32-bit version of git. Most likely the 64-bit version no longer has these problems.)
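To check that the cleanup really freed space, the pack size can be compared before and after gc. Here is a small sketch, assuming the output format of git count-objects -v, where size-pack is reported in KiB. The helper name pack_size_mib is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "git count-objects -v" output on stdin
# and print the size of the packed objects in MiB.
#
# Assumed usage inside a repository:
#   git count-objects -v | pack_size_mib
pack_size_mib() {
  # Lines look like "size-pack: 123456"; the value is in KiB.
  awk -F': ' '$1 == "size-pack" { printf "%.1f\n", $2 / 1024 }'
}
```

Running it once before git reflog expire and once after git gc shows how much the repository actually shrank.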
Step 3. Writing commits to the new repository
This turned out to be the most interesting part of the quest.
To understand what follows, you need to understand how git works. You can read more about git in many places, including our blog.
So, we have a great many commits locally, and these commits are correct, that is, without the history of binaries. It would seem that running git push is enough and everything will just work. But no!
If you simply run git push -u origin master, git cheerfully starts uploading data to the server but fails with an error after about 2 GB. This means you can't upload that many commits in one go, so we decided to eat the elephant piece by piece. We figured that 2,000 commits would probably fit in 2 GB. At the time our repository had about 20,000 commits in total, spread across 4 branches: master, v101, v102 and v103. (Note 2020: ah, youth! Things have become much more serious since then. This repository now has more than 100,000 commits and several dozen release branches, and we still fit within the GitHub limits.)
First of all, we count the number of commits in each branch with the command:
git rev-list --count <branch-name>
For example, there are approximately 10,000 commits in the master branch. Now we can use the extended syntax for the git push command, namely:
git push -u origin HEAD~8000:refs/origin/master
HEAD~8000:refs/origin/master is the so-called refspec. The left side says to take the commit that is 8,000 commits back from HEAD together with everything before it, that is, roughly the first 2,000 commits. The right side says to push them to the remote master branch. The full path refs/origin/master is required here.
After that, there is still no master branch on the server, and git fetch, for example, will not be able to download it. This is not surprising: after all, the commit that its head should point to has not been uploaded yet. Nevertheless, when we repeated the command git push origin HEAD~8000:refs/origin/master, git replied that these commits were already on the server, so the job had been done after all.
Next, we decided that the process was clear and the rest of the work could be handed over to a script. The last commit will be very large, since all the binaries end up in it. Therefore, just in case, the last 10 commits are pushed separately. The script turned out like this:
git push origin HEAD~6000:refs/origin/master
git push origin HEAD~5000:refs/origin/master
git push origin HEAD~4000:refs/origin/master
git push origin HEAD~3000:refs/origin/master
git push origin HEAD~2000:refs/origin/master
git push origin HEAD~1000:refs/origin/master
git push origin HEAD~10:refs/origin/master
git push origin master
git checkout v101
git push -u origin HEAD~1000:refs/origin/v101
git push origin HEAD~10:refs/origin/v101
git push origin v101
git checkout v102
...
That is, we write all our branches to the server one after another, no more than 2,000 commits per push, with the last 10 commits pushed separately.
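The same list of commands can be generated instead of typed by hand. The following is a sketch under stated assumptions, not the script we ran that night: the function name gen_push_commands and its arguments are hypothetical, the branch is assumed to be checked out when the printed commands run, and the history is assumed to be mostly linear, so that HEAD~N exists for every printed N (git rev-list --count --first-parent gives the depth along the first-parent chain).

```shell
#!/bin/sh
# Hypothetical helper: print the push commands needed to upload a branch in chunks.
#   $1  branch name
#   $2  number of commits on the branch (e.g. from git rev-list --count)
#   $3  maximum number of commits per push
#   $4  number of final commits to push separately
# The body runs in a subshell so its variables do not leak into the caller.
gen_push_commands() (
  branch="$1"
  total="$2"
  chunk="$3"
  tail="$4"
  offset=$(( total - chunk ))
  # Walk from the oldest commits toward HEAD, one chunk at a time.
  while [ "$offset" -gt "$tail" ]; do
    echo "git push origin HEAD~${offset}:refs/origin/${branch}"
    offset=$(( offset - chunk ))
  done
  # Push everything except the last few commits, if the branch is long enough.
  if [ "$total" -gt "$tail" ]; then
    echo "git push origin HEAD~${tail}:refs/origin/${branch}"
  fi
  # Finally push the branch itself, including the large last commits.
  echo "git push -u origin ${branch}"
)
```

For example, gen_push_commands master 10000 2000 10 prints pushes for HEAD~8000, HEAD~6000, HEAD~4000, HEAD~2000 and HEAD~10, followed by the final push of master. The output can be reviewed first and then piped to sh.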
This whole story took a lot of time, and it was getting close to midnight. So we left the script to run overnight, said the proper prayers to Cthulhu (Note 2020: he was still relatively popular then) and went home.
The finale. Happy ending
In the morning, having opened the repository on the GitHub site, we made sure that the script had run successfully and that all commits and branches were in place.
As a result, the size of the repository (the .git folder) was reduced from 25 GB to 7.5 GB. At the same time, all the important commit history, that is, everything except binaries, was preserved. The game designers drank more tea than usual. The programmers got an unforgettable experience. And we urgently began to think about how to avoid committing the executable file to the repository while keeping it convenient to work with.