How to remove sensitive files from a Git repository

The files are staged, the commit message is written, the data has been pushed to the server... and suddenly you want to turn back the clock. The commit contains a file that should not be there. When that happens, it's time to turn to a search engine.



Every developer has, at some point, mistakenly committed files with confidential information to a public repository. How do you deal with this problem? How do you make sure it does not happen again?



In this article, I will explain what to do if a file that has no business being there ends up in a repository. I will show the Git commands that let you correct history and share some recommendations for working safely with confidential information.








Minimizing damage



So, you accidentally committed a file with confidential information. Let's call this file .env. As soon as this happens, ask yourself two questions:



  • Has the commit been pushed to the remote repository?
  • Is the remote repository publicly accessible?


▍Commit has not yet been sent to the remote repository



If you have not yet pushed the commit, the situation poses no real threat. To fix it, you just need to undo the last commit:



git reset --soft HEAD^


The files remain in the working copy of the repository (and stay staged), so you can make the necessary changes before committing again.
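To make the effect of the soft reset concrete, here is a minimal sketch that replays the mistake in a throwaway repository. The file names, the commit messages, and the demo identity are placeholders invented for the illustration, not part of any real project.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the demo cannot touch a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# The mistake: .env is committed together with real work.
echo "API_KEY=not-a-real-key" > .env
echo "console.log('hello')" > app.js
git add .env app.js
git commit -q -m "Add app"

# Undo the commit. Both files stay on disk and stay staged.
git reset -q --soft HEAD^

# Unstage only the sensitive file, then commit again without it.
git rm -q --cached .env
git commit -q -m "Add app"

git ls-files    # lists README.md and app.js, but not .env
ls -A           # .env is still present in the working tree
```

Because the bad commit never left the machine, nothing else is needed here: the new commit simply takes its place.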



If you want to keep the commit and you just need to remove certain files from it, then do this:



git rm --cached .env
git commit --amend


The --amend option only works on the most recent commit. If you have added a few more commits after the bad one, use this command:



git rebase -i HEAD~{number of commits to go back}


In the editor that opens, mark the bad commit for editing. This lets you fix that commit without losing the changes made to the project by the later commits.
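The interactive rebase can be hard to picture, so here is a sketch of the whole sequence in a throwaway repository. To keep it runnable without an editor, the todo list is edited by sed through GIT_SEQUENCE_EDITOR. That shortcut, along with the file names and commit messages, is an assumption made for the demo. In everyday use you would change "pick" to "edit" by hand.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"

echo "API_KEY=not-a-real-key" > .env
echo "console.log('hello')" > app.js
git add .env app.js
git commit -q -m "Add app"                 # the bad commit

echo "notes" > NOTES.md
git add NOTES.md
git commit -q -m "Add notes"               # later work that must survive

# Rebase the last two commits. Instead of opening an editor, sed rewrites
# the todo list so that its first entry (the bad commit) becomes "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -q -i HEAD~2

# The rebase is now paused on the bad commit: drop .env from it and go on.
git rm -q --cached .env
git commit -q --amend --no-edit
GIT_EDITOR=true git rebase --continue

git log --oneline -- .env    # prints nothing: no commit on the branch touches .env
git log --oneline            # the three commits are still there, with new IDs
```

Every commit from the edited one onward gets a new ID, which is why this is only painless while the commits are still local.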



▍Commit has been sent to the remote repository



If you have already pushed the commit to a remote repository, the first thing that matters is whether that repository is public or private.



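If your repository is private and not accessible to bots or people you don't trust, you can simply amend the last commit using the commands above and push it again. Because the amended commit replaces one that is already on the server, this has to be a force push.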
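For the private-repository case, the following sketch shows an amended commit replacing one that is already on the server. A local bare repository stands in for the private remote, and all paths and names are made up for the demo. The choice of --force-with-lease over a plain --force is mine, not something the article prescribes.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q --bare "$work/remote.git"      # stands in for the private remote
git init -q "$work/local"
cd "$work/local"
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"
branch=$(git symbolic-ref --short HEAD)

echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"

echo "API_KEY=not-a-real-key" > .env
echo "console.log('hello')" > app.js
git add .env app.js
git commit -q -m "Add app"
git push -q -u origin HEAD                 # the bad commit is now on the remote

# Fix the last commit locally, exactly as in the unpushed case.
git rm -q --cached .env
git commit -q --amend --no-edit

# The amended commit has a new ID, so a plain push would be rejected.
# --force-with-lease overwrites the remote branch only if nobody else
# has pushed to it in the meantime.
git push -q --force-with-lease origin HEAD

# The tip of the remote branch no longer lists .env.
git -C "$work/remote.git" ls-tree -r --name-only "$branch"
```

Anyone who fetched the old commit before the force push still has it, which is why this shortcut is only reasonable for a repository whose readers you trust.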



If you pushed other commits to the repository after the problematic one, you can still remove sensitive files from your Git history using the git filter-branch command or the BFG Repo-Cleaner tool.



Here is an example of using git filter-branch:



git filter-branch --force --index-filter "git rm --cached --ignore-unmatch .env" --prune-empty --tag-name-filter cat -- --all


But when doing this, keep in mind two important aspects of such changes made to the repository:



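  • You are rewriting Git history. If other people depend on the current state of the repository, for example through forks, branches, or open PRs, the rewrite will break their history. In such cases it is better to leave history untouched and add a new commit on top of it.
  • Copies may survive. Even after you remove the data from the repository, it can remain in forks, in clones, and in caches kept by the hosting service. Anyone who knows the ID of the old commit may still be able to reach it there.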
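After a rewrite with git filter-branch, the old commits are still kept locally as backups, so it is worth checking that the file is really gone. The sketch below runs the same filter in a throwaway repository, removes the backup references, and verifies the result. The repository contents are invented for the demo, and the FILTER_BRANCH_SQUELCH_WARNING variable is used only to silence the warning that recent Git versions print.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"
echo "API_KEY=not-a-real-key" > .env
git add .env
git commit -q -m "Add config"
echo "notes" > NOTES.md
git add NOTES.md
git commit -q -m "Add notes"

# The same rewrite as in the article, applied to every ref.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
  --index-filter "git rm --cached --ignore-unmatch .env" \
  --prune-empty --tag-name-filter cat -- --all

# filter-branch keeps the old commits under refs/original/ and in the reflog.
# Delete those references and prune, otherwise the file is still stored locally.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

# Verify: no commit reachable from any ref mentions .env.
git log --all --oneline -- .env
git log --oneline
```

If the first log command prints nothing, the local history is clean. The remote still has to be updated with a force push, and the two caveats above apply to everything that was already published.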


Do I need to create new secret keys if their current versions are in the public repository?



The short answer to the question in the heading is yes. If your repository is publicly available, or if for any other reason you consider it an unsafe place for sensitive data, you must treat the confidential data that ended up in it as compromised.



Even if you remove this data from the repository, you can do nothing about bots and forks of the repository. So how should you proceed?



  • Deactivate all keys and passwords. Do this first. Once the keys are deactivated, the confidential information that has gone public becomes useless.
  • Set up the .gitignore file. Add entries for the files with sensitive information to .gitignore so that Git does not track them.
  • Prepare a commit that does not contain the sensitive files.
  • Push the changes and explain the situation in the commit message. Don't try to hide the mistake. Everyone working on the project, including you, will appreciate a commit that explains what happened and what exactly it fixes.
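The Git side of these steps can be sketched as follows. Revoking the keys happens at the provider and is not shown. The repository, the key value, and the wording of the commit message are placeholders for the illustration.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

# Starting point: .env was committed by mistake.
echo "API_KEY=already-revoked-key" > .env
git add .env
git commit -q -m "Add config"

# Make Git ignore the file from now on.
echo ".env" >> .gitignore

# Stop tracking the file but keep it on disk.
git rm -q --cached .env
git add .gitignore

# Commit with a message that explains what happened.
git commit -q -m "Remove .env from version control" \
  -m "The file was committed by mistake. The keys it contained have been revoked and replaced. .env is now listed in .gitignore."

git status --short          # prints nothing: the working tree is clean
git check-ignore -v .env    # shows the .gitignore rule that matches .env
```

The old commit still contains the revoked key, which is acceptable only because the key no longer works.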


Best practices for keeping sensitive files in projects that use Git for version control



In order to prevent leaks of confidential information, you should adhere to the following recommendations.



▍Store sensitive data in a .env file (or other similar file)



Keep API keys and similar information in a single .env file. If Git does not track .env, adding a new key to this file cannot accidentally push it to the repository.



Another advantage of this approach is that all keys are available in one place, through the global process object (process.env in Node.js).
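How the values get from the file into a program's environment depends on the tooling. As one possible illustration, the sketch below loads a .env file into the current shell so that child processes inherit the values. It assumes the file holds only simple KEY=value lines without spaces or quotes, and the variable names and values are invented.

```shell
#!/bin/sh
set -eu

dir=$(mktemp -d)
cd "$dir"

# Hypothetical .env file with placeholder values.
cat > .env <<'EOF'
API_KEY=not-a-real-key
API_URL=https://api.example.com
EOF

# set -a marks every variable assigned from here on for export,
# so sourcing the file puts its entries into the environment.
set -a
. ./.env
set +a

# A child process now sees the values in its environment.
seen_by_child=$(sh -c 'printf %s "$API_KEY"')
echo "child process sees API_KEY=$seen_by_child"
```

A Node.js program started from this shell would find the same values under process.env.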



▍Use API keys if possible



Compromised API keys are easy to deactivate and easy to regenerate. Where possible, use them instead of something like logins and passwords.



▍Store API keys using your build tool



API keys are usually needed when building an application. Build tools like Netlify let you keep keys in secure storage. Such keys are automatically made available to the application through the global process object.





Environment Variable Management



▍Add .env file entry to .gitignore file



Prevent Git from tracking sensitive files.
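A sketch of what such entries might look like, together with a quick way to check them, is shown below. The extra .env.* pattern and the exception for the template file are assumptions added for the example. The rule the article calls for is the plain .env entry.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

cat > .gitignore <<'EOF'
# Local secrets: never commit these.
.env
.env.*
# The template contains no real values and should be committed.
!.env.template
EOF

touch .env .env.local .env.template

# check-ignore exits with 0 when a path is ignored and 1 when it is not.
for f in .env .env.local .env.template; do
  if git check-ignore -q "$f"; then
    echo "$f: ignored"
  else
    echo "$f: not ignored"
  fi
done
```

Note that .gitignore only affects files that are not tracked yet. A file that is already in the index has to be removed from it with git rm --cached first.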



▍Prepare a template .env.template file



A template file shows everyone working on the project which API keys they need to add, without having to read the documentation.
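One way to produce such a template is to derive it from the real file by blanking out the values. The sketch below does that with sed. It assumes simple KEY=value lines and comments that contain no "=" character, and the sample keys are invented.

```shell
#!/bin/sh
set -eu

dir=$(mktemp -d)
cd "$dir"

# Hypothetical local .env with placeholder values.
cat > .env <<'EOF'
# Payment provider
API_KEY=not-a-real-key
API_URL=https://api.example.com
EOF

# Keep comments and variable names, drop everything after the first "=".
sed 's/=.*$/=/' .env > .env.template

cat .env.template
```

The resulting .env.template can be committed safely. A new team member then copies it to .env and fills in their own values.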



▍Do not change Git history in remote repositories



Try to stick to this rule strictly. If you've followed the guidelines above, then you won't need to change your Git history.



Summary



I hope this article helps you work safely with confidential data.



Have you ever pushed something to a public repository that should not have been there?









