Repository clean-up #1281
Yes, we are aware of this issue. The size has grown over the years, and from time to time we have removed unwanted/unused files. Here is the output of git count-objects. This number is substantially smaller than 6G. I also looked at the vendor folders in platform-operator and helm-pod. They are about 40 MB each, so they are not contributing much to the size either. Maybe the branches are counting towards the size. I am running 'git gc' now; let's see if that helps.
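Plain `git count-objects` only reports loose objects, which may explain why its number is far below 6G while the pack holds most of the history. With `-v` it also prints `in-pack`, `packs` and `size-pack` (sizes in KiB unless `-H` is given). A minimal Python sketch, assuming git is on PATH; the function names are hypothetical:

```python
import subprocess


def parse_count_objects(output: str) -> dict[str, int]:
    """Parse the 'key: value' lines printed by `git count-objects -v`."""
    stats: dict[str, int] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value.strip())
    return stats


def total_kib(stats: dict[str, int]) -> int:
    """Loose objects + packs + garbage; without -H every size is in KiB."""
    return stats.get("size", 0) + stats.get("size-pack", 0) + stats.get("size-garbage", 0)


def count_objects(repo: str = ".") -> dict[str, int]:
    """Run `git count-objects -v` in `repo` (assumes git is on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo, "count-objects", "-v"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_count_objects(result.stdout)
```

For example, `count_objects(".")["size-pack"] / 1024**2` gives the pack size in GiB, which is the figure to compare with the 6G estimate rather than the loose `size` value.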
I think it is the branches. In .git/objects/pack, there is a .pack file which is 6.6G in size. It contains the entire history of the repository over the years. We have 130+ branches. The only active branches right now are "develop" and "master". My workflow is to work on develop and then open a PR to master. So, Option 2: Find all the files that are no longer in the master branch and then purge them from other branches.
Option 3: Punt this issue for later, with the acknowledgment that we will have to fix it eventually. In the documentation, explicitly mention shallow cloning the repository. With a shallow clone, the size of the repository is 518M. Any other options? Option 1 is the simplest. Ideally, it would be great to know how much of a delta deleting a particular branch will achieve. Maybe I can try deleting a branch, then re-clone and see how much that reduces the size of the cloned repo. Option 3 does not rock the boat right now. Thoughts?
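One way to estimate the delta of deleting a branch without re-cloning is to ask git which objects are reachable from that branch and from no other ref, then sum their on-disk sizes. A sketch in Python, assuming it runs against a `git clone --mirror` copy (so every branch is under `refs/heads/`) and a git new enough for `--exclude` and `%(objectsize:disk)`; the helper names and the branch name are hypothetical. Because `--all` also covers HEAD, tags and any other refs in the mirror (for example GitHub's `refs/pull/*`), and because delta chains can cross branches, treat the result as an estimate rather than the exact saving.

```python
import subprocess
from typing import Optional


def git(repo: str, *args: str, stdin: Optional[str] = None) -> str:
    """Run a git command in `repo` and return its stdout (assumes git is on PATH)."""
    return subprocess.run(
        ["git", "-C", repo, *args],
        input=stdin,
        check=True,
        capture_output=True,
        text=True,
    ).stdout


def object_ids(rev_list_output: str) -> list[str]:
    """`rev-list --objects` prints '<oid>' or '<oid> <path>'; keep only the oid."""
    return [line.split(maxsplit=1)[0] for line in rev_list_output.splitlines() if line.strip()]


def branch_only_disk_bytes(repo: str, branch: str) -> int:
    """Approximate on-disk bytes reachable from `branch` but from no other ref."""
    ref = f"refs/heads/{branch}"
    # Objects reachable from the branch, minus everything reachable from all
    # other refs (--exclude drops this branch from the following --all).
    listing = git(repo, "rev-list", "--objects", ref, "--not", f"--exclude={ref}", "--all")
    oids = object_ids(listing)
    if not oids:
        return 0
    # %(objectsize:disk) is the packed (possibly delta-compressed) size.
    sizes = git(repo, "cat-file", "--batch-check=%(objectsize:disk)", stdin="\n".join(oids) + "\n")
    return sum(int(size) for size in sizes.split())
```

Usage: `branch_only_disk_bytes("repo.git", "some-old-branch")`, where `some-old-branch` is a placeholder. For the default branch the answer is 0, since HEAD still points at it.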
I believe we should go with the … This will also make our … Along with this, as I suggested on Slack, we may move independent modules such as operator-analysis out of this repo so that they can have their own independent release and development cycles.
@chiukapoor I have cleaned up the branches. There are now only 12 branches remaining (including master and develop). The 10 remaining branches (other than master and develop) cannot be deleted without further careful analysis; we can do that later. The size of the repository with the above curl command still shows 6.6G. The .pack file in the .git folder is the cause of the size. Can you look into ways to reduce the size of this file?
RCA
Findings: Upon researching Git packing, I found that the .pack file contains both the objects and the history of a Git repository. To identify large files in the Git history, I used the following script found on Stack Overflow:
This script detects blob objects (which represent file contents) larger than 20 MiB across the entire Git history, excluding those currently in the HEAD. It then sorts and presents these files alongside their sizes, concluding with the total size of all the identified large files in the repository's history. Here are the identified large files:
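The Stack Overflow script itself is not reproduced here. The following is a Python re-implementation of the same idea, not the exact script used: list every object in history with `git rev-list --objects --all`, get type and uncompressed size from `git cat-file --batch-check`, drop blobs that `git ls-tree -r HEAD` shows are still current, keep those over 20 MiB, sort by size and total them. It assumes git is on PATH; the helper names are hypothetical.

```python
import subprocess
from typing import NamedTuple, Optional

THRESHOLD = 20 * 1024 * 1024  # 20 MiB cut-off, as described above
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


class Blob(NamedTuple):
    oid: str
    size: int  # uncompressed size in bytes
    path: str


def git(repo: str, *args: str, stdin: Optional[str] = None) -> str:
    """Run a git command in `repo` and return stdout (assumes git is on PATH)."""
    return subprocess.run(
        ["git", "-C", repo, *args], input=stdin, check=True, capture_output=True, text=True
    ).stdout


def parse_batch_check(text: str) -> list[Blob]:
    """Keep only blob lines from `cat-file --batch-check=BATCH_FORMAT` output."""
    blobs = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # type, oid, size, path (path may contain spaces)
        if len(parts) >= 3 and parts[0] == "blob":
            path = parts[3] if len(parts) == 4 else ""
            blobs.append(Blob(parts[1], int(parts[2]), path))
    return blobs


def parse_ls_tree(text: str) -> set[str]:
    """Collect blob ids from `git ls-tree -r` lines: '<mode> <type> <oid>' TAB '<path>'."""
    oids = set()
    for line in text.splitlines():
        meta, _, _path = line.partition("\t")
        fields = meta.split()
        if len(fields) == 3 and fields[1] == "blob":
            oids.add(fields[2])
    return oids


def large_history_blobs(blobs: list[Blob], head_oids: set[str], threshold: int = THRESHOLD) -> list[Blob]:
    """Blobs above `threshold` that are not in HEAD, de-duplicated, largest last."""
    found = {b.oid: b for b in blobs if b.size > threshold and b.oid not in head_oids}
    return sorted(found.values(), key=lambda b: b.size)


def scan(repo: str = ".") -> tuple[list[Blob], int]:
    """Return the large historical blobs and their combined size in bytes."""
    objects = git(repo, "rev-list", "--objects", "--all")
    checked = git(repo, "cat-file", f"--batch-check={BATCH_FORMAT}", stdin=objects)
    head_oids = parse_ls_tree(git(repo, "ls-tree", "-r", "HEAD"))
    big = large_history_blobs(parse_batch_check(checked), head_oids)
    return big, sum(b.size for b in big)
```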
Solution
Clean-up: To address this issue, outdated and unwanted objects such as old binaries will be removed from the history using BFG, as suggested on Stack Overflow. (PS: I have tested this locally, and the .pack file size is down to less than 300 MiB.)
Fix: To prevent this issue in the future, we should avoid committing binary files to the git repository. Instead, GitHub's releases and tags feature and automated CI can be used: https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository
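The BFG clean-up can be sketched following BFG's documented workflow: run it against a fresh `git clone --mirror`, then expire reflogs and run `git gc` so the stripped objects are actually pruned. The 20M limit, the protected refs (master, develop), the jar path and the function names are assumptions for illustration; check `java -jar bfg.jar --help` for the exact option syntax of the BFG version in use. Pushing the rewritten refs back is deliberately left out as a separate step, and existing clones would need to be re-cloned afterwards.

```python
import subprocess


def bfg_cleanup_commands(
    mirror_dir: str,
    bfg_jar: str = "bfg.jar",
    max_blob_size: str = "20M",
    protected_refs: tuple[str, ...] = ("master", "develop"),
) -> list[list[str]]:
    """Build the BFG run plus the reflog-expire/gc steps from the BFG docs."""
    return [
        # Rewrite history, dropping blobs above the size limit. Blobs in the
        # latest commit of each protected ref are left alone.
        [
            "java", "-jar", bfg_jar,
            "--strip-blobs-bigger-than", max_blob_size,
            "--protect-blobs-from", ",".join(protected_refs),
            mirror_dir,
        ],
        # BFG rewrites the refs; the old objects stay until expired and pruned.
        ["git", "-C", mirror_dir, "reflog", "expire", "--expire=now", "--all"],
        ["git", "-C", mirror_dir, "gc", "--prune=now", "--aggressive"],
    ]


def run_cleanup(mirror_dir: str) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in bfg_cleanup_commands(mirror_dir):
        subprocess.run(cmd, check=True)
```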
@chiukapoor Great work! |
Looking at the files, we do want to keep the following files in the repo:
We can remove the rest of the files. Also, it looks like there is another approach to removing files from history. I will experiment with both (BFG Repo-Cleaner and git filter-repo) in the coming days.
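Unlike BFG, which protects the files in HEAD by default, path-based removal with git filter-repo applies to HEAD as well, so the keep list matters. A sketch in Python, assuming git filter-repo is installed separately and run in a fresh clone (it refuses to rewrite a repository that does not look like a fresh clone unless `--force` is given). It takes the paths reported by the large-file scan, drops the ones on the keep list, writes the rest one per line, and runs `git filter-repo --invert-paths --paths-from-file`. The function and file names are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def paths_to_remove(large_paths: list[str], keep: list[str]) -> list[str]:
    """Every reported path except the ones we agreed to keep, sorted and de-duplicated."""
    keep_set = set(keep)
    return sorted({p for p in large_paths if p and p not in keep_set})


def paths_file_text(paths: list[str]) -> str:
    """One literal path per line, the format --paths-from-file reads.

    Lines containing '==>' or starting with 'glob:'/'regex:' have special
    meaning to filter-repo; such paths are not escaped here.
    """
    return "".join(f"{p}\n" for p in paths)


def filter_repo_command(paths_file: str) -> list[str]:
    # --invert-paths turns the list into "remove these paths, keep everything else".
    return ["git", "filter-repo", "--invert-paths", "--paths-from-file", paths_file]


def run_filter_repo(repo: str, large_paths: list[str], keep: list[str]) -> None:
    to_remove = paths_to_remove(large_paths, keep)
    if not to_remove:
        return  # nothing to strip
    with tempfile.TemporaryDirectory() as tmp:
        paths_file = Path(tmp) / "paths-to-remove.txt"
        paths_file.write_text(paths_file_text(to_remove), encoding="utf-8")
        subprocess.run(filter_repo_command(str(paths_file)), cwd=repo, check=True)
```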