Cleaning a local Git repository

What does "cleaning" mean?

Git keeps a project's history locally. Every commit, and every version of every file that has been committed, is recorded for future reference. This history builds up over time, though, and can make for some large repositories.

Although it's unlikely that a repository will become so large that it needs cleaning, it's still worth knowing how to do it.

Checking the size of a Git repository

While writing my guide on removing things from a Git repository, I discovered a command that displays how much disk space a repository's objects take up:

$ git count-objects -vH

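The git count-objects command returns output similar to the following, where -v displays verbose information and -H displays human-readable sizes. The count and size lines describe loose objects, while in-pack, packs and size-pack describe objects that are stored in compressed pack files: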

count: 2009
size: 7.85 MiB
in-pack: 758
packs: 2
size-pack: 216.82 KiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
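Without -H, the size, size-pack and size-garbage values are reported in KiB, which makes them easy to add up in a script. The sketch below assumes a POSIX shell with awk available; repo_kib is a hypothetical helper name, and the demo runs in a throwaway repository created with mktemp, using a made-up commit identity.

```shell
#!/bin/sh
set -e

# Hypothetical helper: total KiB used by loose objects, packs and garbage,
# summed from the plain (no -H) output of `git count-objects -v`.
repo_kib() {
    git count-objects -v | awk '
        $1 == "size:" || $1 == "size-pack:" || $1 == "size-garbage:" { total += $2 }
        END { print total + 0 }'
}

# Throwaway repository and identity, for demonstration only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
echo "hello" > README
git add README
git commit -q -m "initial commit"

kib=$(repo_kib)
echo "object storage: ${kib} KiB"
```

Recording this figure before and after cleaning gives a rough measure of how much space was actually saved.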

How to clean a repository

As Roberto Tyley writes in the documentation for BFG Repo-Cleaner, a repository can be cleaned in a "one-liner":

$ git reflog expire --expire=now --all && git gc --prune=now --aggressive

This is really two commands chained together with &&, so the second only runs if the first succeeds. Together they:

  1. expire Git's reference logs; and
  2. run a clean-up that deletes unreachable objects and compresses the rest to reduce the repository's size.
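Wrapping the one-liner in a shell function makes it easy to point at any repository. The sketch below is hypothetical (clean_repo is not a Git command): it uses git -C to run both steps against a given path, refuses to run outside a Git repository, and relies on && so that the clean-up only runs if expiring the reference logs succeeded. The demo uses throwaway directories from mktemp and a made-up commit identity.

```shell
#!/bin/sh
set -e

# Hypothetical wrapper around the cleaning one-liner.
# Usage: clean_repo [path]   (defaults to the current directory)
clean_repo() {
    dir=${1:-.}
    if ! git -C "$dir" rev-parse --git-dir > /dev/null 2>&1; then
        echo "clean_repo: '$dir' is not a Git repository" >&2
        return 1
    fi
    # The gc step only runs if the reflog expiry succeeded.
    git -C "$dir" reflog expire --expire=now --all &&
        git -C "$dir" gc --prune=now --aggressive --quiet
}

# Throwaway repository, non-repository directory and identity for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
plain_dir=$(mktemp -d)
trap 'rm -rf "$repo" "$plain_dir"' EXIT
git -C "$repo" init -q
echo "hello" > "$repo/README"
git -C "$repo" add README
git -C "$repo" commit -q -m "initial commit"

clean_repo "$repo"
git -C "$repo" count-objects -vH
```

Pointed at a directory that is not inside a repository, the function prints an error and returns a non-zero status without running either step.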

Expiring Git reference logs

$ git reflog expire --expire=now --all

Git's reference logs (reflogs) record when the tips of branches and other references, such as HEAD, have been updated, e.g. after a commit, merge or similar action. Because they remember where each reference used to point, they are useful for rolling back undesired changes.

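The logs themselves take up little space, but any commit they mention counts as reachable, so garbage collection won't delete it, or the files it contains, while a log entry still points at it. Expiring the logs is what allows discarded commits to be removed for good.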
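The effect is easy to see in a throwaway repository. The sketch below assumes a POSIX shell and uses made-up file, branch and identity names: it commits a file on a temporary branch, deletes the branch, and then runs git gc --prune=now without expiring the reference logs first. Because HEAD's reference log still mentions the discarded commit, the file's contents should survive.

```shell
#!/bin/sh
set -e

# Throwaway repository and identity, for demonstration only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
echo "hello" > README
git add README
git commit -q -m "initial commit"
base=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config

# Commit a file on a temporary branch, then throw the branch away.
git checkout -q -b temp
echo "pretend this is a very large file" > big.txt
git add big.txt
git commit -q -m "add big file"
blob=$(git rev-parse HEAD:big.txt)      # object ID of big.txt's contents
git checkout -q "$base"
git branch -q -D temp

# No branch points at the commit any more, but HEAD's reflog still does,
# so even an immediate prune keeps the blob.
git gc --quiet --prune=now
if git cat-file -e "$blob" 2>/dev/null; then
    echo "blob $blob is still in the repository"
else
    echo "blob $blob has been removed"
fi
```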

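The amount of history removed by git reflog expire is controlled with the --expire=<time> flag,[1] which removes entries older than <time>. To remove all reference logs regardless of their age, --expire=all can be used. If the flag is omitted, the cut-off is taken from the gc.reflogExpire setting, which defaults to 90 days.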

The --all flag processes the reference logs of every reference, rather than only those named on the command line.
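The difference between a time-based cut-off and expiring everything can be checked by counting reference log entries. This is a sketch in a throwaway repository with a made-up identity and file name; count_entries is a hypothetical helper, and git reflog with no arguments lists the entries for HEAD only, one per line.

```shell
#!/bin/sh
set -e

# Throwaway repository and identity, for demonstration only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
echo "one" > notes.txt
git add notes.txt
git commit -q -m "first"
echo "two" >> notes.txt
git commit -q -a -m "second"

# Hypothetical helper: number of entries in HEAD's reference log.
count_entries() {
    git reflog | wc -l | tr -d ' '
}

before=$(count_entries)
# A 90-day cut-off keeps entries that were written a moment ago...
git reflog expire --expire=90.days.ago --all
kept=$(count_entries)
# ...whereas --expire=all removes entries regardless of their age.
git reflog expire --expire=all --all
left=$(count_entries)
echo "entries: $before before, $kept after 90.days.ago, $left after all"
```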

Running a clean up and pruning loose objects

$ git gc --prune=now --aggressive

On its own, git gc is good for general optimisation, but by default it only prunes unreachable loose objects that are more than two weeks old (the gc.pruneExpire setting), so a thorough clean needs that grace period removed.

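The --prune=<date> flag instructs the clean-up to prune unreachable loose objects older than the given date. To prune all of them at the time the command runs, --prune=now is used. Git's documentation warns that this increases the risk of corruption if another process is writing to the repository at the same time.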
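The grace period can be seen by writing an object that nothing refers to and running git gc both ways. In this sketch (throwaway repository, made-up identity and contents, hypothetical status helper), git hash-object -w stores a blob directly in the object database without adding it to the index or any commit, so it is unreachable from the moment it is created. The result assumes gc.pruneExpire has been left at its default.

```shell
#!/bin/sh
set -e

# Throwaway repository and identity, for demonstration only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
echo "hello" > README
git add README
git commit -q -m "initial commit"

# Store a blob that no ref, reflog entry or index entry points at.
blob=$(echo "orphaned data" | git hash-object -w --stdin)

# Hypothetical helper: prints "kept" or "pruned" for the orphaned blob.
status() {
    if git cat-file -e "$blob" 2>/dev/null; then echo kept; else echo pruned; fi
}

git gc --quiet                 # default two-week grace period applies
after_plain=$(status)
git gc --quiet --prune=now     # no grace period
after_now=$(status)
echo "plain git gc: $after_plain; git gc --prune=now: $after_now"
```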

The --aggressive flag makes the clean-up optimise the repository more thoroughly, at the expense of taking considerably more time.
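Putting both steps together, the one-liner removes data that only the reference logs were keeping alive. The sketch below sets up a minimal scenario in a throwaway repository (made-up branch, file and identity names): a file is committed on a temporary branch, the branch is deleted, and then the one-liner is run.

```shell
#!/bin/sh
set -e

# Throwaway repository and identity, for demonstration only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
echo "hello" > README
git add README
git commit -q -m "initial commit"
base=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config

# Commit a file on a temporary branch, then delete the branch.
git checkout -q -b temp
echo "pretend this is a very large file" > big.txt
git add big.txt
git commit -q -m "add big file"
blob=$(git rev-parse HEAD:big.txt)      # object ID of big.txt's contents
git checkout -q "$base"
git branch -q -D temp

# The cleaning one-liner: expire every reflog entry, then prune immediately.
git reflog expire --expire=now --all && git gc --prune=now --aggressive --quiet

if git cat-file -e "$blob" 2>/dev/null; then
    echo "blob $blob is still in the repository"
else
    echo "blob $blob has been removed"
fi
git count-objects -vH
```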


Works cited

Tyley, Roberto. "BFG Repo-Cleaner." https://github.com/rtyley/bfg-repo-cleaner/. Accessed 15 August 2018.

Footnotes


  1. The closest I can get to a comprehensive list of accepted date formats in Git is via a test file for date formats in the Git source code.