13 May 2009

Cloning multiple git repos

Many linux-kernel maintainers keep their own git repository, and I usually have clones of several of them.

For example, I have clones of
Linus Torvalds's repo:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Block Maintainer, Jens Axboe's repo:
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
...

If I clone each of these repositories individually, git downloads and stores duplicate copies of the same objects, wasting both disk space and network bandwidth.
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git clone git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git

So I was looking for a way to share the common objects, so that duplicates won't waste disk space and bandwidth. No surprise, git has a way to do that. I just wasn't aware of a simple option, "--reference".
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git clone --reference linux-2.6/ git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git






Here is the difference between cloning Jens's repo with and without --reference to Linus's repo.

# git clone git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
Initialized empty Git repository in /home/knikanth/labs-sw/linus/linux-2.6-block/.git/
remote: Counting objects: 1180249, done.
remote: Compressing objects: 100% (295444/295444), done.
remote: Total 1180249 (delta 984716), reused 1073684 (delta 878311)
Receiving objects: 100% (1180249/1180249), 289.32 MiB | 496 KiB/s, done.
Resolving deltas: 100% (984716/984716), done.
Checking out files: 100% (27842/27842), done.
# du -sh linux-2.6-block/
714M linux-2.6-block/


# git clone --reference linux-2.6/ git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
Initialized empty Git repository in /home/knikanth/labs-sw/linus/linux-2.6-block/.git/
remote: Counting objects: 111061, done.
remote: Compressing objects: 100% (19021/19021), done.
remote: Total 100463 (delta 84138), reused 95679 (delta 79959)
Receiving objects: 100% (100463/100463), 23.21 MiB | 1209 KiB/s, done.
Resolving deltas: 100% (84138/84138), completed with 8189 local objects.
Checking out files: 100% (27842/27842), done.
# du -sh linux-2.6-block/
468M linux-2.6-block/
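
To put the two runs side by side, here is a small sketch that works out the relative savings from the numbers in the transcripts above (289.32 MiB vs 23.21 MiB received, 714M vs 468M on disk). The variable names are mine, and awk is used only because plain shell arithmetic has no decimals.

```shell
# Figures copied from the two clone transcripts above.
plain_mib=289.32   # received without --reference
ref_mib=23.21      # received with --reference
plain_du=714       # du -sh without --reference (MB)
ref_du=468         # du -sh with --reference (MB)

# Percentage saved = (1 - with/without) * 100
net_saved=$(awk -v a="$ref_mib" -v b="$plain_mib" 'BEGIN { printf "%.1f", (1 - a / b) * 100 }')
disk_saved=$(awk -v a="$ref_du" -v b="$plain_du" 'BEGIN { printf "%.1f", (1 - a / b) * 100 }')

echo "network saved: ${net_saved}%"
echo "disk saved:    ${disk_saved}%"
```

The disk saving is smaller than the network saving because du also counts the checked-out working tree, which every clone has to have regardless.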




--reference automatically sets up .git/objects/info/alternates in the new clone, so that it obtains objects from the reference repository. Now I wonder whether it is possible to have circular references, multiple references, etc. The plural file name, "alternates", suggests it should be possible, but the "git clone" I tried ignores multiple --reference options on the command line!
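
To see the alternates file without downloading a kernel tree, here is a minimal sketch using throwaway local repositories. The names upstream, base and borrower, and the commit identity, are made up for illustration; only "git clone --reference" and the alternates path come from the text above.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A tiny stand-in for the remote repository (hypothetical name).
git init -q upstream
cd upstream
echo "hello" > file.txt
git add file.txt
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "initial commit"
cd ..

# An ordinary clone, which will serve as the reference repository.
git clone -q upstream base

# A second clone that borrows objects from ./base.
git clone -q --reference base upstream borrower

# The file lists the object directories this clone borrows from.
alternates_file=borrower/.git/objects/info/alternates
alternates=$(cat "$alternates_file")
echo "alternates: $alternates"
```

Running "git count-objects -v" inside borrower should also list the borrowed object directory on an "alternate:" line. Keep in mind that the borrowing clone depends on the reference repository staying around: if base is deleted, borrower loses the objects it never copied.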

BTW git uses SHA-1 digests to identify objects. I wonder what the chance of a SHA-1 collision is, and how git handles it? A SHA-1 digest has 40 hex digits == 160 bits. So at most, only 2^160 distinct object ids are possible. :-)
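
The digit count is easy to check. A quick sketch, assuming a git that uses the default SHA-1 object format: "git hash-object --stdin" prints the id git would give a blob, and it works outside a repository because nothing is written.

```shell
# Object id git would assign to a blob containing "hello\n".
id=$(printf 'hello\n' | git hash-object --stdin)

# Each hex digit encodes 4 bits.
hex_digits=${#id}
bits=$((hex_digits * 4))

echo "$id"
echo "$hex_digits hex digits = $bits bits"
```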

1 comment:

Suresh said...

Nice tip! Keep 'em coming..