13 May 2009

Cloning multiple git repos

Many linux-kernel maintainers keep their own git repository, and I usually have clones of several of them.

For example I have clones of
Linus Torvalds's repo:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Block maintainer Jens Axboe's repo:
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
...

If I clone each of these repositories individually, git downloads and stores duplicate copies of the same objects, wasting both disk space and network bandwidth.
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git clone git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git

So I was looking for a way to share the common objects, so that duplicates won't waste disk space and bandwidth. No surprise, git has a way to do that. I was just unaware of a simple option, "--reference".
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git clone --reference linux-2.6/ git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git






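Difference between cloning Jens's repo with and without --reference to Linus's repo: the download drops from 289.32 MiB to 23.21 MiB, and the clone's size on disk from 714M to 468M.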

# git clone git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
Initialized empty Git repository in /home/knikanth/labs-sw/linus/linux-2.6-block/.git/
remote: Counting objects: 1180249, done.
remote: Compressing objects: 100% (295444/295444), done.
remote: Total 1180249 (delta 984716), reused 1073684 (delta 878311)
Receiving objects: 100% (1180249/1180249), 289.32 MiB | 496 KiB/s, done.
Resolving deltas: 100% (984716/984716), done.
Checking out files: 100% (27842/27842), done.
# du -sh linux-2.6-block/
714M linux-2.6-block/


# git clone --reference linux-2.6/ git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
Initialized empty Git repository in /home/knikanth/labs-sw/linus/linux-2.6-block/.git/
remote: Counting objects: 111061, done.
remote: Compressing objects: 100% (19021/19021), done.
remote: Total 100463 (delta 84138), reused 95679 (delta 79959)
Receiving objects: 100% (100463/100463), 23.21 MiB | 1209 KiB/s, done.
Resolving deltas: 100% (84138/84138), completed with 8189 local objects.
Checking out files: 100% (27842/27842), done.
# du -sh linux-2.6-block/
468M linux-2.6-block/




--reference automatically sets up .git/objects/info/alternates so that git can obtain objects from the reference repository. Now I wonder whether it is possible to have circular references, multiple references, etc. The plural file name, "alternates", suggests it should be possible, but "git clone" does not honour more than one --reference on the command line!
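On the multiple-references question, here is a small sketch using throwaway local repositories instead of the kernel trees. The names ref1, ref2 and borrower are made up for illustration. It relies on the alternates file being a plain list of object directories, one per line, so a second entry can be appended by hand after the clone. Treat it as an experiment under those assumptions rather than a recommended workflow.

```shell
#!/bin/sh
# Sketch: one clone borrowing objects from two reference repositories.
# ref1, ref2 and borrower are hypothetical names used only for this demo.
set -e
work=$(mktemp -d)
cd "$work"

# Two tiny repositories, each with a single (empty) commit.
git init -q ref1
git -C ref1 -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git init -q ref2
git -C ref2 -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second"

# Clone with one --reference; git writes the alternates file itself.
git clone -q --reference ref1 ref1 borrower

# Append a second object directory by hand (absolute path, one per line).
echo "$work/ref2/.git/objects" >> borrower/.git/objects/info/alternates
cat borrower/.git/objects/info/alternates

# The commit that exists only in ref2 should now be readable from borrower.
ref2_commit=$(git -C ref2 rev-parse HEAD)
seen_type=$(git -C borrower cat-file -t "$ref2_commit")
echo "object $ref2_commit seen as: $seen_type"
```

The borrowing repository does not copy these objects, so it stops working if a reference repository is deleted or has the borrowed objects pruned.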

BTW git uses SHA-1 digests to identify objects. I wonder what the chance of a SHA-1 collision is, and how git handles it? A SHA-1 digest has 40 hex digits == 160 bits, so at most 2^160 distinct object ids are possible. :-)
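A rough answer to the "what is the chance" question, as a back-of-the-envelope birthday estimate. It assumes SHA-1 output behaves like a uniformly random 160-bit value and that nobody is deliberately constructing collisions; both are assumptions, not properties taken from git itself. For n I use the 1180249 objects counted in the full clone above.

```latex
% Birthday bound: probability that any two of n random 160-bit ids coincide
P(\text{collision}) \approx \frac{n(n-1)}{2} \cdot \frac{1}{2^{160}}
                    \approx \frac{n^{2}}{2^{161}}

% With n = 1180249 objects:
n^{2} \approx 1.39 \times 10^{12}, \qquad 2^{161} \approx 2.92 \times 10^{48}

P(\text{collision}) \approx \frac{1.39 \times 10^{12}}{2.92 \times 10^{48}}
                    \approx 4.8 \times 10^{-37}
```

Under those assumptions an accidental collision in a repository of this size is vanishingly unlikely. The estimate says nothing about what git does if one occurs.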

10 May 2009

Mother's day story

The story that disturbs me the most.

A monkey and its baby were caught in a flood. The mother held its baby above the water level. Slowly the water level rose. Knee level, hip, chest, throat... the monkey dropped its baby in the water and stood on top of it.

That's why they are called monkeys.