Key: QB-1829
Type: Improvement
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Unassigned
Reporter: Don Ross
Votes: 0
Watchers: 0

QuickBuild

Enable use of reference copies when cloning a git repository

Created: 01/Nov/13 05:00 PM   Updated: 04/Dec/13 08:40 PM
Component/s: None
Affects Version/s: 5.0.39
Fix Version/s: 5.1.0-rc4, 5.1.0

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
Environment: CentOS server, Windows 7 agent, msysgit version 1.8.4


Description
Cloning a large git repository can add significantly to build time, not to mention disk space. To reduce these costs (as well as network bandwidth), it is possible from the git command line to create a bare repository:
> git clone --bare ssh://user@host/git/myrepo.git C:\GitRepos\myrepo.git
and then use that bare repository as a reference copy when creating new workspaces on the system:
> git clone --reference C:\GitRepos\myrepo.git ssh://user@host/git/myrepo.git C:\workspaces\first_workspace
> git clone --reference C:\GitRepos\myrepo.git ssh://user@host/git/myrepo.git C:\workspaces\second_workspace

It would be very helpful for the git plug-in to QuickBuild to support this model.

All that would be needed would be a field in the Git repository definition 'reference copy', which would be the absolute path to the reference copy (with variable expansion of course).
This would then be passed on the command line via the '--reference' option. If the reference copy does not exist, then this option is already ignored by the git command.
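
A minimal sketch of how such a field could be turned into a command line. This is plain Python for illustration, not QuickBuild plug-in code; the function name `build_clone_command` and its parameters are hypothetical.

```python
from typing import List, Optional


def build_clone_command(url: str, workspace: str,
                        reference: Optional[str] = None) -> List[str]:
    """Build the argv for `git clone`, adding --reference only when set."""
    cmd = ["git", "clone"]
    if reference:
        # The 'reference copy' field from the repository definition.
        cmd += ["--reference", reference]
    cmd += [url, workspace]
    return cmd
```

Leaving the field empty yields an ordinary `git clone`, so existing configurations would be unaffected.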

Of course, a truly robust implementation would have the plug-in detect whether the reference copy exists and create it (via git clone --bare) if it does not. But that is perhaps over-reaching.
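
The detect-and-create behaviour could be sketched roughly as follows. How git reacts to a missing reference path may depend on the git version, so this sketch checks for the directory explicitly instead of relying on that. The function name `clone_with_reference` and the injectable `run` parameter are assumptions made for illustration.

```python
import os
import subprocess
from typing import Callable, List


def clone_with_reference(url: str, reference: str, workspace: str,
                         run: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Clone `url` into `workspace`, creating the bare reference copy first if needed."""
    if not os.path.isdir(reference):
        # No reference copy yet: create it once as a bare clone.
        run(["git", "clone", "--bare", url, reference])
    # Every workspace clone then borrows objects from the reference copy.
    run(["git", "clone", "--reference", reference, url, workspace])
```

A real implementation would also need to guard against two builds trying to create the reference copy at the same time, which is not handled here.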

Comments
Robin Shen [02/Nov/13 01:40 AM]
Thanks for sharing the idea. My understanding is that the reference repository serves as a cache, and that it should also be synced with the remote repository on a regular basis to avoid too many cache misses. Also, this approach is mainly for reducing clone time and cannot help reduce disk space usage. Am I correct on this?

Don Ross [02/Nov/13 03:59 AM]
Oh, no, it is a huge space saver. For example, on my desktop I have a month-old 3.4 GB reference copy and four workspaces that use it (each for a different branch in the same repository). The .git folder in each of those workspaces is between 50 and 75 MB.

It is not just a cache; any database object that is needed in the workspace .git folder is linked against the reference copy; it is *not* copied into each workspace that needs it. Only objects added since the reference copy was created need to exist in the workspace's .git folder.
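
To see this link in a workspace: a clone made with --reference records the path of the reference copy's objects directory in .git/objects/info/alternates. The sketch below reads that file; the function names are hypothetical.

```python
import os
from typing import List


def parse_alternates(text: str) -> List[str]:
    """Return the object-directory paths listed in an alternates file."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def read_alternates(workspace: str) -> List[str]:
    """Return the alternates of a workspace, or [] if it borrows from none."""
    path = os.path.join(workspace, ".git", "objects", "info", "alternates")
    if not os.path.isfile(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return parse_alternates(fh.read())
```

Because the workspace only points at the reference copy, deleting or moving the reference copy breaks every workspace that lists it.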

Over time, as changes are made to the origin repository, it diverges further from the reference copy, and clones become larger. I find I have to rebuild my reference copy about once every quarter to keep my clones under 100 MB - and really, even 100 MB is tiny compared to the size of the reference copy.
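
An alternative to rebuilding from scratch might be to fetch new branch history into the existing bare copy. The sketch below only builds the command line; it assumes the reference copy was created with git clone --bare, so that a remote named origin is configured. The function name `refresh_command` is hypothetical.

```python
from typing import List


def refresh_command(reference: str) -> List[str]:
    """Build a `git fetch` argv that updates the branches of a bare reference copy."""
    return [
        "git", "--git-dir", reference,
        "fetch", "origin",
        # Explicit refspec: update local branches from the remote's branches.
        "+refs/heads/*:refs/heads/*",
    ]
```

This only helps new clones. Workspaces that already exist keep whatever objects they have stored locally.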

Of course, this does not help with the size of the workspace itself; my library of checked-out files in each workspace still amounts to about 6 GB, plus the size of derived objects from the build. But at least I am not pulling down 3.4 GB over the network every time I make a new clone.

This would be particularly useful for QuickBuild, since it looks like every configuration needs to have its own workspace, even multiple configurations that build the same branch (for example, debug, release, continuous...).

Jenkins has this capability and we are using it heavily in our current deployment. I can see the lack of this feature being an impediment to my convincing the team to move from Jenkins to QuickBuild.

Robin Shen [03/Nov/13 01:00 AM]
I understand it now, will take care of this soon.

Robin Shen [20/Nov/13 02:10 AM]
A reference repository can now be defined in the advanced settings of the Git repository.

Don Ross [04/Dec/13 08:40 PM]
Thanks, Robin. However, it looks like the Reference repository doesn't take a script/variable definition?
This makes it hard when some boxes are on Windows and some on Linux.

Or if I have my repository configured with a path like:
> ssh://hostname/git/${vars.getValue("gitRepoName")}
in which case I want my reference to be
> E:\reference\${vars.getValue("gitRepoName")}

But thank you for the start!
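
The per-platform reference path asked for in the last comment could look roughly like this. This is plain Python for illustration, not QuickBuild's scripting syntax; `REFERENCE_ROOTS`, `reference_path`, and the Linux root `/srv/reference` are assumptions, and only `E:\reference` comes from the comment.

```python
import ntpath
import posixpath

# Hypothetical per-platform roots for reference copies.
REFERENCE_ROOTS = {
    "windows": "E:\\reference",   # from the comment above
    "linux": "/srv/reference",    # assumed, not from the ticket
}


def reference_path(repo_name: str, platform_name: str) -> str:
    """Join the platform's reference root with the repository name."""
    root = REFERENCE_ROOTS[platform_name]
    # Use the path rules of the target agent, not of the machine running this code.
    join = ntpath.join if platform_name == "windows" else posixpath.join
    return join(root, repo_name)
```

With a script-evaluated field, the same repository definition could then serve both Windows and Linux agents.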