A git clone copies the entire repository (including its full history and branches that are irrelevant to the build). As a repository's history grows, this becomes more and more expensive.
A common pattern (in our case at least) is to run "clean workspace" as a pre-execute action for the "master" step to ensure a pristine work area, but this of course means a full clone is done on every build.
Worse, it appears that the clone actually happens twice:
- A clone is created during the build condition check
- Then the workspace is cleaned during the "master" step
- Another clone is created during the actual "checkout" step
This is probably fine for small repos and "trivial" cases, but in larger setups the network traffic and I/O start to add up.
As a concrete example, one of our projects after migrating from SVN to Git:
2.1 MB of source vs. 163 MB in the .git folder. To spin up a build we only need those 2.1 MB, but we currently download 163 MB (in the worst case, twice).
Possible solutions and workarounds
1. Add a configuration option that intercepts the "clone" command and caches the .git folder on the first clone; later clones of the same repo would create a link to the cached .git folder (mklink /J on Windows, ln -s on Linux).
The cache location and maximum size would be configurable, and the cache would be cleaned in FIFO fashion (the oldest repo is thrown away when running out of space).
2. Add logic or an option to run "git clean -xdf" and "git reset --hard" instead of "cleanup workspace". Alternatively, "cleanup workspace" could delete everything except the .git folder (if so configured), and at the start of the build,
instead of doing a clone, do a reset + pull.
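3. Use a shallow clone (git clone --depth 1). This could work for some scenarios, but a plain shallow clone only fetches the tip of the branch being cloned, so it is not very useful for multi-platform builds where the same revision is required across multiple machines (if the branch moves between clones, different machines end up building different revisions).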
4. Use git's single-branch clone option (--single-branch). This would work for most scenarios and would reduce I/O and disk usage somewhat, depending on how many branches a team has in a repo, but it is only a delaying tactic and does not address the history problem.
5. Something completely different (and less fragile magic) that we haven't thought of, which achieves the following: only the files of the targeted branch are pulled into the work area rather than the entire repo every time, and checking the build condition
does not require cloning the entire repository, while we keep the convenience of the "clean workarea" pre-execution action.
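The cache bookkeeping in option 1 could look roughly like the following minimal Python sketch. GitDirCache and its methods are hypothetical names, sizes are in whatever unit the caller chooses, and the clone interception and link creation (mklink /J, ln -s) are left out. It only shows the FIFO eviction policy: entries are evicted strictly in the order they were added, and looking a repo up does not refresh its age (that would be LRU instead).

```python
from collections import OrderedDict
from typing import List


class GitDirCache:
    """Hypothetical FIFO-evicting index of cached .git folders.

    Maps repo URL -> size. Insertion order is eviction order: the oldest
    cached repo is dropped first when the size limit would be exceeded.
    """

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._entries: "OrderedDict[str, int]" = OrderedDict()

    def total_bytes(self) -> int:
        return sum(self._entries.values())

    def contains(self, repo_url: str) -> bool:
        # A lookup does not change eviction order (FIFO, not LRU).
        return repo_url in self._entries

    def add(self, repo_url: str, size_bytes: int) -> List[str]:
        """Record a newly cached repo; return the URLs evicted to make room."""
        if size_bytes > self.max_bytes:
            raise ValueError("repo is larger than the whole cache")
        # Re-caching a repo replaces its old entry and resets its age.
        self._entries.pop(repo_url, None)
        evicted = []
        while self._entries and self.total_bytes() + size_bytes > self.max_bytes:
            oldest, _ = self._entries.popitem(last=False)  # oldest entry first
            evicted.append(oldest)
        self._entries[repo_url] = size_bytes
        return evicted
```

For example, with a 400 MB limit and three 163 MB repos, adding the third evicts the first. One thing to keep in mind with the linking approach: workspaces that link to the same cached .git folder also share its index and HEAD, so concurrent builds of the same repo on one machine would interfere with each other.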
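Option 2 might be sketched as follows (Python, hypothetical function names). clean_workspace_keep_git implements the "delete everything but .git" variant of "cleanup workspace", and refresh_workspace runs the reset + pull. Calling refresh_workspace on its own corresponds to the first variant (git reset --hard plus git clean -xdf instead of "cleanup workspace"); calling it after clean_workspace_keep_git corresponds to the second, where git clean is then effectively a no-op. The sketch assumes the workspace already holds a clone whose current branch has an upstream configured, so a plain git pull works.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def clean_workspace_keep_git(workspace: Path) -> List[str]:
    """Delete everything in `workspace` except its .git folder.

    Returns the names of the removed entries (sorted), e.g. for logging.
    """
    removed = []
    for entry in sorted(workspace.iterdir()):
        if entry.name == ".git":
            continue  # keep history so no fresh clone is needed
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()  # plain files and symlinks (targets are not followed)
        removed.append(entry.name)
    return removed


def refresh_workspace(workspace: Path) -> None:
    """Bring the work area back to a pristine checkout without recloning."""
    for cmd in (
        ["git", "reset", "--hard"],  # restore tracked files to HEAD
        ["git", "clean", "-xdf"],    # remove untracked and ignored files
        ["git", "pull"],             # fetch and merge new commits from upstream
    ):
        subprocess.run(cmd, cwd=workspace, check=True)
```

Since git reset --hard recreates every tracked file from the preserved .git folder, only new commits cross the network, instead of the full 163 MB history.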
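Parts of options 3 and 5 could be combined: a build condition that only depends on whether the branch tip moved can be checked with git ls-remote, which reports remote refs and their SHAs without cloning, and each build machine can then fetch exactly that commit with --depth 1, so all platforms build the same revision with no history. A minimal Python sketch with hypothetical function names follows. Fetching a commit by SHA assumes the server allows it (on a plain git server this is controlled by uploadpack.allowReachableSHA1InWant), and a build condition that needs more than the tip SHA (for example, which paths changed) is not covered.

```python
import subprocess
from typing import List, Optional


def parse_ls_remote(output: str, ref: str) -> Optional[str]:
    """Return the SHA for `ref` from git ls-remote output ("<sha>\\t<ref>" lines)."""
    for line in output.splitlines():
        sha, _, name = line.partition("\t")
        if name == ref:
            return sha
    return None


def remote_branch_sha(repo_url: str, branch: str) -> Optional[str]:
    """Look up a remote branch tip without cloning (build condition check)."""
    ref = f"refs/heads/{branch}"
    result = subprocess.run(
        ["git", "ls-remote", repo_url, ref],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout, ref)


def pinned_shallow_checkout_cmds(repo_url: str, sha: str) -> List[List[str]]:
    """Commands that check out exactly one commit, without history, in an empty dir."""
    return [
        ["git", "init"],
        ["git", "remote", "add", "origin", repo_url],
        ["git", "fetch", "--depth", "1", "origin", sha],  # needs server support
        ["git", "checkout", "--detach", "FETCH_HEAD"],
    ]
```

A trigger could compare remote_branch_sha(url, "master") with the last SHA it built, and only when they differ hand that SHA to every build machine, which runs the pinned commands (each via subprocess.run(cmd, cwd=workspace, check=True)) in a freshly cleaned work area.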