use shallow clone for analysis #38

Open
rhaschke opened this issue Aug 31, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@rhaschke
Contributor

Is it possible to use git clone --depth n to clone only the recent history of the requested branch?
Some repos, e.g. OpenCV, have a huge history and take ages to download fully.
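For reference, plain git already supports this. Below is a minimal sketch of a shallow, single-branch clone. The helper name shallow_clone and its arguments are hypothetical placeholders, not part of the tool discussed in this issue.

```shell
# Hypothetical helper: clone only the last DEPTH commits of one branch or tag.
# Note: for a local source path git ignores --depth; pass a file:// URL instead.
shallow_clone() {
    local url=$1          # repository URL (placeholder)
    local branch=$2       # branch or tag to check out
    local dir=$3          # target directory
    local depth=${4:-1}   # commits of history to keep, defaults to 1
    git clone --quiet --depth "$depth" --single-branch --branch "$branch" "$url" "$dir"
}
```

For a repository like OpenCV, shallow_clone URL BRANCH DIR 1 transfers only the snapshot of the requested branch tip instead of the full history.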

@rhaschke rhaschke added the enhancement New feature or request label Aug 31, 2019
@scymtym
Member

scymtym commented Sep 2, 2019

Thank you for the suggestion. I'm not completely sure what you mean, though. Which context are we talking about, the analyze command or analysis in other commands? Is this suggestion for cloning from the cache (that should already use a limited depth), for creating the cache entry, or for working without the cache?

@rhaschke
Contributor Author

rhaschke commented Sep 2, 2019

I was aiming for the analyze command: creating the cache entry and/or working without the cache.

@scymtym
Member

scymtym commented Sep 3, 2019

I did a few experiments.

  1. The analyze command should usually work with --depth 1 (and doesn't use the cache).

  2. Cloning a specific version from the cache for analysis can usually use --depth 1.

  3. Two possibilities for cache entry creation:

    1. Create the cache entry with

      git clone --bare --depth 1 URL DIR                                # shallow bare clone of the default branch tip only
      cd DIR                                                            # the remaining commands must run inside the new repository
      git config --add remote.origin.fetch '+refs/heads/*:refs/heads/*' # fetch all branches
      git config --add remote.origin.fetch '+refs/tags/*:refs/tags/*'   # fetch all tags
      git fetch --depth 1                                               # shallow fetch of every branch and tag tip
    2. Like 3.1, but fetch only the branches, tags and commits that are actually needed. This requires collecting the referenced branches, tags and commits for a given project and updating what the cache entry fetches whenever new references are needed.
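Item 2 can be sketched as follows, assuming the cache entry is a bare repository on local disk (for example one created as in 3.1). clone_from_cache and its arguments are hypothetical names. Note that --branch accepts branch and tag names but not raw commit IDs.

```shell
# Hypothetical helper for item 2: check out one version from a local bare
# cache entry, copying only that version's tip commit (--depth 1).
clone_from_cache() {
    local cache=$1  # path to the bare cache entry (placeholder layout)
    local ref=$2    # branch or tag name; --branch does not accept raw commit IDs
    local dir=$3    # working directory for the analysis
    local abs
    abs=$(cd "$cache" && pwd) || return 1   # file:// URLs need an absolute path
    # Use a file:// URL: for a plain local path git ignores --depth and copies everything.
    git clone --quiet --depth 1 --branch "$ref" "file://$abs" "$dir"
}
```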

Improvements 1 and 2 are easy to implement but will typically not gain us much.

3.1 is relatively effective (300 MB instead of more than 1 GB for OpenCV) but doesn't work in all cases: since only the tip of each branch and tag is fetched, directly specified commits as well as unusual ref names will not be present in the cache entry.
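One way to detect this limitation before starting an analysis is to ask the cache entry whether the requested commit exists. A minimal sketch, assuming a 3.1-style bare cache entry; cache_has_commit is a hypothetical helper name, and what to do when the commit is missing is left open.

```shell
# Hypothetical helper: succeed only if COMMIT is present in the cache entry.
# In a 3.1-style cache only branch and tag tips are fetched, so a commit
# given directly by its ID is usually missing.
cache_has_commit() {
    local cache=$1   # path to the bare cache entry
    local commit=$2  # full commit ID
    git -C "$cache" cat-file -e "$commit^{commit}" 2>/dev/null
}
```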

3.2 is very effective and also correct, but complicated, so implementing it may not be worth the trouble.

@rhaschke
Contributor Author

rhaschke commented Sep 3, 2019

Regarding 3.2: Don't you know the required refs anyway? You could even fetch individual refs only on demand, i.e. only those you actually need for the current analysis. Within a distribution, we don't usually switch between different versions very often, do we?
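The on-demand idea can be sketched with plain git: an explicit refspec on the command line fetches a single branch or tag into an existing bare cache entry at depth 1. This is a minimal sketch, assuming the cache entry was created with git clone --bare --depth 1 and that callers pass full ref names; fetch_ref_on_demand is a hypothetical name. Raw commit IDs are left out, since fetching by ID may depend on server configuration.

```shell
# Hypothetical helper for on-demand fetching: add exactly one branch or tag
# (a full ref name such as refs/heads/NAME or refs/tags/NAME) to an existing
# bare cache entry, at depth 1.
fetch_ref_on_demand() {
    local cache=$1  # path to the bare cache entry
    local ref=$2    # full ref name required by the current analysis
    # Skip the network round trip if the ref is already in the cache entry.
    if git -C "$cache" show-ref --verify --quiet "$ref"; then
        return 0
    fi
    # An explicit refspec fetches only this ref, independent of any
    # remote.origin.fetch configuration.
    git -C "$cache" fetch --quiet --depth 1 origin "+$ref:$ref"
}
```

The early return means an already cached branch is not refreshed; a real implementation would need some policy for updating branches that move.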

Projects
None yet
Development

No branches or pull requests

2 participants