[bitbake-devel] fetch: git: add reference parameter

Submitted by Rasmus Villemoes via lists.openembedded.org on Oct. 18, 2020, 9:33 p.m. | Patch ID: 177557

Details

Message ID 20201018213352.32692-1-rasmus.villemoes@prevas.dk
State New
Headers show

Commit Message

The --reference(-if-able) option to git clone can significantly speed
up the initial do_fetch if one is able to point git at a local
repository containing a large number of the objects that would
otherwise need to be downloaded.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

---
 lib/bb/fetch2/git.py | 11 +++++++++++
 1 file changed, 11 insertions(+)

-- 
2.23.0
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#11760): https://lists.openembedded.org/g/bitbake-devel/message/11760
Mute This Topic: https://lists.openembedded.org/mt/77645766/1003190
Group Owner: bitbake-devel+owner@lists.openembedded.org
Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [mhalstead@linuxfoundation.org]
-=-=-=-=-=-=-=-=-=-=-=-

Patch hide | download patch | download mbox

diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
index b97967b4..c8837816 100644
--- a/lib/bb/fetch2/git.py
+++ b/lib/bb/fetch2/git.py
@@ -51,6 +51,13 @@  Supported SRC_URI options are:
    For local git:// urls to use the current branch HEAD as the revision for use with
    AUTOREV. Implies nobranch.
 
+- reference
+   This can be set to the path to a local repository expected to
+   contain many of the objects that would otherwise need to be
+   downloaded. It will be passed as the argument to the
+   '--reference-if-able' option of 'git clone', along with the
+   '--dissociate' option.
+
 """
 
 # Copyright (C) 2005 Richard Purdie
@@ -151,6 +158,8 @@  class Git(FetchMethod):
 
         ud.nobranch = ud.parm.get("nobranch","0") == "1"
 
+        ud.reference = ud.parm.get("reference", "")
+
         # usehead implies nobranch
         ud.usehead = ud.parm.get("usehead","0") == "1"
         if ud.usehead:
@@ -344,6 +353,8 @@  class Git(FetchMethod):
             if repourl.startswith("file://"):
                 repourl = repourl[7:]
             clone_cmd = "LANG=C %s clone --bare --mirror %s %s --progress" % (ud.basecmd, shlex.quote(repourl), ud.clonedir)
+            if ud.reference:
+                clone_cmd += " --reference-if-able %s --dissociate" % shlex.quote(ud.reference)
             if ud.proto.lower() != 'file':
                 bb.fetch2.check_network_access(d, clone_cmd, ud.url)
             progresshandler = GitProgressHandler(d)

Comments

Richard Purdie Oct. 19, 2020, 9:09 a.m.
On Sun, 2020-10-18 at 23:33 +0200, Rasmus Villemoes wrote:
> The --reference(-if-able) option to git clone can significantly speed

> up the initial do_fetch if one is able to point git at a local

> repository containing a large number of the objects that would

> otherwise need to be downloaded.

> 

> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

> ---

>  lib/bb/fetch2/git.py | 11 +++++++++++

>  1 file changed, 11 insertions(+)


Whilst I can see why this is useful, it is going to end up causing
problems when you share that recipe using this term with others :(. I
don't think we should add API which will cause scaling issues end
"encourage" behaviour which leads to problems.

Clone by reference also causes problems for the archiver.

I have often wondered about this issue though, my own thoughts were
some kind of "keyword" system where if present, a base repository could
be cloned, then added to from the original upstream. At the end of
cloning, revisions could be pushed back into that making it a kind of
mirror.

So yes, I see the attraction/need but I think to have something merged,
it would need to be more usable in the general case.

Cheers,
Richard
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#11761): https://lists.openembedded.org/g/bitbake-devel/message/11761
Mute This Topic: https://lists.openembedded.org/mt/77645766/1003190
Group Owner: bitbake-devel+owner@lists.openembedded.org
Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [mhalstead@linuxfoundation.org]
-=-=-=-=-=-=-=-=-=-=-=-
Richard Purdie Oct. 19, 2020, 10:43 a.m.
On Mon, 2020-10-19 at 12:25 +0200, Rasmus Villemoes wrote:
> On 19/10/2020 11.09, Richard Purdie wrote:

> > On Sun, 2020-10-18 at 23:33 +0200, Rasmus Villemoes wrote:

> > > The --reference(-if-able) option to git clone can significantly

> > > speed

> > > up the initial do_fetch if one is able to point git at a local

> > > repository containing a large number of the objects that would

> > > otherwise need to be downloaded.

> > > 

> > > Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

> > > ---

> > >  lib/bb/fetch2/git.py | 11 +++++++++++

> > >  1 file changed, 11 insertions(+)

> > 

> > Whilst I can see why this is useful, it is going to end up causing

> > problems when you share that recipe using this term with others :(.

> > I

> > don't think we should add API which will cause scaling issues end

> > "encourage" behaviour which leads to problems.

> > 

> > Clone by reference also causes problems for the archiver.

> 

> How? Note that I unconditionally add the --dissociate, so that after

> the download, there's absolutely no link between the referenced and

> the cloned repository. Also, I use the -if-able variant which will

> just silently ignore the given repo if it cannot be used (often,

> because it doesn't exist). So I don't think the presence of absence

> of ;reference= can ever affect the correctness or end result. But see

> below;


Ok, I'd missed that dissociate was available. I think that is new since
the last time I looked at this (its been a while) and that helps
simplify things.

>  I don't think any recipes should/would ever grow a ;reference=

> directly, as it's really a local opt-in optimization, so the recipe

> should just be written to allow that easily.


You know how this works, why and what it means. Most people won't so it
will get added to public recipes and it will cause problems. I've been
a maintainer long enough to know how this kind of works out
unfortunately.

I realised there is also a bigger problem. You're changing SRC_URI to
be specific to local paths. This will affect the sstate checksums and
break sstate reuse. We don't want to do that. My counter-proposal of an
identifier in SRC_URI like ";reference=linux-kernel" won't cause a
problem with checksums.

> > I have often wondered about this issue though, my own thoughts were

> > some kind of "keyword" system where if present, a base repository

> > could

> > be cloned, then added to from the original upstream. At the end of

> > cloning, revisions could be pushed back into that making it a kind

> > of

> > mirror.

> > 

> > So yes, I see the attraction/need but I think to have something

> > merged,

> > it would need to be more usable in the general case.

> 

> By far, the biggest pain point and what we're mostly going to use

> this for is cloning linux repositories - those are usually at least

> 2G in size, and every project/board uses such a thing. So my example

> is based on that. In our linux recipe, we have (simplified)

> 

> LINUX_SRC_URI ?= "git://our.server/generic/linux.git;protocol=ssh"

> SRC_URI = "${LINUX_SRC_URI}"

> SRC_URI += [various .cfg files etc.]

> 

> i.e., we have made sure that the main component of SRC_URI is its own

> variable, which can thus easily be overridden from machine config or

> elsewhere - when doing development, it's nice to able to just put

> 

> LINUX_SRC_URI = "git:///some/local/path;protocol=file"

> 

> in local.conf (the branch and SRCREV are similarly defined via such

> LINUX_* variables). So I was thinking that a LINUX_SRC_URI_append =

> ";reference=..." would do the trick. Alternatively, I'd rewrite the

> recipe to read

> 

> SRC_URI = "${LINUX_SRC_URI}${LINUX_SRC_URI_REF}"

> 

> with LINUX_SRC_URI_REF of course being ?= "". Then any developer can

> just set LINUX_SRC_URI_REF = ";reference=/my/local/linux-tree", and

> our CI setup (which, when asked to do clean build, runs with an empty

> initial downloads dir) could similarly be taught to set that

> variable.

> 

> Sure, this is all just for one recipe/repo, but (a) the user would

> necessarily need to provide such a reference for each recipe making

> use of this and (b) there are not really that many projects whose

> repo sizes warrant this, so it's quite manageable to tweak the few

> recipes.

> 

> I'll also note that this has been a bigger pain for the past few

> months when I've been working from home, connected to our

> infrastructure over a slow vpn connection. Nothing indicates that

> things are returning to normal any time soon, so I'd really like to

> find some way to reduce the necessary network traffic.


As I said, I totally understand the need/usage. I just don't think the
implementation can be merged as is for various reasons, we need
something more generic. I've tried to give some hints on a better
"interface" which would allow something to be merged.

Cheers,

Richard
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#11763): https://lists.openembedded.org/g/bitbake-devel/message/11763
Mute This Topic: https://lists.openembedded.org/mt/77645766/1003190
Group Owner: bitbake-devel+owner@lists.openembedded.org
Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [mhalstead@linuxfoundation.org]
-=-=-=-=-=-=-=-=-=-=-=-