Message ID | 20220819165455.270130-1-marex@denx.de |
---|---|
State | Accepted, archived |
Commit | d32e5b0ec2ab85ffad7e56ac5b3160860b732556 |
Series | [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata |
> -----Original Message-----
> From: Marek Vasut <marex@denx.de>
> Sent: 19 August 2022 18:55
> To: bitbake-devel@lists.openembedded.org
> Cc: Marek Vasut <marex@denx.de>; Martin Jansa <Martin.Jansa@gmail.com>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Richard Purdie <richard.purdie@linuxfoundation.org>
> Subject: [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
>
> The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> single object in the remote repository. This works poorly with gitlab
> and github, which use the remote git repository to track its metadata
> like merge requests, CI pipelines and such.
>
> Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> and refs/keep-around/* and they all contain massive amount of data that
> are useless for the bitbake build purposes. The amount of useless data
> can in fact be so massive (e.g. with FDO mesa.git repository) that some
> proxies may outright terminate the 'git fetch' connection, and make it
> appear as if bitbake got stuck on 'git fetch' with no output.
>
> To avoid fetching all these useless metadata, tweak the git fetcher such
> that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> refspecs as those are only available in new git versions.
>
> Signed-off-by: Marek Vasut <marex@denx.de>
> ---
> Cc: Martin Jansa <Martin.Jansa@gmail.com>
> Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>  lib/bb/fetch2/git.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> index 4534bd75..b5fc0a51 100644
> --- a/lib/bb/fetch2/git.py
> +++ b/lib/bb/fetch2/git.py
> @@ -382,7 +382,7 @@ class Git(FetchMethod):
>                runfetchcmd("%s remote rm origin" % ud.basecmd, d, workdir=ud.clonedir)
>
>              runfetchcmd("%s remote add --mirror=fetch origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
> -            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
> +            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd, shlex.quote(repourl))
>              if ud.proto.lower() != 'file':
>                  bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
>              progresshandler = GitProgressHandler(d)
> --
> 2.35.1

Seems like the right thing to do. We use Gerrit, which also has its
metadata in special refs/ spaces. One repository I tested with grew
from 3 MB to 35 MB when I fetched using refs/* while another grew
from 20 MB to 120 MB, so there is definitely space and time to be
saved by only fetching the refs/heads and refs/tags spaces....

Reviewed-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>

//Peter
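For readers who want to try the restricted fetch outside bitbake, the command string from the patch can be reproduced in a few lines. This is only a sketch: `build_fetch_cmd` and `DEFAULT_REFSPECS` are hypothetical names introduced here, and the example URL is a placeholder. Only the command format and the two refspecs come from the patch.

```python
import shlex

# Refspecs taken from the patch: mirror only branches and tags.
DEFAULT_REFSPECS = ("refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*")


def build_fetch_cmd(basecmd, repourl, refspecs=DEFAULT_REFSPECS):
    """Return the shell command string for a restricted 'git fetch'.

    Hypothetical helper, not part of bitbake. It mirrors the format
    string used in the patch and quotes the URL with shlex.quote.
    """
    return "LANG=C %s fetch -f --progress %s %s" % (
        basecmd,
        shlex.quote(repourl),
        " ".join(refspecs),
    )


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    print(build_fetch_cmd("git", "https://example.com/repo.git"))
```

Passing `("refs/*:refs/*",)` as `refspecs` gives the pre-patch behaviour, which makes it easy to compare the size of the two resulting mirrors by hand.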
Hi,

On Sat, Aug 20, 2022 at 12:06:55PM +0000, Peter Kjellerstedt wrote:
> Seems like the right thing to do. We use Gerrit, which also has its
> metadata in special refs/ spaces. One repository I tested with grew
> from 3 MB to 35 MB when I fetched using refs/* while another grew
> from 20 MB to 120 MB, so there is definitely space and time to be
> saved by only fetching the refs/heads and refs/tags spaces....

As a user of Gerrit, I fear this will cause problems. In my case developers
are used to creating test topics and using git hashes in recipes which
are not yet released, e.g. not yet in release branches or tags. This can of
course create problems when such changes end up in real releases.

A workaround is that developers can create throwaway testing branches
and refer to them in recipes.

On the one hand this is an improvement, since there is less data in caches,
but on the other hand it adds extra steps for developers who want to test
changes to their recipes. Can't decide which one is more important though :/

Cheers,

-Mikko
Can be solved with a parameter to a fetcher perhaps?

Alex
On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
> Can be solved with a parameter to a fetcher perhaps?
Frequently developers know to change the git URL in recipes from
"branch=master" to "nobranch=1" for their test commits.
This could be used for fetching the changes too, to limit the scope.
Cheers,
-Mikko
On 8/22/22 09:38, Mikko.Rapeli@bmw.de wrote:
> On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
>> Can be solved with a parameter to a fetcher perhaps?
>
> Frequently developers know to change the git URL in recipes from
> "branch=master" to "nobranch=1" for their test commits.
>
> This could be used for fetching the changes too, to limit the scope.

So maybe the easy way out is, if nobranch=1 then fetch everything, else
just heads and tags ?
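The idea floated in this message (fetch everything when `nobranch=1` is given, otherwise only heads and tags) could look roughly like the sketch below. It is an illustration of the proposal only, not code from the patch: `parse_params` is a deliberately simplified stand-in for bitbake's own URL decoding, and `select_refspecs` is a hypothetical helper name.

```python
def parse_params(src_uri):
    """Split 'url;key=value;...' into (url, params).

    Simplified stand-in for illustration; bitbake has its own URL decoder.
    """
    url, *rest = src_uri.split(";")
    params = dict(item.split("=", 1) for item in rest if "=" in item)
    return url, params


def select_refspecs(params):
    """Pick refspecs per the proposal: everything for nobranch=1,
    otherwise only branches and tags."""
    if params.get("nobranch", "0") == "1":
        return ["refs/*:refs/*"]
    return ["refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"]


if __name__ == "__main__":
    # Placeholder recipe URLs, for illustration only.
    for uri in ("git://example.com/repo.git;branch=master",
                "git://example.com/repo.git;nobranch=1"):
        url, params = parse_params(uri)
        print(url, select_refspecs(params))
```

One consequence of this design is that the same repository can end up mirrored in two different shapes depending on the recipe, which matters for any shared download cache.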
On 8/22/22 10:29, Marek Vasut wrote:
> So maybe the easy way out is, if nobranch=1 then fetch everything, else
> just heads and tags ?

No, this won't do, nobranch expects the commit to be in a tag.
On 8/22/22 10:37, Marek Vasut wrote:
> On 8/22/22 10:29, Marek Vasut wrote:
>> So maybe the easy way out is, if nobranch=1 then fetch everything,
>> else just heads and tags ?
>
> No, this won't do, nobranch expects the commit to be in a tag.

But then, if gerrit works with nobranch=1, then gerrit must be generating
tags which contain the commits you test ?

And since this patch fetches refs/tags/ , then this patch won't break the
gerrit setup ?
On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
> > So maybe the easy way out is, if nobranch=1 then fetch everything, else
> > just heads and tags ?
>
> No, this won't do, nobranch expects the commit to be in a tag.

I don't think it expects that.

Alex
Hi,

On Mon, Aug 22, 2022 at 10:41:23AM +0200, Marek Vasut wrote:
> On 8/22/22 10:37, Marek Vasut wrote:
> > On 8/22/22 10:29, Marek Vasut wrote:
> > > So maybe the easy way out is, if nobranch=1 then fetch everything,
> > > else just heads and tags ?

To me this would be the way to go.

> > No, this won't do, nobranch expects the commit to be in a tag.
>
> But then, if gerrit works with nobranch=1, then gerrit must be generating
> tags which contain the commits you test ?
>
> And since this patch fetches refs/tags/ , then this patch won't break the
> gerrit setup ?

nobranch=1 works with any branch or tag or open gerrit review commit id.
At least with sumo and dunfell.

Cheers,

-Mikko
On 8/22/22 10:41, Alexander Kanavin wrote:
> On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
>
>>> So maybe the easy way out is, if nobranch=1 then fetch everything, else
>>> just heads and tags ?
>>
>> No, this won't do, nobranch expects the commit to be in a tag.
>
> I don't think it expects that.

Documentation says it does:

https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
"
- nobranch
   Don't check the SHA validation for branch. set this option for the recipe
   referring to commit which is valid in tag instead of branch.
   The default is "0", set nobranch=1 if needed.
"
On Mon, Aug 22, 2022 at 12:35:08PM +0200, Marek Vasut wrote:
> Documentation says it does:
>
> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
> "
> - nobranch
>    Don't check the SHA validation for branch. set this option for the recipe
>    referring to commit which is valid in tag instead of branch.
>    The default is "0", set nobranch=1 if needed.
> "

Only the first sentence is enforced. The change can still be in a branch,
in a tag, in random other namespace as long as the commit is found at
checkout time.

Cheers,

-Mikko
Hi Marek,

On 8/22/22 12:35, Marek Vasut wrote:
> Documentation says it does:
>
> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
> "
> - nobranch
>    Don't check the SHA validation for branch. set this option for the
>    recipe referring to commit which is valid in tag instead of branch.

I assume this was meant to give the example of tags which aren't
necessarily in a branch (annotated tags, or tags of commits not belonging
to any branch anymore, after a force-push or a branch deletion for example).

The git fetcher does a

    git log --pretty=oneline -n 1 <hash>

when nobranch is set, otherwise

    git branch --contains <hash> --list <branch>

to check whether a commit exists and can be used by bitbake.

Considering this check, I assume nobranch=1 is working for any commit
that was fetched by the git fetcher?

(We need to update the docs to reflect that in that case).

Cheers,
Quentin

>    The default is "0", set nobranch=1 if needed.
> "
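The two checks described in this message can be written down as a small helper that builds the argument list for each case. This is a sketch based only on the commands quoted in the message: `commit_check_argv` is a hypothetical name, and the real fetcher may wrap or post-process these commands differently.

```python
def commit_check_argv(revision, branch=None, nobranch=False, basecmd="git"):
    """Build the argv for the 'is this commit usable?' check.

    With nobranch set, only the existence of the commit in the local
    clone is checked; otherwise the commit must be contained in the
    named branch. Hypothetical helper for illustration.
    """
    if nobranch:
        return [basecmd, "log", "--pretty=oneline", "-n", "1", revision]
    if branch is None:
        raise ValueError("a branch name is required unless nobranch is set")
    return [basecmd, "branch", "--contains", revision, "--list", branch]


if __name__ == "__main__":
    # 'abc123' is a placeholder revision, not a real commit id.
    print(commit_check_argv("abc123", nobranch=True))
    print(commit_check_argv("abc123", branch="master"))
```

The resulting list can be handed to `subprocess.run(..., cwd=<clone directory>)`. The first form succeeds for any commit object present in the clone, regardless of which ref brought it in, which is the point made in the message.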
On 8/22/22 12:57, Quentin Schulz wrote:
> The git fetcher does a git log --pretty=oneline -n 1 <hash> when
> nobranch is set, otherwise git branch --contains <hash> --list <branch>
> to check whether a commit exists and can be used by bitbake.
>
> Considering this check, I assume nobranch=1 is working for any commit
> that was fetched by the git fetcher?
>
> (We need to update the docs to reflect that in that case).

In that case, 'git fetch refs/*' in case nobranch is set and 'refs/heads
refs/tags' otherwise ?
On Mon, 2022-08-22 at 13:55 +0200, Marek Vasut wrote:
> In that case, 'git fetch refs/*' in case nobranch is set and 'refs/head
> refs/tags' otherwise ?

This does get a bit more complex though since you now need two
different mirror tarballs, one for each option. The code can do that if
setup correctly but we do need to cover that issue.

Cheers,

Richard
> -----Original Message-----
> From: Richard Purdie <richard.purdie@linuxfoundation.org>
> Sent: 22 August 2022 16:17
> To: Marek Vasut <marex@denx.de>; Quentin Schulz <quentin.schulz@theobroma-systems.com>; Alexander Kanavin <alex.kanavin@gmail.com>
> Cc: Mikko Rapeli <Mikko.Rapeli@bmw.de>; Martin Jansa <Martin.Jansa@gmail.com>; bitbake-devel <bitbake-devel@lists.openembedded.org>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Subject: Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
>
> On Mon, 2022-08-22 at 13:55 +0200, Marek Vasut wrote:
> > In that case, 'git fetch refs/*' in case nobranch is set and 'refs/head
> > refs/tags' otherwise ?
>
> This does get a bit more complex though since you now need two
> different mirror tarballs, one for each option. The code can do that if
> setup correctly but we do need to cover that issue.
>
> Cheers,
>
> Richard

I did some testing, and for Gerrit to continue to work it would be
enough to use:

    fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))

This should not affect other Git servers and should avoid using
different fetch commands depending on the URL. The drawback is of
course that for Gerrit, there would be only marginal benefits to
this change since the majority of its metadata is in the
refs/changes space.

However, I wonder if the suggested change actually has any significant
effect, given that the initial clone is done using --mirror, which means
all refs/ spaces are fetched. If I remove the --mirror option from the
clone command the change works as expected, but I have no idea if that
has any other significant impact...

//Peter
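To see which remote refs the three refspecs in this message would pick up, the source side of each refspec can be matched against a list of ref names. The sketch below uses Python's `fnmatch` as an approximation of git's refspec globbing, and the sample ref names are made up for illustration. `is_fetched` is a hypothetical helper, not something from bitbake or git.

```python
from fnmatch import fnmatchcase

# Refspecs from the message: branches, tags and Gerrit changes.
REFSPECS = [
    "refs/heads/*:refs/heads/*",
    "refs/tags/*:refs/tags/*",
    "refs/changes/*:refs/changes/*",
]


def is_fetched(ref, refspecs=REFSPECS):
    """Return True if 'ref' matches the source side of any refspec.

    fnmatchcase is only an approximation of git's own matching, but
    its '*' also spans '/' characters, which is what matters here.
    """
    return any(fnmatchcase(ref, spec.split(":", 1)[0]) for spec in refspecs)


if __name__ == "__main__":
    # Illustrative ref names only.
    for ref in ("refs/heads/master",
                "refs/tags/v1.0",
                "refs/changes/34/1234/1",
                "refs/merge-requests/7/head",
                "refs/pipelines/42"):
        print(ref, is_fetched(ref))
```

With these refspecs the Gerrit change refs are kept while the GitLab merge-request and pipeline refs are skipped, which matches the trade-off described in the message.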
Hi Marek,

On Fri, 19 Aug 2022 18:54:55 +0200
"Marek Vasut" <marex@denx.de> wrote:

> To avoid fetching all these useless metadata, tweak the git fetcher such
> that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> refspecs as those are only available in new git versions.
>
> Signed-off-by: Marek Vasut <marex@denx.de>

Of course this might become irrelevant with whatever implementation
will be in v2, however when testing with this patch applied I got the
following warning and wonder whether they are related:

WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR

Full log:
https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio
> -----Original Message-----
> From: Luca Ceresoli <luca.ceresoli@bootlin.com>
> Sent: den 22 augusti 2022 18:03
> To: Marek Vasut <marex@denx.de>
> Cc: bitbake-devel@lists.openembedded.org; Martin Jansa <Martin.Jansa@gmail.com>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Richard Purdie <richard.purdie@linuxfoundation.org>
> Subject: Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
>
> Of course this might become irrelevant with whatever implementation
> will be in v2, however when testing with this patch applied I got the
> following warning and wonder whether they are related:
>
> WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR

I cannot see any reason how they can be related.

> Full log:
> https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio
>
> --
> Luca Ceresoli, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

//Peter
On Mon, 2022-08-22 at 18:02 +0200, Luca Ceresoli wrote:
> Of course this might become irrelevant with whatever implementation
> will be in v2, however when testing with this patch applied I got the
> following warning and wonder whether they are related:
>
> WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR
>
> Full log:
> https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio

This is a known issue with an open bug assigned to me (unfortunately),
it isn't related. It is intermittent as it is llvm-native related and
we don't commonly rebuild this codepath.

Cheers,

Richard
On 8/22/22 17:21, Peter Kjellerstedt wrote:
>> -----Original Message-----
>> From: Richard Purdie <richard.purdie@linuxfoundation.org>
>> Sent: 22 August 2022 16:17
>> To: Marek Vasut <marex@denx.de>; Quentin Schulz <quentin.schulz@theobroma-systems.com>; Alexander Kanavin <alex.kanavin@gmail.com>
>> Cc: Mikko Rapeli <Mikko.Rapeli@bmw.de>; Martin Jansa <Martin.Jansa@gmail.com>; bitbake-devel <bitbake-devel@lists.openembedded.org>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>
>> Subject: Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
>>
>> On Mon, 2022-08-22 at 13:55 +0200, Marek Vasut wrote:
>>> On 8/22/22 12:57, Quentin Schulz wrote:
>>>> Hi Marek,
>>>>
>>>> On 8/22/22 12:35, Marek Vasut wrote:
>>>>> On 8/22/22 10:41, Alexander Kanavin wrote:
>>>>>> On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
>>>>>>
>>>>>>>> So maybe the easy way out is, if nobranch=1 then fetch everything,
>>>>>>>> else just heads and tags ?
>>>>>>>
>>>>>>> No, this won't do, nobranch expects the commit to be in a tag.
>>>>>>
>>>>>> I don't think it expects that.
>>>>>
>>>>> Documentation says it does:
>>>>>
>>>>> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
>>>>> "
>>>>> - nobranch
>>>>>    Don't check the SHA validation for branch. set this option for the
>>>>>    recipe referring to commit which is valid in tag instead of branch.
>>>>
>>>> I assume this was meant to give the example of tags which aren't
>>>> necessarily in a branch (annotated tags or tags of commits not belonging
>>>> to any branch anymore (force-push for example, or branch deletion)).
>>>>
>>>> The git fetcher does a git log --pretty=oneline -n 1 <hash> when
>>>> nobranch is set, otherwise git branch --contains <hash> --list <branch>
>>>> to check whether a commit exists and can be used by bitbake.
>>>>
>>>> Considering this check, I assume nobranch=1 is working for any commit
>>>> that was fetched by the git fetcher?
>>>>
>>>> (We need to update the docs to reflect that in that case).
>>>
>>> In that case, 'git fetch refs/*' in case nobranch is set and 'refs/head
>>> refs/tags' otherwise ?
>>
>> This does get a bit more complex though since you now need two
>> different mirror tarballs, one for each option. The code can do that if
>> set up correctly but we do need to cover that issue.
>>
>> Cheers,
>>
>> Richard
>
> I made some testing, and for Gerrit to continue to work it would be
> enough to use:
>
> fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))
>
> This should not affect other Git servers and should avoid using
> different fetch commands depending on the URL. The drawback is of
> course that for Gerrit, there would be only marginal benefits to
> this change since the majority of its metadata is in the
> refs/changes space.
>
> However, I wonder if the suggested change actually has any significant
> effect, given that the initial clone is done using --mirror, which means
> all refs/ spaces are fetched. If I remove the --mirror option from the
> clone command the change works as expected, but I have no idea if that
> has any other significant impact...

With this change, I am able to actually fetch mesa from
gitlab.freedesktop.org without local CI proxy terminating the connection
in the process. So yes, it does have an effect.
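For reference, the commit check Quentin describes in the quoted part of this message can be sketched as follows. This only builds the two command strings and is not the fetcher's actual code: the helper name `contains_check_cmd` and its arguments are hypothetical, and only the two git invocations are taken from the message.

```python
def contains_check_cmd(basecmd, revision, branch=None, nobranch=False):
    """Build the command that tests whether `revision` can be used.

    Hypothetical helper for illustration; not a function from bitbake.
    """
    if nobranch:
        # Only asks git to show the commit, so it does not depend on
        # the commit being reachable from any particular branch.
        return "%s log --pretty=oneline -n 1 %s" % (basecmd, revision)
    if branch is None:
        raise ValueError("a branch name is required unless nobranch is set")
    # Lists `branch` only if it contains the commit.
    return "%s branch --contains %s --list %s" % (basecmd, revision, branch)


# "abc123" and "main" are placeholder values.
print(contains_check_cmd("git", "abc123", nobranch=True))
print(contains_check_cmd("git", "abc123", branch="main"))
```

If the check really is this permissive with nobranch set, then the set of refs that gets fetched, rather than the check itself, decides which commits a recipe can reference.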
Hi Richard, Peter, On Mon, 22 Aug 2022 17:07:50 +0100 "Richard Purdie" <richard.purdie@linuxfoundation.org> wrote: > On Mon, 2022-08-22 at 18:02 +0200, Luca Ceresoli wrote: > > Hi Marek, > > > > On Fri, 19 Aug 2022 18:54:55 +0200 > > "Marek Vasut" <marex@denx.de> wrote: > > > > > The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every > > > single object in the remote repository. This works poorly with gitlab > > > and github, which use the remote git repository to track its metadata > > > like merge requests, CI pipelines and such. > > > > > > Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/* > > > and refs/keep-around/* and they all contain massive amount of data that > > > are useless for the bitbake build purposes. The amount of useless data > > > can in fact be so massive (e.g. with FDO mesa.git repository) that some > > > proxies may outright terminate the 'git fetch' connection, and make it > > > appear as if bitbake got stuck on 'git fetch' with no output. > > > > > > To avoid fetching all these useless metadata, tweak the git fetcher such > > > that it only fetches refs/heads/* and refs/tags/* . Avoid using negative > > > refspecs as those are only available in new git versions. > > > > > > Signed-off-by: Marek Vasut <marex@denx.de> > > > > Of course this might become irrelevant with whatever implementation > > will be in v2, however when testing with this patch applied I got the > > following warning and wonder whether they are related: > > > > WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR > > > > Full log: > > https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio > > > > This is a known issue with an open bug assigned to me (unfortunately), > it isn't related. It is intermittent as it is llvm-native related and > we don't commonly rebuild this codepath. 
Indeed, added to https://bugzilla.yoctoproject.org/show_bug.cgi?id=14897 Thanks for the hint and apologies for the noise.
On 8/22/22 18:39, Marek Vasut wrote: Hi, [...] >> I made some testing, and for Gerrit to continue to work it would be >> enough to use: >> >> fetch_cmd = "LANG=C %s fetch -f --progress %s >> refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* >> refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl)) >> >> This should not affect other Git servers and should avoid using >> different fetch commands depending on the URL. The drawback is of >> course that for Gerrit, there would be only marginal benefits to >> this change since the majority of its metadata is in the >> refs/changes space. >> >> However, I wonder if the suggested change actually has any significant >> effect, given that the initial clone is done using --mirror, which means >> all refs/ spaces are fetched. If I remove the --mirror option from the >> clone command the change works as expected, but I have no idea if that >> has any other significant impact... > > With this change, I am able to actually fetch mesa from > gitlab.freedesktop.org without local CI proxy terminating the connection > in the process. So yes, it does have effect. I keep running into this problem with mesa, how can we proceed to fix it upstream ?
On Thu, 2022-09-01 at 19:50 +0200, Marek Vasut wrote:
> On 8/22/22 18:39, Marek Vasut wrote:
>
> Hi,
>
> [...]
>
> > > I made some testing, and for Gerrit to continue to work it would be
> > > enough to use:
> > >
> > > fetch_cmd = "LANG=C %s fetch -f --progress %s
> > > refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*
> > > refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))
> > >
> > > This should not affect other Git servers and should avoid using
> > > different fetch commands depending on the URL. The drawback is of
> > > course that for Gerrit, there would be only marginal benefits to
> > > this change since the majority of its metadata is in the
> > > refs/changes space.
> > >
> > > However, I wonder if the suggested change actually has any significant
> > > effect, given that the initial clone is done using --mirror, which means
> > > all refs/ spaces are fetched. If I remove the --mirror option from the
> > > clone command the change works as expected, but I have no idea if that
> > > has any other significant impact...
> >
> > With this change, I am able to actually fetch mesa from
> > gitlab.freedesktop.org without local CI proxy terminating the connection
> > in the process. So yes, it does have effect.
>
> I keep running into this problem with mesa, how can we proceed to fix it
> upstream ?

We probably need a version of the patch which restricts by default but
allows the restriction to be turned off on a per-URL basis with a
parameter. That restriction needs to be reflected in the mirror tarball
name too.

Cheers,

Richard
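A rough sketch of what Richard outlines might look like the following. Everything specific here is an assumption made for illustration: the `allrefs` switch, both helper names, the `_allrefs` suffix and the `git2_<name>.tar.gz` tarball pattern are not taken from any posted patch. Only the two refspec sets and the shape of the fetch command come from the diff under discussion.

```python
import shlex

# Refspec sets from the patch: the restricted default and the old behaviour.
DEFAULT_REFSPECS = ["refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"]
ALL_REFSPECS = ["refs/*:refs/*"]


def build_fetch_cmd(basecmd, repourl, allrefs=False):
    """Return the fetch command; `allrefs` is a hypothetical per-URL switch."""
    refspecs = ALL_REFSPECS if allrefs else DEFAULT_REFSPECS
    return "LANG=C %s fetch -f --progress %s %s" % (
        basecmd, shlex.quote(repourl), " ".join(refspecs))


def mirror_tarball_name(gitsrcname, allrefs=False):
    """Give the two variants distinct tarball names (naming is assumed)."""
    suffix = "_allrefs" if allrefs else ""
    return "git2_%s%s.tar.gz" % (gitsrcname, suffix)


# example.com and the source name below are placeholders.
print(build_fetch_cmd("git", "https://example.com/repo.git"))
print(mirror_tarball_name("example.com.repo.git", allrefs=True))
```

Keeping the tarball names distinct means a mirror tarball created with the restricted refspecs would not be mistaken for one containing every ref.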
diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
index 4534bd75..b5fc0a51 100644
--- a/lib/bb/fetch2/git.py
+++ b/lib/bb/fetch2/git.py
@@ -382,7 +382,7 @@ class Git(FetchMethod):
                 runfetchcmd("%s remote rm origin" % ud.basecmd, d, workdir=ud.clonedir)

             runfetchcmd("%s remote add --mirror=fetch origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
-            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
+            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd, shlex.quote(repourl))
             if ud.proto.lower() != 'file':
                 bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
             progresshandler = GitProgressHandler(d)
The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
single object in the remote repository. This works poorly with gitlab
and github, which use the remote git repository to track its metadata
like merge requests, CI pipelines and such.

Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
and refs/keep-around/* and they all contain massive amount of data that
are useless for the bitbake build purposes. The amount of useless data
can in fact be so massive (e.g. with FDO mesa.git repository) that some
proxies may outright terminate the 'git fetch' connection, and make it
appear as if bitbake got stuck on 'git fetch' with no output.

To avoid fetching all these useless metadata, tweak the git fetcher such
that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
refspecs as those are only available in new git versions.

Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: Martin Jansa <Martin.Jansa@gmail.com>
Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 lib/bb/fetch2/git.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
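To make the effect of the refspec change concrete, here is a small simulation of which remote refs each refspec set would select. It approximates git's refspec globbing with Python's `fnmatchcase`, which is an assumption that holds for simple trailing `*` patterns like these. The ref names in `remote_refs` are invented examples, and `select_refs` is a hypothetical helper, not part of bitbake or git.

```python
from fnmatch import fnmatchcase


def select_refs(remote_refs, refspecs):
    """Return the refs matched by the source side of any refspec."""
    # "src:dst" -> keep only the "src" pattern for matching.
    patterns = [spec.split(":", 1)[0] for spec in refspecs]
    return [ref for ref in remote_refs
            if any(fnmatchcase(ref, pattern) for pattern in patterns)]


# Invented ref names covering the namespaces mentioned in the thread.
remote_refs = [
    "refs/heads/main",
    "refs/tags/mesa-22.1.6",
    "refs/merge-requests/1/head",
    "refs/pipelines/42",
    "refs/keep-around/abc",
    "refs/changes/01/1/1",
]

ORIGINAL = ["refs/*:refs/*"]
PATCHED = ["refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"]
# Peter's variant from the thread, which keeps Gerrit changes fetchable.
GERRIT = PATCHED + ["refs/changes/*:refs/changes/*"]

for label, specs in (("original", ORIGINAL), ("patched", PATCHED), ("gerrit", GERRIT)):
    print(label, select_refs(remote_refs, specs))
```

With the patched refspecs only the branch and tag refs are selected, while the original refspec selects all six.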