diff mbox series

[bitbake-devel,1/3] fetch2/git.py: fix a corner case in try_premirror

Message ID 20240201033100.3617421-1-Qi.Chen@windriver.com
State New
Headers show
Series [bitbake-devel,1/3] fetch2/git.py: fix a corner case in try_premirror | expand

Commit Message

ChenQi Feb. 1, 2024, 3:30 a.m. UTC
From: Chen Qi <Qi.Chen@windriver.com>

For gitsm recipes, it's possible that some URL is used more than once.
e.g.,
A -> B:rev1 (B is a submodule of A)
A -> C (C is a submodule of A)
C -> B:rev2 (B is a submodule of C)
A anc C are both using B as submodules, but on different revs.
Now if we have:
B:rev1 -> D
B:rev2 -> E
Then, the mirror will not be fully used.
Say we have all repo mirrors for A, B, C, D, E, then in theory it's not
necessary to reach out to any network for downloading. But it's not the
case. After downloading B(rev1) and its submodule D from mirrors, the fetch
process continues to download C, thus B(rev2) and E. Now it finds that B
needs an update because its submodule E needs an update. Of course this is
true because E is not downloaded yet. Now the problem comes to whether to
use mirror or not. The git.py defines try_premirror to return 'False' when
the ud.clonedir exists. As B has been cloned, the ud.clonedir exists and
try_mirror returns False, resulting in not using mirror and going to upstream
directly.

We can see that the mirrors are not fully used. This is usually not problem,
as the cost is only some network download. But in case the following two
settings are there, we get errors.
BB_NO_NETWORK = "0"
BB_ALLOWED_NETWORKS = "*.some.allowed.domain"
In such case, the gitsm recipe A will fail to fetch. Note that all contents
that A needs are in mirrors and now it's failing to fetch. This is unexpected.

Note that the different revs of the same repo in gitsm recipe is not the only
way to reveal this problem. For example, there might be a recipe call B that
uses B:rev3. Check the protobuf and grpc recipes as an example.

For now, we can use the following steps to reproduce this issue. To be clear,
the grpc recipe in meta-oe is now 1.60.0.
1. Add in local.conf:
   DL_DIR = "${TOPDIR}/downloads-premirror"
   bitbake grpc -c fetch
2. Comment out the DL_DIR setting in local.conf and add the following lines:
   PREMIRRORS:append = " \
     git://.*/.* git://${TOPDIR}/downloads-premirror/git2/MIRRORNAME;protocol=file \n \
     gitsm://.*/.* gitsm://${TOPDIR}/downloads-premirror/git2/MIRRORNAME;protocol=file \n \
   "
3. Set BB_NO_NETWORK = "1" and then 'bitbake grpc -c fetch'.
   This command succeeds and this shows that the premirror holds everything we need.
4. Add the following lines and then 'bitbake grpc -c fetch'.
   BB_NO_NETWORK = "0"
   BB_ALLOWED_NETWORKS = "*.some.domain"

After step 4, the error message is as below:
ERROR: grpc-1.60.0-r0 do_fetch: The URL: 'gitsm://github.com/protocolbuffers/protobuf.git;protocol=https;name=third_party/protobuf;subpath=third_party/protobuf;nobranch=1;lfs=True;bareclone=1;nobranch=1' is not trusted and cannot be used

This patch fixes this problem by handling this corner case, that is, if the URL is
not trusted from the settings of BB_NO_NETWORK and BB_ALLOWED_NETWORKS, then we should
try premirrors because trying to reach upstream is destined to fail.

Signed-off-by: Chen Qi <Qi.Chen@windriver.com>
---
 bitbake/lib/bb/fetch2/git.py | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

ChenQi Feb. 18, 2024, 2:30 a.m. UTC | #1
ping

On 2/1/24 11:30, Chen Qi via lists.openembedded.org wrote:
> From: Chen Qi <Qi.Chen@windriver.com>
>
> For gitsm recipes, it's possible that some URL is used more than once.
> e.g.,
> A -> B:rev1 (B is a submodule of A)
> A -> C (C is a submodule of A)
> C -> B:rev2 (B is a submodule of C)
> A anc C are both using B as submodules, but on different revs.
> Now if we have:
> B:rev1 -> D
> B:rev2 -> E
> Then, the mirror will not be fully used.
> Say we have all repo mirrors for A, B, C, D, E, then in theory it's not
> necessary to reach out to any network for downloading. But it's not the
> case. After downloading B(rev1) and its submodule D from mirrors, the fetch
> process continues to download C, thus B(rev2) and E. Now it finds that B
> needs an update because its submodule E needs an update. Of course this is
> true because E is not downloaded yet. Now the problem comes to whether to
> use mirror or not. The git.py defines try_premirror to return 'False' when
> the ud.clonedir exists. As B has been cloned, the ud.clonedir exists and
> try_mirror returns False, resulting in not using mirror and going to upstream
> directly.
>
> We can see that the mirrors are not fully used. This is usually not problem,
> as the cost is only some network download. But in case the following two
> settings are there, we get errors.
> BB_NO_NETWORK = "0"
> BB_ALLOWED_NETWORKS = "*.some.allowed.domain"
> In such case, the gitsm recipe A will fail to fetch. Note that all contents
> that A needs are in mirrors and now it's failing to fetch. This is unexpected.
>
> Note that the different revs of the same repo in gitsm recipe is not the only
> way to reveal this problem. For example, there might be a recipe call B that
> uses B:rev3. Check the protobuf and grpc recipes as an example.
>
> For now, we can use the following steps to reproduce this issue. To be clear,
> the grpc recipe in meta-oe is now 1.60.0.
> 1. Add in local.conf:
>     DL_DIR = "${TOPDIR}/downloads-premirror"
>     bitbake grpc -c fetch
> 2. Comment out the DL_DIR setting in local.conf and add the following lines:
>     PREMIRRORS:append = " \
>       git://.*/.* git://${TOPDIR}/downloads-premirror/git2/MIRRORNAME;protocol=file \n \
>       gitsm://.*/.* gitsm://${TOPDIR}/downloads-premirror/git2/MIRRORNAME;protocol=file \n \
>     "
> 3. Set BB_NO_NETWORK = "1" and then 'bitbake grpc -c fetch'.
>     This command succeeds and this shows that the premirror holds everything we need.
> 4. Add the following lines and then 'bitbake grpc -c fetch'.
>     BB_NO_NETWORK = "0"
>     BB_ALLOWED_NETWORKS = "*.some.domain"
>
> After step 4, the error message is as below:
> ERROR: grpc-1.60.0-r0 do_fetch: The URL: 'gitsm://github.com/protocolbuffers/protobuf.git;protocol=https;name=third_party/protobuf;subpath=third_party/protobuf;nobranch=1;lfs=True;bareclone=1;nobranch=1' is not trusted and cannot be used
>
> This patch fixes this problem by handling this corner case, that is, if the URL is
> not trusted from the settings of BB_NO_NETWORK and BB_ALLOWED_NETWORKS, then we should
> try premirrors because trying to reach upstream is destined to fail.
>
> Signed-off-by: Chen Qi <Qi.Chen@windriver.com>
> ---
>   bitbake/lib/bb/fetch2/git.py | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
> index 0deeb5cee1..322a9366da 100644
> --- a/bitbake/lib/bb/fetch2/git.py
> +++ b/bitbake/lib/bb/fetch2/git.py
> @@ -87,6 +87,7 @@ from contextlib import contextmanager
>   from   bb.fetch2 import FetchMethod
>   from   bb.fetch2 import runfetchcmd
>   from   bb.fetch2 import logger
> +from   bb.fetch2 import trusted_network
>   
>   
>   sha1_re = re.compile(r'^[0-9a-f]{40}$')
> @@ -355,6 +356,11 @@ class Git(FetchMethod):
>           # is not possible
>           if bb.utils.to_boolean(d.getVar("BB_FETCH_PREMIRRORONLY")):
>               return True
> +        # If the url is not in trusted network, that is, BB_NO_NETWORK is set to 0
> +        # and BB_ALLOWED_NETWORKS does not contain the host that ud.url uses, then
> +        # we need to try premirrors first as using upstream is destined to fail.
> +        if not trusted_network(d, ud.url):
> +            return True
>           if os.path.exists(ud.clonedir):
>               return False
>           return True
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#15817): https://lists.openembedded.org/g/bitbake-devel/message/15817
> Mute This Topic: https://lists.openembedded.org/mt/104090271/7304865
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [Qi.Chen@eng.windriver.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
diff mbox series

Patch

diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
index 0deeb5cee1..322a9366da 100644
--- a/bitbake/lib/bb/fetch2/git.py
+++ b/bitbake/lib/bb/fetch2/git.py
@@ -87,6 +87,7 @@  from contextlib import contextmanager
 from   bb.fetch2 import FetchMethod
 from   bb.fetch2 import runfetchcmd
 from   bb.fetch2 import logger
+from   bb.fetch2 import trusted_network
 
 
 sha1_re = re.compile(r'^[0-9a-f]{40}$')
@@ -355,6 +356,11 @@  class Git(FetchMethod):
         # is not possible
         if bb.utils.to_boolean(d.getVar("BB_FETCH_PREMIRRORONLY")):
             return True
+        # If the url is not in trusted network, that is, BB_NO_NETWORK is set to 0
+        # and BB_ALLOWED_NETWORKS does not contain the host that ud.url uses, then
+        # we need to try premirrors first as using upstream is destined to fail.
+        if not trusted_network(d, ud.url):
+            return True
         if os.path.exists(ud.clonedir):
             return False
         return True