diff mbox series

fetch2/gitsm.py: handle spaces in git repository paths

Message ID 20240227173527.726401-1-cordlandwehr@kde.org
State New
Headers show
Series fetch2/gitsm.py: handle spaces in git repository paths | expand

Commit Message

Andreas Cord-Landwehr Feb. 27, 2024, 5:35 p.m. UTC
Handle encoding of spaces for the obscure but real-world scenario of
having a git repository with submodules, where the urls to submodules
contain spaces, eg. https://example.com/foo/repo%20with%20space.git

Git submodules are referenced with single escaping, yet the fetcher
requires double-escaping of spaces.

Signed-off-by: Andreas Cord-Landwehr <cordlandwehr@kde.org>
---
 bitbake/lib/bb/fetch2/gitsm.py | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Richard Purdie Feb. 28, 2024, 9:21 a.m. UTC | #1
On Tue, 2024-02-27 at 18:35 +0100, Andreas Cord-Landwehr wrote:
> Handle encoding of spaces for the obscure but real-world scenario of
> having a git repository with submodules, where the urls to submodules
> contain spaces, eg. https://example.com/foo/repo%20with%20space.git
> 
> Git submodules are referenced with single escaping, yet the fetcher
> requires double-escaping of spaces.
> 
> Signed-off-by: Andreas Cord-Landwehr <cordlandwehr@kde.org>
> ---
>  bitbake/lib/bb/fetch2/gitsm.py | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/bitbake/lib/bb/fetch2/gitsm.py b/bitbake/lib/bb/fetch2/gitsm.py
> index f7f3af7212..e37be055eb 100644
> --- a/bitbake/lib/bb/fetch2/gitsm.py
> +++ b/bitbake/lib/bb/fetch2/gitsm.py
> @@ -23,6 +23,7 @@ import copy
>  import shutil
>  import tempfile
>  from   bb.fetch2.git import Git
> +from   bb.fetch2 import URI
>  from   bb.fetch2 import runfetchcmd
>  from   bb.fetch2 import logger
>  from   bb.fetch2 import Fetch
> @@ -93,6 +94,11 @@ class GitSM(Git):
>                      newud.path = os.path.normpath(os.path.join(newud.path, uris[m]))
>                      uris[m] = Git._get_repo_url(self, newud)
>  
> +                # Double-encode before handing over to git to handle spaces in URLs
> +                double_encoded_uri = URI(uris[m])
> +                double_encoded_uri.path = double_encoded_uri.path_quoted
> +                uris[m] = str(double_encoded_uri)
> +
>          for module in submodules:
>              # Translate the module url into a SRC_URI

Thanks for the patch, I think this may need some work though.

Double encoding feels like a workaround for the issue and this probably
needs a different fix where we only decode the data once. That is
likely more invasive but ultimately more correct else I worry we'd have
to triple encode second level submodules or other horrible things.

Also is there a testcase we could add to bitbake-selftest to cover
this? The fetcher is tricky to work on due to all the corner cases so
we do try to cover them in the testsuite.

Cheers,

Richard
Andreas Cord-Landwehr Feb. 28, 2024, 6:30 p.m. UTC | #2
Hi Richard, thank you for the fast response!
At the moment I only have access to a closed server (AzureDevOps based) which allows creation of repositories with spaces. For GitHub eg. I checked that they replace spaces with a hyphen and so avoid any problems. Probably a simple rewrite rule may work though...? But I do not have a stable server at hand that I could offer as a permanent source for tests. But I will check if I can find a solution.

Regarding the issue itself. Already the git fetcher requires %20 spaces to be encoded as %2520 in SRC_URI statements. So I expected that URLs are always expected to be double encoded. Is that not the case?

Best regards,
Andreas

Am 28. Februar 2024 10:21:19 MEZ schrieb Richard Purdie <richard.purdie@linuxfoundation.org>:
>On Tue, 2024-02-27 at 18:35 +0100, Andreas Cord-Landwehr wrote:
>> Handle encoding of spaces for the obscure but real-world scenario of
>> having a git repository with submodules, where the urls to submodules
>> contain spaces, eg. https://example.com/foo/repo%20with%20space.git
>> 
>> Git submodules are referenced with single escaping, yet the fetcher
>> requires double-escaping of spaces.
>> 
>> Signed-off-by: Andreas Cord-Landwehr <cordlandwehr@kde.org>
>> ---
>>  bitbake/lib/bb/fetch2/gitsm.py | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/bitbake/lib/bb/fetch2/gitsm.py b/bitbake/lib/bb/fetch2/gitsm.py
>> index f7f3af7212..e37be055eb 100644
>> --- a/bitbake/lib/bb/fetch2/gitsm.py
>> +++ b/bitbake/lib/bb/fetch2/gitsm.py
>> @@ -23,6 +23,7 @@ import copy
>>  import shutil
>>  import tempfile
>>  from   bb.fetch2.git import Git
>> +from   bb.fetch2 import URI
>>  from   bb.fetch2 import runfetchcmd
>>  from   bb.fetch2 import logger
>>  from   bb.fetch2 import Fetch
>> @@ -93,6 +94,11 @@ class GitSM(Git):
>>                      newud.path = os.path.normpath(os.path.join(newud.path, uris[m]))
>>                      uris[m] = Git._get_repo_url(self, newud)
>>  
>> +                # Double-encode before handing over to git to handle spaces in URLs
>> +                double_encoded_uri = URI(uris[m])
>> +                double_encoded_uri.path = double_encoded_uri.path_quoted
>> +                uris[m] = str(double_encoded_uri)
>> +
>>          for module in submodules:
>>              # Translate the module url into a SRC_URI
>
>Thanks for the patch, I think this may need some work though.
>
>Double encoding feels like a workaround for the issue and this probably
>needs a different fix where we only decode the data once. That is
>likely more invasive but ultimately more correct else I worry we'd have
>to triple encode second level submodules or other horrible things.
>
>Also is there a testcase we could add to bitbake-selftest to cover
>this? The fetcher is tricky to work on due to all the corner cases so
>we do try to cover them in the testsuite.
>
>Cheers,
>
>Richard
>
>
>
diff mbox series

Patch

diff --git a/bitbake/lib/bb/fetch2/gitsm.py b/bitbake/lib/bb/fetch2/gitsm.py
index f7f3af7212..e37be055eb 100644
--- a/bitbake/lib/bb/fetch2/gitsm.py
+++ b/bitbake/lib/bb/fetch2/gitsm.py
@@ -23,6 +23,7 @@  import copy
 import shutil
 import tempfile
 from   bb.fetch2.git import Git
+from   bb.fetch2 import URI
 from   bb.fetch2 import runfetchcmd
 from   bb.fetch2 import logger
 from   bb.fetch2 import Fetch
@@ -93,6 +94,11 @@  class GitSM(Git):
                     newud.path = os.path.normpath(os.path.join(newud.path, uris[m]))
                     uris[m] = Git._get_repo_url(self, newud)
 
+                # Double-encode before handing over to git to handle spaces in URLs
+                double_encoded_uri = URI(uris[m])
+                double_encoded_uri.path = double_encoded_uri.path_quoted
+                uris[m] = str(double_encoded_uri)
+
         for module in submodules:
             # Translate the module url into a SRC_URI