diff mbox series

[bitbake-devel,v2] fetch2: git: Check if clone directory is a git repo

Message ID 20230824195332.851528-1-JPEWhacker@gmail.com
State New
Headers show
Series [bitbake-devel,v2] fetch2: git: Check if clone directory is a git repo | expand

Commit Message

Joshua Watt Aug. 24, 2023, 7:53 p.m. UTC
If the clone target directory exists and is a valid git repo, but the
git directory is not a child, it needs to be erased and re-cloned. One
example of how this can happen is if a clone creates the directory, but
then fails to actual clone and make it a git repository. This left-over
directory can be particularly problematic if the download directory is a
descent of some top layer git repo (e.g. the default with poky), as the
commands that operate on the remote later will then mangle the layers
git repository instead of the download git repo.

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/fetch2/git.py | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

Comments

Paulo Neves Aug. 25, 2023, 9:18 a.m. UTC | #1
On 24/08/2023 21:53, Joshua Watt wrote:
> If the clone target directory exists and is a valid git repo, but the
> git directory is not a child, it needs to be erased and re-cloned. One
> example of how this can happen is if a clone creates the directory, but
> then fails to actual clone and make it a git repository. This left-over
> directory can be particularly problematic if the download directory is a
> descent of some top layer git repo (e.g. the default with poky), as the
> commands that operate on the remote later will then mangle the layers
> git repository instead of the download git repo.
>
> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> ---
>   bitbake/lib/bb/fetch2/git.py | 30 +++++++++++++++++++++++++++++-
>   1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
> index 2a3c06fe4e9..33895e09b29 100644
> --- a/bitbake/lib/bb/fetch2/git.py
> +++ b/bitbake/lib/bb/fetch2/git.py
> @@ -65,6 +65,7 @@ import fnmatch
>   import os
>   import re
>   import shlex
> +import shutil
>   import subprocess
>   import tempfile
>   import bb
> @@ -365,8 +366,35 @@ class Git(FetchMethod):
>                   runfetchcmd(fetch_cmd, d, workdir=ud.clonedir)
>           repourl = self._get_repo_url(ud)
>
> +        needs_clone = False
> +        if os.path.exists(ud.clonedir):
> +            # The directory may exist, but not be the top level of a bare git
> +            # repository in which case it needs to be deleted and re-cloned.
> +            try:
> +                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
> +                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> +            except bb.fetch2.FetchError as e:
> +                logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
> +                needs_clone = True
> +            else:
> +                toplevel = os.path.abspath(output.rstrip())
> +                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
> +                # The top level Git directory must either be the clone directory
> +                # or a child of the clone directory. Any ancestor directory of
> +                # the clone directory is not valid as the Git directory (and
> +                # probably belongs to some other unrelated repository), so a
> +                # clone is required
> +                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
> +                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
> +                    needs_clone = True
> +
> +            if needs_clone:
> +                shutil.rmtree(ud.clonedir)
> +        else:
> +            needs_clone = True
> +
>           # If the repo still doesn't exist, fallback to cloning it
> -        if not os.path.exists(ud.clonedir):
> +        if needs_clone:
>               # We do this since git will use a "-l" option automatically for local urls where possible,
>               # but it doesn't work when git/objects is a symlink, only works when it is a directory.
>               if repourl.startswith("file://"):
> --
> 2.33.0
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#14979): https://lists.openembedded.org/g/bitbake-devel/message/14979
> Mute This Topic: https://lists.openembedded.org/mt/100942876/4454782
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [paulo@myneves.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Is it possible to factor the try/except else to a function? Then we 
would not need the else, and would just return immediately and have a 
self contained code, which in my opinion is easier to read. I also think 
f-string would be more readable for runfetchcmd. f-string are available 
since python 3.6.

Can you share the rationale for the else clause? I do not see anything 
inside the else clause that might raise a FetchError, thus the else 
clause is redundant.

Another change I suggest is to move the rmtree inside the usual 
need_clone and ignore the error if the path does not exist. Removes 
another conditional.

I also think using needs_clone to assert whether the rmtree should be 
done is confusing A different variable should be used or use a 
function's return value. All the code inside if 
os.path.exists(ud.clonedir): deals with the need for removal and as a 
side effect will trigger the clone in another part of the code.
Paulo Neves Aug. 25, 2023, 9:33 a.m. UTC | #2
On 25/08/2023 11:18, Paulo Neves wrote:

> On 24/08/2023 21:53, Joshua Watt wrote:
>
>> If the clone target directory exists and is a valid git repo, but the
>> git directory is not a child, it needs to be erased and re-cloned. One
>> example of how this can happen is if a clone creates the directory, but
>> then fails to actual clone and make it a git repository. This left-over
>> directory can be particularly problematic if the download directory is a
>> descent of some top layer git repo (e.g. the default with poky), as the
>> commands that operate on the remote later will then mangle the layers
>> git repository instead of the download git repo.
>>
>> Signed-off-by: Joshua Watt
>> [<JPEWhacker@gmail.com>](mailto:JPEWhacker@gmail.com)
>> ---
>>   bitbake/lib/bb/fetch2/git.py | 30 +++++++++++++++++++++++++++++-
>>   1 file changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
>> index 2a3c06fe4e9..33895e09b29 100644
>> --- a/bitbake/lib/bb/fetch2/git.py
>> +++ b/bitbake/lib/bb/fetch2/git.py
>> @@ -65,6 +65,7 @@ import fnmatch
>>   import os
>>   import re
>>   import shlex
>> +import shutil
>>   import subprocess
>>   import tempfile
>>   import bb
>> @@ -365,8 +366,35 @@ class Git(FetchMethod):
>>                   runfetchcmd(fetch_cmd, d, workdir=ud.clonedir)
>>           repourl = self._get_repo_url(ud)
>>
>> +        needs_clone = False
>> +        if os.path.exists(ud.clonedir):
>> +            # The directory may exist, but not be the top level of a bare git
>> +            # repository in which case it needs to be deleted and re-cloned.
>> +            try:
>> +                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
>> +                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
>> +            except bb.fetch2.FetchError as e:
>> +                logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
>> +                needs_clone = True
>> +            else:
>> +                toplevel = os.path.abspath(output.rstrip())
>> +                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
>> +                # The top level Git directory must either be the clone directory
>> +                # or a child of the clone directory. Any ancestor directory of
>> +                # the clone directory is not valid as the Git directory (and
>> +                # probably belongs to some other unrelated repository), so a
>> +                # clone is required
>> +                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
>> +                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
>> +                    needs_clone = True
>> +
>> +            if needs_clone:
>> +                shutil.rmtree(ud.clonedir)
>> +        else:
>> +            needs_clone = True
>> +
>>           # If the repo still doesn't exist, fallback to cloning it
>> -        if not os.path.exists(ud.clonedir):
>> +        if needs_clone:
>>               # We do this since git will use a "-l" option automatically for local urls where possible,
>>               # but it doesn't work when git/objects is a symlink, only works when it is a directory.
>>               if repourl.startswith(
>> ["file://"](file://)
>> ):
>> --
>> 2.33.0
>
> Is it possible to factor the try/except else to a function? Then we
> would not need the else, and would just return immediately and have a
> self contained code, which in my opinion is easier to read. I also think
> f-string would be more readable for runfetchcmd. f-string are available
> since python 3.6.
>
> Can you share the rationale for the else clause? I do not see anything
> inside the else clause that might raise a FetchError, thus the else
> clause is redundant.
>
> Another change I suggest is to move the rmtree inside the usual
> need_clone and ignore the error if the path does not exist. Removes
> another conditional.
>
> I also think using needs_clone to assert whether the rmtree should be
> done is confusing A different variable should be used or use a
> function's return value. All the code inside if
> os.path.exists(ud.clonedir): deals with the need for removal and as a
> side effect will trigger the clone in another part of the code.
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#14982):
> https://lists.openembedded.org/g/bitbake-devel/message/14982
> Mute This Topic:
> https://lists.openembedded.org/mt/100942876/4454782
> Group Owner:
> bitbake-devel+owner@lists.openembedded.org
> Unsubscribe:
> https://lists.openembedded.org/g/bitbake-devel/unsub
> [
> paulo@myneves.com
> ]
> -=-=-=-=-=-=-=-=-=-=-=-

> Another change I suggest is to move the rmtree inside the usual
need_clone and ignore the error if the path does not exist. Removes
another conditional.

Disregard this comment. It is incorrect.
Joshua Watt Aug. 25, 2023, 2:19 p.m. UTC | #3
On Fri, Aug 25, 2023 at 3:18 AM Paulo Neves <paulo@myneves.com> wrote:
>
> On 24/08/2023 21:53, Joshua Watt wrote:
> > If the clone target directory exists and is a valid git repo, but the
> > git directory is not a child, it needs to be erased and re-cloned. One
> > example of how this can happen is if a clone creates the directory, but
> > then fails to actual clone and make it a git repository. This left-over
> > directory can be particularly problematic if the download directory is a
> > descent of some top layer git repo (e.g. the default with poky), as the
> > commands that operate on the remote later will then mangle the layers
> > git repository instead of the download git repo.
> >
> > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> > ---
> >   bitbake/lib/bb/fetch2/git.py | 30 +++++++++++++++++++++++++++++-
> >   1 file changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
> > index 2a3c06fe4e9..33895e09b29 100644
> > --- a/bitbake/lib/bb/fetch2/git.py
> > +++ b/bitbake/lib/bb/fetch2/git.py
> > @@ -65,6 +65,7 @@ import fnmatch
> >   import os
> >   import re
> >   import shlex
> > +import shutil
> >   import subprocess
> >   import tempfile
> >   import bb
> > @@ -365,8 +366,35 @@ class Git(FetchMethod):
> >                   runfetchcmd(fetch_cmd, d, workdir=ud.clonedir)
> >           repourl = self._get_repo_url(ud)
> >
> > +        needs_clone = False
> > +        if os.path.exists(ud.clonedir):
> > +            # The directory may exist, but not be the top level of a bare git
> > +            # repository in which case it needs to be deleted and re-cloned.
> > +            try:
> > +                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
> > +                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> > +            except bb.fetch2.FetchError as e:
> > +                logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
> > +                needs_clone = True
> > +            else:
> > +                toplevel = os.path.abspath(output.rstrip())
> > +                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
> > +                # The top level Git directory must either be the clone directory
> > +                # or a child of the clone directory. Any ancestor directory of
> > +                # the clone directory is not valid as the Git directory (and
> > +                # probably belongs to some other unrelated repository), so a
> > +                # clone is required
> > +                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
> > +                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
> > +                    needs_clone = True
> > +
> > +            if needs_clone:
> > +                shutil.rmtree(ud.clonedir)
> > +        else:
> > +            needs_clone = True
> > +
> >           # If the repo still doesn't exist, fallback to cloning it
> > -        if not os.path.exists(ud.clonedir):
> > +        if needs_clone:
> >               # We do this since git will use a "-l" option automatically for local urls where possible,
> >               # but it doesn't work when git/objects is a symlink, only works when it is a directory.
> >               if repourl.startswith("file://"):
> > --
> > 2.33.0
> >
> >
> > -=-=-=-=-=-=-=-=-=-=-=-
> > Links: You receive all messages sent to this group.
> > View/Reply Online (#14979): https://lists.openembedded.org/g/bitbake-devel/message/14979
> > Mute This Topic: https://lists.openembedded.org/mt/100942876/4454782
> > Group Owner: bitbake-devel+owner@lists.openembedded.org
> > Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [paulo@myneves.com]
> > -=-=-=-=-=-=-=-=-=-=-=-
> >
> Is it possible to factor the try/except else to a function? Then we
> would not need the else, and would just return immediately and have a
> self contained code, which in my opinion is easier to read. I also think
> f-string would be more readable for runfetchcmd. f-string are available
> since python 3.6.
>
> Can you share the rationale for the else clause? I do not see anything
> inside the else clause that might raise a FetchError, thus the else
> clause is redundant.

The else clause is pretty much the entire reason for this patch in the
first place :)

The problem is that the fetcher assumes that the simple existence of
the download directory means it can can add a remote and clone there,
but that is not true. If your clone directory is a subdirectory of
another git repository, but not a git repository itself, bitbake will
(incorrectly) add the remote the higher level git repos and fetch
there, leaving it a mess.

More visually, if your layout is like this:

 poky <-- This is a git repository
   build
      downloads
          git2
            foo <-- This is NOT a git repository

The old code would see that the foo directory existed, see that it was
part of a git repo (poky!) add the remotes and start cloning objects
into the poky git repo, which is not at all what we want.

The else clause corrects this as it makes the fetcher check if "foo"
is not only a git repo, but that the it is the toplevel of a git repo
(or at least that it's --git-dir is not a parent of that directory).
If it's not, it deletes it because it's not a valid place to clone
things into anymore.

>
> Another change I suggest is to move the rmtree inside the usual
> need_clone and ignore the error if the path does not exist. Removes
> another conditional.
>
> I also think using needs_clone to assert whether the rmtree should be
> done is confusing A different variable should be used or use a
> function's return value. All the code inside if
> os.path.exists(ud.clonedir): deals with the need for removal and as a
> side effect will trigger the clone in another part of the code.
>
>
>
Paulo Neves Aug. 31, 2023, 12:21 p.m. UTC | #4
On 25/08/2023 16:19, Joshua Watt wrote:
> On Fri, Aug 25, 2023 at 3:18 AM Paulo Neves <paulo@myneves.com> wrote:
>> On 24/08/2023 21:53, Joshua Watt wrote:
>>> If the clone target directory exists and is a valid git repo, but the
>>> git directory is not a child, it needs to be erased and re-cloned. One
>>> example of how this can happen is if a clone creates the directory, but
>>> then fails to actual clone and make it a git repository. This left-over
>>> directory can be particularly problematic if the download directory is a
>>> descent of some top layer git repo (e.g. the default with poky), as the
>>> commands that operate on the remote later will then mangle the layers
>>> git repository instead of the download git repo.
>>>
>>> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
>>> ---
>>>    bitbake/lib/bb/fetch2/git.py | 30 +++++++++++++++++++++++++++++-
>>>    1 file changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
>>> index 2a3c06fe4e9..33895e09b29 100644
>>> --- a/bitbake/lib/bb/fetch2/git.py
>>> +++ b/bitbake/lib/bb/fetch2/git.py
>>> @@ -65,6 +65,7 @@ import fnmatch
>>>    import os
>>>    import re
>>>    import shlex
>>> +import shutil
>>>    import subprocess
>>>    import tempfile
>>>    import bb
>>> @@ -365,8 +366,35 @@ class Git(FetchMethod):
>>>                    runfetchcmd(fetch_cmd, d, workdir=ud.clonedir)
>>>            repourl = self._get_repo_url(ud)
>>>
>>> +        needs_clone = False
>>> +        if os.path.exists(ud.clonedir):
>>> +            # The directory may exist, but not be the top level of a bare git
>>> +            # repository in which case it needs to be deleted and re-cloned.
>>> +            try:
>>> +                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
>>> +                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
>>> +            except bb.fetch2.FetchError as e:
>>> +                logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
>>> +                needs_clone = True
>>> +            else:
>>> +                toplevel = os.path.abspath(output.rstrip())
>>> +                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
>>> +                # The top level Git directory must either be the clone directory
>>> +                # or a child of the clone directory. Any ancestor directory of
>>> +                # the clone directory is not valid as the Git directory (and
>>> +                # probably belongs to some other unrelated repository), so a
>>> +                # clone is required
>>> +                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
>>> +                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
>>> +                    needs_clone = True
>>> +
>>> +            if needs_clone:
>>> +                shutil.rmtree(ud.clonedir)
>>> +        else:
>>> +            needs_clone = True
>>> +
>>>            # If the repo still doesn't exist, fallback to cloning it
>>> -        if not os.path.exists(ud.clonedir):
>>> +        if needs_clone:
>>>                # We do this since git will use a "-l" option automatically for local urls where possible,
>>>                # but it doesn't work when git/objects is a symlink, only works when it is a directory.
>>>                if repourl.startswith("file://"):
>>> --
>>> 2.33.0
>>>
>>>
>>>
>>>
>> Is it possible to factor the try/except else to a function? Then we
>> would not need the else, and would just return immediately and have a
>> self contained code, which in my opinion is easier to read. I also think
>> f-string would be more readable for runfetchcmd. f-string are available
>> since python 3.6.
>>
>> Can you share the rationale for the else clause? I do not see anything
>> inside the else clause that might raise a FetchError, thus the else
>> clause is redundant.
> The else clause is pretty much the entire reason for this patch in the
> first place :)
>
> The problem is that the fetcher assumes that the simple existence of
> the download directory means it can can add a remote and clone there,
> but that is not true. If your clone directory is a subdirectory of
> another git repository, but not a git repository itself, bitbake will
> (incorrectly) add the remote the higher level git repos and fetch
> there, leaving it a mess.
>
> More visually, if your layout is like this:
>
>   poky <-- This is a git repository
>     build
>        downloads
>            git2
>              foo <-- This is NOT a git repository
>
> The old code would see that the foo directory existed, see that it was
> part of a git repo (poky!) add the remotes and start cloning objects
> into the poky git repo, which is not at all what we want.
>
> The else clause corrects this as it makes the fetcher check if "foo"
> is not only a git repo, but that the it is the toplevel of a git repo
> (or at least that it's --git-dir is not a parent of that directory).
> If it's not, it deletes it because it's not a valid place to clone
> things into anymore.

I understand the conceptual point of the patch and I think it is valid. 
What i mean is that you do not need the else clause. The code under the 
else clause can continue inside the try clause and not change the logic. 
The try-else idiom is very unusual and represents a meaning which is not 
needed here: That the code inside the else should not trigger the except 
clause.

>> Another change I suggest is to move the rmtree inside the usual
>> need_clone and ignore the error if the path does not exist. Removes
>> another conditional.
>>
>> I also think using needs_clone to assert whether the rmtree should be
>> done is confusing A different variable should be used or use a
>> function's return value. All the code inside if
>> os.path.exists(ud.clonedir): deals with the need for removal and as a
>> side effect will trigger the clone in another part of the code.
>>
>>
>>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#14984): https://lists.openembedded.org/g/bitbake-devel/message/14984
> Mute This Topic: https://lists.openembedded.org/mt/100942876/4454782
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [paulo@myneves.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Joshua Watt Aug. 31, 2023, 2:27 p.m. UTC | #5
On Thu, Aug 31, 2023 at 6:21 AM Paulo Neves <paulo@myneves.com> wrote:
>
> On 25/08/2023 16:19, Joshua Watt wrote:
> > On Fri, Aug 25, 2023 at 3:18 AM Paulo Neves <paulo@myneves.com> wrote:
> >> On 24/08/2023 21:53, Joshua Watt wrote:
> >>> If the clone target directory exists and is a valid git repo, but the
> >>> git directory is not a child, it needs to be erased and re-cloned. One
> >>> example of how this can happen is if a clone creates the directory, but
> >>> then fails to actual clone and make it a git repository. This left-over
> >>> directory can be particularly problematic if the download directory is a
> >>> descent of some top layer git repo (e.g. the default with poky), as the
> >>> commands that operate on the remote later will then mangle the layers
> >>> git repository instead of the download git repo.
> >>>
> >>> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> >>> ---
> >>>    bitbake/lib/bb/fetch2/git.py | 30 +++++++++++++++++++++++++++++-
> >>>    1 file changed, 29 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
> >>> index 2a3c06fe4e9..33895e09b29 100644
> >>> --- a/bitbake/lib/bb/fetch2/git.py
> >>> +++ b/bitbake/lib/bb/fetch2/git.py
> >>> @@ -65,6 +65,7 @@ import fnmatch
> >>>    import os
> >>>    import re
> >>>    import shlex
> >>> +import shutil
> >>>    import subprocess
> >>>    import tempfile
> >>>    import bb
> >>> @@ -365,8 +366,35 @@ class Git(FetchMethod):
> >>>                    runfetchcmd(fetch_cmd, d, workdir=ud.clonedir)
> >>>            repourl = self._get_repo_url(ud)
> >>>
> >>> +        needs_clone = False
> >>> +        if os.path.exists(ud.clonedir):
> >>> +            # The directory may exist, but not be the top level of a bare git
> >>> +            # repository in which case it needs to be deleted and re-cloned.
> >>> +            try:
> >>> +                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
> >>> +                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
> >>> +            except bb.fetch2.FetchError as e:
> >>> +                logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
> >>> +                needs_clone = True
> >>> +            else:
> >>> +                toplevel = os.path.abspath(output.rstrip())
> >>> +                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
> >>> +                # The top level Git directory must either be the clone directory
> >>> +                # or a child of the clone directory. Any ancestor directory of
> >>> +                # the clone directory is not valid as the Git directory (and
> >>> +                # probably belongs to some other unrelated repository), so a
> >>> +                # clone is required
> >>> +                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
> >>> +                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
> >>> +                    needs_clone = True
> >>> +
> >>> +            if needs_clone:
> >>> +                shutil.rmtree(ud.clonedir)
> >>> +        else:
> >>> +            needs_clone = True
> >>> +
> >>>            # If the repo still doesn't exist, fallback to cloning it
> >>> -        if not os.path.exists(ud.clonedir):
> >>> +        if needs_clone:
> >>>                # We do this since git will use a "-l" option automatically for local urls where possible,
> >>>                # but it doesn't work when git/objects is a symlink, only works when it is a directory.
> >>>                if repourl.startswith("file://"):
> >>> --
> >>> 2.33.0
> >>>
> >>>
> >>>
> >>>
> >> Is it possible to factor the try/except else to a function? Then we
> >> would not need the else, and would just return immediately and have a
> >> self contained code, which in my opinion is easier to read. I also think
> >> f-string would be more readable for runfetchcmd. f-string are available
> >> since python 3.6.
> >>
> >> Can you share the rationale for the else clause? I do not see anything
> >> inside the else clause that might raise a FetchError, thus the else
> >> clause is redundant.
> > The else clause is pretty much the entire reason for this patch in the
> > first place :)
> >
> > The problem is that the fetcher assumes that the simple existence of
> > the download directory means it can can add a remote and clone there,
> > but that is not true. If your clone directory is a subdirectory of
> > another git repository, but not a git repository itself, bitbake will
> > (incorrectly) add the remote the higher level git repos and fetch
> > there, leaving it a mess.
> >
> > More visually, if your layout is like this:
> >
> >   poky <-- This is a git repository
> >     build
> >        downloads
> >            git2
> >              foo <-- This is NOT a git repository
> >
> > The old code would see that the foo directory existed, see that it was
> > part of a git repo (poky!) add the remotes and start cloning objects
> > into the poky git repo, which is not at all what we want.
> >
> > The else clause corrects this as it makes the fetcher check if "foo"
> > is not only a git repo, but that the it is the toplevel of a git repo
> > (or at least that it's --git-dir is not a parent of that directory).
> > If it's not, it deletes it because it's not a valid place to clone
> > things into anymore.
>
> I understand the conceptual point of the patch and I think it is valid.
> What i mean is that you do not need the else clause. The code under the
> else clause can continue inside the try clause and not change the logic.
> The try-else idiom is very unusual and represents a meaning which is not
> needed here: That the code inside the else should not trigger the except
> clause.

Ah, yes. I see what you are saying now

>
> >> Another change I suggest is to move the rmtree inside the usual
> >> need_clone and ignore the error if the path does not exist. Removes
> >> another conditional.
> >>
> >> I also think using needs_clone to assert whether the rmtree should be
> >> done is confusing A different variable should be used or use a
> >> function's return value. All the code inside if
> >> os.path.exists(ud.clonedir): deals with the need for removal and as a
> >> side effect will trigger the clone in another part of the code.
> >>
> >>
> >>
> > -=-=-=-=-=-=-=-=-=-=-=-
> > Links: You receive all messages sent to this group.
> > View/Reply Online (#14984): https://lists.openembedded.org/g/bitbake-devel/message/14984
> > Mute This Topic: https://lists.openembedded.org/mt/100942876/4454782
> > Group Owner: bitbake-devel+owner@lists.openembedded.org
> > Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [paulo@myneves.com]
> > -=-=-=-=-=-=-=-=-=-=-=-
> >
>
>
diff mbox series

Patch

diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
index 2a3c06fe4e9..33895e09b29 100644
--- a/bitbake/lib/bb/fetch2/git.py
+++ b/bitbake/lib/bb/fetch2/git.py
@@ -65,6 +65,7 @@  import fnmatch
 import os
 import re
 import shlex
+import shutil
 import subprocess
 import tempfile
 import bb
@@ -365,8 +366,35 @@  class Git(FetchMethod):
                 runfetchcmd(fetch_cmd, d, workdir=ud.clonedir)
         repourl = self._get_repo_url(ud)
 
+        needs_clone = False
+        if os.path.exists(ud.clonedir):
+            # The directory may exist, but not be the top level of a bare git
+            # repository in which case it needs to be deleted and re-cloned.
+            try:
+                # Since clones can be bare, use --absolute-git-dir instead of --show-toplevel
+                output = runfetchcmd("LANG=C %s rev-parse --absolute-git-dir" % ud.basecmd, d, workdir=ud.clonedir)
+            except bb.fetch2.FetchError as e:
+                logger.warning("Unable to get top level for %s (not a git directory?): %s", ud.clonedir, e)
+                needs_clone = True
+            else:
+                toplevel = os.path.abspath(output.rstrip())
+                abs_clonedir = os.path.abspath(ud.clonedir).rstrip('/')
+                # The top level Git directory must either be the clone directory
+                # or a child of the clone directory. Any ancestor directory of
+                # the clone directory is not valid as the Git directory (and
+                # probably belongs to some other unrelated repository), so a
+                # clone is required
+                if os.path.commonprefix([abs_clonedir, toplevel]) != abs_clonedir:
+                    logger.warning("Top level directory '%s' doesn't match expected '%s'. Re-cloning", toplevel, ud.clonedir)
+                    needs_clone = True
+
+            if needs_clone:
+                shutil.rmtree(ud.clonedir)
+        else:
+            needs_clone = True
+
         # If the repo still doesn't exist, fallback to cloning it
-        if not os.path.exists(ud.clonedir):
+        if needs_clone:
             # We do this since git will use a "-l" option automatically for local urls where possible,
             # but it doesn't work when git/objects is a symlink, only works when it is a directory.
             if repourl.startswith("file://"):