diff mbox series

[V2] dnf: remove log_lock.pid before exit

Message ID 20240307091925.2324637-1-changqing.li@windriver.com
State New
Headers show
Series [V2] dnf: remove log_lock.pid before exit | expand

Commit Message

Changqing Li March 7, 2024, 9:19 a.m. UTC
From: Changqing Li <changqing.li@windriver.com>

dnf has a bug, refer [1], it causes that log_lock.pid may not removed
after dnf exit. And for native dnf, since we change the lock file path
to /, it will never be removed, and make the rootfs not clean,refer
[2][3].  This patch is a workaround to fix above issue.

[1] https://github.com/rpm-software-management/dnf/issues/1963
[2] https://git.openembedded.org/openembedded-core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66
[3] https://github.com/rpm-software-management/dnf/blob/master/etc/tmpfiles.d/dnf.conf

Signed-off-by: Changqing Li <changqing.li@windriver.com>
---
 ...n.py-remove-log_lock.pid-before-exit.patch | 41 +++++++++++++++++++
 meta/recipes-devtools/dnf/dnf_4.18.2.bb       |  3 +-
 2 files changed, 43 insertions(+), 1 deletion(-)
 create mode 100644 meta/recipes-devtools/dnf/dnf/0001-main.py-remove-log_lock.pid-before-exit.patch

Comments

Alexander Kanavin March 7, 2024, 9:46 a.m. UTC | #1
On Thu, 7 Mar 2024 at 10:19, Changqing Li
<changqing.li@eng.windriver.com> wrote:
> +Upstream-Status: Inappropriate [oe specific workaround]

You *just* said you will send this upstream as requested. Why is it
suddenly Inappropriate?

Alex
Alexander Kanavin March 7, 2024, 9:56 a.m. UTC | #2
On Thu, 7 Mar 2024 at 10:19, Changqing Li
<changqing.li@eng.windriver.com> wrote:
> ++        for arg in args:
> ++            if arg.startswith("--installroot="):
> ++                root=arg.split("=")[1]
> ++                if os.path.exists(os.path.join(root, "log_lock.pid")):
> ++                    os.unlink(os.path.join(root, "log_lock.pid"))
> ++                break

Apologies, but after looking at this more carefully I have to say no
once more. The obstacle is that the code checks for existence before
removing the lock, which means the issue of why the lock is left in
place to begin with is still unresolved. This doesn't always happen,
and we need to get to the bottom of *why*.

Alex
ChenQi March 7, 2024, 10:21 a.m. UTC | #3
Hi Alex,

You can see dnf's solution is: https://github.com/rpm-software-management/dnf/blob/master/etc/tmpfiles.d/dnf.conf

I don't think dnf community will look into this issue. And I would expect it to be a complicated one. Because dnf's own solution looks like more of a workaround. At the same time, yocto based systems will sometimes have log_lock.pid left in target filesystems. Users typing 'ls /' will notice it.

We have an OE specific patch: https://git.openembedded.org/openembedded-core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66
I can see that this OE specific patch, 0001-dnf-write-the-log-lock-to-root.patch, does solve some issue. Unfortunately, at the same time, it introduces this specific issue, a easy-to-notice one. 

My suggestion is that we merge this fix into 0001-dnf-write-the-log-lock-to-root.patch, because they really belong together. I guess we can't expect dnf community to devote time into this, as they've already got a solution. And I'm not sure if anyone in OE community is familiar with dnf enough to solve this issue. So let's make things a little better, avoiding users of Yocto systems see this log_lock.pid file when they type 'ls /'.

If you have any other suggestion, please let us know.

Regards,
Qi


-----Original Message-----
From: openembedded-core@lists.openembedded.org <openembedded-core@lists.openembedded.org> On Behalf Of Alexander Kanavin via lists.openembedded.org
Sent: Thursday, March 7, 2024 5:57 PM
To: Li, Changqing <Changqing.Li@windriver.com>; Richard Purdie <richard.purdie@linuxfoundation.org>
Cc: openembedded-core@lists.openembedded.org
Subject: Re: [OE-core] [PATCH V2] dnf: remove log_lock.pid before exit

On Thu, 7 Mar 2024 at 10:19, Changqing Li <changqing.li@eng.windriver.com> wrote:
> ++        for arg in args:
> ++            if arg.startswith("--installroot="):
> ++                root=arg.split("=")[1]
> ++                if os.path.exists(os.path.join(root, "log_lock.pid")):
> ++                    os.unlink(os.path.join(root, "log_lock.pid"))
> ++                break

Apologies, but after looking at this more carefully I have to say no once more. The obstacle is that the code checks for existence before removing the lock, which means the issue of why the lock is left in place to begin with is still unresolved. This doesn't always happen, and we need to get to the bottom of *why*.

Alex
Alexander Kanavin March 7, 2024, 11:42 a.m. UTC | #4
On Thu, 7 Mar 2024 at 11:21, Chen Qi via lists.openembedded.org
<Qi.Chen=windriver.com@lists.openembedded.org> wrote:
> You can see dnf's solution is: https://github.com/rpm-software-management/dnf/blob/master/etc/tmpfiles.d/dnf.conf
>
> I don't think dnf community will look into this issue. And I would expect it to be a complicated one. Because dnf's own solution looks like more of a workaround. At the same time, yocto based systems will sometimes have log_lock.pid left in target filesystems. Users typing 'ls /' will notice it.
>
> We have an OE specific patch: https://git.openembedded.org/openembedded-core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66
> I can see that this OE specific patch, 0001-dnf-write-the-log-lock-to-root.patch, does solve some issue. Unfortunately, at the same time, it introduces this specific issue, a easy-to-notice one.
>
> My suggestion is that we merge this fix into 0001-dnf-write-the-log-lock-to-root.patch, because they really belong together. I guess we can't expect dnf community to devote time into this, as they've already got a solution. And I'm not sure if anyone in OE community is familiar with dnf enough to solve this issue. So let's make things a little better, avoiding users of Yocto systems see this log_lock.pid file when they type 'ls /'.
>
> If you have any other suggestion, please let us know.

We need to find out why it happens, and why it happens only sometimes.
If lock file does not get properly deleted that is quite possibly
masking a different issue which needs fixing, and the proposed patch
just sweeps it all under the carpet in the name of giving users an
aesthetically pleasing rootfs. So the answer is still no.

If you are having difficulty debugging the situation where the lock
file does get left behind, can you provide a way for me to reproduce
the issue? I just build core-image-minimal, and I'm not seeing it:

alex@Zen2:/srv/storage/alex/yocto/build-64-alt$ ls
tmp/work/qemux86_64-poky-linux/core-image-minimal/1.0/rootfs/
bin  boot  dev  etc  home  lib  media  mnt  proc  root  run  sbin  srv
 sys  tmp  usr  var

If you can't provide that, then you can certainly still do more on
your side: build core-image-minimal with plain poky, ensure the lock
file gets deleted, then build your own private things where the file
does not get deleted, then investigate why the code path that deletes
the lock file isn't being taken in the latter case. You can see the
code where the lock log (self.rotate_lock) is used in logging.py:

    def emit(self, record):
        while True:
            try:
                if self.shouldRollover(record):
                    with self.rotate_lock:
                        # Do rollover while preserving the mode of the
new log file
                        mode = os.stat(self.baseFilename).st_mode
                        self.doRollover()
                        os.chmod(self.baseFilename, mode)
                logging.FileHandler.emit(self, record)
                return
            except (dnf.exceptions.ProcessLockError,
dnf.exceptions.ThreadLockError):
                time.sleep(0.01)
            except Exception:
                self.handleError(record)
                return

And the deletion happens in lock.py:

    def __exit__(self, *exc_args):
        if self.count == 1:
            os.unlink(self.target)
        self._unlock_thread()



Alex
Peter Kjellerstedt March 7, 2024, 2:32 p.m. UTC | #5
> -----Original Message-----
> From: openembedded-core@lists.openembedded.org <openembedded-
> core@lists.openembedded.org> On Behalf Of Alexander Kanavin
> Sent: den 7 mars 2024 12:42
> To: Qi.Chen@windriver.com
> Cc: Li, Changqing <Changqing.Li@windriver.com>; Richard Purdie
> <richard.purdie@linuxfoundation.org>; openembedded-
> core@lists.openembedded.org
> Subject: Re: [OE-core] [PATCH V2] dnf: remove log_lock.pid before exit
> 
> On Thu, 7 Mar 2024 at 11:21, Chen Qi via lists.openembedded.org
> <Qi.Chen=windriver.com@lists.openembedded.org> wrote:
> > You can see dnf's solution is: https://github.com/rpm-software-
> management/dnf/blob/master/etc/tmpfiles.d/dnf.conf
> >
> > I don't think dnf community will look into this issue. And I would
> expect it to be a complicated one. Because dnf's own solution looks like
> more of a workaround. At the same time, yocto based systems will sometimes
> have log_lock.pid left in target filesystems. Users typing 'ls /' will
> notice it.
> >
> > We have an OE specific patch: https://git.openembedded.org/openembedded-
> core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66
> > I can see that this OE specific patch, 0001-dnf-write-the-log-lock-to-
> root.patch, does solve some issue. Unfortunately, at the same time, it
> introduces this specific issue, a easy-to-notice one.
> >
> > My suggestion is that we merge this fix into 0001-dnf-write-the-log-
> lock-to-root.patch, because they really belong together. I guess we can't
> expect dnf community to devote time into this, as they've already got a
> solution. And I'm not sure if anyone in OE community is familiar with dnf
> enough to solve this issue. So let's make things a little better, avoiding
> users of Yocto systems see this log_lock.pid file when they type 'ls /'.
> >
> > If you have any other suggestion, please let us know.
> 
> We need to find out why it happens, and why it happens only sometimes.
> If lock file does not get properly deleted that is quite possibly
> masking a different issue which needs fixing, and the proposed patch
> just sweeps it all under the carpet in the name of giving users an
> aesthetically pleasing rootfs. So the answer is still no.
> 
> If you are having difficulty debugging the situation where the lock
> file does get left behind, can you provide a way for me to reproduce
> the issue? I just build core-image-minimal, and I'm not seeing it:
> 
> alex@Zen2:/srv/storage/alex/yocto/build-64-alt$ ls
> tmp/work/qemux86_64-poky-linux/core-image-minimal/1.0/rootfs/
> bin  boot  dev  etc  home  lib  media  mnt  proc  root  run  sbin  srv
>  sys  tmp  usr  var
> 
> If you can't provide that, then you can certainly still do more on
> your side: build core-image-minimal with plain poky, ensure the lock
> file gets deleted, then build your own private things where the file
> does not get deleted, then investigate why the code path that deletes
> the lock file isn't being taken in the latter case. You can see the
> code where the lock log (self.rotate_lock) is used in logging.py:
> 
>     def emit(self, record):
>         while True:
>             try:
>                 if self.shouldRollover(record):
>                     with self.rotate_lock:
>                         # Do rollover while preserving the mode of the new log file
>                         mode = os.stat(self.baseFilename).st_mode
>                         self.doRollover()
>                         os.chmod(self.baseFilename, mode)
>                 logging.FileHandler.emit(self, record)
>                 return
>             except (dnf.exceptions.ProcessLockError, dnf.exceptions.ThreadLockError):
>                 time.sleep(0.01)
>             except Exception:
>                 self.handleError(record)
>                 return
> 
> And the deletion happens in lock.py:
> 
>     def __exit__(self, *exc_args):
>         if self.count == 1:
>             os.unlink(self.target)
>         self._unlock_thread()
> 
> 
> 
> Alex

I did try to find a solution to this problem quite a while back, but 
unfortunately never managed to get to the bottom of it. What I did figure out 
was that the problem is related to whether dnf thinks it is running as root 
or not as that results in different paths being used. I.e., it may create the 
file in one place and then try to delete it in another. What I never could 
figure out is how/where dnf switches between root/not root as it seems to 
happen mid-run.

//Peter
Changqing Li March 15, 2024, 1:43 a.m. UTC | #6
On 3/7/24 19:42, Alexander Kanavin wrote:
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> On Thu, 7 Mar 2024 at 11:21, Chen Qi via lists.openembedded.org
> <Qi.Chen=windriver.com@lists.openembedded.org>  wrote:
>> You can see dnf's solution is:https://github.com/rpm-software-management/dnf/blob/master/etc/tmpfiles.d/dnf.conf
>>
>> I don't think dnf community will look into this issue. And I would expect it to be a complicated one. Because dnf's own solution looks like more of a workaround. At the same time, yocto based systems will sometimes have log_lock.pid left in target filesystems. Users typing 'ls /' will notice it.
>>
>> We have an OE specific patch:https://git.openembedded.org/openembedded-core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66
>> I can see that this OE specific patch, 0001-dnf-write-the-log-lock-to-root.patch, does solve some issue. Unfortunately, at the same time, it introduces this specific issue, a easy-to-notice one.
>>
>> My suggestion is that we merge this fix into 0001-dnf-write-the-log-lock-to-root.patch, because they really belong together. I guess we can't expect dnf community to devote time into this, as they've already got a solution. And I'm not sure if anyone in OE community is familiar with dnf enough to solve this issue. So let's make things a little better, avoiding users of Yocto systems see this log_lock.pid file when they type 'ls /'.
>>
>> If you have any other suggestion, please let us know.
> We need to find out why it happens, and why it happens only sometimes.
> If lock file does not get properly deleted that is quite possibly
> masking a different issue which needs fixing, and the proposed patch
> just sweeps it all under the carpet in the name of giving users an
> aesthetically pleasing rootfs. So the answer is still no.
>
> If you are having difficulty debugging the situation where the lock
> file does get left behind, can you provide a way for me to reproduce
> the issue? I just build core-image-minimal, and I'm not seeing it:
>
> alex@Zen2:/srv/storage/alex/yocto/build-64-alt$ ls
> tmp/work/qemux86_64-poky-linux/core-image-minimal/1.0/rootfs/
> bin  boot  dev  etc  home  lib  media  mnt  proc  root  run  sbin  srv
>   sys  tmp  usr  var

This issue only happened when dnf log rotate is happened. maybe install

"a lot" packages can trigger reproduce this issue more quickly.

>
> If you can't provide that, then you can certainly still do more on
> your side: build core-image-minimal with plain poky, ensure the lock
> file gets deleted, then build your own private things where the file
> does not get deleted, then investigate why the code path that deletes
> the lock file isn't being taken in the latter case. You can see the
> code where the lock log (self.rotate_lock) is used in logging.py:
>
>      def emit(self, record):
>          while True:
>              try:
>                  if self.shouldRollover(record):
>                      with self.rotate_lock:
>                          # Do rollover while preserving the mode of the
> new log file
>                          mode = os.stat(self.baseFilename).st_mode
>                          self.doRollover()
>                          os.chmod(self.baseFilename, mode)
>                  logging.FileHandler.emit(self, record)
>                  return
>              except (dnf.exceptions.ProcessLockError,
> dnf.exceptions.ThreadLockError):
>                  time.sleep(0.01)
>              except Exception:
>                  self.handleError(record)
>                  return
>
> And the deletion happens in lock.py:
>
>      def __exit__(self, *exc_args):
>          if self.count == 1:
>              os.unlink(self.target)
>          self._unlock_thread()

Hi,

Thanks for trying to help.  I had try to find the root cause when I 
opened the upstream issue, but unfortunately, not got the root cause,  I 
thought

it should be a complicated one. After do more investigating recently, 
the root cause should be found, and I have send pull request [1] to 
upsteam dnf ,

will backport the patch if upstream accept the fix.

[1] https://github.com/rpm-software-management/dnf/pull/2065

//Changqing

>
>
>
> Alex
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#196785):https://lists.openembedded.org/g/openembedded-core/message/196785
> Mute This Topic:https://lists.openembedded.org/mt/104784184/3616873
> Group Owner:openembedded-core+owner@lists.openembedded.org
> Unsubscribe:https://lists.openembedded.org/g/openembedded-core/unsub  [changqing.li@eng.windriver.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Alexander Kanavin March 16, 2024, 4:39 p.m. UTC | #7
On Fri, 15 Mar 2024 at 02:43, Changqing Li <changqing.li@windriver.com> wrote:
> This issue only happened when dnf log rotate is happened. maybe install
> "a lot" packages can trigger reproduce this issue more quickly.

Maybe logrotate threshold can as well be adjusted to happen on small,
easily reproducible installations?

> Thanks for trying to help.  I had try to find the root cause when I opened the upstream issue, but unfortunately, not got the root cause,  I thought
>
> it should be a complicated one. After do more investigating recently, the root cause should be found, and I have send pull request [1] to upsteam dnf ,
>
> will backport the patch if upstream accept the fix.
>
> [1] https://github.com/rpm-software-management/dnf/pull/2065

Thanks, that looks roughly correct to me, let's see what upstream says.

Alex
diff mbox series

Patch

diff --git a/meta/recipes-devtools/dnf/dnf/0001-main.py-remove-log_lock.pid-before-exit.patch b/meta/recipes-devtools/dnf/dnf/0001-main.py-remove-log_lock.pid-before-exit.patch
new file mode 100644
index 0000000000..2dd4969d57
--- /dev/null
+++ b/meta/recipes-devtools/dnf/dnf/0001-main.py-remove-log_lock.pid-before-exit.patch
@@ -0,0 +1,41 @@ 
+From bdf17281385cf33ad59267fe75a9e427e6e2ffc4 Mon Sep 17 00:00:00 2001
+From: Changqing Li <changqing.li@windriver.com>
+Date: Thu, 7 Mar 2024 14:03:07 +0800
+Subject: [PATCH] main.py: remove log_lock.pid before exit
+
+dnf has a bug, refer [1], it causes that log_lock.pid may not removed
+after dnf exit. And for native dnf, since we change the lock file path
+to /, it will never be removed, and make the rootfs not clean,refer
+[2][3].  This patch is a workaround to fix above issue.
+
+[1] https://github.com/rpm-software-management/dnf/issues/1963
+[2] https://git.openembedded.org/openembedded-core/commit/?id=742a1b71249f4da1c8d8e13e270b0eb6128a3f66
+[3] https://github.com/rpm-software-management/dnf/blob/master/etc/tmpfiles.d/dnf.conf
+
+Upstream-Status: Inappropriate [oe specific workaround]
+
+Signed-off-by: Changqing Li <changqing.li@windriver.com>
+---
+ dnf/cli/main.py | 6 ++++++
+ 1 file changed, 6 insertions(+)
+
+diff --git a/dnf/cli/main.py b/dnf/cli/main.py
+index 2a7f92d54..2f672cff2 100644
+--- a/dnf/cli/main.py
++++ b/dnf/cli/main.py
+@@ -200,6 +200,12 @@ def user_main(args, exit_code=False):
+ 
+     errcode = main(args)
+     if exit_code:
++        for arg in args:
++            if arg.startswith("--installroot="):
++                root=arg.split("=")[1]
++                if os.path.exists(os.path.join(root, "log_lock.pid")):
++                    os.unlink(os.path.join(root, "log_lock.pid"))
++                break
+         sys.exit(errcode)
+     return errcode
+ 
+-- 
+2.25.1
+
diff --git a/meta/recipes-devtools/dnf/dnf_4.18.2.bb b/meta/recipes-devtools/dnf/dnf_4.18.2.bb
index dc0c18be5e..7770c9d7e0 100644
--- a/meta/recipes-devtools/dnf/dnf_4.18.2.bb
+++ b/meta/recipes-devtools/dnf/dnf_4.18.2.bb
@@ -17,7 +17,8 @@  SRC_URI = "git://github.com/rpm-software-management/dnf.git;branch=master;protoc
            file://0001-set-python-path-for-completion_helper.patch \
            "
 
-SRC_URI:append:class-native = "file://0001-dnf-write-the-log-lock-to-root.patch"
+SRC_URI:append:class-native = "file://0001-dnf-write-the-log-lock-to-root.patch \
+                               file://0001-main.py-remove-log_lock.pid-before-exit.patch"
 
 SRCREV = "1c43d0999178d492381ad0b43917ffd9c74016f8"
 UPSTREAM_CHECK_GITTAGREGEX = "(?P<pver>\d+(\.\d+)+)"