diff mbox series

command: Avoid time intensive distractions for ping

Message ID 20230830213303.2043607-1-richard.purdie@linuxfoundation.org
State Accepted, archived
Commit 0fc821a22f2b49cbd336d9658d92942c0d733be1
Headers show
Series command: Avoid time intensive distractions for ping | expand

Commit Message

Richard Purdie Aug. 30, 2023, 9:33 p.m. UTC
We noticed some users were seeing very slow ping response times which
caused 'server timeout' issues. There were some function calls in runCommand
which could be slow such as the inotify callback processing.

Mark up the ping command such that it doesn't need configuration information,
it is allowed on a readonly server and specifically skip the inofity processing
too since ping would never need that.

This will hopefully resolve various ping timeout issues that were being reported.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 lib/bb/command.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Mikko Rapeli Aug. 31, 2023, 8:51 a.m. UTC | #1
Hi,

On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote:
> We noticed some users were seeing very slow ping response times which
> caused 'server timeout' issues. There were some function calls in runCommand
> which could be slow such as the inotify callback processing.

I guess the error looks like:

Timeout while waiting for a reply from the bitbake server (60s)

and happens right after a succesful build to me when I try to run any bitbake
command.

Hope this fixes it. Thanks!

-Mikko
Richard Purdie Aug. 31, 2023, 8:56 a.m. UTC | #2
On Thu, 2023-08-31 at 11:51 +0300, Mikko Rapeli wrote:
> Hi,
> 
> On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote:
> > We noticed some users were seeing very slow ping response times which
> > caused 'server timeout' issues. There were some function calls in runCommand
> > which could be slow such as the inotify callback processing.
> 
> I guess the error looks like:
> 
> Timeout while waiting for a reply from the bitbake server (60s)
> 
> and happens right after a succesful build to me when I try to run any bitbake
> command.
> 
> Hope this fixes it. Thanks!

That is the message however the cause may be different in your case. It
sounds like the server may be taking a long time to shut down for you
and it can't connect or start a new one whilst that is happening.

If you have the bitbake-cookerdaemon log for your case I can see if
that sheds any light if this doesn't work for you.

Cheers,

Richard
Mikko Rapeli Aug. 31, 2023, 11:30 a.m. UTC | #3
Hi,

On Thu, Aug 31, 2023 at 09:56:33AM +0100, Richard Purdie wrote:
> On Thu, 2023-08-31 at 11:51 +0300, Mikko Rapeli wrote:
> > Hi,
> > 
> > On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote:
> > > We noticed some users were seeing very slow ping response times which
> > > caused 'server timeout' issues. There were some function calls in runCommand
> > > which could be slow such as the inotify callback processing.
> > 
> > I guess the error looks like:
> > 
> > Timeout while waiting for a reply from the bitbake server (60s)
> > 
> > and happens right after a succesful build to me when I try to run any bitbake
> > command.
> > 
> > Hope this fixes it. Thanks!
> 
> That is the message however the cause may be different in your case. It
> sounds like the server may be taking a long time to shut down for you
> and it can't connect or start a new one whilst that is happening.
> 
> If you have the bitbake-cookerdaemon log for your case I can see if
> that sheds any light if this doesn't work for you.

(Traceback (most recent call last):
  File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 292, in main
    self.command_channel_reply.send(self.cooker.command.runCommand(command, self))
  File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 868, in send
    self._send(obj)
  File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 847, in _send
    self.writer.send_bytes(obj)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
)
4131625 12:21:32.806544 Processing Client
4131625 12:21:32.806635 Disconnecting Client (socket: True)
4131625 12:21:32.807999 No timeout, exiting.
4131625 12:21:32.908254 Exiting (socket: True)
4131625 12:21:32.982669 More than one thread left?: [<_MainThread(MainThread, started 140013699211712)>, <Thread(asyncio_0, started 140012031006272)>]
4131625 12:21:34.433909 Exiting as we could obtain the lock
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/builder/src/base/build/bitbake-cookerdaemon.log' mode='a+' encoding='UTF-8'>

I don't know what else to look for in the log.

Cheers,

-Mikko
Richard Purdie Aug. 31, 2023, 12:01 p.m. UTC | #4
On Thu, 2023-08-31 at 14:30 +0300, Mikko Rapeli wrote:
> Hi,
> 
> On Thu, Aug 31, 2023 at 09:56:33AM +0100, Richard Purdie wrote:
> > On Thu, 2023-08-31 at 11:51 +0300, Mikko Rapeli wrote:
> > > Hi,
> > > 
> > > On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote:
> > > > We noticed some users were seeing very slow ping response times which
> > > > caused 'server timeout' issues. There were some function calls in runCommand
> > > > which could be slow such as the inotify callback processing.
> > > 
> > > I guess the error looks like:
> > > 
> > > Timeout while waiting for a reply from the bitbake server (60s)
> > > 
> > > and happens right after a succesful build to me when I try to run any bitbake
> > > command.
> > > 
> > > Hope this fixes it. Thanks!
> > 
> > That is the message however the cause may be different in your case. It
> > sounds like the server may be taking a long time to shut down for you
> > and it can't connect or start a new one whilst that is happening.
> > 
> > If you have the bitbake-cookerdaemon log for your case I can see if
> > that sheds any light if this doesn't work for you.
> 
> (Traceback (most recent call last):
>   File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 292, in main
>     self.command_channel_reply.send(self.cooker.command.runCommand(command, self))
>   File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 868, in send
>     self._send(obj)
>   File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 847, in _send
>     self.writer.send_bytes(obj)
>   File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes
>     self._send_bytes(m[offset:offset + size])
>   File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes
>     self._send(header + buf)
>   File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send
>     n = write(self._handle, buf)
> BrokenPipeError: [Errno 32] Broken pipe
> )
> 4131625 12:21:32.806544 Processing Client
> 4131625 12:21:32.806635 Disconnecting Client (socket: True)
> 4131625 12:21:32.807999 No timeout, exiting.
> 4131625 12:21:32.908254 Exiting (socket: True)
> 4131625 12:21:32.982669 More than one thread left?: [<_MainThread(MainThread, started 140013699211712)>, <Thread(asyncio_0, started 140012031006272)>]
> 4131625 12:21:34.433909 Exiting as we could obtain the lock
> sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/builder/src/base/build/bitbake-cookerdaemon.log' mode='a+' encoding='UTF-8'>
> 
> I don't know what else to look for in the log.

What happened before that?

The pipe error was the server trying to reply to the client but the
client was gone. We should be able to see the command running before
that and if it was a ping or not and the timestamps?

Cheers,

Richard
Mikko Rapeli Sept. 4, 2023, 6:47 a.m. UTC | #5
Hi,

On Thu, Aug 31, 2023 at 01:01:30PM +0100, Richard Purdie wrote:
> On Thu, 2023-08-31 at 14:30 +0300, Mikko Rapeli wrote:
> > Hi,
> > 
> > On Thu, Aug 31, 2023 at 09:56:33AM +0100, Richard Purdie wrote:
> > > On Thu, 2023-08-31 at 11:51 +0300, Mikko Rapeli wrote:
> > > > Hi,
> > > > 
> > > > On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote:
> > > > > We noticed some users were seeing very slow ping response times which
> > > > > caused 'server timeout' issues. There were some function calls in runCommand
> > > > > which could be slow such as the inotify callback processing.
> > > > 
> > > > I guess the error looks like:
> > > > 
> > > > Timeout while waiting for a reply from the bitbake server (60s)
> > > > 
> > > > and happens right after a succesful build to me when I try to run any bitbake
> > > > command.
> > > > 
> > > > Hope this fixes it. Thanks!
> > > 
> > > That is the message however the cause may be different in your case. It
> > > sounds like the server may be taking a long time to shut down for you
> > > and it can't connect or start a new one whilst that is happening.
> > > 
> > > If you have the bitbake-cookerdaemon log for your case I can see if
> > > that sheds any light if this doesn't work for you.
> > 
> > (Traceback (most recent call last):
> >   File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 292, in main
> >     self.command_channel_reply.send(self.cooker.command.runCommand(command, self))
> >   File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 868, in send
> >     self._send(obj)
> >   File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 847, in _send
> >     self.writer.send_bytes(obj)
> >   File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes
> >     self._send_bytes(m[offset:offset + size])
> >   File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes
> >     self._send(header + buf)
> >   File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send
> >     n = write(self._handle, buf)
> > BrokenPipeError: [Errno 32] Broken pipe
> > )
> > 4131625 12:21:32.806544 Processing Client
> > 4131625 12:21:32.806635 Disconnecting Client (socket: True)
> > 4131625 12:21:32.807999 No timeout, exiting.
> > 4131625 12:21:32.908254 Exiting (socket: True)
> > 4131625 12:21:32.982669 More than one thread left?: [<_MainThread(MainThread, started 140013699211712)>, <Thread(asyncio_0, started 140012031006272)>]
> > 4131625 12:21:34.433909 Exiting as we could obtain the lock
> > sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/builder/src/base/build/bitbake-cookerdaemon.log' mode='a+' encoding='UTF-8'>
> > 
> > I don't know what else to look for in the log.
> 
> What happened before that?
> 
> The pipe error was the server trying to reply to the client but the
> client was gone. We should be able to see the command running before
> that and if it was a ping or not and the timestamps?

bitbake is quite sensitive to IO on my Ubuntu 22.04 build machine
with zfs. This is how I trigger this reliably:

$ mv tmp foo ; rm -rf foo & bitbake image
[1] 2709733
Timeout while waiting for a reply from the bitbake server (60s)

I have rm_work enabled so tmp is not that big except for the deployed files.

Cheers,

-Mikko
Richard Purdie Sept. 4, 2023, 7:07 a.m. UTC | #6
On Mon, 2023-09-04 at 09:47 +0300, Mikko Rapeli wrote:
> Hi,
> 
> On Thu, Aug 31, 2023 at 01:01:30PM +0100, Richard Purdie wrote:
> > On Thu, 2023-08-31 at 14:30 +0300, Mikko Rapeli wrote:
> > > Hi,
> > > 
> > > On Thu, Aug 31, 2023 at 09:56:33AM +0100, Richard Purdie wrote:
> > > > On Thu, 2023-08-31 at 11:51 +0300, Mikko Rapeli wrote:
> > > > > Hi,
> > > > > 
> > > > > On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote:
> > > > > > We noticed some users were seeing very slow ping response times which
> > > > > > caused 'server timeout' issues. There were some function calls in runCommand
> > > > > > which could be slow such as the inotify callback processing.
> > > > > 
> > > > > I guess the error looks like:
> > > > > 
> > > > > Timeout while waiting for a reply from the bitbake server (60s)
> > > > > 
> > > > > and happens right after a succesful build to me when I try to run any bitbake
> > > > > command.
> > > > > 
> > > > > Hope this fixes it. Thanks!
> > > > 
> > > > That is the message however the cause may be different in your case. It
> > > > sounds like the server may be taking a long time to shut down for you
> > > > and it can't connect or start a new one whilst that is happening.
> > > > 
> > > > If you have the bitbake-cookerdaemon log for your case I can see if
> > > > that sheds any light if this doesn't work for you.
> > > 
> > > (Traceback (most recent call last):
> > >   File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 292, in main
> > >     self.command_channel_reply.send(self.cooker.command.runCommand(command, self))
> > >   File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 868, in send
> > >     self._send(obj)
> > >   File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 847, in _send
> > >     self.writer.send_bytes(obj)
> > >   File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes
> > >     self._send_bytes(m[offset:offset + size])
> > >   File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes
> > >     self._send(header + buf)
> > >   File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send
> > >     n = write(self._handle, buf)
> > > BrokenPipeError: [Errno 32] Broken pipe
> > > )
> > > 4131625 12:21:32.806544 Processing Client
> > > 4131625 12:21:32.806635 Disconnecting Client (socket: True)
> > > 4131625 12:21:32.807999 No timeout, exiting.
> > > 4131625 12:21:32.908254 Exiting (socket: True)
> > > 4131625 12:21:32.982669 More than one thread left?: [<_MainThread(MainThread, started 140013699211712)>, <Thread(asyncio_0, started 140012031006272)>]
> > > 4131625 12:21:34.433909 Exiting as we could obtain the lock
> > > sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/builder/src/base/build/bitbake-cookerdaemon.log' mode='a+' encoding='UTF-8'>
> > > 
> > > I don't know what else to look for in the log.
> > 
> > What happened before that?
> > 
> > The pipe error was the server trying to reply to the client but the
> > client was gone. We should be able to see the command running before
> > that and if it was a ping or not and the timestamps?
> 
> bitbake is quite sensitive to IO on my Ubuntu 22.04 build machine
> with zfs. This is how I trigger this reliably:
> 
> $ mv tmp foo ; rm -rf foo & bitbake image
> [1] 2709733
> Timeout while waiting for a reply from the bitbake server (60s)
> 
> I have rm_work enabled so tmp is not that big except for the deployed files.

That is quite a different usecase to the one being fixed by the patch.

When bitbake starts up there are a few heavy IO things it needs to do
such as loading a hash equivalence database which may be large, loading
the parse cache and iterating through all the metadata to check it's
validity.

If your system has IO challenges, doing that in parallell with a large
deletion may be challenging. If the server doesn't start quickly
enough, you'll see the timeout.

Note that it isn't a ping which is timing out but an actual command.

Later when builds are running, ping is used as a lightweight way to
check the server is still there if there has been no communication for
a while. This patch fixes the ping command to really be lightweight so
that builds don't fail on a loaded system during the build.

It would be nice to find a way to improve the startup so that the
timeouts are avoided too but I'm less sure how to do that. Even if the
ping did work, if the commands themselves are slow/stalling, that is a
different set of challenges...

Cheers,

Richard
diff mbox series

Patch

diff --git a/lib/bb/command.py b/lib/bb/command.py
index a355f56c60..b494f84a0a 100644
--- a/lib/bb/command.py
+++ b/lib/bb/command.py
@@ -65,7 +65,7 @@  class Command:
         command = commandline.pop(0)
 
         # Ensure cooker is ready for commands
-        if command != "updateConfig" and command != "setFeatures":
+        if command not in ["updateConfig", "setFeatures", "ping"]:
             try:
                 self.cooker.init_configdata()
                 if not self.remotedatastores:
@@ -85,7 +85,8 @@  class Command:
                 if not hasattr(command_method, 'readonly') or not getattr(command_method, 'readonly'):
                     return None, "Not able to execute not readonly commands in readonly mode"
             try:
-                self.cooker.process_inotify_updates_apply()
+                if command != "ping":
+                    self.cooker.process_inotify_updates_apply()
                 if getattr(command_method, 'needconfig', True):
                     self.cooker.updateCacheSync()
                 result = command_method(self, commandline)
@@ -169,6 +170,8 @@  class CommandsSync:
         Allow a UI to check the server is still alive
         """
         return "Still alive!"
+    ping.needconfig = False
+    ping.readonly = True
 
     def stateShutdown(self, command, params):
         """