Message ID | 20230830213303.2043607-1-richard.purdie@linuxfoundation.org |
---|---|
State | Accepted, archived |
Commit | 0fc821a22f2b49cbd336d9658d92942c0d733be1 |
Headers | show |
Series | command: Avoid time intensive distractions for ping | expand |
Hi, On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote: > We noticed some users were seeing very slow ping response times which > caused 'server timeout' issues. There were some function calls in runCommand > which could be slow such as the inotify callback processing. I guess the error looks like: Timeout while waiting for a reply from the bitbake server (60s) and happens right after a succesful build to me when I try to run any bitbake command. Hope this fixes it. Thanks! -Mikko
On Thu, 2023-08-31 at 11:51 +0300, Mikko Rapeli wrote: > Hi, > > On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote: > > We noticed some users were seeing very slow ping response times which > > caused 'server timeout' issues. There were some function calls in runCommand > > which could be slow such as the inotify callback processing. > > I guess the error looks like: > > Timeout while waiting for a reply from the bitbake server (60s) > > and happens right after a succesful build to me when I try to run any bitbake > command. > > Hope this fixes it. Thanks! That is the message however the cause may be different in your case. It sounds like the server may be taking a long time to shut down for you and it can't connect or start a new one whilst that is happening. If you have the bitbake-cookerdaemon log for your case I can see if that sheds any light if this doesn't work for you. Cheers, Richard
Hi, On Thu, Aug 31, 2023 at 09:56:33AM +0100, Richard Purdie wrote: > On Thu, 2023-08-31 at 11:51 +0300, Mikko Rapeli wrote: > > Hi, > > > > On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote: > > > We noticed some users were seeing very slow ping response times which > > > caused 'server timeout' issues. There were some function calls in runCommand > > > which could be slow such as the inotify callback processing. > > > > I guess the error looks like: > > > > Timeout while waiting for a reply from the bitbake server (60s) > > > > and happens right after a succesful build to me when I try to run any bitbake > > command. > > > > Hope this fixes it. Thanks! > > That is the message however the cause may be different in your case. It > sounds like the server may be taking a long time to shut down for you > and it can't connect or start a new one whilst that is happening. > > If you have the bitbake-cookerdaemon log for your case I can see if > that sheds any light if this doesn't work for you. (Traceback (most recent call last): File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 292, in main self.command_channel_reply.send(self.cooker.command.runCommand(command, self)) File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 868, in send self._send(obj) File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 847, in _send self.writer.send_bytes(obj) File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes self._send_bytes(m[offset:offset + size]) File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes self._send(header + buf) File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send n = write(self._handle, buf) BrokenPipeError: [Errno 32] Broken pipe ) 4131625 12:21:32.806544 Processing Client 4131625 12:21:32.806635 Disconnecting Client (socket: True) 4131625 12:21:32.807999 No timeout, exiting. 4131625 12:21:32.908254 Exiting (socket: True) 4131625 12:21:32.982669 More than one thread left?: [<_MainThread(MainThread, started 140013699211712)>, <Thread(asyncio_0, started 140012031006272)>] 4131625 12:21:34.433909 Exiting as we could obtain the lock sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/builder/src/base/build/bitbake-cookerdaemon.log' mode='a+' encoding='UTF-8'> I don't know what else to look for in the log. Cheers, -Mikko
On Thu, 2023-08-31 at 14:30 +0300, Mikko Rapeli wrote: > Hi, > > On Thu, Aug 31, 2023 at 09:56:33AM +0100, Richard Purdie wrote: > > On Thu, 2023-08-31 at 11:51 +0300, Mikko Rapeli wrote: > > > Hi, > > > > > > On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote: > > > > We noticed some users were seeing very slow ping response times which > > > > caused 'server timeout' issues. There were some function calls in runCommand > > > > which could be slow such as the inotify callback processing. > > > > > > I guess the error looks like: > > > > > > Timeout while waiting for a reply from the bitbake server (60s) > > > > > > and happens right after a succesful build to me when I try to run any bitbake > > > command. > > > > > > Hope this fixes it. Thanks! > > > > That is the message however the cause may be different in your case. It > > sounds like the server may be taking a long time to shut down for you > > and it can't connect or start a new one whilst that is happening. > > > > If you have the bitbake-cookerdaemon log for your case I can see if > > that sheds any light if this doesn't work for you. > > (Traceback (most recent call last): > File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 292, in main > self.command_channel_reply.send(self.cooker.command.runCommand(command, self)) > File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 868, in send > self._send(obj) > File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 847, in _send > self.writer.send_bytes(obj) > File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes > self._send_bytes(m[offset:offset + size]) > File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes > self._send(header + buf) > File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send > n = write(self._handle, buf) > BrokenPipeError: [Errno 32] Broken pipe > ) > 4131625 12:21:32.806544 Processing Client > 4131625 12:21:32.806635 Disconnecting Client (socket: True) > 4131625 12:21:32.807999 No timeout, exiting. > 4131625 12:21:32.908254 Exiting (socket: True) > 4131625 12:21:32.982669 More than one thread left?: [<_MainThread(MainThread, started 140013699211712)>, <Thread(asyncio_0, started 140012031006272)>] > 4131625 12:21:34.433909 Exiting as we could obtain the lock > sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/builder/src/base/build/bitbake-cookerdaemon.log' mode='a+' encoding='UTF-8'> > > I don't know what else to look for in the log. What happened before that? The pipe error was the server trying to reply to the client but the client was gone. We should be able to see the command running before that and if it was a ping or not and the timestamps? Cheers, Richard
Hi, On Thu, Aug 31, 2023 at 01:01:30PM +0100, Richard Purdie wrote: > On Thu, 2023-08-31 at 14:30 +0300, Mikko Rapeli wrote: > > Hi, > > > > On Thu, Aug 31, 2023 at 09:56:33AM +0100, Richard Purdie wrote: > > > On Thu, 2023-08-31 at 11:51 +0300, Mikko Rapeli wrote: > > > > Hi, > > > > > > > > On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote: > > > > > We noticed some users were seeing very slow ping response times which > > > > > caused 'server timeout' issues. There were some function calls in runCommand > > > > > which could be slow such as the inotify callback processing. > > > > > > > > I guess the error looks like: > > > > > > > > Timeout while waiting for a reply from the bitbake server (60s) > > > > > > > > and happens right after a succesful build to me when I try to run any bitbake > > > > command. > > > > > > > > Hope this fixes it. Thanks! > > > > > > That is the message however the cause may be different in your case. It > > > sounds like the server may be taking a long time to shut down for you > > > and it can't connect or start a new one whilst that is happening. > > > > > > If you have the bitbake-cookerdaemon log for your case I can see if > > > that sheds any light if this doesn't work for you. > > > > (Traceback (most recent call last): > > File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 292, in main > > self.command_channel_reply.send(self.cooker.command.runCommand(command, self)) > > File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 868, in send > > self._send(obj) > > File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 847, in _send > > self.writer.send_bytes(obj) > > File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes > > self._send_bytes(m[offset:offset + size]) > > File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes > > self._send(header + buf) > > File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send > > n = write(self._handle, buf) > > BrokenPipeError: [Errno 32] Broken pipe > > ) > > 4131625 12:21:32.806544 Processing Client > > 4131625 12:21:32.806635 Disconnecting Client (socket: True) > > 4131625 12:21:32.807999 No timeout, exiting. > > 4131625 12:21:32.908254 Exiting (socket: True) > > 4131625 12:21:32.982669 More than one thread left?: [<_MainThread(MainThread, started 140013699211712)>, <Thread(asyncio_0, started 140012031006272)>] > > 4131625 12:21:34.433909 Exiting as we could obtain the lock > > sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/builder/src/base/build/bitbake-cookerdaemon.log' mode='a+' encoding='UTF-8'> > > > > I don't know what else to look for in the log. > > What happened before that? > > The pipe error was the server trying to reply to the client but the > client was gone. We should be able to see the command running before > that and if it was a ping or not and the timestamps? bitbake is quite sensitive to IO on my Ubuntu 22.04 build machine with zfs. This is how I trigger this reliably: $ mv tmp foo ; rm -rf foo & bitbake image [1] 2709733 Timeout while waiting for a reply from the bitbake server (60s) I have rm_work enabled so tmp is not that big except for the deployed files. Cheers, -Mikko
On Mon, 2023-09-04 at 09:47 +0300, Mikko Rapeli wrote: > Hi, > > On Thu, Aug 31, 2023 at 01:01:30PM +0100, Richard Purdie wrote: > > On Thu, 2023-08-31 at 14:30 +0300, Mikko Rapeli wrote: > > > Hi, > > > > > > On Thu, Aug 31, 2023 at 09:56:33AM +0100, Richard Purdie wrote: > > > > On Thu, 2023-08-31 at 11:51 +0300, Mikko Rapeli wrote: > > > > > Hi, > > > > > > > > > > On Wed, Aug 30, 2023 at 10:33:03PM +0100, Richard Purdie wrote: > > > > > > We noticed some users were seeing very slow ping response times which > > > > > > caused 'server timeout' issues. There were some function calls in runCommand > > > > > > which could be slow such as the inotify callback processing. > > > > > > > > > > I guess the error looks like: > > > > > > > > > > Timeout while waiting for a reply from the bitbake server (60s) > > > > > > > > > > and happens right after a succesful build to me when I try to run any bitbake > > > > > command. > > > > > > > > > > Hope this fixes it. Thanks! > > > > > > > > That is the message however the cause may be different in your case. It > > > > sounds like the server may be taking a long time to shut down for you > > > > and it can't connect or start a new one whilst that is happening. > > > > > > > > If you have the bitbake-cookerdaemon log for your case I can see if > > > > that sheds any light if this doesn't work for you. > > > > > > (Traceback (most recent call last): > > > File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 292, in main > > > self.command_channel_reply.send(self.cooker.command.runCommand(command, self)) > > > File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 868, in send > > > self._send(obj) > > > File "/home/builder/src/base/poky/bitbake/lib/bb/server/process.py", line 847, in _send > > > self.writer.send_bytes(obj) > > > File "/usr/lib/python3.10/multiprocessing/connection.py", line 200, in send_bytes > > > self._send_bytes(m[offset:offset + size]) > > > File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes > > > self._send(header + buf) > > > File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send > > > n = write(self._handle, buf) > > > BrokenPipeError: [Errno 32] Broken pipe > > > ) > > > 4131625 12:21:32.806544 Processing Client > > > 4131625 12:21:32.806635 Disconnecting Client (socket: True) > > > 4131625 12:21:32.807999 No timeout, exiting. > > > 4131625 12:21:32.908254 Exiting (socket: True) > > > 4131625 12:21:32.982669 More than one thread left?: [<_MainThread(MainThread, started 140013699211712)>, <Thread(asyncio_0, started 140012031006272)>] > > > 4131625 12:21:34.433909 Exiting as we could obtain the lock > > > sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/builder/src/base/build/bitbake-cookerdaemon.log' mode='a+' encoding='UTF-8'> > > > > > > I don't know what else to look for in the log. > > > > What happened before that? > > > > The pipe error was the server trying to reply to the client but the > > client was gone. We should be able to see the command running before > > that and if it was a ping or not and the timestamps? > > bitbake is quite sensitive to IO on my Ubuntu 22.04 build machine > with zfs. This is how I trigger this reliably: > > $ mv tmp foo ; rm -rf foo & bitbake image > [1] 2709733 > Timeout while waiting for a reply from the bitbake server (60s) > > I have rm_work enabled so tmp is not that big except for the deployed files. That is quite a different usecase to the one being fixed by the patch. When bitbake starts up there are a few heavy IO things it needs to do such as loading a hash equivalence database which may be large, loading the parse cache and iterating through all the metadata to check it's validity. If your system has IO challenges, doing that in parallell with a large deletion may be challenging. If the server doesn't start quickly enough, you'll see the timeout. Note that it isn't a ping which is timing out but an actual command. Later when builds are running, ping is used as a lightweight way to check the server is still there if there has been no communication for a while. This patch fixes the ping command to really be lightweight so that builds don't fail on a loaded system during the build. It would be nice to find a way to improve the startup so that the timeouts are avoided too but I'm less sure how to do that. Even if the ping did work, if the commands themselves are slow/stalling, that is a different set of challenges... Cheers, Richard
diff --git a/lib/bb/command.py b/lib/bb/command.py index a355f56c60..b494f84a0a 100644 --- a/lib/bb/command.py +++ b/lib/bb/command.py @@ -65,7 +65,7 @@ class Command: command = commandline.pop(0) # Ensure cooker is ready for commands - if command != "updateConfig" and command != "setFeatures": + if command not in ["updateConfig", "setFeatures", "ping"]: try: self.cooker.init_configdata() if not self.remotedatastores: @@ -85,7 +85,8 @@ class Command: if not hasattr(command_method, 'readonly') or not getattr(command_method, 'readonly'): return None, "Not able to execute not readonly commands in readonly mode" try: - self.cooker.process_inotify_updates_apply() + if command != "ping": + self.cooker.process_inotify_updates_apply() if getattr(command_method, 'needconfig', True): self.cooker.updateCacheSync() result = command_method(self, commandline) @@ -169,6 +170,8 @@ class CommandsSync: Allow a UI to check the server is still alive """ return "Still alive!" + ping.needconfig = False + ping.readonly = True def stateShutdown(self, command, params): """
We noticed some users were seeing very slow ping response times which caused 'server timeout' issues. There were some function calls in runCommand which could be slow such as the inotify callback processing. Mark up the ping command such that it doesn't need configuration information, it is allowed on a readonly server and specifically skip the inofity processing too since ping would never need that. This will hopefully resolve various ping timeout issues that were being reported. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org> --- lib/bb/command.py | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)