diff mbox series

[1/2] server/process: catch and expand multiprocessing connection exceptions

Message ID 20231228210118.9273-1-mark.asselstine@windriver.com
State Accepted, archived
Commit 5ff62b802f79acc86bbd6a99484f08501ff5dc2d
Headers show
Series [1/2] server/process: catch and expand multiprocessing connection exceptions | expand

Commit Message

Mark Asselstine Dec. 28, 2023, 9:01 p.m. UTC
Doing builds on systems with limited resources, or with high demand
package builds such as chromium it isn't uncommon for the OOM Killer
to be triggered and for bitbake-server to be selected as the process
to be killed. When the bitbake-server does terminate unexpectedly due
to the OOM Killer or otherwise, this currently results in a generic
python traceback with little indication as to what has failed.

Here we trap and raise the exceptions while extending the exception
text in runCommand() to make it clear that this is most likely caused
by the bitbake-server unexpectedly terminating.

Callers of runCommand() should be updated to properly handle the
BrokenPipeError and EOFError exceptions to avoid printing a python
traceback, but even if they don't, the added text in the exceptions
should provide some hints as to what might have caused the failure.

Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
---
 lib/bb/server/process.py | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
diff mbox series

Patch

diff --git a/lib/bb/server/process.py b/lib/bb/server/process.py
index d495ac62..6d77ce47 100644
--- a/lib/bb/server/process.py
+++ b/lib/bb/server/process.py
@@ -500,12 +500,18 @@  class ServerCommunicator():
         self.recv = recv
 
     def runCommand(self, command):
-        self.connection.send(command)
+        try:
+            self.connection.send(command)
+        except BrokenPipeError as e:
+            raise BrokenPipeError("bitbake-server might have died or been forcibly stopped, ie. OOM killed") from e
         if not self.recv.poll(30):
             logger.info("No reply from server in 30s (for command %s at %s)" % (command[0], currenttime()))
             if not self.recv.poll(30):
                 raise ProcessTimeout("Timeout while waiting for a reply from the bitbake server (60s at %s)" % currenttime())
-        ret, exc = self.recv.get()
+        try:
+            ret, exc = self.recv.get()
+        except EOFError as e:
+            raise EOFError("bitbake-server might have died or been forcibly stopped, ie. OOM killed") from e
         # Should probably turn all exceptions in exc back into exceptions?
         # For now, at least handle BBHandledException
         if exc and ("BBHandledException" in exc or "SystemExit" in exc):