Message ID | 20231129124501.86503-1-alex.bennee@linaro.org |
---|---|
State | Under Review |
Headers | show |
Series | [RFC] qemurunner.py: ensure we drain stdout after boot prompt | expand |
On Wed Nov 29, 2023 at 1:45 PM CET, Alex Bennée wrote: > If qemurunner doesn't continuously drain stdout we will eventually > cause QEMU to block while trying to write to the pipe. This can > manifest itself if the guest has for example configured its serial > ports to output via stdio even if the test itself is using a TCP > console or SSH to run things. > > This doesn't address a potential overflow of stderr although generally > stderr from QEMU will be a lot less likely to block due to the volume > of data. > > Suggested-by: Erik Schilling <erik.schilling@linaro.org> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > Cc: Mikko Rapeli <mikko.rapeli@linaro.org> > > --- > AJB: > As a QEMU developer I should note that we've had to solve a lot of > similar problems within our own internal testing (e.g. > https://gitlab.com/qemu-project/qemu/-/blob/master/python/qemu/machine/console_socket.py?ref_type=heads). > Perhaps in the longer term it might make sense to consider using > QEMU's own python tooling for configuring and launching QEMU? > --- > meta/lib/oeqa/utils/qemurunner.py | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py > index 29fe271976..1ec472c49e 100644 > --- a/meta/lib/oeqa/utils/qemurunner.py > +++ b/meta/lib/oeqa/utils/qemurunner.py > @@ -243,6 +243,7 @@ class QemuRunner: > # to be a proper fix but this will suffice for now. > self.runqemu = subprocess.Popen(launch_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE, preexec_fn=os.setpgrp, env=env, cwd=self.tmpdir) > output = self.runqemu.stdout > + output_drain = output > launch_time = time.time() > > # > @@ -539,6 +540,17 @@ class QemuRunner: > self.logger.warning("The output:\n%s" % output) > except: > self.logger.warning("Serial console failed while trying to login") > + > + def drain_log(): > + while not output_drain.closed: > + more_output = self.getOutput(output_drain) > + if len(more_output) > 0: > + self.logger.debug("Logs since boot: %s", more_output) > + time.sleep(0.1) > + > + t = threading.Thread(target=drain_log) > + t.start() > + > return True > > def stop(self): This is of course just a hack to demonstrate this was the problem. A better solution would probably be to collect the logs through the existing supervision process that gets forked off... That then also solves the problem for the earlier code (and would transition nicely to also drain stderr). - Erik
On Wed, 2023-11-29 at 15:11 +0100, Erik Schilling wrote: > On Wed Nov 29, 2023 at 1:45 PM CET, Alex Bennée wrote: > > If qemurunner doesn't continuously drain stdout we will eventually > > cause QEMU to block while trying to write to the pipe. This can > > manifest itself if the guest has for example configured its serial > > ports to output via stdio even if the test itself is using a TCP > > console or SSH to run things. > > > > This doesn't address a potential overflow of stderr although generally > > stderr from QEMU will be a lot less likely to block due to the volume > > of data. > > > > Suggested-by: Erik Schilling <erik.schilling@linaro.org> > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > > Cc: Mikko Rapeli <mikko.rapeli@linaro.org> > > > > --- > > AJB: > > As a QEMU developer I should note that we've had to solve a lot of > > similar problems within our own internal testing (e.g. > > https://gitlab.com/qemu-project/qemu/-/blob/master/python/qemu/machine/console_socket.py?ref_type=heads). > > Perhaps in the longer term it might make sense to consider using > > QEMU's own python tooling for configuring and launching QEMU? > > --- > > meta/lib/oeqa/utils/qemurunner.py | 12 ++++++++++++ > > 1 file changed, 12 insertions(+) > > > > diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py > > index 29fe271976..1ec472c49e 100644 > > --- a/meta/lib/oeqa/utils/qemurunner.py > > +++ b/meta/lib/oeqa/utils/qemurunner.py > > @@ -243,6 +243,7 @@ class QemuRunner: > > # to be a proper fix but this will suffice for now. > > self.runqemu = subprocess.Popen(launch_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE, preexec_fn=os.setpgrp, env=env, cwd=self.tmpdir) > > output = self.runqemu.stdout > > + output_drain = output > > launch_time = time.time() > > > > # > > @@ -539,6 +540,17 @@ class QemuRunner: > > self.logger.warning("The output:\n%s" % output) > > except: > > self.logger.warning("Serial console failed while trying to login") > > + > > + def drain_log(): > > + while not output_drain.closed: > > + more_output = self.getOutput(output_drain) > > + if len(more_output) > 0: > > + self.logger.debug("Logs since boot: %s", more_output) > > + time.sleep(0.1) > > + > > + t = threading.Thread(target=drain_log) > > + t.start() > > + > > return True > > > > def stop(self): > > This is of course just a hack to demonstrate this was the problem. A > better solution would probably be to collect the logs through the > existing supervision process that gets forked off... That then also > solves the problem for the earlier code (and would transition nicely to > also drain stderr). I was wondering about that, this would ideally be handled by that existing log processing thread. Would you be willing to work out a patch to do that? Cheers, Richard
Richard Purdie <richard.purdie@linuxfoundation.org> writes: > On Wed, 2023-11-29 at 15:11 +0100, Erik Schilling wrote: >> On Wed Nov 29, 2023 at 1:45 PM CET, Alex Bennée wrote: >> > If qemurunner doesn't continuously drain stdout we will eventually >> > cause QEMU to block while trying to write to the pipe. This can >> > manifest itself if the guest has for example configured its serial >> > ports to output via stdio even if the test itself is using a TCP >> > console or SSH to run things. >> > >> > This doesn't address a potential overflow of stderr although generally >> > stderr from QEMU will be a lot less likely to block due to the volume >> > of data. >> > >> > Suggested-by: Erik Schilling <erik.schilling@linaro.org> >> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> >> > Cc: Mikko Rapeli <mikko.rapeli@linaro.org> >> > >> > --- >> > AJB: >> > As a QEMU developer I should note that we've had to solve a lot of >> > similar problems within our own internal testing (e.g. >> > https://gitlab.com/qemu-project/qemu/-/blob/master/python/qemu/machine/console_socket.py?ref_type=heads). >> > Perhaps in the longer term it might make sense to consider using >> > QEMU's own python tooling for configuring and launching QEMU? >> > --- >> > meta/lib/oeqa/utils/qemurunner.py | 12 ++++++++++++ >> > 1 file changed, 12 insertions(+) >> > >> > diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py >> > index 29fe271976..1ec472c49e 100644 >> > --- a/meta/lib/oeqa/utils/qemurunner.py >> > +++ b/meta/lib/oeqa/utils/qemurunner.py >> > @@ -243,6 +243,7 @@ class QemuRunner: >> > # to be a proper fix but this will suffice for now. >> > self.runqemu = subprocess.Popen(launch_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE, preexec_fn=os.setpgrp, env=env, cwd=self.tmpdir) >> > output = self.runqemu.stdout >> > + output_drain = output >> > launch_time = time.time() >> > >> > # >> > @@ -539,6 +540,17 @@ class QemuRunner: >> > self.logger.warning("The output:\n%s" % output) >> > except: >> > self.logger.warning("Serial console failed while trying to login") >> > + >> > + def drain_log(): >> > + while not output_drain.closed: >> > + more_output = self.getOutput(output_drain) >> > + if len(more_output) > 0: >> > + self.logger.debug("Logs since boot: %s", more_output) >> > + time.sleep(0.1) >> > + >> > + t = threading.Thread(target=drain_log) >> > + t.start() >> > + >> > return True >> > >> > def stop(self): >> >> This is of course just a hack to demonstrate this was the problem. A >> better solution would probably be to collect the logs through the >> existing supervision process that gets forked off... That then also >> solves the problem for the earlier code (and would transition nicely to >> also drain stderr). > > I was wondering about that, this would ideally be handled by that > existing log processing thread. Would you be willing to work out a > patch to do that? Sure - but I'm a little unclear about how things are meant to work if you can offer any pointers. I assume we can just allow the existing logger to continue once we reach the login point?
On Wed, 2023-11-29 at 15:56 +0000, Alex Bennée wrote: > Richard Purdie <richard.purdie@linuxfoundation.org> writes: > > > On Wed, 2023-11-29 at 15:11 +0100, Erik Schilling wrote: > > > On Wed Nov 29, 2023 at 1:45 PM CET, Alex Bennée wrote: > > > > If qemurunner doesn't continuously drain stdout we will eventually > > > > cause QEMU to block while trying to write to the pipe. This can > > > > manifest itself if the guest has for example configured its serial > > > > ports to output via stdio even if the test itself is using a TCP > > > > console or SSH to run things. > > > > > > > > This doesn't address a potential overflow of stderr although generally > > > > stderr from QEMU will be a lot less likely to block due to the volume > > > > of data. > > > > > > > > Suggested-by: Erik Schilling <erik.schilling@linaro.org> > > > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > > > > Cc: Mikko Rapeli <mikko.rapeli@linaro.org> > > > > > > > > --- > > > > AJB: > > > > As a QEMU developer I should note that we've had to solve a lot of > > > > similar problems within our own internal testing (e.g. > > > > https://gitlab.com/qemu-project/qemu/-/blob/master/python/qemu/machine/console_socket.py?ref_type=heads). > > > > Perhaps in the longer term it might make sense to consider using > > > > QEMU's own python tooling for configuring and launching QEMU? > > > > --- > > > > meta/lib/oeqa/utils/qemurunner.py | 12 ++++++++++++ > > > > 1 file changed, 12 insertions(+) > > > > > > > > diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py > > > > index 29fe271976..1ec472c49e 100644 > > > > --- a/meta/lib/oeqa/utils/qemurunner.py > > > > +++ b/meta/lib/oeqa/utils/qemurunner.py > > > > @@ -243,6 +243,7 @@ class QemuRunner: > > > > # to be a proper fix but this will suffice for now. > > > > self.runqemu = subprocess.Popen(launch_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE, preexec_fn=os.setpgrp, env=env, cwd=self.tmpdir) > > > > output = self.runqemu.stdout > > > > + output_drain = output > > > > launch_time = time.time() > > > > > > > > # > > > > @@ -539,6 +540,17 @@ class QemuRunner: > > > > self.logger.warning("The output:\n%s" % output) > > > > except: > > > > self.logger.warning("Serial console failed while trying to login") > > > > + > > > > + def drain_log(): > > > > + while not output_drain.closed: > > > > + more_output = self.getOutput(output_drain) > > > > + if len(more_output) > 0: > > > > + self.logger.debug("Logs since boot: %s", more_output) > > > > + time.sleep(0.1) > > > > + > > > > + t = threading.Thread(target=drain_log) > > > > + t.start() > > > > + > > > > return True > > > > > > > > def stop(self): > > > > > > This is of course just a hack to demonstrate this was the problem. A > > > better solution would probably be to collect the logs through the > > > existing supervision process that gets forked off... That then also > > > solves the problem for the earlier code (and would transition nicely to > > > also drain stderr). > > > > I was wondering about that, this would ideally be handled by that > > existing log processing thread. Would you be willing to work out a > > patch to do that? > > Sure - but I'm a little unclear about how things are meant to work if > you can offer any pointers. I assume we can just allow the existing > logger to continue once we reach the login point? I was a little unsure about that when I recently worked on that code. I think you're right, we should keep the data flowing to the log files after login. Cheers, Richard
Richard Purdie <richard.purdie@linuxfoundation.org> writes: > On Wed, 2023-11-29 at 15:56 +0000, Alex Bennée wrote: >> Richard Purdie <richard.purdie@linuxfoundation.org> writes: >> >> > On Wed, 2023-11-29 at 15:11 +0100, Erik Schilling wrote: >> > > On Wed Nov 29, 2023 at 1:45 PM CET, Alex Bennée wrote: >> > > > If qemurunner doesn't continuously drain stdout we will eventually >> > > > cause QEMU to block while trying to write to the pipe. This can >> > > > manifest itself if the guest has for example configured its serial >> > > > ports to output via stdio even if the test itself is using a TCP >> > > > console or SSH to run things. >> > > > >> > > > This doesn't address a potential overflow of stderr although generally >> > > > stderr from QEMU will be a lot less likely to block due to the volume >> > > > of data. >> > > > >> > > > Suggested-by: Erik Schilling <erik.schilling@linaro.org> >> > > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> >> > > > Cc: Mikko Rapeli <mikko.rapeli@linaro.org> >> > > > >> > > > --- >> > > > AJB: >> > > > As a QEMU developer I should note that we've had to solve a lot of >> > > > similar problems within our own internal testing (e.g. >> > > > https://gitlab.com/qemu-project/qemu/-/blob/master/python/qemu/machine/console_socket.py?ref_type=heads). >> > > > Perhaps in the longer term it might make sense to consider using >> > > > QEMU's own python tooling for configuring and launching QEMU? >> > > > --- >> > > > meta/lib/oeqa/utils/qemurunner.py | 12 ++++++++++++ >> > > > 1 file changed, 12 insertions(+) >> > > > >> > > > diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py >> > > > index 29fe271976..1ec472c49e 100644 >> > > > --- a/meta/lib/oeqa/utils/qemurunner.py >> > > > +++ b/meta/lib/oeqa/utils/qemurunner.py >> > > > @@ -243,6 +243,7 @@ class QemuRunner: >> > > > # to be a proper fix but this will suffice for now. >> > > > self.runqemu = subprocess.Popen(launch_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE, preexec_fn=os.setpgrp, env=env, cwd=self.tmpdir) >> > > > output = self.runqemu.stdout >> > > > + output_drain = output >> > > > launch_time = time.time() >> > > > >> > > > # >> > > > @@ -539,6 +540,17 @@ class QemuRunner: >> > > > self.logger.warning("The output:\n%s" % output) >> > > > except: >> > > > self.logger.warning("Serial console failed while trying to login") >> > > > + >> > > > + def drain_log(): >> > > > + while not output_drain.closed: >> > > > + more_output = self.getOutput(output_drain) >> > > > + if len(more_output) > 0: >> > > > + self.logger.debug("Logs since boot: %s", more_output) >> > > > + time.sleep(0.1) >> > > > + >> > > > + t = threading.Thread(target=drain_log) >> > > > + t.start() >> > > > + >> > > > return True >> > > > >> > > > def stop(self): >> > > >> > > This is of course just a hack to demonstrate this was the problem. A >> > > better solution would probably be to collect the logs through the >> > > existing supervision process that gets forked off... That then also >> > > solves the problem for the earlier code (and would transition nicely to >> > > also drain stderr). >> > >> > I was wondering about that, this would ideally be handled by that >> > existing log processing thread. Would you be willing to work out a >> > patch to do that? >> >> Sure - but I'm a little unclear about how things are meant to work if >> you can offer any pointers. I assume we can just allow the existing >> logger to continue once we reach the login point? > > I was a little unsure about that when I recently worked on that code. I > think you're right, we should keep the data flowing to the log files > after login. I'm having difficulty following the logic. I can see the main loop: while time.time() < endtime and not stopread: try: sread, swrite, serror = select.select(filelist, [], [], 5) except InterruptedError: continue but I guess we have to exit that for the tests to start running. Can this be punted to the logging process? We'd need some sort of signal from the logging bit that we'd reached the login point. What does python use for that? mutex? > > Cheers, > > Richard
On Thu, 2023-11-30 at 11:07 +0000, Alex Bennée wrote: > Richard Purdie <richard.purdie@linuxfoundation.org> writes: > > > On Wed, 2023-11-29 at 15:56 +0000, Alex Bennée wrote: > > > Richard Purdie <richard.purdie@linuxfoundation.org> writes: > > > > > > > On Wed, 2023-11-29 at 15:11 +0100, Erik Schilling wrote: > > > > > On Wed Nov 29, 2023 at 1:45 PM CET, Alex Bennée wrote: > > > > > > If qemurunner doesn't continuously drain stdout we will eventually > > > > > > cause QEMU to block while trying to write to the pipe. This can > > > > > > manifest itself if the guest has for example configured its serial > > > > > > ports to output via stdio even if the test itself is using a TCP > > > > > > console or SSH to run things. > > > > > > > > > > > > This doesn't address a potential overflow of stderr although generally > > > > > > stderr from QEMU will be a lot less likely to block due to the volume > > > > > > of data. > > > > > > > > > > > > Suggested-by: Erik Schilling <erik.schilling@linaro.org> > > > > > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > > > > > > Cc: Mikko Rapeli <mikko.rapeli@linaro.org> > > > > > > > > > > > > --- > > > > > > AJB: > > > > > > As a QEMU developer I should note that we've had to solve a lot of > > > > > > similar problems within our own internal testing (e.g. > > > > > > https://gitlab.com/qemu-project/qemu/-/blob/master/python/qemu/machine/console_socket.py?ref_type=heads). > > > > > > Perhaps in the longer term it might make sense to consider using > > > > > > QEMU's own python tooling for configuring and launching QEMU? > > > > > > --- > > > > > > meta/lib/oeqa/utils/qemurunner.py | 12 ++++++++++++ > > > > > > 1 file changed, 12 insertions(+) > > > > > > > > > > > > diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py > > > > > > index 29fe271976..1ec472c49e 100644 > > > > > > --- a/meta/lib/oeqa/utils/qemurunner.py > > > > > > +++ b/meta/lib/oeqa/utils/qemurunner.py > > > > > > @@ -243,6 +243,7 @@ class QemuRunner: > > > > > > # to be a proper fix but this will suffice for now. > > > > > > self.runqemu = subprocess.Popen(launch_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE, preexec_fn=os.setpgrp, env=env, cwd=self.tmpdir) > > > > > > output = self.runqemu.stdout > > > > > > + output_drain = output > > > > > > launch_time = time.time() > > > > > > > > > > > > # > > > > > > @@ -539,6 +540,17 @@ class QemuRunner: > > > > > > self.logger.warning("The output:\n%s" % output) > > > > > > except: > > > > > > self.logger.warning("Serial console failed while trying to login") > > > > > > + > > > > > > + def drain_log(): > > > > > > + while not output_drain.closed: > > > > > > + more_output = self.getOutput(output_drain) > > > > > > + if len(more_output) > 0: > > > > > > + self.logger.debug("Logs since boot: %s", more_output) > > > > > > + time.sleep(0.1) > > > > > > + > > > > > > + t = threading.Thread(target=drain_log) > > > > > > + t.start() > > > > > > + > > > > > > return True > > > > > > > > > > > > def stop(self): > > > > > > > > > > This is of course just a hack to demonstrate this was the problem. A > > > > > better solution would probably be to collect the logs through the > > > > > existing supervision process that gets forked off... That then also > > > > > solves the problem for the earlier code (and would transition nicely to > > > > > also drain stderr). > > > > > > > > I was wondering about that, this would ideally be handled by that > > > > existing log processing thread. Would you be willing to work out a > > > > patch to do that? > > > > > > Sure - but I'm a little unclear about how things are meant to work if > > > you can offer any pointers. I assume we can just allow the existing > > > logger to continue once we reach the login point? > > > > I was a little unsure about that when I recently worked on that code. I > > think you're right, we should keep the data flowing to the log files > > after login. > > I'm having difficulty following the logic. I can see the main loop: > > while time.time() < endtime and not stopread: > try: > sread, swrite, serror = select.select(filelist, [], [], 5) > except InterruptedError: > continue > > but I guess we have to exit that for the tests to start running. Correct. > Can this be punted to the logging process? Probably not since we want things booted before running the tests which is partly why the code does this. > We'd need some sort of signal > from the logging bit that we'd reached the login point. What does python > use for that? mutex? A threading.Semaphore() might work for this use case? Created and held before the logging thread starts and the logging thread can non- blocking acquire it and take over. https://docs.python.org/3/library/threading.html#semaphore-objects Cheers, Richard
diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py index 29fe271976..1ec472c49e 100644 --- a/meta/lib/oeqa/utils/qemurunner.py +++ b/meta/lib/oeqa/utils/qemurunner.py @@ -243,6 +243,7 @@ class QemuRunner: # to be a proper fix but this will suffice for now. self.runqemu = subprocess.Popen(launch_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE, preexec_fn=os.setpgrp, env=env, cwd=self.tmpdir) output = self.runqemu.stdout + output_drain = output launch_time = time.time() # @@ -539,6 +540,17 @@ class QemuRunner: self.logger.warning("The output:\n%s" % output) except: self.logger.warning("Serial console failed while trying to login") + + def drain_log(): + while not output_drain.closed: + more_output = self.getOutput(output_drain) + if len(more_output) > 0: + self.logger.debug("Logs since boot: %s", more_output) + time.sleep(0.1) + + t = threading.Thread(target=drain_log) + t.start() + return True def stop(self):
If qemurunner doesn't continuously drain stdout we will eventually cause QEMU to block while trying to write to the pipe. This can manifest itself if the guest has for example configured its serial ports to output via stdio even if the test itself is using a TCP console or SSH to run things. This doesn't address a potential overflow of stderr although generally stderr from QEMU will be a lot less likely to block due to the volume of data. Suggested-by: Erik Schilling <erik.schilling@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Cc: Mikko Rapeli <mikko.rapeli@linaro.org> --- AJB: As a QEMU developer I should note that we've had to solve a lot of similar problems within our own internal testing (e.g. https://gitlab.com/qemu-project/qemu/-/blob/master/python/qemu/machine/console_socket.py?ref_type=heads). Perhaps in the longer term it might make sense to consider using QEMU's own python tooling for configuring and launching QEMU? --- meta/lib/oeqa/utils/qemurunner.py | 12 ++++++++++++ 1 file changed, 12 insertions(+)