Message ID | 20231004123823.2990916-5-richard.purdie@linuxfoundation.org |
---|---|
State | Accepted, archived |
Commit | 605938b3bb19dcf6c7218648b4d16df9eaa675fc |
Headers | show |
Series | [1/5] wic: Add console parameters to qemux86 canned-wks | expand |
Hi,
Acked-by: Mikko Rapeli <mikko.rapeli@linaro.org>
for the full series. I hope our discussion over #yocto irc channel
helped resolve these and the test failures seen with master-next.
Cheers,
-Mikko
On Wed, 2023-10-04 at 16:07 +0300, Mikko Rapeli wrote: > Hi, > > Acked-by: Mikko Rapeli <mikko.rapeli@linaro.org> > > for the full series. I hope our discussion over #yocto irc channel > helped resolve these and the test failures seen with master-next. Yes, thanks! It was useful to have someone to talk the issues through with and much appreciated. Sadly I don't think we've found all the issues yet. On the build N-1 we have a selftest failure on debian12-ty-1: https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/5803/steps/14/logs/stdio with these logs saved: https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/d/qemurunner_log.20231004022851 https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/d/qemu_boot_log.20231004022851 which suggest that ttyS1 never got any output but a getty did appear on ttyS0. I confirmed there were only two serial devices in that configuration. The exact image downloaded off that worker and run locally worked so it isn't the image, it has to be some kind of race. The image does have gettys configured for both serial ports. The kernel boot log isn't present which a patch in this series attempts to fix. There was also: https://autobuilder.yoctoproject.org/typhoon/#/builders/82/builds/5528/steps/13/logs/stdio which looks again like a getty didn't show up on a serial port. We don't have more logs for that one but "Running postinst /etc/rpm- postinsts/100-sysvinit-inittab" worries Ross and I a bit. It may or may not be related. That brings me to: https://autobuilder.yoctoproject.org/typhoon/#/builders/127/builds/2199 which has two failures, I think both in commands run over the serial interface. These ones appear to be more reproducible on ARM. I'm trying a new build run with this series applied to see if that helps shed any light on what is going on. Cheers, Richard
Hi, On Wed, Oct 04, 2023 at 02:29:40PM +0100, Richard Purdie wrote: > On Wed, 2023-10-04 at 16:07 +0300, Mikko Rapeli wrote: > > Hi, > > > > Acked-by: Mikko Rapeli <mikko.rapeli@linaro.org> > > > > for the full series. I hope our discussion over #yocto irc channel > > helped resolve these and the test failures seen with master-next. > > Yes, thanks! It was useful to have someone to talk the issues through > with and much appreciated. > > Sadly I don't think we've found all the issues yet. > > On the build N-1 we have a selftest failure on debian12-ty-1: > > https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/5803/steps/14/logs/stdio Can I get from this test run stdio to the more detailed task, qemurunner or qemu boot logs somehow? Or do you have to dig into the system manually to capture them, like in the example below? I would like to check into the detailed logs from both passing and failing test runs. Of course kernel dmesg would need to be there so maybe after your patch to the runqemu side at least. If there is a regression in kernel to the serial device detection and numbering, then we need some way to see e.g. 8250 uart/tty0/ttyAMA0 related dmesg messages. Cheers, -Mikko > with these logs saved: > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/d/qemurunner_log.20231004022851 > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/d/qemu_boot_log.20231004022851 > > which suggest that ttyS1 never got any output but a getty did appear on > ttyS0. I confirmed there were only two serial devices in that > configuration. The exact image downloaded off that worker and run > locally worked so it isn't the image, it has to be some kind of race. > The image does have gettys configured for both serial ports. The kernel > boot log isn't present which a patch in this series attempts to fix. > > There was also: > > https://autobuilder.yoctoproject.org/typhoon/#/builders/82/builds/5528/steps/13/logs/stdio > > which looks again like a getty didn't show up on a serial port. We > don't have more logs for that one but "Running postinst /etc/rpm- > postinsts/100-sysvinit-inittab" worries Ross and I a bit. It may or may > not be related. > > That brings me to: > > https://autobuilder.yoctoproject.org/typhoon/#/builders/127/builds/2199 > > which has two failures, I think both in commands run over the serial > interface. These ones appear to be more reproducible on ARM. > > I'm trying a new build run with this series applied to see if that > helps shed any light on what is going on. > > Cheers, > > Richard > > > > > > > > > > > >
On Wed, 2023-10-04 at 22:34 +0300, Mikko Rapeli wrote: > Hi, > > On Wed, Oct 04, 2023 at 02:29:40PM +0100, Richard Purdie wrote: > > On Wed, 2023-10-04 at 16:07 +0300, Mikko Rapeli wrote: > > > Hi, > > > > > > Acked-by: Mikko Rapeli <mikko.rapeli@linaro.org> > > > > > > for the full series. I hope our discussion over #yocto irc channel > > > helped resolve these and the test failures seen with master-next. > > > > Yes, thanks! It was useful to have someone to talk the issues through > > with and much appreciated. > > > > Sadly I don't think we've found all the issues yet. > > > > On the build N-1 we have a selftest failure on debian12-ty-1: > > > > https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/5803/steps/14/logs/stdio > > Can I get from this test run stdio to the more detailed task, qemurunner or qemu boot logs somehow? > Or do you have to dig into the system manually to capture them, like in the example below? > > I would like to check into the detailed logs from both passing and failing test runs. > Of course kernel dmesg would need to be there so maybe after your patch to the runqemu > side at least. > > If there is a regression in kernel to the serial device detection and numbering, > then we need some way to see e.g. 8250 uart/tty0/ttyAMA0 related dmesg messages. I can't capture everything in time in each case but I did grab: https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/e/ which is from https://autobuilder.yoctoproject.org/typhoon/#/builders/76/builds/7863/steps/21/logs/stdio which has this threads patches applied. I've put two sets of logs there from the same builds, one success, one is missing the getty on one log. I also have the logs from: https://autobuilder.yoctoproject.org/typhoon/#/builders/86/builds/5848/steps/14/logs/stdio which I've saved here: https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/f/ noting that: https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/f/qemu_boot_log.20231004134427 is 3 bytes in size. Also interestingly, https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/f/qemu_boot_log.20231004140326 contains: "bootlogd: writing to console: Input/output error" which appears to come from sysvinit: sysvinit/3.04/sysvinit-3.04/src/bootlogd.c: /* * We got a write error on the real console. If its an EIO, * somebody hung up our filedescriptor, so try to re-open it. */ int write_err(int pts, int realfd, char *realcons, int e) { int fd; if (e != EIO) { werr: close(pts); fprintf(stderr, "bootlogd: writing to console: %s\n", strerror(e)); return -1; } close(realfd); if ((fd = open_nb(realcons)) < 0) goto werr; return fd; } and the werr goto must be triggering. I think we've seen these for a while but it is interesting something is closing console fds. Cheers, Richard
Hi, On Wed, Oct 04, 2023 at 09:35:20PM +0100, Richard Purdie wrote: > On Wed, 2023-10-04 at 22:34 +0300, Mikko Rapeli wrote: > > Hi, > > > > On Wed, Oct 04, 2023 at 02:29:40PM +0100, Richard Purdie wrote: > > > On Wed, 2023-10-04 at 16:07 +0300, Mikko Rapeli wrote: > > > > Hi, > > > > > > > > Acked-by: Mikko Rapeli <mikko.rapeli@linaro.org> > > > > > > > > for the full series. I hope our discussion over #yocto irc channel > > > > helped resolve these and the test failures seen with master-next. > > > > > > Yes, thanks! It was useful to have someone to talk the issues through > > > with and much appreciated. > > > > > > Sadly I don't think we've found all the issues yet. > > > > > > On the build N-1 we have a selftest failure on debian12-ty-1: > > > > > > https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/5803/steps/14/logs/stdio > > > > Can I get from this test run stdio to the more detailed task, qemurunner or qemu boot logs somehow? > > Or do you have to dig into the system manually to capture them, like in the example below? > > > > I would like to check into the detailed logs from both passing and failing test runs. > > Of course kernel dmesg would need to be there so maybe after your patch to the runqemu > > side at least. > > > > If there is a regression in kernel to the serial device detection and numbering, > > then we need some way to see e.g. 8250 uart/tty0/ttyAMA0 related dmesg messages. > > I can't capture everything in time in each case but I did grab: > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/e/ > > which is from > > https://autobuilder.yoctoproject.org/typhoon/#/builders/76/builds/7863/steps/21/logs/stdio > > which has this threads patches applied. > > I've put two sets of logs there from the same builds, one success, one > is missing the getty on one log. > > > I also have the logs from: > > https://autobuilder.yoctoproject.org/typhoon/#/builders/86/builds/5848/steps/14/logs/stdio > > which I've saved here: > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/f/ > > noting that:� > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/f/qemu_boot_log.20231004134427 > > is 3 bytes in size. All this is very confusing indeed. I sent another version of the detect login prompt without utf-8 conversion, with some luck that could help. At least on some of the logs the login prompt is there but with lots of garbage around it. Serial port core got refactored in 6.5 and diff to 6.4 is quite large so there could be bugs there, and those could be triggering new errors visible to userspace too. Cheers, -Mikko
On Thu, 2023-10-05 at 18:27 +0300, Mikko Rapeli wrote: > Hi, > > On Wed, Oct 04, 2023 at 09:35:20PM +0100, Richard Purdie wrote: > > On Wed, 2023-10-04 at 22:34 +0300, Mikko Rapeli wrote: > > > Hi, > > > > > > On Wed, Oct 04, 2023 at 02:29:40PM +0100, Richard Purdie wrote: > > > > On Wed, 2023-10-04 at 16:07 +0300, Mikko Rapeli wrote: > > > > > Hi, > > > > > > > > > > Acked-by: Mikko Rapeli <mikko.rapeli@linaro.org> > > > > > > > > > > for the full series. I hope our discussion over #yocto irc channel > > > > > helped resolve these and the test failures seen with master-next. > > > > > > > > Yes, thanks! It was useful to have someone to talk the issues through > > > > with and much appreciated. > > > > > > > > Sadly I don't think we've found all the issues yet. > > > > > > > > On the build N-1 we have a selftest failure on debian12-ty-1: > > > > > > > > https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/5803/steps/14/logs/stdio > > > > > > Can I get from this test run stdio to the more detailed task, qemurunner or qemu boot logs somehow? > > > Or do you have to dig into the system manually to capture them, like in the example below? > > > > > > I would like to check into the detailed logs from both passing and failing test runs. > > > Of course kernel dmesg would need to be there so maybe after your patch to the runqemu > > > side at least. > > > > > > If there is a regression in kernel to the serial device detection and numbering, > > > then we need some way to see e.g. 8250 uart/tty0/ttyAMA0 related dmesg messages. > > > > I can't capture everything in time in each case but I did grab: > > > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/e/ > > > > which is from > > > > https://autobuilder.yoctoproject.org/typhoon/#/builders/76/builds/7863/steps/21/logs/stdio > > > > which has this threads patches applied. > > > > I've put two sets of logs there from the same builds, one success, one > > is missing the getty on one log. > > > > > > I also have the logs from: > > > > https://autobuilder.yoctoproject.org/typhoon/#/builders/86/builds/5848/steps/14/logs/stdio > > > > which I've saved here: > > > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/f/ > > > > noting that: > > > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/f/qemu_boot_log.20231004134427 > > > > is 3 bytes in size. > > All this is very confusing indeed. I sent another version of the detect login prompt without > utf-8 conversion, with some luck that could help. At least on some of the logs the login prompt > is there but with lots of garbage around it. Serial port core got refactored in 6.5 and diff to > 6.4 is quite large so there could be bugs there, and those could be triggering new > errors visible to userspace too. With the changes I've made to the logging and the raw log files we have without any utf-8 conversion, I'm now 99% certain this isn't any utf conversion issue since the login prompts are not in the raw binary files and those are written with no conversion. It is much more likely the getty isn't running or isn't communicating somehow. I'm trying to work out how to add some further debug/testing when this fails in a way it will help us debug. Cheers, Richard
Hi, On Thu, Oct 05, 2023 at 04:47:16PM +0100, Richard Purdie wrote: > On Thu, 2023-10-05 at 18:27 +0300, Mikko Rapeli wrote: > > Hi, > > > > On Wed, Oct 04, 2023 at 09:35:20PM +0100, Richard Purdie wrote: > > > On Wed, 2023-10-04 at 22:34 +0300, Mikko Rapeli wrote: > > > > Hi, > > > > > > > > On Wed, Oct 04, 2023 at 02:29:40PM +0100, Richard Purdie wrote: > > > > > On Wed, 2023-10-04 at 16:07 +0300, Mikko Rapeli wrote: > > > > > > Hi, > > > > > > > > > > > > Acked-by: Mikko Rapeli <mikko.rapeli@linaro.org> > > > > > > > > > > > > for the full series. I hope our discussion over #yocto irc channel > > > > > > helped resolve these and the test failures seen with master-next. > > > > > > > > > > Yes, thanks! It was useful to have someone to talk the issues through > > > > > with and much appreciated. > > > > > > > > > > Sadly I don't think we've found all the issues yet. > > > > > > > > > > On the build N-1 we have a selftest failure on debian12-ty-1: > > > > > > > > > > https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/5803/steps/14/logs/stdio > > > > > > > > Can I get from this test run stdio to the more detailed task, qemurunner or qemu boot logs somehow? > > > > Or do you have to dig into the system manually to capture them, like in the example below? > > > > > > > > I would like to check into the detailed logs from both passing and failing test runs. > > > > Of course kernel dmesg would need to be there so maybe after your patch to the runqemu > > > > side at least. > > > > > > > > If there is a regression in kernel to the serial device detection and numbering, > > > > then we need some way to see e.g. 8250 uart/tty0/ttyAMA0 related dmesg messages. > > > > > > I can't capture everything in time in each case but I did grab: > > > > > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/e/ > > > > > > which is from > > > > > > https://autobuilder.yoctoproject.org/typhoon/#/builders/76/builds/7863/steps/21/logs/stdio > > > > > > which has this threads patches applied. > > > > > > I've put two sets of logs there from the same builds, one success, one > > > is missing the getty on one log. > > > > > > > > > I also have the logs from: > > > > > > https://autobuilder.yoctoproject.org/typhoon/#/builders/86/builds/5848/steps/14/logs/stdio > > > > > > which I've saved here: > > > > > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/f/ > > > > > > noting that:� > > > > > > https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/f/qemu_boot_log.20231004134427 > > > > > > is 3 bytes in size. > > > > All this is very confusing indeed. I sent another version of the detect login prompt without > > utf-8 conversion, with some luck that could help. At least on some of the logs the login prompt > > is there but with lots of garbage around it. Serial port core got refactored in 6.5 and diff to > > 6.4 is quite large so there could be bugs there, and those could be triggering new > > errors visible to userspace too. > > With the changes I've made to the logging and the raw log files we have > without any utf-8 conversion, I'm now 99% certain this isn't any utf > conversion issue since the login prompts are not in the raw binary > files and those are written with no conversion. It is much more likely > the getty isn't running or isn't communicating somehow. > > I'm trying to work out how to add some further debug/testing when this > fails in a way it will help us debug. Alright, I was confused by logs from working login detection. I presume qemu is configured with two serial ports to avoid mixing kernel messages with the login prompt. It would be easy to grep for the login prompt also from the first serial output though it would not explain the root cause for these issues but could allow testing to move forward to the ssh connection. In all of the logs I saw, the first serial port had a login prompt. The serial ports are not used in testing, AFAIK. Oh now there are the debug commands in master-next. Nice, I was hoping for this kind of data in logs when things go horribly wrong. Cheers, -Mikko
diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py index ea867a2934f..9aa2e7b3bea 100644 --- a/meta/lib/oeqa/utils/qemurunner.py +++ b/meta/lib/oeqa/utils/qemurunner.py @@ -503,9 +503,8 @@ class QemuRunner: (self.boottime, time.strftime("%D %H:%M:%S"))) tail = lambda l: "\n".join(l.splitlines()[-25:]) bootlog = self.decode_qemulog(bootlog) - # in case bootlog is empty, use tail qemu log store at self.msg - lines = tail(bootlog if bootlog else self.msg) - self.logger.warning("Last 25 lines of text (%d):\n%s" % (len(bootlog), lines)) + self.logger.warning("Last 25 lines of login console (%d):\n%s" % (len(bootlog), tail(bootlog))) + self.logger.warning("Last 25 lines of all logging (%d):\n%s" % (len(self.msg), tail(self.msg))) self.logger.warning("Check full boot log: %s" % self.logfile) self.stop() return False
It is unclear when things fail which output was on which serial port. Improve the output to show the last lines of both data to improve debugging. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org> --- meta/lib/oeqa/utils/qemurunner.py | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)