diff mbox series

[v2,9/9] oeqa selftest runtime_test.py: skip virgl_headless test if /dev/dri/renderD128 access fails

Message ID 20230823061025.3952909-9-mikko.rapeli@linaro.org
State New
Headers show
Series [v2,1/9] core-image-minimal: increase extra space to pass df.py oeqa runtime test | expand

Commit Message

Mikko Rapeli Aug. 23, 2023, 6:10 a.m. UTC
If access to /dev/dri/renderD128 fails, then qemu with 3d graphics will
fail to start and errors are rather cryptic like:

qemu-system-x86_64: egl: render node init failed

To fix this, users likely need to

 * modprobe vgem
 * add their user to "render" group to write to /dev/dri/renderD128

If access is not available due to missing HW, driver or failing access,
then skip the test:

2023-08-22 14:41:20,591 - oe-selftest - INFO - test_testimage_virgl_headless (runtime_test.TestImage)
2023-08-22 14:41:20,603 - oe-selftest - INFO -  ... skipped 'Can not open "/dev/dri/renderD128" device'
2023-08-22 14:41:20,603 - oe-selftest - INFO - Can not open "/dev/dri/renderD128" device

Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
---
 meta/lib/oeqa/selftest/cases/runtime_test.py | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Alexander Kanavin Aug. 23, 2023, 6:37 p.m. UTC | #1
On Wed, 23 Aug 2023 at 08:10, Mikko Rapeli <mikko.rapeli@linaro.org> wrote:
>
> If access to /dev/dri/renderD128 fails, then qemu with 3d graphics will
> fail to start and errors are rather cryptic like:
>
> qemu-system-x86_64: egl: render node init failed
>
> To fix this, users likely need to
>
>  * modprobe vgem
>  * add their user to "render" group to write to /dev/dri/renderD128
>
> If access is not available due to missing HW, driver or failing access,
> then skip the test:

This should be a failure rather than a skip. Otherwise, autobuilder
testing will silently regress and no one will notice. The error
message should include information from IOError as well so that it's
clear that the issue is device file permissions.

Alex
Alexander Kanavin Aug. 23, 2023, 6:38 p.m. UTC | #2
Oh, and device should be /dev/dri/render*, and all of this should be in runqemu.

Alex

On Wed, 23 Aug 2023 at 20:37, Alexander Kanavin <alex.kanavin@gmail.com> wrote:
>
> On Wed, 23 Aug 2023 at 08:10, Mikko Rapeli <mikko.rapeli@linaro.org> wrote:
> >
> > If access to /dev/dri/renderD128 fails, then qemu with 3d graphics will
> > fail to start and errors are rather cryptic like:
> >
> > qemu-system-x86_64: egl: render node init failed
> >
> > To fix this, users likely need to
> >
> >  * modprobe vgem
> >  * add their user to "render" group to write to /dev/dri/renderD128
> >
> > If access is not available due to missing HW, driver or failing access,
> > then skip the test:
>
> This should be a failure rather than a skip. Otherwise, autobuilder
> testing will silently regress and no one will notice. The error
> message should include information from IOError as well so that it's
> clear that the issue is device file permissions.
>
> Alex
Alexander Kanavin Aug. 23, 2023, 6:39 p.m. UTC | #3
Better yet, how about fixing this at the source, i.e. qemu itself?
Sorry for the rapid-fire :)

Alex

On Wed, 23 Aug 2023 at 20:38, Alexander Kanavin <alex.kanavin@gmail.com> wrote:
>
> Oh, and device should be /dev/dri/render*, and all of this should be in runqemu.
>
> Alex
>
> On Wed, 23 Aug 2023 at 20:37, Alexander Kanavin <alex.kanavin@gmail.com> wrote:
> >
> > On Wed, 23 Aug 2023 at 08:10, Mikko Rapeli <mikko.rapeli@linaro.org> wrote:
> > >
> > > If access to /dev/dri/renderD128 fails, then qemu with 3d graphics will
> > > fail to start and errors are rather cryptic like:
> > >
> > > qemu-system-x86_64: egl: render node init failed
> > >
> > > To fix this, users likely need to
> > >
> > >  * modprobe vgem
> > >  * add their user to "render" group to write to /dev/dri/renderD128
> > >
> > > If access is not available due to missing HW, driver or failing access,
> > > then skip the test:
> >
> > This should be a failure rather than a skip. Otherwise, autobuilder
> > testing will silently regress and no one will notice. The error
> > message should include information from IOError as well so that it's
> > clear that the issue is device file permissions.
> >
> > Alex
Alexander Kanavin Aug. 23, 2023, 6:42 p.m. UTC | #4
I mean runqemu does already have

def check_render_nodes(self):

where this should be fixed. Sorry can't help myself :)

Alex

On Wed, 23 Aug 2023 at 20:40, Alexander Kanavin via
lists.openembedded.org <alex.kanavin=gmail.com@lists.openembedded.org>
wrote:
>
> Better yet, how about fixing this at the source, i.e. qemu itself?
> Sorry for the rapid-fire :)
>
> Alex
>
> On Wed, 23 Aug 2023 at 20:38, Alexander Kanavin <alex.kanavin@gmail.com> wrote:
> >
> > Oh, and device should be /dev/dri/render*, and all of this should be in runqemu.
> >
> > Alex
> >
> > On Wed, 23 Aug 2023 at 20:37, Alexander Kanavin <alex.kanavin@gmail.com> wrote:
> > >
> > > On Wed, 23 Aug 2023 at 08:10, Mikko Rapeli <mikko.rapeli@linaro.org> wrote:
> > > >
> > > > If access to /dev/dri/renderD128 fails, then qemu with 3d graphics will
> > > > fail to start and errors are rather cryptic like:
> > > >
> > > > qemu-system-x86_64: egl: render node init failed
> > > >
> > > > To fix this, users likely need to
> > > >
> > > >  * modprobe vgem
> > > >  * add their user to "render" group to write to /dev/dri/renderD128
> > > >
> > > > If access is not available due to missing HW, driver or failing access,
> > > > then skip the test:
> > >
> > > This should be a failure rather than a skip. Otherwise, autobuilder
> > > testing will silently regress and no one will notice. The error
> > > message should include information from IOError as well so that it's
> > > clear that the issue is device file permissions.
> > >
> > > Alex
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#186629): https://lists.openembedded.org/g/openembedded-core/message/186629
> Mute This Topic: https://lists.openembedded.org/mt/100910043/1686489
> Group Owner: openembedded-core+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [alex.kanavin@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Mikko Rapeli Aug. 24, 2023, 6:45 a.m. UTC | #5
Hi,

On Wed, Aug 23, 2023 at 08:37:47PM +0200, Alexander Kanavin wrote:
> On Wed, 23 Aug 2023 at 08:10, Mikko Rapeli <mikko.rapeli@linaro.org> wrote:
> >
> > If access to /dev/dri/renderD128 fails, then qemu with 3d graphics will
> > fail to start and errors are rather cryptic like:
> >
> > qemu-system-x86_64: egl: render node init failed
> >
> > To fix this, users likely need to
> >
> >  * modprobe vgem
> >  * add their user to "render" group to write to /dev/dri/renderD128
> >
> > If access is not available due to missing HW, driver or failing access,
> > then skip the test:
> 
> This should be a failure rather than a skip. Otherwise, autobuilder
> testing will silently regress and no one will notice. The error
> message should include information from IOError as well so that it's
> clear that the issue is device file permissions.

Then this change is not needed at all. Test already fails if render device
is not available or if access fails. In my experience very few environments have vgem
driver loaded and "render" group permissions for the user running the builds and tests.

Cheers,

-Mikko
Alexander Kanavin Aug. 24, 2023, 6:56 a.m. UTC | #6
On Thu, 24 Aug 2023 at 08:45, Mikko Rapeli <mikko.rapeli@linaro.org> wrote:
> Then this change is not needed at all. Test already fails if render device
> is not available or if access fails. In my experience very few environments have vgem
> driver loaded and "render" group permissions for the user running the builds and tests.

check_render_nodes() in runqemu could however be improved to check for
permissions as well (it currently checks only presence).

Alex
Alexander Kanavin Sept. 7, 2023, 8:59 a.m. UTC | #7
This patch made it to master-next - please don't. The issue that needs
to be fixed is unhelpful error messages, which should be addressed in
runqemu (I'll make a patch for it).

But the test should continue to fail rather than be skipped (until
it's on a distro where it can't pass, which is only RHEL clones at the
moment). Otherwise skipping the test quietly amounts to disabling it
on newly added autobuilder workers or other instances where selftest
runs, and regressions will follow that.

Alex

On Thu, 24 Aug 2023 at 08:57, Alexander Kanavin via
lists.openembedded.org <alex.kanavin=gmail.com@lists.openembedded.org>
wrote:
>
> On Thu, 24 Aug 2023 at 08:45, Mikko Rapeli <mikko.rapeli@linaro.org> wrote:
> > Then this change is not needed at all. Test already fails if render device
> > is not available or if access fails. In my experience very few environments have vgem
> > driver loaded and "render" group permissions for the user running the builds and tests.
>
> check_render_nodes() in runqemu could however be improved to check for
> permissions as well (it currently checks only presence).
>
> Alex
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#186639): https://lists.openembedded.org/g/openembedded-core/message/186639
> Mute This Topic: https://lists.openembedded.org/mt/100910043/1686489
> Group Owner: openembedded-core+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [alex.kanavin@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Richard Purdie Sept. 7, 2023, 10:54 a.m. UTC | #8
On Thu, 2023-09-07 at 10:59 +0200, Alexander Kanavin wrote:
> This patch made it to master-next - please don't. The issue that needs
> to be fixed is unhelpful error messages, which should be addressed in
> runqemu (I'll make a patch for it).
> 
> But the test should continue to fail rather than be skipped (until
> it's on a distro where it can't pass, which is only RHEL clones at the
> moment). Otherwise skipping the test quietly amounts to disabling it
> on newly added autobuilder workers or other instances where selftest
> runs, and regressions will follow that.

Agreed. Ross asked these be reconsidered as there was a worry some were
falling through the cracks. I know some had issues and need to dive
deeper into remembering what happened. I've removed that one locally.

Cheers,

Richard
diff mbox series

Patch

diff --git a/meta/lib/oeqa/selftest/cases/runtime_test.py b/meta/lib/oeqa/selftest/cases/runtime_test.py
index 5f90bc658f..e72504773a 100644
--- a/meta/lib/oeqa/selftest/cases/runtime_test.py
+++ b/meta/lib/oeqa/selftest/cases/runtime_test.py
@@ -264,7 +264,8 @@  TEST_RUNQEMUPARAMS:append = " slirp"
     def test_testimage_virgl_headless(self):
         """
         Summary: Check host-assisted accelerate OpenGL functionality in qemu with egl-headless frontend
-        Expected: 1. Check that virgl kernel driver is loaded and 3d acceleration is enabled
+        Expected: 1. Check that virgl kernel driver is loaded (modprobe vgem, user part of "render" group and
+                     can open /dev/dri/renderD* device) and 3d acceleration is enabled
                   2. Check that kmscube demo runs without crashing.
         Product: oe-core
         Author: Alexander Kanavin <alex.kanavin@gmail.com>
@@ -276,6 +277,14 @@  TEST_RUNQEMUPARAMS:append = " slirp"
             distro.startswith('almalinux') or distro.startswith('rocky')):
             self.skipTest('virgl headless cannot be tested with %s' %(distro))
 
+        # test requires vgem driver and possibly "render" group rights to access device file
+        render_dev = "/dev/dri/renderD128"
+        try:
+            with open(render_dev, "w") as f:
+                f.close()
+        except IOError:
+            self.skipTest('Can not open "%s" device' % (render_dev))
+
         qemu_distrofeatures = get_bb_var('DISTRO_FEATURES', 'qemu-system-native')
         features = 'IMAGE_CLASSES += "testimage"\n'
         if 'opengl' not in qemu_distrofeatures: