scripts/resulttool: group all regressions in regression report

Message ID 20231103205018.19693-1-alexis.lothore@bootlin.com
State Accepted, archived
Commit 982798ef96e3a32bf15341bdd3bb7c4356709412
Series scripts/resulttool: group all regressions in regression report

Commit Message

Alexis Lothoré Nov. 3, 2023, 8:50 p.m. UTC
From: Alexis Lothoré <alexis.lothore@bootlin.com>

Commit c304fcbe0588b1078373558c2ddf36064bcdf214 introduced grouping when
listing regressions, but only for ptests. Other kinds of tests would
benefit from it as well. For example, current regression reports can show
the following:

1 regression(s) for oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash
    oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash: PASSED -> SKIPPED
1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_help
    oescripts.OEPybootchartguyTests.test_pybootchartguy_help: PASSED -> SKIPPED
1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output
    oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output: PASSED -> SKIPPED
1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output
    oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output: PASSED -> SKIPPED
1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output
    oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output: PASSED -> SKIPPED
[...]

This output is not very useful in its current state and should be grouped
per test type too. Enable grouping for all kinds of tests, to make it look
like the following in reports:
5 regression(s) for oescripts
    oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash: PASSED -> SKIPPED
    oescripts.OEPybootchartguyTests.test_pybootchartguy_help: PASSED -> SKIPPED
    oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output: PASSED -> SKIPPED
    oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output: PASSED -> SKIPPED
    oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output: PASSED -> SKIPPED

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
---
 scripts/lib/resulttool/regression.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Richard Purdie Nov. 4, 2023, 11:13 a.m. UTC | #1
On Fri, 2023-11-03 at 13:50 -0700, Alexis Lothoré via
lists.openembedded.org wrote:
> From: Alexis Lothoré <alexis.lothore@bootlin.com>
> 
> Commit c304fcbe0588b1078373558c2ddf36064bcdf214 introduced grouping when
> listing regressions, but only for ptests. Other kinds of tests would
> benefit from it as well. For example, current regression reports can show
> the following:
> 
> 1 regression(s) for oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash
>     oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash: PASSED -> SKIPPED
> 1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_help
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_help: PASSED -> SKIPPED
> 1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output: PASSED -> SKIPPED
> 1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output: PASSED -> SKIPPED
> 1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output: PASSED -> SKIPPED
> [...]
> 
> This output is not very useful in its current state and should be grouped
> per test type too. Enable grouping for all kinds of tests, to make it look
> like the following in reports:
> 5 regression(s) for oescripts
>     oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash: PASSED -> SKIPPED
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_help: PASSED -> SKIPPED
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output: PASSED -> SKIPPED
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output: PASSED -> SKIPPED
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output: PASSED -> SKIPPED
> 

Do you have a different example? This one is tricky as I happen to know
that this test depends on the host distro and the things available there.
It will pass on some distros and is always skipped on others. There was a
recipetool test added recently which will do something similar depending
on the host Python version.

The challenge is we run the test on different host distros so it is
hard to see it as a regression. I don't know what we can do to make
this "clear" to the report reader...

The patch is probably OK as it doesn't make the output worse, but it has
probably already obfuscated things a bit.

Cheers,

Richard
Alexis Lothoré Nov. 6, 2023, 9:20 p.m. UTC | #2
Hi Richard,
Sorry for the late reply. I see that you have decided to apply the patch
in the meantime, but here are my comments.

On 11/4/23 04:13, Richard Purdie wrote:
> On Fri, 2023-11-03 at 13:50 -0700, Alexis Lothoré via
> lists.openembedded.org wrote:
>> From: Alexis Lothoré <alexis.lothore@bootlin.com>
>> 5 regression(s) for oescripts
>>     oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_help: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output: PASSED -> SKIPPED
>>
> 
> Do you have a different example? This one is tricky as I happen to know
> that this test depends on the host distro and the things available there.
> It will pass on some distros and is always skipped on others. There was a
> recipetool test added recently which will do something similar depending
> on the host Python version.

If not already done, you can take a look at the report I manually
generated for 4.3.rc2 (https://pastebin.com/fvRcqes4), which relies on
this patch.

> The challenge is we run the test on different host distros so it is
> hard to see it as a regression. I don't know what we can do to make
> this "clear" to the report reader...
> 
> The patch is probably ok as it doesn't make the output worse but it has
> probably already obfuscated things a bit.

I am not sure this patch obfuscates things more, since it is mostly about
grouping regressions to avoid the repeated "1 regression(s) for <..>". I would
say it _may_ obfuscate some things if there are a lot of regressions for the
same test kind (so the display limit is triggered) AND some tests of this
specific kind are meant to run only on specific distros while others must run
on multiple. In that case, indeed, there may be a mix of (legitimately) skipped
tests and real failing tests, hidden by the display limit. But is that the case?

Anyway, I agree with you about the main issue (false positives due to tests
not meant to be compared between some distros/machines), but I have not yet
found time to take a better look at this and propose something relevant while
making sure not to lose any relevant comparison. The "dumb" way could be to
detect that all tests in a result have a "SKIPPED" status on the target side,
which hints that those tests are not relevant for the target (in this case, we
could simply silently discard the comparison), but I have to ensure this is
valid for most cases.
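
As a rough sketch of that heuristic (illustrative only, assuming the result
dict shape that compare_result() iterates over, with 'base' and 'target'
status strings per test; this is not existing resulttool code):

    def all_skipped_on_target(result):
        # If every test in this result set is skipped on the target,
        # the suite most likely does not apply to that distro/machine,
        # so the base/target comparison could be silently discarded.
        statuses = [entry['target'] for entry in result.values()]
        return bool(statuses) and all(s and s.startswith("SKIP") for s in statuses)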

Alexis

> 
> Cheers,
> 
> Richard

Patch

diff --git a/scripts/lib/resulttool/regression.py b/scripts/lib/resulttool/regression.py
index 8fbe5a54783a..10e7d13841a7 100644
--- a/scripts/lib/resulttool/regression.py
+++ b/scripts/lib/resulttool/regression.py
@@ -236,7 +236,8 @@ def compare_result(logger, base_name, target_name, base_result, target_result, d
             for k in sorted(result):
                 if not result[k]['target'] or not result[k]['target'].startswith("PASS"):
                     # Differentiate each ptest kind when listing regressions
-                    key = '.'.join(k.split('.')[:2]) if k.startswith('ptest') else k
+                    key_parts = k.split('.')
+                    key = '.'.join(key_parts[:2]) if k.startswith('ptest') else key_parts[0]
                     # Append new regression to corresponding test family
                     regressions[key] = regressions.setdefault(key, []) + ['        %s: %s -> %s\n' % (k, get_status_str(result[k]['base']), get_status_str(result[k]['target']))]
             resultstring += f"    Total: {sum([len(regressions[r]) for r in regressions])} new regression(s):\n"
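
For illustration, a minimal standalone sketch of the grouping key derivation
after this change (the ptest name below is a made-up example; the oescripts
name comes from the report above):

    def regression_key(test_name):
        # Group ptests as "ptest.<suite>"; group any other kind of test
        # by its first dotted component (the test family).
        key_parts = test_name.split('.')
        return '.'.join(key_parts[:2]) if test_name.startswith('ptest') else key_parts[0]

    print(regression_key('ptest.glibc-tests.tst-strtod'))
    # -> ptest.glibc-tests
    print(regression_key('oescripts.OEPybootchartguyTests.test_pybootchartguy_help'))
    # -> oescripts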