| Message ID | 20231103205018.19693-1-alexis.lothore@bootlin.com |
|---|---|
| State | Accepted, archived |
| Commit | 982798ef96e3a32bf15341bdd3bb7c4356709412 |
| Series | scripts/resulttool: group all regressions in regression report |
On Fri, 2023-11-03 at 13:50 -0700, Alexis Lothoré via lists.openembedded.org wrote:
> From: Alexis Lothoré <alexis.lothore@bootlin.com>
>
> Commit c304fcbe0588b1078373558c2ddf36064bcdf214 introduced a grouping when
> listing regressions. This grouping has been added only for ptests. It has
> been observed that any other kind of tests could benefit from it. For
> example, current regression reports can show the following:
>
> 1 regression(s) for oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash
>     oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash: PASSED -> SKIPPED
> 1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_help
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_help: PASSED -> SKIPPED
> 1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output: PASSED -> SKIPPED
> 1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output: PASSED -> SKIPPED
> 1 regression(s) for oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output: PASSED -> SKIPPED
> [...]
>
> This output is not so useful in its current state and should be grouped per
> test type too.
> Enable grouping for all kinds of tests, to make it look like the following
> in reports:
>
> 5 regression(s) for oescripts
>     oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash: PASSED -> SKIPPED
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_help: PASSED -> SKIPPED
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output: PASSED -> SKIPPED
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output: PASSED -> SKIPPED
>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output: PASSED -> SKIPPED

Do you have a different example? This one is tricky as I happen to know
that test depends on the host distro and the things available there.
There are some distros it will pass on, and some where it is always
skipped. There was a recipetool test added recently which will do
something similar depending upon the host Python version.

The challenge is that we run the test on different host distros, so it is
hard to see it as a regression. I don't know what we can do to make this
"clear" to the report reader...

The patch is probably ok as it doesn't make the output worse, but it has
probably already obfuscated things a bit.

Cheers,

Richard
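The key derivation the patch discusses can be sketched as a small standalone function (a hedged illustration, not the actual resulttool code; the function name is made up for this sketch): ptest result names keep their first two dot-separated components as the group key, while every other test is grouped by its top-level suite name.

```python
def regression_group_key(test_name: str) -> str:
    """Return the family key a regression would be grouped under."""
    parts = test_name.split('.')
    if test_name.startswith('ptest'):
        # e.g. "ptest.glibc-tests.tst-foo" -> "ptest.glibc-tests"
        return '.'.join(parts[:2])
    # e.g. "oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash" -> "oescripts"
    return parts[0]

print(regression_group_key("oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash"))
# -> oescripts
```

With this key, all five oescripts regressions above collapse into a single "oescripts" family instead of five one-entry groups.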
Hi Richard, sorry for the late reply. I see that you have decided to apply
the patch in the meantime, but here are my comments.

On 11/4/23 04:13, Richard Purdie wrote:
> On Fri, 2023-11-03 at 13:50 -0700, Alexis Lothoré via
> lists.openembedded.org wrote:
>> From: Alexis Lothoré <alexis.lothore@bootlin.com>
>>
>> 5 regression(s) for oescripts
>>     oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_help: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_pdf_output: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_png_output: PASSED -> SKIPPED
>>     oescripts.OEPybootchartguyTests.test_pybootchartguy_to_generate_build_svg_output: PASSED -> SKIPPED
>
> Do you have a different example? This one is tricky as I happen to know
> that test depends on the host distro and the things available there.
> There are some distros it will pass on, there are some where it is
> always skipped. There was a recipetool test added recently which will
> do something similar depending upon host python version.

If not already done, you can take a look at the report I have manually
generated for 4.3.rc2 (https://pastebin.com/fvRcqes4), which relies on
this patch.

> The challenge is we run the test on different host distros so it is
> hard to see it as a regression. I don't know what we can do to make
> this "clear" to the report reader...
>
> The patch is probably ok as it doesn't make the output worse but it has
> probably already obfuscated things a bit.

I am not sure this patch obfuscates things more, since it is mostly about
grouping regressions to avoid the repeated "1 regression(s) for <..>".
I would say it _may_ obfuscate some things if there are a lot of
regressions for the same test kind (so the display limit is triggered)
AND some tests of this specific kind are meant to run on specific distros
while some others must run on multiple. In this case, indeed, there may
be a mix of (legitimately) skipped tests and real failing tests hidden by
the display limit. But is that the case?

Anyway, I agree with you about the main issue (false positives due to
tests not meant to be compared between some distros/machines), but I have
not yet found time to take a better look at this and propose something
relevant while making sure not to lose any relevant comparison. The
"dumb" way could be to detect that all tests in a result have a "SKIPPED"
status on the target side, which hints that those tests are not relevant
for the target (in this case, we could simply discard the comparison
silently), but I have to ensure it is valid for most cases.

Alexis

>
> Cheers,
>
> Richard
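The "dumb" heuristic proposed above could be sketched as follows. This is only an illustration of the idea, not code from resulttool; the `{name: {'base': ..., 'target': ...}}` result layout mirrors the shape used in `compare_result`, but the function name and exact structure are assumptions here.

```python
def all_target_skipped(result: dict) -> bool:
    """Hypothetical check: True if every test in this comparison is
    SKIPPED on the target side, hinting the test set is not relevant
    for that target (e.g. wrong host distro) and the comparison could
    be silently discarded rather than reported as regressions."""
    statuses = [v.get('target') for v in result.values()]
    # An empty result set gives no signal, so treat it as "not all skipped".
    return bool(statuses) and all(s == 'SKIPPED' for s in statuses)
```

A comparison where one test genuinely FAILED would still be reported, since only a fully skipped result set is discarded.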
diff --git a/scripts/lib/resulttool/regression.py b/scripts/lib/resulttool/regression.py
index 8fbe5a54783a..10e7d13841a7 100644
--- a/scripts/lib/resulttool/regression.py
+++ b/scripts/lib/resulttool/regression.py
@@ -236,7 +236,8 @@ def compare_result(logger, base_name, target_name, base_result, target_result, d
     for k in sorted(result):
         if not result[k]['target'] or not result[k]['target'].startswith("PASS"):
             # Differentiate each ptest kind when listing regressions
-            key = '.'.join(k.split('.')[:2]) if k.startswith('ptest') else k
+            key_parts = k.split('.')
+            key = '.'.join(key_parts[:2]) if k.startswith('ptest') else key_parts[0]
             # Append new regression to corresponding test family
             regressions[key] = regressions.setdefault(key, []) + [' %s: %s -> %s\n' % (k, get_status_str(result[k]['base']), get_status_str(result[k]['target']))]
     resultstring += f" Total: {sum([len(regressions[r]) for r in regressions])} new regression(s):\n"
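For illustration, the patched grouping loop can be exercised standalone with a few sample results. This is a hedged sketch: `get_status_str` is stubbed here (the real helper lives elsewhere in resulttool), and the sample test names and statuses are invented for the example.

```python
def get_status_str(status):
    # Stub for resulttool's helper; assumed behavior for this sketch.
    return status if status else "None"

result = {
    "oescripts.OEGitproxyTests.test_oegitproxy_proxy_dash":
        {"base": "PASSED", "target": "SKIPPED"},
    "oescripts.OEPybootchartguyTests.test_pybootchartguy_help":
        {"base": "PASSED", "target": "SKIPPED"},
    "ptest.glibc-user.tst-foo":
        {"base": "PASSED", "target": "FAILED"},
}

regressions = {}
for k in sorted(result):
    if not result[k]['target'] or not result[k]['target'].startswith("PASS"):
        # Same key derivation as the patch: ptests keep two components,
        # everything else groups under its top-level suite name.
        key_parts = k.split('.')
        key = '.'.join(key_parts[:2]) if k.startswith('ptest') else key_parts[0]
        regressions[key] = regressions.setdefault(key, []) + [
            '    %s: %s -> %s\n' % (k, get_status_str(result[k]['base']),
                                    get_status_str(result[k]['target']))]

for key, entries in regressions.items():
    print("%d regression(s) for %s" % (len(entries), key))
    for e in entries:
        print(e, end='')
```

The two oescripts tests land in one "oescripts" group, while the ptest keeps its finer-grained "ptest.glibc-user" key.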