[v2,0/4] scripts/resulttool/regression: add metadata filtering

Message ID 20230214165309.63527-1-alexis.lothore@bootlin.com

Message

Alexis Lothoré Feb. 14, 2023, 4:53 p.m. UTC
From: Alexis Lothoré <alexis.lothore@bootlin.com>

This v2 does not contain any change to the patches' content; it only sets the
From: field correctly. Sorry for the noise.

This patch series is a proposal linked to the discussion initiated here:
https://lists.yoctoproject.org/g/automated-testing/topic/96652823#1219

After integrating some improvements to regression reporting, it has been
observed that the regression report for version 4.2_M2 is way too big. When
checking it, it appears that a large part of the report is composed of "missing
tests" (regressions detected because a test status changed from "PASS" to
"None"). This is mostly due to oeselftest results, since oeselftest is run
multiple times for a single build, but not with the same parameters (so not the
same test "sets"), so those test sets are not comparable.

The proposed series introduces an OESELFTEST_METADATA field, appended to test
results when the TEST_TYPE is "oeselftest". An oeselftest result with this
metadata looks like this:
	[...]
	"configuration": {
		"HOST_DISTRO": "fedora-36",
		"HOST_NAME": "fedora36-ty-3",
		"LAYERS": {
			[...]
		},
		"MACHINE": "qemux86",
		"STARTTIME": "20230126235248",
		"TESTSERIES": "qemux86",
		"TEST_TYPE": "oeselftest",
		"OESELFTEST_METADATA": {
		    "run_all_tests": true,
		    "run_tests": null,
		    "skips": null,
		    "machine": null,
		    "select_tags": ["toolchain-user", "toolchain-system"],
		    "exclude_tags": null
		}
	}
	[...]

Additionally, the series now makes resulttool look at a METADATA_MATCH_TABLE,
which tells it which specific metadata to check, for a given TEST_TYPE, to know
whether two test results can be compared or not. This removes all the false
positives in regression reports due to tests present in the base results but
not found in the target results because of skipped tests or excluded tags.
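
To illustrate the intent, here is a minimal sketch of such a lookup (the names
and structure are illustrative only; the actual table and metadata_matches()
added to scripts/lib/resulttool/regression.py may differ):

    # Illustrative sketch only, not the exact implementation from this series.
    METADATA_MATCH_TABLE = {
        # For results with TEST_TYPE "oeselftest", compare this metadata field
        "oeselftest": "OESELFTEST_METADATA",
    }

    def metadata_matches(base_configuration, target_configuration):
        """Return True if the two test results can be compared."""
        test_type = base_configuration.get("TEST_TYPE")
        if test_type not in METADATA_MATCH_TABLE:
            # No specific rule for this test type: keep the previous behaviour
            return True
        metadata_key = METADATA_MATCH_TABLE[test_type]
        if metadata_key not in base_configuration:
            # Older base result without the new metadata: keep it comparable,
            # for backward compatibility
            return True
        return (base_configuration.get(metadata_key) ==
                target_configuration.get(metadata_key))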

* this series prioritizes backward compatibility: if the base test is older (i.e.
it does not have the needed metadata), the tests are considered "comparable"
* in addition to the tests added to the oeqa test cases, some "best effort" manual
testing has been done, covering the following cases:
  - run a basic test (e.g. `oe-selftest -r tinfoil`), collect the test result,
    break the test, collect the result again, ensure the tests are compared.
    Change the oe-selftest parameters, ensure the tests are not compared
  - collect base and target test results from the 4.2_M2 regression report,
    manually add the new metadata to some tests, replay the regression report,
    ensure that regressions are kept or discarded depending on the metadata

Alexis Lothoré (4):
  scripts/oe-selftest: append metadata to tests results
  oeqa/selftest/resulttooltests: fix minor typo
  scripts/resulttool/regression: add metadata filtering for oeselftest
  oeqa/selftest/resulttool: add test for metadata filtering on
    regression

 .../oeqa/selftest/cases/resulttooltests.py    | 123 +++++++++++++++++-
 meta/lib/oeqa/selftest/context.py             |  15 ++-
 scripts/lib/resulttool/regression.py          |  34 +++++
 3 files changed, 170 insertions(+), 2 deletions(-)

Comments

Richard Purdie Feb. 16, 2023, 12:02 a.m. UTC | #1
On Tue, 2023-02-14 at 17:53 +0100, Alexis Lothoré via
lists.openembedded.org wrote:
> From: Alexis Lothoré <alexis.lothore@bootlin.com>
> 
> This v2 does not contain any change to the patches' content; it only sets the
> From: field correctly. Sorry for the noise.
> 
> This patch series is a proposal linked to the discussion initiated here:
> https://lists.yoctoproject.org/g/automated-testing/topic/96652823#1219
> 
> After integrating some improvements to regression reporting, it has been
> observed that the regression report for version 4.2_M2 is way too big. When
> checking it, it appears that a large part of the report is composed of "missing
> tests" (regressions detected because a test status changed from "PASS" to
> "None"). This is mostly due to oeselftest results, since oeselftest is run
> multiple times for a single build, but not with the same parameters (so not the
> same test "sets"), so those test sets are not comparable.
> 
> The proposed series introduces an OESELFTEST_METADATA field, appended to test
> results when the TEST_TYPE is "oeselftest". An oeselftest result with this
> metadata looks like this:
> 	[...]
> 	"configuration": {
> 		"HOST_DISTRO": "fedora-36",
> 		"HOST_NAME": "fedora36-ty-3",
> 		"LAYERS": {
> 			[...]
> 		},
> 		"MACHINE": "qemux86",
> 		"STARTTIME": "20230126235248",
> 		"TESTSERIES": "qemux86",
> 		"TEST_TYPE": "oeselftest",
> 		"OESELFTEST_METADATA": {
> 		    "run_all_tests": true,
> 		    "run_tests": null,
> 		    "skips": null,
> 		    "machine": null,
> 		    "select_tags": ["toolchain-user", "toolchain-system"],
> 		    "exclude_tags": null
> 		} 
>  	}
> 	[...]
> 
> Additionally, the series now makes resulttool look at a METADATA_MATCH_TABLE,
> which tells it which specific metadata to check, for a given TEST_TYPE, to know
> whether two test results can be compared or not. This removes all the false
> positives in regression reports due to tests present in the base results but
> not found in the target results because of skipped tests or excluded tags.
> 
> * this series prioritizes backward compatibility: if the base test is older (i.e.
> it does not have the needed metadata), the tests are considered "comparable"
> * in addition to the tests added to the oeqa test cases, some "best effort" manual
> testing has been done, covering the following cases:
>   - run a basic test (e.g. `oe-selftest -r tinfoil`), collect the test result,
>     break the test, collect the result again, ensure the tests are compared.
>     Change the oe-selftest parameters, ensure the tests are not compared
>   - collect base and target test results from the 4.2_M2 regression report,
>     manually add the new metadata to some tests, replay the regression report,
>     ensure that regressions are kept or discarded depending on the metadata

I think this is heading in the right direction. Firstly, can we put
some kind of test script into OE-Core for making debugging/testing this
easier?

I'm wondering if we can take some of the code from qa_send_email and
move it into OE-Core such that I could do something like:

show-regression-report 4.2_M1 4.2_M2

which would then resolve those two tags to commits, find the
testresults repo, fetch the data depth1 then call resulttool regression
on them.
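
Roughly, I imagine something along these lines (just a sketch; the repository
URL and the exact resulttool arguments are assumptions, not a finished
implementation):

    #!/usr/bin/env python3
    # Sketch of the suggested wrapper: resolve two release tags in the shared
    # test results repository, fetch them shallowly, then hand the repository
    # over to resulttool for the regression comparison.
    import subprocess
    import sys
    import tempfile

    # Assumed location of the shared test results repository
    TESTRESULTS_REPO = "git://git.yoctoproject.org/yocto-testresults"

    def fetch_tag(workdir, tag):
        # Shallow-fetch a single tag and return the commit it points to
        subprocess.run(["git", "-C", workdir, "fetch", "--depth", "1", "origin",
                        "refs/tags/%s:refs/tags/%s" % (tag, tag)], check=True)
        rev = subprocess.run(["git", "-C", workdir, "rev-list", "-n", "1", tag],
                             check=True, capture_output=True, text=True)
        return rev.stdout.strip()

    def main(base_tag, target_tag):
        with tempfile.TemporaryDirectory() as workdir:
            subprocess.run(["git", "clone", "--depth", "1", "--no-checkout",
                            TESTRESULTS_REPO, workdir], check=True)
            base = fetch_tag(workdir, base_tag)
            target = fetch_tag(workdir, target_tag)
            # Illustrative invocation; check "resulttool regression-git --help"
            # for the real interface
            subprocess.run(["resulttool", "regression-git",
                            "--commit", base, "--commit2", target, workdir],
                           check=True)

    if __name__ == "__main__":
        main(sys.argv[1], sys.argv[2])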

I did that manually to experiment. I realised that if we do something
like:

    if "MACHINE" in base_configuration and "MACHINE" in target_configuration:
        if base_configuration["MACHINE"] != target_configuration["MACHINE"]:
            print("Skipping")
            return False

in metadata_matches() we can skip a lot of mismatched combinations even
with the older test results. I think we also should be able to use some
pattern matching to generate a dummy OESELFTEST_METADATA section for
older data which doesn't have it. For example, the presence of meta_ide
tests indicates one particular type of test. Combined with the MACHINE
match, this should let us compare old and new data? That would mean
metadata_matches() would need to see into the actual results too.
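
For the guessing part, something along these lines perhaps (purely
illustrative; the test name pattern and the guessed values are assumptions):

    # Illustrative only: guess a dummy OESELFTEST_METADATA block for older
    # results that predate the new field, based on which tests are present.
    def guess_oeselftest_metadata(result):
        test_names = result.get("result", {}).keys()
        if any(name.startswith("meta_ide.") for name in test_names):
            # The presence of meta_ide tests suggests one particular
            # oe-selftest invocation (assumption for illustration)
            return {"run_all_tests": False, "run_tests": ["meta_ide"],
                    "skips": None, "machine": None,
                    "select_tags": None, "exclude_tags": None}
        # Otherwise assume the default full run
        return {"run_all_tests": True, "run_tests": None, "skips": None,
                "machine": None, "select_tags": None, "exclude_tags": None}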

Does that make sense?

Cheers,

Richard
Alexis Lothoré Feb. 16, 2023, 8:56 a.m. UTC | #2
On 2/16/23 01:02, Richard Purdie wrote:
> On Tue, 2023-02-14 at 17:53 +0100, Alexis Lothoré via
> lists.openembedded.org wrote:
>> From: Alexis Lothoré <alexis.lothore@bootlin.com>
>> * this series prioritizes backward compatibility: if the base test is older (i.e.
>> it does not have the needed metadata), the tests are considered "comparable"
>> * in addition to the tests added to the oeqa test cases, some "best effort" manual
>> testing has been done, covering the following cases:
>>   - run a basic test (e.g. `oe-selftest -r tinfoil`), collect the test result,
>>     break the test, collect the result again, ensure the tests are compared.
>>     Change the oe-selftest parameters, ensure the tests are not compared
>>   - collect base and target test results from the 4.2_M2 regression report,
>>     manually add the new metadata to some tests, replay the regression report,
>>     ensure that regressions are kept or discarded depending on the metadata
> 
> I think this is heading in the right direction. Firstly, can we put
> some kind of test script into OE-Core for making debugging/testing this
> easier?
> 
> I'm wondering if we can take some of the code from qa_send_email and
> move it into OE-Core such that I could do something like:
> 
> show-regression-report 4.2_M1 4.2_M2
> 
> which would then resolve those two tags to commits, find the
> testresults repo, fetch the data depth1 then call resulttool regression
> on them.

That totally makes sense, it will make future changes easier to test, thanks for
the suggestion. I will think about how I can rework/transfer some code to do
that. I feel that one thing could still be troublesome: since there will be
linked modifications in yocto-autobuilder-helper and openembedded, do you know
whether yocto-autobuilder-helper is fetched on the same branch family as all the
repositories needed for the build? From a quick reading, I see that it defaults
to "master" in config.py in the yocto-autobuilder2 repository, so I think it is
always on master, which would be an issue to solve?

> 
> I did that manually to experiment. I realised that if we do something
> like:
> 
>     if "MACHINE" in base_configuration and "MACHINE" in target_configuration:
>         if base_configuration["MACHINE"] != target_configuration["MACHINE"]:
>             print("Skipping")
>             return False
> 
> in metadata_matches() we can skip a lot of mismatched combinations even
> with the older test results.

Indeed, comparing MACHINE when present in both results can help too. I may have
skipped this because of confusion about our previous discussion on some tests
being "cross-compared" while others are "strictly" compared! I will check that
most scenarios allow comparing MACHINE, and if so, I will add it to
metadata_matches to be checked first (because it is cheap to test), and if it
matches, check OESELFTEST_METADATA too (more expensive).
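
Something along these lines (just a sketch of the planned ordering; the helper
doing the detailed comparison is hypothetical):

    def oeselftest_metadata_matches(base_configuration, target_configuration):
        # Hypothetical helper: the more expensive comparison of the
        # OESELFTEST_METADATA dictionaries
        return (base_configuration.get("OESELFTEST_METADATA") ==
                target_configuration.get("OESELFTEST_METADATA"))

    def metadata_matches(base_configuration, target_configuration):
        # Cheap check first: results for different MACHINEs are never compared
        if "MACHINE" in base_configuration and "MACHINE" in target_configuration:
            if base_configuration["MACHINE"] != target_configuration["MACHINE"]:
                return False
        # Then the more detailed oe-selftest metadata comparison
        return oeselftest_metadata_matches(base_configuration, target_configuration)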

> I think we also should be able to use some
> pattern matching to generate a dummy OESELFTEST_METADATA section for
> older data which doesn't have it. For example, the presence of meta_ide
> tests indicates one particular type of test. Combined with the MACHINE
> match, this should let us compare old and new data? That would mean
> metadata_matches() would need to see into the actual results too.

I thought about this possibility while starting to work on this (checking
existing metadata to generate/guess the test configuration), but I felt that it
would require encoding too much "business logic" in the resulttool scripts,
which could easily break if we want to change some test configurations. I mean,
if someone changes the parameters passed to oe-selftest in the builders'
configuration, they will very probably not remember to update the "guess"
filters in resulttool.

I will start working on a new revision of this series based on your comments.

Kind regards,
Richard Purdie Feb. 16, 2023, 10:33 a.m. UTC | #3
On Thu, 2023-02-16 at 09:56 +0100, Alexis Lothoré wrote:
> On 2/16/23 01:02, Richard Purdie wrote:
> > On Tue, 2023-02-14 at 17:53 +0100, Alexis Lothoré via
> > lists.openembedded.org wrote:
> > > From: Alexis Lothoré <alexis.lothore@bootlin.com>
> > > * this series prioritizes backward compatibility: if the base test is older (i.e.
> > > it does not have the needed metadata), the tests are considered "comparable"
> > > * in addition to the tests added to the oeqa test cases, some "best effort" manual
> > > testing has been done, covering the following cases:
> > >   - run a basic test (e.g. `oe-selftest -r tinfoil`), collect the test result,
> > >     break the test, collect the result again, ensure the tests are compared.
> > >     Change the oe-selftest parameters, ensure the tests are not compared
> > >   - collect base and target test results from the 4.2_M2 regression report,
> > >     manually add the new metadata to some tests, replay the regression report,
> > >     ensure that regressions are kept or discarded depending on the metadata
> > 
> > I think this is heading in the right direction. Firstly, can we put
> > some kind of test script into OE-Core for making debugging/testing this
> > easier?
> > 
> > I'm wondering if we can take some of the code from qa_send_email and
> > move it into OE-Core such that I could do something like:
> > 
> > show-regression-report 4.2_M1 4.2_M2

I was thinking "yocto-testresult-query regression-report" might be a
better name/command; it could then default to using the yocto-testresults
repo and resolving Yocto tags. You'll also likely want to specify a
workdir for the operation, but those are implementation details :)

> > 
> > which would then resolve those two tags to commits, find the
> > testresults repo, fetch the data depth1 then call resulttool regression
> > on them.
> 
> That totally makes sense, it will make future changes easier to test, thanks for
> the suggestion. I will think about how I can rework/transfer some code to do
> that. I feel that one thing could still be troublesome: since there will be
> linked modifications in yocto-autobuilder-helper and openembedded, do you know
> whether yocto-autobuilder-helper is fetched on the same branch family as all the
> repositories needed for the build? From a quick reading, I see that it defaults
> to "master" in config.py in the yocto-autobuilder2 repository, so I think it is
> always on master, which would be an issue to solve?

Unlike yocto-autobuilder2, there is a branch of yocto-autobuilder-helper
per release. We can therefore move code from there to OE-Core on the
master branch and we'll be OK. It will be a small headache for SWAT
whilst we test it, but Alexandre is on cc and will cope :).

> > 
> > I did that manually to experiment. I realised that if we do something
> > like:
> > 
> >     if "MACHINE" in base_configuration and "MACHINE" in target_configuration:
> >         if base_configuration["MACHINE"] != target_configuration["MACHINE"]:
> >             print("Skipping")
> >             return False
> > 
> > in metadata_matches() we can skip a lot of mismatched combinations even
> > with the older test results.
> 
> Indeed, comparing MACHINE when present in both results can help too. I may have
> skipped this because of confusion about our previous discussion on some tests
> being "cross-compared" while others are "strictly" compared! I will check that
> most scenarios allow comparing MACHINE, and if so, I will add it to
> metadata_matches to be checked first (because it is cheap to test), and if it
> matches, check OESELFTEST_METADATA too (more expensive).

MACHINE should always match, that is fine. What doesn't need to match
are the distros and so on.

> > I think we also should be able to use some
> > pattern matching to generate a dummy OESELFTEST_METADATA section for
> > older data which doesn't have it. For example, the presence of meta_ide
> > tests indicates one particular type of test. Combined with the MACHINE
> > match, this should let us compare old and new data? That would mean
> > metadata_matches() would need to see into the actual results too.
> 
> I thought about this possibility while starting to work on this (checking
> existing metadata to generate/guess the test configuration), but I felt that it
> would require encoding too much "business logic" in the resulttool scripts,
> which could easily break if we want to change some test configurations. I mean,
> if someone changes the parameters passed to oe-selftest in the builders'
> configuration, they will very probably not remember to update the "guess"
> filters in resulttool.

I think it will be important/useful to have good regression reports
against existing test results (e.g. ultimately on our LTS release
branches too). Whilst I don't normally like hardcoding things like
this, I think in this case the benefit outweighs the ugliness as long
as it "fades" over time, which it will if we add proper metadata going
forward.

> I will start working on a new revision of this series based on your comments.

Sounds good, thanks!

Cheers,

Richard