Message ID | 20230602095037.97981-4-alexis.lothore@bootlin.com |
---|---|
State | New |
Headers | show |
Series | add failed test artifacts retriever | expand |
On Fri, 2 Jun 2023 at 11:50, Alexis Lothoré via lists.openembedded.org <alexis.lothore=bootlin.com@lists.openembedded.org> wrote: > +# When any test fails, if path to an artifacts configuration (listing > +# files/directories to retrieve on target) has been provided, all listed > +# elements will be downloaded from target before shutting it down > + > +ARTIFACTS_LIST_PATH ??= "" I think this should be simplified further by simply specifying the default list directly instead: TESTIMAGE_FAILED_QA_ARTIFACTS ?= "${libdir}/*/ptest/ /var/log/ /etc/version" Then, if desired, the list can be overridden globally, or per image as needed - from the image recipes and/or regular bitbake configuration mechanisms used by CI (local.conf etc). If the feature needs to be switched off altogether, that can be done by setting an empty list. Alex
Hi, On Fri, Jun 02, 2023 at 01:46:03PM +0200, Alexander Kanavin wrote: > On Fri, 2 Jun 2023 at 11:50, Alexis Lothor� via lists.openembedded.org > <alexis.lothore=bootlin.com@lists.openembedded.org> wrote: > > +# When any test fails, if path to an artifacts configuration (listing > > +# files/directories to retrieve on target) has been provided, all listed > > +# elements will be downloaded from target before shutting it down > > + > > +ARTIFACTS_LIST_PATH ??= "" > > I think this should be simplified further by simply specifying the > default list directly instead: > > TESTIMAGE_FAILED_QA_ARTIFACTS ?= "${libdir}/*/ptest/ /var/log/ /etc/version" Try fetching /etc/os-release too, please. I like this patch, thanks! Cheers, -Mikko
Hello Alexander, Mikko, On 6/2/23 13:52, Mikko Rapeli wrote: > Hi, > > On Fri, Jun 02, 2023 at 01:46:03PM +0200, Alexander Kanavin wrote: >> On Fri, 2 Jun 2023 at 11:50, Alexis Lothoré via lists.openembedded.org >> <alexis.lothore=bootlin.com@lists.openembedded.org> wrote: >>> +# When any test fails, if path to an artifacts configuration (listing >>> +# files/directories to retrieve on target) has been provided, all listed >>> +# elements will be downloaded from target before shutting it down >>> + >>> +ARTIFACTS_LIST_PATH ??= "" >> >> I think this should be simplified further by simply specifying the >> default list directly instead: >> >> TESTIMAGE_FAILED_QA_ARTIFACTS ?= "${libdir}/*/ptest/ /var/log/ /etc/version" I wondered about this possibility while adding this variable, and at this time I thought it would encode to much "CI logic" in testimage class. But since you raise the suggestion, I can update to pass directly the file list, which could indeed be overridden by CI tests runners. We can start with those three paths in the default value. Another point raised by your example is the lack of wildcards management in artifacts path, which would clearly ease the listing, especially for ptests. I'll see if I can come up with something better for this > > Try fetching /etc/os-release too, please. ACK, will add it to default list > > I like this patch, thanks! > > Cheers, > > -Mikko > > > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. > View/Reply Online (#182307): https://lists.openembedded.org/g/openembedded-core/message/182307 > Mute This Topic: https://lists.openembedded.org/mt/99283034/7394887 > Group Owner: openembedded-core+owner@lists.openembedded.org > Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [alexis.lothore@bootlin.com] > -=-=-=-=-=-=-=-=-=-=-=- >
On Fri, 2 Jun 2023 at 14:14, Alexis Lothoré <alexis.lothore@bootlin.com> wrote: > I wondered about this possibility while adding this variable, and at this time I > thought it would encode to much "CI logic" in testimage class. But since you > raise the suggestion, I can update to pass directly the file list, which could > indeed be overridden by CI tests runners. We can start with those three paths in > the default value. But this isn't CI logic. It's bitbake's QA logic, which can and should be available in local builds. Running testimage locally is a supported and regularly used operation, so it should work exactly same as in CI. Alex
Hi, These changes are an improvement, but based on my experience in product test automation, instead of collecting logs after testing is completed, it is better to capture logs from the target device while tests are being executed. Serial console, systemd journal etc logs can be captured as streams from the device under test (DUT), and time stamps added from the test controller so that aligning events like test command execution and log messages is possible. It is also a good idea to capture core dumps via this kind of stream(s). Problems with HW and BSP SW may not be visible after testing completes (device has rebooted, reset) or capturing data is not possible at all (due to system, kernel, userspace hang, oom or too much load). This capturing of logs could be implemented by adding some configurable variables to execute commands on the test controller and inside oeqa environment at certain stages of testing, for example after serial console login prompt has been detected. Command to execute could be a simple "ssh -c 'journalctl -b -a'" to capture boot logs and "ssh 'journalctl -a -f'" and log the output data with additional time stamps to bitbake task output or a separate file. Just thinking out loud here, these changes are an improvement over current situation already. Thanks for sending these patches! Cheers, -Mikko
Hi Mikko, sorry for late reply, and thanks for the additional feedback On 6/2/23 15:07, Mikko Rapeli wrote: > Hi, > > These changes are an improvement, but based on my experience in product test automation, > instead of collecting logs after testing is completed, it is better to capture logs > from the target device while tests are being executed. Serial console, systemd journal > etc logs can be captured as streams from the device under test (DUT), and time stamps added > from the test controller so that aligning events like test command execution and log messages > is possible. It is also a good idea to capture core dumps via this kind of stream(s). > Problems with HW and BSP SW may not be visible after testing completes (device has > rebooted, reset) or capturing data is not possible at all (due to system, kernel, userspace hang, > oom or too much load). > > This capturing of logs could be implemented by adding some configurable variables to execute > commands on the test controller and inside oeqa environment at certain stages of testing, for example > after serial console login prompt has been detected. Command to execute could be a simple > "ssh -c 'journalctl -b -a'" to capture boot logs and "ssh 'journalctl -a -f'" and log the output data with > additional time stamps to bitbake task output or a separate file. > > Just thinking out loud here, these changes are an improvement over current situation already. > Thanks for sending these patches! While I tend to agree with what you are suggesting, I feel like there are some new constraints, because of targets variety: is the system a real target or a qemu image ? Does it run systemd/journald ? Are coredumps enabled by default ? Of course those constraints may be quite easy to circumvent, moreover we can think of a "best effort" solution which gathers "what it can" from running target. Also, there may be some corner cases that would need some specific handling: some tests can be quite long (hours), what about broken pipes during those ? This would definitely need some re-connection strategies for example. Since current proposed solution is not very invasive/has not much requirements except a valid ssh access, I plan to update the series and let some real testing show if it is relevant. If not (or not enough), we may give a try to a "runtime" version like the one you are suggesting ? Thanks, Alexis > > Cheers, > > -Mikko > > > > -=-=-=-=-=-=-=-=-=-=-=- > Links: You receive all messages sent to this group. > View/Reply Online (#182316): https://lists.openembedded.org/g/openembedded-core/message/182316 > Mute This Topic: https://lists.openembedded.org/mt/99283034/7394887 > Group Owner: openembedded-core+owner@lists.openembedded.org > Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [alexis.lothore@bootlin.com] > -=-=-=-=-=-=-=-=-=-=-=- >
Hi, On Wed, Jun 07, 2023 at 09:36:07AM +0200, Alexis Lothor� wrote: > Hi Mikko, sorry for late reply, and thanks for the additional feedback > > On 6/2/23 15:07, Mikko Rapeli wrote: > > Hi, > > > > These changes are an improvement, but based on my experience in product test automation, > > instead of collecting logs after testing is completed, it is better to capture logs > > from the target device while tests are being executed. Serial console, systemd journal > > etc logs can be captured as streams from the device under test (DUT), and time stamps added > > from the test controller so that aligning events like test command execution and log messages > > is possible. It is also a good idea to capture core dumps via this kind of stream(s). > > Problems with HW and BSP SW may not be visible after testing completes (device has > > rebooted, reset) or capturing data is not possible at all (due to system, kernel, userspace hang, > > oom or too much load). > > > > This capturing of logs could be implemented by adding some configurable variables to execute > > commands on the test controller and inside oeqa environment at certain stages of testing, for example > > after serial console login prompt has been detected. Command to execute could be a simple > > "ssh -c 'journalctl -b -a'" to capture boot logs and "ssh 'journalctl -a -f'" and log the output data with > > additional time stamps to bitbake task output or a separate file. > > > > Just thinking out loud here, these changes are an improvement over current situation already. > > Thanks for sending these patches! > > While I tend to agree with what you are suggesting, I feel like there are some > new constraints, because of targets variety: is the system a real target or a > qemu image ? Does it run systemd/journald ? Are coredumps enabled by default ? > Of course those constraints may be quite easy to circumvent, moreover we can > think of a "best effort" solution which gathers "what it can" from running > target. Also, there may be some corner cases that would need some specific > handling: some tests can be quite long (hours), what about broken pipes during > those ? This would definitely need some re-connection strategies for example. Agreed. Currently our qemu reboot tests are failing for due some kind of reconnection issue, for example. I'll try to find time to fix this... > Since current proposed solution is not very invasive/has not much requirements > except a valid ssh access, I plan to update the series and let some real testing > show if it is relevant. If not (or not enough), we may give a try to a "runtime" > version like the one you are suggesting ? Yes, the series is a clear improvement over the current state. Cheers, -Mikko
diff --git a/meta/classes-recipe/testimage.bbclass b/meta/classes-recipe/testimage.bbclass index 6b10c1db09f9..3fd3dd5b0264 100644 --- a/meta/classes-recipe/testimage.bbclass +++ b/meta/classes-recipe/testimage.bbclass @@ -18,6 +18,12 @@ inherit image-artifact-names TESTIMAGE_AUTO ??= "0" +# When any test fails, if path to an artifacts configuration (listing +# files/directories to retrieve on target) has been provided, all listed +# elements will be downloaded from target before shutting it down + +ARTIFACTS_LIST_PATH ??= "" + # You can set (or append to) TEST_SUITES in local.conf to select the tests # which you want to run for your target. # The test names are the module names in meta/lib/oeqa/runtime/cases. @@ -192,6 +198,45 @@ def get_testimage_boot_patterns(d): boot_patterns[flag] = flagval.encode().decode('unicode-escape') return boot_patterns +def load_artifacts_list(artifacts_conf_path): + import json + + if not artifacts_conf_path: + return None + + if not os.path.isfile(artifacts_conf_path): + return None + + try: + with open(artifacts_conf_path) as f: + artifacts_list = json.load(f) + except json.decoder.JSONDecodeError as e: + bb.warn(f"Invalid tests artifact list format: {e.lineno}:{e.colno} : {e.msg}") + return None + + if 'artifacts' in artifacts_list: + return artifacts_list['artifacts'] + + return None + +def retrieve_test_artifacts(target, artifacts_list, target_dir): + import shutil + + local_artifacts_dir = os.path.join(target_dir, "artifacts") + if os.path.isdir(local_artifacts_dir): + shutil.rmtree(local_artifacts_dir) + + os.makedirs(local_artifacts_dir) + for artifact_path in artifacts_list: + if not os.path.isabs(artifact_path): + bb.warn(f"{artifact_path} is not an absolute path") + continue + try: + dest_dir = os.path.join(local_artifacts_dir, os.path.dirname(artifact_path[1:])) + os.makedirs(dest_dir, exist_ok=True) + target.copyFrom(artifact_path, dest_dir) + except: + bb.warn(f"Can not retrieve {artifact_path} from test target") def testimage_main(d): import os @@ -402,6 +447,12 @@ def testimage_main(d): get_testimage_result_id(configuration), dump_streams=d.getVar('TESTREPORT_FULLLOGS')) results.logSummary(pn) + if not results.wasSuccessful(): + artifacts_list = load_artifacts_list(d.getVar("ARTIFACTS_LIST_PATH")) + if not artifacts_list: + bb.warn("Could not load artifacts list, skip artifacts retrieval") + else: + retrieve_test_artifacts(tc.target, artifacts_list, get_testimage_json_result_dir(d)) tc.target.stop()
Add a basic artifacts retrievers in testimage class which: - triggers when at least one runtime test fails - reads a list of files to retrieve from ARTIFACTS_LIST_PATH - retrieve those files over scp thanks to existing ssh class - store those files in an "artifacts" directory in "tmp/log/oeqa/<image>" Bring partial solution to [YOCTO #14901] Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> --- meta/classes-recipe/testimage.bbclass | 51 +++++++++++++++++++++++++++ 1 file changed, 51 insertions(+)