diff mbox series

[1/7] bitbake/runqueue: rework 'bitbake -S printdiff' logic

Message ID 20231218084403.599015-1-alex@linutronix.de
State New
Headers show
Series [1/7] bitbake/runqueue: rework 'bitbake -S printdiff' logic | expand

Commit Message

Alexander Kanavin Dec. 18, 2023, 8:43 a.m. UTC
Previously printdiff code would iterate over tasks that were reported as invalid or absent,
trying to follow dependency chains that would reach the most basic invalid items in the tree.

While this works in tightly controlled local builds, it can lead to bizarre reports
against industrial-sized sstate caches, as the code would not consider whether the
overall target can be fulfilled from valid sstate objects, and instead report
missing sstate signature files that perhaps were never even created due to hash
equivalency providing shortcuts in builds.

This commit reworks the logic in two ways:

- start the iteration over final targets rather than missing objects
and try to recursively arrive at the root of the invalid object dependency.

A previous version of this patch relied relies on finding the most 'recent'
signature in stamps or sstate in a different function later, and recursively
comparing that to the current signature, which is unreliable on real world caches.

- if a given object can be fulfilled from sstate, recurse only into
its setscene dependencies; bitbake wouldn't care if dependencies
for the actual task are absent, and neither should printdiff

I wrote a recursive function for following dependencies, as
doing recursive algorithms non-recursively can result in write-only
code, as was the case here.

[YOCTO #15289]

Signed-off-by: Alexander Kanavin <alex@linutronix.de>
---
 bitbake/lib/bb/runqueue.py | 41 ++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 15 deletions(-)
diff mbox series

Patch

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 24497c5c173..f54d9b85541 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1685,6 +1685,17 @@  class RunQueue:
         return
 
     def print_diffscenetasks(self):
+        def get_root_invalid_tasks(task, taskdepends, valid, noexec, visited_invalid):
+            invalidtasks = []
+            for t in taskdepends[task].depends:
+                if t not in valid and t not in visited_invalid:
+                    invalidtasks.extend(get_root_invalid_tasks(t, taskdepends, valid, noexec, visited_invalid))
+                    visited_invalid.add(t)
+
+            direct_invalid = [t for t in taskdepends[task].depends if t not in valid]
+            if not direct_invalid and task not in noexec:
+                invalidtasks = [task]
+            return invalidtasks
 
         noexec = []
         tocheck = set()
@@ -1718,35 +1729,35 @@  class RunQueue:
                     valid_new.add(dep)
 
         invalidtasks = set()
-        for tid in self.rqdata.runtaskentries:
-            if tid not in valid_new and tid not in noexec:
-                invalidtasks.add(tid)
 
-        found = set()
-        processed = set()
-        for tid in invalidtasks:
+        toptasks = set(["{}:{}".format(t[3], t[2]) for t in self.rqdata.targets])
+        for tid in toptasks:
             toprocess = set([tid])
             while toprocess:
                 next = set()
+                visited_invalid = set()
                 for t in toprocess:
-                    for dep in self.rqdata.runtaskentries[t].depends:
-                        if dep in invalidtasks:
-                            found.add(tid)
-                        if dep not in processed:
-                            processed.add(dep)
+                    if t not in valid_new and t not in noexec:
+                        invalidtasks.update(get_root_invalid_tasks(t, self.rqdata.runtaskentries, valid_new, noexec, visited_invalid))
+                        continue
+                    if t in self.rqdata.runq_setscene_tids:
+                        for dep in self.rqexe.sqdata.sq_deps[t]:
                             next.add(dep)
+                        continue
+
+                    for dep in self.rqdata.runtaskentries[t].depends:
+                        next.add(dep)
+
                 toprocess = next
-                if tid in found:
-                    toprocess = set()
 
         tasklist = []
-        for tid in invalidtasks.difference(found):
+        for tid in invalidtasks:
             tasklist.append(tid)
 
         if tasklist:
             bb.plain("The differences between the current build and any cached tasks start at the following tasks:\n" + "\n".join(tasklist))
 
-        return invalidtasks.difference(found)
+        return invalidtasks
 
     def write_diffscenetasks(self, invalidtasks):