diff mbox series

[1/1] runqueue: Fix task execution dependency issue

Message ID 20230924203958.35361-1-gowrishankarstg@gmail.com
State New
Headers show
Series [1/1] runqueue: Fix task execution dependency issue | expand

Commit Message

Gowrishankar Saminathan Sept. 24, 2023, 8:39 p.m. UTC
When the recipe metadata is updated, and the Sstate cache remains unchanged,
a problem arises where tasks dependent on this particular task are found in
both the 'covered' and 'notcovered' lists within the runqueue.

For instance, consider the example of 'base-passwd.bb:do_populate_sysroot',
which serves as a dependent task for both 'systemd.bb:do_populate_sysroot' and
'dbus.bb:do_populate_sysroot.' Due to the lack of ordered task ID storage in the runqueue,
it often leads to the execution of 'systemd.bb::do_populate_sysroot' prior to
'base-passwd.bb:do_populate_sysroot.' This results in 'systemd.bb:do_populate_sysroot'
being erroneously flagged as 'notcovered' by the runqueue.
Nevertheless, sstate later marks it as 'covered' if there is no change in the task output.

To address this issue, this commit enhances the runqueue to store task IDs in an ordered list,
ensuring tasks are carried out in their correct dependency order.
This adjustment will execute 'base-passwd.bb:do_populate_sysroot' as the initial
task and then 'systemd.bb:do_populate_sysroot.'
This strategic execution ensures that 'systemd.bb:do_populate_sysroot' won't
receive an incorrect 'notcovered' status, as 'base-passwd.bb:do_populate_sysroot'
will have already been executed beforehand.

Signed-off-by: Gowrishankar Saminathan <gowrishankarstg@gmail.com>
---
 lib/bb/runqueue.py | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Richard Purdie Sept. 26, 2023, 11:30 a.m. UTC | #1
On Mon, 2023-09-25 at 02:09 +0530, Gowrishankar Saminathan wrote:
> When the recipe metadata is updated, and the Sstate cache remains unchanged,
> a problem arises where tasks dependent on this particular task are found in
> both the 'covered' and 'notcovered' lists within the runqueue.
> 
> For instance, consider the example of 'base-passwd.bb:do_populate_sysroot',
> which serves as a dependent task for both 'systemd.bb:do_populate_sysroot' and
> 'dbus.bb:do_populate_sysroot.' Due to the lack of ordered task ID storage in the runqueue,
> it often leads to the execution of 'systemd.bb::do_populate_sysroot' prior to
> 'base-passwd.bb:do_populate_sysroot.' This results in 'systemd.bb:do_populate_sysroot'
> being erroneously flagged as 'notcovered' by the runqueue.
> Nevertheless, sstate later marks it as 'covered' if there is no change in the task output.
> 
> To address this issue, this commit enhances the runqueue to store task IDs in an ordered list,
> ensuring tasks are carried out in their correct dependency order.
> This adjustment will execute 'base-passwd.bb:do_populate_sysroot' as the initial
> task and then 'systemd.bb:do_populate_sysroot.'
> This strategic execution ensures that 'systemd.bb:do_populate_sysroot' won't
> receive an incorrect 'notcovered' status, as 'base-passwd.bb:do_populate_sysroot'
> will have already been executed beforehand.
> 
> Signed-off-by: Gowrishankar Saminathan <gowrishankarstg@gmail.com>
> ---
>  lib/bb/runqueue.py | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
> index c40a3be2..446b1667 100644
> --- a/lib/bb/runqueue.py
> +++ b/lib/bb/runqueue.py
> @@ -1789,7 +1789,7 @@ class RunQueueExecute:
>          self.sq_live = set()
>  
>          self.updated_taskhash_queue = []
> -        self.pending_migrations = set()
> +        self.pending_migrations = []
>  
>          self.runq_buildable = set()
>          self.runq_running = set()
> @@ -2422,7 +2422,7 @@ class RunQueueExecute:
>  
>      def process_possible_migrations(self):
>  
> -        changed = set()
> +        changed = []
>          toprocess = set()
>          for tid, unihash in self.updated_taskhash_queue.copy():
>              if tid in self.runq_running and tid not in self.runq_complete:
> @@ -2492,7 +2492,7 @@ class RunQueueExecute:
>                      #logger.debug("Task %s hash changes: %s->%s %s->%s" % (tid, orighash, newhash, origuni, newuni))
>                      self.rqdata.runtaskentries[tid].hash = newhash
>                      self.rqdata.runtaskentries[tid].unihash = newuni
> -                    changed.add(tid)
> +                    changed.append(tid)
>  
>                  next |= self.rqdata.runtaskentries[tid].revdeps
>                  total.remove(tid)
> @@ -2510,7 +2510,7 @@ class RunQueueExecute:
>              if tid not in self.rqdata.runq_setscene_tids:
>                  continue
>              if tid not in self.pending_migrations:
> -                self.pending_migrations.add(tid)
> +                self.pending_migrations.append(tid)
>  
>          update_tasks = []
>          for tid in self.pending_migrations.copy():

Thanks for the patch. Which release of bitbake did you find this issue
with? There were fixes for issues like this which were made in master.
I'm worried this patch was developed against an older release and isn't
applicable to master.

Cheers,

Richard
Gowrishankar Saminathan Sept. 26, 2023, 2:59 p.m. UTC | #2
Hello Richard,

I am facing this issue in the BitBake version tagged as yocto-3.1.27 when I
used it in Yocto of the same version.
Upon reviewing the master branch, it appears to have a similar
implementation to yocto-3.1.27 regarding this issue case. Consequently, I
believe this issue should also apply to the master branch.

It would be great if the master branch already includes this fix. If
possible, kindly provide the commit which addresses this issue.

Thanks for reaching me out!

Thanks and regards,
Gowrishankar Saminathan

On Tue, Sep 26, 2023 at 5:00 PM Richard Purdie <
richard.purdie@linuxfoundation.org> wrote:

> On Mon, 2023-09-25 at 02:09 +0530, Gowrishankar Saminathan wrote:
> > When the recipe metadata is updated, and the Sstate cache remains
> unchanged,
> > a problem arises where tasks dependent on this particular task are found
> in
> > both the 'covered' and 'notcovered' lists within the runqueue.
> >
> > For instance, consider the example of 'base-passwd.bb:
> do_populate_sysroot',
> > which serves as a dependent task for both 'systemd.bb:do_populate_sysroot'
> and
> > 'dbus.bb:do_populate_sysroot.' Due to the lack of ordered task ID
> storage in the runqueue,
> > it often leads to the execution of 'systemd.bb::do_populate_sysroot'
> prior to
> > 'base-passwd.bb:do_populate_sysroot.' This results in 'systemd.bb:
> do_populate_sysroot'
> > being erroneously flagged as 'notcovered' by the runqueue.
> > Nevertheless, sstate later marks it as 'covered' if there is no change
> in the task output.
> >
> > To address this issue, this commit enhances the runqueue to store task
> IDs in an ordered list,
> > ensuring tasks are carried out in their correct dependency order.
> > This adjustment will execute 'base-passwd.bb:do_populate_sysroot' as
> the initial
> > task and then 'systemd.bb:do_populate_sysroot.'
> > This strategic execution ensures that 'systemd.bb:do_populate_sysroot'
> won't
> > receive an incorrect 'notcovered' status, as 'base-passwd.bb:
> do_populate_sysroot'
> > will have already been executed beforehand.
> >
> > Signed-off-by: Gowrishankar Saminathan <gowrishankarstg@gmail.com>
> > ---
> >  lib/bb/runqueue.py | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
> > index c40a3be2..446b1667 100644
> > --- a/lib/bb/runqueue.py
> > +++ b/lib/bb/runqueue.py
> > @@ -1789,7 +1789,7 @@ class RunQueueExecute:
> >          self.sq_live = set()
> >
> >          self.updated_taskhash_queue = []
> > -        self.pending_migrations = set()
> > +        self.pending_migrations = []
> >
> >          self.runq_buildable = set()
> >          self.runq_running = set()
> > @@ -2422,7 +2422,7 @@ class RunQueueExecute:
> >
> >      def process_possible_migrations(self):
> >
> > -        changed = set()
> > +        changed = []
> >          toprocess = set()
> >          for tid, unihash in self.updated_taskhash_queue.copy():
> >              if tid in self.runq_running and tid not in
> self.runq_complete:
> > @@ -2492,7 +2492,7 @@ class RunQueueExecute:
> >                      #logger.debug("Task %s hash changes: %s->%s %s->%s"
> % (tid, orighash, newhash, origuni, newuni))
> >                      self.rqdata.runtaskentries[tid].hash = newhash
> >                      self.rqdata.runtaskentries[tid].unihash = newuni
> > -                    changed.add(tid)
> > +                    changed.append(tid)
> >
> >                  next |= self.rqdata.runtaskentries[tid].revdeps
> >                  total.remove(tid)
> > @@ -2510,7 +2510,7 @@ class RunQueueExecute:
> >              if tid not in self.rqdata.runq_setscene_tids:
> >                  continue
> >              if tid not in self.pending_migrations:
> > -                self.pending_migrations.add(tid)
> > +                self.pending_migrations.append(tid)
> >
> >          update_tasks = []
> >          for tid in self.pending_migrations.copy():
>
> Thanks for the patch. Which release of bitbake did you find this issue
> with? There were fixes for issues like this which were made in master.
> I'm worried this patch was developed against an older release and isn't
> applicable to master.
>
> Cheers,
>
> Richard
>
>
Richard Purdie Sept. 26, 2023, 7:47 p.m. UTC | #3
On Tue, 2023-09-26 at 20:29 +0530, Gowrishankar Saminathan wrote:
> Hello Richard,
> 
> I am facing this issue in the BitBake version tagged as yocto-3.1.27
> when I used it in Yocto of the same version. 
> Upon reviewing the master branch, it appears to have a similar
> implementation to yocto-3.1.27 regarding this issue case.
> Consequently, I believe this issue should also apply to the master
> branch.

Whilst your patch might apply, I'm not convinced it is necessary. I'd
like you to reproduce the issue with master please as I'm not sure you
will be able to.

> 
> It would be great if the master branch already includes this fix. If
> possible, kindly provide the commit which addresses this issue.
> 
> Thanks for reaching me out!

There were multiple patches fixing a variety of issues. Some like this
one appear to be in dunfell (3.1).

https://git.yoctoproject.org/poky/commit/bitbake?id=7ea37b929153e112b8d843c05fab06354aa575fc

Some like this one appear not to be:

https://git.yoctoproject.org/poky/commit/bitbake?id=9982b2c0f2a4bf6d451b532856ca6cf126985f22

and this one too:

https://git.yoctoproject.org/poky/commit/bitbake/lib/bb/runqueue.py?id=5fdd28e37face59410da781d5d70178565e69e5e

I'd look at the changes to runqueue.py between dunfell and master as I
do suspect this is fixed there, just not the way your patch attempts
to.

Cheers,

Richard
Gowrishankar Saminathan Oct. 3, 2023, 2:51 a.m. UTC | #4
Hello Richard,

I wanted to express my gratitude for sharing the patches with me for
testing . I have tried all the commit you provided, with the exception of
5fdd28e37face59410da781d5d70178565e69e5e
<https://git.yoctoproject.org/poky/commit/bitbake/lib/bb/runqueue.py?id=5fdd28e37face59410da781d5d70178565e69e5e>,
applied on top of yocto-3.1.27 . With other two commits issue still seems
to appear.
Interestingly *I couldn't reproduce the issue at all in "master" branch*,
It seems highly likely that the issue has already been resolved, and the
fix is readily available in the "master" branch, just as you mentioned.
Thank you for your assistance in addressing this matter.

Best Regards,
Gowrishanakr Saminathan





On Wed, Sep 27, 2023 at 1:17 AM Richard Purdie <
richard.purdie@linuxfoundation.org> wrote:

> On Tue, 2023-09-26 at 20:29 +0530, Gowrishankar Saminathan wrote:
> > Hello Richard,
> >
> > I am facing this issue in the BitBake version tagged as yocto-3.1.27
> > when I used it in Yocto of the same version.
> > Upon reviewing the master branch, it appears to have a similar
> > implementation to yocto-3.1.27 regarding this issue case.
> > Consequently, I believe this issue should also apply to the master
> > branch.
>
> Whilst your patch might apply, I'm not convinced it is necessary. I'd
> like you to reproduce the issue with master please as I'm not sure you
> will be able to.
>
> >
> > It would be great if the master branch already includes this fix. If
> > possible, kindly provide the commit which addresses this issue.
> >
> > Thanks for reaching me out!
>
> There were multiple patches fixing a variety of issues. Some like this
> one appear to be in dunfell (3.1).
>
>
> https://git.yoctoproject.org/poky/commit/bitbake?id=7ea37b929153e112b8d843c05fab06354aa575fc
>
> Some like this one appear not to be:
>
>
> https://git.yoctoproject.org/poky/commit/bitbake?id=9982b2c0f2a4bf6d451b532856ca6cf126985f22
>
> and this one too:
>
>
> https://git.yoctoproject.org/poky/commit/bitbake/lib/bb/runqueue.py?id=5fdd28e37face59410da781d5d70178565e69e5e
>
> I'd look at the changes to runqueue.py between dunfell and master as I
> do suspect this is fixed there, just not the way your patch attempts
> to.
>
> Cheers,
>
> Richard
>
>
>
>
>
>
diff mbox series

Patch

diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
index c40a3be2..446b1667 100644
--- a/lib/bb/runqueue.py
+++ b/lib/bb/runqueue.py
@@ -1789,7 +1789,7 @@  class RunQueueExecute:
         self.sq_live = set()
 
         self.updated_taskhash_queue = []
-        self.pending_migrations = set()
+        self.pending_migrations = []
 
         self.runq_buildable = set()
         self.runq_running = set()
@@ -2422,7 +2422,7 @@  class RunQueueExecute:
 
     def process_possible_migrations(self):
 
-        changed = set()
+        changed = []
         toprocess = set()
         for tid, unihash in self.updated_taskhash_queue.copy():
             if tid in self.runq_running and tid not in self.runq_complete:
@@ -2492,7 +2492,7 @@  class RunQueueExecute:
                     #logger.debug("Task %s hash changes: %s->%s %s->%s" % (tid, orighash, newhash, origuni, newuni))
                     self.rqdata.runtaskentries[tid].hash = newhash
                     self.rqdata.runtaskentries[tid].unihash = newuni
-                    changed.add(tid)
+                    changed.append(tid)
 
                 next |= self.rqdata.runtaskentries[tid].revdeps
                 total.remove(tid)
@@ -2510,7 +2510,7 @@  class RunQueueExecute:
             if tid not in self.rqdata.runq_setscene_tids:
                 continue
             if tid not in self.pending_migrations:
-                self.pending_migrations.add(tid)
+                self.pending_migrations.append(tid)
 
         update_tasks = []
         for tid in self.pending_migrations.copy():