Patchwork [bitbake-devel] command/runqueue: Fix shutdown logic

Submitter Richard Purdie
Date July 21, 2014, 8:35 a.m.
Message ID <1405931753.22985.93.camel@ted>
Permalink /patch/76167/
State New

Comments

Richard Purdie - July 21, 2014, 8:35 a.m.
If you hit Ctrl+C at the right point, the system processes the request
but merrily continues building. It turns out finish_runqueue() is called
but this doesn't stop the later generation and execution of the
runqueue.

This patch adjusts some of the conditionals to ensure the build really
does stop.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
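The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not bitbake's code: RunQueueModel, execute_step(), interrupted_build() and the executed/torn_down flags are hypothetical, and the states are modelled as sentinel objects compared with `is`, as the patch does. Only the finish_runqueue() early return and the "and self.rqexe" guard mirror the diff. It assumes the interrupt arrives before the task executor (rqexe) has been created.

```python
# Toy model of the Ctrl+C race fixed by the patch. Everything here is a
# hypothetical simplification; only the finish_runqueue() early return and
# the "and self.rqexe" guard mirror the real diff.

# States as unique sentinel objects, compared by identity like the patch does.
runQueuePrepare = object()
runQueueRunning = object()
runQueueComplete = object()
runQueueFailed = object()


class RunQueueModel:
    def __init__(self, apply_fix):
        self.apply_fix = apply_fix
        self.state = runQueuePrepare
        self.rqexe = None        # the executor is only created once prepared
        self.executed = False    # did any "task" run?
        self.torn_down = False

    def finish_runqueue(self, now=False):
        if not self.rqexe:
            if self.apply_fix:
                # The fix: with no executor yet, mark the queue complete so
                # the state machine stops instead of preparing and running it.
                self.state = runQueueComplete
            return
        # With an executor, the real code asks it to finish (omitted here).

    def execute_step(self):
        """Advance the state machine once; return True while work remains."""
        if self.state is runQueuePrepare:
            self.rqexe = object()          # stand-in for creating the executor
            self.state = runQueueRunning
        if self.state is runQueueRunning:
            self.executed = True           # tasks would be dispatched here
            self.state = runQueueComplete
        # With the fix the queue can be complete while rqexe is still None,
        # and the real teardown reads self.rqexe.stats, hence the guard.
        if (self.state is runQueueComplete or self.state is runQueueFailed) and self.rqexe:
            self.torn_down = True          # teardown_workers() would run here
        return self.state is not runQueueComplete and self.state is not runQueueFailed


def interrupted_build(apply_fix):
    """Simulate Ctrl+C arriving before the executor exists."""
    rq = RunQueueModel(apply_fix)
    rq.finish_runqueue()
    while rq.execute_step():
        pass
    return rq.executed


print(interrupted_build(apply_fix=False))  # True: the build carried on
print(interrupted_build(apply_fix=True))   # False: the build stopped
```

Without the state change, finish_runqueue() returns silently and the next pass of the state machine goes on to prepare and run the queue, which matches the "merrily continues building" symptom.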
Martin Jansa - July 22, 2014, 2:46 p.m.
On Mon, Jul 21, 2014 at 09:35:53AM +0100, Richard Purdie wrote:
> If you hit Ctrl+C at the right point, the system processes the request
> but merrily continues building. It turns out finish_runqueue() is called
> but this doesn't stop the later generation and execution of the
> runqueue.
> 
> This patch adjusts some of the conditionals to ensure the build really
> does stop.

Great, I've included this change in my world builds to see whether it fixes
bitbake still running after a Jenkins job is aborted.

I don't think it's caused by this change, and I don't know how much we
can do about it, but today I was testing the snort build (which eats all
memory in an m4 call until the OOM killer kills it), and when I wanted to
interrupt the build, it failed with two tracebacks:

NOTE: Preparing runqueue
NOTE: Executing SetScene Tasks
NOTE: Executing RunQueue Tasks
NOTE: Running task 569 of 610 (ID: 5, /OE/build/oe-core/meta-openembedded/meta-networking/recipes-connectivity/snort/snort_2.9.6.0.bb, do_configure)
NOTE: recipe snort-2.9.6.0-r0: task do_configure: Started
^C^C^C^C^C^C^C^C^C^C^C^C^C^CTraceback (most recent call last):
  File "/OE/build/oe-core/bitbake/bin/bitbake", line 382, in <module>
    ret = main()
  File "/OE/build/oe-core/bitbake/bin/bitbake", line 372, in main
    bb.event.ui_queue = []
KeyboardInterrupt
^CException KeyboardInterrupt in <module 'threading' from '/usr/lib64/python2.7/threading.pyc'> ignored

^CError in atexit._run_exitfuncs:
^CError in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 30, in _run_exitfuncs
    traceback.print_exc()
  File "/usr/lib64/python2.7/traceback.py", line 233, in print_exc
    print_exception(etype, value, tb, limit, file)
  File "/usr/lib64/python2.7/traceback.py", line 110, in print_exception
    def print_exception(etype, value, tb, limit=None, file=None):
KeyboardInterrupt

There was also a delay of about five minutes between the first two Ctrl+C
presses and the actual exit, but that could have been caused by the heavy
load from that faulty m4 call.

Regards,

Richard Purdie - July 22, 2014, 6:10 p.m.
On Tue, 2014-07-22 at 16:46 +0200, Martin Jansa wrote:
> On Mon, Jul 21, 2014 at 09:35:53AM +0100, Richard Purdie wrote:
> > If you hit Ctrl+C at the right point, the system processes the request
> > but merrily continues building. It turns out finish_runqueue() is called
> > but this doesn't stop the later generation and execution of the
> > runqueue.
> > 
> > This patch adjusts some of the conditionals to ensure the build really
> > does stop.
> 
> Great, I've included this change in my world builds to see whether it fixes
> bitbake still running after a Jenkins job is aborted.

I've seen that too, and unfortunately I don't think this fix will address
it. It's on my list of things to look into.

> I don't think it's caused by this change, and I don't know how much we
> can do about it, but today I was testing the snort build (which eats all
> memory in an m4 call until the OOM killer kills it)

I ended up excluding snort from my builds for that reason. Builds go a
lot faster when it's not destroying the machine!

> and when I wanted to interrupt
> the build, it failed with two tracebacks:
> 
> NOTE: Preparing runqueue
> NOTE: Executing SetScene Tasks
> NOTE: Executing RunQueue Tasks
> NOTE: Running task 569 of 610 (ID: 5, /OE/build/oe-core/meta-openembedded/meta-networking/recipes-connectivity/snort/snort_2.9.6.0.bb, do_configure)
> NOTE: recipe snort-2.9.6.0-r0: task do_configure: Started
> ^C^C^C^C^C^C^C^C^C^C^C^C^C^CTraceback (most recent call last):
>   File "/OE/build/oe-core/bitbake/bin/bitbake", line 382, in <module>
>     ret = main()
>   File "/OE/build/oe-core/bitbake/bin/bitbake", line 372, in main
>     bb.event.ui_queue = []
> KeyboardInterrupt
> ^CException KeyboardInterrupt in <module 'threading' from '/usr/lib64/python2.7/threading.pyc'> ignored
> 
> ^CError in atexit._run_exitfuncs:
> ^CError in sys.exitfunc:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/atexit.py", line 30, in _run_exitfuncs
>     traceback.print_exc()
>   File "/usr/lib64/python2.7/traceback.py", line 233, in print_exc
>     print_exception(etype, value, tb, limit, file)
>   File "/usr/lib64/python2.7/traceback.py", line 110, in print_exception
>     def print_exception(etype, value, tb, limit=None, file=None):
> KeyboardInterrupt
> 
> There was also a delay of about five minutes between the first two Ctrl+C
> presses and the actual exit, but that could have been caused by the heavy
> load from that faulty m4 call.

Thanks, I'll have a look at those and see whether they're significant and
whether we can do anything about them. It may be that they are one-offs,
unlikely to reproduce, but we'll see.

Cheers,

Richard
Martin Jansa - July 22, 2014, 6:51 p.m.
On Tue, Jul 22, 2014 at 07:10:31PM +0100, Richard Purdie wrote:
> On Tue, 2014-07-22 at 16:46 +0200, Martin Jansa wrote:
> > On Mon, Jul 21, 2014 at 09:35:53AM +0100, Richard Purdie wrote:
> > > If you hit Ctrl+C at the right point, the system processes the request
> > > but merrily continues building. It turns out finish_runqueue() is called
> > > but this doesn't stop the later generation and execution of the
> > > runqueue.
> > > 
> > > This patch adjusts some of the conditionals to ensure the build really
> > > does stop.
> > 
> > Great, I've included this change in my world builds to see whether it fixes
> > bitbake still running after a Jenkins job is aborted.
> 
> I've seen that too, and unfortunately I don't think this fix will address
> it. It's on my list of things to look into.
> 
> > I don't think it's caused by this change, and I don't know how much we
> > can do about it, but today I was testing the snort build (which eats all
> > memory in an m4 call until the OOM killer kills it)
> 
> I ended up excluding snort from my builds for that reason. Builds go a
> lot faster when it's not destroying the machine!

Adding 'inherit pkgconfig' to the recipe fixed that, but it is indeed a
strange side effect.

It seems to trigger this memory consumption only in combination with the
more restrictive m4 dependencies or the foreign flag (though I still got an
OOM even after patching snort's configure.in to pass foreign).


Patch

diff --git a/bitbake/lib/bb/command.py b/bitbake/lib/bb/command.py
index 84fcdf9..d797fcf 100644
--- a/bitbake/lib/bb/command.py
+++ b/bitbake/lib/bb/command.py
@@ -86,7 +86,7 @@  class Command:
 
     def runAsyncCommand(self):
         try:
-            if self.cooker.state == bb.cooker.state.error:
+            if self.cooker.state in (bb.cooker.state.error, bb.cooker.state.shutdown, bb.cooker.state.forceshutdown):
                 return False
             if self.currentAsyncCommand is not None:
                 (command, options) = self.currentAsyncCommand
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 4ea4970..f68a11d 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1065,7 +1065,7 @@  class RunQueue:
         if self.state is runQueueCleanUp:
            self.rqexe.finish()
 
-        if self.state is runQueueComplete or self.state is runQueueFailed:
+        if (self.state is runQueueComplete or self.state is runQueueFailed) and self.rqexe:
             self.teardown_workers()
             if self.rqexe.stats.failed:
                 logger.info("Tasks Summary: Attempted %d tasks of which %d didn't need to be rerun and %d failed.", self.rqexe.stats.completed + self.rqexe.stats.failed, self.rqexe.stats.skipped, self.rqexe.stats.failed)
@@ -1106,6 +1106,7 @@  class RunQueue:
 
     def finish_runqueue(self, now = False):
         if not self.rqexe:
+            self.state = runQueueComplete
             return
 
         if now:
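The command.py hunk can be modelled the same way. This is another hypothetical sketch: CookerState is a stand-in enum (the real states live in bb.cooker.state and only their names are taken from the diff), CommandModel mimics only the state guard at the top of runAsyncCommand(), and the dispatch body and the "hypotheticalBuild" command name are invented for illustration. Before the patch only the error state stopped a pending async command; the patch also stops it on shutdown and forceshutdown.

```python
import enum


class CookerState(enum.Enum):
    # Hypothetical stand-in for bb.cooker.state; only the names matter here.
    initial = enum.auto()
    running = enum.auto()
    shutdown = enum.auto()
    forceshutdown = enum.auto()
    error = enum.auto()


class CommandModel:
    """Mimics only the state guard at the top of runAsyncCommand()."""

    def __init__(self, cooker_state, pending=None, patched=True):
        self.cooker_state = cooker_state
        self.currentAsyncCommand = pending   # (command, options) or None
        self.patched = patched
        self.dispatched = []

    def stop_states(self):
        if self.patched:
            return (CookerState.error, CookerState.shutdown, CookerState.forceshutdown)
        return (CookerState.error,)          # the pre-patch check

    def runAsyncCommand(self):
        # In this model, returning False means "nothing further to run".
        if self.cooker_state in self.stop_states():
            return False
        if self.currentAsyncCommand is not None:
            (command, options) = self.currentAsyncCommand
            self.dispatched.append(command)  # the real code would execute it here
            return True
        return False


pending = ("hypotheticalBuild", None)
old = CommandModel(CookerState.shutdown, pending, patched=False)
new = CommandModel(CookerState.shutdown, pending, patched=True)
print(old.runAsyncCommand(), old.dispatched)  # True ['hypotheticalBuild']
print(new.runAsyncCommand(), new.dispatched)  # False []
```

A tuple membership test, as in the patch, keeps every "stop" state in a single condition, so a shutdown request is treated like an error as far as starting further async work is concerned.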