Patchwork [bitbake-devel,5/8] runqueue.py: check results[0] in keys of build_pids before being used to avoid exceptions

Submitter Shane Wang
Date March 29, 2012, 12:54 p.m.
Message ID <b0f25a4ae7b257c5e0631a3e5c1f90facf25aca6.1333025491.git.shane.wang@intel.com>
Permalink /patch/24861/
State New

Comments

Shane Wang - March 29, 2012, 12:54 p.m.
[Yocto #2186]

Signed-off-by: Shane Wang <shane.wang@intel.com>
---
 bitbake/lib/bb/runqueue.py |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)
Richard Purdie - March 29, 2012, 8:04 p.m.
On Thu, 2012-03-29 at 20:54 +0800, Shane Wang wrote:
> To: bitbake-devel@lists.openembedded.org
> Subject: [bitbake-devel] [PATCH 5/8] runqueue.py: check results[0] in
> keys of build_pids before being used to avoid exceptions
> Date: Thu, 29 Mar 2012 20:54:54 +0800 (29/03/12 13:54:54)
>
> [Yocto #2186]
> 
> Signed-off-by: Shane Wang <shane.wang@intel.com>
> ---
>  bitbake/lib/bb/runqueue.py |   20 ++++++++++++--------
>  1 files changed, 12 insertions(+), 8 deletions(-)

This kind of change sets off alarm bells. The big question is why are
you seeing exceptions? I suspect you're forking off processes within hob
which are then confusing the waitpid code. I'd have to ask why the UI is
forking processes when a build is running and why we've suddenly started
seeing this...

So can you please explain the problem further so we can fix the real
problem? I did look at #2186 but that doesn't help me either :(

Cheers,

Richard
Shane Wang - March 30, 2012, 6:10 a.m.
Richard Purdie wrote on 2012-03-30:

> This kind of change sets off alarm bells. The big question is why are
> you seeing exceptions? I suspect you're forking off processes within hob
> which are then confusing the waitpid code. I'd have to ask why the UI is
> forking processes when a build is running and why we've suddenly started
> seeing this...
The steps I took were to "Force stop" a build and then click "build packages" to rebuild. That is when I saw the exceptions.
In command-line mode there is no issue, because the process exits.

In finish_now() in runqueue.py, os.kill() kills all the sub-processes, but they don't fully exit.
When I start a new build, self.build_pids is empty, but for the reason above os.waitpid() can still return their pids.

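The behaviour described above can be reproduced outside bitbake with a minimal, POSIX-only sketch. The function name demo_stale_pid and the "task-0" label are hypothetical, and the plain dict only stands in for self.build_pids. A child is killed but never reaped, the table is cleared as if a new build had started, and os.waitpid(-1, os.WNOHANG) still hands back the dead child's pid, so an unguarded build_pids[result[0]] lookup would raise KeyError.

```python
import os
import signal
import time


def demo_stale_pid():
    """Kill a child without reaping it, then show waitpid() still reports it."""
    build_pids = {}  # stand-in for RunQueueExecute.build_pids (assumption)

    pid = os.fork()
    if pid == 0:
        # Child: pretend to be a long-running task; always leave via _exit().
        try:
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            time.sleep(60)
        finally:
            os._exit(0)
    build_pids[pid] = "task-0"  # hypothetical task label

    os.kill(pid, signal.SIGTERM)  # "force stop": kill, but no waitpid()
    build_pids.clear()            # a new build starts with an empty table

    # Poll the way the runqueue does; the dead child is still waiting to be collected.
    while True:
        reaped_pid, _status = os.waitpid(-1, os.WNOHANG)
        if reaped_pid != 0:
            return reaped_pid, build_pids
        time.sleep(0.01)


reaped_pid, table = demo_stale_pid()
print(reaped_pid in table)  # False: table[reaped_pid] would raise KeyError
```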

> So can you please explain the problem further so we can fix the real
> problem? I did look at #2186 but that doesn't help me either :(
OK, I am going to submit another patch, but I think the condition check is also needed.
Otherwise, why does the current code of runqueue_process_waitpid() already have:
        if result[0] in self.build_stamps.keys():
            del self.build_stamps[result[0]]

> Cheers,
>
> Richard

--
Shane
Shane Wang - March 30, 2012, 6:11 a.m.
By the way, I have never hit the exception when I stop bitbake normally (as opposed to "Force stop").

--
Shane

Chris Larson - March 30, 2012, 6:15 a.m.
On Thu, Mar 29, 2012 at 11:10 PM, Wang, Shane <shane.wang@intel.com> wrote:
>> So can you please explain the problem further so we can fix the real
>> problem? I did look at #2186 but that doesn't help me either :(
> OK, I am going to submit another patch, but I think the condition check is also needed.
> Otherwise, in the current code of runqueue_process_waitpid(), why do we have:
>        if result[0] in self.build_stamps.keys():
>            del self.build_stamps[result[0]]

This is also off from a code standpoint, even assuming the check is needed.
There's no need to call the keys() method on a dict at all: 'in' against
a dict already checks its keys, so "result[0] in self.build_stamps" is enough.
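The point about dict membership, as a short sketch. drop_stamp is a hypothetical helper, not a bitbake function; dict.pop(key, default) is the standard Python method that folds the membership test and the delete into one step without raising KeyError for a missing key.

```python
def drop_stamp(build_stamps, pid):
    """Remove pid's stamp if present; report whether anything was removed."""
    # Membership on a dict already tests its keys; .keys() adds nothing.
    if pid in build_stamps:
        del build_stamps[pid]
        return True
    return False


stamps = {1234: "do_compile"}
print(1234 in stamps, 1234 in stamps.keys())                 # True True
print(drop_stamp(stamps, 1234), drop_stamp(stamps, 1234))    # True False

# One-step alternative: pop() with a default never raises for a missing key.
stamps = {1234: "do_compile"}
stamps.pop(5678, None)
```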
Richard Purdie - March 30, 2012, 9:27 a.m.
On Fri, 2012-03-30 at 06:10 +0000, Wang, Shane wrote:
> Richard Purdie wrote on 2012-03-30:
> 
> > This kind of change sets off alarm bells. The big question is why are
> > you seeing exceptions? I suspect you're forking off processes within hob
> > which are then confusing the waitpid code. I'd have to ask why the UI is
> > forking processes when a build is running and why we've suddenly started
> > seeing this...
> The steps I took were to "Force stop" a build and then click "build packages" to rebuild. That is when I saw the exceptions.
> In command-line mode there is no issue, because the process exits.

OK, so it sounds like waitpid() is not being called in "force stop" mode to
collect the exit values of the processes. We should fix the code to collect
the exit values even in force stop mode.

Cheers,

Richard (resisting the urge to talk about reaping and zombies)
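A minimal sketch of what collecting exit values in a force stop could look like, assuming each tracked child is sent SIGTERM and then waited for individually. spawn_task and force_stop are hypothetical names, and this is not bitbake's actual fix; it only shows the kill-then-reap pattern so no zombies are left for a later os.waitpid(-1, ...) to find.

```python
import os
import signal
import time


def spawn_task():
    """Fork a child that sleeps, standing in for a running build task."""
    pid = os.fork()
    if pid == 0:
        try:
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            time.sleep(60)
        finally:
            os._exit(0)
    return pid


def force_stop(build_pids):
    """Kill every tracked child, then collect each exit status with waitpid()."""
    for pid in build_pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # child already gone
    statuses = {}
    for pid in list(build_pids):
        try:
            _, status = os.waitpid(pid, 0)  # blocks until this child is reaped
        except ChildProcessError:
            status = None  # already reaped elsewhere
        statuses[build_pids.pop(pid)] = status
    return statuses


build_pids = {spawn_task(): "task-a", spawn_task(): "task-b"}
pids = list(build_pids)
statuses = force_stop(build_pids)
print(sorted(statuses))  # ['task-a', 'task-b']
```

Blocking on each pid rather than polling with WNOHANG guarantees the table and the kernel's list of children agree before the next build starts.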

Patch

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 6970548..67ad14b 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1049,17 +1049,21 @@  class RunQueueExecute:
         result = os.waitpid(-1, os.WNOHANG)
         if result[0] == 0 and result[1] == 0:
             return None
-        task = self.build_pids[result[0]]
-        del self.build_pids[result[0]]
-        self.build_pipes[result[0]].close()
-        del self.build_pipes[result[0]]
+        task = None
+        if result[0] in self.build_pids:
+            task = self.build_pids[result[0]]
+            del self.build_pids[result[0]]
+        if result[0] in self.build_pipes:
+            self.build_pipes[result[0]].close()
+            del self.build_pipes[result[0]]
         # self.build_stamps[result[0]] may not exist when use shared work directory.
         if result[0] in self.build_stamps.keys():
             del self.build_stamps[result[0]]
-        if result[1] != 0:
-            self.task_fail(task, result[1]>>8)
-        else:
-            self.task_complete(task)
+        if task is not None:
+            if result[1] != 0:
+                self.task_fail(task, result[1]>>8)
+            else:
+                self.task_complete(task)
         return True
 
     def finish_now(self):
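For comparison, the guarded logic in the patch can be written more compactly with dict.pop(key, None). handle_child_exit and its callback parameters are hypothetical stand-ins for RunQueueExecute's attributes and its task_fail()/task_complete() methods; this sketches the checks only, and Richard's reply points at reaping children in force-stop mode as the real fix. Note the explicit "is None" test: a truthiness test such as "if task:" would silently skip a task whose id is 0.

```python
def handle_child_exit(pid, status, build_pids, build_pipes, build_stamps,
                      task_fail, task_complete):
    """Hypothetical stand-alone version of the guarded waitpid handling.

    Returns True if pid belonged to a tracked task, False otherwise.
    """
    # pop(key, None) folds the membership test and the delete together.
    task = build_pids.pop(pid, None)
    pipe = build_pipes.pop(pid, None)
    if pipe is not None:
        pipe.close()
    build_stamps.pop(pid, None)  # may be absent with a shared work directory
    if task is None:  # not "if not task": task id 0 is valid
        return False
    if status != 0:
        task_fail(task, status >> 8)  # os.WEXITSTATUS(status) for normal exits
    else:
        task_complete(task)
    return True


completed, failed = [], []
print(handle_child_exit(101, 0, {101: 0}, {}, {},
                        lambda t, rc: failed.append((t, rc)),
                        completed.append))  # True
```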