[bitbake-devel,5/8] runqueue.py: check results[0] in keys of build_pids before being used to avoid exceptions

Submitted by Shane Wang on March 29, 2012, 12:54 p.m.

Details

Message ID b0f25a4ae7b257c5e0631a3e5c1f90facf25aca6.1333025491.git.shane.wang@intel.com
State New

Commit Message

Shane Wang March 29, 2012, 12:54 p.m.
[Yocto #2186]

Signed-off-by: Shane Wang <shane.wang@intel.com>
---
 bitbake/lib/bb/runqueue.py |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)

Patch

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 6970548..67ad14b 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1049,17 +1049,21 @@  class RunQueueExecute:
         result = os.waitpid(-1, os.WNOHANG)
         if result[0] == 0 and result[1] == 0:
             return None
-        task = self.build_pids[result[0]]
-        del self.build_pids[result[0]]
-        self.build_pipes[result[0]].close()
-        del self.build_pipes[result[0]]
+        task = None
+        if result[0] in self.build_pids.keys():
+            task = self.build_pids[result[0]]
+            del self.build_pids[result[0]]
+        if result[0] in self.build_pipes.keys():
+            self.build_pipes[result[0]].close()
+            del self.build_pipes[result[0]]
         # self.build_stamps[result[0]] may not exist when use shared work directory.
         if result[0] in self.build_stamps.keys():
             del self.build_stamps[result[0]]
-        if result[1] != 0:
-            self.task_fail(task, result[1]>>8)
-        else:
-            self.task_complete(task)
+        if task:
+            if result[1] != 0:
+                self.task_fail(task, result[1]>>8)
+            else:
+                self.task_complete(task)
         return True
 
     def finish_now(self):

Comments

Richard Purdie March 29, 2012, 8:04 p.m.
On Thu, 2012-03-29 at 20:54 +0800, Shane Wang wrote:
> [Yocto #2186]
> 
> Signed-off-by: Shane Wang <shane.wang@intel.com>
> ---
>  bitbake/lib/bb/runqueue.py |   20 ++++++++++++--------
>  1 files changed, 12 insertions(+), 8 deletions(-)

This kind of change sets off alarm bells. The big question is why are
you seeing exceptions? I suspect you're forking off processes within hob
which are then confusing the waitpid code. I'd have to ask why the UI is
forking processes when a build is running and why we've suddenly started
seeing this...

So can you please explain the problem further so we can fix the real
problem? I did look at #2186 but that doesn't help me either :(

Cheers,

Richard
Shane Wang March 30, 2012, 6:10 a.m.
Richard Purdie wrote on 2012-03-30:

> On Thu, 2012-03-29 at 20:54 +0800, Shane Wang wrote:
>> [Yocto #2186]
>>
>> Signed-off-by: Shane Wang <shane.wang@intel.com>
>> ---
>>  bitbake/lib/bb/runqueue.py |   20 ++++++++++++--------
>>  1 files changed, 12 insertions(+), 8 deletions(-)
>
> This kind of change sets off alarm bells. The big question is why are
> you seeing exceptions? I suspect you're forking off processes within hob
> which are then confusing the waitpid code. I'd have to ask why the UI is
> forking processes when a build is running and why we've suddenly started
> seeing this...

To reproduce, I "Force stop" a build and then click "build packages" to rebuild. That is when I see the exceptions.
In command-line mode there is no issue, because the process exits.

In finish_now() in runqueue.py, os.kill() is called on all the sub-processes, but they don't exit.
When I start a new build, self.build_pids is empty, but for the reason above os.waitpid() can still return the pids of those old sub-processes.
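The failure described above can be reproduced outside bitbake. The following is a minimal sketch, not bitbake code: spawn_sleeper() and the module-level build_pids dict are hypothetical stand-ins for the runqueue's bookkeeping, and it assumes a POSIX system where os.fork() is available. It signals a child without waiting on it, clears the tracking dict, and then shows that a later os.waitpid(-1, os.WNOHANG) reports a pid that is no longer tracked.

```python
import os
import signal
import time


def spawn_sleeper():
    """Fork a child that only sleeps; return its pid (hypothetical helper)."""
    pid = os.fork()
    if pid == 0:
        # Child: sleep until signalled and never return into the caller.
        try:
            time.sleep(60)
        finally:
            os._exit(0)
    return pid


build_pids = {}                      # stand-in for self.build_pids
pid = spawn_sleeper()
build_pids[pid] = "some_task"

# "Force stop": signal the child and forget about it without calling waitpid().
os.kill(pid, signal.SIGTERM)
build_pids.clear()

# "New build": poll as the runqueue does; the old child is still waiting to be reaped.
deadline = time.time() + 5
result = (0, 0)
while result == (0, 0) and time.time() < deadline:
    result = os.waitpid(-1, os.WNOHANG)
    time.sleep(0.01)

stale_pid = result[0]
# An unguarded build_pids[stale_pid] would raise KeyError at this point.
stale_is_untracked = stale_pid not in build_pids
```

The polling loop is only there because the child may take a moment to die after the signal; it is not meant to mirror how the runqueue schedules its waitpid calls.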


> So can you please explain the problem further so we can fix the real
> problem? I did look at #2186 but that doesn't help me either :(

OK, I am going to submit another patch, but I think the condition check is also needed.
Otherwise, in the current code of runqueue_process_waitpid(), why do we have:
        if result[0] in self.build_stamps.keys():
            del self.build_stamps[result[0]]


> Cheers,
>
> Richard


--
Shane
Shane Wang March 30, 2012, 6:11 a.m.
By the way, I have never hit the exception when I stop bitbake normally.

--
Shane

Chris Larson March 30, 2012, 6:15 a.m.
On Thu, Mar 29, 2012 at 11:10 PM, Wang, Shane <shane.wang@intel.com> wrote:
>> So can you please explain the problem further so we can fix the real
>> problem? I did look at #2186 but that doesn't help me either :(
> OK, I am going to submit another patch, but I think the condition check is also needed.
> Otherwise, in the current code of runqueue_process_waitpid(), why do we have:
>        if result[0] in self.build_stamps.keys():
>            del self.build_stamps[result[0]]

This is also off, from a code standpoint, even assuming it's needed.
There's no need to use the keys method at all for a map. 'in' against
a map automatically checks by key, so this can simply be written as:
result[0] in self.build_stamps
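As a small illustration of the point about 'in', here is a sketch using a plain dict with made-up pids and stamp names (none of these values come from bitbake). It also shows dict.pop() with a default, which is one way to fold the membership check and the delete into a single call; that is offered as an option, not as what the patch does.

```python
# Hypothetical data standing in for self.build_stamps.
build_stamps = {101: "stamp-a", 102: "stamp-b"}
pid = 101

# Membership on a dict tests its keys directly, so .keys() adds nothing.
same_answer = (pid in build_stamps) == (pid in build_stamps.keys())

# pop() with a default removes the entry if present and never raises KeyError.
removed = build_stamps.pop(pid, None)    # entry existed
missing = build_stamps.pop(999, None)    # entry did not exist
```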
Richard Purdie March 30, 2012, 9:27 a.m.
On Fri, 2012-03-30 at 06:10 +0000, Wang, Shane wrote:
> Richard Purdie wrote on 2012-03-30:
> 
> > On Thu, 2012-03-29 at 20:54 +0800, Shane Wang wrote:
> >> [Yocto #2186]
> >> 
> >> Signed-off-by: Shane Wang <shane.wang@intel.com>
> >> ---
> >>  bitbake/lib/bb/runqueue.py |   20 ++++++++++++--------
> >>  1 files changed, 12 insertions(+), 8 deletions(-)
> > 
> > This kind of change sets off alarm bells. The big question is why are
> > you seeing exceptions? I suspect you're forking off processes within hob
> > which are then confusing the waitpid code. I'd have to ask why the UI is
> > forking processes when a build is running and why we've suddenly started
> > seeing this...
> The steps I did is to "Force stop" a build and click "build packages" to rebuild. Then I saw the exceptions.
> In the command mode, there is no issue because the process exits.

Ok, so what it sounds like is that waitpid() is not being called in the
"force stop" mode to collect the exit values of the processes. We should
fix the code to collect the exit values even in force stop mode.

Cheers,

Richard (resisting the urge to talk about reaping and zombies)
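The fix suggested above, collecting exit values even on a forced stop, could look roughly like the following. This is a sketch under stated assumptions and not the change that went into bitbake: force_stop() and spawn_sleeper() are hypothetical names, the children are assumed to die on SIGTERM, and the blocking os.waitpid(pid, 0) would hang on a child that ignores the signal.

```python
import os
import signal
import time


def spawn_sleeper():
    """Fork a child that only sleeps; return its pid (hypothetical helper)."""
    pid = os.fork()
    if pid == 0:
        try:
            time.sleep(60)
        finally:
            os._exit(0)
    return pid


def force_stop(build_pids, sig=signal.SIGTERM):
    """Signal every tracked child, then wait on each so none is left unreaped."""
    statuses = {}
    for pid in list(build_pids):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass                      # child is already gone
    for pid in list(build_pids):
        try:
            _, status = os.waitpid(pid, 0)   # blocks until this child has exited
            statuses[pid] = status
        except ChildProcessError:
            pass                      # already reaped elsewhere
        del build_pids[pid]
    return statuses


build_pids = {spawn_sleeper(): "task_a", spawn_sleeper(): "task_b"}
tracked = list(build_pids)
statuses = force_stop(build_pids)

# After reaping, the old pids are no longer children of this process.
leftover = []
for pid in tracked:
    try:
        os.waitpid(pid, os.WNOHANG)
        leftover.append(pid)
    except ChildProcessError:
        pass
```

With the children reaped at stop time, a later waitpid(-1) in a new build has no stale pids to return, so the lookup in build_pids would not need to tolerate unknown keys for this scenario.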