[bitbake-devel,1/1] bitbake: cooker: Kill alive process before join it

Submitted by Robert Yang on Aug. 22, 2019, 7:35 a.m. | Patch ID: 164199

Details

Message ID 3562c905fd50bb986aba8a40778245a026339c93.1566459309.git.liezhi.yang@windriver.com
State New

Commit Message

Robert Yang Aug. 22, 2019, 7:35 a.m.
Fixed:
$ echo helloworld >> meta/recipes-extended/bash/bash_4.4.18.bb
$ while true; do kill-bb; rm -fr bitbake-cookerdaemon.log tmp/cache/default-glibc/qemux86-64/x86_64/bb_cache.dat* ; bitbake -p; done

It may hang within 10 minutes. There are two problems:
* process.join() may deadlock if the queue is not empty, so we need to clean
  up the queue before calling join(), but:
* self.result_queue.get(timeout=0.25) may itself hang if queue._wlock is held
  by SomeOtherProcess. When it hangs, the queue shows:
  '_wlock': <Lock(owner=SomeOtherProcess)>, so we may be unable to clean up
  the queue.

Killing any process that is still alive before joining it fixes both problems.

Signed-off-by: Robert Yang <liezhi.yang@windriver.com>
---
 bitbake/lib/bb/cooker.py | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

Patch

diff --git a/bitbake/lib/bb/cooker.py b/bitbake/lib/bb/cooker.py
index 0607fcc..e42bcba 100644
--- a/bitbake/lib/bb/cooker.py
+++ b/bitbake/lib/bb/cooker.py
@@ -2082,19 +2082,15 @@  class CookerParser(object):
             for process in self.processes:
                 self.parser_quit.put(None)
 
-        # Cleanup the queue before call process.join(), otherwise there might be
-        # deadlocks.
-        while True:
-            try:
-               self.result_queue.get(timeout=0.25)
-            except queue.Empty:
-                break
-
         for process in self.processes:
             if force:
                 process.join(.1)
                 process.terminate()
             else:
+                # Kill the alive process firstly before join() it to avoid
+                # deadlocks
+                if process.is_alive():
+                    os.kill(process.pid, 9)
                 process.join()
 
         sync = threading.Thread(target=self.bb_cache.sync)
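
The hang described in the commit message can be reproduced outside BitBake.
The following is a minimal sketch, not BitBake code: it assumes a POSIX system
and the "fork" start method, and the names worker and shutdown_demo are made up
for illustration. A child that has put more data on a multiprocessing.Queue
than the underlying pipe can buffer does not exit until something reads the
queue, so an unbounded join() in the parent would wait forever. Sending SIGKILL
first lets join() return.

```python
import multiprocessing
import os
import signal


def worker(result_queue):
    # Put far more data than a pipe buffer holds. The queue's feeder thread
    # blocks writing to the pipe, and the child waits for that thread on exit.
    result_queue.put(b"x" * (1 << 20))


def shutdown_demo(timeout=0.5):
    # Assumption: POSIX with the "fork" start method available.
    ctx = multiprocessing.get_context("fork")
    result_queue = ctx.Queue()
    process = ctx.Process(target=worker, args=(result_queue,))
    process.start()

    # A plain process.join() here could block forever, because nobody drains
    # result_queue. Use a timeout only to observe that the child is stuck.
    process.join(timeout)
    was_stuck = process.is_alive()

    # Same idea as the patch: kill the child if it is still alive, then join.
    if process.is_alive():
        os.kill(process.pid, signal.SIGKILL)
    process.join()
    return was_stuck, process.exitcode


if __name__ == "__main__":
    print(shutdown_demo())
```

On Linux the first element of the returned tuple should be True (the child was
still alive after the timed join), and the exit code should be the negative
SIGKILL number.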

Comments

Christopher Larson Aug. 22, 2019, 2:14 p.m.
Looks like you’re partially duplicating what the ‘force’ logic does already, but for the non-force case as well, yes? process.terminate() sounds a lot like sending a SIGTERM, though admittedly I haven’t read the code to Process recently. Perhaps if this is truly necessary, the logic can be simplified.
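
For reference, on POSIX systems multiprocessing.Process.terminate() sends
SIGTERM, while Process.kill() (Python 3.7 and later) sends SIGKILL, which is
the signal that os.kill(process.pid, 9) sends in the patch. A small sketch that
shows the difference through the child's exit code, assuming POSIX and the
"fork" start method. The names sleeper and stop_with are illustrative, not
BitBake functions.

```python
import multiprocessing
import signal
import time


def sleeper():
    # Stand-in for a parser process that is still busy.
    time.sleep(60)


def stop_with(method_name):
    """Start a child, stop it with Process.terminate() or Process.kill(),
    and return its exit code (negative signal number on POSIX)."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
    process = ctx.Process(target=sleeper)
    process.start()
    getattr(process, method_name)()
    process.join()
    return process.exitcode


if __name__ == "__main__":
    print(stop_with("terminate") == -signal.SIGTERM)
    print(stop_with("kill") == -signal.SIGKILL)
```

A child can install a handler for SIGTERM or ignore it, but it cannot catch
SIGKILL, so the two calls are not interchangeable in general.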
Robert Yang Aug. 23, 2019, 3:52 a.m.
Hi Larson,

On 8/22/19 10:14 PM, Christopher Larson wrote:
> Looks like you’re partially duplicating what the ‘force’ logic does already, but 
> for the non-force case as well, yes? process.terminate() sounds a lot like 
> sending a SIGTERM, though admittedly I haven’t read the code to Process 
> recently. Perhaps if this is truly necessary, the logic can be simplified.


process.join(.1) may also hang if result_queue is not empty, but AFAIK this
only happens on a KeyboardInterrupt or when a recipe fails to parse, so I think
the process.join(.1) in the 'force' shutdown path won't hang, and I haven't
seen it hang in the "force" code. Another way to improve this would be to check
result_queue._wlock, but _wlock is an internal variable, so I gave that up and
just used os.kill() to avoid the hang completely. Could there be side effects
that I don't know about?

// Robert
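
One way the force and non-force paths discussed above could share logic is an
escalating shutdown: join with a timeout, then SIGTERM, then SIGKILL as a last
resort. This is only a sketch under assumptions (POSIX, "fork" start method,
Python 3.7+ for Process.kill()), not what the patch or BitBake does.
stop_process, sleeper and demo are hypothetical names. On the question of side
effects, the Python documentation warns that forcibly stopping a process that
is using a pipe or queue can leave that pipe or queue corrupted, and that a
process stopped this way does not run its exit handlers.

```python
import multiprocessing
import time


def sleeper():
    # Stand-in for a child that does not exit on its own.
    time.sleep(60)


def stop_process(process, grace=0.1):
    """Join with a timeout, then escalate: SIGTERM first, SIGKILL last."""
    process.join(grace)
    if process.is_alive():
        process.terminate()      # SIGTERM on POSIX
        process.join(grace)
    if process.is_alive():
        process.kill()           # SIGKILL on POSIX, cannot be caught
        process.join()
    return process.exitcode


def demo():
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
    process = ctx.Process(target=sleeper)
    process.start()
    exitcode = stop_process(process)
    return exitcode, process.is_alive()


if __name__ == "__main__":
    print(demo())
```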
