Patchwork [bitbake-devel] bitbake: server/xmlrpc/prserv: Increase timeout to default xmlrpc server

Submitter Peter Bigot
Date Aug. 28, 2013, 1:04 p.m.
Message ID <1377695062-16111-1-git-send-email-pab@pabigot.com>
Permalink /patch/56833/
State New

Comments

Peter Bigot - Aug. 28, 2013, 1:04 p.m.
On a heavily-loaded host with local PR server the default 5 second timeout
produces too-frequent errors:

  ERROR: Can NOT get PRAUTO, exception timed out
  ERROR: Function failed: package_get_auto_pr

Since this error aborts the build a generous timeout seems appropriate.

Signed-off-by: Peter A. Bigot <pab@pabigot.com>
---
 lib/bb/server/xmlrpc.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Jason Wessel - Aug. 28, 2013, 1:39 p.m.
On 08/28/2013 08:04 AM, Peter A. Bigot wrote:
> On a heavily-loaded host with local PR server the default 5 second timeout
> produces too-frequent errors:
>
>   ERROR: Can NOT get PRAUTO, exception timed out
>   ERROR: Function failed: package_get_auto_pr
>
> Since this error aborts the build a generous timeout seems appropriate.
>
> Signed-off-by: Peter A. Bigot <pab@pabigot.com>
> ---
>  lib/bb/server/xmlrpc.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/bb/server/xmlrpc.py b/lib/bb/server/xmlrpc.py
> index 4dee5d9..bb87fd7 100644
> --- a/lib/bb/server/xmlrpc.py
> +++ b/lib/bb/server/xmlrpc.py
> @@ -78,7 +78,7 @@ class BBTransport(xmlrpclib.Transport):
>              h.putheader("Bitbake-token", self.connection_token)
>          xmlrpclib.Transport.send_content(self, h, body)
>  
> -def _create_server(host, port, timeout = 5):
> +def _create_server(host, port, timeout = 20):
>      t = BBTransport(timeout)
>      s = xmlrpclib.Server("http://%s:%d/" % (host, port), transport=t, allow_none=True)
>      return s, t


I would go so far as to make this 60 seconds and/or make it a configurable parameter.

Previously the timeout was infinite. I have observed process creation lagging by 30-45 seconds on a server with a load average of 300+. The new bitbake Python code with the reduced timeout is not yet running in our edge-case testing environment, but I do expect to hit the same issue.

Jason.
Peter Bigot - Aug. 28, 2013, 1:59 p.m.
On 08/28/2013 08:39 AM, Jason Wessel wrote:
> I would go so far as to make this 60 seconds and/or make it a configurable parameter.
>
> Previously the timeout was infinite. I have observed process creation lagging by 30-45 seconds on a server with a load average of 300+. The new bitbake Python code with the reduced timeout is not yet running in our edge-case testing environment, but I do expect to hit the same issue.

Not sure when the timeout was added, but I believe it was before the 
modifications in the last few days that moved it to this function; I've 
been having this problem since switching to poky master.

60 seconds would be fine with me; I could update the patch for that.  A 
configurable parameter would be better but it wasn't obvious how to do 
it, so if people prefer that approach I'd rather a bitbake maintainer 
take over from here.

Peter
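
[Editor's sketch] One way the configurable timeout discussed above could look, offered as a minimal sketch and not as anything bitbake actually does: read an override from the environment and fall back to a default. The names resolve_timeout, DEFAULT_TIMEOUT and EXAMPLE_XMLRPC_TIMEOUT are hypothetical, chosen only for illustration; a real implementation might prefer a datastore variable.

```python
import os

DEFAULT_TIMEOUT = 60  # seconds; the value suggested in this thread


def resolve_timeout(environ=None, default=DEFAULT_TIMEOUT):
    """Return the XML-RPC timeout in seconds.

    EXAMPLE_XMLRPC_TIMEOUT is a hypothetical variable name used only for
    illustration; it is not a real bitbake setting.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("EXAMPLE_XMLRPC_TIMEOUT")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        # Unparseable override: keep the default instead of failing the build.
        return default
    # Reject zero and negative values rather than silently disabling the timeout.
    return value if value > 0 else default
```

The result could then be passed as the timeout argument of a function like _create_server, so the default stays in one place and a loaded host can raise it without a code change.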
Richard Purdie - Aug. 29, 2013, 1:38 p.m.
On Wed, 2013-08-28 at 08:59 -0500, Peter A. Bigot wrote:
> 
> Not sure when the timeout was added, but I believe it was before the 
> modifications in the last few days that moved it to this function; I've 
> been having this problem since switching to poky master.
> 
> 60 seconds would be fine with me; I could update the patch for that.  A 
> configurable parameter would be better but it wasn't obvious how to do 
> it, so if people prefer that approach I'd rather a bitbake maintainer 
> take over from here.

The downside is that if something goes wrong this ends up leaving
bitbake hanging for 60 seconds at exit whilst it tries to connect to a
server which is never going to exist. I'm rather frustrated that the PR
service is so slow since this will block the packaging process for that
length of time.

With that in mind I've radically improved the performance of the server
with threading. Can people retest with master and see how things behave
now?

Cheers,

Richard
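
[Editor's sketch] For readers unfamiliar with the threading approach mentioned above, the general technique can be sketched with the Python 3 standard library: mixing socketserver.ThreadingMixIn into SimpleXMLRPCServer makes each request run in its own thread. This is an illustration of the technique only, not the PR server's actual code; ThreadedXMLRPCServer, start_example_server and the get_value method are hypothetical names.

```python
import socketserver
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class ThreadedXMLRPCServer(socketserver.ThreadingMixIn, SimpleXMLRPCServer):
    """Handle each request in its own thread so a slow call does not block others."""

    # Worker threads must not keep the process alive at exit.
    daemon_threads = True


def start_example_server():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = ThreadedXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
    # get_value is a hypothetical method standing in for a real lookup.
    server.register_function(lambda key: len(key), "get_value")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread
```

A client would connect with xmlrpc.client.ServerProxy("http://%s:%d/" % server.server_address) and call get_value on it. Threading helps when several clients queue behind one another; it does not help when the host itself is too loaded to schedule the server promptly.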
Peter Bigot - Aug. 29, 2013, 5:06 p.m.
On 08/29/2013 08:38 AM, Richard Purdie wrote:
> The downside is that if something goes wrong this ends up leaving
> bitbake hanging for 60 seconds at exit whilst it tries to connect to a
> server which is never going to exist. I'm rather frustrated that the PR
> service is so slow since this will block the packaging process for that
> length of time.
>
> With that in mind I've radically improved the performance of the server
> with threading. Can people retest with master and see how things behave
> now?
I rebased my poky to include current master which has your multithreaded 
PR server patches from 28 Aug, dropped my patch, and started a 
from-scratch build involving 5971 tasks.  It aborted twice with:

   ERROR: Can NOT get PRAUTO, exception timed out
   ERROR: Function failed: package_get_auto_pr

before it got through the first 1500 tasks.  I put my patch back and it 
ran through the remaining 4400+ tasks without error.

The threading makes sense for a shared PR server serving multiple remote 
clients, but it's not enough for a localhost server that's heavily 
loaded with other work.

Peter
Richard Purdie - Aug. 30, 2013, 4:43 p.m.
On Thu, 2013-08-29 at 14:38 +0100, Richard Purdie wrote:
> The downside is that if something goes wrong this ends up leaving
> bitbake hanging for 60 seconds at exit whilst it tries to connect to a
> server which is never going to exist. I'm rather frustrated that the PR
> service is so slow since this will block the packaging process for that
> length of time.
> 
> With that in mind I've radically improved the performance of the server
> with threading. Can people retest with master and see how things behave
> now?

Looking at the number of people hitting this, I've set it to 60s. 

Cheers,

Richard

Patch

diff --git a/lib/bb/server/xmlrpc.py b/lib/bb/server/xmlrpc.py
index 4dee5d9..bb87fd7 100644
--- a/lib/bb/server/xmlrpc.py
+++ b/lib/bb/server/xmlrpc.py
@@ -78,7 +78,7 @@  class BBTransport(xmlrpclib.Transport):
             h.putheader("Bitbake-token", self.connection_token)
         xmlrpclib.Transport.send_content(self, h, body)
 
-def _create_server(host, port, timeout = 5):
+def _create_server(host, port, timeout = 20):
     t = BBTransport(timeout)
     s = xmlrpclib.Server("http://%s:%d/" % (host, port), transport=t, allow_none=True)
     return s, t
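
[Editor's sketch] To show how a timeout argument like the one in this patch reaches the socket, here is a self-contained Python 3 analogue. It is an assumption-laden stand-in, not bitbake's BBTransport: it uses xmlrpc.client (the Python 3 name for xmlrpclib), omits the Bitbake-token header, and the names TimeoutTransport and create_server are hypothetical. The default of 60 reflects the value finally chosen in the thread.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """Transport that applies a socket timeout to every connection."""

    def __init__(self, timeout):
        super().__init__()
        self.timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        # http.client.HTTPConnection reads this attribute when it opens the socket.
        conn.timeout = self.timeout
        return conn


def create_server(host, port, timeout=60):
    # Mirrors the shape of _create_server in the patch: return proxy and transport.
    t = TimeoutTransport(timeout)
    s = xmlrpc.client.ServerProxy("http://%s:%d/" % (host, port), transport=t, allow_none=True)
    return s, t
```

With this shape, a call on the returned proxy raises a timeout error (an OSError subclass in Python 3) if the server does not answer within the given number of seconds, which is the failure reported as "exception timed out" in the thread.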