[bitbake-devel] bitbake: server/xmlrpc/prserv: Increase timeout to default xmlrpc server

Submitted by Peter Bigot on Aug. 28, 2013, 1:04 p.m.

Details

Message ID 1377695062-16111-1-git-send-email-pab@pabigot.com
State New

Commit Message

Peter Bigot Aug. 28, 2013, 1:04 p.m.
On a heavily-loaded host with local PR server the default 5 second timeout
produces too-frequent errors:

  ERROR: Can NOT get PRAUTO, exception timed out
  ERROR: Function failed: package_get_auto_pr

Since this error aborts the build, a generous timeout seems appropriate.

Signed-off-by: Peter A. Bigot <pab@pabigot.com>
---
 lib/bb/server/xmlrpc.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Patch

diff --git a/lib/bb/server/xmlrpc.py b/lib/bb/server/xmlrpc.py
index 4dee5d9..bb87fd7 100644
--- a/lib/bb/server/xmlrpc.py
+++ b/lib/bb/server/xmlrpc.py
@@ -78,7 +78,7 @@  class BBTransport(xmlrpclib.Transport):
             h.putheader("Bitbake-token", self.connection_token)
         xmlrpclib.Transport.send_content(self, h, body)
 
-def _create_server(host, port, timeout = 5):
+def _create_server(host, port, timeout = 20):
     t = BBTransport(timeout)
     s = xmlrpclib.Server("http://%s:%d/" % (host, port), transport=t, allow_none=True)
     return s, t
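
For readers following along on Python 3, where `xmlrpclib` became `xmlrpc.client`, the sketch below shows one way a timeout passed to a transport can reach the underlying HTTP connection. `TimeoutTransport` and `create_server` are hypothetical stand-ins written for this note; the body of `BBTransport` is not shown in the patch, so this should not be read as bitbake's implementation.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """Hypothetical stand-in for BBTransport: carries a socket timeout."""

    def __init__(self, timeout):
        super().__init__()
        self.timeout = timeout

    def make_connection(self, host):
        # The base class builds (and caches) an http.client.HTTPConnection.
        # Setting its timeout before the first request applies it to the
        # socket when the connection is opened.
        conn = super().make_connection(host)
        conn.timeout = self.timeout
        return conn


def create_server(host, port, timeout=60):
    # Mirrors the shape of _create_server in the patch; the 60 second
    # default is the value suggested later in the thread.
    t = TimeoutTransport(timeout)
    s = xmlrpc.client.ServerProxy("http://%s:%d/" % (host, port),
                                  transport=t, allow_none=True)
    return s, t
```

Constructing the proxy does not open a connection, so the timeout only takes effect on the first remote call.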

Comments

Jason Wessel Aug. 28, 2013, 1:39 p.m.
On 08/28/2013 08:04 AM, Peter A. Bigot wrote:
> -def _create_server(host, port, timeout = 5):
> +def _create_server(host, port, timeout = 20):
>      t = BBTransport(timeout)
>      s = xmlrpclib.Server("http://%s:%d/" % (host, port), transport=t, allow_none=True)
>      return s, t


I would go so far as to make this 60 seconds and/or make it a configurable parameter.

Previously the timeout was infinite. I have observed process creation lagging by 30-45 seconds on a server with a load average of 300+. The new bitbake Python code with the reduced timeout is not yet running on our edge-case testing environment, but I do expect to hit the same issue.

Jason.

Peter Bigot Aug. 28, 2013, 1:59 p.m.
On 08/28/2013 08:39 AM, Jason Wessel wrote:
> I would go so far as to make this 60 seconds and/or make it a configurable parameter.
>
> Previously the timeout was infinite. I have observed process creation lagging by 30-45 seconds on a server with a load average of 300+. The new bitbake Python code with the reduced timeout is not yet running on our edge-case testing environment, but I do expect to hit the same issue.

Not sure when the timeout was added, but I believe it was before the 
modifications in the last few days that moved it to this function; I've 
been having this problem since switching to poky master.

60 seconds would be fine with me; I could update the patch for that.  A 
configurable parameter would be better but it wasn't obvious how to do 
it, so if people prefer that approach I'd rather a bitbake maintainer 
take over from here.

Peter
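
As a rough illustration of the "configurable parameter" idea discussed above, the timeout could be read from the environment with a safe fallback. This is only a sketch under stated assumptions: the variable name `EXAMPLE_XMLRPC_TIMEOUT` and the helper `get_timeout` are hypothetical and are not part of bitbake.

```python
import os

DEFAULT_TIMEOUT = 60.0  # seconds; the value suggested in this thread


def get_timeout(environ=None, name="EXAMPLE_XMLRPC_TIMEOUT"):
    """Return a timeout in seconds, read from a hypothetical variable.

    Falls back to DEFAULT_TIMEOUT when the variable is unset, is not a
    number, or is not positive.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return DEFAULT_TIMEOUT
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_TIMEOUT
    return value if value > 0 else DEFAULT_TIMEOUT
```

A caller could then pass `get_timeout()` wherever the hard-coded default is used today, which keeps the default behaviour unchanged for anyone who does not set the variable.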

Richard Purdie Aug. 29, 2013, 1:38 p.m.
On Wed, 2013-08-28 at 08:59 -0500, Peter A. Bigot wrote:
> Not sure when the timeout was added, but I believe it was before the 
> modifications in the last few days that moved it to this function; I've 
> been having this problem since switching to poky master.
> 
> 60 seconds would be fine with me; I could update the patch for that.  A 
> configurable parameter would be better but it wasn't obvious how to do 
> it, so if people prefer that approach I'd rather a bitbake maintainer 
> take over from here.

The downside is that if something goes wrong this ends up leaving
bitbake hanging for 60 seconds at exit whilst it tries to connect to a
server which is never going to exist. I'm rather frustrated that the PR
service is so slow since this will block the packaging process for that
length of time.

With that in mind I've radically improved the performance of the server
with threading. Can people retest with master and see how things behave
now?

Cheers,

Richard
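
For context on what "threading" can mean for an XML-RPC server, the standard-library pattern is to mix `socketserver.ThreadingMixIn` into the server class so that each request is handled in its own thread. The sketch below uses Python 3 module names and a placeholder handler; `ThreadedXMLRPCServer`, `get_value`, and `start_server` are hypothetical names, and this is not the PR server change referred to above.

```python
import socketserver
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class ThreadedXMLRPCServer(socketserver.ThreadingMixIn, SimpleXMLRPCServer):
    # One thread per request; daemon threads do not block interpreter exit.
    daemon_threads = True


def get_value(key):
    # Placeholder handler standing in for a real lookup.
    return 0


def start_server(host="127.0.0.1", port=0):
    # Port 0 asks the OS for a free port; see server.server_address.
    server = ThreadedXMLRPCServer((host, port), logRequests=False,
                                  allow_none=True)
    server.register_function(get_value, "get_value")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread
```

With this arrangement a slow request no longer delays the ones queued behind it, although it does not help when the whole host is starved of CPU.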

Peter Bigot Aug. 29, 2013, 5:06 p.m.
On 08/29/2013 08:38 AM, Richard Purdie wrote:
> The downside is that if something goes wrong this ends up leaving
> bitbake hanging for 60 seconds at exit whilst it tries to connect to a
> server which is never going to exist. I'm rather frustrated that the PR
> service is so slow since this will block the packaging process for that
> length of time.
>
> With that in mind I've radically improved the performance of the server
> with threading. Can people retest with master and see how things behave
> now?
I rebased my poky to include current master which has your multithreaded 
PR server patches from 28 Aug, dropped my patch, and started a 
from-scratch build involving 5971 tasks.  It aborted twice with:

   ERROR: Can NOT get PRAUTO, exception timed out
   ERROR: Function failed: package_get_auto_pr

before it got through the first 1500 tasks.  I put my patch back and it 
ran through the remaining 4400+ tasks without error.

The threading makes sense for a shared PR server serving multiple remote 
clients, but it's not enough for a localhost server that's heavily 
loaded with other work.

Peter
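
The failure mode described here can be reproduced in miniature: when the reply takes longer than the client's timeout, the call fails even though the server is otherwise healthy. The sketch below uses Python 3 module names, an artificial 0.5 second delay to stand in for a loaded host, and hypothetical helper names throughout; it is not bitbake or PR server code.

```python
import socket
import threading
import time
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class TimeoutTransport(xmlrpc.client.Transport):
    """Hypothetical transport that applies a socket timeout."""

    def __init__(self, timeout):
        super().__init__()
        self.timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        conn.timeout = self.timeout
        return conn


def slow_reply(delay=0.5):
    # Simulates a server that is slow to answer because the host is busy.
    time.sleep(delay)
    return "ok"


def start_slow_server():
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(slow_reply, "slow_reply")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_with_timeout(port, timeout):
    proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d/" % port,
                                      transport=TimeoutTransport(timeout))
    try:
        return proxy.slow_reply()
    except socket.timeout:
        return "timed out"
```

Calling `call_with_timeout(port, 0.1)` against this server gives up before the reply arrives, while a timeout comfortably above the delay lets the same call succeed.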

Richard Purdie Aug. 30, 2013, 4:43 p.m.
On Thu, 2013-08-29 at 14:38 +0100, Richard Purdie wrote:
> The downside is that if something goes wrong this ends up leaving
> bitbake hanging for 60 seconds at exit whilst it tries to connect to a
> server which is never going to exist. I'm rather frustrated that the PR
> service is so slow since this will block the packaging process for that
> length of time.
> 
> With that in mind I've radically improved the performance of the server
> with threading. Can people retest with master and see how things behave
> now?

Looking at the number of people hitting this, I've set it to 60s. 

Cheers,

Richard