Patchwork [bitbake-devel] server/process.py: Change timeout error handling

Submitter Richard Purdie
Date Nov. 21, 2012, 9:21 a.m.
Message ID <1353489690.10459.0.camel@ted>
Permalink /patch/39381/
State New

Comments

Richard Purdie - Nov. 21, 2012, 9:21 a.m.
In normal usage, we never hit the timeout issue. If we do, it becomes obvious
that the current error handling is not good enough. The request may have made it
to the server and the answer will get queued. This means the next command may get
the return value from the previous command with suitably puzzling results.
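The failure mode can be reproduced outside BitBake with a minimal sketch. It assumes the connection behaves like a `multiprocessing.Pipe` connection, which offers the same `poll()`/`send()`/`recv()` calls used in the patch below. The `slow_server` thread, the command strings and the delays are illustrative assumptions, not BitBake code. The first call gives up after 0.05 s, standing in for the old `.5` poll. The second call then receives the late answer to the first command.

```python
import threading
import time
from multiprocessing import Pipe


def slow_server(conn, delay):
    # Hypothetical server: answers every request, but only after `delay` seconds.
    while True:
        try:
            request = conn.recv()
        except EOFError:
            break
        time.sleep(delay)
        conn.send(("result of " + request, None))


def run_command(conn, command, timeout):
    # Mirrors the pre-patch behaviour: send, poll briefly, report a timeout tuple.
    conn.send(command)
    if conn.poll(timeout):
        return conn.recv()
    return None, "Timeout while attempting to communicate with bitbake server"


client, server = Pipe()
threading.Thread(target=slow_server, args=(server, 0.2), daemon=True).start()

first = run_command(client, "getVariable A", timeout=0.05)   # gives up too early
time.sleep(0.5)  # meanwhile the late reply to the first command is queued
second = run_command(client, "getVariable B", timeout=1.0)   # reads that stale reply
print(first)
print(second)
```

Because the pipe is FIFO, `second` holds the answer to `getVariable A` rather than `getVariable B`. That is the "suitably puzzling result" described above.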

Without rewriting large sections of code, it's not possible to avoid this problem.
It is better to increase the timeout to several seconds, giving the server a chance
to respond, and if it does time out, to hard exit, since recovery is not possible with
the code base today.

I'd be happy to see the structure of this code improved, but this quick fix at least
stops corrupted builds from happening, which has to be a good thing.
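A minimal sketch of the patched behaviour follows, under the same Pipe-style connection assumption. It waits up to 20 seconds, the value the patch uses, and stops the process on timeout instead of returning an error tuple. Because `bb` is not importable outside BitBake, `fatal()` here is a hypothetical stand-in that logs at critical level and calls `sys.exit(1)`. That is assumed to approximate `bb.fatal` in this version. The patch's `KeyboardInterrupt` retry loop is omitted for brevity.

```python
import logging
import sys
import threading
from multiprocessing import Pipe

logger = logging.getLogger("BitBake")  # logger name is an assumption


def fatal(msg):
    # Hypothetical stand-in for bb.fatal: log the message, then exit the process.
    logger.critical(msg)
    sys.exit(1)


def run_command(conn, command, timeout=20):
    conn.send(command)
    if conn.poll(timeout):
        return conn.recv()
    # No retry: a late reply would be mistaken for the answer to the next command.
    fatal("Timeout while attempting to communicate with bitbake server")


def echo_server(conn):
    # Hypothetical server that answers each request immediately.
    while True:
        try:
            conn.send((conn.recv(), None))
        except EOFError:
            break


client, server = Pipe()
threading.Thread(target=echo_server, args=(server,), daemon=True).start()
print(run_command(client, "ping"))  # → ('ping', None)
```

The timeout path never returns. No caller can therefore issue a second command on a connection that may still carry a late reply. This is the property the patch relies on.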

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
Chris Larson - Nov. 21, 2012, 4:31 p.m.
This code is run in UI context, not server context, no? So the UI should be
checking the return of runCommand, seeing the error, and aborting itself in
whatever clean fashion is appropriate, rather than through a much less clean,
immediate sys.exit. Given that, I don't really see what switching to
bb.fatal here buys us. If we truly want that error path to be
different from that of other command failures, then I would think that
raising an appropriate exception (e.g. something other than SystemExit)
would be better.
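The exception-based alternative can be sketched as a dedicated exception type plus a communicator that refuses further commands once a timeout has happened. A late answer may still be queued, which is the problem described in the patch. `ServerTimeoutError` and the `broken` flag are hypothetical names, and the class is a simplified stand-in, not the real `ServerCommunicator`.

```python
from multiprocessing import Pipe


class ServerTimeoutError(Exception):
    """Hypothetical: the server did not answer in time, so the connection
    can no longer be trusted (a late reply may still arrive)."""


class ServerCommunicator:
    # Simplified sketch; not the real bb.server.process.ServerCommunicator.
    def __init__(self, connection, timeout=20):
        self.connection = connection
        self.timeout = timeout
        self.broken = False

    def runCommand(self, command):
        if self.broken:
            # A reply to an earlier command may still be queued; refuse to
            # pair it with this new command.
            raise ServerTimeoutError("connection unusable after an earlier timeout")
        self.connection.send(command)
        if self.connection.poll(self.timeout):
            return self.connection.recv()
        self.broken = True
        raise ServerTimeoutError("Timeout while attempting to communicate with bitbake server")


client, server = Pipe()  # nothing answers on `server`, so the first call times out
comm = ServerCommunicator(client, timeout=0.05)
for attempt in range(2):
    try:
        comm.runCommand(["getVariable", "MACHINE"])
    except ServerTimeoutError as exc:
        print(exc)
```

The first attempt reports the timeout. The second is rejected without touching the pipe, so the caller decides how to shut down and the communicator never exits on the caller's behalf.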
Richard Purdie - Nov. 21, 2012, 5:08 p.m.
On Wed, 2012-11-21 at 09:31 -0700, Chris Larson wrote:
> This code is run in UI context, not server context, no? So the UI
> should be checking the return of runCommand, seeing the error, and
> aborting itself in whatever clean fashion is appropriate, rather than
> the rather less so immediate sys.exit. Given that, I don't really see
> how switching to bb.fatal here is buying us much. If we truly want
> that error path to be different from that of other command failures,
> then I would think that raising an appropriate exception (e.g.
> something other than SystemExit) would be better.

The issue is that after this particular failure, any further runCommand
call is going to go badly wrong. "Timeout" would imply you could retry, but in
this case, as described above, you cannot (which, I agree, sucks).

So I think exiting in this case isn't such a bad thing, although I'm less
than happy about it. The only thing the UI could do is throw an error
and exit.

Cheers,

Richard
Chris Larson - Nov. 21, 2012, 5:22 p.m.
On Wed, Nov 21, 2012 at 10:08 AM, Richard Purdie <
richard.purdie@linuxfoundation.org> wrote:

> The issue is that after this particular failure, any further runCommand
> is going to go badly wrong. "Timeout" would imply you could retry and in
> this case as described above, you cannot (which I agree sucks).
>
> So I think exiting in this case isn't such a bad thing although I'm less
> than happy about it. The only thing the UI could do is throw an error
> and exit.


Agreed, but I think that's where the responsibility belongs. From the
perspective of the UI, calling an API that exits you, for any reason, is
pretty disgusting.
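Putting that responsibility in the UI might look like the following minimal sketch. The communicator only raises, and the UI's entry point turns either a timeout or an ordinary command error into a message and a non-zero exit code. `ServerTimeoutError`, `run_command`, `ui_main` and the `"buildTargets"` command string are illustrative assumptions, not BitBake's actual UI code.

```python
import sys
from multiprocessing import Pipe


class ServerTimeoutError(Exception):
    """Hypothetical exception raised by the communicator on timeout."""


def run_command(conn, command, timeout=20):
    # Hypothetical communicator call: it reports the problem but never exits.
    conn.send(command)
    if conn.poll(timeout):
        return conn.recv()
    raise ServerTimeoutError("Timeout while attempting to communicate with bitbake server")


def ui_main(conn, timeout=20):
    # The UI owns the decision to stop: report the failure, return an exit code.
    try:
        result, error = run_command(conn, "buildTargets", timeout)
    except ServerTimeoutError as exc:
        print("ERROR: %s; shutting down the UI" % exc, file=sys.stderr)
        return 1
    if error:
        print("ERROR: command failed: %s" % error, file=sys.stderr)
        return 1
    return 0


client, server = Pipe()  # nothing ever answers on `server`
exit_code = ui_main(client, timeout=0.1)  # prints an ERROR line to stderr
```

A UI front end would then call `sys.exit(ui_main(connection))` itself, so the decision to exit stays in UI code rather than in the API it calls.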

Patch

diff --git a/bitbake/lib/bb/server/process.py b/bitbake/lib/bb/server/process.py
index f1e8450..8ebf771 100644
--- a/bitbake/lib/bb/server/process.py
+++ b/bitbake/lib/bb/server/process.py
@@ -45,10 +45,10 @@  class ServerCommunicator():
         while True:
             # don't let the user ctrl-c while we're waiting for a response
             try:
-                if self.connection.poll(.5):
+                if self.connection.poll(20):
                     return self.connection.recv()
                 else:
-                    return None, "Timeout while attempting to communicate with bitbake server"
+                    bb.fatal("Timeout while attempting to communicate with bitbake server")
             except KeyboardInterrupt:
                 pass