diff mbox series

[4/4] target/ssh: Ensure exit code set for commands

Message ID 20230728111311.1665976-4-richard.purdie@linuxfoundation.org
State Accepted, archived
Commit 3924e94214b5135369be2551d54fb92097d35e95
Headers show
Series [1/4] oeqa/runtime/ltp: Increase ltp test output timeout | expand

Commit Message

Richard Purdie July 28, 2023, 11:13 a.m. UTC
As spotted by Joshua Watt, the returncode isn't set until .poll() or .wait()
is called so we need to call this after the .kill() call.

This fixes return code reporting so that timeouts for example now return an
exit code when they didn't before.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 meta/lib/oeqa/core/target/ssh.py | 3 +++
 1 file changed, 3 insertions(+)

Comments

Joshua Watt July 28, 2023, 2:01 p.m. UTC | #1
On Fri, Jul 28, 2023 at 5:13 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
>
> As spotted by Joshua Watt, the returncode isn't set until .poll() or .wait()
> is called so we need to call this after the .kill() call.
>
> This fixes return code reporting so that timeouts for example now return an
> exit code when they didn't before.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>  meta/lib/oeqa/core/target/ssh.py | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/meta/lib/oeqa/core/target/ssh.py b/meta/lib/oeqa/core/target/ssh.py
> index 243e45dd99e..72ed1adbf89 100644
> --- a/meta/lib/oeqa/core/target/ssh.py
> +++ b/meta/lib/oeqa/core/target/ssh.py
> @@ -265,6 +265,7 @@ def SSHCall(command, logger, timeout=None, **opts):
>                  time.sleep(5)
>                  try:
>                      process.kill()
> +                    process.wait()

I think this can still result in a orphaned process; if the
process.terminate() on the line above is successful, the process will
exit and then process.kill() here will raise an exception and will
call wait. This is true generally everywhere; wait() must be called
regardless of if kill() successed, because kill() will fail if the
process is already dead. It's a little tricky because
process.commuinicate() calls wait(), so you can't unconditionally call
wait(), but my solution was to remove all the kill() and wait() calls
that already exist and add:

 try:
     run()
 except:
     logger.debug(....)
     raise
 finally:
     if process.returncode is None:
         try:
             process.wait(timeout=5)
         except TimeoutExpired:
             try:
                 process.kill()
             except OSError:
                 pass
             process.wait()

>                  except OSError:
>                      logger.debug('OSError when killing process')
>                      pass
> @@ -287,6 +288,7 @@ def SSHCall(command, logger, timeout=None, **opts):
>              except TimeoutExpired:
>                  try:
>                      process.kill()
> +                    process.wait()
>                  except OSError:
>                      logger.debug('OSError')
>                      pass
> @@ -316,6 +318,7 @@ def SSHCall(command, logger, timeout=None, **opts):
>          # whilst running and ensure we don't leave a process behind.
>          if process.poll() is None:
>              process.kill()
> +            process.wait()
>          logger.debug('Something went wrong, killing SSH process')
>          raise
>
> --
> 2.39.2
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#185024): https://lists.openembedded.org/g/openembedded-core/message/185024
> Mute This Topic: https://lists.openembedded.org/mt/100408183/3616693
> Group Owner: openembedded-core+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [JPEWhacker@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Joshua Watt July 28, 2023, 2:02 p.m. UTC | #2
On Fri, Jul 28, 2023 at 8:01 AM Joshua Watt <jpewhacker@gmail.com> wrote:
>
> On Fri, Jul 28, 2023 at 5:13 AM Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> >
> > As spotted by Joshua Watt, the returncode isn't set until .poll() or .wait()
> > is called so we need to call this after the .kill() call.
> >
> > This fixes return code reporting so that timeouts for example now return an
> > exit code when they didn't before.
> >
> > Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> > ---
> >  meta/lib/oeqa/core/target/ssh.py | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/meta/lib/oeqa/core/target/ssh.py b/meta/lib/oeqa/core/target/ssh.py
> > index 243e45dd99e..72ed1adbf89 100644
> > --- a/meta/lib/oeqa/core/target/ssh.py
> > +++ b/meta/lib/oeqa/core/target/ssh.py
> > @@ -265,6 +265,7 @@ def SSHCall(command, logger, timeout=None, **opts):
> >                  time.sleep(5)
> >                  try:
> >                      process.kill()
> > +                    process.wait()
>
> I think this can still result in a orphaned process; if the
> process.terminate() on the line above is successful, the process will
> exit and then process.kill() here will raise an exception and will
> call wait. This is true generally everywhere; wait() must be called

"will _not_ call wait" - Need to proofread better

> regardless of if kill() successed, because kill() will fail if the
> process is already dead. It's a little tricky because
> process.commuinicate() calls wait(), so you can't unconditionally call
> wait(), but my solution was to remove all the kill() and wait() calls
> that already exist and add:
>
>  try:
>      run()
>  except:
>      logger.debug(....)
>      raise
>  finally:
>      if process.returncode is None:
>          try:
>              process.wait(timeout=5)
>          except TimeoutExpired:
>              try:
>                  process.kill()
>              except OSError:
>                  pass
>              process.wait()
>
> >                  except OSError:
> >                      logger.debug('OSError when killing process')
> >                      pass
> > @@ -287,6 +288,7 @@ def SSHCall(command, logger, timeout=None, **opts):
> >              except TimeoutExpired:
> >                  try:
> >                      process.kill()
> > +                    process.wait()
> >                  except OSError:
> >                      logger.debug('OSError')
> >                      pass
> > @@ -316,6 +318,7 @@ def SSHCall(command, logger, timeout=None, **opts):
> >          # whilst running and ensure we don't leave a process behind.
> >          if process.poll() is None:
> >              process.kill()
> > +            process.wait()
> >          logger.debug('Something went wrong, killing SSH process')
> >          raise
> >
> > --
> > 2.39.2
> >
> >
> > -=-=-=-=-=-=-=-=-=-=-=-
> > Links: You receive all messages sent to this group.
> > View/Reply Online (#185024): https://lists.openembedded.org/g/openembedded-core/message/185024
> > Mute This Topic: https://lists.openembedded.org/mt/100408183/3616693
> > Group Owner: openembedded-core+owner@lists.openembedded.org
> > Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [JPEWhacker@gmail.com]
> > -=-=-=-=-=-=-=-=-=-=-=-
> >
Richard Purdie July 29, 2023, 7:24 a.m. UTC | #3
On Fri, 2023-07-28 at 08:02 -0600, Joshua Watt wrote:
> On Fri, Jul 28, 2023 at 8:01 AM Joshua Watt <jpewhacker@gmail.com> wrote:
> > 
> > On Fri, Jul 28, 2023 at 5:13 AM Richard Purdie
> > <richard.purdie@linuxfoundation.org> wrote:
> > > 
> > > As spotted by Joshua Watt, the returncode isn't set until .poll() or .wait()
> > > is called so we need to call this after the .kill() call.
> > > 
> > > This fixes return code reporting so that timeouts for example now return an
> > > exit code when they didn't before.
> > > 
> > > Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> > > ---
> > >  meta/lib/oeqa/core/target/ssh.py | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/meta/lib/oeqa/core/target/ssh.py b/meta/lib/oeqa/core/target/ssh.py
> > > index 243e45dd99e..72ed1adbf89 100644
> > > --- a/meta/lib/oeqa/core/target/ssh.py
> > > +++ b/meta/lib/oeqa/core/target/ssh.py
> > > @@ -265,6 +265,7 @@ def SSHCall(command, logger, timeout=None, **opts):
> > >                  time.sleep(5)
> > >                  try:
> > >                      process.kill()
> > > +                    process.wait()
> > 
> > I think this can still result in a orphaned process; if the
> > process.terminate() on the line above is successful, the process will
> > exit and then process.kill() here will raise an exception and will
> > call wait. This is true generally everywhere; wait() must be called
> 
> "will _not_ call wait" - Need to proofread better
> 
> > regardless of if kill() successed, because kill() will fail if the
> > process is already dead. It's a little tricky because
> > process.commuinicate() calls wait(), so you can't unconditionally call
> > wait(), but my solution was to remove all the kill() and wait() calls
> > that already exist and add:
> > 
> >  try:
> >      run()
> >  except:
> >      logger.debug(....)
> >      raise
> >  finally:
> >      if process.returncode is None:
> >          try:
> >              process.wait(timeout=5)
> >          except TimeoutExpired:
> >              try:
> >                  process.kill()
> >              except OSError:
> >                  pass
> >              process.wait()
> > 
> > 

I think I'll probably merge this as it does work better than the
existing code and lets me clean up the ltp test results but I'd take a
further patch...

Cheers,

Richard
diff mbox series

Patch

diff --git a/meta/lib/oeqa/core/target/ssh.py b/meta/lib/oeqa/core/target/ssh.py
index 243e45dd99e..72ed1adbf89 100644
--- a/meta/lib/oeqa/core/target/ssh.py
+++ b/meta/lib/oeqa/core/target/ssh.py
@@ -265,6 +265,7 @@  def SSHCall(command, logger, timeout=None, **opts):
                 time.sleep(5)
                 try:
                     process.kill()
+                    process.wait()
                 except OSError:
                     logger.debug('OSError when killing process')
                     pass
@@ -287,6 +288,7 @@  def SSHCall(command, logger, timeout=None, **opts):
             except TimeoutExpired:
                 try:
                     process.kill()
+                    process.wait()
                 except OSError:
                     logger.debug('OSError')
                     pass
@@ -316,6 +318,7 @@  def SSHCall(command, logger, timeout=None, **opts):
         # whilst running and ensure we don't leave a process behind.
         if process.poll() is None:
             process.kill()
+            process.wait()
         logger.debug('Something went wrong, killing SSH process')
         raise