Patchwork [1/1] classes/sanity: fix handling of bblayers.conf updating

login
register
mail settings
Submitter Paul Eggleton
Date April 12, 2013, 8:16 p.m.
Message ID <30edf2f0744b6ce209455d2c5c63015e06f09265.1365797632.git.paul.eggleton@linux.intel.com>
Download mbox | patch
Permalink /patch/48045/
State Accepted
Commit db61a66dbab8f3eccdefdc42a4e59887d28995ca
Headers show

Comments

Paul Eggleton - April 12, 2013, 8:16 p.m.
Fix the fairly long-standing problem of treating a newer bblayers.conf
in the same manner as an older one (reporting that it had been updated
even if nothing was done). The recent work to do a reparse without
having to manually re-run bitbake turned this from an annoyance into an
endless loop, so it had to be fixed.

As part of fixing this the following changes have been made:
* Extensions are now implemented using a function list, so distro layers
  can add their own functions which should either succeed (indicating
  they have successfully updated the file) or raise an exception
  (indicating nothing could be done). The functions are called in
  succession until one succeeds, at which point we reparse.
* If we can't do the update, the error message now says "older/newer"
  instead of just "older" since we only know the version is different.

Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com>
---
 meta/classes/sanity.bbclass |   77 +++++++++++++++++++++++++++++--------------
 1 file changed, 52 insertions(+), 25 deletions(-)
Martin Jansa - April 12, 2013, 9:10 p.m.
On Fri, Apr 12, 2013 at 09:16:51PM +0100, Paul Eggleton wrote:
> Fix the fairly long-standing problem of treating a newer bblayers.conf
> in the same manner as an older one (reporting that it had been updated
> even if nothing was done). The recent work to do a reparse without
> having to manually re-run bitbake turned this from an annoyance into an
> endless loop, so it had to be fixed.

Is this endless loop, bitbake hanging forever while parsing or was it
showing something or failing soon with endless loop detected by python?

> As part of fixing this the following changes have been made:
> * Extensions are now implemented using a function list, so distro layers
>   can add their own functions which should either succeed (indicating
>   they have successfully updated the file) or raise an exception
>   (indicating nothing could be done). The functions are called in
>   succession until one succeeds, at which point we reparse.
> * If we can't do the update, the error message now says "older/newer"
>   instead of just "older" since we only know the version is different.
> 
> Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com>
> ---
>  meta/classes/sanity.bbclass |   77 +++++++++++++++++++++++++++++--------------
>  1 file changed, 52 insertions(+), 25 deletions(-)
> 
> diff --git a/meta/classes/sanity.bbclass b/meta/classes/sanity.bbclass
> index ecc0a43..ac2314f 100644
> --- a/meta/classes/sanity.bbclass
> +++ b/meta/classes/sanity.bbclass
> @@ -4,8 +4,34 @@
>  
>  SANITY_REQUIRED_UTILITIES ?= "patch diffstat makeinfo git bzip2 tar gzip gawk chrpath wget cpio"
>  
> -python check_bblayers_conf() {
> -    bblayers_fn = os.path.join(d.getVar('TOPDIR', True), 'conf/bblayers.conf')
> +def bblayers_conf_file(d):
> +    return os.path.join(d.getVar('TOPDIR', True), 'conf/bblayers.conf')
> +
> +def sanity_conf_read(fn):
> +    with open(fn, 'r') as f:
> +        lines = f.readlines()
> +    return lines
> +
> +def sanity_conf_find_line(pattern, lines):
> +    import re
> +    return next(((index, line)
> +        for index, line in enumerate(lines)
> +        if re.search(pattern, line)), (None, None))
> +
> +def sanity_conf_update(fn, lines, version_var_name, new_version):
> +    index, line = sanity_conf_find_line(version_var_name, lines)
> +    lines[index] = '%s = "%d"\n' % (version_var_name, new_version)
> +    with open(fn, "w") as f:
> +        f.write(''.join(lines))
> +
> +EXPORT_FUNCTIONS bblayers_conf_file sanity_conf_read sanity_conf_find_line sanity_conf_update
> +
> +# Functions added to this variable MUST throw an exception (or sys.exit()) unless they
> +# successfully changed LCONF_VERSION in bblayers.conf
> +BBLAYERS_CONF_UPDATE_FUNCS += "oecore_update_bblayers"
> +
> +python oecore_update_bblayers() {
> +    # bblayers.conf is out of date, so see if we can resolve that
>  
>      current_lconf = int(d.getVar('LCONF_VERSION', True))
>      if not current_lconf:
> @@ -13,21 +39,15 @@ python check_bblayers_conf() {
>      lconf_version = int(d.getVar('LAYER_CONF_VERSION', True))
>      lines = []
>  
> -    import re
> -    def find_line(pattern, lines):
> -        return next(((index, line)
> -            for index, line in enumerate(lines)
> -            if re.search(pattern, line)), (None, None))
> -
>      if current_lconf < 4:
>          sys.exit()
>  
> -    with open(bblayers_fn, 'r') as f:
> -        lines = f.readlines()
> +    bblayers_fn = bblayers_conf_file(d)
> +    lines = sanity_conf_read(bblayers_fn)
>  
> -    if current_lconf == 4:
> +    if current_lconf == 4 and lconf_version > 4:
>          topdir_var = '$' + '{TOPDIR}'
> -        index, bbpath_line = find_line('BBPATH', lines)
> +        index, bbpath_line = sanity_conf_find_line('BBPATH', lines)
>          if bbpath_line:
>              start = bbpath_line.find('"')
>              if start != -1 and (len(bbpath_line) != (start + 1)):
> @@ -41,17 +61,17 @@ python check_bblayers_conf() {
>              else:
>                  sys.exit()
>          else:
> -            index, bbfiles_line = find_line('BBFILES', lines)
> +            index, bbfiles_line = sanity_conf_find_line('BBFILES', lines)
>              if bbfiles_line:
>                  lines.insert(index, 'BBPATH = "' + topdir_var + '"\n')
>              else:
>                  sys.exit()
>  
> -        index, line = find_line('LCONF_VERSION', lines)
>          current_lconf += 1
> -        lines[index] = 'LCONF_VERSION = "%d"\n' % current_lconf
> -        with open(bblayers_fn, "w") as f:
> -            f.write(''.join(lines))
> +        sanity_conf_update(bblayers_fn, lines, 'LCONF_VERSION', current_lconf)
> +        return
> +
> +    sys.exit()
>  }
>  
>  def raise_sanity_error(msg, d, network_error=False):
> @@ -387,18 +407,25 @@ def check_sanity(sanity_data):
>      conf_version =  sanity_data.getVar('LOCALCONF_VERSION', True)
>  
>      if current_conf != conf_version:
> -        messages = messages + "Your version of local.conf was generated from an older version of local.conf.sample and there have been updates made to this file. Please compare the two files and merge any changes before continuing.\nMatching the version numbers will remove this message.\n\"meld conf/local.conf ${COREBASE}/meta*/conf/local.conf.sample\" is a good way to visualise the changes.\n"
> +        messages = messages + "Your version of local.conf was generated from an older/newer version of local.conf.sample and there have been updates made to this file. Please compare the two files and merge any changes before continuing.\nMatching the version numbers will remove this message.\n\"meld conf/local.conf ${COREBASE}/meta*/conf/local.conf.sample\" is a good way to visualise the changes.\n"
>  
>      # Check bblayers.conf is valid
>      current_lconf = sanity_data.getVar('LCONF_VERSION', True)
>      lconf_version = sanity_data.getVar('LAYER_CONF_VERSION', True)
>      if current_lconf != lconf_version:
> -        try:
> -            bb.build.exec_func("check_bblayers_conf", sanity_data)
> -            bb.note("Your conf/bblayers.conf has been automatically updated.")
> -            reparse = True
> -        except Exception:
> -            messages = messages + "Your version of bblayers.conf was generated from an older version of bblayers.conf.sample and there have been updates made to this file. Please compare the two files and merge any changes before continuing.\nMatching the version numbers will remove this message.\n\"meld conf/bblayers.conf ${COREBASE}/meta*/conf/bblayers.conf.sample\" is a good way to visualise the changes.\n"
> +        funcs = sanity_data.getVar('BBLAYERS_CONF_UPDATE_FUNCS', True).split()
> +        for func in funcs:
> +            success = True
> +            try:
> +                bb.build.exec_func(func, sanity_data)
> +            except Exception:
> +                success = False
> +            if success:
> +                bb.note("Your conf/bblayers.conf has been automatically updated.")
> +                reparse = True
> +                break
> +        if not reparse:
> +            messages = messages + "Your version of bblayers.conf was generated from an older/newer version of bblayers.conf.sample and there have been updates made to this file. Please compare the two files and merge any changes before continuing.\nMatching the version numbers will remove this message.\n\"meld conf/bblayers.conf ${COREBASE}/meta*/conf/bblayers.conf.sample\" is a good way to visualise the changes.\n"
>  
>      # If we have a site.conf, check it's valid
>      if check_conf_exists("conf/site.conf", sanity_data):
> @@ -454,7 +481,7 @@ def check_sanity(sanity_data):
>          messages = messages + "Parsed PATH is " + str(paths) + "\n"
>  
>      bbpaths = sanity_data.getVar('BBPATH', True).split(":")
> -    if "." in bbpaths or "" in bbpaths:
> +    if ("." in bbpaths or "" in bbpaths) and not reparse:
>          # TODO: change the following message to fatal when all BBPATH issues
>          # are fixed
>          bb.warn("BBPATH references the current directory, either through "    \
> -- 
> 1.7.10.4
> 
> 
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.linuxtogo.org/cgi-bin/mailman/listinfo/openembedded-core
Richard Purdie - April 12, 2013, 9:30 p.m.
On Fri, 2013-04-12 at 23:10 +0200, Martin Jansa wrote:
> On Fri, Apr 12, 2013 at 09:16:51PM +0100, Paul Eggleton wrote:
> > Fix the fairly long-standing problem of treating a newer bblayers.conf
> > in the same manner as an older one (reporting that it had been updated
> > even if nothing was done). The recent work to do a reparse without
> > having to manually re-run bitbake turned this from an annoyance into an
> > endless loop, so it had to be fixed.
> 
> Is this endless loop, bitbake hanging forever while parsing or was it
> showing something or failing soon with endless loop detected by python?

You'd see nothing at all, you'd run "bitbake X" and it would just sit
there indefinitely in a loop.

It could be triggered by including meta-yocto but not setting
DISTRO=poky* in a build dir where DISTRO=poky* has previously been used.

Cheers,

Richard
Martin Jansa - April 16, 2013, 5:09 p.m.
On Fri, Apr 12, 2013 at 10:30:42PM +0100, Richard Purdie wrote:
> On Fri, 2013-04-12 at 23:10 +0200, Martin Jansa wrote:
> > On Fri, Apr 12, 2013 at 09:16:51PM +0100, Paul Eggleton wrote:
> > > Fix the fairly long-standing problem of treating a newer bblayers.conf
> > > in the same manner as an older one (reporting that it had been updated
> > > even if nothing was done). The recent work to do a reparse without
> > > having to manually re-run bitbake turned this from an annoyance into an
> > > endless loop, so it had to be fixed.
> > 
> > Is this endless loop, bitbake hanging forever while parsing or was it
> > showing something or failing soon with endless loop detected by python?
> 
> You'd see nothing at all, you'd run "bitbake X" and it would just sit
> there indefinitely in a loop.
> 
> It could be triggered by including meta-yocto but not setting
> DISTRO=poky* in a build dir where DISTRO=poky* has previously been used.

OK, so it's different than hangs I'm seeing here, because I'm still seeing them with latest oe-core

$ . ./oe-init-bitbake-build-env && bitbake my-image
Altered environment for machine@distro development (this is from oe-init-bitbake-build-env)
# and nothing else is shown after that

ps auxf (shortened paths and ascii tree)

bitbake  17625  0.0  0.0   4396   612 pts/18   S+   18:31   0:00  \_ /bin/sh -c . ./oe-init-bitbake-build-env && bitbake  my-image
bitbake  17626  0.0  0.0   4396   700 pts/18   S+   18:31   0:00     \_ /bin/sh /OE/build/oe-core/scripts/bitbake my-image
bitbake  17656  0.2  1.0 219356 168120 pts/18  S+   18:31   0:02        \_ python /OE/build/bitbake/bin/bitbake my-image
bitbake  18504  0.0  1.0 289044 166556 pts/18  Sl+  18:31   0:00           \_ python /OE/build/bitbake/bin/bitbake my-image

Not sure if there is better way to see what's going on in those python processes, here is strace:

# strace -p 17656
Process 17656 attached
wait4(18504, 

# strace -p 18504
Process 18504 attached
recvfrom(12, 

Hitting Ctrl+C with strace still running:
# strace -p 17656
Process 17656 attached
wait4(18504, 0x7fffd2abdf80, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
wait4(18504,

Process 18504 attached
recvfrom(12, 0xb773684, 8192, 0, 0, 0)  = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0xb16e5a0, FUTEX_WAIT_PRIVATE, 0, NULL

More Ctrl+C only repeats it in both processes but bitbake does not quit

One more python process is created and disowned by parent and becames child of init and dies later
bitbake  18524  0.0  1.0 289044 165128 ?       S    18:31   0:00 python /OE/build/bitbake/bin/bitbake my-image
but bitbake is still hanging until I send kill 18504 from another terminal

Process 18504:
futex(0xb16e5a0, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0xb16e5a0, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=23837, si_uid=1026} ---
+++ killed by SIGTERM +++

Process 17656: (iirc you said that python-logging module is not safe with multiple processes and it looks like logging something)
# strace -p 17656
Process 17656 attached
wait4(18504, 0x7fffd2abdf80, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
wait4(18504, 0x7fffd2abdf80, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---

read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 554) = 554
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2\34", 4)                 = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 540) = 540
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2&", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 550) = 550
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2\32", 4)                 = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 538) = 538
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2$", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 548) = 548
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2!", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 545) = 545
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2+", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 555) = 555
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2)", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 553) = 553
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\0023", 4)                 = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 563) = 563
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2#", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 547) = 547
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2-", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 557) = 557
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2\36", 4)                 = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 542) = 542
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2(", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 552) = 552
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2*", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 554) = 554
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2$", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 548) = 548
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2.", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 558) = 558
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2\37", 4)                 = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 543) = 543
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2\7", 4)                  = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 519) = 519
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2\37", 4)                 = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 543) = 543
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2\7", 4)                  = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 519) = 519
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2\36", 4)                 = 4
read(4, "\0\0\2q", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 625) = 625
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2R", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 594) = 594
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2=", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 573) = 573
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2r", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 626) = 626
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2S", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 595) = 595
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2>", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 574) = 574
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2l", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 620) = 620
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\2M", 4)                   = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 589) = 589
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
read(4, "\0\0\0028", 4)                 = 4
read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 568) = 568
select(5, [4], NULL, NULL, {0, 0})      = 0 (Timeout)
close(6)                                = 0
close(4)                                = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x2aaaaaf27cb0}, {SIG_IGN, [], SA_RESTORER, 0x2aaaaaf27cb0}, 8) = 0
rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x2aaaaaf27cb0}, {0x45190f, [], SA_RESTORER, 0x2aaaaaf27cb0}, 8) = 0
close(7)                                = 0
munmap(0x2aaaaac18000, 32)              = 0
munmap(0x2aaaaac10000, 32)              = 0
munmap(0x2aaaaab78000, 32)              = 0
munmap(0x2aaaaab70000, 32)              = 0
munmap(0x2aaaaab80000, 32)              = 0
munmap(0x2aaaaab68000, 32)              = 0
close(5)                                = 0
munmap(0x2aaaaab58000, 32)              = 0
munmap(0x2aaaaab60000, 32)              = 0
close(8)                                = 0
close(9)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++


I'll put this in bug report later, but if you see what other information could be useful,
I'll add it next time I see bitbake hanging like this (it doesn't happen every time, probably more
with machine under higher load).

Happens in my gentoo chroot with
Python 2.7.3 (default, Mar 19 2013, 08:59:58) 
[GCC 4.7.2] on linux2

as well as in ubuntu-12.10 chroot with
Python 2.7.3 (default, Sep 26 2012, 21:51:14) 
[GCC 4.7.2] on linux2

both using latest oe-core+bitbake.
Richard Purdie - April 16, 2013, 9:06 p.m.
On Tue, 2013-04-16 at 19:09 +0200, Martin Jansa wrote:
> OK, so it's different than hangs I'm seeing here, because I'm still seeing them with latest oe-core

Yes, that was almost certainly a different issue.

> $ . ./oe-init-bitbake-build-env && bitbake my-image
> Altered environment for machine@distro development (this is from oe-init-bitbake-build-env)
> # and nothing else is shown after that

So just to ensure I understand this, it never prints any messages at
all?

> ps auxf (shortened paths and ascii tree)
> 
> bitbake  17625  0.0  0.0   4396   612 pts/18   S+   18:31   0:00  \_ /bin/sh -c . ./oe-init-bitbake-build-env && bitbake  my-image
> bitbake  17626  0.0  0.0   4396   700 pts/18   S+   18:31   0:00     \_ /bin/sh /OE/build/oe-core/scripts/bitbake my-image
> bitbake  17656  0.2  1.0 219356 168120 pts/18  S+   18:31   0:02        \_ python /OE/build/bitbake/bin/bitbake my-image
> bitbake  18504  0.0  1.0 289044 166556 pts/18  Sl+  18:31   0:00           \_ python /OE/build/bitbake/bin/bitbake my-image
> 
> Not sure if there is better way to see what's going on in those python processes, here is strace:
> 
> # strace -p 17656
> Process 17656 attached
> wait4(18504, 
> 
> # strace -p 18504
> Process 18504 attached
> recvfrom(12, 
> 
> Hitting Ctrl+C with strace still running:
> # strace -p 17656
> Process 17656 attached
> wait4(18504, 0x7fffd2abdf80, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
> wait4(18504,
> 
> Process 18504 attached
> recvfrom(12, 0xb773684, 8192, 0, 0, 0)  = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
> rt_sigreturn()                          = -1 EINTR (Interrupted system call)
> futex(0xb16e5a0, FUTEX_WAIT_PRIVATE, 0, NULL
> 
> More Ctrl+C only repeats it in both processes but bitbake does not quit
> 
> One more python process is created and disowned by parent and becames child of init and dies later
> bitbake  18524  0.0  1.0 289044 165128 ?       S    18:31   0:00 python /OE/build/bitbake/bin/bitbake my-image
> but bitbake is still hanging until I send kill 18504 from another terminal
> 
> Process 18504:
> futex(0xb16e5a0, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
> rt_sigreturn()                          = -1 EINTR (Interrupted system call)
> futex(0xb16e5a0, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=23837, si_uid=1026} ---
> +++ killed by SIGTERM +++
> 
> Process 17656: (iirc you said that python-logging module is not safe with multiple processes and it looks like logging something)
> # strace -p 17656
> Process 17656 attached
> wait4(18504, 0x7fffd2abdf80, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
> wait4(18504, 0x7fffd2abdf80, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
> 
> read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 554) = 554
> select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
> read(4, "\0\0\2\34", 4)                 = 4

These look like pickled python LogRecord objects being passed between
the processes. If this happens again, pass the -s 8912 option to strace
(increases the size of the string dumped) and lets see if we can get the
complete log record. We should be able to unpickle them and get some
idea what log messages it was passing around at least...

> I'll put this in bug report later, but if you see what other information could be useful,
> I'll add it next time I see bitbake hanging like this (it doesn't happen every time, probably more
> with machine under higher load).

Ideally we need to turn this into a 100% reproducer somehow. Failing
that, perhaps more of the LogRecord can give us some clue what is going
on. I'm struggling for other ideas...

Cheers,

Richard
Martin Jansa - April 16, 2013, 9:41 p.m.
On Tue, Apr 16, 2013 at 10:06:06PM +0100, Richard Purdie wrote:
> On Tue, 2013-04-16 at 19:09 +0200, Martin Jansa wrote:
> > OK, so it's different than hangs I'm seeing here, because I'm still seeing them with latest oe-core
> 
> Yes, that was almost certainly a different issue.
> 
> > $ . ./oe-init-bitbake-build-env && bitbake my-image
> > Altered environment for machine@distro development (this is from oe-init-bitbake-build-env)
> > # and nothing else is shown after that
> 
> So just to ensure I understand this, it never prints any messages at
> all?

yes nothing at all from bitbake, only that echo from setup script
(oe-init-bitbake-build-env)

> > ps auxf (shortened paths and ascii tree)
> > 
> > bitbake  17625  0.0  0.0   4396   612 pts/18   S+   18:31   0:00  \_ /bin/sh -c . ./oe-init-bitbake-build-env && bitbake  my-image
> > bitbake  17626  0.0  0.0   4396   700 pts/18   S+   18:31   0:00     \_ /bin/sh /OE/build/oe-core/scripts/bitbake my-image
> > bitbake  17656  0.2  1.0 219356 168120 pts/18  S+   18:31   0:02        \_ python /OE/build/bitbake/bin/bitbake my-image
> > bitbake  18504  0.0  1.0 289044 166556 pts/18  Sl+  18:31   0:00           \_ python /OE/build/bitbake/bin/bitbake my-image
> > 
> > Not sure if there is better way to see what's going on in those python processes, here is strace:
> > 
> > # strace -p 17656
> > Process 17656 attached
> > wait4(18504, 
> > 
> > # strace -p 18504
> > Process 18504 attached
> > recvfrom(12, 
> > 
> > Hitting Ctrl+C with strace still running:
> > # strace -p 17656
> > Process 17656 attached
> > wait4(18504, 0x7fffd2abdf80, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> > --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
> > wait4(18504,
> > 
> > Process 18504 attached
> > recvfrom(12, 0xb773684, 8192, 0, 0, 0)  = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> > --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
> > rt_sigreturn()                          = -1 EINTR (Interrupted system call)
> > futex(0xb16e5a0, FUTEX_WAIT_PRIVATE, 0, NULL
> > 
> > More Ctrl+C only repeats it in both processes but bitbake does not quit
> > 
> > One more python process is created and disowned by parent and becames child of init and dies later
> > bitbake  18524  0.0  1.0 289044 165128 ?       S    18:31   0:00 python /OE/build/bitbake/bin/bitbake my-image
> > but bitbake is still hanging until I send kill 18504 from another terminal
> > 
> > Process 18504:
> > futex(0xb16e5a0, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> > --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
> > rt_sigreturn()                          = -1 EINTR (Interrupted system call)
> > futex(0xb16e5a0, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> > --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=23837, si_uid=1026} ---
> > +++ killed by SIGTERM +++
> > 
> > Process 17656: (iirc you said that python-logging module is not safe with multiple processes and it looks like logging something)
> > # strace -p 17656
> > Process 17656 attached
> > wait4(18504, 0x7fffd2abdf80, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> > --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
> > wait4(18504, 0x7fffd2abdf80, 0, NULL)   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> > --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=12337, ptr=0x3031}} ---
> > 
> > read(4, "\200\2clogging\nLogRecord\nq\1)\201q\2}q\3(U"..., 554) = 554
> > select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
> > read(4, "\0\0\2\34", 4)                 = 4
> 
> These look like pickled python LogRecord objects being passed between
> the processes. If this happens again, pass the -s 8912 option to strace
> (increases the size of the string dumped) and lets see if we can get the
> complete log record. We should be able to unpickle them and get some
> idea what log messages it was passing around at least...

OK will do

> > I'll put this in bug report later, but if you see what other information could be useful,
> > I'll add it next time I see bitbake hanging like this (it doesn't happen every time, probably more
> > with machine under higher load).
> 
> Ideally we need to turn this into a 100% reproducer somehow. Failing
> that, perhaps more of the LogRecord can give us some clue what is going
> on. I'm struggling for other ideas...

I would say it's not depending on metadata, because build works fine if
I kill bitbake and start it again. I've also seen it in very different
layer setups (including different DISTROs and IIRC also with distro-less
oe-core only).

Thinking about it it maybe happens more often when tmp-eglibc is empty,
I'll try couple of builds with tmp-eglibc removed to see if I can
reproduce it faster then every 10th build or so.
Paul Eggleton - April 16, 2013, 11:25 p.m.
On Tuesday 16 April 2013 23:41:08 Martin Jansa wrote:
> On Tue, Apr 16, 2013 at 10:06:06PM +0100, Richard Purdie wrote:
> > On Tue, 2013-04-16 at 19:09 +0200, Martin Jansa wrote:
> > > OK, so it's different than hangs I'm seeing here, because I'm still
> > > seeing them with latest oe-core> 
> > Yes, that was almost certainly a different issue.
> > 
> > > $ . ./oe-init-bitbake-build-env && bitbake my-image
> > > Altered environment for machine@distro development (this is from
> > > oe-init-bitbake-build-env) # and nothing else is shown after that
> > 
> > So just to ensure I understand this, it never prints any messages at
> > all?
> 
> yes nothing at all from bitbake, only that echo from setup script
> (oe-init-bitbake-build-env)

Are you sure it's not getting stuck doing a fetch as a result of a non-fully-
specified SRCREV?

Cheers,
Paul
Richard Purdie - April 17, 2013, 7:01 a.m.
On Wed, 2013-04-17 at 00:25 +0100, Paul Eggleton wrote:
> On Tuesday 16 April 2013 23:41:08 Martin Jansa wrote:
> > On Tue, Apr 16, 2013 at 10:06:06PM +0100, Richard Purdie wrote:
> > > On Tue, 2013-04-16 at 19:09 +0200, Martin Jansa wrote:
> > > > OK, so it's different than hangs I'm seeing here, because I'm still
> > > > seeing them with latest oe-core> 
> > > Yes, that was almost certainly a different issue.
> > > 
> > > > $ . ./oe-init-bitbake-build-env && bitbake my-image
> > > > Altered environment for machine@distro development (this is from
> > > > oe-init-bitbake-build-env) # and nothing else is shown after that
> > > 
> > > So just to ensure I understand this, it never prints any messages at
> > > all?
> > 
> > yes nothing at all from bitbake, only that echo from setup script
> > (oe-init-bitbake-build-env)
> 
> Are you sure it's not getting stuck doing a fetch as a result of a non-fully-
> specified SRCREV?

Its not something like one of the sanity checks testing network access
and hanging is it?

Cheers,

Richard
Martin Jansa - April 17, 2013, 8:54 a.m.
On Wed, Apr 17, 2013 at 12:25:14AM +0100, Paul Eggleton wrote:
> On Tuesday 16 April 2013 23:41:08 Martin Jansa wrote:
> > On Tue, Apr 16, 2013 at 10:06:06PM +0100, Richard Purdie wrote:
> > > On Tue, 2013-04-16 at 19:09 +0200, Martin Jansa wrote:
> > > > OK, so it's different than hangs I'm seeing here, because I'm still
> > > > seeing them with latest oe-core> 
> > > Yes, that was almost certainly a different issue.
> > > 
> > > > $ . ./oe-init-bitbake-build-env && bitbake my-image
> > > > Altered environment for machine@distro development (this is from
> > > > oe-init-bitbake-build-env) # and nothing else is shown after that
> > > 
> > > So just to ensure I understand this, it never prints any messages at
> > > all?
> > 
> > yes nothing at all from bitbake, only that echo from setup script
> > (oe-init-bitbake-build-env)
> 
> Are you sure it's not getting stuck doing a fetch as a result of a non-fully-
> specified SRCREV?

That would show some git process in ps/htop tree and got stuck after
showing build summary.

Patch

diff --git a/meta/classes/sanity.bbclass b/meta/classes/sanity.bbclass
index ecc0a43..ac2314f 100644
--- a/meta/classes/sanity.bbclass
+++ b/meta/classes/sanity.bbclass
@@ -4,8 +4,34 @@ 
 
 SANITY_REQUIRED_UTILITIES ?= "patch diffstat makeinfo git bzip2 tar gzip gawk chrpath wget cpio"
 
-python check_bblayers_conf() {
-    bblayers_fn = os.path.join(d.getVar('TOPDIR', True), 'conf/bblayers.conf')
+def bblayers_conf_file(d):
+    return os.path.join(d.getVar('TOPDIR', True), 'conf/bblayers.conf')
+
+def sanity_conf_read(fn):
+    with open(fn, 'r') as f:
+        lines = f.readlines()
+    return lines
+
+def sanity_conf_find_line(pattern, lines):
+    import re
+    return next(((index, line)
+        for index, line in enumerate(lines)
+        if re.search(pattern, line)), (None, None))
+
+def sanity_conf_update(fn, lines, version_var_name, new_version):
+    index, line = sanity_conf_find_line(version_var_name, lines)
+    lines[index] = '%s = "%d"\n' % (version_var_name, new_version)
+    with open(fn, "w") as f:
+        f.write(''.join(lines))
+
+EXPORT_FUNCTIONS bblayers_conf_file sanity_conf_read sanity_conf_find_line sanity_conf_update
+
+# Functions added to this variable MUST throw an exception (or sys.exit()) unless they
+# successfully changed LCONF_VERSION in bblayers.conf
+BBLAYERS_CONF_UPDATE_FUNCS += "oecore_update_bblayers"
+
+python oecore_update_bblayers() {
+    # bblayers.conf is out of date, so see if we can resolve that
 
     current_lconf = int(d.getVar('LCONF_VERSION', True))
     if not current_lconf:
@@ -13,21 +39,15 @@  python check_bblayers_conf() {
     lconf_version = int(d.getVar('LAYER_CONF_VERSION', True))
     lines = []
 
-    import re
-    def find_line(pattern, lines):
-        return next(((index, line)
-            for index, line in enumerate(lines)
-            if re.search(pattern, line)), (None, None))
-
     if current_lconf < 4:
         sys.exit()
 
-    with open(bblayers_fn, 'r') as f:
-        lines = f.readlines()
+    bblayers_fn = bblayers_conf_file(d)
+    lines = sanity_conf_read(bblayers_fn)
 
-    if current_lconf == 4:
+    if current_lconf == 4 and lconf_version > 4:
         topdir_var = '$' + '{TOPDIR}'
-        index, bbpath_line = find_line('BBPATH', lines)
+        index, bbpath_line = sanity_conf_find_line('BBPATH', lines)
         if bbpath_line:
             start = bbpath_line.find('"')
             if start != -1 and (len(bbpath_line) != (start + 1)):
@@ -41,17 +61,17 @@  python check_bblayers_conf() {
             else:
                 sys.exit()
         else:
-            index, bbfiles_line = find_line('BBFILES', lines)
+            index, bbfiles_line = sanity_conf_find_line('BBFILES', lines)
             if bbfiles_line:
                 lines.insert(index, 'BBPATH = "' + topdir_var + '"\n')
             else:
                 sys.exit()
 
-        index, line = find_line('LCONF_VERSION', lines)
         current_lconf += 1
-        lines[index] = 'LCONF_VERSION = "%d"\n' % current_lconf
-        with open(bblayers_fn, "w") as f:
-            f.write(''.join(lines))
+        sanity_conf_update(bblayers_fn, lines, 'LCONF_VERSION', current_lconf)
+        return
+
+    sys.exit()
 }
 
 def raise_sanity_error(msg, d, network_error=False):
@@ -387,18 +407,25 @@  def check_sanity(sanity_data):
     conf_version =  sanity_data.getVar('LOCALCONF_VERSION', True)
 
     if current_conf != conf_version:
-        messages = messages + "Your version of local.conf was generated from an older version of local.conf.sample and there have been updates made to this file. Please compare the two files and merge any changes before continuing.\nMatching the version numbers will remove this message.\n\"meld conf/local.conf ${COREBASE}/meta*/conf/local.conf.sample\" is a good way to visualise the changes.\n"
+        messages = messages + "Your version of local.conf was generated from an older/newer version of local.conf.sample and there have been updates made to this file. Please compare the two files and merge any changes before continuing.\nMatching the version numbers will remove this message.\n\"meld conf/local.conf ${COREBASE}/meta*/conf/local.conf.sample\" is a good way to visualise the changes.\n"
 
     # Check bblayers.conf is valid
     current_lconf = sanity_data.getVar('LCONF_VERSION', True)
     lconf_version = sanity_data.getVar('LAYER_CONF_VERSION', True)
     if current_lconf != lconf_version:
-        try:
-            bb.build.exec_func("check_bblayers_conf", sanity_data)
-            bb.note("Your conf/bblayers.conf has been automatically updated.")
-            reparse = True
-        except Exception:
-            messages = messages + "Your version of bblayers.conf was generated from an older version of bblayers.conf.sample and there have been updates made to this file. Please compare the two files and merge any changes before continuing.\nMatching the version numbers will remove this message.\n\"meld conf/bblayers.conf ${COREBASE}/meta*/conf/bblayers.conf.sample\" is a good way to visualise the changes.\n"
+        funcs = sanity_data.getVar('BBLAYERS_CONF_UPDATE_FUNCS', True).split()
+        for func in funcs:
+            success = True
+            try:
+                bb.build.exec_func(func, sanity_data)
+            except Exception:
+                success = False
+            if success:
+                bb.note("Your conf/bblayers.conf has been automatically updated.")
+                reparse = True
+                break
+        if not reparse:
+            messages = messages + "Your version of bblayers.conf was generated from an older/newer version of bblayers.conf.sample and there have been updates made to this file. Please compare the two files and merge any changes before continuing.\nMatching the version numbers will remove this message.\n\"meld conf/bblayers.conf ${COREBASE}/meta*/conf/bblayers.conf.sample\" is a good way to visualise the changes.\n"
 
     # If we have a site.conf, check it's valid
     if check_conf_exists("conf/site.conf", sanity_data):
@@ -454,7 +481,7 @@  def check_sanity(sanity_data):
         messages = messages + "Parsed PATH is " + str(paths) + "\n"
 
     bbpaths = sanity_data.getVar('BBPATH', True).split(":")
-    if "." in bbpaths or "" in bbpaths:
+    if ("." in bbpaths or "" in bbpaths) and not reparse:
         # TODO: change the following message to fatal when all BBPATH issues
         # are fixed
         bb.warn("BBPATH references the current directory, either through "    \