[bitbake-devel,1.44,00/18] Patch review

Submitted by Armin Kuster on Dec. 23, 2019, 4:50 a.m. | Patch ID: 168304

Details

Message ID cover.1577076251.git.akuster808@gmail.com
State New

Pull-request

http://lists.openembedded.org/mailman/listinfo/bitbake-devel

Commit Message

Armin Kuster Dec. 23, 2019, 4:50 a.m.
This series has been run on the AB.

I realize the hash work is still in progress for fixes to 1.44.

This is what I have so far.
The following changes since commit cfa307aabf710d79c404a8571b4158b864a94727:

  runqueue.py: not show warning for deferred multiconfig task (2019-11-29 11:26:07 +0000)

are available in the git repository at:

  git://git.openembedded.org/bitbake-contrib stable/1.44-nut
  http://cgit.openembedded.org//log/?h=stable/1.44-nut

Chris Laplante via bitbake-devel (1):
  bb.utils.fileslocked: don't leak files if yield throws

Joshua Watt (1):
  runqueue: Batch scenequeue updates

Richard Purdie (16):
  hashserv: Add support for equivalent hash reporting
  runqueue/siggen: Allow handling of equivalent hashes
  runqueue: Add extra debugging when locked sigs mismatches occur
  knotty/uihelper: Switch from pids to tids for Task event management
  siggen: Avoid taskhash mismatch errors for nostamp tasks when
    dependencies rehash
  siggen: Ensure new unihash propagates through the system
  siggen: Fix performance issue in get_unihash
  runqueue: Rework process_possible_migrations() to improve performance
  runqueue: Fix task mismatch failures from incorrect logic
  siggen: Split get_tashhash for performance
  runqueue: Fix sstate task iteration performance
  runqueue: Optimise task migration code slightly
  runqueue: Optimise out pointless loop iteration
  runqueue: Optimise task filtering
  runqueue: Only call into the migrations function if migrations active
  lib/bb: Optimise out debug messages from cooker

 lib/bb/__init__.py     |   5 ++
 lib/bb/build.py        |  25 +++++-----
 lib/bb/runqueue.py     | 127 ++++++++++++++++++++++++++++---------------------
 lib/bb/siggen.py       |  74 ++++++++++++++++++++++------
 lib/bb/ui/knotty.py    |  12 ++---
 lib/bb/ui/uihelper.py  |  39 +++++++++------
 lib/bb/utils.py        |   9 ++--
 lib/hashserv/client.py |   8 ++++
 lib/hashserv/server.py |  36 ++++++++++++++
 9 files changed, 231 insertions(+), 104 deletions(-)

Comments

Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

The reason for this should be recorded in the commit logs. Imagine
you have a target recipe (e.g. meta-extsdk-toolchain) which depends on
gdb-cross. sstate in OE-Core allows gdb-cross to have the same hash
regardless of whether it's built on x86 or arm. The outhash will be
different.

We need hashequiv to be able to adapt to the presence of sstate artefacts
for meta-extsdk-toolchain and allow the hashes to re-intersect, rather than
trying to force a rebuild of meta-extsdk-toolchain. By this point in the build,
it would have already been installed from sstate so the build needs to adapt.

Equivalent hashes should be reported to the server as a taskhash that
needs to map to a specific unihash. This patch adds an API to the hashserv
client/server to allow this.

[Thanks to Joshua Watt for help with this patch]

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 674692fd46a7691a1de59ace6af0556cc5dd6a71)
---
 lib/hashserv/client.py |  8 ++++++++
 lib/hashserv/server.py | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)
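The idea behind the equivalence reporting can be sketched with a toy in-memory stand-in for the server. This is illustrative only, not the real hashserv wire protocol; the class and field names here are hypothetical, though the method shape mirrors the `report_unihash_equiv(taskhash, method, wanted_unihash, extra_data)` call visible in the diff discussed later in this thread:

```python
# Toy illustration of equivalence reporting: a taskhash is asked to map
# onto an existing unihash so two builds' hashes can re-intersect.
# In-memory stand-in only, NOT the real hashserv implementation.

class ToyEquivServer:
    def __init__(self):
        self.unihash_by_taskhash = {}

    def report_unihash_equiv(self, taskhash, method, wanted_unihash, extra_data):
        # First report wins: once a taskhash has a unihash, later reports
        # get the established mapping back instead of overwriting it.
        existing = self.unihash_by_taskhash.setdefault(taskhash, wanted_unihash)
        return {'unihash': existing}

server = ToyEquivServer()
# An x86 build reports its wanted unihash for taskhash "aaa"
first = server.report_unihash_equiv('aaa', 'outhash', 'uni-1', {})
# An arm build of the equivalent task reports a different wanted unihash
second = server.report_unihash_equiv('aaa', 'outhash', 'uni-2', {})
assert first['unihash'] == 'uni-1'
assert second['unihash'] == 'uni-1'   # both builds converge on one unihash
```

The key property is convergence: whichever unihash the server settles on, every client asking about that taskhash gets the same answer back.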
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

Based on the hashserv's new ability to accept hash mappings, update runqueue
to use this through a helper function in siggen.

This addresses problems with meta-extsdk-toolchain and its dependency on
gdb-cross which caused errors when building eSDK. See the previous commit
for more details.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 39098b4ba2133f4d9229a0aa4fcf4c3e1291286a)
---
 lib/bb/runqueue.py | 31 +++++++++++++++++++------------
 lib/bb/siggen.py   | 26 ++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 12 deletions(-)
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 3aad9978be2a40d4c535a5ae092f374ba2a5f627)
---
 lib/bb/runqueue.py | 2 ++
 1 file changed, 2 insertions(+)
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

We've seen cases where a task can execute with a given pid, complete
and a new task can start using the same pid before the UI handler has
had time to adapt.

Traceback (most recent call last):
  File "/home/pokybuild/yocto-worker/qemux86-alt/build/bitbake/lib/bb/ui/knotty.py", line 484, in main
    helper.eventHandler(event)
  File "/home/pokybuild/yocto-worker/qemux86-alt/build/bitbake/lib/bb/ui/uihelper.py", line 30, in eventHandler
    del self.running_tasks[event.pid]
KeyError: 13490

This means using pids to match up events on the UI side is a bad
idea. Change the code to use task ids instead. There is a small
amount of fuzzy matching for the progress information since there
is no task information there and we don't want the overhead of a task
ID in every event; however, since pid reuse is unlikely, we can live
with a progress bar not quite working properly in a corner case like
this.

[YOCTO #13667]

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit e427eafa1bb04008d12100ccc5c862122bba53e0)
---
 lib/bb/build.py       | 25 +++++++++++++------------
 lib/bb/ui/knotty.py   | 12 ++++++------
 lib/bb/ui/uihelper.py | 39 ++++++++++++++++++++++++---------------
 3 files changed, 43 insertions(+), 33 deletions(-)
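The fragility being fixed can be shown in a few lines. This sketch uses hypothetical names (`running_tasks`, `task_started`, `task_finished`), not the actual uihelper code; it only demonstrates why keying UI state on a recyclable pid produces the `KeyError` above, while a per-build task id does not:

```python
# Why keying UI state on pids is fragile: the kernel can reuse a pid
# between the task-finished event and the UI catching up, so a later
# delete can hit the wrong entry or a missing one (the KeyError in the
# traceback above). Task ids are unique per build, so they are safe keys.

running_tasks = {}

def task_started(tid, pid, title):
    # tid (e.g. "recipe.bb:do_compile") identifies the task; the pid is
    # kept only as auxiliary data, never as the dictionary key.
    running_tasks[tid] = {'title': title, 'pid': pid}

def task_finished(tid):
    running_tasks.pop(tid, None)   # tolerate stale/out-of-order events

task_started('a.bb:do_fetch', 13490, 'fetch A')
task_finished('a.bb:do_fetch')
# pid 13490 is reused by a different task: harmless with tid keys
task_started('b.bb:do_fetch', 13490, 'fetch B')
task_finished('a.bb:do_fetch')     # duplicate/stale event: no KeyError
assert list(running_tasks) == ['b.bb:do_fetch']
```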
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

An example:

NOTE: recipe binutils-cross-testsuite-2.32.0-r0: task do_check: Started
ERROR: Taskhash mismatch b074da4334aff8aa06572e7a8725c941fa6b08de4ce714a65a90c0c0b680abea versus 17375278daed609a7129769b74a1336a37bdef14b534ae85189ccc033a9f2db4 for /home/pokybuild/yocto-worker/qemux86-64/build/meta/recipes-devtools/binutils/binutils-cross-testsuite_2.32.bb:do_check
NOTE: recipe binutils-cross-testsuite-2.32.0-r0: task do_check: Succeeded

This is caused by a rehash in a dependency happening somewhere earlier in
the build, which resets the taint.

Change the code so that nostamp taints are preserved to avoid the issue.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 61624a3fc38e8546e01356d5ce7a09f21e7094ab)
---
 lib/bb/siggen.py | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

It's possible the new unihash may not exist in sstate. Currently the code
would create an sstate object with the old hash; this change updates it to
create the object with the new unihash.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit abcaa1398031fa5338a43859c661e6d4a9ce863d)
---
 lib/bb/siggen.py | 1 +
 1 file changed, 1 insertion(+)
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Joshua Watt <jpewhacker@gmail.com>

Batch all updates to scenequeue data together in a single invocation
instead of checking each task serially. This allows the checks for
sstate objects to happen in parallel, and also makes sure the log
statement only happens once (per set of rehashes).

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit db033a8f8a276d864bdb2e1eef159ab5794a0658)
---
 lib/bb/runqueue.py | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

There is a significant performance issue in get_unihash(). The issue turns out
to be the lookups of setscene tasks. We can fix this by using a set() instead of
the current list.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 1e561672d039ebfb8cd0e0654a44dcf48513317c)
---
 lib/bb/siggen.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
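The performance fix here is purely a data-structure change, and it is easy to see why it matters: membership tests against a list are O(n), so one test per `get_unihash()` call over thousands of setscene tasks is quadratic overall, while a set makes each test O(1) on average. A minimal sketch (variable names are illustrative):

```python
# List vs set membership: identical semantics, very different cost.
# A list scans element by element; a set hashes straight to the answer.
setscene_tids = ['tid%d' % i for i in range(10000)]   # O(n) per lookup
setscene_set = set(setscene_tids)                     # O(1) average per lookup

# Both containers give the same answer for any query:
assert ('tid9999' in setscene_tids) == ('tid9999' in setscene_set) == True
assert ('missing' in setscene_set) is False
```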
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Chris Laplante via bitbake-devel <bitbake-devel@lists.openembedded.org>

Discovered with a recipe under devtool. The ${S}/singletask.lock file (added by
externalsrc.bbclass) was leaked, giving a warning like:

  WARNING: <PN>+git999-r0 do_populate_lic: /home/laplante/yocto/sources/poky/bitbake/lib/bb/build.py:582: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/laplante/yocto/build/workspace/sources/<PN>/singletask.lock' mode='a+' encoding='UTF-8'>
    exec_func(task, localdata)

Signed-off-by: Chris Laplante <chris.laplante@agilent.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 6beddf6214e22b4002626761031a9e9d34fb04db)
---
 lib/bb/utils.py | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
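The shape of this fix is a classic generator-context-manager pitfall: if the `yield` is not wrapped in `try`/`finally`, an exception raised inside the `with` body leaks whatever the manager opened. The sketch below is simplified from the `bb.utils.fileslocked` pattern, with a fake lock object standing in for real file locking:

```python
# A generator-based context manager must release resources in a finally
# block; otherwise an exception at the yield point (i.e. raised inside
# the "with" body) leaks them. Lock acquisition details are elided.
from contextlib import contextmanager

closed = []

class FakeLock:
    def __init__(self, name): self.name = name
    def close(self): closed.append(self.name)

@contextmanager
def fileslocked(names):
    locks = [FakeLock(n) for n in names]
    try:
        yield locks
    finally:                       # runs even if the with-body raises
        for lock in locks:
            lock.close()

try:
    with fileslocked(['singletask.lock']):
        raise RuntimeError('task failed')
except RuntimeError:
    pass
assert closed == ['singletask.lock']   # closed despite the exception
```

Without the `try`/`finally`, the `RuntimeError` would propagate out of the generator before the close loop ran, which is exactly the ResourceWarning quoted above.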
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

The looping over multiple changed hashes causes many calls to get_taskhash
and get_unihash which are potentially slow and then overwritten.

Instead, batch up all the tasks which have changed unihashes and then
do one big loop over the changed tasks rather than each in turn.

This makes worlds of difference to the performance graphs and should speed
up builds where many tasks are being rehashed.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit c9c68d898985cf0bec6fc95f54c151cc50255cac)
---
 lib/bb/runqueue.py | 103 +++++++++++++++++++++++++++++------------------------
 1 file changed, 56 insertions(+), 47 deletions(-)
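The reworked shape can be sketched abstractly: instead of recomputing hashes task-by-task as each change arrives (work the next change immediately overwrites), collect every task whose unihash changed and process the batch once. This is illustrative only; the real code walks runqueue dependency data rather than a flat event list:

```python
# Serial vs batched processing of rehash events. Repeated events for the
# same task collapse before any expensive work happens in the batched form.

def process_serially(events, compute):
    calls = 0
    for tid in events:
        calls += 1
        compute(tid)               # recomputed per event, later overwritten
    return calls

def process_batched(events, compute):
    pending = set(events)          # duplicates collapse up front
    for tid in pending:
        compute(tid)               # one computation per changed task
    return len(pending)

events = ['t1', 't2', 't1', 't3', 't2', 't1']   # same tasks rehashed repeatedly
serial = process_serially(events, lambda tid: None)
batched = process_batched(events, lambda tid: None)
assert serial == 6 and batched == 3
```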
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

The "no dependencies" task case was not being correctly considered in this
code and seemed to be the cause of occasionaly task hash mismatch errors
that were being seen as the dependencies were never accounted for properly.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 608b9f821539de813bfbd9e65950dbc56a274bc2)
---
 lib/bb/runqueue.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

There are two operations happening in get_taskhash: the building of the
underlying data and the calculation of the hash.

Split these into two functions, since the preparation part doesn't need
to rerun when the unihash changes, only the calculation does.

This split allows significant performance improvements for hashequiv
in builds where many hashes are equivalent and many hashes are changing.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 6a32af2808d748819f4af55c443578c8a63062b3)
---
 lib/bb/runqueue.py |  1 +
 lib/bb/siggen.py   | 33 ++++++++++++++++++++++++---------
 2 files changed, 25 insertions(+), 9 deletions(-)
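The prepare/compute split described above can be sketched as follows. Function names are hypothetical; the point is that the expensive data-building step runs once, while only the cheap hash calculation reruns when a dependency's unihash changes:

```python
# Split an expensive "prepare" step from a cheap "compute" step so that
# unihash churn only repeats the cheap half.
import hashlib

prep_calls = 0

def prep_taskhash(tid, deps):
    global prep_calls
    prep_calls += 1
    return sorted(deps)            # stand-in for the costly data build

def compute_taskhash(prepared, unihashes):
    h = hashlib.sha256()
    for dep in prepared:
        h.update(unihashes[dep].encode())
    return h.hexdigest()

prepared = prep_taskhash('x.bb:do_compile', ['b', 'a'])
h1 = compute_taskhash(prepared, {'a': 'u1', 'b': 'u2'})
h2 = compute_taskhash(prepared, {'a': 'u1-new', 'b': 'u2'})  # rehash only
assert h1 != h2                    # changed input changed the hash
assert prep_calls == 1             # the expensive step ran exactly once
```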
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

Creating a new sorted list of sstate tasks each iteration through runqueue is
extremely inefficient, and was compounded by the recent change from a list to
a set.

Create one sorted list instead of recreating it each time.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit de18824996841c3f35f54ff5ad12f94f6dc20d88)
---
 lib/bb/runqueue.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
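The hoisting pattern is simple but worth making concrete: sorting inside the loop costs O(n log n) per iteration, sorting once amortizes it. A sketch with a call counter standing in for the cost (names are illustrative):

```python
# Re-sorting an unchanging set inside a loop vs sorting it once up front.
sorts = 0

def sorted_counted(iterable):
    global sorts
    sorts += 1
    return sorted(iterable)

sq_tasks = {'t3', 't1', 't2'}

# Before: re-sorted on every runqueue iteration
for _ in range(5):
    _ = sorted_counted(sq_tasks)
assert sorts == 5

# After: sorted once, the result reused each iteration
sorts = 0
sq_sorted = sorted_counted(sq_tasks)
for _ in range(5):
    _ = sq_sorted
assert sorts == 1 and sq_sorted == ['t1', 't2', 't3']
```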
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

Move the calls to difference_update out a code level which improves efficiency
significantly.

Also further combine the outer loop for efficiency too.

These two changes remove a bottleneck from the performance charts.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit e28ec69356f1797de3e4e3fca0fef710bc4564de)
---
 lib/bb/runqueue.py | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 105d1f0748edde7753a4063e6fdc758ffc8a8a9e)
---
 lib/bb/runqueue.py | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

We were seeing this filtering run thousands of times with hashequiv; do
the filtering where it makes more sense and make the result persist.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 2cfeb9998a8ad5b1dcda0bb4e192c5e4306dab17)
---
 lib/bb/runqueue.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

This doesn't save much time but does make the profile counts for the
function more accurate which is in itself useful.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit d446fa89d206fbc6d098215163c968ea5a8cf4a9)
---
 lib/bb/runqueue.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Armin Kuster Dec. 23, 2019, 4:50 a.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

We have bb.debug(2, xxx) messages in cooker which are useful for debugging
but have really bad effects on performance: 640,000 calls on recent profile
graphs, taking tens of seconds.

Rather than commenting out debug calls which can be useful for debugging,
don't create events for debug log messages from cooker which would never be
seen. We already stop the messages hitting the IPC, but this avoids the
overhead of creating the log messages too, which has been shown to be
significant on the profiles. This allows the code to perform whilst allowing
debug messages to be available when wanted/enabled.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit f04cd931091fb0508badf3e002d70a6952700495)
---
 lib/bb/__init__.py | 5 +++++
 1 file changed, 5 insertions(+)
Peter Kjellerstedt Dec. 23, 2019, 8:25 a.m.
> -----Original Message-----
> From: bitbake-devel-bounces@lists.openembedded.org <bitbake-devel-
> bounces@lists.openembedded.org> On Behalf Of Armin Kuster
> Sent: den 23 december 2019 05:50
> To: bitbake-devel@lists.openembedded.org
> Subject: [bitbake-devel] [1.44 02/18] runqueue/siggen: Allow handling of
> equivalent hashes
> 
> From: Richard Purdie <richard.purdie@linuxfoundation.org>
> 
> Based on the hashserv's new ability to accept hash mappings, update
> runqueue
> to use this through a helper function in siggen.
> 
> This addresses problems with meta-extsdk-toolchain and its dependency on
> gdb-cross which caused errors when building eSDK. See the previous commit
> for more details.
> 
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> (cherry picked from commit 39098b4ba2133f4d9229a0aa4fcf4c3e1291286a)
> ---
>  lib/bb/runqueue.py | 31 +++++++++++++++++++------------
>  lib/bb/siggen.py   | 26 ++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
> index bd7f03f..a869ba5 100644
> --- a/lib/bb/runqueue.py
> +++ b/lib/bb/runqueue.py
> @@ -2283,12 +2283,26 @@ class RunQueueExecute:
>                          for dep in self.rqdata.runtaskentries[tid].depends:
>                              procdep.append(dep)
>                          orighash = self.rqdata.runtaskentries[tid].hash
> -                        self.rqdata.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(tid, procdep, self.rqdata.dataCaches[mc_from_tid(tid)])
> +                        newhash = bb.parse.siggen.get_taskhash(tid, procdep, self.rqdata.dataCaches[mc_from_tid(tid)])
>                          origuni = self.rqdata.runtaskentries[tid].unihash
> -                        self.rqdata.runtaskentries[tid].unihash = bb.parse.siggen.get_unihash(tid)
> -                        logger.debug(1, "Task %s hash changes: %s->%s %s->%s" % (tid, orighash, self.rqdata.runtaskentries[tid].hash, origuni, self.rqdata.runtaskentries[tid].unihash))
> +                        newuni = bb.parse.siggen.get_unihash(tid)
> +                        # FIXME, need to check it can come from sstate at all for determinism?
> +                        remapped = False
> +                        if newuni == origuni:
> +                            # Nothing to do, we match, skip code below
> +                            remapped = True
> +                        elif tid in self.scenequeue_covered or tid in self.sq_live:
> +                            # Already ran this setscene task or it running. Report the new taskhash
> +                            remapped = bb.parse.siggen.report_unihash_equiv(tid, newhash, origuni, newuni, self.rqdata.dataCaches)
> +                            logger.info("Already covered setscene for %s so ignoring rehash (remap)" % (tid))

Do we really want this level of logging? I have not commented on it for 
master (yet), as I expect it is invaluable for RP & co that are currently 
working hard on the hash server, but for me as common user of bitbake, 
this is just noise (and a lot of it). I have a local patch that changes 
this to logger.debug(1, ...) together with another logger.info("Task %s 
unihash changed to %s" ...) a few lines earlier to get the cooker UI 
reasonably back to usable. Also see below.

> +
> +                        if not remapped:
> +                            logger.debug(1, "Task %s hash changes: %s->%s %s->%s" % (tid, orighash, newhash, origuni, newuni))
> +                            self.rqdata.runtaskentries[tid].hash = newhash
> +                            self.rqdata.runtaskentries[tid].unihash = newuni
> +                            changed.add(tid)
> +
>                          next |= self.rqdata.runtaskentries[tid].revdeps
> -                        changed.add(tid)
>                          total.remove(tid)
>                          next.intersection_update(total)
> 
> @@ -2307,18 +2321,11 @@ class RunQueueExecute:
>                  self.pending_migrations.add(tid)
> 
>          for tid in self.pending_migrations.copy():
> -            if tid in self.runq_running:
> +            if tid in self.runq_running or tid in self.sq_live:
>                  # Too late, task already running, not much we can do now
>                  self.pending_migrations.remove(tid)
>                  continue
> 
> -            if tid in self.scenequeue_covered or tid in self.sq_live:
> -                # Already ran this setscene task or it running
> -                # Potentially risky, should we report this hash as a match?
> -                logger.info("Already covered setscene for %s so ignoring rehash" % (tid))
> -                self.pending_migrations.remove(tid)
> -                continue
> -
>              valid = True
>              # Check no tasks this covers are running
>              for dep in self.sqdata.sq_covered_tasks[tid]:
> diff --git a/lib/bb/siggen.py b/lib/bb/siggen.py
> index e19812b..edf1010 100644
> --- a/lib/bb/siggen.py
> +++ b/lib/bb/siggen.py
> @@ -525,6 +525,32 @@ class SignatureGeneratorUniHashMixIn(object):
>                  except OSError:
>                      pass
> 
> +    def report_unihash_equiv(self, tid, taskhash, wanted_unihash, current_unihash, datacaches):
> +        try:
> +            extra_data = {}
> +            data = self.client().report_unihash_equiv(taskhash, self.method, wanted_unihash, extra_data)
> +            bb.note('Reported task %s as unihash %s to %s (%s)' % (tid, wanted_unihash, self.server, str(data)))

This one and ...

> +
> +            if data is None:
> +                bb.warn("Server unable to handle unihash report")
> +                return False
> +
> +            finalunihash = data['unihash']
> +
> +            if finalunihash == current_unihash:
> +                bb.note('Task %s unihash %s unchanged by server' % (tid, finalunihash))
> +            elif finalunihash == wanted_unihash:
> +                bb.note('Task %s unihash changed %s -> %s as wanted' % (tid, current_unihash, finalunihash))
> +                self.set_unihash(tid, finalunihash)
> +                return True
> +            else:
> +                # TODO: What to do here?
> +                bb.note('Task %s unihash reported as unwanted hash %s' % (tid, finalunihash))

... and these also seem excessive. I have changed them locally to 
bb.debug(1, ...) too.

> +        except hashserv.client.HashConnectionError as e:
> +            bb.warn('Error contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
> +
> +        return False
> 
>  #
>  # Dummy class used for bitbake-selftest
> --
> 2.7.4

//Peter
Armin Kuster Dec. 23, 2019, 4:18 p.m.
On 12/23/19 12:25 AM, Peter Kjellerstedt wrote:
>> -----Original Message-----
>> From: bitbake-devel-bounces@lists.openembedded.org <bitbake-devel-
>> bounces@lists.openembedded.org> On Behalf Of Armin Kuster
>> Sent: den 23 december 2019 05:50
>> To: bitbake-devel@lists.openembedded.org
>> Subject: [bitbake-devel] [1.44 02/18] runqueue/siggen: Allow handling of
>> equivalent hashes
>>
>> From: Richard Purdie <richard.purdie@linuxfoundation.org>
>>
>> Based on the hashserv's new ability to accept hash mappings, update
>> runqueue
>> to use this through a helper function in siggen.
>>
>> This addresses problems with meta-extsdk-toolchain and its dependency on
>> gdb-cross which caused errors when building eSDK. See the previous commit
>> for more details.
>>
>> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>> (cherry picked from commit 39098b4ba2133f4d9229a0aa4fcf4c3e1291286a)
>> ---
>>  lib/bb/runqueue.py | 31 +++++++++++++++++++------------
>>  lib/bb/siggen.py   | 26 ++++++++++++++++++++++++++
>>  2 files changed, 45 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
>> index bd7f03f..a869ba5 100644
>> --- a/lib/bb/runqueue.py
>> +++ b/lib/bb/runqueue.py
>> @@ -2283,12 +2283,26 @@ class RunQueueExecute:
>>                          for dep in self.rqdata.runtaskentries[tid].depends:
>>                              procdep.append(dep)
>>                          orighash = self.rqdata.runtaskentries[tid].hash
>> -                        self.rqdata.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(tid, procdep, self.rqdata.dataCaches[mc_from_tid(tid)])
>> +                        newhash = bb.parse.siggen.get_taskhash(tid, procdep, self.rqdata.dataCaches[mc_from_tid(tid)])
>>                          origuni = self.rqdata.runtaskentries[tid].unihash
>> -                        self.rqdata.runtaskentries[tid].unihash = bb.parse.siggen.get_unihash(tid)
>> -                        logger.debug(1, "Task %s hash changes: %s->%s %s->%s" % (tid, orighash, self.rqdata.runtaskentries[tid].hash, origuni, self.rqdata.runtaskentries[tid].unihash))
>> +                        newuni = bb.parse.siggen.get_unihash(tid)
>> +                        # FIXME, need to check it can come from sstate at all for determinism?
>> +                        remapped = False
>> +                        if newuni == origuni:
>> +                            # Nothing to do, we match, skip code below
>> +                            remapped = True
>> +                        elif tid in self.scenequeue_covered or tid in self.sq_live:
>> +                            # Already ran this setscene task or it running. Report the new taskhash
>> +                            remapped = bb.parse.siggen.report_unihash_equiv(tid, newhash, origuni, newuni, self.rqdata.dataCaches)
>> +                            logger.info("Already covered setscene for %s so ignoring rehash (remap)" % (tid))
> Do we really want this level of logging? 
I appreciate you taking the time out to review the patch series.

For keeping backporting easier while master continues to address hash
equivalence, it seemed responsible to keep patches as they are.

Also, for my own sanity, I needed to start collecting fixes for hq as I
know we are not finished sorting that out and I feel like there will be
more to come.
> I have not commented on it for 
> master (yet), as I expect it is invaluable for RP & co that are currently 
> working hard on the hash server, but for me as common user of bitbake, 
> this is just noise (and a lot of it). I have a local patch that changes 
> this to logger.debug(1, ...) together with another logger.info("Task %s 
> unihash changed to %s" ...) a few lines earlier to get the cooker UI 
> reasonably back to usable. Also see below.

I will wait for master for any logging changes made there before
diverging 1.44. The next zeus update is currently scheduled after
M2 ( Jan/2021), and if hq is still showing signs of being problematic, I
suspect this entire series may wait for more stability from master.

regards,
Armin

>
>> +
>> +                        if not remapped:
>> +                            logger.debug(1, "Task %s hash changes: %s->%s %s->%s" % (tid, orighash, newhash, origuni, newuni))
>> +                            self.rqdata.runtaskentries[tid].hash = newhash
>> +                            self.rqdata.runtaskentries[tid].unihash = newuni
>> +                            changed.add(tid)
>> +
>>                          next |= self.rqdata.runtaskentries[tid].revdeps
>> -                        changed.add(tid)
>>                          total.remove(tid)
>>                          next.intersection_update(total)
>>
>> @@ -2307,18 +2321,11 @@ class RunQueueExecute:
>>                  self.pending_migrations.add(tid)
>>
>>          for tid in self.pending_migrations.copy():
>> -            if tid in self.runq_running:
>> +            if tid in self.runq_running or tid in self.sq_live:
>>                  # Too late, task already running, not much we can do now
>>                  self.pending_migrations.remove(tid)
>>                  continue
>>
>> -            if tid in self.scenequeue_covered or tid in self.sq_live:
>> -                # Already ran this setscene task or it running
>> -                # Potentially risky, should we report this hash as a match?
>> -                logger.info("Already covered setscene for %s so ignoring rehash" % (tid))
>> -                self.pending_migrations.remove(tid)
>> -                continue
>> -
>>              valid = True
>>              # Check no tasks this covers are running
>>              for dep in self.sqdata.sq_covered_tasks[tid]:
>> diff --git a/lib/bb/siggen.py b/lib/bb/siggen.py
>> index e19812b..edf1010 100644
>> --- a/lib/bb/siggen.py
>> +++ b/lib/bb/siggen.py
>> @@ -525,6 +525,32 @@ class SignatureGeneratorUniHashMixIn(object):
>>                  except OSError:
>>                      pass
>>
>> +    def report_unihash_equiv(self, tid, taskhash, wanted_unihash, current_unihash, datacaches):
>> +        try:
>> +            extra_data = {}
>> +            data = self.client().report_unihash_equiv(taskhash, self.method, wanted_unihash, extra_data)
>> +            bb.note('Reported task %s as unihash %s to %s (%s)' % (tid, wanted_unihash, self.server, str(data)))
> This one and ...
>
>> +
>> +            if data is None:
>> +                bb.warn("Server unable to handle unihash report")
>> +                return False
>> +
>> +            finalunihash = data['unihash']
>> +
>> +            if finalunihash == current_unihash:
>> +                bb.note('Task %s unihash %s unchanged by server' % (tid, finalunihash))
>> +            elif finalunihash == wanted_unihash:
>> +                bb.note('Task %s unihash changed %s -> %s as wanted' % (tid, current_unihash, finalunihash))
>> +                self.set_unihash(tid, finalunihash)
>> +                return True
>> +            else:
>> +                # TODO: What to do here?
>> +                bb.note('Task %s unihash reported as unwanted hash %s' % (tid, finalunihash))
> ... and these also seem excessive. I have changed them locally to 
> bb.debug(1, ...) too.
>
>> +        except hashserv.client.HashConnectionError as e:
>> +            bb.warn('Error contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
>> +
>> +        return False
>>
>>  #
>>  # Dummy class used for bitbake-selftest
>> --
>> 2.7.4
> //Peter
>
Richard Purdie Dec. 28, 2019, 12:24 p.m.
On Mon, 2019-12-23 at 08:25 +0000, Peter Kjellerstedt wrote:
> > -----Original Message-----
> > From: bitbake-devel-bounces@lists.openembedded.org <bitbake-devel-
> > bounces@lists.openembedded.org> On Behalf Of Armin Kuster
> > Sent: den 23 december 2019 05:50
> > To: bitbake-devel@lists.openembedded.org
> > Subject: [bitbake-devel] [1.44 02/18] runqueue/siggen: Allow handling of
> > equivalent hashes
> > 
> > From: Richard Purdie <richard.purdie@linuxfoundation.org>
> > 
> > Based on the hashserv's new ability to accept hash mappings, update
> > runqueue
> > to use this through a helper function in siggen.
> > 
> > This addresses problems with meta-extsdk-toolchain and its dependency on
> > gdb-cross which caused errors when building eSDK. See the previous commit
> > for more details.
> > 
> > Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> > (cherry picked from commit 39098b4ba2133f4d9229a0aa4fcf4c3e1291286a)
> > ---
> >  lib/bb/runqueue.py | 31 +++++++++++++++++++------------
> >  lib/bb/siggen.py   | 26 ++++++++++++++++++++++++++
> >  2 files changed, 45 insertions(+), 12 deletions(-)
> > 
> > diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
> > index bd7f03f..a869ba5 100644
> > --- a/lib/bb/runqueue.py
> > +++ b/lib/bb/runqueue.py
> > @@ -2283,12 +2283,26 @@ class RunQueueExecute:
> >                          for dep in self.rqdata.runtaskentries[tid].depends:
> >                              procdep.append(dep)
> >                          orighash = self.rqdata.runtaskentries[tid].hash
> > -                        self.rqdata.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(tid, procdep, self.rqdata.dataCaches[mc_from_tid(tid)])
> > +                        newhash = bb.parse.siggen.get_taskhash(tid, procdep, self.rqdata.dataCaches[mc_from_tid(tid)])
> >                          origuni = self.rqdata.runtaskentries[tid].unihash
> > -                        self.rqdata.runtaskentries[tid].unihash = bb.parse.siggen.get_unihash(tid)
> > -                        logger.debug(1, "Task %s hash changes: %s->%s %s->%s" % (tid, orighash, self.rqdata.runtaskentries[tid].hash, origuni, self.rqdata.runtaskentries[tid].unihash))
> > +                        newuni = bb.parse.siggen.get_unihash(tid)
> > +                        # FIXME, need to check it can come from sstate at all for determinism?
> > +                        remapped = False
> > +                        if newuni == origuni:
> > +                            # Nothing to do, we match, skip code below
> > +                            remapped = True
> > +                        elif tid in self.scenequeue_covered or tid in self.sq_live:
> > +                            # Already ran this setscene task or it running. Report the new taskhash
> > +                            remapped = bb.parse.siggen.report_unihash_equiv(tid, newhash, origuni, newuni, self.rqdata.dataCaches)
> > +                            logger.info("Already covered setscene for %s so ignoring rehash (remap)" % (tid))
> 
> Do we really want this level of logging? I have not commented on it for 
> master (yet), as I expect it is invaluable for RP & co that are currently 
> working hard on the hash server, but for me as common user of bitbake, 
> this is just noise (and a lot of it). I have a local patch that changes 
> this to logger.debug(1, ...) together with another logger.info("Task %s 
> unihash changed to %s" ...) a few lines earlier to get the cooker UI 
> reasonably back to usable. Also see below.

This is a tough one. I appreciate the console is getting noisy,
equally, there are some significant problems with hashequiv and without
this logging we're even more in the dark about what went wrong :(.

We're struggling to figure out what is wrong already and I'd much
rather have logs I can look at and have some idea of what happened
rather than shrugging and saying "dunno".

Whilst right now I'm using this to debug the issues the autobuilder is
showing, if other users run into issues we'll need the same logs. If
someone sends me a log with these entries missing, I will close the bug
as unfixable, I really don't have any other choice.

Fundamentally, we need to sort this out differently so that bitbake
does write a decent log somewhere about what happened, even if it's not
on the console.

We also need to ensure it *is* on the console on the autobuilder since
those are the logs which are preserved.

I do agree with Armin that we should keep master and zeus with the same
behaviour as to do otherwise will just create more of a mess.

I will likely reduce the noise eventually but only when we're not
hitting issues.

Cheers,

Richard