From patchwork Fri Jan 7 19:08:40 2022
X-Patchwork-Submitter: Michael Opdenacker
X-Patchwork-Id: 2147
From: Michael Opdenacker
To: docs@lists.yoctoproject.org
Cc: Michael Opdenacker
Subject: [PATCH V3] overview-manual: document hash equivalence
Date: Fri, 7 Jan 2022 20:08:40 +0100
Message-Id: <20220107190840.784216-1-michael.opdenacker@bootlin.com>
In-Reply-To: <16C812800894D4AC.11018@lists.yoctoproject.org>
References: <16C812800894D4AC.11018@lists.yoctoproject.org>
X-Groupsio-URL: https://lists.yoctoproject.org/g/docs/message/2360

Signed-off-by: Michael Opdenacker
---
 documentation/overview-manual/concepts.rst | 126 +++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/documentation/overview-manual/concepts.rst b/documentation/overview-manual/concepts.rst
index 6f8a3def69..781ba1b070 100644
--- a/documentation/overview-manual/concepts.rst
+++ b/documentation/overview-manual/concepts.rst
@@ -1938,6 +1938,132 @@
 another reason why a task-based approach is preferred over a recipe-based
 approach, which would have to install the output from every task.
 
+Hash Equivalence
+----------------
+
+The above section explained how BitBake skips the execution of tasks
+whose output can already be found in the Shared State cache.
+
+During a build, the output of a task often remains unchanged despite
+changes in the task's input values. An example is a whitespace-only
+change in some input C code. In project terms, this is what we define
+as "equivalence".
+
+To keep track of such equivalence, BitBake has to manage three hashes
+for each task:
+
+- The *task hash* explained earlier: computed from the recipe metadata,
+  the task code and the task hash values from its dependencies.
+  When any of these inputs changes, the task hash changes as well,
+  causing the task to re-execute. The task hashes of tasks depending on this
+  task are therefore modified too, causing the whole dependency
+  chain to re-execute.
+
+- The *output hash*, a new hash computed from the output of Shared State tasks,
+  that is, tasks that save their resulting output to a Shared State tarball.
+  The mapping between the task hash and its output hash is reported
+  to a new *Hash Equivalence* server. This mapping is stored in a database
+  by the server for future reference.
+
+- The *unihash*, a new hash, initially set to the task hash for the task.
+  It is used to track the *unicity* of task output; the following
+  paragraphs explain how its value is maintained.
+
+When Hash Equivalence is enabled, BitBake computes the task hash
+for each task by using the unihash of its dependencies, instead
+of their task hash.
+
+Now, imagine that a Shared State task is modified because of a change in
+its code or metadata, or because of a change in its dependencies.
+Since this modifies its task hash, this task will need re-executing.
+Its output hash will therefore be computed again.
+
+Then, the new mapping between the new task hash and its output hash
+will be reported to the Hash Equivalence server. The server will
+let BitBake know whether this output hash is the same as a previously
+reported output hash, for a different task hash.
+
+If the output hash is already known, BitBake will update the task's
+unihash to match the original task hash that generated that output.
+Thanks to this, the dependent tasks will keep a previously recorded
+task hash, and BitBake will be able to retrieve their output from
+the Shared State cache, instead of re-executing them. Similarly, the
+output of further downstream tasks can also be retrieved from Shared
+State.
+
+If the output hash is unknown, a new entry will be created on the Hash
+Equivalence server, matching the task hash to that output.
+The dependent tasks, still having a new task hash because of the
+change, will need to re-execute as expected. The change propagates
+to the dependent tasks.
+
+To summarize, when Hash Equivalence is enabled, a change in one of the
+tasks in BitBake's run queue doesn't have to propagate to all the
+downstream tasks that depend on the output of this task, which would
+cause a full rebuild of these tasks and, in turn, of their own dependents.
+Instead, when the output of this task remains identical to previously
+recorded output, BitBake can safely retrieve all the downstream
+task output from the Shared State cache.
+
+This applies to multiple scenarios:
+
+- A "trivial" change to a recipe that doesn't impact its generated output,
+  such as whitespace changes, changes to unused code paths or
+  to the ordering of variables.
+
+- Shared library updates, for example to fix a security vulnerability.
+  Of course, the programs using such a library should be rebuilt, but
+  their new binaries should remain identical.
+  The corresponding tasks get a different task hash because the hash of
+  their library dependency changed, but since their output is identical,
+  Hash Equivalence stops the propagation down the dependency chain.
+
+- Native tool updates. Though the dependent tasks should be rebuilt,
+  it's likely that they will generate the same output and be marked
+  as equivalent.
+
+This mechanism is enabled by default in Poky, and is controlled by three
+variables:
+
+- :term:`bitbake:BB_HASHSERVE`, specifying a local or remote Hash
+  Equivalence server to use.
+
+- :term:`BB_HASHSERVE_UPSTREAM`, used when ``BB_HASHSERVE = "auto"``,
+  to connect the local server to an upstream one.
+
+- :term:`bitbake:BB_SIGNATURE_HANDLER`, which must be set to ``OEEquivHash``.
+
+Therefore, the default configuration in Poky corresponds to the
+following settings::
+
+   BB_HASHSERVE = "auto"
+   BB_SIGNATURE_HANDLER = "OEEquivHash"
+
+Rather than starting a local server, another possibility is to rely
+on a Hash Equivalence server on a network, by setting::
+
+   BB_HASHSERVE = ":"
+
+.. note::
+
+   The shared Hash Equivalence server needs to be maintained together with the
+   Shared State cache. Otherwise, the server could report Shared State hashes
+   that only exist on specific clients.
+
+   We therefore recommend that one Hash Equivalence server be set up to
+   correspond with a given Shared State cache, and that this server be
+   started in *read-only mode*, so that it doesn't store equivalences for
+   Shared State caches that are local to clients.
+
+   See the :term:`BB_HASHSERVE` reference for details about starting
+   a Hash Equivalence server.
+
+See the `video `__
+of Joshua Watt's `Hash Equivalence and Reproducible Builds
+`__
+presentation at ELC 2020 for a very concise introduction to the
+Hash Equivalence implementation in the Yocto Project.
+
 Automatically Added Runtime Dependencies
 ========================================
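
The unihash bookkeeping documented in the Hash Equivalence section of this
patch can be illustrated with a small toy model. This is only a sketch under
simplifying assumptions, not BitBake's implementation: ToyHashEquivServer,
sha256_hex and compute_task_hash are hypothetical names, the server is an
in-memory dictionary, and a plain metadata string stands in for the recipe
metadata and task code.

```python
import hashlib
from typing import Dict, List


def sha256_hex(*parts: str) -> str:
    """Hash an ordered sequence of strings into one hex digest."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator, so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


def compute_task_hash(metadata: str, dep_unihashes: List[str]) -> str:
    """Toy task hash: own metadata plus the *unihashes* of the dependencies."""
    return sha256_hex(metadata, *sorted(dep_unihashes))


class ToyHashEquivServer:
    """Hypothetical in-memory stand-in for a Hash Equivalence server."""

    def __init__(self) -> None:
        # Maps an output hash to the first task hash that produced it.
        self._unihash_by_output: Dict[str, str] = {}

    def report(self, task_hash: str, output_hash: str) -> str:
        """Record a (task hash, output hash) pair and return the unihash."""
        # Unknown output: the reporting task hash becomes the unihash.
        # Known output: the previously recorded task hash is returned.
        return self._unihash_by_output.setdefault(output_hash, task_hash)


server = ToyHashEquivServer()

# First build: a library task and an application task depending on it.
lib_hash_v1 = compute_task_hash("lib recipe", [])
lib_uni_v1 = server.report(lib_hash_v1, sha256_hex("lib output"))
app_hash_v1 = compute_task_hash("app recipe", [lib_uni_v1])

# Whitespace-only change: new task hash, but the output is identical.
lib_hash_v2 = compute_task_hash("lib  recipe", [])
lib_uni_v2 = server.report(lib_hash_v2, sha256_hex("lib output"))
app_hash_v2 = compute_task_hash("app recipe", [lib_uni_v2])

# Real change: the output differs, so a new unihash is recorded.
lib_hash_v3 = compute_task_hash("lib recipe, patched", [])
lib_uni_v3 = server.report(lib_hash_v3, sha256_hex("patched lib output"))
app_hash_v3 = compute_task_hash("app recipe", [lib_uni_v3])

print("library task hash changed:", lib_hash_v2 != lib_hash_v1)
print("library unihash preserved:", lib_uni_v2 == lib_uni_v1)
print("application task hash preserved:", app_hash_v2 == app_hash_v1)
print("real change propagates:", app_hash_v3 != app_hash_v1)
```

In this model, the whitespace-only change gives the library task a new task
hash but leaves its unihash untouched, so the application's task hash is the
same as before and its output could be taken from the Shared State cache. The
patched library produces different output, so its unihash changes and the
application's task hash changes with it.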