diff mbox series

[kirkstone,2.0,4/8] siggen: Fix insufficent entropy in sigtask file names

Message ID 63bb5591e833de0e7b552963ad9bc4b39e56fda9.1664898736.git.steve@sakoman.com
State Accepted, archived
Commit 63bb5591e833de0e7b552963ad9bc4b39e56fda9
Headers show
Series [kirkstone,2.0,1/8] runqueue: Ensure deferred tasks are sorted by multiconfig | expand

Commit Message

Steve Sakoman Oct. 4, 2022, 3:54 p.m. UTC
From: Joshua Watt <JPEWhacker@gmail.com>

Signature generation uses mkstemp() to get a file descriptor to a unique
file and then write the signature into it. However, the unique file name
generation in glibc is based on the system timestamp, which means that
with highly parallel builds it is more likely than one might expect
expected that a conflict will occur between two different builder nodes.
When operating over NFS (such as a shared sstate cache), this can cause
race conditions and rare failures (particularly with NFS servers that
may not correctly implement O_EXCL).

The signature generation code is particularly susceptible to races since
a single "sigtask." prefix used for all signatures from all tasks, which
makes collision even more likely.

To work around this, add an internal implementation of mkstemp() that
adds additional truly random entropy to the file name to eliminate
conflicts.

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
(cherry picked from commit 97955f3c1c738aa4b4478a6ec10a08094ffc689d)
Signed-off-by: Steve Sakoman <steve@sakoman.com>
---
 lib/bb/siggen.py |  2 +-
 lib/bb/utils.py  | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/lib/bb/siggen.py b/lib/bb/siggen.py
index 9fa568f6..bd6fc204 100644
--- a/lib/bb/siggen.py
+++ b/lib/bb/siggen.py
@@ -419,7 +419,7 @@  class SignatureGeneratorBasic(SignatureGenerator):
                 bb.error("Taskhash mismatch %s versus %s for %s" % (computed_taskhash, self.taskhash[tid], tid))
                 sigfile = sigfile.replace(self.taskhash[tid], computed_taskhash)
 
-        fd, tmpfile = tempfile.mkstemp(dir=os.path.dirname(sigfile), prefix="sigtask.")
+        fd, tmpfile = bb.utils.mkstemp(dir=os.path.dirname(sigfile), prefix="sigtask.")
         try:
             with bb.compress.zstd.open(fd, "wt", encoding="utf-8", num_threads=1) as f:
                 json.dump(data, f, sort_keys=True, separators=(",", ":"), cls=SetEncoder)
diff --git a/lib/bb/utils.py b/lib/bb/utils.py
index 95b3c898..92d44c52 100644
--- a/lib/bb/utils.py
+++ b/lib/bb/utils.py
@@ -28,6 +28,8 @@  import signal
 import collections
 import copy
 import ctypes
+import random
+import tempfile
 from subprocess import getstatusoutput
 from contextlib import contextmanager
 from ctypes import cdll
@@ -1756,3 +1758,22 @@  def is_local_uid(uid=''):
             if str(uid) == line_split[2]:
                 return True
     return False
+
+def mkstemp(suffix=None, prefix=None, dir=None, text=False):
+    """
+    Generates a unique filename, independent of time.
+
+    mkstemp() in glibc (at least) generates unique file names based on the
+    current system time. When combined with highly parallel builds, and
+    operating over NFS (e.g. shared sstate/downloads) this can result in
+    conflicts and race conditions.
+
+    This function adds additional entropy to the file name so that a collision
+    is independent of time and thus extremely unlikely.
+    """
+    entropy = "".join(random.choices("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890", k=20))
+    if prefix:
+        prefix = prefix + entropy
+    else:
+        prefix = tempfile.gettempprefix() + entropy
+    return tempfile.mkstemp(suffix=suffix, prefix=prefix, dir=dir, text=text)