[bitbake-devel,dunfell,1.46,7/8] server/process: Fix a rare lockfile race

Submitted by Steve Sakoman on July 15, 2020, 2:26 p.m. | Patch ID: 174457


Message ID de919782f488a83b80d7c40896bf5b2596f1f65f.1594822992.git.steve@sakoman.com
State New

Commit Message

Steve Sakoman July 15, 2020, 2:26 p.m.
From: Richard Purdie <richard.purdie@linuxfoundation.org>

We're seeing rare races on the autobuilder, as if two server processes
hold the lockfile at the same time. We need to be extremely careful that
this does not happen.

I think there is a potential race in this shutdown code since we delete
the lockfile, then call unlockfile() which also tries to delete it.

This means we may remove a lock file now held by another process if we're
unlucky. Since unlockfile removes the lockfile when it can, just rely on
that and remove any possible race window.
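
The window described above can be sketched as a deterministic, single-process
simulation. This is only an illustration under stated assumptions: take_lock(),
release_lock() and simulate() are hypothetical helpers written for this note,
release_lock() is a simplified stand-in for unlockfile() (not the real
bb.utils code), and it relies on flock() locks conflicting between separate
opens of the same file on Linux.

```python
import fcntl
import os
import tempfile


def take_lock(path):
    """Hypothetical helper: create/open path and try a non-blocking exclusive flock."""
    f = open(path, "a+")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None  # somebody else holds a lock on this inode
    return f


def release_lock(f):
    """Simplified stand-in for unlockfile(): unlink the path, then drop the lock."""
    try:
        os.unlink(f.name)
    except FileNotFoundError:
        pass
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    f.close()


def simulate(remove_before_unlock):
    """Return how many 'servers' end up holding the lock at the same time."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bitbake.lock")
        a = take_lock(path)      # server A holds the lock
        if remove_before_unlock:
            os.unlink(path)      # old shutdown code: explicit remove first
        # Server B arrives. With the path already gone it creates a new
        # file (new inode), so its lock succeeds even though A is still locked.
        b = take_lock(path)
        release_lock(a)          # A unlinks the path again (B's file in the old case)
        if b is None:
            b = take_lock(path)  # B retries once A has really gone away
        c = take_lock(path)      # server C arrives last
        holders = [x for x in (b, c) if x is not None]
        count = len(holders)
        for h in holders:
            h.close()
        return count


if __name__ == "__main__":
    print("remove, then unlock:", simulate(True))
    print("unlock only:", simulate(False))
```

In this sketch the second unlink is what hurts: it removes a path that by then
belongs to B, so C can create and lock a fresh file while B still believes it
is the only holder. Dropping the explicit remove leaves a single unlink, done
while the lock is still held.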

An example cooker-daemonlog:

--- Starting bitbake server pid 2266 at 2020-07-11 06:17:18.210777 ---
Started bitbake server pid 2266
Entering server connection loop
Accepting [<socket.socket fd=20, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, laddr=bitbake.sock>] ([])
Processing Client
Connecting Client
Running command ['setFeatures', [2]]
Running command ['updateConfig', XXX]
Running command ['getVariable', 'BBINCLUDELOGS']
Running command ['getVariable', 'BBINCLUDELOGS_LINES']
Running command ['getSetVariable', 'BB_CONSOLELOG']
Running command ['getSetVariable', 'BB_LOGCONFIG']
Running command ['getUIHandlerNum']
Running command ['setEventMask', XXXX]
Running command ['getVariable', 'BB_DEFAULT_TASK']
Running command ['setConfig', 'cmd', 'build']
Running command ['getVariable', 'BBTARGETS']
Running command ['parseFiles']
--- Starting bitbake server pid 8252 at 2020-07-11 06:17:28.584514 ---
Started bitbake server pid 8252
--- Starting bitbake server pid 13278 at 2020-07-11 06:17:31.330635 ---
Started bitbake server pid 13278
Running command ['dataStoreConnectorCmd', 0, 'getVar', ('BBMULTICONFIG',), {}]
Running command ['getRecipes', '']
Running command ['clientComplete']
Processing Client
Disconnecting Client
No timeout, exiting.

where it looks like two server processes are running at once, which should
not happen. In that build there was a process left sitting in memory with its
bitbake.sock file missing but still holding the lock (not sure why it wouldn't
time out/exit).

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>

(cherry picked from commit e1a7c1821483031b224a1570bfe834da755219cc)

Signed-off-by: Steve Sakoman <steve@sakoman.com>
 lib/bb/server/process.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/lib/bb/server/process.py b/lib/bb/server/process.py
index 83385baf..475931a7 100644
--- a/lib/bb/server/process.py
+++ b/lib/bb/server/process.py
@@ -243,7 +243,7 @@  class ProcessServer(multiprocessing.Process):
                 lock = bb.utils.lockfile(lockfile, shared=False, retry=False, block=True)
                 if lock:
                     # We hold the lock so we can remove the file (hide stale pid data)
-                    bb.utils.remove(lockfile)
+                    # via unlockfile.