Patchwork [bitbake-devel,1/2] bitbake: ensure -f causes dependent tasks to be re-run

login
register
mail settings
Submitter Paul Eggleton
Date June 18, 2012, 3:45 p.m.
Message ID <dcf1fb0acb4bd3424cbb5f48cd62f4f994896c20.1340034255.git.paul.eggleton@linux.intel.com>
Download mbox | patch
Permalink /patch/30057/
State New
Headers show

Comments

Paul Eggleton - June 18, 2012, 3:45 p.m.
If -f is specified, force dependent tasks to be re-run next time. This
works by changing the force behaviour so that instead of deleting the
task's stamp, we write a "taint" file into the stamps directory, which
will alter the taskhash randomly and thus trigger the task to re-run
next time we evaluate whether or not that should be done as well as
influencing the taskhashes of any dependent tasks so that they are
similarly re-triggered. As a bonus because we write this file as
<stamp file name>.taskname.taint, the existing code which deletes the
stamp files in OE's do_clean will already handle removing it.

This means you can now do the following:

bitbake somepackage
[ change the source code in the package's WORKDIR ]
bitbake -c compile -f somepackage
bitbake somepackage

and the result will be that all of the tasks that depend on do_compile
(do_install, do_package, etc.) will be re-run in the last step.

Note that to operate in the manner described above you need full hashing
enabled (i.e. BB_SIGNATURE_HANDLER must be set to a signature handler
that inherits from BasicHash). If this is not the case, -f will just
delete the stamp for the specified task as it did before.

This fix is required for [YOCTO #2615] and [YOCTO #2256].

Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com>
---
 bitbake/lib/bb/build.py    |   18 ++++++++++++++++++
 bitbake/lib/bb/cooker.py   |    6 +++---
 bitbake/lib/bb/runqueue.py |   12 ++++++------
 bitbake/lib/bb/siggen.py   |   35 +++++++++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+), 9 deletions(-)
Björn Stenberg - June 19, 2012, 7:35 p.m.
Paul Eggleton wrote:
> If -f is specified, force dependent tasks to be re-run next time. This
> works by changing the force behaviour so that instead of deleting the
> task's stamp, we write a "taint" file into the stamps directory, which
> will alter the taskhash randomly and thus trigger the task to re-run

I'm concerned about calling this -f/--force. I don't think I'm alone in interpreting -f / --force as "run all commands, even if the dependencies say we don't need to". I would expect the same output as the first time, with the same sstate checksum.

Would it be reasonable to call it something like -t/--taint instead?
Paul Eggleton - June 19, 2012, 11:50 p.m.
On Tuesday 19 June 2012 21:35:39 Björn Stenberg wrote:
> Paul Eggleton wrote:
> > If -f is specified, force dependent tasks to be re-run next time. This
> > works by changing the force behaviour so that instead of deleting the
> > task's stamp, we write a "taint" file into the stamps directory, which
> > will alter the taskhash randomly and thus trigger the task to re-run
> 
> I'm concerned about calling this -f/--force. I don't think I'm alone in
> interpreting -f / --force as "run all commands, even if the dependencies
> say we don't need to".

Well, what it used to do was just cause the specified task to be run even if 
there is a stamp recording that it was already done; dependencies don't come 
into it (although perhaps you meant stamps?)

> I would expect the same output as the first time,
> with the same sstate checksum.

So my assumption is -f is most often used for the purpose of manually forcing 
a recompile after you have made modifications to the already extracted source 
code under the workdir. If that is the case, there are two consequences as I 
see it:

 * When Bitbake next checks whether you want to run dependent tasks, given 
that the output of the task almost certainly changed due to your modifications, 
you would want those dependent tasks to be re-run again also. i.e. if you've 
forced a compile, you would expect for the results to be installed and 
packaged when you next ask for the do_install / do_package tasks to run.

 * You don't really want those modifications going into the sstate package that 
effectively claims to be produced from inputs that only come from the metadata. 
Now, obviously it doesn't physically prevent you from ever getting modified 
data into an sstate package with the same signature, but it makes it less 
likely.

My question would be, are you using -f for something different or do you 
disagree with one or both of the consequences above?

Cheers,
Paul
Björn Stenberg - June 20, 2012, 7:45 a.m.
Paul Eggleton wrote:
> So my assumption is -f is most often used for the purpose of manually
> forcing a recompile after you have made modifications to the already
> extracted source code under the workdir.

I agree with that.

My concern is based on the fact that people (including myself) don't fully know all the details of how bitbake works, and tend to make assumptions based on other build systems they know, such as simple Makefiles.

I think the fact that bitbake sometime works differently means we should be extra careful about not playing into devlelopers' assumptions. The bitbake option --force sounds rather similar to make's --always-make, especially when it is described as: "force run of specified cmd, regardless of stamp status". Also see the --force parameter in standard unix utils like cp, mv, rm etc.

If we were to call it something different instead, like -t/--taint, this would avoid some assumptions about its behaviour and make it more clear that the output will be different even if the input is the same.

Sure, it's not a major issue. But I'm fairly confident that if we keep the option name but change its behaviour, I am going to have to explain more than once to developers not following this list or the commit logs why -f does not do what they think (even though one can argue it never did). I'd rather they discover up front that -f is deprecated and that they should look up a new option instead.
Björn Stenberg - June 20, 2012, 7:55 a.m.
Paul Eggleton wrote:
> So my assumption is -f is most often used for the purpose of manually
> forcing a recompile after you have made modifications to the already
> extracted source code under the workdir.

I agree with that.

My concern is based on the fact that people (including myself) don't fully know all the details of how bitbake works, and tend to make assumptions based on other build systems they know, such as simple Makefiles.

I think the fact that bitbake sometime works differently means we should be extra careful about not playing into devlelopers' assumptions. The bitbake option --force sounds rather similar to make's --always-make, especially when it is described as: "force run of specified cmd, regardless of stamp status". While a tangent, the --force parameter in standard unix utils like cp, mv, rm also matter.

If we were to call it something different instead, like -t/--taint, this would avoid some assumptions about its behaviour and make it more clear that the output will be different even if the input is the same.

Sure, it's not a major issue. But I'm fairly confident that if we keep the option name but change its behaviour, I am going to have to explain more than once to developers not following this list or the commit logs why -f does not do what they think (even though one can argue it never did). I'd rather they discover up front that -f is deprecated and that they should look up a new option instead.
Richard Purdie - June 20, 2012, 8:38 a.m.
On Wed, 2012-06-20 at 09:55 +0200, Björn Stenberg wrote:
> Paul Eggleton wrote:
> > So my assumption is -f is most often used for the purpose of manually
> > forcing a recompile after you have made modifications to the already
> > extracted source code under the workdir.
> 
> I agree with that.
> 
> My concern is based on the fact that people (including myself) don't
> fully know all the details of how bitbake works, and tend to make
> assumptions based on other build systems they know, such as simple
> Makefiles.
> 
> I think the fact that bitbake sometime works differently means we
> should be extra careful about not playing into devlelopers'
> assumptions. The bitbake option --force sounds rather similar to
> make's --always-make, especially when it is described as: "force run
> of specified cmd, regardless of stamp status". While a tangent, the
> --force parameter in standard unix utils like cp, mv, rm also matter.
> 
> If we were to call it something different instead, like -t/--taint,
> this would avoid some assumptions about its behaviour and make it more
> clear that the output will be different even if the input is the same.
> 
> Sure, it's not a major issue. But I'm fairly confident that if we keep
> the option name but change its behaviour, I am going to have to
> explain more than once to developers not following this list or the
> commit logs why -f does not do what they think (even though one can
> argue it never did). I'd rather they discover up front that -f is
> deprecated and that they should look up a new option instead.

We should be clear, its not a change in behaviour, its a bugfix for a
variety of nasty problems related to sstate. sstate needs to behave as
people would expect under a variety of circumstances and this change
only changes the interaction between sstate and the stamp files. This is
an area that is relatively new, we've found a nasty issue where sstate
files can become "corrupted" and we need to avoid that as it threatens
the integrity of the project.

I don't think renaming the option is a particularly good idea, that will
upset many user's fingers and mean we have to scrub the documents and in
itself will cause a ton of questions that need to be answered.

I would agree that the bitbake help text should be enhanced to make it
clear what this option does though. I do also think we need to raise
awareness of the change (which is why it was mentioned in yesterday's
meeting).

I appreciate we probably will get this question from time to time. We
(Yocto) are working on adding some kind of question/answer system (stack
overflow style) to the website btw, and this would make a good question
and answer on there! That would give us all a lightweight way to at
least answer the question.

Cheers,

Richard
Paul Eggleton - June 20, 2012, 8:40 a.m.
On Wednesday 20 June 2012 09:55:04 Björn Stenberg wrote:
> Paul Eggleton wrote:
> > So my assumption is -f is most often used for the purpose of manually
> > forcing a recompile after you have made modifications to the already
> > extracted source code under the workdir.
> 
> I agree with that.
> 
> My concern is based on the fact that people (including myself) don't fully
> know all the details of how bitbake works, and tend to make assumptions
> based on other build systems they know, such as simple Makefiles.
> 
> I think the fact that bitbake sometime works differently means we should be
> extra careful about not playing into devlelopers' assumptions. The bitbake
> option --force sounds rather similar to make's --always-make, especially
> when it is described as: "force run of specified cmd, regardless of stamp
> status". 

I've not used this option with make before, so I tried it with the following 
trivial Makefile:

------------- snip --------------
b:
        touch b

a: b
        touch a
------------- snip --------------

If I run "make a" it of course runs "touch b" then "touch a". Then I run "make 
--always-make b", it runs "touch b" which is what you expect. However, if I 
subsequently run "make a" it does a "touch a" again - the dependency b has 
changed, so it recognises it needs to rebuild a. This looks to me to be 
exactly the same as the new proposed behaviour of -f in bitbake. The only 
wrinkle in external behaviour is that "make --always-make a" runs both "touch 
b" and "touch a"; this would be difficult for us to replicate in bitbake and I'm 
not sure it would be particularly useful. In any case it appears to me that if 
we're moving anywhere, we're at least moving in the direction of behaving more 
like make rather than less like it, would you agree?

> If we were to call it something different instead, like -t/--taint, this
> would avoid some assumptions about its behaviour and make it more clear
> that the output will be different even if the input is the same.
> 
> Sure, it's not a major issue. But I'm fairly confident that if we keep the
> option name but change its behaviour, I am going to have to explain more
> than once to developers not following this list or the commit logs why -f
> does not do what they think (even though one can argue it never did). I'd
> rather they discover up front that -f is deprecated and that they should
> look up a new option instead.

This is a change in established behaviour, yes, and all changes in behaviour 
of existing functionality come with a potential cost in terms of breaking 
existing user assumptions. It's worth pointing out though that bitbake's 
behaviour with regard to how it handles building after changes have been made 
has already changed quite significantly recently with the introduction and 
extension of shared state - it is now the case that if you make almost any 
change to the recipe and/or files it points to, the sstate signature changes 
and the task (plus all dependent tasks) will be rebuilt if called for. IMHO, 
this makes the change to -f make more sense - we're remaining consistent with 
the way the rest of the build system now works.

Cheers,
Paul
Björn Stenberg - June 21, 2012, 11:25 a.m.
Paul Eggleton wrote:
> In any case it appears to me that if we're moving anywhere, we're at least
> moving in the direction of behaving more like make rather than less like it,
> would you agree?

Yes, in terms of which commands are being ran. No, in terms of the resulting output.

-f intentionally produces different output. I think this will confuse people, and that we should therefore be extra careful in informing the user.
Paul Eggleton - June 21, 2012, 12:10 p.m.
On Thursday 21 June 2012 13:25:55 Björn Stenberg wrote:
> Paul Eggleton wrote:
> > In any case it appears to me that if we're moving anywhere, we're at least
> > moving in the direction of behaving more like make rather than less like
> > it, would you agree?
> 
> Yes, in terms of which commands are being ran. No, in terms of the resulting
> output.

The resulting output is only different in terms of what the user has changed in 
the sources (if anything) and the sstate checksum. Why do you want it to be 
producing the same checksum for potentially different contents?

> -f intentionally produces different output. I think this will confuse
> people, and that we should therefore be extra careful in informing the
> user.

It intentionally changes the checksum yes, and as Richard said we should 
probably mention this in the help text. I will produce a v2 of this series 
that does that.

Cheers,
Paul
Björn Stenberg - June 21, 2012, 12:26 p.m.
Paul Eggleton wrote:
> Why do you want it to be producing the same checksum for potentially
> different contents?

I don't. Sorry if that was unclear. I want:

a) That we don't need an -f option at all, i.e. that we have dependency tracking pinned down so well that any changed input automatically causes a rebuild and changed hash. (A boy can dream, can't he? :-), or

b) A different name for the -f option, or

c) A clear information message when using -f, maybe something like "INFO: Tainting the hash to force a rebuild", that alerts the user to the goings-on under the hood.
Paul Eggleton - June 21, 2012, 1:25 p.m.
On Thursday 21 June 2012 14:26:37 Björn Stenberg wrote:
> Paul Eggleton wrote:
> > Why do you want it to be producing the same checksum for potentially
> > different contents?
> 
> I don't. Sorry if that was unclear. I want:
> 
> a) That we don't need an -f option at all, i.e. that we have dependency
> tracking pinned down so well that any changed input automatically causes a
> rebuild and changed hash. (A boy can dream, can't he? :-), or

If you mean inputs that come entirely from the metadata, we can do this 
already (although the final pieces - namely detecting changes to local files 
referred to in SRC_URI, and detecting changes to varflags - only went in very 
recently). So for most of the cases in the past where you would have edited 
the recipe and/or files that the recipe points to and then needed to use -f to 
rebuild, now you no longer need to use -f and next time the recipe is called 
for it will be rebuilt automatically. I think we do need to communicate this 
more effectively though.

> b) A different name for the -f option, or
> 
> c) A clear information message when using -f, maybe something like "INFO:
> Tainting the hash to force a rebuild", that alerts the user to the
> goings-on under the hood.

I think c) is reasonable. I'll put together a follow-up patch today to add 
this.

Cheers,
Paul
Björn Stenberg - June 21, 2012, 1:41 p.m.
Björn Stenberg wrote:
> c) A clear information message when using -f, maybe something like "INFO:
> Tainting the hash to force a rebuild", that alerts the user to the goings-on
> under the hood.

Another thing: How will bitbake-diffsigs show a tainted build vs the untained version? It would be very nice if that could be shown clearly.
Paul Eggleton - June 21, 2012, 1:52 p.m.
On Thursday 21 June 2012 15:41:40 Björn Stenberg wrote:
> Björn Stenberg wrote:
> > c) A clear information message when using -f, maybe something like "INFO:
> > Tainting the hash to force a rebuild", that alerts the user to the
> > goings-on under the hood.
> 
> Another thing: How will bitbake-diffsigs show a tainted build vs the
> untained version? It would be very nice if that could be shown clearly.

It reports that the taint changed from None to the taint value (or vice 
versa). Right now the taint value is not particularly useful - it's a random 
UUID; however in future it's possible we might set the taint to something non-
random for other purposes. In any case, if a signature is tainted both 
bitbake-diffsigs and bitbake-dumpsig will report it.

Cheers,
Paul
Richard Purdie - June 21, 2012, 2:44 p.m.
On Thu, 2012-06-21 at 14:26 +0200, Björn Stenberg wrote:
> Paul Eggleton wrote:
> > Why do you want it to be producing the same checksum for potentially
> > different contents?
> 
> I don't. Sorry if that was unclear. I want:
> 
> a) That we don't need an -f option at all, i.e. that we have
> dependency tracking pinned down so well that any changed input
> automatically causes a rebuild and changed hash. (A boy can dream,
> can't he? :-), or

For what its worth I share the dream and in many ways we are working
towards it. There are a few things we still need to figure out to make
that possible without totally destroying performance but we're closer
than we ever have been.

This is also a reason I fought hard not to have a "cleansstate" task.
What would be the point of it? I did give in and let it in whilst
various fixes were made to sstate but I'd like to think we can remove it
at some point.

Cheers,

Richard

Patch

diff --git a/bitbake/lib/bb/build.py b/bitbake/lib/bb/build.py
index a9ba02d..a0a7dd4 100644
--- a/bitbake/lib/bb/build.py
+++ b/bitbake/lib/bb/build.py
@@ -494,6 +494,24 @@  def del_stamp(task, d, file_name = None):
     stamp = stamp_internal(task, d, file_name)
     bb.utils.remove(stamp)
 
+def write_taint(task, d, file_name = None):
+    """
+    Creates a "taint" file which will force the specified task and its
+    dependents to be re-run the next time by influencing the value of its
+    taskhash.
+    (d can be a data dict or dataCache)
+    """
+    import uuid
+    if file_name:
+        taintfn = d.stamp[file_name] + '.' + task + '.taint'
+    else:
+        taintfn = d.getVar('STAMP', True) + '.' + task + '.taint'
+    bb.utils.mkdirhier(os.path.dirname(taintfn))
+    # The specific content of the taint file is not really important,
+    # we just need it to be random, so a random UUID is used
+    with open(taintfn, 'w') as taintf:
+        taintf.write(str(uuid.uuid4()))
+
 def stampfile(taskname, d, file_name = None):
     """
     Return the stamp for a given task
diff --git a/bitbake/lib/bb/cooker.py b/bitbake/lib/bb/cooker.py
index 928b600..9b8d4b2 100644
--- a/bitbake/lib/bb/cooker.py
+++ b/bitbake/lib/bb/cooker.py
@@ -1066,10 +1066,10 @@  class BBCooker:
         self.status.rundeps[fn] = []
         self.status.runrecs[fn] = []
 
-        # Remove stamp for target if force mode active
+        # Invalidate task for target if force mode active
         if self.configuration.force:
-            logger.verbose("Remove stamp %s, %s", task, fn)
-            bb.build.del_stamp('do_%s' % task, self.status, fn)
+            logger.verbose("Invalidate task %s, %s", task, fn)
+            bb.parse.siggen.invalidate_task('do_%s' % task, self.status, fn)
 
         # Setup taskdata structure
         taskdata = bb.taskdata.TaskData(self.configuration.abort)
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index d925b4c..28eb072 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -705,6 +705,12 @@  class RunQueueData:
                 continue
             self.runq_setscene.append(task)
 
+        # Invalidate task if force mode active
+        if self.cooker.configuration.force:
+            for (fn, target) in self.target_pairs:
+                logger.verbose("Invalidate task %s, %s", target, fn)
+                bb.parse.siggen.invalidate_task(target, self.dataCache, fn)
+
         # Interate over the task list and call into the siggen code
         dealtwith = set()
         todeal = set(range(len(self.runq_fnid)))
@@ -731,12 +737,6 @@  class RunQueueData:
                 deps.append(depidentifier)
             self.hash_deps[identifier] = deps
 
-        # Remove stamps for targets if force mode active
-        if self.cooker.configuration.force:
-            for (fn, target) in self.target_pairs:
-                logger.verbose("Remove stamp %s, %s", target, fn)
-                bb.build.del_stamp(target, self.dataCache, fn)
-
         return len(self.runq_fnid)
 
     def dump_data(self, taskQueue):
diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index c4b7c39..0fb2642 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -50,6 +50,10 @@  class SignatureGenerator(object):
     def dump_sigtask(self, fn, task, stampbase, runtime):
         return
 
+    def invalidate_task(self, task, d, fn):
+        bb.build.del_stamp(task, d, fn)
+
+
 class SignatureGeneratorBasic(SignatureGenerator):
     """
     """
@@ -153,6 +157,15 @@  class SignatureGeneratorBasic(SignatureGenerator):
                 return False
         return True
 
+    def read_taint(self, fn, task, stampbase):
+        taint = None
+        try:
+            with open(stampbase + '.' + task + '.taint', 'r') as taintf:
+                taint = taintf.read()
+        except IOError:
+            pass
+        return taint
+
     def get_taskhash(self, fn, task, deps, dataCache):
         k = fn + "." + task
         data = dataCache.basetaskhash[k]
@@ -173,6 +186,11 @@  class SignatureGeneratorBasic(SignatureGenerator):
             for (f,cs) in checksums:
                self.file_checksum_values[k][f] = cs
                data = data + cs
+
+        taint = self.read_taint(fn, task, dataCache.stamp[fn])
+        if taint:
+            data = data + taint
+
         h = hashlib.md5(data).hexdigest()
         self.taskhash[k] = h
         #d.setVar("BB_TASKHASH_task-%s" % task, taskhash[task])
@@ -214,6 +232,10 @@  class SignatureGeneratorBasic(SignatureGenerator):
             for dep in data['runtaskdeps']:
                 data['runtaskhashes'][dep] = self.taskhash[dep]
 
+        taint = self.read_taint(fn, task, stampbase)
+        if taint:
+            data['taint'] = taint
+
         with open(sigfile, "wb") as f:
             p = pickle.Pickler(f, -1)
             p.dump(data)
@@ -245,6 +267,9 @@  class SignatureGeneratorBasicHash(SignatureGeneratorBasic):
             h = self.basehash[k]
         return ("%s.%s.%s.%s" % (stampbase, taskname, h, extrainfo)).rstrip('.')
 
+    def invalidate_task(self, task, d, fn):
+        bb.build.write_taint(task, d, fn)
+
 def dump_this_task(outfile, d):
     import bb.parse
     fn = d.getVar("BB_FILENAME", True)
@@ -357,6 +382,13 @@  def compare_sigfiles(a, b):
             for dep in changed:
                 print "Hash for dependent task %s changed from %s to %s" % (dep, a[dep], b[dep])
 
+
+    a_taint = a_data.get('taint', None)
+    b_taint = b_data.get('taint', None)
+    if a_taint != b_taint:
+        print "Taint (by forced/invalidated task) changed from %s to %s" % (a_taint, b_taint)
+
+
 def dump_sigfile(a):
     p1 = pickle.Unpickler(open(a, "rb"))
     a_data = p1.load()
@@ -384,3 +416,6 @@  def dump_sigfile(a):
     if 'runtaskhashes' in a_data:
         for dep in a_data['runtaskhashes']:
             print "Hash for dependent task %s is %s" % (dep, a_data['runtaskhashes'][dep])
+
+    if 'taint' in a_data:
+        print "Tainted (by forced/invalidated task): %s" % a_data['taint']