| Submitter | Paul Eggleton |
|---|---|
| Date | June 18, 2012, 3:45 p.m. |
| Message ID | <dcf1fb0acb4bd3424cbb5f48cd62f4f994896c20.1340034255.git.paul.eggleton@linux.intel.com> |
| Download | mbox | patch |
| Permalink | /patch/30057/ |
| State | New |
| Headers | show |
Comments
Paul Eggleton wrote: > If -f is specified, force dependent tasks to be re-run next time. This > works by changing the force behaviour so that instead of deleting the > task's stamp, we write a "taint" file into the stamps directory, which > will alter the taskhash randomly and thus trigger the task to re-run I'm concerned about calling this -f/--force. I don't think I'm alone in interpreting -f / --force as "run all commands, even if the dependencies say we don't need to". I would expect the same output as the first time, with the same sstate checksum. Would it be reasonable to call it something like -t/--taint instead?
On Tuesday 19 June 2012 21:35:39 Björn Stenberg wrote: > Paul Eggleton wrote: > > If -f is specified, force dependent tasks to be re-run next time. This > > works by changing the force behaviour so that instead of deleting the > > task's stamp, we write a "taint" file into the stamps directory, which > > will alter the taskhash randomly and thus trigger the task to re-run > > I'm concerned about calling this -f/--force. I don't think I'm alone in > interpreting -f / --force as "run all commands, even if the dependencies > say we don't need to". Well, what it used to do was just cause the specified task to be run even if there is a stamp recording that it was already done; dependencies don't come into it (although perhaps you meant stamps?) > I would expect the same output as the first time, > with the same sstate checksum. So my assumption is -f is most often used for the purpose of manually forcing a recompile after you have made modifications to the already extracted source code under the workdir. If that is the case, there are two consequences as I see it: * When Bitbake next checks whether you want to run dependent tasks, given that the output of the task almost certainly changed due to your modifications, you would want those dependent tasks to be re-run again also. i.e. if you've forced a compile, you would expect for the results to be installed and packaged when you next ask for the do_install / do_package tasks to run. * You don't really want those modifications going into the sstate package that effectively claims to be produced from inputs that only come from the metadata. Now, obviously it doesn't physically prevent you from ever getting modified data into an sstate package with the same signature, but it makes it less likely. My question would be, are you using -f for something different or do you disagree with one or both of the consequences above? Cheers, Paul
Paul Eggleton wrote: > So my assumption is -f is most often used for the purpose of manually > forcing a recompile after you have made modifications to the already > extracted source code under the workdir. I agree with that. My concern is based on the fact that people (including myself) don't fully know all the details of how bitbake works, and tend to make assumptions based on other build systems they know, such as simple Makefiles. I think the fact that bitbake sometime works differently means we should be extra careful about not playing into devlelopers' assumptions. The bitbake option --force sounds rather similar to make's --always-make, especially when it is described as: "force run of specified cmd, regardless of stamp status". Also see the --force parameter in standard unix utils like cp, mv, rm etc. If we were to call it something different instead, like -t/--taint, this would avoid some assumptions about its behaviour and make it more clear that the output will be different even if the input is the same. Sure, it's not a major issue. But I'm fairly confident that if we keep the option name but change its behaviour, I am going to have to explain more than once to developers not following this list or the commit logs why -f does not do what they think (even though one can argue it never did). I'd rather they discover up front that -f is deprecated and that they should look up a new option instead.
Paul Eggleton wrote: > So my assumption is -f is most often used for the purpose of manually > forcing a recompile after you have made modifications to the already > extracted source code under the workdir. I agree with that. My concern is based on the fact that people (including myself) don't fully know all the details of how bitbake works, and tend to make assumptions based on other build systems they know, such as simple Makefiles. I think the fact that bitbake sometime works differently means we should be extra careful about not playing into devlelopers' assumptions. The bitbake option --force sounds rather similar to make's --always-make, especially when it is described as: "force run of specified cmd, regardless of stamp status". While a tangent, the --force parameter in standard unix utils like cp, mv, rm also matter. If we were to call it something different instead, like -t/--taint, this would avoid some assumptions about its behaviour and make it more clear that the output will be different even if the input is the same. Sure, it's not a major issue. But I'm fairly confident that if we keep the option name but change its behaviour, I am going to have to explain more than once to developers not following this list or the commit logs why -f does not do what they think (even though one can argue it never did). I'd rather they discover up front that -f is deprecated and that they should look up a new option instead.
On Wed, 2012-06-20 at 09:55 +0200, Björn Stenberg wrote: > Paul Eggleton wrote: > > So my assumption is -f is most often used for the purpose of manually > > forcing a recompile after you have made modifications to the already > > extracted source code under the workdir. > > I agree with that. > > My concern is based on the fact that people (including myself) don't > fully know all the details of how bitbake works, and tend to make > assumptions based on other build systems they know, such as simple > Makefiles. > > I think the fact that bitbake sometime works differently means we > should be extra careful about not playing into devlelopers' > assumptions. The bitbake option --force sounds rather similar to > make's --always-make, especially when it is described as: "force run > of specified cmd, regardless of stamp status". While a tangent, the > --force parameter in standard unix utils like cp, mv, rm also matter. > > If we were to call it something different instead, like -t/--taint, > this would avoid some assumptions about its behaviour and make it more > clear that the output will be different even if the input is the same. > > Sure, it's not a major issue. But I'm fairly confident that if we keep > the option name but change its behaviour, I am going to have to > explain more than once to developers not following this list or the > commit logs why -f does not do what they think (even though one can > argue it never did). I'd rather they discover up front that -f is > deprecated and that they should look up a new option instead. We should be clear, its not a change in behaviour, its a bugfix for a variety of nasty problems related to sstate. sstate needs to behave as people would expect under a variety of circumstances and this change only changes the interaction between sstate and the stamp files. This is an area that is relatively new, we've found a nasty issue where sstate files can become "corrupted" and we need to avoid that as it threatens the integrity of the project. I don't think renaming the option is a particularly good idea, that will upset many user's fingers and mean we have to scrub the documents and in itself will cause a ton of questions that need to be answered. I would agree that the bitbake help text should be enhanced to make it clear what this option does though. I do also think we need to raise awareness of the change (which is why it was mentioned in yesterday's meeting). I appreciate we probably will get this question from time to time. We (Yocto) are working on adding some kind of question/answer system (stack overflow style) to the website btw, and this would make a good question and answer on there! That would give us all a lightweight way to at least answer the question. Cheers, Richard
On Wednesday 20 June 2012 09:55:04 Björn Stenberg wrote: > Paul Eggleton wrote: > > So my assumption is -f is most often used for the purpose of manually > > forcing a recompile after you have made modifications to the already > > extracted source code under the workdir. > > I agree with that. > > My concern is based on the fact that people (including myself) don't fully > know all the details of how bitbake works, and tend to make assumptions > based on other build systems they know, such as simple Makefiles. > > I think the fact that bitbake sometime works differently means we should be > extra careful about not playing into devlelopers' assumptions. The bitbake > option --force sounds rather similar to make's --always-make, especially > when it is described as: "force run of specified cmd, regardless of stamp > status". I've not used this option with make before, so I tried it with the following trivial Makefile: ------------- snip -------------- b: touch b a: b touch a ------------- snip -------------- If I run "make a" it of course runs "touch b" then "touch a". Then I run "make --always-make b", it runs "touch b" which is what you expect. However, if I subsequently run "make a" it does a "touch a" again - the dependency b has changed, so it recognises it needs to rebuild a. This looks to me to be exactly the same as the new proposed behaviour of -f in bitbake. The only wrinkle in external behaviour is that "make --always-make a" runs both "touch b" and "touch a"; this would be difficult for us to replicate in bitbake and I'm not sure it would be particularly useful. In any case it appears to me that if we're moving anywhere, we're at least moving in the direction of behaving more like make rather than less like it, would you agree? > If we were to call it something different instead, like -t/--taint, this > would avoid some assumptions about its behaviour and make it more clear > that the output will be different even if the input is the same. > > Sure, it's not a major issue. But I'm fairly confident that if we keep the > option name but change its behaviour, I am going to have to explain more > than once to developers not following this list or the commit logs why -f > does not do what they think (even though one can argue it never did). I'd > rather they discover up front that -f is deprecated and that they should > look up a new option instead. This is a change in established behaviour, yes, and all changes in behaviour of existing functionality come with a potential cost in terms of breaking existing user assumptions. It's worth pointing out though that bitbake's behaviour with regard to how it handles building after changes have been made has already changed quite significantly recently with the introduction and extension of shared state - it is now the case that if you make almost any change to the recipe and/or files it points to, the sstate signature changes and the task (plus all dependent tasks) will be rebuilt if called for. IMHO, this makes the change to -f make more sense - we're remaining consistent with the way the rest of the build system now works. Cheers, Paul
Paul Eggleton wrote: > In any case it appears to me that if we're moving anywhere, we're at least > moving in the direction of behaving more like make rather than less like it, > would you agree? Yes, in terms of which commands are being ran. No, in terms of the resulting output. -f intentionally produces different output. I think this will confuse people, and that we should therefore be extra careful in informing the user.
On Thursday 21 June 2012 13:25:55 Björn Stenberg wrote: > Paul Eggleton wrote: > > In any case it appears to me that if we're moving anywhere, we're at least > > moving in the direction of behaving more like make rather than less like > > it, would you agree? > > Yes, in terms of which commands are being ran. No, in terms of the resulting > output. The resulting output is only different in terms of what the user has changed in the sources (if anything) and the sstate checksum. Why do you want it to be producing the same checksum for potentially different contents? > -f intentionally produces different output. I think this will confuse > people, and that we should therefore be extra careful in informing the > user. It intentionally changes the checksum yes, and as Richard said we should probably mention this in the help text. I will produce a v2 of this series that does that. Cheers, Paul
Paul Eggleton wrote: > Why do you want it to be producing the same checksum for potentially > different contents? I don't. Sorry if that was unclear. I want: a) That we don't need an -f option at all, i.e. that we have dependency tracking pinned down so well that any changed input automatically causes a rebuild and changed hash. (A boy can dream, can't he? :-), or b) A different name for the -f option, or c) A clear information message when using -f, maybe something like "INFO: Tainting the hash to force a rebuild", that alerts the user to the goings-on under the hood.
On Thursday 21 June 2012 14:26:37 Björn Stenberg wrote: > Paul Eggleton wrote: > > Why do you want it to be producing the same checksum for potentially > > different contents? > > I don't. Sorry if that was unclear. I want: > > a) That we don't need an -f option at all, i.e. that we have dependency > tracking pinned down so well that any changed input automatically causes a > rebuild and changed hash. (A boy can dream, can't he? :-), or If you mean inputs that come entirely from the metadata, we can do this already (although the final pieces - namely detecting changes to local files referred to in SRC_URI, and detecting changes to varflags - only went in very recently). So for most of the cases in the past where you would have edited the recipe and/or files that the recipe points to and then needed to use -f to rebuild, now you no longer need to use -f and next time the recipe is called for it will be rebuilt automatically. I think we do need to communicate this more effectively though. > b) A different name for the -f option, or > > c) A clear information message when using -f, maybe something like "INFO: > Tainting the hash to force a rebuild", that alerts the user to the > goings-on under the hood. I think c) is reasonable. I'll put together a follow-up patch today to add this. Cheers, Paul
Björn Stenberg wrote: > c) A clear information message when using -f, maybe something like "INFO: > Tainting the hash to force a rebuild", that alerts the user to the goings-on > under the hood. Another thing: How will bitbake-diffsigs show a tainted build vs the untained version? It would be very nice if that could be shown clearly.
On Thursday 21 June 2012 15:41:40 Björn Stenberg wrote: > Björn Stenberg wrote: > > c) A clear information message when using -f, maybe something like "INFO: > > Tainting the hash to force a rebuild", that alerts the user to the > > goings-on under the hood. > > Another thing: How will bitbake-diffsigs show a tainted build vs the > untained version? It would be very nice if that could be shown clearly. It reports that the taint changed from None to the taint value (or vice versa). Right now the taint value is not particularly useful - it's a random UUID; however in future it's possible we might set the taint to something non- random for other purposes. In any case, if a signature is tainted both bitbake-diffsigs and bitbake-dumpsig will report it. Cheers, Paul
On Thu, 2012-06-21 at 14:26 +0200, Björn Stenberg wrote: > Paul Eggleton wrote: > > Why do you want it to be producing the same checksum for potentially > > different contents? > > I don't. Sorry if that was unclear. I want: > > a) That we don't need an -f option at all, i.e. that we have > dependency tracking pinned down so well that any changed input > automatically causes a rebuild and changed hash. (A boy can dream, > can't he? :-), or For what its worth I share the dream and in many ways we are working towards it. There are a few things we still need to figure out to make that possible without totally destroying performance but we're closer than we ever have been. This is also a reason I fought hard not to have a "cleansstate" task. What would be the point of it? I did give in and let it in whilst various fixes were made to sstate but I'd like to think we can remove it at some point. Cheers, Richard
Patch
diff --git a/bitbake/lib/bb/build.py b/bitbake/lib/bb/build.py index a9ba02d..a0a7dd4 100644 --- a/bitbake/lib/bb/build.py +++ b/bitbake/lib/bb/build.py @@ -494,6 +494,24 @@ def del_stamp(task, d, file_name = None): stamp = stamp_internal(task, d, file_name) bb.utils.remove(stamp) +def write_taint(task, d, file_name = None): + """ + Creates a "taint" file which will force the specified task and its + dependents to be re-run the next time by influencing the value of its + taskhash. + (d can be a data dict or dataCache) + """ + import uuid + if file_name: + taintfn = d.stamp[file_name] + '.' + task + '.taint' + else: + taintfn = d.getVar('STAMP', True) + '.' + task + '.taint' + bb.utils.mkdirhier(os.path.dirname(taintfn)) + # The specific content of the taint file is not really important, + # we just need it to be random, so a random UUID is used + with open(taintfn, 'w') as taintf: + taintf.write(str(uuid.uuid4())) + def stampfile(taskname, d, file_name = None): """ Return the stamp for a given task diff --git a/bitbake/lib/bb/cooker.py b/bitbake/lib/bb/cooker.py index 928b600..9b8d4b2 100644 --- a/bitbake/lib/bb/cooker.py +++ b/bitbake/lib/bb/cooker.py @@ -1066,10 +1066,10 @@ class BBCooker: self.status.rundeps[fn] = [] self.status.runrecs[fn] = [] - # Remove stamp for target if force mode active + # Invalidate task for target if force mode active if self.configuration.force: - logger.verbose("Remove stamp %s, %s", task, fn) - bb.build.del_stamp('do_%s' % task, self.status, fn) + logger.verbose("Invalidate task %s, %s", task, fn) + bb.parse.siggen.invalidate_task('do_%s' % task, self.status, fn) # Setup taskdata structure taskdata = bb.taskdata.TaskData(self.configuration.abort) diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py index d925b4c..28eb072 100644 --- a/bitbake/lib/bb/runqueue.py +++ b/bitbake/lib/bb/runqueue.py @@ -705,6 +705,12 @@ class RunQueueData: continue self.runq_setscene.append(task) + # Invalidate task if force mode active + if self.cooker.configuration.force: + for (fn, target) in self.target_pairs: + logger.verbose("Invalidate task %s, %s", target, fn) + bb.parse.siggen.invalidate_task(target, self.dataCache, fn) + # Interate over the task list and call into the siggen code dealtwith = set() todeal = set(range(len(self.runq_fnid))) @@ -731,12 +737,6 @@ class RunQueueData: deps.append(depidentifier) self.hash_deps[identifier] = deps - # Remove stamps for targets if force mode active - if self.cooker.configuration.force: - for (fn, target) in self.target_pairs: - logger.verbose("Remove stamp %s, %s", target, fn) - bb.build.del_stamp(target, self.dataCache, fn) - return len(self.runq_fnid) def dump_data(self, taskQueue): diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py index c4b7c39..0fb2642 100644 --- a/bitbake/lib/bb/siggen.py +++ b/bitbake/lib/bb/siggen.py @@ -50,6 +50,10 @@ class SignatureGenerator(object): def dump_sigtask(self, fn, task, stampbase, runtime): return + def invalidate_task(self, task, d, fn): + bb.build.del_stamp(task, d, fn) + + class SignatureGeneratorBasic(SignatureGenerator): """ """ @@ -153,6 +157,15 @@ class SignatureGeneratorBasic(SignatureGenerator): return False return True + def read_taint(self, fn, task, stampbase): + taint = None + try: + with open(stampbase + '.' + task + '.taint', 'r') as taintf: + taint = taintf.read() + except IOError: + pass + return taint + def get_taskhash(self, fn, task, deps, dataCache): k = fn + "." + task data = dataCache.basetaskhash[k] @@ -173,6 +186,11 @@ class SignatureGeneratorBasic(SignatureGenerator): for (f,cs) in checksums: self.file_checksum_values[k][f] = cs data = data + cs + + taint = self.read_taint(fn, task, dataCache.stamp[fn]) + if taint: + data = data + taint + h = hashlib.md5(data).hexdigest() self.taskhash[k] = h #d.setVar("BB_TASKHASH_task-%s" % task, taskhash[task]) @@ -214,6 +232,10 @@ class SignatureGeneratorBasic(SignatureGenerator): for dep in data['runtaskdeps']: data['runtaskhashes'][dep] = self.taskhash[dep] + taint = self.read_taint(fn, task, stampbase) + if taint: + data['taint'] = taint + with open(sigfile, "wb") as f: p = pickle.Pickler(f, -1) p.dump(data) @@ -245,6 +267,9 @@ class SignatureGeneratorBasicHash(SignatureGeneratorBasic): h = self.basehash[k] return ("%s.%s.%s.%s" % (stampbase, taskname, h, extrainfo)).rstrip('.') + def invalidate_task(self, task, d, fn): + bb.build.write_taint(task, d, fn) + def dump_this_task(outfile, d): import bb.parse fn = d.getVar("BB_FILENAME", True) @@ -357,6 +382,13 @@ def compare_sigfiles(a, b): for dep in changed: print "Hash for dependent task %s changed from %s to %s" % (dep, a[dep], b[dep]) + + a_taint = a_data.get('taint', None) + b_taint = b_data.get('taint', None) + if a_taint != b_taint: + print "Taint (by forced/invalidated task) changed from %s to %s" % (a_taint, b_taint) + + def dump_sigfile(a): p1 = pickle.Unpickler(open(a, "rb")) a_data = p1.load() @@ -384,3 +416,6 @@ def dump_sigfile(a): if 'runtaskhashes' in a_data: for dep in a_data['runtaskhashes']: print "Hash for dependent task %s is %s" % (dep, a_data['runtaskhashes'][dep]) + + if 'taint' in a_data: + print "Tainted (by forced/invalidated task): %s" % a_data['taint']
If -f is specified, force dependent tasks to be re-run next time. This works by changing the force behaviour so that instead of deleting the task's stamp, we write a "taint" file into the stamps directory, which will alter the taskhash randomly and thus trigger the task to re-run next time we evaluate whether or not that should be done as well as influencing the taskhashes of any dependent tasks so that they are similarly re-triggered. As a bonus because we write this file as <stamp file name>.taskname.taint, the existing code which deletes the stamp files in OE's do_clean will already handle removing it. This means you can now do the following: bitbake somepackage [ change the source code in the package's WORKDIR ] bitbake -c compile -f somepackage bitbake somepackage and the result will be that all of the tasks that depend on do_compile (do_install, do_package, etc.) will be re-run in the last step. Note that to operate in the manner described above you need full hashing enabled (i.e. BB_SIGNATURE_HANDLER must be set to a signature handler that inherits from BasicHash). If this is not the case, -f will just delete the stamp for the specified task as it did before. This fix is required for [YOCTO #2615] and [YOCTO #2256]. Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com> --- bitbake/lib/bb/build.py | 18 ++++++++++++++++++ bitbake/lib/bb/cooker.py | 6 +++--- bitbake/lib/bb/runqueue.py | 12 ++++++------ bitbake/lib/bb/siggen.py | 35 +++++++++++++++++++++++++++++++++++ 4 files changed, 62 insertions(+), 9 deletions(-)