| Submitter | Enrico Scholz |
|---|---|
| Date | Oct. 29, 2012, 3:11 p.m. |
| Message ID | <1351523465-26489-1-git-send-email-enrico.scholz@sigma-chemnitz.de> |
| Permalink | /patch/38665/ |
| State | New |
Comments
On Mon, 2012-10-29 at 16:11 +0100, Enrico Scholz wrote:
> When packages are recreated after a 'bitbake -c clean', files will get
> wrong date because tar has been invoked with the '-m' option.
>
> Correct timestamps are useful for bug hunting and there are better
> ways (e.g. using of ntp) than using '-m'.
>
> Signed-off-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
> ---
>  meta/classes/sstate.bbclass | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> index cbb14e1..d7c4e11 100644
> --- a/meta/classes/sstate.bbclass
> +++ b/meta/classes/sstate.bbclass
> @@ -551,7 +551,7 @@ sstate_create_package () {
>  sstate_unpack_package () {
>  	mkdir -p ${SSTATE_INSTDIR}
>  	cd ${SSTATE_INSTDIR}
> -	tar -xmvzf ${SSTATE_PKG}
> +	tar -xvzf ${SSTATE_PKG}
>  }
>
>  # Need to inject information about classes not in the global configuration scope

This is a revert of:

http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=2d89cff42af2bb0049224bfaaebaa2b21966169f

where the option was added deliberately to deal with time mismatch
between autobuilders which was causing real world bugs.

Cheers,

Richard
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> When packages are recreated after a 'bitbake -c clean', files will get
>> wrong date because tar has been invoked with the '-m' option.
>>
>> Correct timestamps are useful for bug hunting and there are better
>> ways (e.g. using of ntp) than using '-m'.
>
> This is a revert of:
>
> http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=2d89cff42af2bb0049224bfaaebaa2b21966169f
>
> where the option was added deliberately to deal with time mismatch
> between autobuilders which was causing real world bugs.

But the real bug is the time mismatch in the autobuilders, isn't it?
And this can/should be solved by synchronizing time by ntp on them
instead of applying dirty hacks like resetting file dates.

Enrico
On Mon, 2012-10-29 at 17:24 +0100, Enrico Scholz wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
>
> >> When packages are recreated after a 'bitbake -c clean', files will get
> >> wrong date because tar has been invoked with the '-m' option.
> >>
> >> Correct timestamps are useful for bug hunting and there are better
> >> ways (e.g. using of ntp) than using '-m'.
> >
> > This is a revert of:
> >
> > http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=2d89cff42af2bb0049224bfaaebaa2b21966169f
> >
> > where the option was added deliberately to deal with time mismatch
> > between autobuilders which was causing real world bugs.
>
> But the real bug is the time mismatch in the autobuilders, isn't it?
> And this can/should be solved by synchronizing time by ntp on them
> instead of applying dirty hacks like resetting file dates.

I have asked that ntp be installed/fixed on the autobuilders to sort the
problem out but it seems that even with ntp running, mismatches can
happen (e.g. misconfigured timezones).

Worse, when this does happen the failures are extremely unpredictable
and hard to debug. It causes things to repeatedly recompile for example,
even during do_install.

So no, I don't think this is a dirty hack, it's part of ensuring the
builds are deterministic and helping people avoid what can be a very
nasty and hard to debug set of build issues.

I appreciate it hurts some other debugging forensics but I'd rather that
than anyone suffering some of the nasty build failures I debugged.

Cheers,

Richard
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> > where the option was added deliberately to deal with time mismatch
>> > between autobuilders which was causing real world bugs.
>>
>> But the real bug is the time mismatch in the autobuilders, isn't it?
>> And this can/should be solved by synchronizing time by ntp on them
>> instead of applying dirty hacks like resetting file dates.
>
> I have asked that ntp be installed/fixed on the autobuilders to sort the
> problem out but it seems that even with ntp running, mismatches can
> happen (e.g. misconfigured timezones).

Really? The only timezone related problems might arise when a package
makes 'touch -d <date>' during build. But I can not remember that I
have ever seen such a package and this problem should be easy to
localize.

Else, timezone configuration is completely uninteresting for comparing
timestamps or for setting time from ntp.

> Worse, when this does happen the failures are extremely unpredictable
> and hard to debug. It causes things to repeatedly recompile for example,
> even during do_install.

How comes do_install() into the game? 'populate-lic' are the only
sstate files created before do_install() and I can not imagine how they
affect the other build phases; do_compile() results are not sstated
(and with the 'tar -m' thing, timestamps get havoc completely causing
unpredictable rebuilds).

When do_install() is executed, all the following sstate files
(populate_sysroot, package) are invalidated and must be recreated. So
your problem with do_install() sounds more like an incomplete/racy
cleanup of old files.

Enrico
On Mon, 2012-10-29 at 18:59 +0100, Enrico Scholz wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
>
> >> > where the option was added deliberately to deal with time mismatch
> >> > between autobuilders which was causing real world bugs.
> >>
> >> But the real bug is the time mismatch in the autobuilders, isn't it?
> >> And this can/should be solved by synchronizing time by ntp on them
> >> instead of applying dirty hacks like resetting file dates.
> >
> > I have asked that ntp be installed/fixed on the autobuilders to sort the
> > problem out but it seems that even with ntp running, mismatches can
> > happen (e.g. misconfigured timezones).
>
> Really? The only timezone related problems might arise when a package
> makes 'touch -d <date>' during build. But I can not remember that I
> have ever seen such a package and this problem should be easy to
> localize.
>
> Else, timezone configuration is completely uninteresting for comparing
> timestamps or for setting time from ntp.
>
> > Worse, when this does happen the failures are extremely unpredictable
> > and hard to debug. It causes things to repeatedly recompile for example,
> > even during do_install.
>
> How comes do_install() into the game? 'populate-lic' are the only
> sstate files created before do_install() and I can not imagine how they
> affect the other build phases; do_compile() results are not sstated
> (and with the 'tar -m' thing, timestamps get havoc completely causing
> unpredictable rebuilds).
>
> When do_install() is executed, all the following sstate files
> (populate_sysroot, package) are invalidated and must be recreated. So
> your problem with do_install() sounds more like an incomplete/racy
> cleanup of old files.

Set the date stamp of some headers in the target sysroot of some key
system components (say glib) to a date about a day in the future, then
clean and rebuild some software that uses glib.

You will find that every time the recipe runs make, *everything* will
recompile. This will happen during do_install as well as do_compile and
if your recipe calls make for any other reason, it will recompile again.

Normally the builds actually tolerate this quite well, until you get
something like qt4 where the do_install environment isn't set up to
support compiling and then things explode in interesting ways.

Cheers,

Richard
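The rebuild loop described above can be reproduced in miniature. The following is only a sketch under stated assumptions: the file names `glib.h` and `app.o` are made up, `touch -d 'tomorrow'` relies on GNU coreutils, and `find -newer` stands in for the modification-time comparison that make performs between a target and its prerequisites.

```shell
#!/bin/sh
# Sketch: a prerequisite dated in the future keeps its target out of date.
# All names are hypothetical; nothing here comes from sstate.bbclass.
set -e
work=$(mktemp -d)
cd "$work"

# A pretend sysroot header, stamped one day ahead of the local clock
# (GNU touch syntax, an assumption about the build host).
touch -d 'tomorrow' glib.h

# A pretend object file, "compiled" just now.
touch app.o

# make rebuilds a target when a prerequisite is newer than the target.
# find -newer applies the same kind of mtime comparison.
needs_rebuild() {
    # $1 = target, $2 = prerequisite
    [ -n "$(find "$2" -newer "$1")" ]
}

if needs_rebuild app.o glib.h; then stale=yes; else stale=no; fi
echo "app.o out of date relative to glib.h: $stale"

# "Recompiling" stamps app.o with the current time, which is still
# earlier than the header's timestamp, so the next run rebuilds again.
touch app.o
if needs_rebuild app.o glib.h; then stale_after_rebuild=yes; else stale_after_rebuild=no; fi
echo "still out of date after rebuilding: $stale_after_rebuild"
```

Because the header stays ahead of the local clock, the target looks out of date on every invocation until real time catches up with the header's timestamp, which matches the repeated recompiles during do_compile and do_install.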
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> >> But the real bug is the time mismatch in the autobuilders, isn't it?
>> >> And this can/should be solved by synchronizing time by ntp on them
>> >> instead of applying dirty hacks like resetting file dates.
>> ...
>> > Worse, when this does happen the failures are extremely unpredictable
>> > and hard to debug. It causes things to repeatedly recompile for example,
>> > even during do_install.
> ...
> Set the date stamp of some headers in the target sysroot of some key
> system components (say glib) to a date about a day in the future,

Are there really packages which create files dated in the future?
Perhaps a sanity check should be written which rejects files which are
newer than their containing directory and/or the time-of-day?

> then clean and rebuild some software that uses glib.

How will 'tar -m' fix this? It makes things just worse because the
files generated with -m are always newer than without -m (in practice,
time offset between hosts served by ntp is far below 100ms, which is
enough for the build stages doing the sstate file extraction).

Enrico
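A sanity check of the kind suggested above might look roughly like the following. This is a minimal sketch, not code from sanity.bbclass: the helper name `list_future_files` is hypothetical, the demo files are invented, and `touch -d 'tomorrow'` assumes GNU coreutils. It only covers the "newer than the time-of-day" half of the suggestion, using a freshly created reference file as a stand-in for the current time.

```shell
#!/bin/sh
# Sketch: list files whose mtime lies ahead of the local clock.
# list_future_files is a hypothetical helper, not an existing BitBake function.
set -e

list_future_files() {
    # $1 = directory to scan
    ref=$(mktemp)                 # created just now, so its mtime is "now"
    find "$1" -type f -newer "$ref"
    rm -f "$ref"
}

# Demo tree: one file stamped now, one stamped a day ahead (GNU touch).
demo=$(mktemp -d)
touch "$demo/ok.h"
touch -d 'tomorrow' "$demo/future.h"

future=$(list_future_files "$demo")
echo "files dated in the future:"
echo "$future"
```

A real check would also have to decide what to do with the offending files (reject the sstate package, or reset the timestamps), which is the trade-off the thread goes on to discuss.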
On Mon, 2012-10-29 at 20:00 +0100, Enrico Scholz wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
>
> >> >> But the real bug is the time mismatch in the autobuilders, isn't it?
> >> >> And this can/should be solved by synchronizing time by ntp on them
> >> >> instead of applying dirty hacks like resetting file dates.
> >> ...
> >> > Worse, when this does happen the failures are extremely unpredictable
> >> > and hard to debug. It causes things to repeatedly recompile for example,
> >> > even during do_install.
> > ...
> > Set the date stamp of some headers in the target sysroot of some key
> > system components (say glib) to a date about a day in the future,
>
> Are there really packages which create files dated in the future?
> Perhaps a sanity check should be written which rejects files which are
> newer than their containing directory and/or the time-of-day?
>
> > then clean and rebuild some software that uses glib.
>
> How will 'tar -m' fix this? It makes things just worse because the
> files generated with -m are always newer than without -m (in practice,
> time offset between hosts served by ntp is far below 100ms, which is
> enough for the build stages doing the sstate file extraction).

Imagine system A generates the sysroot headers with a time ahead of
system B. These are packaged up into an sstate tarball. System B which
has a clock at some time behind system A then downloads and uses them
so the sysroot headers become some time in the future.

You then see the problem I described previously.

tar -m fixes this by timestamping things at the time of extraction,
thereby removing any issue of the timestamps being in the future.

Yes, we could add a step which iterated over the extracted files and
checked to see if any were in the future and if so, change their
timestamps but it seems a bit overkill when the option to tar resolves
all the problems.

The alternative is to mandate *every* system that builds are run on
use ntp and add checks to sanity.bbclass to this effect since someone
might try using a sstate feed with a bad clock. This would cause no end
of problems, not least with corporate firewalls and hurt usability of
the project so we took the other option which fixes things in a way
this should become a non-issue.

Cheers,

Richard
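The effect of `-m` in the system A / system B scenario can be checked with a small experiment. This is a sketch under assumptions: it uses GNU tar and GNU touch, the archive name `sstate.tgz` and the header name are made up, and "system A's fast clock" is simulated by stamping the file a day ahead before archiving. GNU tar may print a warning about the future time stamp during the plain extraction.

```shell
#!/bin/sh
# Sketch: compare extraction with and without tar's -m (--touch) option.
# File and directory names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir src with_m without_m

# "System A": its clock runs ahead, simulated with a future mtime.
touch -d 'tomorrow' src/glib.h
tar -czf sstate.tgz -C src glib.h

# "System B": extract once preserving mtimes, once with -m.
tar -xzf  sstate.tgz -C without_m
tar -xmzf sstate.tgz -C with_m

# Reference file created after both extractions, so its mtime is "now".
touch after.ref

# Count extracted files that are newer than the local "now".
future_without_m=$(find without_m -type f -newer after.ref | wc -l)
future_with_m=$(find with_m -type f -newer after.ref | wc -l)

echo "future-dated files without -m: $future_without_m"
echo "future-dated files with -m:    $future_with_m"
```

Without `-m` the archived timestamp is restored and the header ends up ahead of the local clock; with `-m` the file simply keeps the time at which it was written during extraction, so nothing is dated in the future. The cost, as argued earlier in the thread, is that the original timestamps are lost.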
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> >> >> But the real bug is the time mismatch in the autobuilders, isn't it?
>> >> >> And this can/should be solved by synchronizing time by ntp on them
>> >> >> instead of applying dirty hacks like resetting file dates.
> ...
> Imagine system A generates the sysroot headers with a time ahead of
> system B. These are packaged up into an sstate tarball. System B which
> has a clock at some time behind system A then downloads and uses them
> so the sysroot headers become some time in the future.

This can not happen when both machines are synchronizing their time with
ntp. Drift to stratum-1 machine is usually <1ms in local networks and
<50ms for remote ones (--> see 'ntpq' -> pe output).

Nothing, which can cause the problem described by you.

> The alternative is to mandate *every* system that builds are run on
> use ntp

Yes; a common timesource is mandatory for nearly every distributed
system. Even windoze enables (s)ntp clients by default (although its
daily synchronization is just a bad joke) and I remember Fedora/Ubuntu
enabling it by default too.

> and add checks to sanity.bbclass to this effect since someone might
> try using a sstate feed with a bad clock. This would cause no end of
> problems, not least with corporate firewalls

Every non-trivial network has local ntp servers which are used by clients
there.

> and hurt usability of the project

How common is the distributed autobuilder setup? How many of these
installations do not use ntp?

Enrico
On Mon, 2012-10-29 at 22:41 +0100, Enrico Scholz wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
>
> >> >> >> But the real bug is the time mismatch in the autobuilders, isn't it?
> >> >> >> And this can/should be solved by synchronizing time by ntp on them
> >> >> >> instead of applying dirty hacks like resetting file dates.
> > ...
> > Imagine system A generates the sysroot headers with a time ahead of
> > system B. These are packaged up into an sstate tarball. System B which
> > has a clock at some time behind system A then downloads and uses them
> > so the sysroot headers become some time in the future.
>
> This can not happen when both machines are synchronizing their time with
> ntp. Drift to stratum-1 machine is usually <1ms in local networks and
> <50ms for remote ones (--> see 'ntpq' -> pe output).
>
> Nothing, which can cause the problem described by you.

I have already agreed that this cannot happen if the machines are in
good sync via ntp, I'm not arguing with that.

> > The alternative is to mandate *every* system that builds are run on
> > use ntp
>
> Yes; a common timesource is mandatory for nearly every distributed
> system. Even windoze enables (s)ntp clients by default (although its
> daily synchronization is just a bad joke) and I remember Fedora/Ubuntu
> enabling it by default too.
>
> > and add checks to sanity.bbclass to this effect since someone might
> > try using a sstate feed with a bad clock. This would cause no end of
> > problems, not least with corporate firewalls
>
> Every non-trivial network has local ntp servers which are used by clients
> there.

Most of the time I get email from people who are technically clued up on
decent non-trivial networks (say Intel). The number of emails I get
where the sender's machine's clock is way out is surprisingly high. That
alone suggests that relying on people to set up ntp is not going to work.

> > and hurt usability of the project
>
> How common is the distributed autobuilder setup? How many of these
> installations do not use ntp?

The local.conf.sample details how a user can use a public sstate feed.
If at the same time they don't ensure their clock is correct, things
will break in unexpected and nasty ways.

The yocto autobuilder infrastructure had some misconfiguration and
failed due to ntp not working properly and the clocks going out of sync.
Having seen the problem there, spent many hours tracking it down and
asked for it to get fixed, the Intel autobuilders then suffered exactly
the same issue for different reasons (effectively firewalls again
though).

All the evidence I have says relying on working ntp setups is not good
enough in the real world much as you, I and others wish it were so.
There is a simple and easy way to avoid this problem with tar -m so I
think we have a good justification for that.

Cheers,

Richard
Patch
When packages are recreated after a 'bitbake -c clean', files will get
wrong date because tar has been invoked with the '-m' option.

Correct timestamps are useful for bug hunting and there are better
ways (e.g. using of ntp) than using '-m'.

Signed-off-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
---
 meta/classes/sstate.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index cbb14e1..d7c4e11 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -551,7 +551,7 @@ sstate_create_package () {
 sstate_unpack_package () {
 	mkdir -p ${SSTATE_INSTDIR}
 	cd ${SSTATE_INSTDIR}
-	tar -xmvzf ${SSTATE_PKG}
+	tar -xvzf ${SSTATE_PKG}
 }
 
 # Need to inject information about classes not in the global configuration scope