Patchwork sstate.bbclass: preserve time when unstaging files

Submitter Enrico Scholz
Date Oct. 29, 2012, 3:11 p.m.
Message ID <1351523465-26489-1-git-send-email-enrico.scholz@sigma-chemnitz.de>
Permalink /patch/38665/
State New

Comments

Enrico Scholz - Oct. 29, 2012, 3:11 p.m.
When packages are recreated after a 'bitbake -c clean', files will get
the wrong date because tar has been invoked with the '-m' option.

Correct timestamps are useful for bug hunting and there are better
ways (e.g. using ntp) than using '-m'.

Signed-off-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
---
 meta/classes/sstate.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Richard Purdie - Oct. 29, 2012, 4:11 p.m.
On Mon, 2012-10-29 at 16:11 +0100, Enrico Scholz wrote:
> When packages are recreated after a 'bitbake -c clean', files will get
> the wrong date because tar has been invoked with the '-m' option.
> 
> Correct timestamps are useful for bug hunting and there are better
> ways (e.g. using ntp) than using '-m'.
> 
> Signed-off-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
> ---
>  meta/classes/sstate.bbclass | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> index cbb14e1..d7c4e11 100644
> --- a/meta/classes/sstate.bbclass
> +++ b/meta/classes/sstate.bbclass
> @@ -551,7 +551,7 @@ sstate_create_package () {
>  sstate_unpack_package () {
>  	mkdir -p ${SSTATE_INSTDIR}
>  	cd ${SSTATE_INSTDIR}
> -	tar -xmvzf ${SSTATE_PKG}
> +	tar -xvzf ${SSTATE_PKG}
>  }
>  
>  # Need to inject information about classes not in the global configuration scope

This is a revert of:

http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=2d89cff42af2bb0049224bfaaebaa2b21966169f

where the option was added deliberately to deal with time mismatch
between autobuilders which was causing real world bugs.

Cheers,

Richard
Enrico Scholz - Oct. 29, 2012, 4:24 p.m.
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> When packages are recreated after a 'bitbake -c clean', files will get
>> the wrong date because tar has been invoked with the '-m' option.
>> 
>> Correct timestamps are useful for bug hunting and there are better
>> ways (e.g. using ntp) than using '-m'.
>
> This is a revert of:
>
> http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=2d89cff42af2bb0049224bfaaebaa2b21966169f
>
> where the option was added deliberately to deal with time mismatch
> between autobuilders which was causing real world bugs.

But the real bug is the time mismatch in the autobuilders, isn't it?
And this can/should be solved by synchronizing time by ntp on them
instead of applying dirty hacks like resetting file dates.


Enrico
Richard Purdie - Oct. 29, 2012, 5:22 p.m.
On Mon, 2012-10-29 at 17:24 +0100, Enrico Scholz wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
> 
> >> When packages are recreated after a 'bitbake -c clean', files will get
> >> the wrong date because tar has been invoked with the '-m' option.
> >> 
> >> Correct timestamps are useful for bug hunting and there are better
> >> ways (e.g. using ntp) than using '-m'.
> >
> > This is a revert of:
> >
> > http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=2d89cff42af2bb0049224bfaaebaa2b21966169f
> >
> > where the option was added deliberately to deal with time mismatch
> > between autobuilders which was causing real world bugs.
> 
> But the real bug is the time mismatch in the autobuilders, isn't it?
> And this can/should be solved by synchronizing time by ntp on them
> instead of applying dirty hacks like resetting file dates.

I have asked that ntp be installed/fixed on the autobuilders to sort the
problem out but it seems that even with ntp running, mismatches can
happen (e.g. misconfigured timezones). Worse, when this does happen the
failures are extremely unpredictable and hard to debug. It causes things
to repeatedly recompile for example, even during do_install.

So no, I don't think this is a dirty hack; it's part of ensuring the
builds are deterministic and helping people avoid what can be a very
nasty and hard to debug set of build issues.

I appreciate it hurts some other debugging forensics but I'd rather that
than anyone suffering some of the nasty build failures I debugged.

Cheers,

Richard
Enrico Scholz - Oct. 29, 2012, 5:59 p.m.
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> > where the option was added deliberately to deal with time mismatch
>> > between autobuilders which was causing real world bugs.
>> 
>> But the real bug is the time mismatch in the autobuilders, isn't it?
>> And this can/should be solved by synchronizing time by ntp on them
>> instead of applying dirty hacks like resetting file dates.
>
> I have asked that ntp be installed/fixed on the autobuilders to sort the
> problem out but it seems that even with ntp running, mismatches can
> happen (e.g. misconfigured timezones).

Really? The only timezone-related problems might arise when a package
runs 'touch -d <date>' during the build.  But I cannot remember ever
having seen such a package, and this problem should be easy to
localize.

Otherwise, timezone configuration is irrelevant for comparing
timestamps or for setting the time from ntp.


> Worse, when this does happen the failures are extremely unpredictable
> and hard to debug. It causes things to repeatedly recompile for example,
> even during do_install.

How does do_install() come into play?  'populate-lic' are the only
sstate files created before do_install(), and I cannot imagine how they
affect the other build phases; do_compile() results are not sstated
(and with the 'tar -m' thing, timestamps go completely haywire, causing
unpredictable rebuilds).

When do_install() is executed, all the following sstate files
(populate_sysroot, package) are invalidated and must be recreated.  So
your problem with do_install() sounds more like an incomplete/racy
cleanup of old files.



Enrico
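Enrico's distinction between stored timestamps and wall-clock date strings can be checked with a short shell sketch. It assumes GNU coreutils ('stat -c', 'touch -d') and uses POSIX-style TZ strings ('UTC0', 'CET-1') so that no zoneinfo files are needed; the temporary file names are arbitrary. It only illustrates how stored mtimes relate to TZ and says nothing about hosts whose system clock itself is set wrongly.

```sh
#!/bin/sh
# Sketch only: assumes GNU coreutils (stat -c, touch -d) on Linux.
# POSIX TZ strings are used so no zoneinfo database is required:
#   UTC0 = UTC, CET-1 = one hour east of UTC (no DST rule).
tmp=$(mktemp -d)

# 1. An mtime is stored as seconds since the epoch; TZ only changes how it
#    is displayed, so the value that make and tar compare is the same.
touch "$tmp/f"
epoch_utc=$(TZ=UTC0 stat -c %Y "$tmp/f")
epoch_cet=$(TZ=CET-1 stat -c %Y "$tmp/f")
echo "stored mtime under UTC0: $epoch_utc, under CET-1: $epoch_cet"

# 2. The exception mentioned above: a wall-clock string given to
#    'touch -d' is interpreted in the local zone, so the result depends on TZ.
TZ=UTC0 touch -d '2012-10-29 12:00' "$tmp/a"
TZ=CET-1 touch -d '2012-10-29 12:00' "$tmp/b"
skew=$(( $(stat -c %Y "$tmp/a") - $(stat -c %Y "$tmp/b") ))
echo "'touch -d' results differ by ${skew}s"

rm -rf "$tmp"
```

Under these assumptions the two 'stat' readings agree, while the two 'touch -d' calls land one hour (3600 s) apart.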
Richard Purdie - Oct. 29, 2012, 6:19 p.m.
On Mon, 2012-10-29 at 18:59 +0100, Enrico Scholz wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
> 
> >> > where the option was added deliberately to deal with time mismatch
> >> > between autobuilders which was causing real world bugs.
> >> 
> >> But the real bug is the time mismatch in the autobuilders, isn't it?
> >> And this can/should be solved by synchronizing time by ntp on them
> >> instead of applying dirty hacks like resetting file dates.
> >
> > I have asked that ntp be installed/fixed on the autobuilders to sort the
> > problem out but it seems that even with ntp running, mismatches can
> > happen (e.g. misconfigured timezones).
> 
> Really? The only timezone-related problems might arise when a package
> runs 'touch -d <date>' during the build.  But I cannot remember ever
> having seen such a package, and this problem should be easy to
> localize.
> 
> Otherwise, timezone configuration is irrelevant for comparing
> timestamps or for setting the time from ntp.
>
> > Worse, when this does happen the failures are extremely unpredictable
> > and hard to debug. It causes things to repeatedly recompile for example,
> > even during do_install.
> 
> How does do_install() come into play?  'populate-lic' are the only
> sstate files created before do_install(), and I cannot imagine how they
> affect the other build phases; do_compile() results are not sstated
> (and with the 'tar -m' thing, timestamps go completely haywire, causing
> unpredictable rebuilds).
> 
> When do_install() is executed, all the following sstate files
> (populate_sysroot, package) are invalidated and must be recreated.  So
> your problem with do_install() sounds more like an incomplete/racy
> cleanup of old files.

Set the date stamp of some headers in the target sysroot of some key
system components (say glib) to a date about a day in the future, then
clean and rebuild some software that uses glib.

You will find that every time the recipe runs make, *everything* will
recompile. This will happen during do_install as well as do_compile, and
if your recipe calls make for any other reason, it will recompile again.

Normally the builds actually tolerate this quite well, until you get
something like qt4, where the do_install environment isn't set up to
support compiling, and then things explode in interesting ways.

Cheers,

Richard
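Richard's reproduction recipe can be modelled without BitBake. make rebuilds a target whenever a prerequisite is newer than it; the sketch below emulates that rule with the shell's '-nt' test rather than invoking make itself. The file names are hypothetical stand-ins for a sysroot header and an object file, and GNU 'touch -d @SECONDS' is assumed.

```sh
#!/bin/sh
# Sketch: emulate make's "prerequisite newer than target" rule with 'test -nt'.
# Assumes GNU touch (for '-d @SECONDS'); file names are placeholders.
tmp=$(mktemp -d)

# Run three "make" passes and print how many of them rebuilt the object.
count_rebuilds() {
    header=$1 object=$2 rebuilt=0
    for pass in 1 2 3; do
        if [ ! -e "$object" ] || [ "$header" -nt "$object" ]; then
            touch "$object"            # stands in for recompiling the object
            rebuilt=$((rebuilt + 1))
        fi
    done
    echo "$rebuilt"
}

# Header dated a day ahead of this machine's clock, as if it came from an
# sstate archive produced on a host with a fast clock.
touch -d "@$(( $(date +%s) + 86400 ))" "$tmp/future.h"
# Header stamped at extraction time, which is what 'tar -m' produces.
touch "$tmp/now.h"

future_rebuilds=$(count_rebuilds "$tmp/future.h" "$tmp/a.o")
sane_rebuilds=$(count_rebuilds "$tmp/now.h" "$tmp/b.o")
echo "future-dated header: rebuilt on $future_rebuilds of 3 passes"
echo "current header:      rebuilt on $sane_rebuilds of 3 passes"
rm -rf "$tmp"
```

The object built against the future-dated header is never considered up to date, which matches the "everything recompiles on every make call" symptom described above.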
Enrico Scholz - Oct. 29, 2012, 7 p.m.
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> >> But the real bug is the time mismatch in the autobuilders, isn't it?
>> >> And this can/should be solved by synchronizing time by ntp on them
>> >> instead of applying dirty hacks like resetting file dates.
>> ...
>> > Worse, when this does happen the failures are extremely unpredictable
>> > and hard to debug. It causes things to repeatedly recompile for example,
>> > even during do_install.
> ...
> Set the date stamp of some headers in the target sysroot of some key
> system components (say glib) to a date about a day in the future,

Are there really packages which create files dated in the future?
Perhaps a sanity check should be written which rejects files which are
newer than their containing directory and/or the time-of-day?


> then clean and rebuild some software that uses glib.

How will 'tar -m' fix this?  It just makes things worse, because the
files extracted with -m are always newer than without -m (in practice,
the time offset between hosts synchronized by ntp is far below 100ms,
which is enough for the build stages doing the sstate file extraction).




Enrico
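Enrico's suggested sanity check could look roughly like the following. 'check_future_mtimes' is a hypothetical helper, not an existing function in sanity.bbclass or sstate.bbclass; it covers only the "newer than the time-of-day" half of the suggestion, using a freshly created stamp file as the reference for "now".

```sh
#!/bin/sh
# Hypothetical helper (not part of OE-Core): list files under DIR whose
# mtime lies in the future and return 1 if there are any.
check_future_mtimes() {
    stamp=$(mktemp) || return 2
    # mktemp creates the stamp right now, so '-newer' selects files whose
    # modification time is after this moment.
    future=$(find "$1" -newer "$stamp")
    rm -f "$stamp"
    if [ -n "$future" ]; then
        echo "files dated in the future under $1:" >&2
        echo "$future" >&2
        return 1
    fi
    return 0
}

# Usage: a tree with one old file, then the same tree plus a future file.
dir=$(mktemp -d)
touch -d '@1000000000' "$dir/old.h"
check_future_mtimes "$dir"; clean_status=$?
touch -d "@$(( $(date +%s) + 86400 ))" "$dir/future.h"
check_future_mtimes "$dir"; dirty_status=$?
echo "clean tree: $clean_status, tree with future file: $dirty_status"
rm -rf "$dir"
```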
Richard Purdie - Oct. 29, 2012, 9:20 p.m.
On Mon, 2012-10-29 at 20:00 +0100, Enrico Scholz wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
> 
> >> >> But the real bug is the time mismatch in the autobuilders, isn't it?
> >> >> And this can/should be solved by synchronizing time by ntp on them
> >> >> instead of applying dirty hacks like resetting file dates.
> >> ...
> >> > Worse, when this does happen the failures are extremely unpredictable
> >> > and hard to debug. It causes things to repeatedly recompile for example,
> >> > even during do_install.
> > ...
> > Set the date stamp of some headers in the target sysroot of some key
> > system components (say glib) to a date about a day in the future,
> 
> Are there really packages which create files dated in the future?
> Perhaps a sanity check should be written which rejects files which are
> newer than their containing directory and/or the time-of-day?
>
> > then clean and rebuild some software that uses glib.
> 
> How will 'tar -m' fix this?  It just makes things worse, because the
> files extracted with -m are always newer than without -m (in practice,
> the time offset between hosts synchronized by ntp is far below 100ms,
> which is enough for the build stages doing the sstate file extraction).

Imagine system A generates the sysroot headers with a time ahead of
system B. These are packaged up into an sstate tarball. System B which
has a clock at some time behind system A then downloads and uses them so
the sysroot headers become some time in the future. You then see the
problem I described previously.

tar -m fixes this by timestamping things at the time of extraction,
thereby removing any issue of the timestamps being in the future. Yes,
we could add a step that iterates over the extracted files, checks
whether any are in the future and, if so, changes their timestamps, but
that seems like overkill when an option to tar resolves all the
problems.

The alternative is to mandate that *every* system builds are run on uses
ntp, and to add checks to sanity.bbclass to this effect, since someone
might try using an sstate feed with a bad clock. This would cause no end
of problems, not least with corporate firewalls, and hurt the usability
of the project, so we took the other option, which fixes things in a way
that should make this a non-issue.

Cheers,

Richard
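The extra step Richard mentions as the alternative to 'tar -m' (iterate over the extracted files and reset any future timestamps) might be sketched as below. 'clamp_future_mtimes' is hypothetical and is not what sstate.bbclass does; unlike 'tar -m', it keeps the original mtime of every file that is not in the future. GNU find and touch are assumed ('touch -h' so that symlinks themselves are adjusted).

```sh
#!/bin/sh
# Hypothetical alternative to 'tar -m' (not part of sstate.bbclass):
# after extraction, reset only the timestamps that lie in the future.
# Assumes GNU find and touch.
clamp_future_mtimes() {
    stamp=$(mktemp) || return 1
    # Everything newer than the stamp is dated in the future; give it the
    # stamp's times. '-h' adjusts symlinks themselves, not their targets.
    find "$1" -newer "$stamp" -exec touch -h -r "$stamp" {} +
    rm -f "$stamp"
}

# Usage: simulate an extracted tree with one past and one future file.
tree=$(mktemp -d)
touch -d '@1000000000' "$tree/old.h"
touch -d "@$(( $(date +%s) + 86400 ))" "$tree/future.h"
clamp_future_mtimes "$tree"
old_mtime=$(stat -c %Y "$tree/old.h")
future_mtime=$(stat -c %Y "$tree/future.h")
now=$(date +%s)
echo "old.h kept $old_mtime; future.h is now $((now - future_mtime))s in the past"
rm -rf "$tree"
```

The trade-off is an extra pass over every extracted tree in exchange for keeping the timestamps Enrico wants for debugging.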
Enrico Scholz - Oct. 29, 2012, 9:41 p.m.
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> >> >> But the real bug is the time mismatch in the autobuilders, isn't it?
>> >> >> And this can/should be solved by synchronizing time by ntp on them
>> >> >> instead of applying dirty hacks like resetting file dates.
> ...
> Imagine system A generates the sysroot headers with a time ahead of
> system B. These are packaged up into an sstate tarball. System B which
> has a clock at some time behind system A then downloads and uses them
> so the sysroot headers become some time in the future.

This cannot happen when both machines are synchronizing their time with
ntp.  The offset to a stratum-1 server is usually <1ms in local networks
and <50ms for remote ones (see the 'peers' output of 'ntpq').

An offset that small cannot cause the problem you described.


> The alternative is to mandate that *every* system builds are run on
> uses ntp

Yes; a common time source is mandatory for nearly every distributed
system.  Even windoze enables (s)ntp clients by default (although its
daily synchronization is just a bad joke), and I remember Fedora/Ubuntu
enabling it by default too.


> and add checks to sanity.bbclass to this effect since someone might
> try using an sstate feed with a bad clock. This would cause no end of
> problems, not least with corporate firewalls

Every non-trivial network has local ntp servers which are used by clients
there.


> and hurt usability of the project

How common is the distributed autobuilder setup?  How many of these
installations do not use ntp?



Enrico
Richard Purdie - Oct. 29, 2012, 9:55 p.m.
On Mon, 2012-10-29 at 22:41 +0100, Enrico Scholz wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
> 
> >> >> >> But the real bug is the time mismatch in the autobuilders, isn't it?
> >> >> >> And this can/should be solved by synchronizing time by ntp on them
> >> >> >> instead of applying dirty hacks like resetting file dates.
> > ...
> > Imagine system A generates the sysroot headers with a time ahead of
> > system B. These are packaged up into an sstate tarball. System B which
> > has a clock at some time behind system A then downloads and uses them
> > so the sysroot headers become some time in the future.
> 
> This cannot happen when both machines are synchronizing their time with
> ntp.  The offset to a stratum-1 server is usually <1ms in local networks
> and <50ms for remote ones (see the 'peers' output of 'ntpq').
> 
> An offset that small cannot cause the problem you described.

I have already agreed that this cannot happen if the machines are in
good sync via ntp, I'm not arguing with that.

> > The alternative is to mandate that *every* system builds are run on
> > uses ntp
> 
> Yes; a common time source is mandatory for nearly every distributed
> system.  Even windoze enables (s)ntp clients by default (although its
> daily synchronization is just a bad joke), and I remember Fedora/Ubuntu
> enabling it by default too.
> 
> 
> > and add checks to sanity.bbclass to this effect since someone might
> > try using an sstate feed with a bad clock. This would cause no end of
> > problems, not least with corporate firewalls
> 
> Every non-trivial network has local ntp servers which are used by clients
> there.

Most of the email I get is from people who are technically clued up and
on decent, non-trivial networks (say, Intel). The number of emails I get
where the sender's clock is way out is surprisingly high.

That alone suggests that relying on people to set up ntp is not going to
work.

> > and hurt usability of the project
> 
> How common is the distributed autobuilder setup?  How many of these
> installations do not use ntp?

The local.conf.sample details how a user can use a public sstate feed.
If at the same time they don't ensure their clock is correct, things
will break in unexpected and nasty ways.

The Yocto autobuilder infrastructure had some misconfiguration and
failed because ntp was not working properly and the clocks went out of
sync. After I had seen the problem there, spent many hours tracking it
down and asked for it to be fixed, the Intel autobuilders then suffered
exactly the same issue for different reasons (effectively firewalls
again, though). All the evidence I have says that relying on working ntp
setups is not good enough in the real world, much as you, I and others
wish it were.

There is a simple and easy way to avoid this problem with tar -m so I
think we have a good justification for that.

Cheers,

Richard

Patch

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index cbb14e1..d7c4e11 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -551,7 +551,7 @@ sstate_create_package () {
 sstate_unpack_package () {
 	mkdir -p ${SSTATE_INSTDIR}
 	cd ${SSTATE_INSTDIR}
-	tar -xmvzf ${SSTATE_PKG}
+	tar -xvzf ${SSTATE_PKG}
 }
 
 # Need to inject information about classes not in the global configuration scope
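The behavioural difference this one-line patch toggles can be reproduced outside BitBake. The sketch below assumes GNU tar, gzip and coreutils and uses a placeholder file name; it fakes an archive from a host whose clock ran an hour ahead, then extracts it once without '-m' (as the patch proposes) and once with '-m' (the behaviour the patch removes).

```sh
#!/bin/sh
# Sketch: compare mtimes after extracting with and without tar's '-m'.
# Assumes GNU tar, gzip and coreutils; 'config.h' is a placeholder name.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/keep" "$work/touch"

# Pretend the packaging host's clock was one hour ahead of ours.
ahead=$(( $(date +%s) + 3600 ))
touch -d "@$ahead" "$work/src/config.h"
tar -C "$work/src" -czf "$work/pkg.tgz" config.h

# Without -m: the archived (future) mtime is restored.
tar -C "$work/keep" -xzf "$work/pkg.tgz"
# With -m: the mtime is set to the time of extraction instead.
tar -C "$work/touch" -xmzf "$work/pkg.tgz"

kept=$(stat -c %Y "$work/keep/config.h")
touched=$(stat -c %Y "$work/touch/config.h")
now=$(date +%s)
echo "without -m: $((kept - now))s ahead of now"
echo "with -m:    $((now - touched))s behind now"
rm -rf "$work"
```

Under those assumptions the first extraction leaves config.h an hour in the future, which is the situation Richard describes; the second does not, at the cost of discarding the original timestamp Enrico wants to keep.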