Patchwork populate_sdk_base: repeat to tar archive file five time

Submitter rongqing.li@windriver.com
Date Oct. 16, 2013, 5:53 a.m.
Message ID <1381902795-3187-1-git-send-email-rongqing.li@windriver.com>
Permalink /patch/59989/
State New

Comments

rongqing.li@windriver.com - Oct. 16, 2013, 5:53 a.m.
From: Roy Li <rongqing.li@windriver.com>

[YOCTO #5287]

tar intermittently fails with "file changed as we read it"; work around
it by retrying the archive step up to five times.

Signed-off-by: Roy Li <rongqing.li@windriver.com>
---
 meta/classes/populate_sdk_base.bbclass |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
Otavio Salvador - Oct. 16, 2013, 6:24 a.m.
On Wed, Oct 16, 2013 at 2:53 AM,  <rongqing.li@windriver.com> wrote:
> From: Roy Li <rongqing.li@windriver.com>
>
> [YOCTO #5287]
>
> tar failed and reported that file changed as we read it, now
> we workaround it
>
> Signed-off-by: Roy Li <rongqing.li@windriver.com>

You must be kidding, right?! Loop 5 times?!? Why not fix the root cause
of the change?
rongqing.li@windriver.com - Oct. 16, 2013, 6:34 a.m.
On 10/16/2013 02:24 PM, Otavio Salvador wrote:
> On Wed, Oct 16, 2013 at 2:53 AM,  <rongqing.li@windriver.com> wrote:
>> From: Roy Li <rongqing.li@windriver.com>
>>
>> [YOCTO #5287]
>>
>> tar failed and reported that file changed as we read it, now
>> we workaround it
>>
>> Signed-off-by: Roy Li <rongqing.li@windriver.com>
>
> You must be kidding right?! loop 5 times?!? why not fix the root cause
> of the change?
>

Sorry, I do not know the root cause. Many people have spent a lot of
effort investigating without finding it. We sometimes suspect a kernel
issue on the build servers; if that is true, we cannot fix the build
servers and can only work around it in the code.
Otavio Salvador - Oct. 16, 2013, 6:39 a.m.
On Wed, Oct 16, 2013 at 3:34 AM, Rongqing Li <rongqing.li@windriver.com> wrote:
>
>
> On 10/16/2013 02:24 PM, Otavio Salvador wrote:
>>
>> On Wed, Oct 16, 2013 at 2:53 AM,  <rongqing.li@windriver.com> wrote:
>>>
>>> From: Roy Li <rongqing.li@windriver.com>
>>>
>>> [YOCTO #5287]
>>>
>>> tar failed and reported that file changed as we read it, now
>>> we workaround it
>>>
>>> Signed-off-by: Roy Li <rongqing.li@windriver.com>
>>
>>
>> You must be kidding right?! loop 5 times?!? why not fix the root cause
>> of the change?
>>
>
> Sorry, I do not know the root cause, and I see many people spent
> lots of efforts to investigate, but do not find the root cause,
> sometime we suspect it is the building servers kernel issue,
> if it is true, we can not fix the building servers, we only
> workaround the code.

The only possible cause is that it starts preparing the tar /before/ all
packages are unpacked. That would point to a missing dependency on a
task, or something like that.

How can I reproduce the issue?
rongqing.li@windriver.com - Oct. 16, 2013, 6:49 a.m.
On 10/16/2013 02:39 PM, Otavio Salvador wrote:
> On Wed, Oct 16, 2013 at 3:34 AM, Rongqing Li <rongqing.li@windriver.com> wrote:
>>
>>
>> On 10/16/2013 02:24 PM, Otavio Salvador wrote:
>>>
>>> On Wed, Oct 16, 2013 at 2:53 AM,  <rongqing.li@windriver.com> wrote:
>>>>
>>>> From: Roy Li <rongqing.li@windriver.com>
>>>>
>>>> [YOCTO #5287]
>>>>
>>>> tar failed and reported that file changed as we read it, now
>>>> we workaround it
>>>>
>>>> Signed-off-by: Roy Li <rongqing.li@windriver.com>
>>>
>>>
>>> You must be kidding right?! loop 5 times?!? why not fix the root cause
>>> of the change?
>>>
>>
>> Sorry, I do not know the root cause, and I see many people spent
>> lots of efforts to investigate, but do not find the root cause,
>> sometime we suspect it is the building servers kernel issue,
>> if it is true, we can not fix the building servers, we only
>> workaround the code.
>
> The only possible cause is it starting to prepare the tar /before/ all
> packages are unpack. This would be a missing dependency against a task
> or something like that.
>
> How can I reproduce the issue?
>

It is described in Bugzilla:

https://bugzilla.yoctoproject.org/show_bug.cgi?id=4757
Otavio Salvador - Oct. 16, 2013, 7:37 a.m.
On Wed, Oct 16, 2013 at 3:49 AM, Rongqing Li <rongqing.li@windriver.com> wrote:
>
>
> On 10/16/2013 02:39 PM, Otavio Salvador wrote:
>>
>> On Wed, Oct 16, 2013 at 3:34 AM, Rongqing Li <rongqing.li@windriver.com>
>> wrote:
>>>
>>>
>>>
>>> On 10/16/2013 02:24 PM, Otavio Salvador wrote:
>>>>
>>>>
>>>> On Wed, Oct 16, 2013 at 2:53 AM,  <rongqing.li@windriver.com> wrote:
>>>>>
>>>>>
>>>>> From: Roy Li <rongqing.li@windriver.com>
>>>>>
>>>>> [YOCTO #5287]
>>>>>
>>>>> tar failed and reported that file changed as we read it, now
>>>>> we workaround it
>>>>>
>>>>> Signed-off-by: Roy Li <rongqing.li@windriver.com>
>>>>
>>>>
>>>>
>>>> You must be kidding right?! loop 5 times?!? why not fix the root cause
>>>> of the change?
>>>>
>>>
>>> Sorry, I do not know the root cause, and I see many people spent
>>> lots of efforts to investigate, but do not find the root cause,
>>> sometime we suspect it is the building servers kernel issue,
>>> if it is true, we can not fix the building servers, we only
>>> workaround the code.
>>
>>
>> The only possible cause is it starting to prepare the tar /before/ all
>> packages are unpack. This would be a missing dependency against a task
>> or something like that.
>>
>> How can I reproduce the issue?
>>
>
> Describe in the bugzillar:
>
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=4757

This seems quite system specific.

I am wondering if this could be worked around by issuing a sync before
running the tar.
Richard Purdie - Oct. 16, 2013, 12:12 p.m.
On Wed, 2013-10-16 at 14:34 +0800, Rongqing Li wrote:
> 
> On 10/16/2013 02:24 PM, Otavio Salvador wrote:
> > On Wed, Oct 16, 2013 at 2:53 AM,  <rongqing.li@windriver.com> wrote:
> >> From: Roy Li <rongqing.li@windriver.com>
> >>
> >> [YOCTO #5287]
> >>
> >> tar failed and reported that file changed as we read it, now
> >> we workaround it
> >>
> >> Signed-off-by: Roy Li <rongqing.li@windriver.com>
> >
> > You must be kidding right?! loop 5 times?!? why not fix the root cause
> > of the change?
> >
> 
> Sorry, I do not know the root cause, and I see many people spent
> lots of efforts to investigate, but do not find the root cause,
> sometime we suspect it is the building servers kernel issue,
> if it is true, we can not fix the building servers, we only
> workaround the code.

This workaround is not going into master, it's horrid. Do we know which
versions of the kernel on the server have the issue? I'd much rather
tell people to fix their broken filesystems, for example, and refuse to
run on them.

Cheers,

Richard
rongqing.li@windriver.com - Oct. 17, 2013, 2:01 a.m.
On 10/16/2013 08:12 PM, Richard Purdie wrote:
> On Wed, 2013-10-16 at 14:34 +0800, Rongqing Li wrote:
>>
>> On 10/16/2013 02:24 PM, Otavio Salvador wrote:
>>> On Wed, Oct 16, 2013 at 2:53 AM,  <rongqing.li@windriver.com> wrote:
>>>> From: Roy Li <rongqing.li@windriver.com>
>>>>
>>>> [YOCTO #5287]
>>>>
>>>> tar failed and reported that file changed as we read it, now
>>>> we workaround it
>>>>
>>>> Signed-off-by: Roy Li <rongqing.li@windriver.com>
>>>
>>> You must be kidding right?! loop 5 times?!? why not fix the root cause
>>> of the change?
>>>
>>
>> Sorry, I do not know the root cause, and I see many people spent
>> lots of efforts to investigate, but do not find the root cause,
>> sometime we suspect it is the building servers kernel issue,
>> if it is true, we can not fix the building servers, we only
>> workaround the code.
>
> This workaround is not going into master, its horrid. Do we know which
> versions of the kernel on the server have the issue. I'd much rather
> tell people to fix their broken filesystems for example and refuse to
> run on them.
>

I saw it happen on CentOS 5.9 and RedHat 5.5. We have deployed lots
of them, and they are hard to replace.

The bug does not happen every time, only intermittently. This workaround
is ugly, but it is better than fixing the servers or declaring them
unsupported.


-Roy



> Cheers,
>
> Richard
>
>
>
Zhenhua Luo - Oct. 17, 2013, 8:52 a.m.
It is worth trying to add PSEUDO_UNLOAD=1 sync before tar. The issue no longer appears on my CentOS 5.9 after doing that.
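
A minimal sketch of the "sync before tar" idea, written as a standalone script rather than as the real tar_sdk() function. The directory and archive names are hypothetical stand-ins for ${SDK_OUTPUT}/${SDKPATH}, ${SDK_DEPLOY} and ${TOOLCHAIN_OUTPUTNAME}, the ${SDKTAROPTS} options are omitted, and whether this actually avoids the failure on a given host is an assumption to verify, not something this sketch proves.

```shell
#!/bin/sh
# Sketch only: stand-ins for the BitBake variables used by tar_sdk().
set -e

SDK_OUTPUT_DIR=$(mktemp -d)   # hypothetical stand-in for ${SDK_OUTPUT}/${SDKPATH}
SDK_DEPLOY_DIR=$(mktemp -d)   # hypothetical stand-in for ${SDK_DEPLOY}
ARCHIVE="$SDK_DEPLOY_DIR/toolchain.tar"

# Some content to archive.
echo "example" > "$SDK_OUTPUT_DIR/file.txt"

cd "$SDK_OUTPUT_DIR"

# Flush pending writes to disk before archiving. PSEUDO_UNLOAD=1 is set
# only for this one command, so that sync runs outside the fakeroot
# (pseudo) environment; without pseudo it is simply an unused variable.
PSEUDO_UNLOAD=1 sync

# Create the archive once, with no retry loop.
tar -c --file="$ARCHIVE" .
```

Inside the class, the single extra line would sit just above the existing tar invocation.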


Best Regards,

Zhenhua


> -----Original Message-----
> From: openembedded-core-bounces@lists.openembedded.org
> [mailto:openembedded-core-bounces@lists.openembedded.org] On Behalf Of
> Rongqing Li
> Sent: Thursday, October 17, 2013 10:02 AM
> To: Richard Purdie
> Cc: Otavio Salvador; Patches and discussions about the oe-core layer
> Subject: Re: [OE-core] [PATCH] populate_sdk_base: repeat to tar archive
> file five time
> 
> 
> 
> On 10/16/2013 08:12 PM, Richard Purdie wrote:
> > On Wed, 2013-10-16 at 14:34 +0800, Rongqing Li wrote:
> >>
> >> On 10/16/2013 02:24 PM, Otavio Salvador wrote:
> >>> On Wed, Oct 16, 2013 at 2:53 AM,  <rongqing.li@windriver.com> wrote:
> >>>> From: Roy Li <rongqing.li@windriver.com>
> >>>>
> >>>> [YOCTO #5287]
> >>>>
> >>>> tar failed and reported that file changed as we read it, now we
> >>>> workaround it
> >>>>
> >>>> Signed-off-by: Roy Li <rongqing.li@windriver.com>
> >>>
> >>> You must be kidding right?! loop 5 times?!? why not fix the root
> >>> cause of the change?
> >>>
> >>
> >> Sorry, I do not know the root cause, and I see many people spent lots
> >> of efforts to investigate, but do not find the root cause, sometime
> >> we suspect it is the building servers kernel issue, if it is true, we
> >> can not fix the building servers, we only workaround the code.
> >
> > This workaround is not going into master, its horrid. Do we know which
> > versions of the kernel on the server have the issue. I'd much rather
> > tell people to fix their broken filesystems for example and refuse to
> > run on them.
> >
> 
> I saw it happened on CentOS 5.9 and RedHat 5.5, and We deployed lots of
> them, it is hard to replace them.
> 
> The bug did not happen everytime, only intermittently, this workaround is
> ugly, but it is better than fixing the servers, or declaring them as
> unsupported server.
> 
> 
> -Roy
> 
> 
> 
> > Cheers,
> >
> > Richard
> >
> >
> >
> 
> --
> Best Reagrds,
> Roy | RongQing Li
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/openembedded-core
Otavio Salvador - Oct. 17, 2013, 11:58 a.m.
On Wed, Oct 16, 2013 at 11:01 PM, Rongqing Li <rongqing.li@windriver.com> wrote:
> On 10/16/2013 08:12 PM, Richard Purdie wrote:
>>
>> On Wed, 2013-10-16 at 14:34 +0800, Rongqing Li wrote:
>>>
>>>
>>> On 10/16/2013 02:24 PM, Otavio Salvador wrote:
>>>>
>>>> On Wed, Oct 16, 2013 at 2:53 AM,  <rongqing.li@windriver.com> wrote:
>>>>>
>>>>> From: Roy Li <rongqing.li@windriver.com>
>>>>>
>>>>> [YOCTO #5287]
>>>>>
>>>>> tar failed and reported that file changed as we read it, now
>>>>> we workaround it
>>>>>
>>>>> Signed-off-by: Roy Li <rongqing.li@windriver.com>
>>>>
>>>>
>>>> You must be kidding right?! loop 5 times?!? why not fix the root cause
>>>> of the change?
>>>>
>>>
>>> Sorry, I do not know the root cause, and I see many people spent
>>> lots of efforts to investigate, but do not find the root cause,
>>> sometime we suspect it is the building servers kernel issue,
>>> if it is true, we can not fix the building servers, we only
>>> workaround the code.
>>
>>
>> This workaround is not going into master, its horrid. Do we know which
>> versions of the kernel on the server have the issue. I'd much rather
>> tell people to fix their broken filesystems for example and refuse to
>> run on them.
>>
>
> I saw it happened on CentOS 5.9 and RedHat 5.5, and We deployed lots
> of them, it is hard to replace them.
>
> The bug did not happen everytime, only intermittently, this workaround
> is ugly, but it is better than fixing the servers, or declaring them
> as unsupported server.

This workaround may make the failure less likely, but it is not a fix
and nothing guarantees it is enough for all cases. So I think forcing tar
to rerun on every machine because of two broken distributions is not an
option. We need to find another way to fix this.
rongqing.li@windriver.com - Oct. 18, 2013, 5:33 a.m.
On 10/17/2013 04:52 PM, Luo Zhenhua-B19537 wrote:
> It is worth to give a try to add PSEUDO_UNLOAD=1 sync before tar. The issue doesn't appear on my CentOS 5.9 after doing that.
>
>
> Best Regards,


Thanks, I will test it


-Roy

>
> Zhenhua
>
>
>> -----Original Message-----
>> From: openembedded-core-bounces@lists.openembedded.org
>> [mailto:openembedded-core-bounces@lists.openembedded.org] On Behalf Of
>> Rongqing Li
>> Sent: Thursday, October 17, 2013 10:02 AM
>> To: Richard Purdie
>> Cc: Otavio Salvador; Patches and discussions about the oe-core layer
>> Subject: Re: [OE-core] [PATCH] populate_sdk_base: repeat to tar archive
>> file five time
>>
>>
>>
>> On 10/16/2013 08:12 PM, Richard Purdie wrote:
>>> On Wed, 2013-10-16 at 14:34 +0800, Rongqing Li wrote:
>>>>
>>>> On 10/16/2013 02:24 PM, Otavio Salvador wrote:
>>>>> On Wed, Oct 16, 2013 at 2:53 AM,  <rongqing.li@windriver.com> wrote:
>>>>>> From: Roy Li <rongqing.li@windriver.com>
>>>>>>
>>>>>> [YOCTO #5287]
>>>>>>
>>>>>> tar failed and reported that file changed as we read it, now we
>>>>>> workaround it
>>>>>>
>>>>>> Signed-off-by: Roy Li <rongqing.li@windriver.com>
>>>>>
>>>>> You must be kidding right?! loop 5 times?!? why not fix the root
>>>>> cause of the change?
>>>>>
>>>>
>>>> Sorry, I do not know the root cause, and I see many people spent lots
>>>> of efforts to investigate, but do not find the root cause, sometime
>>>> we suspect it is the building servers kernel issue, if it is true, we
>>>> can not fix the building servers, we only workaround the code.
>>>
>>> This workaround is not going into master, its horrid. Do we know which
>>> versions of the kernel on the server have the issue. I'd much rather
>>> tell people to fix their broken filesystems for example and refuse to
>>> run on them.
>>>
>>
>> I saw it happened on CentOS 5.9 and RedHat 5.5, and We deployed lots of
>> them, it is hard to replace them.
>>
>> The bug did not happen everytime, only intermittently, this workaround is
>> ugly, but it is better than fixing the servers, or declaring them as
>> unsupported server.
>>
>>
>> -Roy
>>
>>
>>
>>> Cheers,
>>>
>>> Richard
>>>
>>>
>>>
>>

Patch

diff --git a/meta/classes/populate_sdk_base.bbclass b/meta/classes/populate_sdk_base.bbclass
index b7ea851..87dea7b 100644
--- a/meta/classes/populate_sdk_base.bbclass
+++ b/meta/classes/populate_sdk_base.bbclass
@@ -111,7 +111,21 @@  fakeroot tar_sdk() {
 	# Package it up
 	mkdir -p ${SDK_DEPLOY}
 	cd ${SDK_OUTPUT}/${SDKPATH}
-	tar ${SDKTAROPTS} -c --file=${SDK_DEPLOY}/${TOOLCHAIN_OUTPUTNAME}.tar.bz2 .
+	set +e
+	count=0
+	while true; do
+		tar ${SDKTAROPTS} -c --file=${SDK_DEPLOY}/${TOOLCHAIN_OUTPUTNAME}.tar.bz2 .
+		if [ $? -eq 0 ] ; then
+			set -e
+			exit 0
+		fi
+		count=`expr $count + 1`
+		rm -rf ${SDK_DEPLOY}/${TOOLCHAIN_OUTPUTNAME}.tar.bz2
+		if [ $count -eq 5 ] ; then
+			set -e
+			exit 1
+		fi
+	done
 }
 
 fakeroot create_shar() {
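
As a point of comparison with the retry loop in the patch, here is a hedged sketch that inspects tar's exit status rather than retrying blindly. It assumes the behaviour described in current GNU tar documentation, where status 1 on create means some files changed while being archived and status 2 means a fatal error; older tar releases may differ. archive_dir is a hypothetical helper, not part of populate_sdk_base.bbclass.

```shell
#!/bin/sh
# Sketch: classify tar's exit status instead of looping.
# archive_dir is a hypothetical helper used only for illustration.
archive_dir() {
    src=$1
    out=$2
    status=0
    tar -c --file="$out" -C "$src" . || status=$?
    case $status in
        0) echo "ok" ;;
        1) echo "changed" ;;   # GNU tar: files changed while being archived
        *) echo "failed" ;;    # fatal error
    esac
    return "$status"
}

# Example run against a throwaway directory.
src=$(mktemp -d)
out=$(mktemp -d)/sdk.tar
echo "data" > "$src/file.txt"
result=$(archive_dir "$src" "$out")
echo "$result"
```

A caller could log the "changed" case separately to help track down which files are still being written when tar_sdk() runs.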