Patchwork [PATCHv4] scripts/sstate-sysroot-cruft.sh: add simple script to find files in sysroots not tracked by sstate

Submitter Martin Jansa
Date Nov. 15, 2012, 7:30 a.m.
Message ID <1352964649-31187-1-git-send-email-Martin.Jansa@gmail.com>
Permalink /patch/39103/
State Superseded, archived

Comments

Martin Jansa - Nov. 15, 2012, 7:30 a.m.
* it's not very universal, but works with default oe-core setup and
  shows basic HOW-TO. It can be improved later.

Signed-off-by: Martin Jansa <Martin.Jansa@gmail.com>
---
 V2: added .pyo to WHITELIST
     shorter filenames
     TMPDIR
     added duplicates but not shown

 V3: use also populate-sysroot.MACHINE, manifest name for populate-sysroot
     was changed in febeaf3d1b8917b660c7279b008d8b03337568e9

 V4: dropped eglibc-initial work around, it was fixed in oe-core
                                                                                                                                                              
 scripts/sstate-sysroot-cruft.sh | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100755 scripts/sstate-sysroot-cruft.sh
Chris Larson - Nov. 15, 2012, 3:25 p.m.
On Thu, Nov 15, 2012 at 12:30 AM, Martin Jansa <martin.jansa@gmail.com> wrote:

> * it's not very universal, but works with default oe-core setup and
>   shows basic HOW-TO. It can be improved later.
>
> Signed-off-by: Martin Jansa <Martin.Jansa@gmail.com>
> ---
>  V2: added .pyo to WHITELIST
>      shorter filenames
>      TMPDIR
>      added duplicates but not shown
>
>  V3: use also populate-sysroot.MACHINE, manifest name for populate-sysroot
>      was changed in febeaf3d1b8917b660c7279b008d8b03337568e9
>
>  V4: dropped eglibc-initial work around, it was fixed in oe-core
>
>  scripts/sstate-sysroot-cruft.sh | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>  create mode 100755 scripts/sstate-sysroot-cruft.sh
>
> diff --git a/scripts/sstate-sysroot-cruft.sh b/scripts/sstate-sysroot-cruft.sh
> new file mode 100755
> index 0000000..ca23dcf
> --- /dev/null
> +++ b/scripts/sstate-sysroot-cruft.sh
> @@ -0,0 +1,34 @@
> +#!/bin/sh
> +
> +# Used to find files installed in sysroot which are not tracked by sstate manifest
> +# Update BASE
> +
> +BASE="/OE/oe-core"
>

This seems interesting, but I have a few comments/concerns.

1) don't hardcode BASE, figure out the path relative to the script's
location, e.g. BASE="$(cd $(dirname $(dirname $0)) && pwd)"
2) output files shouldn't go into oe-core directly, as oe-core isn't
guaranteed to be writable, and it's more common to expect output from a
script like this to go relative to the current directory, or a temp
directory
3) extract TMPDIR from bitbake -e, rather than hardcoding that, as that
breaks for any distros or users which separate their tmpdirs by distro, or
set TCLIBCAPPEND = ""
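
Points 1) and 3) above could be sketched roughly as follows. This is only an illustration, not code from the thread: the function names parent_of_script_dir and tmpdir_from_bitbake_env are made up here, and the parsing assumes that bitbake -e prints the variable as a line of the form TMPDIR="value".

```shell
#!/bin/sh
# Sketch only: resolve paths instead of hardcoding them.
# Function names are hypothetical and not part of the submitted patch.

# Print the absolute path of the directory one level above the directory
# that contains the given script path (scripts/foo.sh -> parent of scripts/).
parent_of_script_dir() {
    (cd "$(dirname "$(dirname "$1")")" && pwd)
}

# Read `bitbake -e` style output on stdin and print the value of TMPDIR.
# Assumes the variable appears as a line of the form TMPDIR="value".
tmpdir_from_bitbake_env() {
    sed -n 's/^TMPDIR="\(.*\)"$/\1/p'
}

# Possible use inside the script (needs an initialised build environment):
#   BASE="$(parent_of_script_dir "$0")"
#   TMPDIR="$(bitbake -e | tmpdir_from_bitbake_env)"
```

Quoting "$0" and the nested dirname calls keeps the lookup working when the checkout path contains spaces.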
Martin Jansa - Nov. 15, 2012, 4:19 p.m.
On Thu, Nov 15, 2012 at 08:25:20AM -0700, Chris Larson wrote:
> On Thu, Nov 15, 2012 at 12:30 AM, Martin Jansa <martin.jansa@gmail.com> wrote:
> 
> > * it's not very universal, but works with default oe-core setup and
> >   shows basic HOW-TO. It can be improved later.
> >
> > Signed-off-by: Martin Jansa <Martin.Jansa@gmail.com>
> > ---
> >  V2: added .pyo to WHITELIST
> >      shorter filenames
> >      TMPDIR
> >      added duplicates but not shown
> >
> >  V3: use also populate-sysroot.MACHINE, manifest name for populate-sysroot
> >      was changed in febeaf3d1b8917b660c7279b008d8b03337568e9
> >
> >  V4: dropped eglibc-initial work around, it was fixed in oe-core
> >
> >  scripts/sstate-sysroot-cruft.sh | 34 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 34 insertions(+)
> >  create mode 100755 scripts/sstate-sysroot-cruft.sh
> >
> > diff --git a/scripts/sstate-sysroot-cruft.sh b/scripts/sstate-sysroot-cruft.sh
> > new file mode 100755
> > index 0000000..ca23dcf
> > --- /dev/null
> > +++ b/scripts/sstate-sysroot-cruft.sh
> > @@ -0,0 +1,34 @@
> > +#!/bin/sh
> > +
> > +# Used to find files installed in sysroot which are not tracked by sstate manifest
> > +# Update BASE
> > +
> > +BASE="/OE/oe-core"
> >
> 
> This seems interesting, but I have a few comments/concerns.
> 
> 1) don't hardcode BASE, figure out the path relative to the script's
> location, e.g. BASE="$(cd $(dirname $(dirname $0)) && pwd)"
> 2) output files shouldn't go into oe-core directly, as oe-core isn't
> guaranteed to be writable, and it's more common to expect output from a
> script like this to go relative to the current directory, or a temp
> directory
> 3) extract TMPDIR from bitbake -e, rather than hardcoding that, as that
> breaks for any distros or users which separate their tmpdirs by distro, or
> set TCLIBCAPPEND = ""

Ah in this case BASE is not directory with oe-core layer but one above.

I didn't want to use bitbake -e, so that this script can be executed
when some other bitbake process is running.

As said it's not really universal (I wrote it just to confirm something
and then shared it to find if someone else finds it also useful to
improve it).

What about moving output to TMPDIR and setting TMPDIR by parameter?

Cheers,
Chris Larson - Nov. 15, 2012, 4:27 p.m.
On Thu, Nov 15, 2012 at 9:19 AM, Martin Jansa <martin.jansa@gmail.com> wrote:

> On Thu, Nov 15, 2012 at 08:25:20AM -0700, Chris Larson wrote:
> > On Thu, Nov 15, 2012 at 12:30 AM, Martin Jansa <martin.jansa@gmail.com> wrote:
> >
> > > * it's not very universal, but works with default oe-core setup and
> > >   shows basic HOW-TO. It can be improved later.
> > >
> > > Signed-off-by: Martin Jansa <Martin.Jansa@gmail.com>
> > > ---
> > >  V2: added .pyo to WHITELIST
> > >      shorter filenames
> > >      TMPDIR
> > >      added duplicates but not shown
> > >
> > >  V3: use also populate-sysroot.MACHINE, manifest name for populate-sysroot
> > >      was changed in febeaf3d1b8917b660c7279b008d8b03337568e9
> > >
> > >  V4: dropped eglibc-initial work around, it was fixed in oe-core
> > >
> > >  scripts/sstate-sysroot-cruft.sh | 34 ++++++++++++++++++++++++++++++++++
> > >  1 file changed, 34 insertions(+)
> > >  create mode 100755 scripts/sstate-sysroot-cruft.sh
> > >
> > > diff --git a/scripts/sstate-sysroot-cruft.sh b/scripts/sstate-sysroot-cruft.sh
> > > new file mode 100755
> > > index 0000000..ca23dcf
> > > --- /dev/null
> > > +++ b/scripts/sstate-sysroot-cruft.sh
> > > @@ -0,0 +1,34 @@
> > > +#!/bin/sh
> > > +
> > > +# Used to find files installed in sysroot which are not tracked by sstate manifest
> > > +# Update BASE
> > > +
> > > +BASE="/OE/oe-core"
> > >
> >
> > This seems interesting, but I have a few comments/concerns.
> >
> > 1) don't hardcode BASE, figure out the path relative to the script's
> > location, e.g. BASE="$(cd $(dirname $(dirname $0)) && pwd)"
> > 2) output files shouldn't go into oe-core directly, as oe-core isn't
> > guaranteed to be writable, and it's more common to expect output from a
> > script like this to go relative to the current directory, or a temp
> > directory
> > 3) extract TMPDIR from bitbake -e, rather than hardcoding that, as that
> > breaks for any distros or users which separate their tmpdirs by distro, or
> > set TCLIBCAPPEND = ""
>
> Ah in this case BASE is not directory with oe-core layer but one above.
>

Still trivial to determine relative to the script's location. Adding an
extra call to dirname, or an extra '/..' isn't particularly tough.

> I didn't want to use bitbake -e, so that this script can be executed
> when some other bitbake process is running.
>

I don't see having to hardcode assumptions about the environment as a net
win, personally. At least make them arguments, or add an argument to
optionally use bitbake -e, or let it take vars from the environment.
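
A minimal sketch of the "argument, else environment" idea, assuming the tmpdir is passed as the first positional argument and that the environment variable is also called TMPDIR. The function name resolve_tmpdir is hypothetical, and this is not necessarily how a later version of the patch does it.

```shell
#!/bin/sh
# Sketch only: choose the tmpdir from an argument first, then the environment.
# resolve_tmpdir is a hypothetical helper, not part of the submitted patch.

# Print the tmpdir to use: the first argument if it is non-empty, otherwise
# the TMPDIR environment variable. Return 1 if neither is available.
resolve_tmpdir() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    elif [ -n "${TMPDIR:-}" ]; then
        printf '%s\n' "$TMPDIR"
    else
        echo "usage: pass the tmpdir as an argument or set TMPDIR" >&2
        return 1
    fi
}

# Possible use at the top of the script:
#   TMPDIR="$(resolve_tmpdir "$1")" || exit 1
```

Note that TMPDIR is also the conventional name for the system temporary directory on many hosts, so an explicit argument is the safer of the two inputs.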

> As said it's not really universal (I wrote it just to confirm something
> and then shared it to find if someone else finds it also useful to
> improve it).
>

If it can't be used in a wide variety of circumstances, I'd argue against
its inclusion in oe-core until such time that it's ready. The subject of
this thread didn't include "RFC" ;)
Paul Eggleton - Nov. 15, 2012, 4:49 p.m.
On Thursday 15 November 2012 17:19:07 Martin Jansa wrote:
> I didn't want to use bitbake -e, so that this script can be executed
> when some other bitbake process is running.

You really should not be trying to touch anything under TMPDIR let alone 
sstate files while bitbake is actually running against that same directory. I'm 
not sure why you would want to though...

Cheers,
Paul
Martin Jansa - Nov. 15, 2012, 5:03 p.m.
On Thu, Nov 15, 2012 at 04:49:01PM +0000, Paul Eggleton wrote:
> On Thursday 15 November 2012 17:19:07 Martin Jansa wrote:
> > I didn't want to use bitbake -e, so that this script can be executed
> > when some other bitbake process is running.
> 
> You really should not be trying to touch anything under TMPDIR let alone 
> sstate files while bitbake is actually running against that same directory. I'm 
> not sure why you would want to though...

While running builds for 8 MACHINEs in for loop I find it useful to
test one sysroot (from 1st MACHINE already built) while keeping
build for remaining machines running for rest of night and sometimes
following days..

At least that was my use-case when I wrote this script..

Yes I could have executed build only for 1st machine, wait till it's
finished, then run script, wait till it's finished and then start for
loop for remaining 7, but that does not allow me to fall asleep while
it's still building 1st :) or I have to plan it in advance and build
with much longer one-liner.

Cheers,
Martin Jansa - Nov. 15, 2012, 5:09 p.m.
On Thu, Nov 15, 2012 at 09:27:52AM -0700, Chris Larson wrote:
> On Thu, Nov 15, 2012 at 9:19 AM, Martin Jansa <martin.jansa@gmail.com> wrote:
>
> > On Thu, Nov 15, 2012 at 08:25:20AM -0700, Chris Larson wrote:
> > > On Thu, Nov 15, 2012 at 12:30 AM, Martin Jansa <martin.jansa@gmail.com> wrote:
> > >
> > > > * it's not very universal, but works with default oe-core setup and
> > > >   shows basic HOW-TO. It can be improved later.
> > > >
> > > > Signed-off-by: Martin Jansa <Martin.Jansa@gmail.com>
> > > > ---
> > > >  V2: added .pyo to WHITELIST
> > > >      shorter filenames
> > > >      TMPDIR
> > > >      added duplicates but not shown
> > > >
> > > >  V3: use also populate-sysroot.MACHINE, manifest name for populate-sysroot
> > > >      was changed in febeaf3d1b8917b660c7279b008d8b03337568e9
> > > >
> > > >  V4: dropped eglibc-initial work around, it was fixed in oe-core
> > > >
> > > >  scripts/sstate-sysroot-cruft.sh | 34 ++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 34 insertions(+)
> > > >  create mode 100755 scripts/sstate-sysroot-cruft.sh
> > > >
> > > > diff --git a/scripts/sstate-sysroot-cruft.sh b/scripts/sstate-sysroot-cruft.sh
> > > > new file mode 100755
> > > > index 0000000..ca23dcf
> > > > --- /dev/null
> > > > +++ b/scripts/sstate-sysroot-cruft.sh
> > > > @@ -0,0 +1,34 @@
> > > > +#!/bin/sh
> > > > +
> > > > +# Used to find files installed in sysroot which are not tracked by sstate manifest
> > > > +# Update BASE
> > > > +
> > > > +BASE="/OE/oe-core"
> > > >
> > >
> > > This seems interesting, but I have a few comments/concerns.
> > >
> > > 1) don't hardcode BASE, figure out the path relative to the script's
> > > location, e.g. BASE="$(cd $(dirname $(dirname $0)) && pwd)"
> > > 2) output files shouldn't go into oe-core directly, as oe-core isn't
> > > guaranteed to be writable, and it's more common to expect output from a
> > > script like this to go relative to the current directory, or a temp
> > > directory
> > > 3) extract TMPDIR from bitbake -e, rather than hardcoding that, as that
> > > breaks for any distros or users which separate their tmpdirs by distro, or
> > > set TCLIBCAPPEND = ""
> >
> > Ah in this case BASE is not directory with oe-core layer but one above.
> >
> 
> Still trivial to determine relative to the script's location. Adding an
> extra call to dirname, or an extra '/..' isn't particularly tough.

Yes, but that makes another assumption: that TMPDIR is on the same level
as the oe-core checkout.

> > I didn't want to use bitbake -e, so that this script can be executed
> > when some other bitbake process is running.
> >
> 
> I don't see having to hardcode assumptions about the environment as a net
> win, personally. At least make them arguments, or add an argument to
> optionally use bitbake -e, or let it take vars from the environment.

v5 is using argument or env variable (like sstate-cache-management.sh)

> > As said it's not really universal (I wrote it just to confirm something
> > and then shared it to find if someone else finds it also useful to
> > improve it).
> >
> 
> If it can't be used in a wide variety of circumstances, I'd argue against
> its inclusion in oe-core until such time that it's ready. The subject of
> this thread didn't include "RFC" ;)

I was using it in wide variety of circumstances just by updating 1
variable in script, but I agree that argument + shell history can make
it easier.

Cheers,

Patch

diff --git a/scripts/sstate-sysroot-cruft.sh b/scripts/sstate-sysroot-cruft.sh
new file mode 100755
index 0000000..ca23dcf
--- /dev/null
+++ b/scripts/sstate-sysroot-cruft.sh
@@ -0,0 +1,34 @@ 
+#!/bin/sh
+
+# Used to find files installed in sysroot which are not tracked by sstate manifest
+# Update BASE
+
+BASE="/OE/oe-core"
+TMPDIR="${BASE}/tmp-eglibc"
+
+OUTPUT=${BASE}/sysroot.cruft.`date "+%s"`
+WHITELIST="\/var\/pseudo\/*[^\/]*$ \/shlibs$ \.pyc$ \.pyo$"
+
+mkdir ${OUTPUT}
+find ${TMPDIR}/sstate-control -name \*.populate-sysroot\* -o -name \*.package\* | xargs cat | grep sysroots | \
+  sed 's#/$##g; s#///*#/#g' | \
+  # work around for paths ending with / for directories and multiplied // (e.g. paths to native sysroot)
+  sort > ${OUTPUT}/master.list.all
+sort -u ${OUTPUT}/master.list.all > ${OUTPUT}/master.list # -u because some directories are listed for more recipes
+find ${TMPDIR}/sysroots/ | \
+  sort > ${OUTPUT}/sysroot.list
+
+diff ${OUTPUT}/master.list.all ${OUTPUT}/master.list > ${OUTPUT}/duplicates
+diff ${OUTPUT}/master.list ${OUTPUT}/sysroot.list > ${OUTPUT}/diff.all
+
+cp ${OUTPUT}/diff.all ${OUTPUT}/diff
+for item in ${WHITELIST}; do 
+  sed -i "/${item}/d" ${OUTPUT}/diff;
+done
+
+# too many false positives for directories
+# echo "Following files are installed in sysroot at least twice"
+# cat ${OUTPUT}/duplicates
+
+echo "Following files are installed in sysroot, but not tracked by sstate"
+cat ${OUTPUT}/diff
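
The comparison step in the patch uses diff on two sorted lists. As an alternative sketch (not what the patch does), comm can print only the entries that appear in the sysroot listing but not in the manifest listing. The function name untracked_files is hypothetical, and both inputs are assumed to be sorted in the C locale.

```shell
#!/bin/sh
# Sketch only: list entries present in the sysroot listing ($2) but absent
# from the manifest listing ($1). Both files must be sorted in the C locale.
# untracked_files is a hypothetical helper, not part of the submitted patch.
untracked_files() {
    # comm -13 suppresses lines unique to file 1 and lines common to both,
    # leaving only the lines unique to file 2.
    LC_ALL=C comm -13 "$1" "$2"
}

# Possible use with the lists produced by the script:
#   untracked_files "${OUTPUT}/master.list" "${OUTPUT}/sysroot.list"
```

Unlike diff, this prints bare paths without "<" or ">" markers, so whitelist patterns anchored with $ still match the end of each path.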