Patchwork [3/5] scripts/sstate-sysroot-cruft.sh: add simple script to find files in sysroots not tracked by sstate

Submitter Martin Jansa
Date Dec. 5, 2012, 6:26 p.m.
Message ID <8184c008606ece423da0122a9fa6087509dfa0f5.1354731942.git.Martin.Jansa@gmail.com>
Permalink /patch/40451/
State Superseded, archived

Comments

Martin Jansa - Dec. 5, 2012, 6:26 p.m.
Signed-off-by: Martin Jansa <Martin.Jansa@gmail.com>
---
 scripts/sstate-sysroot-cruft.sh | 78 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100755 scripts/sstate-sysroot-cruft.sh
Enrico Scholz - Dec. 5, 2012, 7:04 p.m.
Martin Jansa <martin.jansa-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
writes:

> +WHITELIST="\/var\/pseudo\/*[^\/]*$ \/shlibs$ \.pyc$ \.pyo$"

Is it really wanted that this matches paths like '/var/pseudosomepath'?



Enrico
Martin Jansa - Dec. 5, 2012, 7:35 p.m.
On Wed, Dec 05, 2012 at 08:04:47PM +0100, Enrico Scholz wrote:
> Martin Jansa <martin.jansa-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> writes:
> 
> > +WHITELIST="\/var\/pseudo\/*[^\/]*$ \/shlibs$ \.pyc$ \.pyo$"
> 
> Is it really wanted that this matches paths like '/var/pseudosomepath'?

The intent was to match
.*/var/pseudo
and
.*/var/pseudo/somefile$

but it does also match /var/pseudosomepath. I can split it into 2 regexps if
that's likely to show up in a sysroot, or use "\/var\/pseudo\($\|\/[^\/]*$\)"

to match:
/OE/jansa-test/shr-core/tmp-eglibc/sysroots/om-gta02/var/pseudo
/OE/jansa-test/shr-core/tmp-eglibc/sysroots/om-gta02/var/pseudo/files.db
/OE/jansa-test/shr-core/tmp-eglibc/sysroots/om-gta02/var/pseudo/logs.db
/OE/jansa-test/shr-core/tmp-eglibc/sysroots/om-gta02/var/pseudo/pseudo.lock
/OE/jansa-test/shr-core/tmp-eglibc/sysroots/om-gta02/var/pseudo/pseudo.log
/OE/jansa-test/shr-core/tmp-eglibc/sysroots/om-gta02/var/pseudo/pseudo.pid
/OE/jansa-test/shr-core/tmp-eglibc/sysroots/om-gta02/var/pseudo/pseudo.socket

Cheers,
Enrico Scholz - Dec. 5, 2012, 10:49 p.m.
Martin Jansa <martin.jansa@gmail.com> writes:

> .*/var/pseudo
> and
> .*/var/pseudo/somefile$
>
> which matches also /var/pseudosomepath, I can split it in 2 regexps if
> that's likely to show in sysroot, or use "\/var\/pseudo\($\|\/[^\/]*$\)"

A basic regexp '/var/pseudo\(/.*\)\?$' should suffice.  Btw, to avoid
escaping of '/', you can write

  sed -i "\!${var}!d"


Enrico
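
The delimiter trick above can be tried in isolation. This is a minimal sketch, assuming a sed that accepts "\!regexp!" as an address (POSIX allows any character after the backslash as the delimiter). The pattern value, the sample paths and the filter_paths helper are made up for illustration and are not part of the patch.

```shell
#!/bin/sh
# Delete lines matching a path regexp without escaping '/'.
# "\!regexp!" tells sed to use '!' instead of '/' as the address delimiter.
pattern='/var/pseudo$'   # hypothetical pattern, for illustration only

# filter_paths: read paths on stdin, drop those matching $pattern
filter_paths() {
  sed "\!${pattern}!d"
}

printf '%s\n' \
  '/sysroots/om-gta02/var/pseudo' \
  '/sysroots/om-gta02/usr/lib/libfoo.so' | filter_paths
```

The pattern itself must not contain '!', otherwise another delimiter has to be picked.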
Martin Jansa - Dec. 6, 2012, 4:30 a.m.
That would match .*/var/pseudo/too/bar, wouldn't it? But it's true that making
that group optional is probably easier to read. But who doesn't read
regexps fluently nowadays? :)
On Dec 5, 2012 11:49 PM, "Enrico Scholz" <enrico.scholz@sigma-chemnitz.de>
wrote:

> Martin Jansa <martin.jansa@gmail.com> writes:
>
> > .*/var/pseudo
> > and
> > .*/var/pseudo/somefile$
> >
> > which matches also /var/pseudosomepath, I can split it in 2 regexps if
> > that's likely to show in sysroot, or use "\/var\/pseudo\($\|\/[^\/]*$\)"
>
> A basic regexp '/var/pseudo\(/.*\)\?$' should suffice.  Btw, to avoid
> escaping of '/', you can write
>
>   sed -i "\!${var}!d"
>
>
> Enrico
>
Enrico Scholz - Dec. 6, 2012, 10:41 a.m.
Martin Jansa <martin.jansa@gmail.com> writes:

> That would match .*/var/pseudo/too/bar,wouldn't it?

Yes. If you really want to exclude such subpaths, you can match against
'/var/pseudo\(/[^/]*\)\?$'.

Enrico
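
The three patterns discussed in this thread can be compared side by side. The sketch below assumes GNU grep, since \? in a basic regexp is a GNU extension. The patterns are written in unescaped form, and the variable names, the matches helper and the /sysroots/x paths are hypothetical.

```shell
#!/bin/sh
# Compare the whitelist patterns from the thread on a few sample paths.
# Assumes GNU grep: \? is a GNU extension to basic regular expressions.

original='/var/pseudo/*[^/]*$'        # pattern from the patch, unescaped
any_depth='/var/pseudo\(/.*\)\?$'     # also matches nested subpaths
one_level='/var/pseudo\(/[^/]*\)\?$'  # the directory and its direct children

# matches PATTERN PATH: succeed when PATH matches PATTERN
matches() {
  printf '%s\n' "$2" | grep -q -- "$1"
}

for pattern in "$original" "$any_depth" "$one_level"; do
  printf 'pattern: %s\n' "$pattern"
  for path in /sysroots/x/var/pseudo \
              /sysroots/x/var/pseudo/files.db \
              /sysroots/x/var/pseudosomepath \
              /sysroots/x/var/pseudo/foo/bar; do
    if matches "$pattern" "$path"; then
      printf '  match     %s\n' "$path"
    else
      printf '  no match  %s\n' "$path"
    fi
  done
done
```

Only the original pattern accepts /var/pseudosomepath, and only the any_depth variant accepts the nested foo/bar path.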

Patch

diff --git a/scripts/sstate-sysroot-cruft.sh b/scripts/sstate-sysroot-cruft.sh
new file mode 100755
index 0000000..6caa252
--- /dev/null
+++ b/scripts/sstate-sysroot-cruft.sh
@@ -0,0 +1,78 @@ 
+#!/bin/sh
+
+# Used to find files installed in sysroot which are not tracked by sstate manifest
+
+# Global vars
+tmpdir=
+
+usage () {
+  cat << EOF
+Welcome to sysroot cruft finding utility.
+$0 <OPTION>
+
+Options:
+  -h, --help
+        Display this help and exit.
+
+  --tmpdir=<tmpdir>
+        Specify tmpdir, will use the environment variable TMPDIR if it is not specified.
+        Something like /OE/oe-core/tmp-eglibc (no / at the end).
+EOF
+}
+
+# Print error information and exit.
+echo_error () {
+  echo "ERROR: $1" >&2
+  exit 1
+}
+
+while [ -n "$1" ]; do
+  case $1 in
+    --tmpdir=*)
+      tmpdir=$(echo "$1" | sed -e 's#^--tmpdir=##' | xargs readlink -e)
+      [ -d "$tmpdir" ] || echo_error "Invalid argument to --tmpdir"
+      shift
+        ;;
+    --help|-h)
+      usage
+      exit 0
+        ;;
+    *)
+      echo "Invalid arguments $*"
+      echo_error "Try '$0 -h' for more information."
+        ;;
+  esac
+done
+
+# build tmpdir: fall back to the environment variable TMPDIR
+# if --tmpdir was not given; error out if neither is set.
+[ -n "$tmpdir" ] || tmpdir=$TMPDIR
+[ -n "$tmpdir" ] || echo_error "No tmpdir found!"
+[ -d "$tmpdir" ] || echo_error "Invalid tmpdir \"$tmpdir\""
+
+OUTPUT=${tmpdir}/sysroot.cruft.`date "+%s"`
+WHITELIST="\/var\/pseudo\/*[^\/]*$ \/shlibs$ \.pyc$ \.pyo$"
+
+mkdir ${OUTPUT}
+find ${tmpdir}/sstate-control -name \*.populate-sysroot\* -o -name \*.package\* | xargs cat | grep sysroots | \
+  sed 's#/$##g; s#///*#/#g' | \
+  # work around for paths ending with / for directories and multiplied // (e.g. paths to native sysroot)
+  sort > ${OUTPUT}/master.list.all
+sort -u ${OUTPUT}/master.list.all > ${OUTPUT}/master.list # -u because some directories are listed for more recipes
+find ${tmpdir}/sysroots/ | \
+  sort > ${OUTPUT}/sysroot.list
+
+diff ${OUTPUT}/master.list.all ${OUTPUT}/master.list > ${OUTPUT}/duplicates
+diff ${OUTPUT}/master.list ${OUTPUT}/sysroot.list > ${OUTPUT}/diff.all
+
+cp ${OUTPUT}/diff.all ${OUTPUT}/diff
+for item in ${WHITELIST}; do
+  sed -i "/${item}/d" ${OUTPUT}/diff
+done
+
+# too many false positives for directories
+# echo "Following files are installed in sysroot at least twice"
+# cat ${OUTPUT}/duplicates
+
+echo "Following files are installed in sysroot, but not tracked by sstate"
+cat ${OUTPUT}/diff
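
The comparison step of the script can be illustrated on made-up data. This is only a sketch: it swaps in comm -13 for the diff used in the patch, so the output holds bare paths, and it applies a single whitelist pattern. All paths and the untracked helper are hypothetical.

```shell
#!/bin/sh
# Sketch of the manifest-vs-sysroot comparison on made-up data.
# Note: uses comm -13 rather than the diff from the patch.
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT

# Files recorded in the sstate manifests (sorted, de-duplicated).
printf '%s\n' \
  /sysroots/x/usr/lib/liba.so \
  /sysroots/x/usr/lib/libb.so | sort -u > "$work/master.list"

# Files actually present in the sysroot.
printf '%s\n' \
  /sysroots/x/usr/lib/liba.so \
  /sysroots/x/usr/lib/libb.so \
  /sysroots/x/usr/lib/stale.so \
  /sysroots/x/usr/lib/mod.pyc | sort > "$work/sysroot.list"

# untracked MASTER SYSROOT: print entries found only in SYSROOT,
# minus those matching the whitelisted .pyc suffix
untracked() {
  comm -13 "$1" "$2" | grep -v '\.pyc$'
}

untracked "$work/master.list" "$work/sysroot.list"
```

Both inputs have to be sorted with the same locale, which is why sort is applied to each list before comm reads it.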