Patchwork [2/3] package.bbclass: use oe.path.realpath()

login
register
mail settings
Submitter Enrico Scholz
Date Feb. 10, 2013, 12:41 p.m.
Message ID <69be0b1aae50eb097172bcfa2e9aa02157f62436.1360499939.git.enrico.scholz@sigma-chemnitz.de>
Download mbox | patch
Permalink /patch/44419/
State Accepted
Commit f2e16c655286aab465054db6727fa3365bef3124
Headers show

Comments

Enrico Scholz - Feb. 10, 2013, 12:41 p.m.
oe.path.realpath() provides are common and more correct implementation
for resolving symlinks within sysroot. Use it.

Old implementation suffered from lot of problems; e.g.

* redundant code

* calls 'os.stat()' which references files on host; this can give wrong
  results about existing/non-existing and can cause EPERM (instead of
  the catched ENONENT) exceptions

* does not deal with special cases like '..' leaving the sysroot.

Signed-off-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
---
 meta/classes/package.bbclass | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)
Richard Purdie - March 6, 2013, 11:49 a.m.
On Sun, 2013-02-10 at 13:41 +0100, Enrico Scholz wrote:
> oe.path.realpath() provides are common and more correct implementation
> for resolving symlinks within sysroot. Use it.
> 
> Old implementation suffered from lot of problems; e.g.
> 
> * redundant code
> 
> * calls 'os.stat()' which references files on host; this can give wrong
>   results about existing/non-existing and can cause EPERM (instead of
>   the catched ENONENT) exceptions
> 
> * does not deal with special cases like '..' leaving the sysroot.

Whilst these changes are good, they do come at a cost:

post symlink package changes
./perfscript -c 8c22531e491e6b0cfffaaa80d6bc75db757fc1d1
49:38.46,17:12.15

pre symlink package changes
./perfscript -c 1a80329b3fcf23ecc23e409a260b9b2182652f65
48:16.33,13:39.97

So it added 1m20 to the overall build time, but more worryingly, added
added nearly 3m30 to the time for:

bitbake virtual/kernel -c cleansstate; bitbake virtual/kernel

These tests are based on the script linked from
https://wiki.yoctoproject.org/wiki/Performance_Test where the kernel
test is this is the second number above list, overall build time is the
first.

Have you any time to look into this performance regression?

FWIW, "bitbake virtual/kernel -c package -P -f" shows that
package_fixsymlinks() takes a rather long time in the profile output.

Cheers,

Richard
Enrico Scholz - March 12, 2013, 10:25 a.m.
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> Old implementation suffered from lot of problems; e.g.
>> 
>> * redundant code
>> 
>> * calls 'os.stat()' which references files on host; this can give wrong
>>   results about existing/non-existing and can cause EPERM (instead of
>>   the catched ENONENT) exceptions
>> 
>> * does not deal with special cases like '..' leaving the sysroot.
>
> Whilst these changes are good, they do come at a cost:
>
> post symlink package changes
> ./perfscript -c 8c22531e491e6b0cfffaaa80d6bc75db757fc1d1
> 49:38.46,17:12.15
>
> pre symlink package changes
> ./perfscript -c 1a80329b3fcf23ecc23e409a260b9b2182652f65
> 48:16.33,13:39.97
>
> So it added 1m20 to the overall build time, but more worryingly, added
> added nearly 3m30 to the time for:
>
> bitbake virtual/kernel -c cleansstate; bitbake virtual/kernel
>
> These tests are based on the script linked from
> https://wiki.yoctoproject.org/wiki/Performance_Test where the kernel
> test is this is the second number above list, overall build time is the
> first.
>
> Have you any time to look into this performance regression?

Well, patch replaced a call to os.path.realpath() with a more accurate
variant honoring a sysroot. There must be more work be done to replace
things previously solved by the operating system with custom (python)
code.

I will try to write a C implementation of oe.path.realpath() but this
can take some time and I do not have an idea how to integrate it into
oe.


Enrico
Richard Purdie - March 12, 2013, 6:47 p.m.
On Tue, 2013-03-12 at 11:25 +0100, Enrico Scholz wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
> 
> >> Old implementation suffered from lot of problems; e.g.
> >> 
> >> * redundant code
> >> 
> >> * calls 'os.stat()' which references files on host; this can give wrong
> >>   results about existing/non-existing and can cause EPERM (instead of
> >>   the catched ENONENT) exceptions
> >> 
> >> * does not deal with special cases like '..' leaving the sysroot.
> >
> > Whilst these changes are good, they do come at a cost:
> >
> > post symlink package changes
> > ./perfscript -c 8c22531e491e6b0cfffaaa80d6bc75db757fc1d1
> > 49:38.46,17:12.15
> >
> > pre symlink package changes
> > ./perfscript -c 1a80329b3fcf23ecc23e409a260b9b2182652f65
> > 48:16.33,13:39.97
> >
> > So it added 1m20 to the overall build time, but more worryingly, added
> > added nearly 3m30 to the time for:
> >
> > bitbake virtual/kernel -c cleansstate; bitbake virtual/kernel
> >
> > These tests are based on the script linked from
> > https://wiki.yoctoproject.org/wiki/Performance_Test where the kernel
> > test is this is the second number above list, overall build time is the
> > first.
> >
> > Have you any time to look into this performance regression?
> 
> Well, patch replaced a call to os.path.realpath() with a more accurate
> variant honoring a sysroot. There must be more work be done to replace
> things previously solved by the operating system with custom (python)
> code.
> 
> I will try to write a C implementation of oe.path.realpath() but this
> can take some time and I do not have an idea how to integrate it into
> oe.

The issue is not an interpreted language verses C, the problem is the
shear number of syscalls. Each isdir() and islink() call means a stat
call into the kernel. For "bitbake virtual/kernel -c package", I'm
seeing 400,000 stat calls for 23,000 files. We know the filesystem
doesn't change so we could cache the stat calls and hopefully hence the
speed. I actually have some code I tried this with but its not working
as well as I'd like so I need to do some further investigation about
what is happening.

Cheers,

Richard
Enrico Scholz - March 12, 2013, 7:10 p.m.
Richard Purdie <richard.purdie@linuxfoundation.org> writes:

>> > Have you any time to look into this performance regression?
>> 
>> Well, patch replaced a call to os.path.realpath() with a more accurate
>> variant honoring a sysroot. There must be more work be done to replace
>> things previously solved by the operating system with custom (python)
>> code.
>> 
>> I will try to write a C implementation of oe.path.realpath() but this
>> can take some time and I do not have an idea how to integrate it into
>> oe.
>
> The issue is not an interpreted language verses C, the problem is the
> shear number of syscalls. Each isdir() and islink() call means a stat
> call into the kernel.

With

    try:
-        is_dir = os.path.isdir(file)
+        if assume_dir:
+            is_dir = True
+        else:
+            is_dir = os.path.isdir(file)
    except:

in meta/lib/oe/path.py you should be able to reduce calls to isdir().

A

-                rtarget = oe.path.realpath(path, inst_root, True, assume_dir = True)
+                rtarget = oe.path.realpath(path, inst_root, False, assume_dir = True)

in package.bbclass seems to be save (pkgfiles[] is filled by os.walk() so
that locations should be already physical) and might improve performance
too.

But I do not have numbers how much you can save by these changes.


Enrico

Patch

diff --git a/meta/classes/package.bbclass b/meta/classes/package.bbclass
index b10e6f6..a74ec8a 100644
--- a/meta/classes/package.bbclass
+++ b/meta/classes/package.bbclass
@@ -741,7 +741,8 @@  python split_and_strip_files () {
                     continue
 
                 try:
-                    s = os.stat(file)
+                    ltarget = oe.path.realpath(file, dvar, False)
+                    s = os.lstat(ltarget)
                 except OSError, (err, strerror):
                     if err != errno.ENOENT:
                         raise
@@ -752,11 +753,6 @@  python split_and_strip_files () {
                     # If it's a symlink, and points to an ELF file, we capture the readlink target
                     if os.path.islink(file):
                         target = os.readlink(file)
-                        if not os.path.isabs(target):
-                            ltarget = os.path.join(os.path.dirname(file), target)
-                        else:
-                            ltarget = target
-
                         if isELF(ltarget):
                             #bb.note("Sym: %s (%d)" % (ltarget, isELF(ltarget)))
                             symlinks[file] = target
@@ -1005,14 +1001,12 @@  python package_fixsymlinks () {
                 rpath = path[len(inst_root):]
                 pkg_files[pkg].append(rpath)
                 try:
-                    s = os.stat(path)
+                    rtarget = oe.path.realpath(path, inst_root, True)
+                    os.lstat(rtarget)
                 except OSError, (err, strerror):
                     if err != errno.ENOENT:
                         raise
-                    target = os.readlink(path)
-                    if target[0] != '/':
-                        target = os.path.join(os.path.dirname(path)[len(inst_root):], target)
-                    dangling_links[pkg].append(os.path.normpath(target))
+                    dangling_links[pkg].append(os.path.normpath(rtarget[len(inst_root):]))
 
     newrdepends = {}
     for pkg in dangling_links: