Patchwork [bitbake-devel,2/6] data_smart: Improve the calculation of config hash

login
register
mail settings
Submitter Dongxiao Xu
Date April 9, 2012, 8:41 a.m.
Message ID <ccb22c39f4aeb8e310f9a6f8e0f9c55c18ac6891.1333960654.git.dongxiao.xu@intel.com>
Download mbox | patch
Permalink /patch/25357/
State New
Headers show

Comments

Dongxiao Xu - April 9, 2012, 8:41 a.m.
The order of keys are not sensitive for config hash, so we need to
identify its order while calculating the md5 value.

While for certain values, order is also not sensitive (for example,
BBINCLUDED), we also need to identify its order while calculating md5
value.

This could fix the problem that Martin Jansa reported in the mailing
list:

http://lists.linuxtogo.org/pipermail/bitbake-devel/2012-March/002122.html

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 lib/bb/data_smart.py |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)
Richard Purdie - April 10, 2012, 11 p.m.
On Mon, 2012-04-09 at 16:41 +0800, Dongxiao Xu wrote:
> The order of keys are not sensitive for config hash, so we need to
> identify its order while calculating the md5 value.
> 
> While for certain values, order is also not sensitive (for example,
> BBINCLUDED), we also need to identify its order while calculating md5
> value.
> 
> This could fix the problem that Martin Jansa reported in the mailing
> list:
> 
> http://lists.linuxtogo.org/pipermail/bitbake-devel/2012-March/002122.html
> 
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
>  lib/bb/data_smart.py |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/bb/data_smart.py b/lib/bb/data_smart.py
> index 2c200db..cc61a03 100644
> --- a/lib/bb/data_smart.py
> +++ b/lib/bb/data_smart.py
> @@ -462,13 +462,17 @@ class DataSmart(MutableMapping):
>          self.delVar(var)
>  
>      def get_hash(self):
> -        data = ""
> +        data = {}
>          config_whitelist = set((self.getVar("BB_HASHCONFIG_WHITELIST", True) or "").split())
> +        config_sort = set((self.getVar("BB_HASHCONFIG_SORT", True) or "").split())
>          keys = set(key for key in iter(self) if not key.startswith("__"))
>          for key in keys:
>              if key in config_whitelist:
>                  continue
>              value = self.getVar(key, False) or ""
> -            data = data + key + ': ' + str(value) + '\n'
> +            if key in config_sort:
> +                value = " ".join(sorted(value.split()))
> +            data.update({key:value})
>  
> -        return hashlib.md5(data).hexdigest()
> +        data_str = str([(k, data[k]) for k in sorted(data.keys())])
> +        return hashlib.md5(data_str).hexdigest()


This and the corresponding change in bitbake.conf look rather worrying
to me. The order in BBINCLUDED is significant and if it changes we
should be reparsing.

Looking at the code, get_file_depends(d) is probably buggy due to the
use of set() which could reorder the data. We need to keep the data in
order there and this issue should then be resolved.

Lets fix the real bug.

Cheers,

Richard
Dongxiao Xu - April 16, 2012, 7:15 a.m.
On Wed, 2012-04-11 at 00:00 +0100, Richard Purdie wrote:
> On Mon, 2012-04-09 at 16:41 +0800, Dongxiao Xu wrote:
> > The order of keys are not sensitive for config hash, so we need to
> > identify its order while calculating the md5 value.
> > 
> > While for certain values, order is also not sensitive (for example,
> > BBINCLUDED), we also need to identify its order while calculating md5
> > value.
> > 
> > This could fix the problem that Martin Jansa reported in the mailing
> > list:
> > 
> > http://lists.linuxtogo.org/pipermail/bitbake-devel/2012-March/002122.html
> > 
> > Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> > ---
> >  lib/bb/data_smart.py |   10 +++++++---
> >  1 files changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/bb/data_smart.py b/lib/bb/data_smart.py
> > index 2c200db..cc61a03 100644
> > --- a/lib/bb/data_smart.py
> > +++ b/lib/bb/data_smart.py
> > @@ -462,13 +462,17 @@ class DataSmart(MutableMapping):
> >          self.delVar(var)
> >  
> >      def get_hash(self):
> > -        data = ""
> > +        data = {}
> >          config_whitelist = set((self.getVar("BB_HASHCONFIG_WHITELIST", True) or "").split())
> > +        config_sort = set((self.getVar("BB_HASHCONFIG_SORT", True) or "").split())
> >          keys = set(key for key in iter(self) if not key.startswith("__"))
> >          for key in keys:
> >              if key in config_whitelist:
> >                  continue
> >              value = self.getVar(key, False) or ""
> > -            data = data + key + ': ' + str(value) + '\n'
> > +            if key in config_sort:
> > +                value = " ".join(sorted(value.split()))
> > +            data.update({key:value})
> >  
> > -        return hashlib.md5(data).hexdigest()
> > +        data_str = str([(k, data[k]) for k in sorted(data.keys())])
> > +        return hashlib.md5(data_str).hexdigest()
> 
> 
> This and the corresponding change in bitbake.conf look rather worrying
> to me. The order in BBINCLUDED is significant and if it changes we
> should be reparsing.

Hi Richard,

Why do you say the order in BBINCLUDED is significant?

I saw the original code ignores the order when handling __depends and
__base_depends.

For example:

def mark_dependency(d, f):
    if f.startswith('./'):
        f = "%s/%s" % (os.getcwd(), f[2:]) 
    deps = d.getVar('__depends') or set() 
    deps.update([(f, cached_mtime(f))])
    d.setVar('__depends', deps)    

I think the get_file_depends(d) just follows the original logic.

Or do you mean the mark_dependency(d, f) is also buggy?

Thanks,
Dongxiao

> 
> Looking at the code, get_file_depends(d) is probably buggy due to the
> use of set() which could reorder the data. We need to keep the data in
> order there and this issue should then be resolved.
> 
> Lets fix the real bug.
> 
> Cheers,
> 
> Richard
> 
> 
>
Richard Purdie - April 16, 2012, 8:28 a.m.
On Mon, 2012-04-16 at 15:15 +0800, Xu, Dongxiao wrote:
> On Wed, 2012-04-11 at 00:00 +0100, Richard Purdie wrote:
> > On Mon, 2012-04-09 at 16:41 +0800, Dongxiao Xu wrote:
> > > The order of keys are not sensitive for config hash, so we need to
> > > identify its order while calculating the md5 value.
> > > 
> > > While for certain values, order is also not sensitive (for example,
> > > BBINCLUDED), we also need to identify its order while calculating md5
> > > value.
> > > 
> > > This could fix the problem that Martin Jansa reported in the mailing
> > > list:
> > > 
> > > http://lists.linuxtogo.org/pipermail/bitbake-devel/2012-March/002122.html
> > > 
> > > Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> > > ---
> > >  lib/bb/data_smart.py |   10 +++++++---
> > >  1 files changed, 7 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/lib/bb/data_smart.py b/lib/bb/data_smart.py
> > > index 2c200db..cc61a03 100644
> > > --- a/lib/bb/data_smart.py
> > > +++ b/lib/bb/data_smart.py
> > > @@ -462,13 +462,17 @@ class DataSmart(MutableMapping):
> > >          self.delVar(var)
> > >  
> > >      def get_hash(self):
> > > -        data = ""
> > > +        data = {}
> > >          config_whitelist = set((self.getVar("BB_HASHCONFIG_WHITELIST", True) or "").split())
> > > +        config_sort = set((self.getVar("BB_HASHCONFIG_SORT", True) or "").split())
> > >          keys = set(key for key in iter(self) if not key.startswith("__"))
> > >          for key in keys:
> > >              if key in config_whitelist:
> > >                  continue
> > >              value = self.getVar(key, False) or ""
> > > -            data = data + key + ': ' + str(value) + '\n'
> > > +            if key in config_sort:
> > > +                value = " ".join(sorted(value.split()))
> > > +            data.update({key:value})
> > >  
> > > -        return hashlib.md5(data).hexdigest()
> > > +        data_str = str([(k, data[k]) for k in sorted(data.keys())])
> > > +        return hashlib.md5(data_str).hexdigest()
> > 
> > 
> > This and the corresponding change in bitbake.conf look rather worrying
> > to me. The order in BBINCLUDED is significant and if it changes we
> > should be reparsing.
> 
> Hi Richard,
> 
> Why do you say the order in BBINCLUDED is significant?
> 
> I saw the original code ignores the order when handling __depends and
> __base_depends.
> 
> For example:
> 
> def mark_dependency(d, f):
>     if f.startswith('./'):
>         f = "%s/%s" % (os.getcwd(), f[2:]) 
>     deps = d.getVar('__depends') or set() 
>     deps.update([(f, cached_mtime(f))])
>     d.setVar('__depends', deps)    
> 
> I think the get_file_depends(d) just follows the original logic.
> 
> Or do you mean the mark_dependency(d, f) is also buggy?

In that case I think the original code is also buggy.

Cheers,

Richard
Dongxiao Xu - April 16, 2012, 8:41 a.m.
On Mon, 2012-04-16 at 09:28 +0100, Richard Purdie wrote:
> On Mon, 2012-04-16 at 15:15 +0800, Xu, Dongxiao wrote:
> > On Wed, 2012-04-11 at 00:00 +0100, Richard Purdie wrote:
> > > On Mon, 2012-04-09 at 16:41 +0800, Dongxiao Xu wrote:
> > > > The order of keys are not sensitive for config hash, so we need to
> > > > identify its order while calculating the md5 value.
> > > > 
> > > > While for certain values, order is also not sensitive (for example,
> > > > BBINCLUDED), we also need to identify its order while calculating md5
> > > > value.
> > > > 
> > > > This could fix the problem that Martin Jansa reported in the mailing
> > > > list:
> > > > 
> > > > http://lists.linuxtogo.org/pipermail/bitbake-devel/2012-March/002122.html
> > > > 
> > > > Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> > > > ---
> > > >  lib/bb/data_smart.py |   10 +++++++---
> > > >  1 files changed, 7 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/lib/bb/data_smart.py b/lib/bb/data_smart.py
> > > > index 2c200db..cc61a03 100644
> > > > --- a/lib/bb/data_smart.py
> > > > +++ b/lib/bb/data_smart.py
> > > > @@ -462,13 +462,17 @@ class DataSmart(MutableMapping):
> > > >          self.delVar(var)
> > > >  
> > > >      def get_hash(self):
> > > > -        data = ""
> > > > +        data = {}
> > > >          config_whitelist = set((self.getVar("BB_HASHCONFIG_WHITELIST", True) or "").split())
> > > > +        config_sort = set((self.getVar("BB_HASHCONFIG_SORT", True) or "").split())
> > > >          keys = set(key for key in iter(self) if not key.startswith("__"))
> > > >          for key in keys:
> > > >              if key in config_whitelist:
> > > >                  continue
> > > >              value = self.getVar(key, False) or ""
> > > > -            data = data + key + ': ' + str(value) + '\n'
> > > > +            if key in config_sort:
> > > > +                value = " ".join(sorted(value.split()))
> > > > +            data.update({key:value})
> > > >  
> > > > -        return hashlib.md5(data).hexdigest()
> > > > +        data_str = str([(k, data[k]) for k in sorted(data.keys())])
> > > > +        return hashlib.md5(data_str).hexdigest()
> > > 
> > > 
> > > This and the corresponding change in bitbake.conf look rather worrying
> > > to me. The order in BBINCLUDED is significant and if it changes we
> > > should be reparsing.
> > 
> > Hi Richard,
> > 
> > Why do you say the order in BBINCLUDED is significant?
> > 
> > I saw the original code ignores the order when handling __depends and
> > __base_depends.
> > 
> > For example:
> > 
> > def mark_dependency(d, f):
> >     if f.startswith('./'):
> >         f = "%s/%s" % (os.getcwd(), f[2:]) 
> >     deps = d.getVar('__depends') or set() 
> >     deps.update([(f, cached_mtime(f))])
> >     d.setVar('__depends', deps)    
> > 
> > I think the get_file_depends(d) just follows the original logic.
> > 
> > Or do you mean the mark_dependency(d, f) is also buggy?
> 
> In that case I think the original code is also buggy.

Could you explain more why __depends should be ordered? I saw the
variable's definition is set() type and there is no code depends on the
order of "__depends".

Thanks,
Dongxiao

> 
> Cheers,
> 
> Richard
>
Richard Purdie - April 16, 2012, 9 a.m.
On Mon, 2012-04-16 at 16:41 +0800, Xu, Dongxiao wrote:
> On Mon, 2012-04-16 at 09:28 +0100, Richard Purdie wrote:
> > On Mon, 2012-04-16 at 15:15 +0800, Xu, Dongxiao wrote:
> > > On Wed, 2012-04-11 at 00:00 +0100, Richard Purdie wrote:
> > > > On Mon, 2012-04-09 at 16:41 +0800, Dongxiao Xu wrote:
> > > > > The order of keys are not sensitive for config hash, so we need to
> > > > > identify its order while calculating the md5 value.
> > > > > 
> > > > > While for certain values, order is also not sensitive (for example,
> > > > > BBINCLUDED), we also need to identify its order while calculating md5
> > > > > value.
> > > > > 
> > > > > This could fix the problem that Martin Jansa reported in the mailing
> > > > > list:
> > > > > 
> > > > > http://lists.linuxtogo.org/pipermail/bitbake-devel/2012-March/002122.html
> > > > > 
> > > > > Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> > > > > ---
> > > > >  lib/bb/data_smart.py |   10 +++++++---
> > > > >  1 files changed, 7 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/lib/bb/data_smart.py b/lib/bb/data_smart.py
> > > > > index 2c200db..cc61a03 100644
> > > > > --- a/lib/bb/data_smart.py
> > > > > +++ b/lib/bb/data_smart.py
> > > > > @@ -462,13 +462,17 @@ class DataSmart(MutableMapping):
> > > > >          self.delVar(var)
> > > > >  
> > > > >      def get_hash(self):
> > > > > -        data = ""
> > > > > +        data = {}
> > > > >          config_whitelist = set((self.getVar("BB_HASHCONFIG_WHITELIST", True) or "").split())
> > > > > +        config_sort = set((self.getVar("BB_HASHCONFIG_SORT", True) or "").split())
> > > > >          keys = set(key for key in iter(self) if not key.startswith("__"))
> > > > >          for key in keys:
> > > > >              if key in config_whitelist:
> > > > >                  continue
> > > > >              value = self.getVar(key, False) or ""
> > > > > -            data = data + key + ': ' + str(value) + '\n'
> > > > > +            if key in config_sort:
> > > > > +                value = " ".join(sorted(value.split()))
> > > > > +            data.update({key:value})
> > > > >  
> > > > > -        return hashlib.md5(data).hexdigest()
> > > > > +        data_str = str([(k, data[k]) for k in sorted(data.keys())])
> > > > > +        return hashlib.md5(data_str).hexdigest()
> > > > 
> > > > 
> > > > This and the corresponding change in bitbake.conf look rather worrying
> > > > to me. The order in BBINCLUDED is significant and if it changes we
> > > > should be reparsing.
> > > 
> > > Hi Richard,
> > > 
> > > Why do you say the order in BBINCLUDED is significant?
> > > 
> > > I saw the original code ignores the order when handling __depends and
> > > __base_depends.
> > > 
> > > For example:
> > > 
> > > def mark_dependency(d, f):
> > >     if f.startswith('./'):
> > >         f = "%s/%s" % (os.getcwd(), f[2:]) 
> > >     deps = d.getVar('__depends') or set() 
> > >     deps.update([(f, cached_mtime(f))])
> > >     d.setVar('__depends', deps)    
> > > 
> > > I think the get_file_depends(d) just follows the original logic.
> > > 
> > > Or do you mean the mark_dependency(d, f) is also buggy?
> > 
> > In that case I think the original code is also buggy.
> 
> Could you explain more why __depends should be ordered? I saw the
> variable's definition is set() type and there is no code depends on the
> order of "__depends".

Well, your new code does!

Bitbake does care deeply about the order things are included in so I
believe this variable should reflect that.

Cheers,

Richard

Patch

diff --git a/lib/bb/data_smart.py b/lib/bb/data_smart.py
index 2c200db..cc61a03 100644
--- a/lib/bb/data_smart.py
+++ b/lib/bb/data_smart.py
@@ -462,13 +462,17 @@  class DataSmart(MutableMapping):
         self.delVar(var)
 
     def get_hash(self):
-        data = ""
+        data = {}
         config_whitelist = set((self.getVar("BB_HASHCONFIG_WHITELIST", True) or "").split())
+        config_sort = set((self.getVar("BB_HASHCONFIG_SORT", True) or "").split())
         keys = set(key for key in iter(self) if not key.startswith("__"))
         for key in keys:
             if key in config_whitelist:
                 continue
             value = self.getVar(key, False) or ""
-            data = data + key + ': ' + str(value) + '\n'
+            if key in config_sort:
+                value = " ".join(sorted(value.split()))
+            data.update({key:value})
 
-        return hashlib.md5(data).hexdigest()
+        data_str = str([(k, data[k]) for k in sorted(data.keys())])
+        return hashlib.md5(data_str).hexdigest()