Patchwork [bitbake-devel,1/1] fetch2/__init__.py: add an "auto" policy for BB_SRCREV_POLICY

login
register
mail settings
Submitter Robert Yang
Date Aug. 9, 2012, 8:56 a.m.
Message ID <2caf68f6aa334948840a647d7c34a59f0405d11b.1344482807.git.liezhi.yang@windriver.com>
Download mbox | patch
Permalink /patch/34129/
State New
Headers show

Comments

Robert Yang - Aug. 9, 2012, 8:56 a.m.
The newly added "auto" policy is used for the SRCREV="${AUTOREV}" recipe,
the current BB_SRCREV_POLICY are: "clear" (default) and "cache":

* When "clear", the "AUTOREV" recipe will not be cached in bb_cache.dat

* When "cache", the "AUTOREV" would be cached in bb_cache.dat, but it
  doesn't try to get the latest revision.

The "auto" policy is much like the "cache" except that it will try to
get the latest revision.

[YOCTO #2920]

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Robert Yang <liezhi.yang@windriver.com>
---
 bitbake/lib/bb/fetch2/__init__.py |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)
Richard Purdie - Aug. 21, 2012, 1:08 p.m.
On Thu, 2012-08-09 at 16:56 +0800, Robert Yang wrote:
> The newly added "auto" policy is used for the SRCREV="${AUTOREV}" recipe,
> the current BB_SRCREV_POLICY are: "clear" (default) and "cache":
> 
> * When "clear", the "AUTOREV" recipe will not be cached in bb_cache.dat
> 
> * When "cache", the "AUTOREV" would be cached in bb_cache.dat, but it
>   doesn't try to get the latest revision.
> 
> The "auto" policy is much like the "cache" except that it will try to
> get the latest revision.
> 
> [YOCTO #2920]
> 
> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
> Signed-off-by: Robert Yang <liezhi.yang@windriver.com>
> ---
>  bitbake/lib/bb/fetch2/__init__.py |   12 +++++++++---
>  1 files changed, 9 insertions(+), 3 deletions(-)

I've been asked why I've not taken this. The simple answer is that I
can't figure out why we need to do this. The commit message above says
what was done but not why.

I also strongly dislike "auto" as a name for this mode. "clear" and
"cache" are at least descriptions of the behaviour, "auto" is not.

So can you please explain why we need this change? What use case does it
fix or improve or how is it useful? At the very least the commit message
needs rewriting and the option renaming.

> 
> diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
> index 12ebce2..27b73c0 100644
> --- a/bitbake/lib/bb/fetch2/__init__.py
> +++ b/bitbake/lib/bb/fetch2/__init__.py
> @@ -249,8 +249,8 @@ def fetcher_init(d):
>      Calls before this must not hit the cache.
>      """
>      # When to drop SCM head revisions controlled by user policy
> -    srcrev_policy = d.getVar('BB_SRCREV_POLICY', True) or "clear"
> -    if srcrev_policy == "cache":
> +    srcrev_policy = get_srcrev_policy(d)
> +    if srcrev_policy == "cache" or srcrev_policy == "auto":
>          logger.debug(1, "Keeping SRCREV cache due to cache policy of: %s", srcrev_policy)
>      elif srcrev_policy == "clear":
>          logger.debug(1, "Clearing SRCREV cache due to cache policy of: %s", srcrev_policy)
> @@ -384,9 +384,13 @@ def subprocess_setup():
>      # SIGPIPE errors are known issues with gzip/bash
>      signal.signal(signal.SIGPIPE, signal.SIG_DFL)
>  
> +def get_srcrev_policy(d):
> +    return d.getVar('BB_SRCREV_POLICY', True) or "auto"
> +
>  def get_autorev(d):
>      #  only not cache src rev in autorev case
> -    if d.getVar('BB_SRCREV_POLICY', True) != "cache":
> +    srcrev_policy = get_srcrev_policy(d)
> +    if srcrev_policy != "cache" and srcrev_policy != "auto":
>          d.setVar('__BB_DONT_CACHE', '1')

This looks highly suspect. If you do this, it won't check the upstream
revision at each invocation of bitbake which is how AUTOINC is meant to
behave.

Now if you want that behaviour, fine, but calling it "auto" is rather
misleading. It also seems to contradict your statement above that you
want the latest revision since with this you won't always get it.

Cheers,

Richard

>      return "AUTOINC"
>  
> @@ -1048,6 +1052,8 @@ class FetchMethod(object):
>  
>          revs = bb.persist_data.persist('BB_URI_HEADREVS', d)
>          key = self.generate_revision_key(url, ud, d, name)
> +        if get_srcrev_policy(d) == "auto":
> +            revs[key] = self._latest_revision(url, ud, d, name)
>          try:
>              return revs[key]
>          except KeyError:
Jason Wessel - Aug. 21, 2012, 1:30 p.m.
On 08/21/2012 08:08 AM, Richard Purdie wrote:
> On Thu, 2012-08-09 at 16:56 +0800, Robert Yang wrote:
>> The newly added "auto" policy is used for the SRCREV="${AUTOREV}" recipe,
>> the current BB_SRCREV_POLICY are: "clear" (default) and "cache":
>>
>> * When "clear", the "AUTOREV" recipe will not be cached in bb_cache.dat
>>
>> * When "cache", the "AUTOREV" would be cached in bb_cache.dat, but it
>>   doesn't try to get the latest revision.
>>
>> The "auto" policy is much like the "cache" except that it will try to
>> get the latest revision.
>>
>> [YOCTO #2920]
>>
>> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
>> Signed-off-by: Robert Yang <liezhi.yang@windriver.com>
>> ---
>>  bitbake/lib/bb/fetch2/__init__.py |   12 +++++++++---
>>  1 files changed, 9 insertions(+), 3 deletions(-)
> I've been asked why I've not taken this. The simple answer is that I
> can't figure out why we need to do this. The commit message above says
> what was done but not why.
>
> I also strongly dislike "auto" as a name for this mode. "clear" and
> "cache" are at least descriptions of the behaviour, "auto" is not.
>
> So can you please explain why we need this change? What use case does it
> fix or improve or how is it useful? At the very least the commit message
> needs rewriting and the option renaming.


I had asked Robert to follow up on this and intended this as a RFC, not a final implementation.  This basically looks like the original patch I sent him when I was using the code to illustrate a possible solution to the problem of missing values from the data cache.

Let us turn this into a discussion about the problem and see if we can come up with a plausible answer.

We use the contrib/dump_cache.py to generate the entire list of possible recipes -> version -> package split mapping.  As you can imagine this data is important and can be used for a number of purposes.  I noticed a few values were missing when ever I ran the dump_cache.py and narrowed it down to the fact that anything that was listed as AUTOREV did not show up in the cache.

The desired outcome is to have the key value pairs that were used in the most recent parse show up in the cache, as opposed to completely dropping the cache values from being written at all.   For the version retrieval code, if the value in the cache is autorev, it should look up the value as one would expect.  In terms of the clean vs cache vs a 3rd state, I don't really care what key words get used.   You might not even need to add a 3rd state depending on the implementation of solution.

I didn't see a reason not to write out the cache such that it could be parsed externally.   Just because you write it, doesn't mean it has to get re-used in the case of the AUTOREV.

If I have not described the problem clearly enough, please ask further questions.  Is it a bit more clear what the nature of the problem is?

Cheers,
Jason.
Richard Purdie - Aug. 21, 2012, 1:59 p.m.
On Tue, 2012-08-21 at 08:30 -0500, Jason Wessel wrote:
> I had asked Robert to follow up on this and intended this as a RFC,
> not a final implementation.  This basically looks like the original
> patch I sent him when I was using the code to illustrate a possible
> solution to the problem of missing values from the data cache.
>
> Let us turn this into a discussion about the problem and see if we can
> come up with a plausible answer.
>
> We use the contrib/dump_cache.py to generate the entire list of
> possible recipes -> version -> package split mapping.  As you can
> imagine this data is important and can be used for a number of
> purposes.  I noticed a few values were missing when ever I ran the
> dump_cache.py and narrowed it down to the fact that anything that was
> listed as AUTOREV did not show up in the cache.
>
> The desired outcome is to have the key value pairs that were used in
> the most recent parse show up in the cache, as opposed to completely
> dropping the cache values from being written at all.   For the version
> retrieval code, if the value in the cache is autorev, it should look
> up the value as one would expect.  In terms of the clean vs cache vs a
> 3rd state, I don't really care what key words get used.   You might
> not even need to add a 3rd state depending on the implementation of
> solution.
>
> I didn't see a reason not to write out the cache such that it could be
> parsed externally.   Just because you write it, doesn't mean it has to
> get re-used in the case of the AUTOREV.
>
> If I have not described the problem clearly enough, please ask further
> questions.  Is it a bit more clear what the nature of the problem is?

The problem is a lot clearer now, you basically want the data written to
the cache in the __BB_DONT_CACHE case. The trouble with the patch is we
tell bitbake to drop the data for good reason which is why the patch is
not acceptable.

I can imagine complaints about saving out data only to discard it at
load time and at this point I think your use of the cache is diverging
from what bitbake itself wants to use it for. I suspect what we might
need to do is extend cache.py to allow execution of custom cache
handlers and the you can write out the data you want. We already added
UI specific support to the cache so UIs could put extra data into the
cache although that perhaps doesn't cover this usecase :(.

Cheers,

Richard
Jason Wessel - Aug. 21, 2012, 2:14 p.m.
On 08/21/2012 08:59 AM, Richard Purdie wrote:
> On Tue, 2012-08-21 at 08:30 -0500, Jason Wessel wrote:
>> I had asked Robert to follow up on this and intended this as a RFC,
>> not a final implementation.  This basically looks like the original
>> patch I sent him when I was using the code to illustrate a possible
>> solution to the problem of missing values from the data cache.
>>
>> Let us turn this into a discussion about the problem and see if we can
>> come up with a plausible answer.
>>
>> We use the contrib/dump_cache.py to generate the entire list of
>> possible recipes -> version -> package split mapping.  As you can
>> imagine this data is important and can be used for a number of
>> purposes.  I noticed a few values were missing when ever I ran the
>> dump_cache.py and narrowed it down to the fact that anything that was
>> listed as AUTOREV did not show up in the cache.
>>
>> The desired outcome is to have the key value pairs that were used in
>> the most recent parse show up in the cache, as opposed to completely
>> dropping the cache values from being written at all.   For the version
>> retrieval code, if the value in the cache is autorev, it should look
>> up the value as one would expect.  In terms of the clean vs cache vs a
>> 3rd state, I don't really care what key words get used.   You might
>> not even need to add a 3rd state depending on the implementation of
>> solution.
>>
>> I didn't see a reason not to write out the cache such that it could be
>> parsed externally.   Just because you write it, doesn't mean it has to
>> get re-used in the case of the AUTOREV.
>>
>> If I have not described the problem clearly enough, please ask further
>> questions.  Is it a bit more clear what the nature of the problem is?
> The problem is a lot clearer now, you basically want the data written to
> the cache in the __BB_DONT_CACHE case. The trouble with the patch is we
> tell bitbake to drop the data for good reason which is why the patch is
> not acceptable.
>
> I can imagine complaints about saving out data only to discard it at
> load time and at this point I think your use of the cache is diverging
> from what bitbake itself wants to use it for. I suspect what we might
> need to do is extend cache.py to allow execution of custom cache
> handlers and the you can write out the data you want. We already added
> UI specific support to the cache so UIs could put extra data into the
> cache although that perhaps doesn't cover this usecase :(.

Ultimately I believe this type of data is something we want to display in either the hob or external UIs.

The bitbake parse is obviously quite expensive, and it is throwing away things we want to use later.  Even in the case of autorev, it didn't really make sense to me why the cache entry is entirely dropped, because you lose other things that could be cached such as the package split information.  Is there any reason you would want to compute that every single time because from what I can tell it never changes.

While I don't fully understand the implementation of the cache, it seems to me there is possibly a middle ground where the entries that don't actually change can be preserved.   The previous patch was a crude attempt to insert the keyword "autorev" in the value section of the cached version, with the intent to populate it when it was accessed later.   I am fairly certain the implementation was not correct, but that was why it was intended as an RFC.

Cheers,
Jason.
Richard Purdie - Aug. 21, 2012, 2:58 p.m.
On Tue, 2012-08-21 at 09:14 -0500, Jason Wessel wrote:
> On 08/21/2012 08:59 AM, Richard Purdie wrote:
> > On Tue, 2012-08-21 at 08:30 -0500, Jason Wessel wrote:
> >> I had asked Robert to follow up on this and intended this as a RFC,
> >> not a final implementation.  This basically looks like the original
> >> patch I sent him when I was using the code to illustrate a possible
> >> solution to the problem of missing values from the data cache.
> >>
> >> Let us turn this into a discussion about the problem and see if we can
> >> come up with a plausible answer.
> >>
> >> We use the contrib/dump_cache.py to generate the entire list of
> >> possible recipes -> version -> package split mapping.  As you can
> >> imagine this data is important and can be used for a number of
> >> purposes.  I noticed a few values were missing when ever I ran the
> >> dump_cache.py and narrowed it down to the fact that anything that was
> >> listed as AUTOREV did not show up in the cache.
> >>
> >> The desired outcome is to have the key value pairs that were used in
> >> the most recent parse show up in the cache, as opposed to completely
> >> dropping the cache values from being written at all.   For the version
> >> retrieval code, if the value in the cache is autorev, it should look
> >> up the value as one would expect.  In terms of the clean vs cache vs a
> >> 3rd state, I don't really care what key words get used.   You might
> >> not even need to add a 3rd state depending on the implementation of
> >> solution.
> >>
> >> I didn't see a reason not to write out the cache such that it could be
> >> parsed externally.   Just because you write it, doesn't mean it has to
> >> get re-used in the case of the AUTOREV.
> >>
> >> If I have not described the problem clearly enough, please ask further
> >> questions.  Is it a bit more clear what the nature of the problem is?
> > The problem is a lot clearer now, you basically want the data written to
> > the cache in the __BB_DONT_CACHE case. The trouble with the patch is we
> > tell bitbake to drop the data for good reason which is why the patch is
> > not acceptable.
> >
> > I can imagine complaints about saving out data only to discard it at
> > load time and at this point I think your use of the cache is diverging
> > from what bitbake itself wants to use it for. I suspect what we might
> > need to do is extend cache.py to allow execution of custom cache
> > handlers and the you can write out the data you want. We already added
> > UI specific support to the cache so UIs could put extra data into the
> > cache although that perhaps doesn't cover this usecase :(.
> 
> Ultimately I believe this type of data is something we want to display
> in either the hob or external UIs.

It is always available to them as the data is in memory used by a given
UI, just not saved to the cache file.

> The bitbake parse is obviously quite expensive, and it is throwing
> away things we want to use later.  Even in the case of autorev, it
> didn't really make sense to me why the cache entry is entirely
> dropped, because you lose other things that could be cached such as
> the package split information.  Is there any reason you would want to
> compute that every single time because from what I can tell it never
> changes.

How do you tell which data changes and which does not? The auto revision
information is included in PV. PV can be included anywhere.

Yes, you can compute the dependency trees and so on but by the time
you've done it, you might as well reparse. The reparse is what triggers
the check back upstream to fetch a new revision (depending on the cache
policy).

> While I don't fully understand the implementation of the cache, it
> seems to me there is possibly a middle ground where the entries that
> don't actually change can be preserved.   The previous patch was a
> crude attempt to insert the keyword "autorev" in the value section of
> the cached version, with the intent to populate it when it was
> accessed later.   I am fairly certain the implementation was not
> correct, but that was why it was intended as an RFC.

The entry *can* change though. The revision changes, PV changes, PR
could change, the stamp files change and so on. Having magic strings we
sed later would just turn bitbake into a basketcase and what we have
today doesn't actually perform that badly.

Unfortunately it doesn't let you use the cache file in a way it was
never intended though. I don't think that is unreasonable, it simply
wasn't designed for that and I'm not sure changing the design makes
sense.

Cheers,

Richard

Patch

diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
index 12ebce2..27b73c0 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -249,8 +249,8 @@  def fetcher_init(d):
     Calls before this must not hit the cache.
     """
     # When to drop SCM head revisions controlled by user policy
-    srcrev_policy = d.getVar('BB_SRCREV_POLICY', True) or "clear"
-    if srcrev_policy == "cache":
+    srcrev_policy = get_srcrev_policy(d)
+    if srcrev_policy == "cache" or srcrev_policy == "auto":
         logger.debug(1, "Keeping SRCREV cache due to cache policy of: %s", srcrev_policy)
     elif srcrev_policy == "clear":
         logger.debug(1, "Clearing SRCREV cache due to cache policy of: %s", srcrev_policy)
@@ -384,9 +384,13 @@  def subprocess_setup():
     # SIGPIPE errors are known issues with gzip/bash
     signal.signal(signal.SIGPIPE, signal.SIG_DFL)
 
+def get_srcrev_policy(d):
+    return d.getVar('BB_SRCREV_POLICY', True) or "auto"
+
 def get_autorev(d):
     #  only not cache src rev in autorev case
-    if d.getVar('BB_SRCREV_POLICY', True) != "cache":
+    srcrev_policy = get_srcrev_policy(d)
+    if srcrev_policy != "cache" and srcrev_policy != "auto":
         d.setVar('__BB_DONT_CACHE', '1')
     return "AUTOINC"
 
@@ -1048,6 +1052,8 @@  class FetchMethod(object):
 
         revs = bb.persist_data.persist('BB_URI_HEADREVS', d)
         key = self.generate_revision_key(url, ud, d, name)
+        if get_srcrev_policy(d) == "auto":
+            revs[key] = self._latest_revision(url, ud, d, name)
         try:
             return revs[key]
         except KeyError: