Patchwork [bitbake-devel,0/2] Variable tracking: Cleaned up and rebased.

login
register
mail settings
Submitter Richard Purdie
Date Jan. 17, 2013, 11:29 a.m.
Message ID <1358422179.24249.24.camel@ted>
Download mbox | patch
Permalink /patch/42783/
State New
Headers show

Comments

Richard Purdie - Jan. 17, 2013, 11:29 a.m.
I figured out the issue, I shouldn't have been touching current at all.
Rather than deepcopy, its also better to use our own copy operation on
the children:

So with this applied, we get -e taking around 10s and parsing 3m26 so
about a 10% increase and the output of -e is the same before and after
that patch. I'm still not happy but that is obviously a better
proposition.

The real issue is these highly recursive and duplicate data structures
are inefficient so I think there is still much room for improvement.
Separating the variable and include history will likely help a lot.

The main point of the above is to prove how much the deepcopy hurts us
and also that there are ways we can fix it.

Cheers,

Richard
Richard Purdie - Jan. 18, 2013, 11:58 a.m.
To round this discussion out, I played with the code quite a bit and
whilst I love the idea of the code, I found a few bugs and there were
some structural issues in the way the code worked I didn't like. It was
extremely easy to unintentionally break the history tracking.

This was due to many operations doing "fall though" with the set
operation doing the final record, maybe with modified parameters.

Whatever we do, we need it to be robust enough to withstand other
improvements in the codebase so this would have been enough to block it.

I decided to see if I could improve things so I've taken the patches and
made quite some number of changes, I think the output with the revised
version is either equal to or in some cases improved to the original.
I've hopefully addresses the "robustness" issues with those changes as
it feels less fragile to me and I removed some sources of confusion like
the "value" verses "detail" business, standardising on "detail" in the
end for a practicality reasons.

I still don't think the result is perfect however I do propose to merge
it in the revised form I've just posted. I think any further
optimisations or improvements can be done in tree.

Performance wise, with the code disabled, we're now down to 3m5 for
overall parsing compared with 3m3 before which is in the region of
statistical noise. The -e output is in around 10s which I think is
reasonable for the functionality.

I appreciate its taken a while to get here with this. I'm hopeful the
time waiting for review will be worth the revised patches.

Cheers,

Richard

Patch

--- a/bitbake/lib/bb/data_smart.py
+++ b/bitbake/lib/bb/data_smart.py
@@ -174,7 +174,14 @@  class IncludeHistory(object):
         self.filename = filename or '[TOP LEVEL]'
         self.children = []
         self.current = self
-        self.variables = {}
+        self.variables = COWDictBase.copy()
+
+    def copy(self):
+        new = IncludeHistory(self.parent, self.filename, self.datasmart)
+        for c in self.children:
+            new.children.append(c.copy())
+        new.variables = self.variables.copy()
+        return new
 
     def include(self, filename):
         newfile = IncludeHistory(self.current, filename)
@@ -615,7 +622,7 @@  class DataSmart(MutableMapping):
         # we really want this to be a DataSmart...
         data = DataSmart(seen=self._seen_overrides.copy(), special=self._special_values.copy())
         data.dict["_data"] = self.dict
-        data.history = copy.deepcopy(self.history)
+        data.history = self.history.copy()
         data.history.datasmart = data
         data._tracking = self._tracking