[1/2] python3-native: Remove all pyc files

Message ID 20220303163451.336518-1-richard.purdie@linuxfoundation.org
State Accepted, archived
Commit 2d6490fa23cce58922a1b54f87c8369925ff8f90
Series [1/2] python3-native: Remove all pyc files

Commit Message

Richard Purdie March 3, 2022, 4:34 p.m. UTC
This removes a further 1600 files from sstate handling and lets python
create the ones it wants at runtime which is likely much better overall
for performance.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 meta/recipes-devtools/python/python3_3.10.2.bb | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Konrad Weihmann March 3, 2022, 5:11 p.m. UTC | #1
On 03.03.22 17:34, Richard Purdie wrote:
> This removes a further 1600 files from sstate handling and lets python
> create the ones it wants at runtime which is likely much better overall
> for performance.
> 
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>   meta/recipes-devtools/python/python3_3.10.2.bb | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/meta/recipes-devtools/python/python3_3.10.2.bb b/meta/recipes-devtools/python/python3_3.10.2.bb
> index b28aa6505a0..7ec443a531f 100644
> --- a/meta/recipes-devtools/python/python3_3.10.2.bb
> +++ b/meta/recipes-devtools/python/python3_3.10.2.bb
> @@ -156,7 +156,12 @@ do_install:append:class-native() {
>           # Remove the opt-1.pyc and opt-2.pyc files. There are over 3,000 of them
>           # and the overhead in each recipe-sysroot-native isn't worth it, particularly
>           # when they're only used for python called with -O or -OO.
> -        find ${D} -name *opt-*.pyc -delete
> +        #find ${D} -name *opt-*.pyc -delete

This looks like a leftover; I guess the commented-out line could be removed.

Thanks for finally making that happen (I've done the same for quite a while
in an extension in my setup). I remember a bug about pyc files leading to
weird staging issues, which I opened a year or two ago.
I guess that one could be closed once this is merged.

> +        # Remove all pyc files. There are a ton of them and it is probably faster to let
> +        # python create the ones it wants at runtime rather than manage in the sstate
> +        # tarballs and sysroot creation.
> +        find ${D} -name *.pyc -delete
> +
>   }
>   
>   do_install:append() {
Konrad Weihmann March 3, 2022, 5:14 p.m. UTC | #2
On 03.03.22 18:11, Konrad Weihmann wrote:
> This looks like a leftover; I guess the commented-out line could be removed.
>
> Thanks for finally making that happen (I've done the same for quite a while
> in an extension in my setup). I remember a bug about pyc files leading to
> weird staging issues, which I opened a year or two ago.
> I guess that one could be closed once this is merged.

For reference: it's this bug here 
https://bugzilla.yoctoproject.org/show_bug.cgi?id=13868

Ross Burton March 3, 2022, 5:14 p.m. UTC | #3
On Thu, 3 Mar 2022 at 16:34, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
> This removes a further 1600 files from sstate handling and lets python
> create the ones it wants at runtime which is likely much better overall
> for performance.

Playing devil's advocate: doesn't having them in sstate mean they're
generated once and hardlinked, instead of needing to be generated for
every recipe which runs pythonnative?

Whilst I can't disagree that 1600 files being dropped from sstate is
good, we're just punting the recompile step to every recipe when it
runs python code.

I guess the question here is how long does the Python library take to recompile.

Ross
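
One rough way to put a number on the recompile cost is to time a forced
byte-compile of a scratch copy of the library. The sketch below is not from
the thread: time_recompile is a hypothetical helper name, it assumes python3
is on PATH, and it relies on the standard compileall and timeit modules. It
compiles every .py file in the tree, whereas at runtime only imported modules
get compiled, so the result is best read as a rough upper bound.

```shell
# Hypothetical helper: time one forced byte-compile of every .py file under $1.
# Assumes python3 is on PATH; run it on a scratch copy, not a live sysroot.
time_recompile() {
    # The directory is passed through the environment so that no shell
    # quoting ends up inside the Python statement.
    PYC_SCAN_DIR="$1" python3 -m timeit -n 1 -r 1 \
        -s 'import compileall, os' \
        'compileall.compile_dir(os.environ["PYC_SCAN_DIR"], quiet=1, force=True)'
}
```

timeit prints the elapsed time for the single run; comparing that against the
time spent staging the same files would be the measurement asked about here.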
Richard Purdie March 3, 2022, 5:16 p.m. UTC | #4
On Thu, 2022-03-03 at 17:14 +0000, Ross Burton wrote:
> On Thu, 3 Mar 2022 at 16:34, Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> > This removes a further 1600 files from sstate handling and lets python
> > create the ones it wants at runtime which is likely much better overall
> > for performance.
> 
> Playing devil's advocate: doesn't having them in sstate mean they're
> generated once and hardlinked, instead of needing to be generated for
> every recipe which runs pythonnative?
> 
> Whilst I can't disagree that 1600 files being dropped from sstate is
> good, we're just punting the recompile step to every recipe when it
> runs python code.
> 
> I guess the question here is how long does the Python library take to recompile.

Another consideration is that there are many sysroots pulling in python3-native
which don't run python and they're only there as there are python scripts being
added which means python has to come too.

I suspect for that reason it could be a net win but it is a tough call.

Cheers,

Richard
Tim Orling March 3, 2022, 11:28 p.m. UTC | #5
On Thu, Mar 3, 2022 at 9:16 AM Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> On Thu, 2022-03-03 at 17:14 +0000, Ross Burton wrote:
> > On Thu, 3 Mar 2022 at 16:34, Richard Purdie
> > <richard.purdie@linuxfoundation.org> wrote:
> > > This removes a further 1600 files from sstate handling and lets python
> > > create the ones it wants at runtime which is likely much better overall
> > > for performance.
> >
> > Playing devil's advocate: doesn't having them in sstate mean they're
> > generated once and hardlinked, instead of needing to be generated for
> > every recipe which runs pythonnative?
> >
> > Whilst I can't disagree that 1600 files being dropped from sstate is
> > good, we're just punting the recompile step to every recipe when it
> > runs python code.
> >
> > I guess the question here is how long does the Python library take to recompile.
>

At runtime it should only generate pyc for modules actually used?

>
> Another consideration is that there are many sysroots pulling in python3-native
> which don't run python and they're only there as there are python scripts being
> added which means python has to come too.
>
> I suspect for that reason it could be a net win but it is a tough call.

I tend to be in favor of the reduced number of files moving around. I suppose
we could do some timing comparisons if we really care. It will be a trade-off
of network/io vs compute/io, correct?


> Cheers,
>
> Richard
Richard Purdie March 3, 2022, 11:46 p.m. UTC | #6
On Thu, 2022-03-03 at 15:28 -0800, Tim Orling wrote:
> 
> 
> On Thu, Mar 3, 2022 at 9:16 AM Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> > On Thu, 2022-03-03 at 17:14 +0000, Ross Burton wrote:
> > > On Thu, 3 Mar 2022 at 16:34, Richard Purdie
> > > <richard.purdie@linuxfoundation.org> wrote:
> > > > This removes a further 1600 files from sstate handling and lets python
> > > > create the ones it wants at runtime which is likely much better overall
> > > > for performance.
> > > 
> > > Playing devil's advocate: doesn't having them in sstate mean they're
> > > generated once and hardlinked, instead of needing to be generated for
> > > every recipe which runs pythonnative?
> > > 
> > > Whilst I can't disagree that 1600 files being dropped from sstate is
> > > good, we're just punting the recompile step to every recipe when it
> > > runs python code.
> > > 
> > > > I guess the question here is how long does the Python library take to recompile.
> > 
> 
> 
> At runtime it should only generate pyc for modules actually used?

Yes.
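
The behaviour confirmed here can be checked with a small experiment. This is
an illustrative sketch, not something posted in the thread: lazy_pyc_demo and
the two module names are made up, and it assumes python3 is on PATH and that
the target directory is empty and writable.

```shell
# Hypothetical demo: create two modules in the empty directory $1, import
# only one of them, then list the .pyc files that Python wrote.
lazy_pyc_demo() {
    printf 'VALUE = 1\n' > "$1/used_mod.py"
    printf 'VALUE = 2\n' > "$1/unused_mod.py"
    (
        # Clear settings that would suppress or relocate the bytecode cache.
        unset PYTHONDONTWRITEBYTECODE PYTHONPYCACHEPREFIX
        PYTHONPATH="$1" python3 -c 'import used_mod'
    )
    find "$1" -name '*.pyc'
}
```

Calling lazy_pyc_demo "$(mktemp -d)" should list one file under __pycache__
for used_mod and nothing for unused_mod.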

> > Another consideration is that there are many sysroots pulling in python3-native
> > which don't run python and they're only there as there are python scripts being
> > added which means python has to come too.
> > 
> > I suspect for that reason it could be a net win but it is a tough call.
> > 
> 
> I tend to be in favor of the reduced number of files moving around. I suppose
> we could do some timing comparisons if we really care. It will be a trade-off
> of network/io vs compute/io, correct?

Yes. I suspect the compile cost is tiny compared to the io/network cost of
chucking so many small files around when I suspect they're not used that often.

Cheers,

Richard
Konrad Weihmann March 7, 2022, 7:36 a.m. UTC | #7
On 04.03.22 00:46, Richard Purdie wrote:
> Yes. I suspect the compile cost is tiny compared to the io/network cost of
> chucking so many small files around when I suspect they're not used that often.
> 
> Cheers,
> 
> Richard

Now that this is in master and works reasonably well, could we extend the
removal of pyc for native recipes to *all* python packages?
I'm still seeing occasional staging issues on random native python 
recipes as described in 
https://bugzilla.yoctoproject.org/show_bug.cgi?id=13868.
And as the common understanding here seems to be that the additional 
compile costs are negligible compared to the staging costs of additional 
files, I would propose to add

do_install:append:class-native() {
	find ${D} -name *.pyc -delete
}

to setuptools3.bbclass (as this looks like the one piece of code that 
almost all python packages have in common)

Any thoughts?
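
A side note on the find invocation used here and in the patch: the *.pyc
pattern is unquoted, so the shell expands it first if the task's working
directory happens to contain matching files, and find then sees file names
instead of the pattern. A minimal sketch with the pattern quoted follows.
remove_pyc is a hypothetical helper name, and -delete is a GNU/BSD find
extension that the original command already relies on.

```shell
# Hypothetical helper: delete every .pyc file below the directory given as $1.
# Quoting the pattern hands it to find untouched, whatever the current
# directory contains.
remove_pyc() {
    find "$1" -name '*.pyc' -delete
}
```

In a recipe the equivalent change would be to quote the pattern in the
existing line. As with the original command, the now-empty __pycache__
directories are left in place.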

Richard Purdie March 7, 2022, 9:45 a.m. UTC | #8
On Mon, 2022-03-07 at 08:36 +0100, Konrad Weihmann wrote:
> 
> 
> Now that this is in master and works reasonably well, could we extend the
> removal of pyc for native recipes to *all* python packages?
> I'm still seeing occasional staging issues on random native python 
> recipes as described in 
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=13868.
> And as the common understanding here seems to be that the additional 
> compile costs are negligible compared to the staging costs of additional 
> files, I would propose to add
> 
> do_install:append:class-native() {
> 	find ${D} -name *.pyc -delete
> }
> 
> to setuptools3.bbclass (as this looks like the one piece of code that 
> almost all python packages have in common)
> 
> Any thoughts?

I don't really like working around errors like the above bug where we haven't
understood the underlying issue :/

I suspect those pyc files are nowhere near as large an issue as the ones in the
python recipe itself either, there are likely only small numbers of them.

Cheers,

Richard
Konrad Weihmann March 7, 2022, 1:25 p.m. UTC | #9
On 07.03.22 10:45, Richard Purdie wrote:
> On Mon, 2022-03-07 at 08:36 +0100, Konrad Weihmann wrote:
>>
>>
>> Now that this is in master and works reasonably well, could we extend the
>> removal of pyc for native recipes to *all* python packages?
>> I'm still seeing occasional staging issues on random native python
>> recipes as described in
>> https://bugzilla.yoctoproject.org/show_bug.cgi?id=13868.
>> And as the common understanding here seems to be that the additional
>> compile costs are negligible compared to the staging costs of additional
>> files, I would propose to add
>>
>> do_install:append:class-native() {
>> 	find ${D} -name *.pyc -delete
>> }
>>
>> to setuptools3.bbclass (as this looks like the one piece of code that
>> almost all python packages have in common)
>>
>> Any thoughts?
> 
> I don't really like working around errors like the above bug where we haven't
> understood the underlying issue :/
> 
> I suspect those pyc files are nowhere near as large an issue as the ones in the
> python recipe itself either, there are likely only small numbers of them.

I can come up with a setup that has ~21k pyc files from different native
python site packages, and due to the nature of Python's import system I
guess only a very small fraction of these will actually need to be
recompiled.
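
To check a figure like this on another setup, counting files by suffix is
enough. The sketch below is an illustration rather than something from the
thread: count_by_suffix is a hypothetical helper, and the directory to scan
is whichever tree holds the native site packages.

```shell
# Hypothetical helper: print how many regular files under $1 end in ".$2".
count_by_suffix() {
    # $1: directory to scan, $2: suffix without the dot (for example pyc)
    find "$1" -type f -name "*.$2" | wc -l | tr -d ' '
}
```

Running it once with pyc and once with py on the same tree gives the number
of cached files next to the number of sources they were built from.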

I know that bug is hard to reproduce. I can only trigger it with any
regularity (there, in a fairly high percentage of runs) in a GitHub Actions
pipeline (2 vCPUs, 7GB RAM, ~12GB HDD), and I managed to reproduce it
locally only once. The pyc files, even though they should have been there,
disappeared while hardlinking the files from the components dir to the
workspace.

So, as with the other issue I reported last week, my question is: what
information do you need to look into these things? A browsable workspace
is likely out of scope, but certain files and system information could be
gathered, I guess.

> 
> Cheers,
> 
> Richard
>

Patch

diff --git a/meta/recipes-devtools/python/python3_3.10.2.bb b/meta/recipes-devtools/python/python3_3.10.2.bb
index b28aa6505a0..7ec443a531f 100644
--- a/meta/recipes-devtools/python/python3_3.10.2.bb
+++ b/meta/recipes-devtools/python/python3_3.10.2.bb
@@ -156,7 +156,12 @@  do_install:append:class-native() {
         # Remove the opt-1.pyc and opt-2.pyc files. There are over 3,000 of them
         # and the overhead in each recipe-sysroot-native isn't worth it, particularly
         # when they're only used for python called with -O or -OO.
-        find ${D} -name *opt-*.pyc -delete
+        #find ${D} -name *opt-*.pyc -delete
+        # Remove all pyc files. There are a ton of them and it is probably faster to let
+        # python create the ones it wants at runtime rather than manage in the sstate 
+        # tarballs and sysroot creation.
+        find ${D} -name *.pyc -delete
+
 }
 
 do_install:append() {