[meta-ti,master/kirkstone] conf: machine: k3: Use Cortex-A53/A72 CPU tune

Message ID 20240215212613.57012-1-afd@ti.com
State Changes Requested
Delegated to: Ryan Eatmon

Commit Message

Andrew Davis Feb. 15, 2024, 9:26 p.m. UTC
All current K3 devices use either A53 or A72. Use the compile tune
configuration specific for these to allow the compiler to make
better optimizations.

Signed-off-by: Andrew Davis <afd@ti.com>
---
 meta-ti-bsp/conf/machine/include/k3.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Aniket Limaye Feb. 16, 2024, 7:07 a.m. UTC | #1
Hi Andrew,

I was testing this patch locally, and wanted to see if we get some perf 
improvements with some benchmarks available on the default image. What 
benchmark do you recommend testing this against?

I ran '/runLinpack/' on the tisdk-default-image on j7200 and did not see 
any difference in the reported performance with and without this 
patch... is this expected?

With the default image built on latest SDK, WITHOUT the patch:
Unrolled Single  Precision 1845878 Kflops ; 10 Reps

With default image built WITH the patch:
Unrolled Single  Precision 1857362 Kflops ; 10 Reps

Thanks,
Aniket

On 2/16/2024 2:56 AM, Andrew Davis via lists.yoctoproject.org wrote:
> All current K3 devices use either A53 or A72. Use the compile tune
> configuration specific for these to allow the compiler to make
> better optimizations.
>
> Signed-off-by: Andrew Davis<afd@ti.com>
> ---
>   meta-ti-bsp/conf/machine/include/k3.inc | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/meta-ti-bsp/conf/machine/include/k3.inc b/meta-ti-bsp/conf/machine/include/k3.inc
> index 2415f0ba..7c3579af 100644
> --- a/meta-ti-bsp/conf/machine/include/k3.inc
> +++ b/meta-ti-bsp/conf/machine/include/k3.inc
> @@ -3,7 +3,7 @@
>   require conf/machine/include/ti-soc.inc
>   SOC_FAMILY:append = ":k3"
>   
> -require conf/machine/include/arm/arch-arm64.inc
> +require conf/machine/include/arm/armv8a/tune-cortexa72-cortexa53.inc
>   
>   BBMULTICONFIG += "k3r5"
>   
> -- 
> 2.39.2
Denys Dmytriyenko Feb. 16, 2024, 8:23 p.m. UTC | #2
Unfortunately, NAK.

This is considered antisocial behavior for a BSP in the Yocto Project
world. And the performance benefit is questionable at 1%-2%, if any.

The proper place for any extra optimization tunes is in a distro config, or maybe
even in the end customer's final product, not a reference distro.

Consider a distro that supports multiple HW platforms and uses multiple BSPs
besides meta-ti - YoE, AGL, etc. You want common-denominator tunes in
order to get the most binary re-use across the platforms.

For example, AGL goes to some extreme lengths to override such custom tunes 
set by misbehaving BSPs and it's quite ugly.

Moreover, we went through this many years ago, when we had our ARMv7 platforms
set to their corresponding cortex-a8/a9/a15 tunes by default, but eventually
ended up setting a common ARMv7 tune:

DEFAULTTUNE ?= "armv7athf-neon"

So, you should either leave the current arch-arm64.inc inclusion as is, or if 
you insist on including tune-cortexa72-cortexa53.inc, set the default tune 
back to plain aarch64:

DEFAULTTUNE ?= "aarch64"
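
A minimal sketch of that second option, assuming the tune include assigns its
own default with "?=" (in which case the first weak assignment parsed wins, so
the generic default has to come before the require line), and assuming the
plain aarch64 tune stays available through that include. The comments are
illustrative and not part of the submitted patch:

```
# k3.inc (sketch): make the Cortex-A72/A53 tunes selectable, but keep the
# generic tune as the default so binaries stay shareable across BSPs.
DEFAULTTUNE ?= "aarch64"
require conf/machine/include/arm/armv8a/tune-cortexa72-cortexa53.inc
```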


On Thu, Feb 15, 2024 at 03:26:13PM -0600, Andrew Davis via lists.yoctoproject.org wrote:
> All current K3 devices use either A53 or A72. Use the compile tune
> configuration specific for these to allow the compiler to make
> better optimizations.
> 
> Signed-off-by: Andrew Davis <afd@ti.com>
> ---
>  meta-ti-bsp/conf/machine/include/k3.inc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/meta-ti-bsp/conf/machine/include/k3.inc b/meta-ti-bsp/conf/machine/include/k3.inc
> index 2415f0ba..7c3579af 100644
> --- a/meta-ti-bsp/conf/machine/include/k3.inc
> +++ b/meta-ti-bsp/conf/machine/include/k3.inc
> @@ -3,7 +3,7 @@
>  require conf/machine/include/ti-soc.inc
>  SOC_FAMILY:append = ":k3"
>  
> -require conf/machine/include/arm/arch-arm64.inc
> +require conf/machine/include/arm/armv8a/tune-cortexa72-cortexa53.inc
>  
>  BBMULTICONFIG += "k3r5"
>  
> -- 
> 2.39.2
Andrew Davis Feb. 20, 2024, 2:31 p.m. UTC | #3
On 2/16/24 2:23 PM, Denys Dmytriyenko wrote:
> Unfortunately, NAK.
> 
> This is considered an antisocial behavior for a BSP in the Yocto Project
> world. And the performance benefit is questionable with 1%-2%, if at all.
> 

This started when a potential customer noticed that, when building and running
some benchmarks (linpack for instance) on our SDK, we were being out-performed
by some other vendors, even though on paper our platforms should have been
the better performing ones.

After investigating, it turned out these other vendors have these tune
options in their BSP layers, causing the performance discrepancy.

So the performance here, even a couple percent, is very important.

> The proper place for any extra optimization tunes is in a distro config. Maybe
> even by end customer's final product, not a reference distro.
> 
> Consider a distro that supports multiple HW platforms and uses multiple BSPs
> besides meta-ti - YoE, AGL, etc. You do want a common denominator tunes in
> order to get the most binary re-use across the platforms.
> 

If one wants binary re-use they can override the tune. Otherwise maybe
they should be using Debian or some other binary distro. The main selling
point for Yocto IMHO is customizing like this. The best part of rebuilding
everything from scratch every time for every machine is we can have these
machine specific tunings.

> For example, AGL goes to some extreme lengths to override such custom tunes
> set by misbehaving BSPs and it's quite ugly.
>

Then we should work to make it easier to override for those folks, not simply
leave this performance on the table.

> And moreover, we've gone through this motion in the past many years ago when
> we had our ARMv7 platforms set to their corresponding cortex-a8/a9/a15 tunes
> by default, but eventually ended up setting a common ARMv7 tune:
> 
> DEFAULTTUNE ?= "armv7athf-neon"
> 
> So, you should either leave the current arch-arm64.inc inclusion as is, or if
> you insist on including tune-cortexa72-cortexa53.inc, set the default tune
> back to plain aarch64:
> 
> DEFAULTTUNE ?= "aarch64"
> 

I see our friends over in meta-xilinx are doing machine specific DEFAULTTUNEs.
I was thinking of matching that to keep our BSP performance competitive. But
as a compromise and to avoid "antisocial behavior" as you say, I think I can
live with DEFAULTTUNE ?= "aarch64".

Will resend with that.

Andrew

> 
> On Thu, Feb 15, 2024 at 03:26:13PM -0600, Andrew Davis via lists.yoctoproject.org wrote:
>> All current K3 devices use either A53 or A72. Use the compile tune
>> configuration specific for these to allow the compiler to make
>> better optimizations.
>>
>> Signed-off-by: Andrew Davis <afd@ti.com>
>> ---
>>   meta-ti-bsp/conf/machine/include/k3.inc | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/meta-ti-bsp/conf/machine/include/k3.inc b/meta-ti-bsp/conf/machine/include/k3.inc
>> index 2415f0ba..7c3579af 100644
>> --- a/meta-ti-bsp/conf/machine/include/k3.inc
>> +++ b/meta-ti-bsp/conf/machine/include/k3.inc
>> @@ -3,7 +3,7 @@
>>   require conf/machine/include/ti-soc.inc
>>   SOC_FAMILY:append = ":k3"
>>   
>> -require conf/machine/include/arm/arch-arm64.inc
>> +require conf/machine/include/arm/armv8a/tune-cortexa72-cortexa53.inc
>>   
>>   BBMULTICONFIG += "k3r5"
>>   
>> -- 
>> 2.39.2
Ryan Eatmon Feb. 20, 2024, 3 p.m. UTC | #4
On 2/20/2024 8:31 AM, Andrew Davis wrote:
> On 2/16/24 2:23 PM, Denys Dmytriyenko wrote:
>> Unfortunately, NAK.
>>
>> This is considered an antisocial behavior for a BSP in the Yocto Project
>> world. And the performance benefit is questionable with 1%-2%, if at all.
>>
> 
> This started when a potential customer noticed that, when building and running
> some benchmarks (linpack for instance) on our SDK, we were being out-performed
> by some other vendors, even though on paper our platforms should have been
> the better performing ones.
> 
> After investigating, it turned out these other vendors have these tune
> options in their BSP layers, causing the performance discrepancy.
> 
> So the performance here, even a couple percent, is very important.
> 
>> The proper place for any extra optimization tunes is in a distro 
>> config. Maybe
>> even by end customer's final product, not a reference distro.
>>
>> Consider a distro that supports multiple HW platforms and uses 
>> multiple BSPs
>> besides meta-ti - YoE, AGL, etc. You do want a common denominator 
>> tunes in
>> order to get the most binary re-use across the platforms.
>>
> 
> If one wants binary re-use they can override the tune. Otherwise maybe
> they should be using Debian or some other binary distro. The main selling
> point for Yocto IMHO is customizing like this. The best part of rebuilding
> everything from scratch every time for every machine is we can have these
> machine specific tunings.
> 
>> For example, AGL goes to some extreme lengths to override such custom 
>> tunes
>> set by misbehaving BSPs and it's quite ugly.
>>
> 
> Then we should work to make it easier to override for those folks, not 
> simply
> leave this performance on the table.
> 
>> And moreover, we've gone through this motion in the past many years 
>> ago when
>> we had our ARMv7 platforms set to their corresponding cortex-a8/a9/a15 
>> tunes
>> by default, but eventually ended up setting a common ARMv7 tune:
>>
>> DEFAULTTUNE ?= "armv7athf-neon"
>>
>> So, you should either leave the current arch-arm64.inc inclusion as 
>> is, or if
>> you insist on including tune-cortexa72-cortexa53.inc, set the default 
>> tune
>> back to plain aarch64:
>>
>> DEFAULTTUNE ?= "aarch64"
>>
> 
> I see our friends over in meta-xilinx are doing machine specific 
> DEFAULTTUNEs.
> I was thinking of matching that to keep our BSP performance competitive. 
> But
> as a compromise and to avoid "antisocial behavior" as you say, I think I 
> can
> live with DEFAULTTUNE ?= "aarch64".
> 
> Will resend with that.

So, if we include the more targeted tuning file but set DEFAULTTUNE to
the generic one, how do our builds use the more targeted tuning?  Is
that something we have to set in local.conf as part of our builds?
Or is there some sort of magic that picks the correct thing?
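
For context, in the usual OpenEmbedded tune mechanism there is no magic: the
build uses whatever DEFAULTTUNE resolves to, and the tune include only makes
the extra tunes selectable. A hedged sketch of opting in from local.conf or a
distro config follows; the tune name "cortexa72-cortexa53" is an assumption
based on the include's file name and should be checked against the oe-core
release in use:

```
# local.conf or distro .conf (sketch): a plain "=" takes precedence over the
# "?=" weak default set in the machine include.
DEFAULTTUNE = "cortexa72-cortexa53"
```

The resolved value can be checked in the output of "bitbake -e" by looking
for the DEFAULTTUNE= line.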


> Andrew
> 
>>
>> On Thu, Feb 15, 2024 at 03:26:13PM -0600, Andrew Davis via 
>> lists.yoctoproject.org wrote:
>>> All current K3 devices use either A53 or A72. Use the compile tune
>>> configuration specific for these to allow the compiler to make
>>> better optimizations.
>>>
>>> Signed-off-by: Andrew Davis <afd@ti.com>
>>> ---
>>>   meta-ti-bsp/conf/machine/include/k3.inc | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/meta-ti-bsp/conf/machine/include/k3.inc 
>>> b/meta-ti-bsp/conf/machine/include/k3.inc
>>> index 2415f0ba..7c3579af 100644
>>> --- a/meta-ti-bsp/conf/machine/include/k3.inc
>>> +++ b/meta-ti-bsp/conf/machine/include/k3.inc
>>> @@ -3,7 +3,7 @@
>>>   require conf/machine/include/ti-soc.inc
>>>   SOC_FAMILY:append = ":k3"
>>> -require conf/machine/include/arm/arch-arm64.inc
>>> +require conf/machine/include/arm/armv8a/tune-cortexa72-cortexa53.inc
>>>   BBMULTICONFIG += "k3r5"
>>> -- 
>>> 2.39.2

Patch

diff --git a/meta-ti-bsp/conf/machine/include/k3.inc b/meta-ti-bsp/conf/machine/include/k3.inc
index 2415f0ba..7c3579af 100644
--- a/meta-ti-bsp/conf/machine/include/k3.inc
+++ b/meta-ti-bsp/conf/machine/include/k3.inc
@@ -3,7 +3,7 @@ 
 require conf/machine/include/ti-soc.inc
 SOC_FAMILY:append = ":k3"
 
-require conf/machine/include/arm/arch-arm64.inc
+require conf/machine/include/arm/armv8a/tune-cortexa72-cortexa53.inc
 
 BBMULTICONFIG += "k3r5"