[PATCHv2] openssl: switch ARM builds from linux-elf-arm to linux-armv4 config

Submitted by Koen Kooi on Oct. 16, 2013, 7:25 a.m.

Details

Message ID 1381908355-18124-1-git-send-email-koen@dominion.thruhere.net
State New

Commit Message

Koen Kooi Oct. 16, 2013, 7:25 a.m.
From: Koen Kooi <koen.kooi@linaro.org>

This enables the AES and SHA-1 assembly code at build time. OpenSSL does a
runtime check to decide which portions actually get used.
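OpenSSL makes that runtime decision itself (the armcap.o object listed for linux-armv4 below belongs to that machinery), so the recipe does not need to know whether a given board has NEON. When comparing results across boards, it can still help to see what the kernel advertises. The sketch below is a hypothetical helper, `has_cpu_feature`, that looks for a flag in the `Features` line of a cpuinfo-format file. It assumes the ARM Linux `/proc/cpuinfo` layout and is not how OpenSSL itself probes.

```shell
# Hypothetical helper: succeed if a cpuinfo-format file lists the given
# feature (e.g. "neon") in a "Features : ..." line.
# This only reports what the kernel advertises; OpenSSL does its own check.
has_cpu_feature() {
    # $1: path to the cpuinfo-style file, $2: feature name
    awk -v want="$2" '
        /^Features[ \t]*:/ {
            sub(/^[^:]*:/, "")            # drop the "Features :" label
            for (i = 1; i <= NF; i++)     # compare whole words only
                if ($i == want) found = 1
        }
        END { if (found) exit 0; exit 1 }
    ' "$1"
}

# Example on an ARM target:
#   has_cpu_feature /proc/cpuinfo neon && echo "kernel reports NEON"
```

Whole-word matching matters here because flags such as `vfp` and `vfpv3` share a prefix.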

'./Configure TABLE' gives the following:

*** linux-elf-arm
$cc           =
$cflags       = -DL_ENDIAN      -DTERMIO  -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
$unistd       =
$thread_cflag = -D_REENTRANT
$sys_id       =
$lflags       = -ldl
$bn_ops       = BN_LLONG DES_RISC1
$cpuid_obj    =
$bn_obj       =
$des_obj      =
$aes_obj      =
$bf_obj       =
$md5_obj      =
$sha1_obj     =
$cast_obj     =
$rc4_obj      =
$rmd160_obj   =
$rc5_obj      =
$wp_obj       =
$cmll_obj     =
$modes_obj    =
$engines_obj  =
$perlasm_scheme = void
$dso_scheme   = dlfcn
$shared_target= linux-shared
$shared_cflag = -fPIC
$shared_ldflag =
$shared_extension = .so.$(SHLIB_MAJOR).$(SHLIB_MINOR)
$ranlib       =
$arflags      =
$multilib     =

*** linux-armv4
$cc           = gcc
$cflags       = -DTERMIO -O3 -Wall
$unistd       =
$thread_cflag = -D_REENTRANT
$sys_id       =
$lflags       = -ldl
$bn_ops       = BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR
$cpuid_obj    = armcap.o armv4cpuid.o
$bn_obj       = bn_asm.o armv4-mont.o armv4-gf2m.o
$des_obj      =
$aes_obj      = aes_cbc.o aes-armv4.o bsaes-armv7.o
$bf_obj       =
$md5_obj      =
$sha1_obj     = sha1-armv4-large.o sha256-armv4.o sha512-armv4.o
$cast_obj     =
$rc4_obj      =
$rmd160_obj   =
$rc5_obj      =
$wp_obj       =
$cmll_obj     =
$modes_obj    = ghash-armv4.o
$engines_obj  =
$perlasm_scheme = void
$dso_scheme   = dlfcn
$shared_target= linux-shared
$shared_cflag = -fPIC
$shared_ldflag =
$shared_extension = .so.$(SHLIB_MAJOR).$(SHLIB_MINOR)
$ranlib       =
$arflags      =
$multilib     =
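Besides `$cflags` and `$bn_ops`, the two dumps differ in the `$*_obj` lines, which name the assembly objects that get built. Below is a minimal sketch for pulling only the non-empty object lines for one target out of `./Configure TABLE`. It assumes the `*** <target>` / `$name = value` layout shown above. `asm_objs` is a hypothetical helper name.

```shell
# Hypothetical helper: print the non-empty "$<name>_obj" lines for one target
# from the output of "./Configure TABLE", read on standard input.
asm_objs() {
    # $1: Configure target, e.g. linux-armv4 or linux-elf-arm
    awk -v target="$1" '
        # Each target block starts with a "*** <target>" header line.
        /^\*\*\* / { in_target = ($2 == target); next }
        in_target && /^\$[a-z0-9_]+_obj[ \t]*=/ {
            name = $0
            sub(/^\$/, "", name)              # drop the leading "$"
            sub(/[ \t]*=.*$/, "", name)       # keep only the field name
            value = $0
            sub(/^[^=]*=[ \t]*/, "", value)   # keep only what follows "="
            sub(/[ \t]+$/, "", value)         # trim trailing blanks
            if (value != "") print name ": " value
        }
    '
}
```

For example, `./Configure TABLE | asm_objs linux-armv4` should list the cpuid, bn, aes, sha1 and modes objects shown above, while `asm_objs linux-elf-arm` prints nothing.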

Build tested on armv7a/angstrom and armv8/distroless, runtime tested on armv7a/angstrom.

'openssl speed' results:

Algo    blocksize   ops/s before   ops/s after  difference
----------------------------------------------------------
MD5            16        308,766       264,664     -14.28%
               64        277,090       263,340      -4.96%
              256        212,652       197,043      -7.34%
             1024        103,604       100,157      -3.33%
             8192         17,936        17,796      -0.78%
sha1           16        290,011       385,098      32.79%
               64        234,939       302,788      28.88%
              256        144,831       177,028      22.23%
             1024         57,043        67,374      18.11%
             8192          8,586         9,932      15.68%
sha256         16        290,443       605,747     108.56%
               64        178,010       370,598     108.19%
              256         82,107       168,770     105.55%
             1024         26,064        53,068     103.61%
             8192          3,550         7,211     103.10%
sha512         16         59,618       259,354     335.03%
               64         59,616       258,265     333.22%
              256         21,727        98,057     351.31%
             1024          7,449        34,304     360.49%
             8192          1,047         4,842     362.63%
des cbc        16        964,682     1,124,459      16.56%
               64        260,188       298,910      14.88%
              256         65,945        76,273      15.66%
             1024         16,570        19,110      15.33%
             8192          2,082         2,398      15.17%
des ede3       16        370,442       429,906      16.05%
               64         95,429       110,147      15.42%
              256         23,928        27,808      16.21%
             1024          5,993         6,960      16.13%
             8192            752           868      15.36%
aes128         16      1,712,050     2,301,100      34.41%
               64        466,491       651,155      39.59%
              256        120,181       168,953      40.58%
             1024         30,177        42,792      41.80%
             8192          3,791         5,361      41.41%
aes192         16      1,472,560     1,964,900      33.43%
               64        400,087       544,971      36.21%
              256        103,245       141,062      36.63%
             1024         25,902        35,389      36.63%
             8192          3,256         4,451      36.67%
aes256         16      1,330,524     1,772,143      33.19%
               64        355,025       486,221      36.95%
              256         90,663       125,281      38.18%
             1024         22,725        31,484      38.54%
             8192          2,837         3,952      39.31%
rsa       2048bit             15            25      69.94%
           public            547           832      52.00%
dsa       2048bit             55            86      54.26%
           verify             47            73      53.33%
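The difference column is (after - before) / before, expressed as a percentage. For MD5 at 16 bytes that is (264,664 - 308,766) / 308,766, or about -14.28%. Below is a small sketch that recomputes it from the comma-grouped figures. `pct_diff` is a hypothetical helper.

```shell
# Hypothetical helper: percentage change from "before" to "after" ops/s,
# with thousands separators (",") stripped first.
pct_diff() {
    # $1: ops/s before, $2: ops/s after
    awk -v b="$1" -v a="$2" 'BEGIN {
        gsub(/,/, "", b); gsub(/,/, "", a)
        printf "%.2f%%\n", (a - b) / b * 100
    }'
}
```

Recomputing the first rsa row from its rounded figures (`pct_diff 15 25`) prints 66.67% rather than the listed 69.94%. The table was presumably computed from the unrounded `openssl speed` output.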

Signed-off-by: Koen Kooi <koen.kooi@linaro.org>
---
 meta/recipes-connectivity/openssl/openssl.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/meta/recipes-connectivity/openssl/openssl.inc b/meta/recipes-connectivity/openssl/openssl.inc
index f5b2432..f0fa180 100644
--- a/meta/recipes-connectivity/openssl/openssl.inc
+++ b/meta/recipes-connectivity/openssl/openssl.inc
@@ -61,7 +61,7 @@  do_configure () {
 	target="$os-${HOST_ARCH}"
 	case $target in
 	linux-arm)
-		target=linux-elf-arm
+		target=linux-armv4
 		;;
 	linux-armeb)
 		target=linux-elf-armeb
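The change only touches the Configure target chosen for little-endian ARM. Below is a stand-alone sketch of that mapping, using the hypothetical function `openssl_target`. Only the two ARM branches visible in the hunk are reproduced. The pass-through default is an assumption, since the rest of the case statement is not shown.

```shell
# Hypothetical stand-alone version of the ARM part of the case statement in
# do_configure: map "$os-${HOST_ARCH}" to an OpenSSL Configure target.
openssl_target() {
    # $1: os (e.g. linux), $2: HOST_ARCH (e.g. arm)
    target="$1-$2"
    case $target in
    linux-arm)
        target=linux-armv4
        ;;
    linux-armeb)
        target=linux-elf-armeb
        ;;
    *)
        # assumption: other targets pass through unchanged in this sketch
        ;;
    esac
    printf '%s\n' "$target"
}
```

With the patch applied, `openssl_target linux arm` prints `linux-armv4`, while big-endian ARM (`armeb`) still maps to `linux-elf-armeb`.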

Comments

Phil Blundell Oct. 16, 2013, 9:20 a.m.
On Wed, 2013-10-16 at 09:25 +0200, Koen Kooi wrote:
> From: Koen Kooi <koen.kooi@linaro.org>
> 
> This enables aes and sha1 assembly at buildtime. Openssl does a
> runtime check to see which portion gets enabled.

[...]

> Algo    blocksize   ops/s before   ops/s after  difference
> -------------------------------------------
> MD5	16	308,766	264,664	-14.28%
> 	64	277,090	263,340	-4.96%
> 	256	212,652	197,043	-7.34%
> 	1024	103,604	100,157	-3.33%
> 	8192	17,936	17,796	-0.78%

Do you know why it's causing MD5 to get slower?  I guess md5 with
blocksize=16 is not a very common case, but still.

Also, it seems generally a bit unwholesome for openssl to be picking its
own CFLAGS at all.  Would it be better to just make it use the same
CFLAGS as everything else?

p.
Koen Kooi Oct. 16, 2013, 9:45 a.m.
On 16 Oct. 2013, at 11:20, Phil Blundell <pb@pbcl.net> wrote:

> On Wed, 2013-10-16 at 09:25 +0200, Koen Kooi wrote:
>> From: Koen Kooi <koen.kooi@linaro.org>
>> 
>> This enables aes and sha1 assembly at buildtime. Openssl does a
>> runtime check to see which portion gets enabled.
> 
> [...]
> 
>> Algo    blocksize   ops/s before   ops/s after  difference
>> -------------------------------------------
>> MD5	16	308,766	264,664	-14.28%
>> 	64	277,090	263,340	-4.96%
>> 	256	212,652	197,043	-7.34%
>> 	1024	103,604	100,157	-3.33%
>> 	8192	17,936	17,796	-0.78%
> 
> Do you know why it's causing MD5 to get slower?  I guess md5 with
> blocksize=16 is not a very common case, but still.

I really don't know, and it is the only algorithm that gets noticeably slower. This patch is a preparation for (more) NEON optimizations, which seem to fix the regression; see

	https://docs.google.com/spreadsheet/ccc?key=0AhgZ33Tf6eBldHVONjRXRnItWld4eFlRWTJ3RzVIdGc&usp=sharing

and

	http://dominion.thruhere.net/koen/angstrom/0002-openssl-1.0.1e-add-ARMv7-AES-optimizations.patch

I need to test that patch on A9 and A15 cores as well since it doesn't seem to do a lot on A8 cores :(

> Also, it seems generally a bit unwholesome for openssl to be picking its
> own CFLAGS at all.  Would it be better to just make it use the same
> CFLAGS as everything else?

openssl.inc already pokes at CFLAG(S):

CFLAG = "${@base_conditional('SITEINFO_ENDIANNESS', 'le', '-DL_ENDIAN', '-DB_ENDIAN', d)} \
        -DTERMIO ${CFLAGS} -Wall -Wa,--noexecstack"
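`base_conditional('SITEINFO_ENDIANNESS', 'le', '-DL_ENDIAN', '-DB_ENDIAN', d)` expands to `-DL_ENDIAN` when `SITEINFO_ENDIANNESS` is `le` and to `-DB_ENDIAN` otherwise. OE's own `${CFLAGS}` are then spliced in after `-DTERMIO`. Below is a rough shell equivalent, with a hypothetical function name and whitespace normalised.

```shell
# Hypothetical shell approximation of the CFLAG assignment in openssl.inc.
openssl_cflag() {
    # $1: SITEINFO_ENDIANNESS ("le" or "be"), $2: the recipe's ${CFLAGS}
    if [ "$1" = le ]; then
        endian=-DL_ENDIAN     # what base_conditional() picks for "le"
    else
        endian=-DB_ENDIAN     # any other value
    fi
    printf '%s\n' "$endian -DTERMIO $2 -Wall -Wa,--noexecstack"
}
```

This is why the compile line below carries `-O2 -pipe -g ...` from OE rather than the `-O3` from the linux-armv4 table entry.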

The complete command looks like this:

arm-angstrom-linux-gnueabi-gcc  -march=armv7-a -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a8 --sysroot=/build/v2013.06/build/tmp-angstrom_v2013_06-eglibc/sysroots/beaglebone -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN  -DTERMIO  -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -c   -c -o armv4cpuid.o armv4cpuid.S

So the '$cflags       = -DTERMIO -O3 -Wall' in linux-armv4 gets overridden by OE, just like we want :)
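One way to confirm this from a build log is to split a compiler command line into words and keep the ones matching a pattern, such as the optimization level and the `*_ASM` defines. `matching_flags` is a hypothetical helper. It assumes the flags are space-separated, as in the command above.

```shell
# Hypothetical helper: print the words of a compiler command line that match
# an extended regular expression (anchored to the whole word).
matching_flags() {
    # $1: the full command line, $2: ERE for a single flag
    # tr -s puts one word per line, collapsing runs of spaces.
    printf '%s\n' "$1" | tr -s ' ' '\n' | grep -E -- "^($2)\$"
}

# Example, with "$cmd" holding a line copied from the do_compile log:
#   matching_flags "$cmd" '-O[0-9s]'
#   matching_flags "$cmd" '-D[[:alnum:]_]*_ASM[[:alnum:]_]*'
```

Run against the command above, the first pattern yields only `-O2`. The second lists `-DOPENSSL_BN_ASM_MONT` through `-DGHASH_ASM`.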

regards,

Koen
Koen Kooi Oct. 17, 2013, 5:54 a.m.
On 16 Oct. 2013, at 11:45, Koen Kooi <koen@dominion.thruhere.net> wrote:

> 
> On 16 Oct. 2013, at 11:20, Phil Blundell <pb@pbcl.net> wrote:
> 
>> On Wed, 2013-10-16 at 09:25 +0200, Koen Kooi wrote:
>>> From: Koen Kooi <koen.kooi@linaro.org>
>>> 
>>> This enables aes and sha1 assembly at buildtime. Openssl does a
>>> runtime check to see which portion gets enabled.
>> 
>> [...]
>> 
>>> Algo    blocksize   ops/s before   ops/s after  difference
>>> -------------------------------------------
>>> MD5	16	308,766	264,664	-14.28%
>>> 	64	277,090	263,340	-4.96%
>>> 	256	212,652	197,043	-7.34%
>>> 	1024	103,604	100,157	-3.33%
>>> 	8192	17,936	17,796	-0.78%
>> 
>> Do you know why it's causing MD5 to get slower?  I guess md5 with
>> blocksize=16 is not a very common case, but still.
> 
> I really don't know and it is the only algo that gets a lot slower. This patch is a preparation for (more) NEON optimizations which seem to fix the regression, see
> 
> 	https://docs.google.com/spreadsheet/ccc?key=0AhgZ33Tf6eBldHVONjRXRnItWld4eFlRWTJ3RzVIdGc&usp=sharing
> 
> and
> 
> 	http://dominion.thruhere.net/koen/angstrom/0002-openssl-1.0.1e-add-ARMv7-AES-optimizations.patch
> 
> I need to test that patch on A9 and A15 cores as well since it doesn't seem to do a lot on A8 cores :(

And on A9 cores linux-armv4 is 20% *faster* for MD5 with blocksize=16; see the spreadsheet above. So it is probably a scheduling issue that favours A9 cores.

regards,

Koen.

