Patchwork [PATCHv2] openssl: switch ARM builds from linux-elf-arm to linux-armv4 config

Submitter Koen Kooi
Date Oct. 16, 2013, 7:25 a.m.
Message ID <1381908355-18124-1-git-send-email-koen@dominion.thruhere.net>
Permalink /patch/60001/
State New

Comments

Koen Kooi - Oct. 16, 2013, 7:25 a.m.
From: Koen Kooi <koen.kooi@linaro.org>

This enables the aes and sha1 assembly at build time. OpenSSL does a
runtime check to see which portion gets enabled.

'./Configure TABLE' gives the following:

*** linux-elf-arm
$cc           =
$cflags       = -DL_ENDIAN      -DTERMIO  -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
$unistd       =
$thread_cflag = -D_REENTRANT
$sys_id       =
$lflags       = -ldl
$bn_ops       = BN_LLONG DES_RISC1
$cpuid_obj    =
$bn_obj       =
$des_obj      =
$aes_obj      =
$bf_obj       =
$md5_obj      =
$sha1_obj     =
$cast_obj     =
$rc4_obj      =
$rmd160_obj   =
$rc5_obj      =
$wp_obj       =
$cmll_obj     =
$modes_obj    =
$engines_obj  =
$perlasm_scheme = void
$dso_scheme   = dlfcn
$shared_target= linux-shared
$shared_cflag = -fPIC
$shared_ldflag =
$shared_extension = .so.$(SHLIB_MAJOR).$(SHLIB_MINOR)
$ranlib       =
$arflags      =
$multilib     =

*** linux-armv4
$cc           = gcc
$cflags       = -DTERMIO -O3 -Wall
$unistd       =
$thread_cflag = -D_REENTRANT
$sys_id       =
$lflags       = -ldl
$bn_ops       = BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR
$cpuid_obj    = armcap.o armv4cpuid.o
$bn_obj       = bn_asm.o armv4-mont.o armv4-gf2m.o
$des_obj      =
$aes_obj      = aes_cbc.o aes-armv4.o bsaes-armv7.o
$bf_obj       =
$md5_obj      =
$sha1_obj     = sha1-armv4-large.o sha256-armv4.o sha512-armv4.o
$cast_obj     =
$rc4_obj      =
$rmd160_obj   =
$rc5_obj      =
$wp_obj       =
$cmll_obj     =
$modes_obj    = ghash-armv4.o
$engines_obj  =
$perlasm_scheme = void
$dso_scheme   = dlfcn
$shared_target= linux-shared
$shared_cflag = -fPIC
$shared_ldflag =
$shared_extension = .so.$(SHLIB_MAJOR).$(SHLIB_MINOR)
$ranlib       =
$arflags      =
$multilib     =
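
To compare the two entries without reading the full listing, the output of './Configure TABLE' can be filtered per target. The following is a minimal sketch, not part of the patch; the helper names config_block and nonempty_settings are hypothetical, and it assumes each entry starts with a "*** <target>" header as in the listing above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the patch or of OpenSSL).

# Print the settings of one target from './Configure TABLE' output on stdin.
# A block starts at "*** <target>" and ends at the next "*** " header.
config_block() {
    awk -v t="$1" '
        /^\*\*\* / { inblock = ($2 == t); next }
        inblock && NF { print }
    '
}

# Keep only the settings that have a non-empty value after the first "=".
nonempty_settings() {
    awk '{ i = index($0, "="); if (i && substr($0, i + 1) ~ /[^ \t]/) print }'
}

# Example use, run from the OpenSSL source directory:
#   ./Configure TABLE | config_block linux-armv4 | nonempty_settings
```

Filtering both targets this way makes the difference visible at a glance: only linux-armv4 lists objects for $cpuid_obj, $bn_obj, $aes_obj, $sha1_obj and $modes_obj.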

Build tested on armv7a/angstrom and armv8/distroless, runtime tested on armv7a/angstrom.

'openssl speed' results:

Algo	blocksize	ops/s before	ops/s after	difference
-------------------------------------------
MD5	16	308,766	264,664	-14.28%
	64	277,090	263,340	-4.96%
	256	212,652	197,043	-7.34%
	1024	103,604	100,157	-3.33%
	8192	17,936	17,796	-0.78%
sha1	16	290,011	385,098	32.79%
	64	234,939	302,788	28.88%
	256	144,831	177,028	22.23%
	1024	57,043	67,374	18.11%
	8192	8,586	9,932	15.68%
sha256	16	290,443	605,747	108.56%
	64	178,010	370,598	108.19%
	256	82,107	168,770	105.55%
	1024	26,064	53,068	103.61%
	8192	3,550	7,211	103.10%
sha512	16	59,618	259,354	335.03%
	64	59,616	258,265	333.22%
	256	21,727	98,057	351.31%
	1024	7,449	34,304	360.49%
	8192	1,047	4,842	362.63%
des cbc	16	964,682	1,124,459	16.56%
	64	260,188	298,910	14.88%
	256	65,945	76,273	15.66%
	1024	16,570	19,110	15.33%
	8192	2,082	2,398	15.17%
des ede3	16	370,442	429,906	16.05%
	64	95,429	110,147	15.42%
	256	23,928	27,808	16.21%
	1024	5,993	6,960	16.13%
	8192	752	868	15.36%
aes128	16	1,712,050	2,301,100	34.41%
	64	466,491	651,155	39.59%
	256	120,181	168,953	40.58%
	1024	30,177	42,792	41.80%
	8192	3,791	5,361	41.41%
aes192	16	1,472,560	1,964,900	33.43%
	64	400,087	544,971	36.21%
	256	103,245	141,062	36.63%
	1024	25,902	35,389	36.63%
	8192	3,256	4,451	36.67%
aes256	16	1,330,524	1,772,143	33.19%
	64	355,025	486,221	36.95%
	256	90,663	125,281	38.18%
	1024	22,725	31,484	38.54%
	8192	2,837	3,952	39.31%
rsa	2048bit	15	25	69.94%
	public	547	832	52.00%
dsa	2048bit	55	86	54.26%
	verify	47	73	53.33%
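
The difference column is the relative change from the first ops/s figure to the second. A minimal sketch of that arithmetic follows, assuming awk is available; pct_diff is a hypothetical helper name. The rsa and dsa rows show small rounded counts, so recomputing from them may not reproduce the listed percentage exactly.

```shell
#!/bin/sh
# Hypothetical helper: relative change between two ops/s figures, in percent.
# Thousands separators (commas) are stripped before the calculation.
pct_diff() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        gsub(/,/, "", before); gsub(/,/, "", after)
        printf "%.2f%%\n", (after - before) / before * 100
    }'
}

# MD5 with 16-byte blocks, taken from the table above.
pct_diff 308,766 264,664    # prints -14.28%
```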

Signed-off-by: Koen Kooi <koen.kooi@linaro.org>
---
 meta/recipes-connectivity/openssl/openssl.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Phil Blundell - Oct. 16, 2013, 9:20 a.m.
On Wed, 2013-10-16 at 09:25 +0200, Koen Kooi wrote:
> From: Koen Kooi <koen.kooi@linaro.org>
> 
> This enables aes and sha1 assembly at buildtime. Openssl does a
> runtime check to see which portion gets enabled.

[...]

> Algo	blocksize	ops/s before	ops/s after	difference
> -------------------------------------------
> MD5	16	308,766	264,664	-14.28%
> 	64	277,090	263,340	-4.96%
> 	256	212,652	197,043	-7.34%
> 	1024	103,604	100,157	-3.33%
> 	8192	17,936	17,796	-0.78%

Do you know why it's causing MD5 to get slower?  I guess md5 with
blocksize=16 is not a very common case, but still.

Also, it seems generally a bit unwholesome for openssl to be picking its
own CFLAGS at all.  Would it be better to just make it use the same
CFLAGS as everything else?

p.
Koen Kooi - Oct. 16, 2013, 9:45 a.m.
On 16 Oct 2013, at 11:20, Phil Blundell <pb@pbcl.net> wrote:

> On Wed, 2013-10-16 at 09:25 +0200, Koen Kooi wrote:
>> From: Koen Kooi <koen.kooi@linaro.org>
>> 
>> This enables aes and sha1 assembly at buildtime. Openssl does a
>> runtime check to see which portion gets enabled.
> 
> [...]
> 
>> Algo	blocksize	ops/s before	ops/s after	difference
>> -------------------------------------------
>> MD5	16	308,766	264,664	-14.28%
>> 	64	277,090	263,340	-4.96%
>> 	256	212,652	197,043	-7.34%
>> 	1024	103,604	100,157	-3.33%
>> 	8192	17,936	17,796	-0.78%
> 
> Do you know why it's causing MD5 to get slower?  I guess md5 with
> blocksize=16 is not a very common case, but still.

I really don't know and it is the only algo that gets a lot slower. This patch is a preparation for (more) NEON optimizations which seem to fix the regression, see

	https://docs.google.com/spreadsheet/ccc?key=0AhgZ33Tf6eBldHVONjRXRnItWld4eFlRWTJ3RzVIdGc&usp=sharing

and

	http://dominion.thruhere.net/koen/angstrom/0002-openssl-1.0.1e-add-ARMv7-AES-optimizations.patch

I need to test that patch on A9 and A15 cores as well since it doesn't seem to do a lot on A8 cores :(

> Also, it seems generally a bit unwholesome for openssl to be picking its
> own CFLAGS at all.  Would it be better to just make it use the same
> CFLAGS as everything else?

openssl.inc already pokes at CFLAG(S):

CFLAG = "${@base_conditional('SITEINFO_ENDIANNESS', 'le', '-DL_ENDIAN', '-DB_ENDIAN', d)} \
        -DTERMIO ${CFLAGS} -Wall -Wa,--noexecstack"

The complete command looks like this:

arm-angstrom-linux-gnueabi-gcc  -march=armv7-a -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a8 --sysroot=/build/v2013.06/build/tmp-angstrom_v2013_06-eglibc/sysroots/beaglebone -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN  -DTERMIO  -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -c   -c -o armv4cpuid.o armv4cpuid.S

So the '$cflags       = -DTERMIO -O3 -Wall' in linux-armv4 gets overridden by OE, just like we want :)
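
A quick way to confirm this from a build log is to pull the optimization flag out of a logged compile command. The sketch below assumes the command line is fed on stdin; last_opt_flag is a hypothetical helper name. It prints the last -O option because GCC uses the last one when several are given.

```shell
#!/bin/sh
# Hypothetical helper: print the last -O<level> option of a compiler command
# line read from stdin. Tokens such as -DOPENSSL_PIC or -o do not match.
last_opt_flag() {
    tr ' \t' '\n\n' | grep -E '^-O([0-3sg]|fast)?$' | tail -n 1
}

# Shortened form of the command shown above.
printf '%s\n' 'gcc -fPIC -DOPENSSL_PIC -DTERMIO -O2 -pipe -g -Wall -c -o armv4cpuid.o armv4cpuid.S' | last_opt_flag
```

For the full command above this gives -O2, the OE setting, and not the -O3 from the linux-armv4 entry.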

regards,

Koen
Koen Kooi - Oct. 17, 2013, 5:54 a.m.
On 16 Oct 2013, at 11:45, Koen Kooi <koen@dominion.thruhere.net> wrote:

> 
> On 16 Oct 2013, at 11:20, Phil Blundell <pb@pbcl.net> wrote:
> 
>> On Wed, 2013-10-16 at 09:25 +0200, Koen Kooi wrote:
>>> From: Koen Kooi <koen.kooi@linaro.org>
>>> 
>>> This enables aes and sha1 assembly at buildtime. Openssl does a
>>> runtime check to see which portion gets enabled.
>> 
>> [...]
>> 
>>> Algo	blocksize	ops/s before	ops/s after	difference
>>> -------------------------------------------
>>> MD5	16	308,766	264,664	-14.28%
>>> 	64	277,090	263,340	-4.96%
>>> 	256	212,652	197,043	-7.34%
>>> 	1024	103,604	100,157	-3.33%
>>> 	8192	17,936	17,796	-0.78%
>> 
>> Do you know why it's causing MD5 to get slower?  I guess md5 with
>> blocksize=16 is not a very common case, but still.
> 
> I really don't know and it is the only algo that gets a lot slower. This patch is a preparation for (more) NEON optimizations which seem to fix the regression, see
> 
> 	https://docs.google.com/spreadsheet/ccc?key=0AhgZ33Tf6eBldHVONjRXRnItWld4eFlRWTJ3RzVIdGc&usp=sharing
> 
> and
> 
> 	http://dominion.thruhere.net/koen/angstrom/0002-openssl-1.0.1e-add-ARMv7-AES-optimizations.patch
> 
> I need to test that patch on A9 and A15 cores as well since it doesn't seem to do a lot on A8 cores :(

And on A9 cores linux-armv4 is 20% *faster* on MD5 with blocksize=16, see spreadsheet above. So it probably is a scheduling issue that favours A9 cores.

regards,

Koen.


> 
>> Also, it seems generally a bit unwholesome for openssl to be picking its
>> own CFLAGS at all.  Would it be better to just make it use the same
>> CFLAGS as everything else?
> 
> openssl.inc already pokes at CFLAG(S):
> 
> CFLAG = "${@base_conditional('SITEINFO_ENDIANNESS', 'le', '-DL_ENDIAN', '-DB_ENDIAN', d)} \
>        -DTERMIO ${CFLAGS} -Wall -Wa,--noexecstack"
> 
> The complete command looks like this:
> 
> arm-angstrom-linux-gnueabi-gcc  -march=armv7-a -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a8 --sysroot=/build/v2013.06/build/tmp-angstrom_v2013_06-eglibc/sysroots/beaglebone -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN  -DTERMIO  -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -c   -c -o armv4cpuid.o armv4cpuid.S
> 
> So the '$cflags       = -DTERMIO -O3 -Wall' in linux-armv4 gets overridden by OE, just like we want :)
> 
> regards,
> 
> Koen

Patch

diff --git a/meta/recipes-connectivity/openssl/openssl.inc b/meta/recipes-connectivity/openssl/openssl.inc
index f5b2432..f0fa180 100644
--- a/meta/recipes-connectivity/openssl/openssl.inc
+++ b/meta/recipes-connectivity/openssl/openssl.inc
@@ -61,7 +61,7 @@  do_configure () {
 	target="$os-${HOST_ARCH}"
 	case $target in
 	linux-arm)
-		target=linux-elf-arm
+		target=linux-armv4
 		;;
 	linux-armeb)
 		target=linux-elf-armeb
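
The effect of the hunk can be tried in isolation with a small standalone function. This is only a sketch of the two cases visible in the diff context; openssl_target is a hypothetical name, and the real do_configure has further cases and logic that are not shown here, so every other input is passed through unchanged as an assumption.

```shell
#!/bin/sh
# Hypothetical standalone version of the target mapping after this patch.
# $1 stands in for $os and $2 for ${HOST_ARCH} in the recipe.
openssl_target() {
    target="$1-$2"
    case $target in
    linux-arm)
        target=linux-armv4        # was linux-elf-arm before the patch
        ;;
    linux-armeb)
        target=linux-elf-armeb    # big-endian ARM is left unchanged
        ;;
    esac
    printf '%s\n' "$target"
}

openssl_target linux arm      # prints linux-armv4
openssl_target linux armeb    # prints linux-elf-armeb
```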