On Sat, Jan 25, 2014 at 11:17 AM, Nick Craig-Wood wrote:

> I have made two native ARM sha1 routines.

Great. Thank you for working on that.

> This is the highest performance version on my Chromebook. It is fully
> unrolled and uses 5508 bytes of code. However I'm concerned that 5k of
> code won't fit into the I-cache on some ARM processors so I'd like some
> advice as to whether that is sensible or not.
>
> I've made a second routine which is partially unrolled (note this
> version isn't quite as polished as the first one)
>
> Which uses only 1896 bytes of code but runs about 10% slower on the
>
> In comparison the amd64 version of the code is 4963 bytes and the 386
> version is 3888 bytes. Both are fully unrolled.
>
> My feeling is that the unrolled version should be preferred as it is
> faster and 5k of code isn't excessive. However I don't want to unduly
> hamper older ARM processors.
>
> I'll polish and submit one or the other CLs depending on what we decide!

I can't decide which is better, but I'm slightly inclined toward the smaller
one. (Could we put both in the tree, and use one for armv7a and the other for
armv5, and possibly for armv6?)

Anyway, I'd like to see the benchmark results on ARMv5.

PS: given that the GOARM setting now affects the compiler's code generation,
can we introduce armv5, armv6 and armv7 build tags?
Having them would solve this problem neatly: just include both versions,
each selected by build tags.
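
To make the suggestion concrete, here is a rough sketch of the selection
rule. It is not code from either CL. In the tree the choice would be made at
build time by the proposed armv5/armv6/armv7 tags (which do not exist yet);
the sketch models it with a hypothetical helper, pickRoutine, so that it can
run anywhere. The byte counts are the ones quoted above. Giving armv6 the
smaller routine is only an assumption, since that case is still open.

```go
package main

import "fmt"

// Code sizes reported in this thread, in bytes.
const (
	unrolledSize = 5508 // fully unrolled ARM routine
	compactSize  = 1896 // partially unrolled ARM routine
)

// pickRoutine is a hypothetical helper, not part of any CL. It returns the
// routine that a build for the given GOARM value (5, 6 or 7) would include
// if per-architecture build tags existed.
// Assumption: armv6 gets the smaller routine; the thread leaves that open.
func pickRoutine(goarm int) (name string, size int) {
	if goarm >= 7 {
		return "unrolled", unrolledSize
	}
	return "compact", compactSize
}

func main() {
	for _, v := range []int{5, 6, 7} {
		name, size := pickRoutine(v)
		fmt.Printf("GOARM=%d -> %s routine (%d bytes)\n", v, name, size)
	}
}
```

With real build tags, each routine would sit in its own file and only one
would be compiled in, so no runtime check would be needed.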



group: golang-dev
posted: Jan 25, '14 at 4:17p
active: Feb 8, '14 at 3:34p


