I have made two native ARM sha1 routines.


This is the highest-performance version on my Chromebook. It is fully
unrolled and uses 5508 bytes of code. However, I'm concerned that 5k of
code won't fit into the I-cache on some ARM processors, so I'd like some
advice as to whether that is sensible or not.

I've made a second routine which is partially unrolled (note this
version isn't quite as polished as the first one)


Which uses only 1896 bytes of code but runs about 10% slower on the

In comparison the amd64 version of the code is 4963 bytes and the 386
version is 3888 bytes. Both are fully unrolled.

My feeling is that the unrolled version should be preferred, as it is
faster and 5k of code isn't excessive. However, I don't want to unduly
hamper older ARM processors.

I'll polish and submit one or the other CL depending on what we decide!

Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick


Discussion Overview
group: golang-dev
posted: Jan 25, '14 at 4:17p
active: Feb 8, '14 at 3:34p


