Hello Everyone,

I've started to learn Go so I was interested in comparing Go performance
against some other languages.
I have a simple microbenchmark that I use to get a quick idea of relative
performance.
I was disappointed to see that the performance of Go was not as good as I
expected.

My benchmark factors a couple of big integers using Pollard's Rho method:

N5 = 766150476015982127183457373
N6 = 62823675885202669644104829577

Both are the products of just two primes.

-------------------------------------------
Go 1.1rc2 :

C:\Projects\GoCode\src\rho>go test -v
=== RUN: ExampleN5
--- PASS: ExampleN5 (10.053s)
=== RUN: ExampleN6
--- PASS: ExampleN6 (15.348s)
PASS
ok rho 25.452s

C:\Projects\GoCode\src\rho>go test -cpuprofile=rho.prof -bench="Both" -benchmem=true
testing: warning: no tests to run
PASS
BenchmarkBoth 1 25958000000 ns/op 1105711392 B/op 25343461 allocs/op
ok rho 26.018s

-------------------------------------------
Python 3.3 :
rho(N5) = 1178524040059
9.34 s per run
rho(N6) = 663410067979
15.11 s per run

------------------------------------------
Scala 2.10.1 :
scala> TimeRho(N5)
rho(766150476015982127183457373) = 1178524040059
Time = 2112 ms
scala> TimeRho(N6)
rho(62823675885202669644104829577) = 663410067979
Time = 3104 ms
-------------------------------------------

As you can see above, Go and Python run at about the same speed, while Scala
(and Java) are much faster.
This surprised me: since Go is statically typed, I had expected it to be
faster than Python, with performance closer to Scala or Java.

I've attached the Go files if anyone is interested.

After some editing to make pprof work on Windows, I managed to collect a
profile.
It seems that there is a lot of time spent on memory allocation, but that's
about all I can determine at this point,
given my unfamiliarity with Go.
The output from pprof is below.

Any ideas on how to make big.Int faster in Go?

-- Peter

-------------------------------------------------------------
C:\Projects\GoCode\src\rho>go tool pprof rho.test.exe rho.prof
The system cannot find the path specified.
Welcome to pprof! For help, type 'help'.
(pprof) top50 --cum
Total: 2593 samples
        0 0.0% 0.0% 2370 91.4% gosched0
        0 0.0% 0.0% 2370 91.4% rho.BenchmarkBoth
        3 0.1% 0.1% 2370 91.4% rho.Rho
        0 0.0% 0.1% 2370 91.4% testing.(*B).launch
        0 0.0% 0.1% 2370 91.4% testing.(*B).runN
        0 0.0% 0.1% 1906 73.5% math/big.(*Int).GCD
      144 5.6% 5.7% 1902 73.4% math/big.(*Int).binaryGCD
      276 10.6% 16.3% 606 23.4% math/big.(*Int).Sub
      156 6.0% 22.3% 596 23.0% math/big.(*Int).Rsh
      104 4.0% 26.3% 431 16.6% math/big.(*Int).Set
      240 9.3% 35.6% 384 14.8% math/big.nat.sub
       79 3.0% 38.6% 364 14.0% math/big.nat.set
        7 0.3% 38.9% 346 13.3% math/big.(*Int).Mod
      110 4.2% 43.2% 326 12.6% math/big.nat.make
       13 0.5% 43.7% 309 11.9% math/big.(*Int).QuoRem
        8 0.3% 44.0% 297 11.5% math/big.nat.div
       65 2.5% 46.5% 289 11.1% math/big.nat.divLarge
       59 2.3% 48.7% 267 10.3% runtime.makeslice
       54 2.1% 50.8% 238 9.2% runtime.mallocgc
        8 0.3% 51.1% 207 8.0% makeslice1
       61 2.4% 53.5% 205 7.9% runtime.copy
      106 4.1% 57.6% 198 7.6% math/big.nat.shr
       22 0.8% 58.4% 196 7.6% math/big.(*Int).Neg
       84 3.2% 61.7% 163 6.3% math/big.nat.add
      152 5.9% 67.5% 152 5.9% math/big.nat.norm
      143 5.5% 73.0% 143 5.5% runtime.memmove
      125 4.8% 77.9% 125 4.8% math/big.nat.cmp
       64 2.5% 80.3% 120 4.6% math/big.nat.trailingZeroBits
       10 0.4% 80.7% 80 3.1% math/big.(*Int).Mul
        0 0.0% 80.7% 78 3.0% gc
       24 0.9% 81.6% 78 3.0% runtime.MCache_Alloc
        0 0.0% 81.6% 78 3.0% runtime.gc
        7 0.3% 81.9% 70 2.7% math/big.nat.mul
       66 2.5% 84.5% 66 2.5% math/big.trailingZeroBits
       64 2.5% 86.9% 64 2.5% math/big.subVV
        0 0.0% 86.9% 62 2.4% runtime.parfordo
        6 0.2% 87.2% 51 2.0% runtime.MCentral_AllocList
        4 0.2% 87.3% 50 1.9% runtime.new
       34 1.3% 88.6% 44 1.7% sweepspan
        5 0.2% 88.8% 42 1.6% MCentral_Grow
       34 1.3% 90.1% 34 1.3% runtime.markallocated
       33 1.3% 91.4% 33 1.3% math/big.shrVU
       27 1.0% 92.4% 30 1.2% scanblock
       14 0.5% 93.0% 28 1.1% math/big.basicMul
       22 0.8% 93.8% 22 0.8% math/big.divWW
        7 0.3% 94.1% 19 0.7% math/big.(*Int).Add
        0 0.0% 94.1% 18 0.7% markroot
       17 0.7% 94.8% 17 0.7% runtime.markspan
        0 0.0% 94.8% 16 0.6% runtime.MHeap_Alloc
       12 0.5% 95.2% 12 0.5% math/big.addVV

(pprof) top50
Total: 2593 samples
      276 10.6% 10.6% 606 23.4% math/big.(*Int).Sub
      240 9.3% 19.9% 384 14.8% math/big.nat.sub
      156 6.0% 25.9% 596 23.0% math/big.(*Int).Rsh
      152 5.9% 31.8% 152 5.9% math/big.nat.norm
      144 5.6% 37.3% 1902 73.4% math/big.(*Int).binaryGCD
      143 5.5% 42.8% 143 5.5% runtime.memmove
      125 4.8% 47.7% 125 4.8% math/big.nat.cmp
      110 4.2% 51.9% 326 12.6% math/big.nat.make
      106 4.1% 56.0% 198 7.6% math/big.nat.shr
      104 4.0% 60.0% 431 16.6% math/big.(*Int).Set
       84 3.2% 63.2% 163 6.3% math/big.nat.add
       79 3.0% 66.3% 364 14.0% math/big.nat.set
       66 2.5% 68.8% 66 2.5% math/big.trailingZeroBits
       65 2.5% 71.3% 289 11.1% math/big.nat.divLarge
       64 2.5% 73.8% 120 4.6% math/big.nat.trailingZeroBits
       64 2.5% 76.3% 64 2.5% math/big.subVV
       61 2.4% 78.6% 205 7.9% runtime.copy
       59 2.3% 80.9% 267 10.3% runtime.makeslice
       54 2.1% 83.0% 238 9.2% runtime.mallocgc
       34 1.3% 84.3% 34 1.3% runtime.markallocated
       34 1.3% 85.6% 44 1.7% sweepspan
       33 1.3% 86.9% 33 1.3% math/big.shrVU
       27 1.0% 87.9% 30 1.2% scanblock
       24 0.9% 88.9% 78 3.0% runtime.MCache_Alloc
       22 0.8% 89.7% 196 7.6% math/big.(*Int).Neg
       22 0.8% 90.6% 22 0.8% math/big.divWW
       17 0.7% 91.2% 17 0.7% runtime.markspan
       14 0.5% 91.7% 28 1.1% math/big.basicMul
       13 0.5% 92.2% 309 11.9% math/big.(*Int).QuoRem
       12 0.5% 92.7% 12 0.5% math/big.addVV
       12 0.5% 93.2% 12 0.5% math/big.nat.clear
       11 0.4% 93.6% 11 0.4% runtime.memclr
       11 0.4% 94.0% 11 0.4% runtime.settype_flush
       10 0.4% 94.4% 80 3.1% math/big.(*Int).Mul
        9 0.3% 94.8% 9 0.3% math/big.addVW
        8 0.3% 95.1% 207 8.0% makeslice1
        8 0.3% 95.4% 8 0.3% math/big.addMulVVW
        8 0.3% 95.7% 297 11.5% math/big.nat.div
        8 0.3% 96.0% 8 0.3% runtime.SizeToClass
        8 0.3% 96.3% 8 0.3% runtime.casp
        7 0.3% 96.6% 19 0.7% math/big.(*Int).Add
        7 0.3% 96.8% 346 13.3% math/big.(*Int).Mod
        7 0.3% 97.1% 70 2.7% math/big.nat.mul
        6 0.2% 97.3% 6 0.2% math/big.subVW
        6 0.2% 97.6% 51 2.0% runtime.MCentral_AllocList
        5 0.2% 97.8% 42 1.6% MCentral_Grow
        5 0.2% 98.0% 11 0.4% math/big.(*Int).Lsh
        4 0.2% 98.1% 4 0.2% math/big.mulWW
        4 0.2% 98.3% 50 1.9% runtime.new
        3 0.1% 98.4% 3 0.1% math/big.(*Int).Cmp

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


  • Dave Cheney at May 7, 2013 at 10:57 pm
    Are you using an amd64 machine?
  • Peter Caven at May 7, 2013 at 11:23 pm
    Yes, on Vista.

    So, I used:

    Go: go1.1rc2.windows-amd64
    Python: 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32
    bit (Intel)] on win32
    Scala: JVM 1.7.0_21, 64-Bit Server VM (build 23.21-b01, mixed mode)

    -- Peter

  • Agl at May 7, 2013 at 11:55 pm

    On Tuesday, May 7, 2013 3:47:19 PM UTC-4, Peter Caven wrote:

    It seems that there is a lot of time spent on memory allocation, but
    that's about all I can determine at this point,
    given my unfamiliarity with Go.
    The output from pprof is below.

    Any ideas on how to make big.Int faster in Go?
    You can save a few allocations and some time by avoiding some aliasing. For
    example:

    a.Mul(a, a)

    will end up allocating because Mul cannot write the result into a as it
    goes. By using a temp variable and swapping afterwards, some allocations
    can be saved:

    t1.Mul(a, a)
    t1, a = a, t1

    (Yes, this is a little ugly.)
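    To make the effect concrete, here is a self-contained sketch (illustrative
    names; exact allocation counts depend on the Go version) comparing the two
    patterns with testing.AllocsPerRun:

```go
package main

import (
	"fmt"
	"math/big"
	"testing"
)

var n, _ = new(big.Int).SetString("766150476015982127183457373", 10)

// naiveAllocs measures allocations when the receiver aliases the
// operands: Mul must allocate a fresh result on every iteration.
func naiveAllocs() float64 {
	return testing.AllocsPerRun(10, func() {
		a := big.NewInt(2)
		for i := 0; i < 100; i++ {
			a.Mul(a, a) // receiver aliases both operands: forces an allocation
			a.Mod(a, n)
		}
	})
}

// swappedAllocs writes into a scratch value and swaps pointers, so the
// two backing arrays are reused in alternation.
func swappedAllocs() float64 {
	return testing.AllocsPerRun(10, func() {
		a, t := big.NewInt(2), new(big.Int)
		for i := 0; i < 100; i++ {
			t.Mul(a, a) // no aliasing: t's storage is reused
			a, t = t, a
			a.Mod(a, n)
		}
	})
}

func main() {
	fmt.Println(naiveAllocs() > swappedAllocs()) // → true
}
```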

    But this doesn't get you too much in this test because 75% of the time is
    spent in binaryGCD[1].

    A much clearer (to me, at least) profile output results from pprof's 'gv'
    command, but that might not work on Windows without Ghostscript installed.

    I suspect that some optimisation work in binaryGCD would pay dividends
    here. For example, 30 seconds of tinkering produced the following change,
    which resulted in a modest speedup:

    diff -r 93752156e2b5 src/pkg/math/big/int.go
    --- a/src/pkg/math/big/int.go Fri May 03 15:24:05 2013 -0400
    +++ b/src/pkg/math/big/int.go Tue May 07 19:51:22 2013 -0400
    @@ -703,14 +703,15 @@
                     // reduce t
                     t.Rsh(t, t.abs.trailingZeroBits())
                     if t.neg {
    - v.Neg(t)
    + v, t = t, v
    + v.neg = len(v.abs) > 0 && !v.neg // 0 has no sign
                     } else {
    - u.Set(t)
    + u, t = t, u
                     }
                     t.Sub(u, v)
             }

    - return u.Lsh(u, k)
    + return z.Lsh(u, k)
      }

      // ProbablyPrime performs n Miller-Rabin tests to check whether x is prime.

    The combination of those two tricks got a 20% speedup. Tuning Rsh for
    negative numbers may also pay dividends.

    Nonetheless, the Java BigInt speed is impressive.

    [1] http://tip.golang.org/src/pkg/math/big/int.go#L659


    Cheers

    AGL

  • Peter Caven at May 8, 2013 at 3:51 am
    Thanks for pointing out the aliasing problems.
    I see in the big.Int code that there are a lot of checks for aliasing, and
    that many of the functions also return a pointer to the receiver in
    addition to updating it.

    Yes, I don't think there's very much we can gain by optimizing binaryGCD
    alone.
    After looking more closely at the lower level arithmetic functions in
    big.nat, I think that may be where most of the performance goes.

    Consider that the Python code I have is just these two simple high-level
    functions:

    def gcd(a,b):
         while b:
             a,b = b,(a % b)
         return a

    def rho(n):
         a = 2
         b = 2
         d = 1
         while d == 1:
             a = (a**2 + 1) % n
             b = (b**2 + 1) % n
             b = (b**2 + 1) % n
             d = gcd(a-b,n)
         return d

    Compare that to binaryGCD.
    It looks like it would be a big win to put the core functions,
    add, sub, mul, div, divmod, and the rest of big.nat in a lower-level C package.

    -- Peter

  • Rémy Oudompheng at May 8, 2013 at 6:50 am

    On 2013/5/8 Peter Caven wrote:
    It looks like it would be a big win to put the core functions,
    add,sub,mul,div,divmod, and the rest of big.nat in a lower-level C package.
    I don't understand why you think using C will make anything faster.
    A few points:
    * we may have a few issues with the divmod algorithm
    * binaryGCD is not always more effective than the ordinary GCD
    * your numbers are very small, so ordinary arithmetic (sum, product)
    is dwarfed by various overheads in function calls

    A C library will not help with these issues.

    For large integer multiplication you can try using
    github.com/remyoudompheng/bigfft which is faster than math/big for
    numbers with more than 40k digits, and "only" 1.5x-2x slower than GMP
    but that's not your use case. It's written in Go and calls the same
    assembly routines as math/big when necessary.

    Rémy.
  • Peter Caven at May 8, 2013 at 7:35 am
    Hello Rémy,

    Lol! You've got some big numbers. Your package is very impressive!

    On Wednesday, May 8, 2013 2:49:59 AM UTC-4, Rémy Oudompheng wrote:

    I don't understand why you think using C will make anything faster.
    We may:
    * have a few issues with divmod algorithm
    * binaryGCD is not always more effective than the ordinary GCD

    I was guessing of course, but here are my reasons:
    As you can see, the Python code is just as fast as the Go code, but the
    Python code is completely high level, even the gcd function, and this is
    all in a dynamic language.
    The low level arithmetic functions inside the Python VM are what make the
    Python code fast, which means that tinkering with divmod or gcd is unlikely
    to make the Go code faster.
    There is also the problem of the unusually large number of memory
    allocations in the Go code. This looks like a fundamental problem with the
    implementation of big.nat.

    * your numbers are very small, so ordinary arithmetic (sum, product)
    is dwarfed by various overheads in function calls
    A C library will not help for these issues.

    So, I suppose you mean that the reason the JVM code is 5 times faster
    is that Hotspot inlines everything?

    For large integer multiplication you can try using
    github.com/remyoudompheng/bigfft which is faster than math/big for
    numbers with more than 40k digits, and "only" 1.5x-2x slower than GMP
    but that's not your use case. It's written in Go and calls the same
    assembly routines as math/big when necessary.

    That doesn't surprise me actually - with 40k digits, isn't the computation
    time determined less by the arithmetic than by the handling of the memory
    for those numbers?

    -- Peter

  • Rémy Oudompheng at May 8, 2013 at 8:23 am

    2013/5/8 Peter Caven <pcaven@gmail.com>:
    I don't understand why you think using C will make anything faster.
    We may:
    * have a few issues with divmod algorithm
    * binaryGCD is not always more effective than the ordinary GCD

    I was guessing of course, but here are my reasons:
    As you can see, the Python code is just as fast as the Go code, but the
    Python code is completely high level, even the gcd function, and this is all
    in a dynamic language.
    The low level arithmetic functions inside the Python VM are what make the
    Python code fast, which means that tinkering with divmod or gcd is unlikely
    to make the Go code faster.
    divmod *is* a low-level arithmetic primitive, and it's the most
    complex one. As I said, your numbers are just 2 uint64 blocks wide, so
    the arithmetic itself will be a handful of instructions, which is
    already more than the cost of a function call, except for divmod.
    There is also the problem of the unusually large number of memory
    allocations in the Go code. This looks like a fundamental problem with the
    implementation of big.nat.
    For carefully written code (see agl's comments), the amortized cost is
    visible but usually small.
    * your numbers are very small, so ordinary arithmetic (sum, product)
    is dwarfed by various overheads in function calls

    A C library will not help for these issues.
    So, I suppose you mean that the reason for the JVM code being 5 times faster
    is that Hotspot inlines everything?

    For large integer multiplication you can try using
    github.com/remyoudompheng/bigfft, which is faster than math/big for
    numbers with more than 40k digits and "only" 1.5x-2x slower than GMP,
    but that's not your use case. It's written in Go and calls the same
    assembly routines as math/big when necessary.
    That doesn't surprise me actually - with 40k digits isn't the computation
    time determined less by the arithmetic than by the handling of the memory
    for those numbers?
    There's a lot of cache-unfriendliness issues that arise but the choice
    of algorithm can also make a noticeable difference.

    Rémy.

  • David DENG at May 8, 2013 at 8:18 am
    I didn't check the correctness of the result, but using the following gcd
    (taken directly from your Python code) is much faster:

    func gcd(a, b *big.Int) *big.Int {
         for b.Sign() != 0 {
             a, b = b, new(big.Int).Mod(a, b)
         }
         return a
    }


    David
    On Wednesday, May 8, 2013 11:51:27 AM UTC+8, Peter Caven wrote:

    Thanks for pointing out the aliasing problems.
    I see in the big.Int code that there are a lot of checks for aliasing, and
    that many of the functions also return a pointer to the receiver in
    addition to updating it.

    Yes, I don't think there's very much we can gain by optimizing binaryGCD
    alone.
    After looking more closely at the lower level arithmetic functions in
    big.nat, I think that may be where most of the performance goes.

    Consider that the Python code I have is just these two simple high-level
    functions:

    def gcd(a, b):
        while b:
            a, b = b, (a % b)
        return a

    def rho(n):
        a = 2
        b = 2
        d = 1
        while d == 1:
            a = (a**2 + 1) % n
            b = (b**2 + 1) % n
            b = (b**2 + 1) % n
            d = gcd(a - b, n)
        return d

    Compare that to binaryGCD.
    It looks like it would be a big win to put the core functions (add, sub,
    mul, div, divmod) and the rest of big.nat in a lower-level C package.

    -- Peter

    On Tuesday, May 7, 2013 7:55:15 PM UTC-4, agl wrote:
    On Tuesday, May 7, 2013 3:47:19 PM UTC-4, Peter Caven wrote:

    It seems that there is a lot of time spent on memory allocation, but
    that's about all I can determine at this point,
    given my unfamiliarity with Go.
    The output from pprof is below.

    Any ideas on how to make big.Int faster in Go?
    You can save a few allocations and some time by avoiding some aliasing.
    For example:

    a.Mul(a, a)

    will end up allocating, because Mul cannot write the result into a as it
    goes. By using a temp variable and swapping afterwards, some allocations
    will be saved:

    t1.Mul(a, a)
    t1, a = a, t1

    (Yes, this is a little ugly.)

    But this doesn't get you too much in this test because 75% of the time is
    spent in binaryGCD[1].

    A much clearer (to me, at least) profile output results from 'gv', but
    that might not work in Windows without ghostscript installed.

    I suspect that some optimisation work in binaryGCD would pay dividends
    here. For example, 30 seconds of effort and the following change resulted
    in a modest speedup:

    diff -r 93752156e2b5 src/pkg/math/big/int.go
    --- a/src/pkg/math/big/int.go Fri May 03 15:24:05 2013 -0400
    +++ b/src/pkg/math/big/int.go Tue May 07 19:51:22 2013 -0400
    @@ -703,14 +703,15 @@
    // reduce t
    t.Rsh(t, t.abs.trailingZeroBits())
    if t.neg {
    - v.Neg(t)
    + v, t = t, v
    + v.neg = len(v.abs) > 0 && !v.neg // 0 has no sign
    } else {
    - u.Set(t)
    + u, t = t, u
    }
    t.Sub(u, v)
    }

    - return u.Lsh(u, k)
    + return z.Lsh(u, k)
    }

    // ProbablyPrime performs n Miller-Rabin tests to check whether x is
    prime.

    The combination of those two tricks got a 20% speedup. Tuning Rsh for
    negative numbers may also pay dividends.

    Nonetheless, the Java BigInt speed is impressive.

    [1] http://tip.golang.org/src/pkg/math/big/int.go#L659


    Cheers

    AGL
  • David DENG at May 8, 2013 at 8:34 am
    An ad-hoc optimization to gcd can make it even faster (this assumes a may
    be modified within gcd(), but b may not):

    var newB *big.Int = new(big.Int)
    func gcd(a, b *big.Int) *big.Int {
         b = newB.Set(b)
         for b.Sign() != 0 {
             a, b = b, a.Mod(a, b)
         }
         return a
    }


    David
    On Wednesday, May 8, 2013 4:18:50 PM UTC+8, David DENG wrote:

    I didn't check the correctness of the result, but using the following gcd
    (taken directly from your Python code) is much faster:

    func gcd(a, b *big.Int) *big.Int {
        for b.Sign() != 0 {
            a, b = b, new(big.Int).Mod(a, b)
        }
        return a
    }


    David
  • Peter Caven at May 8, 2013 at 5:39 pm
    I'm seeing times that are twice as long with your code ...
    Can you post your benchmark results?

    -- Peter

    On Wednesday, May 8, 2013 4:18:50 AM UTC-4, David DENG wrote:

    I didn't check the correctness of the result, but using the following gcd
    (taken directly from your Python code) is much faster:

    func gcd(a, b *big.Int) *big.Int {
        for b.Sign() != 0 {
            a, b = b, new(big.Int).Mod(a, b)
        }
        return a
    }


    David
  • David DENG at May 10, 2013 at 2:02 am
    Maybe it's because I was using Go 1.0.3 (Windows). I'll try it later on Go
    1.1.

    David
    On Thursday, May 9, 2013 1:39:32 AM UTC+8, Peter Caven wrote:

    I'm seeing times that are twice as long with your code ...
    Can you post your benchmark results?

    -- Peter

  • Robert Griesemer at May 8, 2013 at 12:47 am
    agl already gave good answers. I haven't looked at this in detail, but
    there's a few things to keep in mind:

    1) Go's math/big library is native Go with some core assembly routines.
    It's written from scratch, and while it has reasonable performance, it
    simply cannot compete with, say, the GMP library, which has years of
    development and big-integer arithmetic specialists working on it, adding
    the latest mathematical discoveries in fast multiplication and division.

    2) Using a.Op(a, b) instead of c.Op(a, b) should provide a significant
    performance improvement in many cases where one of the operands (a) can be
    re-used for the result.

    3) Most other languages have very fast bignum arithmetic, but they
    typically just call the GMP (or equivalent) C routines. One has to be
    careful when comparing languages this way: it's not the language, it's the
    underlying library. I'm actually surprised that Python is not much faster
    than Go.

    4) I suspect the main culprit is our div/mod implementation, which is
    textbook (Knuth) and could use improvements.

    - gri


    On Tue, May 7, 2013 at 12:47 PM, wrote:

    Hello Everyone,

    I've started to learn Go so I was interested in comparing Go performance
    against some other languages.
    I have a simple microbenchmark that I use to get a quick idea of relative
    performance.
    I was disappointed to see that the performance of Go was not as good as
    I expected.

    My benchmark factors a couple of big integers using Pollard's Rho method:

    N5 = 766150476015982127183457373
    N6 = 62823675885202669644104829577

    Both are the products of just two primes.

    -------------------------------------------
    Go 1.1rc2 :

    C:\Projects\GoCode\src\rho>go test -v
    === RUN: ExampleN5
    --- PASS: ExampleN5 (10.053s)
    === RUN: ExampleN6
    --- PASS: ExampleN6 (15.348s)
    PASS
    ok rho 25.452s

    C:\Projects\GoCode\src\rho>go test -cpuprofile=rho.prof -bench="Both"
    -benchmem=true
    testing: warning: no tests to run
    PASS
    BenchmarkBoth 1 25958000000 ns/op 1105711392 B/op
    25343461 allocs/op
    ok rho 26.018s

    -------------------------------------------
    Python 3.3 :
    rho(N5) = 1178524040059
    9.34 s per run
    rho(N6) = 663410067979
    15.11 s per run

    ------------------------------------------
    Scala 2.10.1 :
    scala> TimeRho(N5)
    rho(766150476015982127183457373) = 1178524040059
    Time = 2112 ms
    scala> TimeRho(N6)
    rho(62823675885202669644104829577) = 663410067979
    Time = 3104 ms
    -------------------------------------------

    As you can see above, Go and Python have about the same speed, while Scala
    (and Java) are much faster.
    This surprised me because I had thought that because Go is statically
    typed it would have been faster than Python,
    and I expected its performance to be closer to Scala or Java.

    I've attached the Go files if anyone is interested.

    After some editing to make pprof work on Windows, I managed to collect a
    profile.
    It seems that there is a lot of time spent on memory allocation, but
    that's about all I can determine at this point,
    given my unfamiliarity with Go.
    The output from pprof is below.

    Any ideas on how to make big.Int faster in Go?

    -- Peter

    -------------------------------------------------------------
    C:\Projects\GoCode\src\rho>go tool pprof rho.test.exe rho.prof
    The system cannot find the path specified.
    Welcome to pprof! For help, type 'help'.
    (pprof) top50 --cum
    Total: 2593 samples
    0 0.0% 0.0% 2370 91.4% gosched0
    0 0.0% 0.0% 2370 91.4% rho.BenchmarkBoth
    3 0.1% 0.1% 2370 91.4% rho.Rho
    0 0.0% 0.1% 2370 91.4% testing.(*B).launch
    0 0.0% 0.1% 2370 91.4% testing.(*B).runN
    0 0.0% 0.1% 1906 73.5% math/big.(*Int).GCD
    144 5.6% 5.7% 1902 73.4% math/big.(*Int).binaryGCD
    276 10.6% 16.3% 606 23.4% math/big.(*Int).Sub
    156 6.0% 22.3% 596 23.0% math/big.(*Int).Rsh
    104 4.0% 26.3% 431 16.6% math/big.(*Int).Set
    240 9.3% 35.6% 384 14.8% math/big.nat.sub
    79 3.0% 38.6% 364 14.0% math/big.nat.set
    7 0.3% 38.9% 346 13.3% math/big.(*Int).Mod
    110 4.2% 43.2% 326 12.6% math/big.nat.make
    13 0.5% 43.7% 309 11.9% math/big.(*Int).QuoRem
    8 0.3% 44.0% 297 11.5% math/big.nat.div
    65 2.5% 46.5% 289 11.1% math/big.nat.divLarge
    59 2.3% 48.7% 267 10.3% runtime.makeslice
    54 2.1% 50.8% 238 9.2% runtime.mallocgc
    8 0.3% 51.1% 207 8.0% makeslice1
    61 2.4% 53.5% 205 7.9% runtime.copy
    106 4.1% 57.6% 198 7.6% math/big.nat.shr
    22 0.8% 58.4% 196 7.6% math/big.(*Int).Neg
    84 3.2% 61.7% 163 6.3% math/big.nat.add
    152 5.9% 67.5% 152 5.9% math/big.nat.norm
    143 5.5% 73.0% 143 5.5% runtime.memmove
    125 4.8% 77.9% 125 4.8% math/big.nat.cmp
    64 2.5% 80.3% 120 4.6% math/big.nat.trailingZeroBits
    10 0.4% 80.7% 80 3.1% math/big.(*Int).Mul
    0 0.0% 80.7% 78 3.0% gc
    24 0.9% 81.6% 78 3.0% runtime.MCache_Alloc
    0 0.0% 81.6% 78 3.0% runtime.gc
    7 0.3% 81.9% 70 2.7% math/big.nat.mul
    66 2.5% 84.5% 66 2.5% math/big.trailingZeroBits
    64 2.5% 86.9% 64 2.5% math/big.subVV
    0 0.0% 86.9% 62 2.4% runtime.parfordo
    6 0.2% 87.2% 51 2.0% runtime.MCentral_AllocList
    4 0.2% 87.3% 50 1.9% runtime.new
    34 1.3% 88.6% 44 1.7% sweepspan
    5 0.2% 88.8% 42 1.6% MCentral_Grow
    34 1.3% 90.1% 34 1.3% runtime.markallocated
    33 1.3% 91.4% 33 1.3% math/big.shrVU
    27 1.0% 92.4% 30 1.2% scanblock
    14 0.5% 93.0% 28 1.1% math/big.basicMul
    22 0.8% 93.8% 22 0.8% math/big.divWW
    7 0.3% 94.1% 19 0.7% math/big.(*Int).Add
    0 0.0% 94.1% 18 0.7% markroot
    17 0.7% 94.8% 17 0.7% runtime.markspan
    0 0.0% 94.8% 16 0.6% runtime.MHeap_Alloc
    12 0.5% 95.2% 12 0.5% math/big.addVV

    (pprof) top50
    Total: 2593 samples
    276 10.6% 10.6% 606 23.4% math/big.(*Int).Sub
    240 9.3% 19.9% 384 14.8% math/big.nat.sub
    156 6.0% 25.9% 596 23.0% math/big.(*Int).Rsh
    152 5.9% 31.8% 152 5.9% math/big.nat.norm
    144 5.6% 37.3% 1902 73.4% math/big.(*Int).binaryGCD
    143 5.5% 42.8% 143 5.5% runtime.memmove
    125 4.8% 47.7% 125 4.8% math/big.nat.cmp
    110 4.2% 51.9% 326 12.6% math/big.nat.make
    106 4.1% 56.0% 198 7.6% math/big.nat.shr
    104 4.0% 60.0% 431 16.6% math/big.(*Int).Set
    84 3.2% 63.2% 163 6.3% math/big.nat.add
    79 3.0% 66.3% 364 14.0% math/big.nat.set
    66 2.5% 68.8% 66 2.5% math/big.trailingZeroBits
    65 2.5% 71.3% 289 11.1% math/big.nat.divLarge
    64 2.5% 73.8% 120 4.6% math/big.nat.trailingZeroBits
    64 2.5% 76.3% 64 2.5% math/big.subVV
    61 2.4% 78.6% 205 7.9% runtime.copy
    59 2.3% 80.9% 267 10.3% runtime.makeslice
    54 2.1% 83.0% 238 9.2% runtime.mallocgc
    34 1.3% 84.3% 34 1.3% runtime.markallocated
    34 1.3% 85.6% 44 1.7% sweepspan
    33 1.3% 86.9% 33 1.3% math/big.shrVU
    27 1.0% 87.9% 30 1.2% scanblock
    24 0.9% 88.9% 78 3.0% runtime.MCache_Alloc
    22 0.8% 89.7% 196 7.6% math/big.(*Int).Neg
    22 0.8% 90.6% 22 0.8% math/big.divWW
    17 0.7% 91.2% 17 0.7% runtime.markspan
    14 0.5% 91.7% 28 1.1% math/big.basicMul
    13 0.5% 92.2% 309 11.9% math/big.(*Int).QuoRem
    12 0.5% 92.7% 12 0.5% math/big.addVV
    12 0.5% 93.2% 12 0.5% math/big.nat.clear
    11 0.4% 93.6% 11 0.4% runtime.memclr
    11 0.4% 94.0% 11 0.4% runtime.settype_flush
    10 0.4% 94.4% 80 3.1% math/big.(*Int).Mul
    9 0.3% 94.8% 9 0.3% math/big.addVW
    8 0.3% 95.1% 207 8.0% makeslice1
    8 0.3% 95.4% 8 0.3% math/big.addMulVVW
    8 0.3% 95.7% 297 11.5% math/big.nat.div
    8 0.3% 96.0% 8 0.3% runtime.SizeToClass
    8 0.3% 96.3% 8 0.3% runtime.casp
    7 0.3% 96.6% 19 0.7% math/big.(*Int).Add
    7 0.3% 96.8% 346 13.3% math/big.(*Int).Mod
    7 0.3% 97.1% 70 2.7% math/big.nat.mul
    6 0.2% 97.3% 6 0.2% math/big.subVW
    6 0.2% 97.6% 51 2.0% runtime.MCentral_AllocList
    5 0.2% 97.8% 42 1.6% MCentral_Grow
    5 0.2% 98.0% 11 0.4% math/big.(*Int).Lsh
    4 0.2% 98.1% 4 0.2% math/big.mulWW
    4 0.2% 98.3% 50 1.9% runtime.new
    3 0.1% 98.4% 3 0.1% math/big.(*Int).Cmp

  • Uli Kunitz at May 8, 2013 at 11:01 pm
    In the sources I find only the Python and the Go code. Is the same
    algorithm used in Scala and Java? There are modifications to Pollard's
    Rho method that don't require computing the GCD on every iteration of the
    loop.

    Could you provide the Java or Scala source that you are using?
    On Wednesday, May 8, 2013 2:47:14 AM UTC+2, gri wrote:

    agl already gave good answers. I haven't looked at this in detail, but
    there's a few things to keep in mind:

    1) Go's math/big library is a native Go with some core assembly routines.
    It's written from scratch and while it has reasonable performance, it
    simply cannot compete with say the GMP library which has years of
    development and big integer arithmetic specialists working on it and adding
    the latest mathematic discoveries in fast multiplication and division.

    2) Using a.Op(a, b) instead of c.Op(a, b) should provide a significant
    performance improvement in many cases where one of the operands (a) can be
    re-used for the result.

    3) Most other languages have very fast bignum arithmetic, but they
    typically just call the GMP (or equivalent) C routines. One has to be
    careful when comparing languages this way: It's not the language, it the
    underlying library. I'm actually surprised the Python is not much faster
    than Go.

    4) I suspect the main culprit our div/mod implementation which is textbook
    (Knuth) and could use improvements.

    - gri


    On Tue, May 7, 2013 at 12:47 PM, <pca...@gmail.com <javascript:>> wrote:

    Hello Everyone,

    I've started to learn Go so I was interested in comparing Go performance
    against some other languages.
    I have a simple microbenchmark that I use to get a quick idea of relative
    performance.
    I was disappointed to see that the performance of Go was not as as good
    as I expected.

    My benchmark factors a couple of big integers using Pollard's Rho method:

    N5 = 766150476015982127183457373
    N6 = 62823675885202669644104829577

    Both are the products of just two primes.

    -------------------------------------------
    Go 1.1rc2 :

    C:\Projects\GoCode\src\rho>go test -v
    === RUN: ExampleN5
    --- PASS: ExampleN5 (10.053s)
    === RUN: ExampleN6
    --- PASS: ExampleN6 (15.348s)
    PASS
    ok rho 25.452s

    C:\Projects\GoCode\src\rho>go test -cpuprofile=rho.prof -bench="Both"
    -benchmem=true
    testing: warning: no tests to run
    PASS
    BenchmarkBoth 1 25958000000 ns/op 1105711392 B/op
    25343461 allocs/op
    ok rho 26.018s

    -------------------------------------------
    Python 3.3 :
    rho(N5) = 1178524040059
    9.34 s per run
    rho(N6) = 663410067979
    15.11 s per run

    ------------------------------------------
    Scala 2.10.1 :
    scala> TimeRho(N5)
    rho(766150476015982127183457373) = 1178524040059
    Time = 2112 ms
    scala> TimeRho(N6)
    rho(62823675885202669644104829577) = 663410067979
    Time = 3104 ms
    -------------------------------------------

    As you can see above, Go and Python have about the same speed, while
    Scala (and Java) are much faster.
    This surprised me because I had thought that because Go is statically
    typed it would have been faster than Python,
    and I expected its performance to be closer to Scala or Java.

    I've attached the Go files if anyone is interested.

    After some editing to make pprof work on Windows, I managed to collect a
    profile.
    It seems that there is a lot of time spent on memory allocation, but
    that's about all I can determine at this point,
    given my unfamiliarity with Go.
    The output from pprof is below.

    Any ideas on how to make big.Int faster in Go?

    -- Peter

    -------------------------------------------------------------
    C:\Projects\GoCode\src\rho>go tool pprof rho.test.exe rho.prof
    The system cannot find the path specified.
    Welcome to pprof! For help, type 'help'.
    (pprof) top50 --cum
    Total: 2593 samples
    0 0.0% 0.0% 2370 91.4% gosched0
    0 0.0% 0.0% 2370 91.4% rho.BenchmarkBoth
    3 0.1% 0.1% 2370 91.4% rho.Rho
    0 0.0% 0.1% 2370 91.4% testing.(*B).launch
    0 0.0% 0.1% 2370 91.4% testing.(*B).runN
    0 0.0% 0.1% 1906 73.5% math/big.(*Int).GCD
    144 5.6% 5.7% 1902 73.4% math/big.(*Int).binaryGCD
    276 10.6% 16.3% 606 23.4% math/big.(*Int).Sub
    156 6.0% 22.3% 596 23.0% math/big.(*Int).Rsh
    104 4.0% 26.3% 431 16.6% math/big.(*Int).Set
    240 9.3% 35.6% 384 14.8% math/big.nat.sub
    79 3.0% 38.6% 364 14.0% math/big.nat.set
    7 0.3% 38.9% 346 13.3% math/big.(*Int).Mod
    110 4.2% 43.2% 326 12.6% math/big.nat.make
    13 0.5% 43.7% 309 11.9% math/big.(*Int).QuoRem
    8 0.3% 44.0% 297 11.5% math/big.nat.div
    65 2.5% 46.5% 289 11.1% math/big.nat.divLarge
    59 2.3% 48.7% 267 10.3% runtime.makeslice
    54 2.1% 50.8% 238 9.2% runtime.mallocgc
    8 0.3% 51.1% 207 8.0% makeslice1
    61 2.4% 53.5% 205 7.9% runtime.copy
    106 4.1% 57.6% 198 7.6% math/big.nat.shr
    22 0.8% 58.4% 196 7.6% math/big.(*Int).Neg
    84 3.2% 61.7% 163 6.3% math/big.nat.add
    152 5.9% 67.5% 152 5.9% math/big.nat.norm
    143 5.5% 73.0% 143 5.5% runtime.memmove
    125 4.8% 77.9% 125 4.8% math/big.nat.cmp
    64 2.5% 80.3% 120 4.6% math/big.nat.trailingZeroBits
    10 0.4% 80.7% 80 3.1% math/big.(*Int).Mul
    0 0.0% 80.7% 78 3.0% gc
    24 0.9% 81.6% 78 3.0% runtime.MCache_Alloc
    0 0.0% 81.6% 78 3.0% runtime.gc
    7 0.3% 81.9% 70 2.7% math/big.nat.mul
    66 2.5% 84.5% 66 2.5% math/big.trailingZeroBits
    64 2.5% 86.9% 64 2.5% math/big.subVV
    0 0.0% 86.9% 62 2.4% runtime.parfordo
    6 0.2% 87.2% 51 2.0% runtime.MCentral_AllocList
    4 0.2% 87.3% 50 1.9% runtime.new
    34 1.3% 88.6% 44 1.7% sweepspan
    5 0.2% 88.8% 42 1.6% MCentral_Grow
    34 1.3% 90.1% 34 1.3% runtime.markallocated
    33 1.3% 91.4% 33 1.3% math/big.shrVU
    27 1.0% 92.4% 30 1.2% scanblock
    14 0.5% 93.0% 28 1.1% math/big.basicMul
    22 0.8% 93.8% 22 0.8% math/big.divWW
    7 0.3% 94.1% 19 0.7% math/big.(*Int).Add
    0 0.0% 94.1% 18 0.7% markroot
    17 0.7% 94.8% 17 0.7% runtime.markspan
    0 0.0% 94.8% 16 0.6% runtime.MHeap_Alloc
    12 0.5% 95.2% 12 0.5% math/big.addVV

    (pprof) top50
    Total: 2593 samples
    276 10.6% 10.6% 606 23.4% math/big.(*Int).Sub
    240 9.3% 19.9% 384 14.8% math/big.nat.sub
    156 6.0% 25.9% 596 23.0% math/big.(*Int).Rsh
    152 5.9% 31.8% 152 5.9% math/big.nat.norm
    144 5.6% 37.3% 1902 73.4% math/big.(*Int).binaryGCD
    143 5.5% 42.8% 143 5.5% runtime.memmove
    125 4.8% 47.7% 125 4.8% math/big.nat.cmp
    110 4.2% 51.9% 326 12.6% math/big.nat.make
    106 4.1% 56.0% 198 7.6% math/big.nat.shr
    104 4.0% 60.0% 431 16.6% math/big.(*Int).Set
    84 3.2% 63.2% 163 6.3% math/big.nat.add
    79 3.0% 66.3% 364 14.0% math/big.nat.set
    66 2.5% 68.8% 66 2.5% math/big.trailingZeroBits
    65 2.5% 71.3% 289 11.1% math/big.nat.divLarge
    64 2.5% 73.8% 120 4.6% math/big.nat.trailingZeroBits
    64 2.5% 76.3% 64 2.5% math/big.subVV
    61 2.4% 78.6% 205 7.9% runtime.copy
    59 2.3% 80.9% 267 10.3% runtime.makeslice
    54 2.1% 83.0% 238 9.2% runtime.mallocgc
    34 1.3% 84.3% 34 1.3% runtime.markallocated
    34 1.3% 85.6% 44 1.7% sweepspan
    33 1.3% 86.9% 33 1.3% math/big.shrVU
    27 1.0% 87.9% 30 1.2% scanblock
    24 0.9% 88.9% 78 3.0% runtime.MCache_Alloc
    22 0.8% 89.7% 196 7.6% math/big.(*Int).Neg
    22 0.8% 90.6% 22 0.8% math/big.divWW
    17 0.7% 91.2% 17 0.7% runtime.markspan
    14 0.5% 91.7% 28 1.1% math/big.basicMul
    13 0.5% 92.2% 309 11.9% math/big.(*Int).QuoRem
    12 0.5% 92.7% 12 0.5% math/big.addVV
    12 0.5% 93.2% 12 0.5% math/big.nat.clear
    11 0.4% 93.6% 11 0.4% runtime.memclr
    11 0.4% 94.0% 11 0.4% runtime.settype_flush
    10 0.4% 94.4% 80 3.1% math/big.(*Int).Mul
    9 0.3% 94.8% 9 0.3% math/big.addVW
    8 0.3% 95.1% 207 8.0% makeslice1
    8 0.3% 95.4% 8 0.3% math/big.addMulVVW
    8 0.3% 95.7% 297 11.5% math/big.nat.div
    8 0.3% 96.0% 8 0.3% runtime.SizeToClass
    8 0.3% 96.3% 8 0.3% runtime.casp
    7 0.3% 96.6% 19 0.7% math/big.(*Int).Add
    7 0.3% 96.8% 346 13.3% math/big.(*Int).Mod
    7 0.3% 97.1% 70 2.7% math/big.nat.mul
    6 0.2% 97.3% 6 0.2% math/big.subVW
    6 0.2% 97.6% 51 2.0% runtime.MCentral_AllocList
    5 0.2% 97.8% 42 1.6% MCentral_Grow
    5 0.2% 98.0% 11 0.4% math/big.(*Int).Lsh
    4 0.2% 98.1% 4 0.2% math/big.mulWW
    4 0.2% 98.3% 50 1.9% runtime.new
    3 0.1% 98.4% 3 0.1% math/big.(*Int).Cmp

    --
    You received this message because you are subscribed to the Google Groups
    "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an
    email to golang-nuts...@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.

  • Peter Caven at May 9, 2013 at 12:53 am
    Sure, it's essentially just a literal translation of the Python code, and
    makes use of the Java BigInteger gcd():


    val N5 = BigInt("766150476015982127183457373")
    val N6 = BigInt("62823675885202669644104829577")

    // Pollard's Rho with f(x) = x^2 + 1, Floyd cycle detection:
    // a advances one step per cycle, b advances two.
    def rho(n: BigInt): BigInt = {
         var a = BigInt(2)
         var b = BigInt(2)
         var d = BigInt(1)

         while (d == 1) {
             a = ((a * a) + 1) % n
             b = ((b * b) + 1) % n
             b = ((b * b) + 1) % n

             d = n.gcd(a - b)   // delegates to java.math.BigInteger.gcd
         }
         d
    }

    def TimeRho(n: BigInt): Unit = {
         val t1 = System.currentTimeMillis()
         val r = rho(n)
         val t2 = System.currentTimeMillis()
         println("rho(" + n + ") = " + r + " \nTime = " + (t2 - t1) + " ms")
    }

    Enjoy,
    -- Peter


    On Wednesday, May 8, 2013 7:00:44 PM UTC-4, Uli Kunitz wrote:

    In the sources I find only the Python and the Go code. Is the same
    algorithm used in Scala and Java? There are modifications to Pollard's
    Rho method that don't require computing the GCD on every iteration of
    the loop.

    Could you provide the Java or Scala source that you are using?
    On Wednesday, May 8, 2013 2:47:14 AM UTC+2, gri wrote:

    agl already gave good answers. I haven't looked at this in detail, but
    there are a few things to keep in mind:

    1) Go's math/big library is native Go with some core assembly routines.
    It's written from scratch and, while it has reasonable performance, it
    simply cannot compete with, say, the GMP library, which has had years of
    development by big-integer arithmetic specialists adding the latest
    mathematical discoveries in fast multiplication and division.

    2) Using a.Op(a, b) instead of c.Op(a, b) should provide a significant
    performance improvement in many cases where one of the operands (a) can be
    re-used for the result.

    3) Most other languages have very fast bignum arithmetic, but they
    typically just call the GMP (or equivalent) C routines. One has to be
    careful when comparing languages this way: it's not the language, it's
    the underlying library. I'm actually surprised that Python is not much
    faster than Go.

    4) I suspect the main culprit is our div/mod implementation, which is
    textbook (Knuth) and could use improvements.

    - gri



Discussion Overview
group: golang-nuts
categories: go
posted: May 7, '13 at 10:53p
active: May 10, '13 at 2:02a
posts: 15
users: 7
website: golang.org
