FAQ
There appears to be assembly code for math.Sin under 386 but not for amd64
or ARM. Instead the package uses Go code for those architectures.

I couldn't find anything in the change history about this. Any idea why? Is
it just because nobody has written the code?

Andrew

--

---
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


  • Rob Pike at Feb 14, 2014 at 12:36 am
    Be very careful. This is a minefield due to the various methods used
    for argument reduction.

    -rob

  • Andrew Gerrand at Feb 14, 2014 at 12:42 am
    I'm more curious than anything. I don't have the expertise to write it
    myself.

    I'm interested because I am working on a program that uses math.Sin in an
    inner loop. I will probably switch to a lookup table anyway, as that will
    be faster than actually computing the values.

    Andrew

  • Robert Griesemer at Feb 14, 2014 at 12:36 am
    386 has FSIN and FCOS instructions, which are tempting to use and around
    which the range reduction code is built.

    I think for amd64 and ARM simply nobody has put in the work required. It's
    subtle code.

    - gri

  • Minux at Feb 14, 2014 at 12:38 am

    The x87 has a dedicated sin instruction, while SSE2 doesn't have one, and
    neither does ARM.

    Of course, you can say this is because nobody has written it.

  • Russ Cox at Feb 14, 2014 at 12:45 am

    The floating point instructions we use on amd64 (SSE2) do not have a sin
    instruction. We could drop down to the 387 instruction in this case, but we
    have not. As Rob points out, there are many subtle issues surrounding the
    answer for large numbers, and whether you prefer your answer quick or
    correct. golang.org/issue/6794 has some interesting details. In particular,
    Rob tried using the 387 instruction on amd64 and found that it was slower
    than our Go code. The various decisions are worth looking at again, but
    it's subtle and requires careful thought, and it won't happen before Go 1.3.

    On ARM there is no choice but to use software versions: the FPU does not
    have support for any transcendental functions.

    In general the transcendentals are too hard to do correctly in hardware,
    and modern chips have given up trying. Square root is where most stop.

    Not using math.Sin in an inner loop is always good advice, no matter how it
    is implemented.

    Russ

  • Dave Cheney at Feb 14, 2014 at 12:49 am
    As a curious observer I cannot offer any advice here, but I did enjoy
    reading
    http://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/ and
    possibly there is something of use for this discussion.

  • Russ Cox at Feb 14, 2014 at 12:56 am

    Yes but there are eighteen quintillion doubles. You cannot test them all.

    Russ

  • Michael Jones at Feb 14, 2014 at 1:04 am
    ...unless you are patient.

    *"Everything's in walking distance, if you have the time"*
         --Steven Wright


    --
    *Michael T. Jones | Chief Technology Advocate | mtj@google.com
    <mtj@google.com> | +1 650-335-5765*

  • Charlie Dorian at Feb 14, 2014 at 5:01 am
    I tested the SLEEF sincos assembly version (Taylor series), but it was
    slower than a straightforward Go version of the Cephes library code
    (polynomial ratio). Then, for speed, I split out just the sin calculation
    for math.Sin. That's why there's no amd64-specific math.Sin.

    (I'd like to try an assembly version of the Cephes sincos that computes the
    polynomials in parallel, but the float64-coefficient pairs need to be
    128-bit aligned and I don't know how to specify that in Go assembly. Some
    testing I did shows that would save about 15% in the section calculating
    the ratio of the polynomials.)

    Issue 6794 brings up the problem of large arguments to trig functions. In a
    2009 go-nuts discussion of this issue, I pointed out:

    "Functions like sin(x) have cyclic arguments (i.e., sin(x + N*2*Pi) ==
    sin(x)), but large values of x present a problem. For special cases, the
    result is defined by IEEE 754; sin(Inf or NaN) is NaN. However, if x is
    large but not Inf, I could find no specific IEEE 754 behavior. The problem
    is that, when x is large, precision is lost (e.g., ((1e16 + 1) * Pi) mod Pi
    != Pi). Right now, math.Sin(1e15+1) returns 0, while the 386 FPU returns
    0.03188784252097787 (bc returns 0.03188912820453651133). The Sun library
    code reduces the argument (
    http://code.swtch.com/vx32/src/tip/src/libvxc/msun/s_sin.c#cl-67) and then
    calls sin(x). The 386 FPU delays the problem a little while (its valid
    range is -2**63 < x < 2**63), but when x is too large it just returns x and
    sets a status flag (currently ignored) to let you know an error occurred.
    There are 386 FPU routines to reduce the argument, but they don't produce
    better results than just using the FPU sine function."

    Then I asked, "What should Go users expect in this case?"

    Russ replied (and I feel it still holds), "I don't think it matters much.
    All answers are equally bad. What the Sun library does is good enough, as
    is what the current Go library does."

  • Keith Randall at Feb 14, 2014 at 5:14 am

    If you mean the alignment of DATA, any symbol 16 bytes or larger gets
    aligned to 16 bytes.

  • Robert Griesemer at Feb 14, 2014 at 5:21 am
    We should not get lost in too much assembly code. It's a pain to maintain,
    it's not portable, and it's error prone.

    The main reason for the assembly versions of sin/cos/sincos on 386 is the
    availability of the respective CPU instructions, which are otherwise not
    accessible.

    Lacking those, or other equally useful (SSE?) instructions, we should not
    use assembly. If using assembly would result in code that's a lot faster,
    then we should instead focus on making the Go code better, and/or the
    compiler produce better code.

    Applications that need utmost performance for trigonometric functions may
    be better off using different algorithms and/or caching/lookup tables.

    - gri

    PS: To a large extent this is also true for the core routines of math/big.
    Unfortunately, the code generated by the compilers for the Go versions of
    these routines is a lot slower (factors) than hand-written assembly code,
    mostly because we need to be able to access the carry bits and the
    compilers are not smart enough to recognize the corresponding Go patterns,
    and they don't do a good job eliminating unnecessary slice bounds checks.
    Maybe at some point they will.
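
For illustration, the carry-propagating word addition mentioned above can be written in portable Go with `bits.Add64` from `math/bits`, a function that was added to the standard library after this discussion. The sketch below is an assumption-laden example, not code from math/big: `addWords` is a hypothetical helper, and whether a given compiler turns the loop into an add-with-carry chain depends on the toolchain version.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addWords sets z += a for little-endian word slices and returns the final
// carry (0 or 1). It assumes len(a) >= len(z) and panics otherwise.
func addWords(z, a []uint64) uint64 {
	var carry uint64
	for i := range z {
		// Add64 returns the low 64 bits of the sum and the carry out.
		z[i], carry = bits.Add64(z[i], a[i], carry)
	}
	return carry
}

func main() {
	z := []uint64{^uint64(0), 0} // low word is all ones
	a := []uint64{1, 0}
	carry := addWords(z, a)
	fmt.Println(z, carry)
}
```

Adding 1 to a low word of all ones wraps it to zero and carries into the next word, which is the pattern the hand-written ADDQ/ADCQ sequences implement directly.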

  • Michael Jones at Feb 14, 2014 at 5:31 am
    As a reference point, assembler has been faster than compilers since 1959.
    This could be the big year, but I'd not bet on it.

    I have long-running math programs that use custom-generated assembly
    functions for fixed-length, high-precision math of varying lengths. They
    are >3x the speed of Big and that's much faster than the Go-level versions
    of the like functions.

    Nothing wrong with programming for speed. 3x saves me two months of runtime
    for each of a number of runs. Easy, thanks to Awk. ;-)

    mtj-macbookpro:integer mtj$ cat genAMD64.awk
    function columns(a, b) {
      printf("\t%-20s%s\n", a, b)
    }
    BEGIN {
      printf("// +build fixed\n")
      printf("\n")
      if (WORDS < 1) {
        printf("// words must be specified with \"-v WORDS=n\"\n")
        exit 1
      }
      words = WORDS
      printf("// FixedAdd: z += a, z and a are %d-word (%d bit) fixed precision integers\n", words, 64*words)
      printf("// func FixedAdd(z, a *Integer)\n")
      printf("TEXT `·(*Integer).Add`(SB),$0-16\n")
      columns("MOVQ a+8(FP), CX", "// make a[] accessible")
      columns("MOVQ z+0(FP), AX", "// make z[] accessible")
      # register/register style
      #columns("MOVQ (CX), BX", "// load a[0]")
      #columns("MOVQ (AX), R8", "// load z[0]")
      #columns("ADDQ BX, R8", "// sum = a[0] + z[0], set carry")
      #columns("MOVQ R8, (AX)", "// store sum in z[0]")
      # memory/register style
      columns("MOVQ (CX), BX", "// load a[0]")
      columns("ADDQ (AX), BX", "// sum = a[0] + z[0], set carry")
      columns("MOVQ BX, (AX)", "// store sum in z[0]")
      for (word=1; word < WORDS; word++) {
        offset = 8*word
        #columns(sprintf("MOVQ %d(CX), BX", offset), sprintf("// load a[%d]", word))
        #columns(sprintf("MOVQ %d(AX), R8", offset), sprintf("// load z[%d]", word))
        #columns("ADCQ BX, R8", sprintf("// sum = a[%d] + z[%d] + carry", word, word))
        #columns(sprintf("MOVQ R8, %d(AX)", offset), sprintf("// store sum in z[%d]", word))
        columns(sprintf("MOVQ %d(CX), BX", offset), sprintf("// load a[%d]", word))
        columns(sprintf("ADCQ %d(AX), BX", offset), sprintf("// sum = a[%d] + z[%d] + carry", word, word))
        columns(sprintf("MOVQ BX, %d(AX)", offset), sprintf("// store sum in z[%d]", word))
      }
      columns("RET", "")
      printf("\n")
      printf("// FixedInc: z++, z is a %d-word (%d bit) fixed precision integer\n", words, 64*words)
      printf("// func FixedInc(z *Integer)\n")
      printf("TEXT `·(*Integer).Inc`(SB),$0-8\n")

      columns("MOVQ z+0(FP),AX", "// make z[] accessible")
      columns("MOVQ (AX),BX", "// load z[0]")
      columns("ADDQ $1,BX", "// increment")
      columns("MOVQ BX,(AX)", "// store sum in z[0]")
      for (word=1; word < WORDS; word++) {
        offset = 8*word
        columns(sprintf("MOVQ %d(AX),BX", offset), sprintf("// load z[%d]", word))
        columns("ADCQ $0,BX","// add carry")
        columns(sprintf("MOVQ BX,%d(AX)", offset), sprintf("// store sum in z[%d]", word))
      }
      columns("RET", "")
    }


Discussion Overview
group: golang-dev
categories: go
posted: Feb 14, '14 at 12:28a
active: Feb 14, '14 at 5:31a
posts: 13
users: 9
website: golang.org
