Hello all

We have an app on linux/amd64 that calls time.Now quite frequently.

We're interested in the UnixNano value, so I did some benchmarks that use
the vDSO work from a while back:

http://code.google.com/p/go/source/detail?r=56ea40aac72b

I think it might be of interest:

package time_test

import (
	"syscall"
	"testing"
	"time"
)

// now builds a time.Time directly from gettimeofday(2), which goes
// through the vDSO on recent kernels. The error return is ignored for
// brevity.
func now() time.Time {
	var tv syscall.Timeval
	syscall.Gettimeofday(&tv)
	return time.Unix(0, syscall.TimevalToNsec(tv))
}

func BenchmarkTimeNow(b *testing.B) {
	for i := 0; i < b.N; i++ {
		time.Now()
	}
}

func BenchmarkNowGettimeofday(b *testing.B) {
	for i := 0; i < b.N; i++ {
		now()
	}
}
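
To run both benchmarks, save this as a _test.go file in its own package
directory and run: go test -bench=.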

X5675 running 3.3.8-1.fc16.x86_64:

BenchmarkTimeNow 1000000 1030 ns/op
BenchmarkNowGettimeofday 2000000 767 ns/op

E5-2670 running 3.4.7-1.fc16.x86_64:

BenchmarkTimeNow 1000000 1124 ns/op
BenchmarkNowGettimeofday 2000000 759 ns/op

i7-3720QM running 3.6.3-1.fc17.x86_64:

BenchmarkTimeNow 5000000 422 ns/op
BenchmarkNowGettimeofday 20000000 85.7 ns/op

syscall.Time is also a winner if you only need second precision.
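
For reference, a minimal sketch of that second-precision path in the same
style as the benchmarks above (the helper and benchmark names here are
illustrative, not part of any package):

func nowSeconds() time.Time {
	var tt syscall.Time_t
	// time(2): whole seconds since the Unix epoch; error ignored for brevity.
	syscall.Time(&tt)
	return time.Unix(int64(tt), 0)
}

func BenchmarkNowTime(b *testing.B) {
	for i := 0; i < b.N; i++ {
		nowSeconds()
	}
}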

Go version here is 96fde1b15506. Not quite ready for 64-bit ints yet.

The E5-2670 numbers seem a bit off. I'll update to 3.6 soon and retest. I'm
guessing it's a kernel thing since I can't imagine the i7 being that much
faster, but maybe it is...

Maybe there's some scope here for tweaking the time.Now implementation on
linux/amd64?

Regards

Albert


  • Brad Fitzpatrick at Nov 7, 2012 at 12:38 pm
    Both methods are just microsecond resolution anyway, right? So I don't see
    why changing time·now in pkg/runtime/sys_linux_amd64.s would be
    objectionable, unless the case of the VDSO being unavailable (I don't think
    it ever is, on Linux versions we support?) is much slower.

    File a bug at least.

  • Anthony Martin at Nov 7, 2012 at 1:24 pm
    This is a one-line change:

    diff -r c0761b6a5160 src/pkg/runtime/sys_linux_amd64.s
    --- a/src/pkg/runtime/sys_linux_amd64.s Fri Nov 02 20:46:47 2012 +1100
    +++ b/src/pkg/runtime/sys_linux_amd64.s Wed Nov 07 05:05:43 2012 -0800
    @@ -104,7 +104,7 @@
     TEXT time·now(SB), 7, $32
     	LEAQ	8(SP), DI
     	MOVQ	$0, SI
    -	MOVQ	$0xffffffffff600000, AX
    +	MOVQ	runtime·__vdso_gettimeofday_sym(SB), AX
     	CALL	AX
     	MOVQ	8(SP), AX	// sec
     	MOVL	16(SP), DX	// usec

    Intel Core 2 Duo (2.16 GHz) running 3.6.5-1-ARCH

    benchmark old ns/op new ns/op delta
    BenchmarkTimeNow 1862 1031 -44.63%
    BenchmarkNowGettimeofday 1153 1162 +0.78%

    Cheers,
    Anthony
  • Russ Cox at Nov 7, 2012 at 2:12 pm
    If someone wants to puzzle through the format of the time data just
    lying there in user memory, it might even be possible to get
    nanosecond precision, like we have on OS X.

    Russ
  • Minux at Nov 7, 2012 at 7:03 pm

    On Wed, Nov 7, 2012 at 10:12 PM, Russ Cox wrote:
    > If someone wants to puzzle through the format of the time data just
    > lying there in user memory, it might even be possible to get
    > nanosecond precision, like we have on OS X.

    I believe the newer vDSO provides a purely user-space gettimeofday
    using RDTSC (just like what Darwin did).

    This could explain the huge performance difference between time(2)
    and gettimeofday(2) in the OP's tests.
  • Russ Cox at Nov 7, 2012 at 7:12 pm

    >> If someone wants to puzzle through the format of the time data just
    >> lying there in user memory, it might even be possible to get
    >> nanosecond precision, like we have on OS X.
    >
    > I believe the newer vDSO provides a purely user-space gettimeofday
    > using RDTSC (just like what Darwin did).
    >
    > This could explain the huge performance difference between time(2)
    > and gettimeofday(2) in the OP's tests.

    I agree, but after doing all that work it returns microseconds.
    On Darwin, because we were forced to do the work ourselves, we were
    able to get nanoseconds.

    Russ
  • Minux at Nov 7, 2012 at 7:28 pm

    On Thu, Nov 8, 2012 at 3:11 AM, Russ Cox wrote:
    >>> If someone wants to puzzle through the format of the time data
    >>> just lying there in user memory, it might even be possible to get
    >>> nanosecond precision, like we have on OS X.
    >>
    >> I believe the newer vDSO provides a purely user-space gettimeofday
    >> using RDTSC (just like what Darwin did).
    >>
    >> This could explain the huge performance difference between time(2)
    >> and gettimeofday(2) in the OP's tests.
    >
    > I agree, but after doing all that work it returns microseconds.
    > On Darwin, because we were forced to do the work ourselves, we were
    > able to get nanoseconds.

    Digging through the kernel source code, it seems we can use the vDSO
    version of clock_gettime to get nanosecond precision and at the same
    time benefit from the fast user-space implementation.

    The reasons I don't recommend we implement the calculation code
    ourselves are:
    1. I don't believe the data structure is exported, so it might change.
    2. The kernel has two clock sources: one is RDTSC, the other is the
    HPET (High Precision Event Timer), and I don't think we want to touch
    the HPET thing.

    The relevant section of the vDSO source code:
    http://lxr.linux.no/linux+v3.6.6/arch/x86/vdso/vclock_gettime.c
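
    A minimal sketch of what that call returns, using the raw syscall
    from Go's syscall package (note this sketch traps into the kernel;
    the proposed runtime change would call the __vdso_clock_gettime
    symbol instead to stay in user space, but the Timespec it fills in
    is the same):

    package main

    import (
    	"fmt"
    	"syscall"
    	"unsafe"
    )

    const CLOCK_REALTIME = 0 // clock id from <linux/time.h>

    func main() {
    	var ts syscall.Timespec
    	// clock_gettime(2) fills ts with seconds and nanoseconds.
    	_, _, errno := syscall.Syscall(syscall.SYS_CLOCK_GETTIME,
    		CLOCK_REALTIME, uintptr(unsafe.Pointer(&ts)), 0)
    	if errno != 0 {
    		panic(errno)
    	}
    	fmt.Println(ts.Sec, ts.Nsec)
    }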
  • Anthony Martin at Nov 8, 2012 at 1:30 am

    minux once said:
    > The reasons I don't recommend we implement the calculation code
    > ourselves are:
    > 1. I don't believe the data structure is exported, so it might
    > change.
    > 2. The kernel has two clock sources: one is RDTSC, the other is the
    > HPET (High Precision Event Timer), and I don't think we want to
    > touch the HPET thing.

    I agree with both points here. And, indeed, the kernel's
    clock data structure has changed since 2.6.32. Originally,
    it had a function pointer that you would call to get the
    cycle counter value. As of 3.6.6, however, it contains a
    mode flag describing which clock source to read from.

    Anthony
  • Russ Cox at Nov 8, 2012 at 1:32 am
    I have not caught up with today's deluge of mail yet, but if someone
    sends that one-line change to use the vDSO implementation, LGTM.
  • Dave Cheney at Nov 7, 2012 at 9:33 pm
    If the new code is using RDTSC, then it will be highly
    architecture-specific, as it took Intel several goes to get the TSC
    working properly. From my notes, that implies Nehalem-based cores
    only.

    On 08/11/2012, at 6:02, minux wrote:
    > On Wed, Nov 7, 2012 at 10:12 PM, Russ Cox wrote:
    >> If someone wants to puzzle through the format of the time data just
    >> lying there in user memory, it might even be possible to get
    >> nanosecond precision, like we have on OS X.
    >
    > I believe the newer vDSO provides a purely user-space gettimeofday
    > using RDTSC (just like what Darwin did).
    >
    > This could explain the huge performance difference between time(2)
    > and gettimeofday(2) in the OP's tests.
  • Anthony Martin at Nov 8, 2012 at 2:15 am

    Dave Cheney once said:
    > If the new code is using RDTSC, then it will be highly
    > architecture-specific, as it took Intel several goes to get the TSC
    > working properly. From my notes, that implies Nehalem-based cores
    > only.

    The user-space implementation on Linux will fall back
    to the gettimeofday system call if the TSC is unstable
    and the HPET timers are not supported.

    Anthony
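
    One quick way to see which clock source a given machine is using, and
    hence whether the fast RDTSC path is even in play, is the sysfs file
    the kernel exports; a small illustrative sketch:

    package main

    import (
    	"fmt"
    	"io/ioutil"
    	"strings"
    )

    func main() {
    	// "tsc" lets the vDSO stay in user space via RDTSC; other
    	// sources (hpet, acpi_pm, ...) may be slower or force the
    	// syscall fallback.
    	b, err := ioutil.ReadFile("/sys/devices/system/clocksource/" +
    		"clocksource0/current_clocksource")
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(strings.TrimSpace(string(b)))
    }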
  • Minux at Nov 8, 2012 at 10:47 am

    On Thu, Nov 8, 2012 at 10:15 AM, Anthony Martin wrote:
    > Dave Cheney <dave@cheney.net> once said:
    >> If the new code is using RDTSC, then it will be highly
    >> architecture-specific, as it took Intel several goes to get the TSC
    >> working properly. From my notes, that implies Nehalem-based cores
    >> only.
    >
    > The user-space implementation on Linux will fall back
    > to the gettimeofday system call if the TSC is unstable
    > and the HPET timers are not supported.

    I created https://codereview.appspot.com/6814103/ to use vDSO
    clock_gettime on linux/amd64. Besides a 3ns performance improvement
    (27.4ns -> 24.4ns), time.Now() now gets real nanosecond precision on
    linux/amd64.
  • Anthony Martin at Nov 8, 2012 at 12:10 pm

    minux once said:
    > I created https://codereview.appspot.com/6814103/ to use vDSO
    > clock_gettime on linux/amd64. Besides a 3ns performance improvement
    > (27.4ns -> 24.4ns), time.Now() now gets real nanosecond precision
    > on linux/amd64.

    You mean "resolution", correct? None of these timer
    implementations will give you nanosecond precision
    in user space.

    Anthony
  • Minux at Nov 8, 2012 at 12:30 pm

    On Thu, Nov 8, 2012 at 8:10 PM, Anthony Martin wrote:
    > minux <minux.ma@gmail.com> once said:
    >> I created https://codereview.appspot.com/6814103/ to use vDSO
    >> clock_gettime on linux/amd64. Besides a 3ns performance improvement
    >> (27.4ns -> 24.4ns), time.Now() now gets real nanosecond precision
    >> on linux/amd64.
    >
    > You mean "resolution", correct? None of these timer
    > implementations will give you nanosecond precision
    > in user space.

    Right. Sorry for the confusion. It is nanosecond resolution.
  • Maxim Khitrov at Nov 8, 2012 at 12:38 pm

    On Wed, Nov 7, 2012 at 9:15 PM, Anthony Martin wrote:
    > Dave Cheney <dave@cheney.net> once said:
    >> If the new code is using RDTSC, then it will be highly
    >> architecture-specific, as it took Intel several goes to get the TSC
    >> working properly. From my notes, that implies Nehalem-based cores
    >> only.
    >
    > The user-space implementation on Linux will fall back
    > to the gettimeofday system call if the TSC is unstable
    > and the HPET timers are not supported.

    How is the current TSC value converted to Unix time? Does the
    implementation store some base TSC reading with a corresponding
    gettimeofday value, or is there some other mechanism involved? I'm
    just wondering if something similar could be used on Windows with the
    QueryPerformanceCounter function in order to get better resolution.
  • Minux at Nov 8, 2012 at 1:00 pm

    On Thu, Nov 8, 2012 at 8:38 PM, Maxim Khitrov wrote:
    > On Wed, Nov 7, 2012 at 9:15 PM, Anthony Martin wrote:
    >> Dave Cheney <dave@cheney.net> once said:
    >>> If the new code is using RDTSC, then it will be highly
    >>> architecture-specific, as it took Intel several goes to get the
    >>> TSC working properly. From my notes, that implies Nehalem-based
    >>> cores only.
    >>
    >> The user-space implementation on Linux will fall back
    >> to the gettimeofday system call if the TSC is unstable
    >> and the HPET timers are not supported.
    >
    > How is the current TSC value converted to Unix time? Does the
    > implementation store some base TSC reading with a corresponding
    > gettimeofday value, or is there some other mechanism involved? I'm
    > just wondering if something similar could be used on Windows with
    > the QueryPerformanceCounter function in order to get better
    > resolution.

    The Linux kernel user-context syscall implementation is here:
    http://lxr.linux.no/linux+v3.6.6/arch/x86/vdso/vclock_gettime.c
    and the code that updates its data structure is here:
    http://lxr.linux.no/linux+v3.6.6/arch/x86/kernel/vsyscall_64.c#L57

    Also, Darwin provides similar functionality via the commpage on x86.
    The Go user-space implementation is here:
    http://tip.golang.org/src/pkg/runtime/sys_darwin_amd64.s#L68
    and the xnu kernel-side code is here:
    http://code.metager.de/source/xref/apple/xnu/osfmk/i386/commpage/commpage.c#437

    I think to use something like this, kernel involvement is necessary.
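
    In outline, the mechanism those links implement is exactly the
    base-reading scheme Maxim guessed at: the kernel periodically stores
    a wall-clock value together with the TSC reading captured at the same
    instant, plus a precomputed mult/shift scale, and readers turn the
    TSC delta into nanoseconds. A hedged Go sketch of that arithmetic
    (the struct and field names are made up; the real structure is the
    kernel's vsyscall_gtod_data and is not a stable ABI):

    type gtodData struct {
    	baseSec   int64  // wall-clock seconds at the last kernel update
    	baseNsec  int64  // wall-clock nanoseconds at that same update
    	baseCycle uint64 // TSC value captured at that same update
    	mult      uint32 // scale: ns = (cycles * mult) >> shift
    	shift     uint32 // precomputed from the clock source frequency
    }

    // nowFromTSC converts a raw TSC reading into wall-clock time.
    func nowFromTSC(d *gtodData, tsc uint64) (sec, nsec int64) {
    	delta := tsc - d.baseCycle
    	nsec = d.baseNsec + int64((delta*uint64(d.mult))>>d.shift)
    	sec = d.baseSec + nsec/1e9
    	nsec %= 1e9
    	return sec, nsec
    }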
