Hi All,


I wrote a quick Go program to parse two mildly heavy TSV files (18K
lines) and compare them.

1. Parse two TSV files.
2. For each file, create a dictionary whose key is the unique product code
and whose value is the record associated with it.
3. Compare the two dicts to count the entries missing from one relative to
the other.

Go consistently returns results in 725ms to 912ms.
Python consistently returns results in 320ms to 350ms.

Go 1.3.1
Python 2.7
Mac OS X (latest version)

Program and Data Bundle >>
https://drive.google.com/file/d/0BxXPRbMpGazQa0lETWdoaEVaWUk/view?usp=sharing

The logically equivalent Python program runs faster than my Go program. I am
surprised by Go's performance on this. Any progressive help would
be

Attached Python & Go programs.

Thanks in advance
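
Step 3 above can be sketched as follows. This is only an illustration of comparing two maps keyed by product code; the attached program is not reproduced here, and the function name missingKeys and the sample records are hypothetical.

```go
package main

import "fmt"

// missingKeys returns how many keys of a are absent from b.
// (Hypothetical helper, not taken from the attached program.)
func missingKeys(a, b map[string][]string) int {
	missing := 0
	for key := range a {
		if _, ok := b[key]; !ok {
			missing++
		}
	}
	return missing
}

func main() {
	// Made-up records keyed by product code.
	first := map[string][]string{
		"SKU1": {"SKU1", "widget"},
		"SKU2": {"SKU2", "gadget"},
	}
	second := map[string][]string{
		"SKU2": {"SKU2", "gadget"},
		"SKU3": {"SKU3", "doohickey"},
	}
	fmt.Println("in first, missing from second:", missingKeys(first, second))
	fmt.Println("in second, missing from first:", missingKeys(second, first))
}
```

Map lookups are constant time on average, so this step is cheap next to parsing.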




--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Harry B at Oct 10, 2014 at 12:07 am
    encoding/csv is slow compared to Python's csv module, as far as I know,
    and a CSV parser may be overkill for parsing TSV.

    If you replace the csv reader with this, your code should be about 4x faster:

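            // csvFile is the opened input file; sku_to_record is the map being built.
            reader := bufio.NewScanner(csvFile)
            for reader.Scan() {
                    // Split each line on tabs; record[3] is used as the map key.
                    record := strings.Split(reader.Text(), "\t")
                    sku_to_record[record[3]] = record
            }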


  • José Colón at Oct 10, 2014 at 2:03 am
    Hi. Don't know if this is right for your requirements, or if it's the best
    way to do it, but attached code does it in less than 40ms on my machine.

  • Justin Israel at Oct 10, 2014 at 4:12 am

    I don't think this example is correct. It shares file handles between 3
    different concurrent goroutines. Wouldn't that mean they are not all
    reading from the start, and are trampling each other's offsets?


  • José Colón at Oct 10, 2014 at 4:37 pm
    I think you're right about the file handles, Justin. I changed it so that
    each goroutine opens its own files, and now the total time is around 400ms.
    Thanks for the heads up!
  • Tamás Gulácsi at Oct 10, 2014 at 4:45 am
    Python buffers files, so use buffered input in go, too! bufio to the rescue!!!

  • Carlos Castillo at Oct 10, 2014 at 11:50 am

    Actually, encoding/csv in Go uses a bufio.Reader internally when you call
    csv.NewReader, so both versions are using buffered input (provided Python
    does ;-).

    I suspect that there are two main differences between Python 2.7's csv and
    Go's csv causing some of the discrepancy:

        1. Python's csv (in Python 2.7) doesn't support Unicode, so unlike the Go
        version, it doesn't have to convert the input bytes into runes.
        2. Python's csv is written in C, while Go's csv is written in Go.

    The first difference is easy to test, either by using the example code at
    the bottom of https://docs.python.org/2/library/csv.html to create a csv
    parser that converts input data to UTF-8, or by running the code using a
    Python 3 interpreter (you'll need to use 2to3 first!). In both cases the
    Python time is brought closer to the Go time on my machine (the py3 version
    being faster than the py2 hack). Unfortunately, going the other way, i.e.
    changing Go's csv to not directly support Unicode (esp. for the purposes of
    speed), would require a rewrite of the library.

    There are some ways the provided Go code can be improved:

        1. Since you're storing the entire contents of every record in the hash,
        you can read all the data at once with the csv.Reader.ReadAll() method.
        That should simplify the generation of the hash, as well as allow you to
        pre-allocate its size.
        2. As suggested by Harry, the Go CSV parser is for general use, and must
        fully support Unicode characters. A specific TSV parser using bufio.Scanner
        and strings.Split is still Unicode-safe, but avoids most of the parsing
        speed penalties in encoding/csv.
        3. If you put (optional) timing tests in the code, you can easily see
        where it's spending its time.

    These changes, as well as some small refactoring, can be seen
    here: http://play.golang.org/p/fr5N6AEjnP

        1. Provides a small but noticeable speed-up.
        2. Makes the code 4x faster.
        3. Shows that it's indeed the csv parsing that's taking the most time in
        the code, as the comparisons take < 20ms on my machine, while parsing takes
        1.35s (#1) and .35s (#2).


  • Carlos Castillo at Oct 10, 2014 at 10:08 pm
    I have made two fixes to my example: http://play.golang.org/p/NYkcXe1rb-

        1. The TSV code now collects all the records up-front, to make the hash
        building slightly faster (i.e. combining both changes).
        2. I have stopped the header (i.e. the first line) of each file from
        being treated as a record.

    The second point doesn't affect the correctness of the example, as the header
    is present in both data files (so it is filtered out of the comparison), but
    it's still a bug, as the header is being treated as a record when it really
    shouldn't be. This bug affects both the special TSV parser and the
    code using the CSV parsers in both Python and Go. I have fixed my Go
    solutions; you might want to consider fixing the Python one as well.
  • Sadhasivam Jayabalaganesan at Oct 11, 2014 at 8:04 pm
    Thanks so much, Carlos. You are the best, and I liked the way you shared
    the wilds of Go aesthetics. I appreciate it. I can see the real magic of Go
    now.

    For a new learner, such words are really warm and welcoming. I am going to
    end this thread with a thumbs-up for Go for now :)

    Thanks again.
  • Harry B at Nov 2, 2014 at 1:48 am
    FWIW, my use of this alternate implementation, https://github.com/gwenn/yacr,
    made the Go version 2x faster than Python.
    This is not on your dataset, of course, but on my own test data, a 100MB csv file.

    wc sample_csv1.csv
      1107531 4702666 104859893 sample_csv1.csv

    time ./parse_csv_log.py -l sample_csv1.csv
    total rows = 255728

    real 0m1.456s
    user 0m1.416s
    sys 0m0.040s

    Now the Go version:

    time ./pglogparser -l sample_csv1.csv
    2014/11/01 18:22:26 total rows = 255729

    real 0m0.756s
    user 0m0.731s
    sys 0m0.027s

    The number of rows is off by one; I haven't figured out why, and I didn't
    think it was important anyway.

    Sample Go and Python code, along with sample data, is here:
    https://github.com/harikb/placeholder2

    The code uses a clone of yacr because I needed to modify the bufio token size,
    and bufio in turn had to be cloned from the standard library.

    There is a whole thread on this discussion here

    https://groups.google.com/forum/#!searchin/golang-nuts/bufio$20maxTokenSize/golang-nuts/1ZXVfem7biA/mY8efMPcg7YJ

    https://groups.google.com/forum/#!searchin/golang-nuts/bufio$20maxTokenSize/golang-nuts/cXX169-pNqw/8U6iC-FiAF4J

    Thanks
    --
    Harry

  • Sadhasivam Jayabalaganesan at Oct 10, 2014 at 4:19 pm
    That explains the core problem in my code. Thanks a lot Harry, Jose, Justin,
    Tamas, Carlos.

    When doing CPU & memory profiling, I came across something saying that the
    Mac OS X kernel causes this slowness.

    http://godoc.org/code.google.com/p/rsc/cmd/pprof_mac_fix

    Is it related to my problem? Attached is my perf report.

  • Carlos Castillo at Oct 10, 2014 at 9:37 pm
    The kernel patch provided by pprof_mac_fix doesn't affect performance, it
    only affects the built-in go CPU profiler. Russ Cox wrote about the bug in
    the OSX kernel here: http://research.swtch.com/macpprof

    To summarize: the BSD kernel that OSX uses has a bug
    in sending/receiving the signal used by Go to time the
    profiling of the running program. The signal should be received by the
    thread (and goroutine) that requested it, but it is not, and can be
    received by any thread in the process. This means that the samples it
    takes are not representative of where the active and sleeping goroutines
    actually are, making the profile almost entirely useless. Apple hasn't
    fixed (and probably won't fix) their kernel, as their own profiling tool
    doesn't use the broken feature, the feature only exists for profiling
    otherwise, and they don't feel a need to change code that isn't theirs in
    the first place (the bug was originally present in most BSDs).

    Using the tool you linked (pprof_mac_fix) patches the kernel so that it
    should properly deliver the signal to where it should go. It is a binary
    patch to your current running kernel, and the warnings in the documentation
    should be taken very seriously. That said, I run the patch myself, and have
    been doing so for the past two years with no perceived negative effect.

    To repeat and expand:

        - using the patch should only affect the go profiler's accuracy
        - there are no other performance/correctness issues it's known to fix
        - the patch must be re-applied whenever you upgrade OSX
        - profiling on a machine where it can be done correctly (eg: on Linux)
        is likely to be just as useful to help you find places needing
        optimization, and is much safer
        - profiling on a VM is not a good idea though, as most VMs have a
        strange concept of time
        - my "manual" profiling in my example code is not affected at all by the
        kernel bug
        - only go's CPU profiling is affected by the bug, Memory profiles and
        Blocking profiles should be fine on OSX with or without the patch


  • Carlos Castillo at Oct 10, 2014 at 9:47 pm
    To answer your questions:

        1. The profile presented is likely being affected by the bug, as it
        appears to be spending 98.5% of its time profiling and only the other
        1.5% doing actual work.
        2. However, it's only affecting the accuracy of the profile, not the
        actual operation of the code, so running the patch (or disabling the
        profiler) won't make your code 50+ times faster.


Discussion Overview
group: golang-nuts
categories: go
posted: Oct 9, '14 at 9:43p
active: Nov 2, '14 at 1:48a
posts: 13
users: 6
website: golang.org
