Hi,

I ported a set of routines that read FASTQ files
(http://en.wikipedia.org/wiki/FASTQ_format) to Go.
The code is here: https://gist.github.com/3882029

The idea is to iterate over the input lines until a complete record can be
returned to the user. The main routine (readFq()) returns a closure; to get
records, the user keeps calling the closure until no more records are
available.
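
A minimal sketch of that shape (illustrative names, not the gist's exact
code; it assumes simple four-line records with no wrapped sequences):

    package fastq

    import (
        "bufio"
        "io"
        "strings"
    )

    // Rec is a hypothetical record type.
    type Rec struct{ Name, Seq, Qual string }

    // ReadFq returns a closure; each call yields the next record and
    // reports ok == false once the input is exhausted.
    func ReadFq(r io.Reader) func() (rec Rec, ok bool) {
        br := bufio.NewReader(r)
        line := func() (string, bool) {
            s, err := br.ReadString('\n')
            if s == "" && err != nil {
                return "", false
            }
            return strings.TrimRight(s, "\r\n"), true
        }
        return func() (Rec, bool) {
            name, ok := line()
            if !ok {
                return Rec{}, false
            }
            seq, _ := line()
            line() // skip the "+" separator line
            qual, _ := line()
            return Rec{Name: name, Seq: seq, Qual: qual}, true
        }
    }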

I did some basic benchmarking of different implementations of the algorithm
(C, Lua, Python, and Perl), with the following results:

0.03u 3.97s 41.34r 2176kB c
0.06u 8.69s 109.50r 2176kB go
0.03u 4.93s 131.96r 2192kB luajit
0.02u 2.97s 132.41r 2176kB python27
0.07u 9.89s 275.16r 2192kB perl

As you can see, the C version is the fastest (41.3 s real time).

Then I did some profiling, which gave the following results (top 10 from the
pprof tool; columns are flat samples, flat %, cumulative %, cumulative
samples, cumulative %, and function name):

579 96.5% 96.5% 579 96.5% runtime.nanotime
11 1.8% 98.3% 11 1.8% runtime.sigprocmask
9 1.5% 99.8% 9 1.5% scanblock
1 0.2% 100.0% 1 0.2% ReleaseN
0 0.0% 100.0% 34 5.7% bufio.(*Reader).ReadBytes
0 0.0% 100.0% 11 1.8% bufio.(*Reader).ReadSlice
0 0.0% 100.0% 43 7.2% bufio.(*Reader).ReadString
0 0.0% 100.0% 11 1.8% bufio.(*Reader).fill
0 0.0% 100.0% 557 92.8% concatstring
0 0.0% 100.0% 566 94.3% gostringsize


This and the profiling graph tell me that most of the CPU is spent doing
garbage collection for the concatstring and gostringsize routines. Am I
right?

In the readFq() routine there are plenty of len() calls and substring
selections, so the profiling results are not surprising.
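
For illustration only (not code from the gist), every += in a loop like the
one below allocates a fresh string and copies both halves, which is what the
concatstring and gostringsize entries in the profile correspond to:

    // Illustration: building a sequence with += does O(n^2) copying
    // and leaves every intermediate string behind for the GC.
    func concatLines(lines []string) string {
        seq := ""
        for _, l := range lines {
            seq += l // each += allocates a new string and copies both parts
        }
        return seq
    }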

Do you see any obvious changes that can be made in the code to improve the
performance?
Any comments regarding the code are welcome.

Thanks,
-drd



  • Dan Kortschak at Oct 13, 2012 at 6:22 am
    Try working with byte slices. I have two fq reader implementations that you might like to have a look at, both in code.google.com/p/biogo.

    Dan

  • Evan Shaw at Oct 13, 2012 at 6:28 am

    My biggest piece of advice would be: don't concatenate strings to
    build strings. Using []byte and append or bytes.Buffer will give you
    much better performance than you're getting now.
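
    As an illustration of that style (a sketch only, with an assumed helper
    signature rather than drio's actual code), the quality accumulation might
    look like:

        package fastq

        import (
            "bufio"
            "bytes"
        )

        // Sketch: accumulate the quality string line by line with append
        // on a []byte instead of string concatenation. want is the
        // sequence length that the quality must reach (a FASTQ invariant).
        func readQual(br *bufio.Reader, want int) ([]byte, error) {
            qual := make([]byte, 0, want)
            for len(qual) < want {
                line, err := br.ReadBytes('\n')
                qual = append(qual, bytes.TrimRight(line, "\r\n")...)
                if err != nil {
                    return qual, err
                }
            }
            return qual, nil
        }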

    Your use of closures seems a little strange to me too, but I think
    getting rid of the string concatenations will help more than anything
    else.

    - Evan

  • DisposaBoy at Oct 13, 2012 at 7:16 am

    I did an unverified straight port to using []byte instead of string (so no
    expensive string concatenation). I see the following timings on a sample
    file I found online: https://gist.github.com/3883645

    original

    750000 27000000 27000000
    time 6s, 197ms
    peak 2M, 956K

    using ReadSlice

    750000 27000000 27000000
    time 2s, 664ms
    peak 2M, 664K

    using ReadBytes

    750000 27000000 27000000
    time 3s, 127ms
    peak 2M, 680K
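
    The trade-off between the two bufio calls, roughly (a generic sketch, not
    the gist's code; needs the "bufio" import):

        // ReadSlice returns a slice into bufio's internal buffer with no
        // allocation; it is only valid until the next read, so copy it
        // (here via append into the caller's reusable buffer) before the
        // next call. Note that ReadSlice can return bufio.ErrBufferFull
        // for lines longer than the buffer, which a robust reader must
        // handle.
        func appendLine(br *bufio.Reader, seq []byte) ([]byte, error) {
            line, err := br.ReadSlice('\n')
            return append(seq, line...), err
        }

        // ReadBytes, by contrast, allocates and returns a fresh slice on
        // every call, which is safe to keep but costs one allocation per
        // line -- hence the slightly slower ReadBytes timing above.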




  • Drio at Oct 13, 2012 at 12:50 pm
    Thanks, that's great! I will also post my results for the new version.

    -drd
  • Drio at Oct 13, 2012 at 12:52 pm
    Thanks for the feedback, Dan. I will most definitely check out your project.
    I will post the improvement here once I switch to byte slices instead of
    strings. It makes sense.

    -drd
  • Drio at Oct 13, 2012 at 12:58 pm

    On Saturday, October 13, 2012 1:29:00 AM UTC-5, Evan Shaw wrote:
    My biggest piece of advice would be: don't concatenate strings to
    build strings. Using []byte and append or bytes.Buffer will give you
    much better performance than you're getting now.
    Thanks.

    Your use of closures seems a little strange to me too, but I think
    getting rid of the string concatenations will help more than anything
    else.
    The whole idea of the code is to iterate over the records of a file, so I
    decided to write an iterator, and for that I use a closure. For my first
    attempt I used channels, but the performance was worse. I will also rewrite
    the channel version to use slices instead of strings and see how much the
    performance improves.

    Could you please tell me more about why you think my closure usage is
    weird? What other approach would you follow?

    Thanks for the feedback!
    -drd

  • Drio at Oct 13, 2012 at 5:47 pm
    OK, I made the changes (https://gist.github.com/3882029) and here are the
    benchmark results:

    0.02u 4.06s 41.33r 2176kB c
    0.02u 5.02s 58.55r 2192kB go
    0.04u 5.14s 131.97r 2192kB python27
    0.03u 5.12s 270.17r 2176kB perl
    0.07u 11.62s 135.63r 2176kB luajit

    That's more reasonable. C is still the clear winner, but that's fine; the C
    code has been heavily optimized.

    The two changes I made:

    1. Use byte slices instead of strings (string concatenation is inefficient).
    2. Use ReadSlice instead of ReadBytes when reading lines.

    Please take a look at the code (https://gist.github.com/3882029) and let me
    know if you have more comments.

    @kortschak I will take a look at your implementation now.

    -drd
  • Evan Shaw at Oct 13, 2012 at 8:22 pm

    On Sun, Oct 14, 2012 at 6:47 AM, drio wrote:
    Please take a look to the code (https://gist.github.com/3882029) and let me
    know if you have more comments.
    One big improvement would be, instead of setting seq and qual to nil
    on line 55, resetting their lengths. In other words:

    seq = seq[:0]
    qual = qual[:0]

    This way you're reusing space you've already allocated instead of
    allocating new space. This cuts about 20% of the processing time for
    some sample file I found from a Google search.
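
    In context the reuse looks roughly like this (a sketch, with line being the
    next input line):

        seq = seq[:0]              // length 0, capacity preserved
        seq = append(seq, line...) // append fills the existing backing array
                                   // instead of allocating, until capacity
                                   // is exceeded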

    I get another measurable improvement using bytes.SplitN(last, space,
    1) since you're only using the first result from bytes.Split. You
    might be able to do even better with bytes.IndexByte, but I had
    trouble getting it to work with my sample file.
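
    For reference, a bytes.IndexByte variant could look something like this (an
    untested sketch; last is the header line as in the gist):

        // Keep only the part of the header before the first space; if
        // there is no space, keep the whole line. Unlike bytes.Split or
        // bytes.SplitN, no intermediate slice of subslices is allocated.
        name := last
        if i := bytes.IndexByte(last, ' '); i >= 0 {
            name = last[:i]
        }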

    As for the closures, I think it would be more idiomatic (and
    potentially more efficient) to create objects instead of closures.
    Closures and single-method objects are basically equivalent, but when
    in doubt I'd usually choose an object. I get another measurable speed
    improvement by using an object in place of the closures.
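
    Concretely, the object form might have roughly this shape (a sketch with
    illustrative names, assuming simple four-line records; not the exact code
    in Evan's gist):

        package fastq

        import (
            "bufio"
            "bytes"
            "io"
        )

        // Reader iterates over FASTQ records. Its buffers live on the
        // struct so they are reused from one record to the next.
        type Reader struct {
            br              *bufio.Reader
            Name, Seq, Qual []byte
        }

        func NewReader(r io.Reader) *Reader {
            return &Reader{br: bufio.NewReader(r)}
        }

        // Next reads one record, returning false at end of input. The
        // exported slices are only valid until the next call to Next.
        func (fq *Reader) Next() bool {
            var err error
            if fq.Name, err = fq.line(fq.Name); err != nil {
                return false
            }
            fq.Seq, _ = fq.line(fq.Seq)
            fq.line(nil) // the "+" separator line
            fq.Qual, _ = fq.line(fq.Qual)
            return true
        }

        // line reads one line (ReadSlice, so lines must fit in bufio's
        // buffer) and copies it, minus the newline, into dst[:0].
        func (fq *Reader) line(dst []byte) ([]byte, error) {
            b, err := fq.br.ReadSlice('\n')
            return append(dst[:0], bytes.TrimRight(b, "\r\n")...), err
        }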

    My version of the code, with the above changes and a few smaller
    refactorings, is here: https://gist.github.com/3885963

    (I pasted your Xopen function in my version too since the go tool was
    having trouble getting your package.)

    I don't know how to verify that it actually still works correctly, but
    my transformations were fairly mechanical so hopefully I haven't
    screwed anything up.

    - Evan

  • Drio at Oct 13, 2012 at 10:47 pm
    Thanks again Evan,

    Your four changes, particularly the first two, have had a dramatic impact
    on performance. Now Go is the fastest implementation:

    0.03u 4.19s 41.34r 2192kB c
    0.04u 8.58s 40.03r 2192kB go

    I still have to do some more tests, but I find this particularly remarkable,
    especially considering how the C version was implemented.

    I used your code and fixed a bug (not introduced by your changes).

    So, to sum up what we did (a rough caller-side sketch follows the list):

    1. Operate on []byte instead of string.
    2. When reading from the bufio.Reader, use ReadSlice() to get the bytes.
    3. Reset slice lengths ([:0]) to reuse memory that has already been
    allocated and avoid GC work.
    4. Implement the iteration with methods on a struct (an object) instead of
    closures.
    5. Use SplitN() instead of Split(), since we only care about the first part
    of the string; this also saves GC work.
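
    A rough caller-side sketch of how the resulting API gets used (hypothetical
    names matching the struct-reader sketch earlier in the thread, assumed to be
    in the same package; reads.fq is a placeholder filename; needs the "fmt",
    "log", and "os" imports):

        // Counts records and total bases using the sketched Reader type.
        func main() {
            f, err := os.Open("reads.fq") // placeholder filename
            if err != nil {
                log.Fatal(err)
            }
            defer f.Close()

            fq := NewReader(f)
            records, bases := 0, 0
            for fq.Next() {
                records++
                bases += len(fq.Seq)
            }
            fmt.Println(records, bases)
        }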

    Latest code here: https://gist.github.com/3886423 (I am still using my
    files package to
    access Xopen).

    Way to go!
    -drd

