Here are the results of the benchmark (user time, system time, real time, peak memory):
0.02u 4.06s 41.33r 2176kB c
0.02u 5.02s 58.55r 2192kB go
0.04u 5.14s 131.97r 2192kB python27
0.03u 5.12s 270.17r 2176kB perl
0.07u 11.62s 135.63r 2176kB luajit
That's more reasonable.
C is still the clear winner, but that's fine; the C code has been heavily
optimized.
The two changes I made:
1. Use byte slices instead of strings (string concatenation is expensive).
2. Use ReadSlice instead of ReadBytes when reading lines.
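A rough sketch of both changes (the names here are illustrative, not the gist's exact code):

    package main

    import (
        "bufio"
        "bytes"
        "fmt"
        "os"
    )

    func main() {
        br := bufio.NewReader(os.Stdin)
        var seq []byte // reused buffer; no per-line string allocation
        for {
            line, err := br.ReadSlice('\n')
            // line aliases the reader's internal buffer, so append
            // (copy) the bytes out before the next ReadSlice call.
            seq = append(seq, bytes.TrimRight(line, "\n")...)
            if err != nil {
                break // EOF; a full reader would also handle bufio.ErrBufferFull
            }
        }
        fmt.Println(len(seq))
    }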
Please take a look at the code (https://gist.github.com/3882029) and let me
know if you have more comments.
@kortschak I will take a look at your implementation now.
-drd
On Saturday, October 13, 2012 7:50:27 AM UTC-5, drio wrote:
Great, thanks! I will post my results for the new version as well.
-drd
On Saturday, October 13, 2012 2:15:56 AM UTC-5, DisposaBoy wrote:
I did an unverified straight port to using []byte instead of string (so
no expensive string concatenation). I see the following timings on a sample
file I found online: https://gist.github.com/3883645
original
750000 27000000 27000000
time 6s, 197ms
peak 2M, 956K
using ReadSlice
750000 27000000 27000000
time 2s, 664ms
peak 2M, 664K
using ReadBytes
750000 27000000 27000000
time 3s, 127ms
peak 2M, 680K
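The gap between the two comes down to copying: ReadBytes allocates and returns a fresh slice for every line, while ReadSlice returns a view into bufio's internal buffer that is only valid until the next read. A minimal sketch of that caveat (not from the gist):

    package main

    import (
        "bufio"
        "fmt"
        "strings"
    )

    func main() {
        br := bufio.NewReader(strings.NewReader("line1\nline2\n"))
        line, _ := br.ReadSlice('\n') // aliases br's internal buffer
        saved := make([]byte, len(line))
        copy(saved, line)  // ReadBytes does this copy for you on every call
        br.ReadSlice('\n') // the next read may reuse the memory under line
        fmt.Printf("%q\n", saved)
    }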
On Saturday, October 13, 2012 12:48:24 AM UTC+1, drio wrote:
Hi,
I ported a set of routines that read FASTQ files
(http://en.wikipedia.org/wiki/FASTQ_format) to Go.
The code is here: https://gist.github.com/3882029
The idea is to iterate over the input lines until a complete record can be
returned to the user.
The main routine (readFq()) returns a closure; to get records, the user
keeps calling the closure until no more records are available.
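For readers unfamiliar with the pattern, here is a minimal sketch of such a closure-based reader (simplified to strict four-line records; readFq in the gist iterates lines more carefully, and the names here are assumptions):

    package main

    import (
        "bufio"
        "fmt"
        "io"
        "os"
        "strings"
    )

    // Record is one FASTQ entry.
    type Record struct {
        Name, Seq, Qual string
    }

    // readFq returns a closure; each call yields the next record,
    // with ok == false once the input is exhausted.
    func readFq(r io.Reader) func() (rec Record, ok bool) {
        br := bufio.NewReader(r)
        return func() (Record, bool) {
            name, err := br.ReadString('\n')
            if err != nil {
                return Record{}, false
            }
            seq, _ := br.ReadString('\n')
            br.ReadString('\n') // skip the "+" separator line
            qual, _ := br.ReadString('\n')
            return Record{
                Name: strings.TrimSpace(name),
                Seq:  strings.TrimSpace(seq),
                Qual: strings.TrimSpace(qual),
            }, true
        }
    }

    func main() {
        next := readFq(os.Stdin)
        for rec, ok := next(); ok; rec, ok = next() {
            fmt.Println(rec.Name, len(rec.Seq))
        }
    }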
I did some basic benchmarking on different implementations (C, Lua,
Python, and Perl) of the algorithm, with the following results:
0.03u 3.97s 41.34r 2176kB c
0.06u 8.69s 109.50r 2176kB go
0.03u 4.93s 131.96r 2192kB luajit
0.02u 2.97s 132.41r 2176kB python27
0.07u 9.89s 275.16r 2192kB perl
As you can see, the C version is the fastest (41.3 s real time).
Then I profiled the Go version with the following results (top 10 from
the pprof tool):
579 96.5% 96.5% 579 96.5% runtime.nanotime
11 1.8% 98.3% 11 1.8% runtime.sigprocmask
9 1.5% 99.8% 9 1.5% scanblock
1 0.2% 100.0% 1 0.2% ReleaseN
0 0.0% 100.0% 34 5.7% bufio.(*Reader).ReadBytes
0 0.0% 100.0% 11 1.8% bufio.(*Reader).ReadSlice
0 0.0% 100.0% 43 7.2% bufio.(*Reader).ReadString
0 0.0% 100.0% 11 1.8% bufio.(*Reader).fill
0 0.0% 100.0% 557 92.8% concatstring
0 0.0% 100.0% 566 94.3% gostringsize
This and the profiling graph tell me that most of the CPU time is spent
doing garbage collection for the concatstring and gostringsize routines.
Am I right?
In the readFq() routine there are plenty of len() calls and substring
operations, so the profiling results are not surprising.
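For anyone reproducing this, a minimal way to collect such a profile with the standard runtime/pprof package (the gist's actual setup may differ):

    package main

    import (
        "os"
        "runtime/pprof"
    )

    func main() {
        // Write a CPU profile that `go tool pprof` can inspect.
        f, err := os.Create("cpu.prof")
        if err != nil {
            panic(err)
        }
        pprof.StartCPUProfile(f)
        defer pprof.StopCPUProfile()

        // ... run the parsing workload here ...
    }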
Do you see any obvious changes that could be made to the code to improve
performance?
Any comments regarding the code are welcome.
Thanks,
-drd