On Thursday, October 9, 2014 9:45:13 PM UTC-7, Tamás Gulácsi wrote:
Python buffers files, so use buffered input in go, too! bufio to the
Actually, encoding/csv in go uses a bufio.Reader internally when you call
csv.NewReader, so both versions are using buffered input (provided python
I suspect that two main differences between python 2.7 csv and go csv
account for some of the discrepancy:
1. python csv (in python2.7) doesn't support unicode, so unlike the go
version, it doesn't have to convert the input bytes into runes
2. python csv is written in C, while go csv is written in go
The first difference is easy to test, either by using the example code at
the bottom of https://docs.python.org/2/library/csv.html to create a csv
parser that converts input data to utf-8, or by running the code using a
python3 interpreter (you'll need to use 2to3 first!). In both cases the
python time is brought closer to the go time on my machine (the py3 version
being faster than the py2 hack). Unfortunately, going the other way, i.e.
changing go-csv to not directly support unicode (esp. for the purposes of
speed), would require a rewrite of the library.
There are some ways the provided go code can be improved:
1. Since you're storing the entire contents of every record in the hash,
you can read all the data at once with the csv.Reader.ReadAll() method.
That should simplify the generation of the hash, as well as allow you to
pre-allocate its size.
2. As suggested by Harry, the go CSV parser is for general use, and must
fully support unicode characters. A specific TSV parser using bufio.Scanner
and strings.Split is still unicode-safe, but avoids most of the parsing
speed penalties in encoding/csv.
3. If you put (optional) timing tests in the code, you can easily see
where it's spending its time.
These changes, as well as some small re-factoring can be seen
1. Provides a small but noticeable speed-up.
2. Makes the code 4x faster.
3. Shows that it's indeed the csv parsing that's taking the most time in
the code, as the comparisons take < 20ms on my machine, while parsing takes
1.35s (#1) and 0.35s (#2).