I'm currently working on a Go library that ingests a large corpus of text
(on the order of tens of GB) and outputs a CSV file that maps each unique
word to a unique index. I've been developing it on a late 2009 MacBook Pro
running Mac OS X 10.9 with 4 GB of RAM, and was able to do some significant memory
optimization. However, when I ran the same code on an Ubuntu machine (24 GB
RAM), the memory usage blew up and the program crashed. I ran the memory
profiler available through Go's "runtime" library on both machines to
compare the output:
Mac OS X 10.9:
Total: 50.5 MB
33.0 65.3% 65.3% 34.5 68.3% github.com/gtfierro/tokenizer.process
15.0 29.7% 95.0% 15.5 30.7%
1.5 3.0% 98.0% 1.5 3.0% evacuate
0.5 1.0% 99.0% 0.5 1.0% cnew
0.5 1.0% 100.0% 0.5 1.0% newdefer
The OS X memory usage correlates with the "runtime" CPU profiler output, which
shows memory being taken up by the conversion of []byte to string, which
happens in the tokenizer.tokenize method, and by the construction of the map in
tokenizer.process. In the Ubuntu output below, about 2.5 GB of memory is taken
up by the bytes.Replace method, followed by the []byte <-> string
conversion in tokenizer.tokenize. It doesn't seem like Go is garbage
collecting this memory when the code is run on Ubuntu: when I run the code with
GOGCTRACE=1, Ubuntu seems to take up more and more memory, whereas OS X regularly
discards the unneeded byte/string allocations as the program progresses.
Ubuntu:
Total: 4739.5 MB
2522.0 53.2% 53.2% 2522.0 53.2% bytes.Replace
2191.0 46.2% 99.4% 2191.0 46.2%
18.5 0.4% 99.8% 18.5 0.4%
6.5 0.1% 100.0% 6.5 0.1% runtime.malg
1.0 0.0% 100.0% 1.0 0.0% runtime.deferproc
0.5 0.0% 100.0% 4714.5 99.5%
0.0 0.0% 100.0% 6.5 0.1%
0.0 0.0% 100.0% 6.5 0.1% main.main
0.0 0.0% 100.0% 6.5 0.1% runtime.main
0.0 0.0% 100.0% 6.5 0.1% runtime.newproc
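To make the two allocation sites named above concrete: bytes.Replace always returns a freshly allocated copy of its input, and converting a []byte to a string copies the bytes again. The sketch below only illustrates that behavior. The helpers stripCommas and toWord are hypothetical and are not the code in dict.go.

```go
package main

import (
	"bytes"
	"fmt"
)

// stripCommas returns a copy of line with every comma removed.
// bytes.Replace allocates a new slice on every call; the input is left untouched.
// (Hypothetical helper, not the tokenizer's actual cleanup step.)
func stripCommas(line []byte) []byte {
	return bytes.Replace(line, []byte(","), nil, -1)
}

// toWord converts a token to a string. The conversion copies the bytes,
// so the result does not share memory with the token's backing array.
func toWord(token []byte) string {
	return string(token)
}

func main() {
	line := []byte("a,b,c")
	cleaned := stripCommas(line) // first allocation: a new []byte
	word := toWord(cleaned)      // second allocation: a new string
	cleaned[0] = 'z'             // mutating the slice does not change the string
	fmt.Println(string(line), string(cleaned), word)
}
```

Each input line therefore produces at least two short-lived allocations, which the garbage collector has to reclaim for memory usage to stay flat.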
The test script I'm running is available
at http://play.golang.org/p/CUM90ZH25o, and the library I'm developing is
at https://github.com/gtfierro/tokenizer (the relevant code is in dict.go).
I'm curious as to why the memory usage is so different. Is this because of
how Ubuntu handles memory vs. how OS X handles memory? Is this an issue with
the Go compiler? Or is there something drastically wrong with my code?
Any help would be greatly appreciated, and I'm happy to run additional
benchmarks and the like.
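One additional measurement that could be run on both machines is a periodic dump of the runtime's own heap counters, which would show whether collections are happening and whether the live heap actually shrinks afterwards. This is a minimal sketch: the helper heapSummary and the 64 MB test allocation are illustrative assumptions, not part of the library.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSummary formats a few runtime.MemStats fields so that runs on
// different machines can be compared side by side. (Hypothetical helper.)
func heapSummary() string {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return fmt.Sprintf("HeapAlloc=%d MB HeapSys=%d MB NumGC=%d",
		m.HeapAlloc>>20, m.HeapSys>>20, m.NumGC)
}

func main() {
	fmt.Println("before:   ", heapSummary())

	// Allocate 64 buffers of 1 MB each to stand in for the real workload.
	bufs := make([][]byte, 0, 64)
	for i := 0; i < 64; i++ {
		bufs = append(bufs, make([]byte, 1<<20))
	}
	fmt.Println("allocated:", heapSummary())

	// Drop the only reference and force a collection.
	bufs = nil
	runtime.GC()
	fmt.Println("after GC: ", heapSummary())
}
```

If NumGC keeps increasing on Ubuntu while HeapAlloc never drops, something is still holding references to the data; if NumGC stays flat, the collector is not being triggered at all.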