That's super helpful as a place to get started, thanks. I'll dig into
what's been written there.


On Friday, January 23, 2015 at 3:46:33 PM UTC-8, Peter Vessenes wrote:

Hi all,

I'm working on a program that loads 100+ GB of data into three
data structures: two maps and an array. These are then used only for data
analysis and never changed again.

Once the data is loaded, it's used to back analytics for a web service.

The problem I'm currently facing is GC pauses of 13-15 seconds, quite often.
Everything stops while we wait for the GC to run, and obviously that's too
long for just about anything, and certainly for a web API call.

Any help or ideas? Intuitively, it seems to me there should be some way to
mark this data "constant" or "ignore" or "locked" and have the GC just skip
it, but I can't seem to find anything in Go that would facilitate this. I
had great success running without the garbage collector until I inevitably
ran out of addressable memory; my server has 256 GB, but Go 1.4 only allows
128 GB of allocation, so that's out. (Many) more details below.
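
For readers following the "run without the collector" idea above, here is a minimal sketch of one variant: load with the collector on, force a single collection to reclaim loading garbage, and only then turn automatic collection off. The helper name loadThenDisableGC and the toy load function are hypothetical; runtime.GC and debug.SetGCPercent are the standard library calls. This is not a fix for the problem described: with automatic GC off, garbage produced while serving requests accumulates until the setting is restored or memory runs out.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// loadThenDisableGC (hypothetical helper) runs load with the collector
// enabled, forces one full collection to reclaim temporary loading garbage,
// and then disables automatic collection. The returned function restores
// the previous GC percentage.
func loadThenDisableGC(load func()) (restore func()) {
	load()
	runtime.GC()                   // reclaim objects used only while loading
	prev := debug.SetGCPercent(-1) // a negative percentage disables automatic GC
	return func() { debug.SetGCPercent(prev) }
}

func main() {
	var table map[string]int
	restore := loadThenDisableGC(func() {
		// Stand-in for the real loader, which is not shown in the post.
		table = make(map[string]int)
		for i := 0; i < 1000; i++ {
			table[fmt.Sprintf("key-%d", i)] = i
		}
	})
	defer restore()
	fmt.Println("entries loaded:", len(table))
}
```

Whether this is usable depends on how much garbage the serving path allocates per request, which the post does not say.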

Thanks,

Peter

Details:
At peak, loading the data creates about 1 billion objects according
to gctrace data. I spent the last few days working on trimming this number
down, but I've only got it down to 300 million or so, and that's through
some fairly aggressive work. It hasn't changed the GC runtime very much,
which makes me think the allocations / deallocations aren't the big deal;
it's the amount of memory being used. In any event, once loading is done, the
700 million objects used to load and parse can be collected.
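
Besides running with GODEBUG=gctrace=1, the object counts and pause times discussed above can be read from inside the program. The following is a minimal sketch using runtime.ReadMemStats; the gcSnapshot type and snapshotAfterGC function are hypothetical names, and the slice of pointers in main is only a stand-in for the real data.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcSnapshot (hypothetical type) holds a few collector statistics.
type gcSnapshot struct {
	NumGC       uint32        // completed GC cycles
	LastPause   time.Duration // pause time of the most recent cycle
	HeapObjects uint64        // live plus not-yet-swept heap objects
	HeapAlloc   uint64        // bytes of allocated heap objects
}

// snapshotAfterGC forces a collection and reads the runtime statistics.
func snapshotAfterGC() gcSnapshot {
	runtime.GC()
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return gcSnapshot{
		NumGC: ms.NumGC,
		// PauseNs is a circular buffer; this index is the latest entry.
		LastPause:   time.Duration(ms.PauseNs[(ms.NumGC+255)%256]),
		HeapObjects: ms.HeapObjects,
		HeapAlloc:   ms.HeapAlloc,
	}
}

func main() {
	// Stand-in workload: many small heap objects reachable via pointers.
	live := make([]*int, 1<<20)
	for i := range live {
		v := i
		live[i] = &v
	}
	s := snapshotAfterGC()
	fmt.Printf("cycles=%d lastPause=%v objects=%d heap=%dB (kept %d)\n",
		s.NumGC, s.LastPause, s.HeapObjects, s.HeapAlloc, len(live))
}
```

Comparing snapshots taken before and after a data-layout change gives a direct check on whether the object count or the heap size tracks the pause time.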

I started with the following datatypes:

type Data struct {
    Name []string
    Subs []Subdata
    Id   string
}

type Subdata struct {
    id    []string
    other int
}

var data []Data                  // about 50 million of these
var lookup_one map[string]string // relating two data items
var lookup_two map[string]int    // relating data name to position in array

After testing, I have settled on:
data := make([]Datastruct2, 50000000)
l1 := make(map[[10]byte][10]byte)
l2 := make(map[[10]byte][10]byte)

and rewritten data structures:

type Datastruct2 struct {
    Name [10]byte
    Subs [100]Sub2
    id   [10]byte
}

type Sub2 struct {
    ID [10]byte
    N  int
}
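
The fixed-size keys above need conversions to and from strings at the edges of the program. A minimal sketch of such helpers follows; the Key type, toKey, and the String method are hypothetical names, and the sketch assumes every identifier fits in 10 bytes and never ends in a zero byte.

```go
package main

import "fmt"

// Key (hypothetical name) is a pointer-free, fixed-size identifier.
type Key [10]byte

// toKey copies s into a zero-padded Key. It reports ok=false when s does
// not fit, instead of silently truncating.
func toKey(s string) (k Key, ok bool) {
	if len(s) > len(k) {
		return k, false
	}
	copy(k[:], s)
	return k, true
}

// String returns the key with its trailing zero padding removed.
func (k Key) String() string {
	n := len(k)
	for n > 0 && k[n-1] == 0 {
		n--
	}
	return string(k[:n])
}

func main() {
	lookup := make(map[Key]Key) // neither keys nor values contain pointers
	a, _ := toKey("alpha")
	b, _ := toKey("beta")
	lookup[a] = b
	fmt.Println(lookup[a]) // prints "beta"
}
```

Keeping the conversions in one place limits the byte casting to the code that reads input and writes responses.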

Of course, this comes at a cost in dev time and annoyance: lots of byte
casting now, and more memory use in the typical case of 1 or 2 subs'
worth of data rather than 100. It's a large codebase, so I'm still in the middle
of reworking it all. But as I'm doing it, I have begun to worry. I haven't
really seen anything which indicates I'm going to get down to 50-100 ms GC
time with what I'm doing, and that would probably be the max acceptable for a
realtime web API.
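
One possible way around the memory cost of a fixed [100]Sub2 array per record, sketched here as an alternative and not as something from the post: keep all subs in a single flat slice and have each record store only an offset and a count. All names below (Store, Record, Sub, Add, Subs) are hypothetical. The record and sub types still contain no pointers, so in principle the collector has only two large pointer-free backing arrays to account for.

```go
package main

import "fmt"

// Sub mirrors the post's Sub2 layout.
type Sub struct {
	ID [10]byte
	N  int
}

// Record replaces the fixed [100]Sub2 array with a range into Store.subs.
type Record struct {
	Name     [10]byte
	ID       [10]byte
	SubStart uint64 // index of this record's first sub in Store.subs
	SubCount uint32 // number of subs belonging to this record
}

// Store keeps every record and every sub in two flat slices.
type Store struct {
	records []Record
	subs    []Sub
}

// Add appends a record with its subs and returns the record's index.
func (s *Store) Add(name, id [10]byte, subs []Sub) int {
	s.records = append(s.records, Record{
		Name:     name,
		ID:       id,
		SubStart: uint64(len(s.subs)),
		SubCount: uint32(len(subs)),
	})
	s.subs = append(s.subs, subs...)
	return len(s.records) - 1
}

// Subs returns a view of the subs belonging to record i.
func (s *Store) Subs(i int) []Sub {
	r := s.records[i]
	return s.subs[r.SubStart : r.SubStart+uint64(r.SubCount)]
}

func main() {
	var s Store
	i := s.Add([10]byte{'a'}, [10]byte{'1'}, []Sub{{N: 1}, {N: 2}})
	fmt.Println(len(s.Subs(i)), "subs for record", i)
}
```

This layout only suits data that is written once during loading, as described above; removing or growing a record's subs afterwards would require rebuilding the flat slice.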

How can I achieve this with my data and use case in Go? Any tips, help,
or thoughts are appreciated.

Discussion Overview
group: golang-nuts
categories: go
posted: Jan 23, '15 at 11:46p
active: Feb 1, '15 at 1:47a
posts: 23
users: 11
website: golang.org
