represent spatial data (high-throughput genome sequence data with
consideration of the, for those who are interested [1]).
The string fields of the records are from a small set of values (not
determinable prior to the analysis) - the data structure that describes
the fields is here [2] - but the record reading makes it likely
(certain?) that each instance of the string values will be distinct
items. I need to store the entire set of records, so would like to be
able to intern the string values. The application heap size is on the
order of 100GB when running the analysis - at the moment I just discard
these strings.
How would I go about this in a reasonably efficient manner?
I'm thinking something along the lines of this, but it feels magical:
type store map[string]string

func (is store) intern(s string) string {
	t, ok := is[s]
	if ok {
		return t
	}
	is[s] = s
	return s
}
The rationale for this is that if the string has been seen, the returned
string is the stored string, rather than the query. Ideally, I'd like to
not store the string except in the key space, but I can't think of a way
to get the string representation corresponding to the key rather than
the query.
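
For anyone who wants to try the idea, here is a minimal self-contained sketch of it applied while loading records. The record type and the load function are hypothetical stand-ins, not part of biogo.illumina, and the strings.Clone call (Go 1.18 or later) rests on the assumption that the incoming strings may be slices of a larger read buffer; drop it if they are already individually allocated.

```go
package main

import "strings"

// store maps each distinct string value to its canonical copy.
type store map[string]string

// intern returns the canonical copy of s, adding s on first sight.
func (is store) intern(s string) string {
	if t, ok := is[s]; ok {
		return t // the stored string, not the query
	}
	// Assumption: s may alias a larger read buffer, so copy it to
	// avoid keeping that buffer alive for the life of the store.
	s = strings.Clone(s)
	is[s] = s
	return s
}

// record is a hypothetical stand-in for a parsed record with one
// string field drawn from a small set of values.
type record struct {
	name string
}

// load interns the field of each raw value, so records with equal
// values end up referring to the same stored string.
func load(is store, raw []string) []record {
	recs := make([]record, 0, len(raw))
	for _, r := range raw {
		recs = append(recs, record{name: is.intern(r)})
	}
	return recs
}
```

As far as I understand Go's string representation (a pointer and a length), the key and the value in the map refer to the same bytes, so storing the string on both sides should cost one extra string header per distinct value rather than a second copy of the data.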
Dan
[1] http://www.illumina.com/documents/products/techspotlights/techspotlight_sequencing.pdf
[2] http://godoc.org/code.google.com/p/biogo.illumina#Metadata