FAQ
Given the persistent issues with big hash tables requiring long GC pause
times that have lead to discussion[1][2][3] without a standard solution,
I've implemented[4] an entirely off-heap hash-table in Go. It uses a simple
Malloc()/Free() implementation based on mmap (gommap) to obtain memory
directly from the OS. It could just as easily use malloc from the C
runtime in you want use CGO, but the current implementation avoids that in
favor of an entirely Go-based solution for ease of "go get
  github.com/glycerine/go-offheap-hashtable".

This off-heap hash table is based on a simple open-addressing with
linear-probing and the speedy xxhash64 hash function[5]. The implementation
was inspired by Jeff Preshing's blog article where he shows that such a
strategy can best even the highly cache-tuned Judy array[6].

See here:

https://github.com/glycerine/go-offheap-hashtable

-Jason

[1] https://github.com/golang/go/issues/9477
[2] https://groups.google.com/d/msg/golang-nuts/kCQP6S6ZGh0/JEuY3A8xEygJ
[3] https://groups.google.com/d/msg/golang-nuts/cmpiArv10f4/UQZ39bLaYpQJ
[4] https://github.com/glycerine/go-offheap-hashtable
[5] https://github.com/OneOfOne/xxhash
[6] http://preshing.com/20130107/this-hash-table-is-faster-than-a-judy-array/

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Search Discussions

  • Nick Craig-Wood at Jan 5, 2015 at 8:51 am

    On 05/01/15 02:26, Jason E. Aten wrote:
    Given the persistent issues with big hash tables requiring long GC pause
    times that have lead to discussion[1][2][3] without a standard solution,
    I've implemented[4] an entirely off-heap hash-table in Go. It uses a
    simple Malloc()/Free() implementation based on mmap (gommap) to obtain
    memory directly from the OS. It could just as easily use malloc from
    the C runtime in you want use CGO, but the current implementation avoids
    that in favor of an entirely Go-based solution for ease of "go get
    github.com/glycerine/go-offheap-hashtable".

    This off-heap hash table is based on a simple open-addressing with
    linear-probing and the speedy xxhash64 hash function[5]. The
    implementation was inspired by Jeff Preshing's blog article where he
    shows that such a strategy can best even the highly cache-tuned Judy
    array[6].

    See here:

    https://github.com/glycerine/go-offheap-hashtable
    Interesting idea!

    I had a look at the godoc for the package

    http://godoc.org/github.com/glycerine/go-offheap-hashtable

    It wasn't immediately obvious how to use the hash table - it desperately
    needs an example.

    There seem to be too many implementation details which are public.

    IMHO all the public types, functions and methods should have a doc string.

    --
    Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Jason E. Aten at Jan 5, 2015 at 9:08 am
    I'll see about making the godoc look better, thanks for pointing that out.

    There are usage examples in most of the test files. bytekey_test.go,
    insert_test.go, string_test.go, and rand_test.go demonstrate using various
    different types of keys, such as string, []byte, and int.


    On Mon, Jan 5, 2015 at 12:51 AM, Nick Craig-Wood wrote:
    On 05/01/15 02:26, Jason E. Aten wrote:
    Given the persistent issues with big hash tables requiring long GC pause
    times that have lead to discussion[1][2][3] without a standard solution,
    I've implemented[4] an entirely off-heap hash-table in Go. It uses a
    simple Malloc()/Free() implementation based on mmap (gommap) to obtain
    memory directly from the OS. It could just as easily use malloc from
    the C runtime in you want use CGO, but the current implementation avoids
    that in favor of an entirely Go-based solution for ease of "go get
    github.com/glycerine/go-offheap-hashtable".

    This off-heap hash table is based on a simple open-addressing with
    linear-probing and the speedy xxhash64 hash function[5]. The
    implementation was inspired by Jeff Preshing's blog article where he
    shows that such a strategy can best even the highly cache-tuned Judy
    array[6].

    See here:

    https://github.com/glycerine/go-offheap-hashtable
    Interesting idea!

    I had a look at the godoc for the package

    http://godoc.org/github.com/glycerine/go-offheap-hashtable

    It wasn't immediately obvious how to use the hash table - it desperately
    needs an example.

    There seem to be too many implementation details which are public.

    IMHO all the public types, functions and methods should have a doc string.

    --
    Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick
    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Nate Finch at Jan 5, 2015 at 12:26 pm
    Totally agree about not making implementation details public. For example, IsDir is not a function that a package that is about HashTables should expose. Only expose what you absolutely must for the package to be useful.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Egon at Jan 5, 2015 at 1:10 pm
    As everyone else has commented, make the public API significantly smaller.

    I would probably start with something like:
    http://play.golang.org/p/-8FueAXbo1
    Remove everything else (except those loading/saving to disk with mmap).

    Any other opinions how to improve the API / what to keep and leave out from
    the API.

    I also considered making separate table for each key/value type. Although
    it would make things more confusing.

    + Egon
    On Monday, 5 January 2015 04:26:28 UTC+2, Jason E. Aten wrote:

    Given the persistent issues with big hash tables requiring long GC pause
    times that have lead to discussion[1][2][3] without a standard solution,
    I've implemented[4] an entirely off-heap hash-table in Go. It uses a simple
    Malloc()/Free() implementation based on mmap (gommap) to obtain memory
    directly from the OS. It could just as easily use malloc from the C
    runtime in you want use CGO, but the current implementation avoids that in
    favor of an entirely Go-based solution for ease of "go get
    github.com/glycerine/go-offheap-hashtable".

    This off-heap hash table is based on a simple open-addressing with
    linear-probing and the speedy xxhash64 hash function[5]. The implementation
    was inspired by Jeff Preshing's blog article where he shows that such a
    strategy can best even the highly cache-tuned Judy array[6].

    See here:

    https://github.com/glycerine/go-offheap-hashtable

    -Jason

    [1] https://github.com/golang/go/issues/9477
    [2] https://groups.google.com/d/msg/golang-nuts/kCQP6S6ZGh0/JEuY3A8xEygJ
    [3] https://groups.google.com/d/msg/golang-nuts/cmpiArv10f4/UQZ39bLaYpQJ
    [4] https://github.com/glycerine/go-offheap-hashtable
    [5] https://github.com/OneOfOne/xxhash
    [6]
    http://preshing.com/20130107/this-hash-table-is-faster-than-a-judy-array/
    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Jason E. Aten at Jan 5, 2015 at 6:43 pm
    Thanks for the suggestions guys. I've simplified the exposed methods and
    added docs to them.
    On Mon, Jan 5, 2015 at 5:10 AM, Egon wrote:

    As everyone else has commented, make the public API significantly smaller.

    I would probably start with something like:
    http://play.golang.org/p/-8FueAXbo1
    Remove everything else (except those loading/saving to disk with mmap).

    Any other opinions how to improve the API / what to keep and leave out
    from the API.

    I also considered making separate table for each key/value type. Although
    it would make things more confusing.

    + Egon
    On Monday, 5 January 2015 04:26:28 UTC+2, Jason E. Aten wrote:

    Given the persistent issues with big hash tables requiring long GC pause
    times that have lead to discussion[1][2][3] without a standard solution,
    I've implemented[4] an entirely off-heap hash-table in Go. It uses a simple
    Malloc()/Free() implementation based on mmap (gommap) to obtain memory
    directly from the OS. It could just as easily use malloc from the C
    runtime in you want use CGO, but the current implementation avoids that in
    favor of an entirely Go-based solution for ease of "go get
    github.com/glycerine/go-offheap-hashtable".

    This off-heap hash table is based on a simple open-addressing with
    linear-probing and the speedy xxhash64 hash function[5]. The implementation
    was inspired by Jeff Preshing's blog article where he shows that such a
    strategy can best even the highly cache-tuned Judy array[6].

    See here:

    https://github.com/glycerine/go-offheap-hashtable

    -Jason

    [1] https://github.com/golang/go/issues/9477
    [2] https://groups.google.com/d/msg/golang-nuts/kCQP6S6ZGh0/JEuY3A8xEygJ
    [3] https://groups.google.com/d/msg/golang-nuts/cmpiArv10f4/UQZ39bLaYpQJ
    [4] https://github.com/glycerine/go-offheap-hashtable
    [5] https://github.com/OneOfOne/xxhash
    [6] http://preshing.com/20130107/this-hash-table-is-
    faster-than-a-judy-array/

    --
    You received this message because you are subscribed to a topic in the
    Google Groups "golang-nuts" group.
    To unsubscribe from this topic, visit
    https://groups.google.com/d/topic/golang-nuts/o7ABs3i8lHM/unsubscribe.
    To unsubscribe from this group and all its topics, send an email to
    golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.


    --

    Best regards,
    Jason

    --
    Jason E. Aten, Ph.D.
    j.e.aten@gmail.com
    650-429-8602
    linkedin: https://www.linkedin.com/pub/jason-e-aten-ph-d/18/313/45a

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Egon at Jan 5, 2015 at 7:30 pm

    On Monday, 5 January 2015 20:44:14 UTC+2, Jason E. Aten wrote:
    Thanks for the suggestions guys. I've simplified the exposed methods and
    added docs to them.
    You also might want to make the public import as
    "github.com/glycerine/offheap".

    Also, there might be a better name for it - using the negation form of "not
    on the heap" is probably not that informative.

    Is it necessary to have it as "offheap.HashTable" and not "offheap.Table"?
    I'm just thinking whether the "Hash" part adds any clarity to the code that
    uses it? To me it really doesn't add any useful information.

    Also try to get rid of as much stutter as possible, e.g.
    func (t *HashTable) DestroyHashTable()
    -->
    func (t *HashTable) Destroy()

    Diagnostic stuff probably shouldn't be part of public API
    func (*HashTable) Dump

    http://godoc.org/github.com/glycerine/go-offheap-hashtable#MmapMalloc
    That probably could be cleaned up or made clearer what it is... e.g. name
    it:
    TableFile or something like that.

    Similarly Malloc -> NewFileBackedTable() or NewFileTable()

    http://godoc.org/github.com/glycerine/go-offheap-hashtable#ByteKeyHashTable.DeleteBK
    For these don't add the suffixes, because the type already contains
    "ByteKey" it should be obvious that Insert/Delete/Lookup take "[]byte" as
    argument. (Similarly for the StringHashTable).

    http://godoc.org/github.com/glycerine/go-offheap-hashtable#Key_t
    Rename that to something simple like "Key".
    http://godoc.org/github.com/glycerine/go-offheap-hashtable#Val_t
    Rename that to something simple like "Value".

    https://github.com/glycerine/go-offheap-hashtable/blob/master/offheap.go#L434
    I would say put the Iterator constructor on the hashtable and not as a
    separate constructor.

    http://godoc.org/github.com/glycerine/go-offheap-hashtable#Val_t
    If you intend some of the parameters be modified by the package user, make
    them constants e.g.
    const (
       KeySize = 64
       CellSize = 128
    )
    and place them in a separate file - that way you can point to it in the
    documentation.

    On Mon, Jan 5, 2015 at 5:10 AM, Egon <egon...@gmail.com <javascript:>>
    wrote:
    As everyone else has commented, make the public API significantly smaller.

    I would probably start with something like:
    http://play.golang.org/p/-8FueAXbo1
    Remove everything else (except those loading/saving to disk with mmap).

    Any other opinions how to improve the API / what to keep and leave out
    from the API.

    I also considered making separate table for each key/value type. Although
    it would make things more confusing.

    + Egon
    On Monday, 5 January 2015 04:26:28 UTC+2, Jason E. Aten wrote:

    Given the persistent issues with big hash tables requiring long GC pause
    times that have lead to discussion[1][2][3] without a standard solution,
    I've implemented[4] an entirely off-heap hash-table in Go. It uses a simple
    Malloc()/Free() implementation based on mmap (gommap) to obtain memory
    directly from the OS. It could just as easily use malloc from the C
    runtime in you want use CGO, but the current implementation avoids that in
    favor of an entirely Go-based solution for ease of "go get
    github.com/glycerine/go-offheap-hashtable".

    This off-heap hash table is based on a simple open-addressing with
    linear-probing and the speedy xxhash64 hash function[5]. The implementation
    was inspired by Jeff Preshing's blog article where he shows that such a
    strategy can best even the highly cache-tuned Judy array[6].

    See here:

    https://github.com/glycerine/go-offheap-hashtable

    -Jason

    [1] https://github.com/golang/go/issues/9477
    [2] https://groups.google.com/d/msg/golang-nuts/kCQP6S6ZGh0/JEuY3A8xEygJ
    [3] https://groups.google.com/d/msg/golang-nuts/cmpiArv10f4/UQZ39bLaYpQJ
    [4] https://github.com/glycerine/go-offheap-hashtable
    [5] https://github.com/OneOfOne/xxhash
    [6] http://preshing.com/20130107/this-hash-table-is-
    faster-than-a-judy-array/

    --
    You received this message because you are subscribed to a topic in the
    Google Groups "golang-nuts" group.
    To unsubscribe from this topic, visit
    https://groups.google.com/d/topic/golang-nuts/o7ABs3i8lHM/unsubscribe.
    To unsubscribe from this group and all its topics, send an email to
    golang-nuts...@googlegroups.com <javascript:>.
    For more options, visit https://groups.google.com/d/optout.


    --

    Best regards,
    Jason

    --
    Jason E. Aten, Ph.D.
    j.e....@gmail.com <javascript:>
    650-429-8602
    linkedin: https://www.linkedin.com/pub/jason-e-aten-ph-d/18/313/45a
    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Jason E. Aten at Jan 5, 2015 at 9:11 pm

    On Mon, Jan 5, 2015 at 11:30 AM, Egon wrote:

    You also might want to make the public import as "
    github.com/glycerine/offheap".
    Thank you, Egon. I like this shorter form. I've renamed it just "offheap"
    and put it here:

    github.com/glycerine/offheap

      I'll take a look at your other suggestions too. Some of them are just
    stylistic, and I tend to prefer clarity over brevity, but others are worth
    doing. May take me a bit to get to it though.

    Best,
    -Jason

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Egon at Jan 5, 2015 at 9:45 pm

    On Monday, 5 January 2015 23:11:42 UTC+2, Jason E. Aten wrote:
    On Mon, Jan 5, 2015 at 11:30 AM, Egon <egon...@gmail.com <javascript:>>
    wrote:
    You also might want to make the public import as "
    github.com/glycerine/offheap".
    Thank you, Egon. I like this shorter form. I've renamed it just "offheap"
    and put it here:

    github.com/glycerine/offheap

    I'll take a look at your other suggestions too. Some of them are just
    stylistic, and I tend to prefer clarity over brevity,
    Take *usability* as the primary goal - not clarity nor brevity. Clarity and
    brevity are often the side-effects of usability.

    e.g.
    "i" is very concise, but still very usable
    for i := 0; i < 10; i += 1 {

    There is probably no value in making it longer:
    for iteratorindex := 0; iteratorindex < 10; iteratorindex += 1 {

    This is clearer in its meaning - but it is not more usable.

    In a similar situation some global
    var F map[string]string
    Is concise, but it's not really usable.

    My general rules for names is:
    1. does this name part provide any useful distinction from other things?
    2. can it be still clearly understood without it?

    If answers to 1 and 2 are "yes" I will remove that part - because it
    probably adds nothing useful to the name.

    + Egon

    but others are worth doing. May take me a bit to get to it though.
    Best,
    -Jason
    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-nuts @
categoriesgo
postedJan 5, '15 at 2:26a
activeJan 5, '15 at 9:45p
posts9
users4
websitegolang.org

People

Translate

site design / logo © 2022 Grokbase