FAQ
I'm working on a machine learning problem at the moment, translating
some MATLAB to Go. The final stage of handling the problem is to reduce
the solution set size so that only unique memberships are included in
the final result. In MATLAB you can use the unique function, but because
membership sets are []bool in my implementation (comments on this
welcome) I cannot use my normal approach to getting sets (using maps)...
unless I abuse unsafe, which is what I have done [1].

The approach is to define a func:

func unsafeKey(b []bool) string {
  return *(*string)(unsafe.Pointer(&b))
}

and key on the returned string. The map is only temporarily kept, but
can anyone see any significant problems with this approach (other than
limitation of platform the code can run on).

As expected, on small test data sets it works fine and the data are not
concurrently accessed. Performance is critical, so fmt.Sprint(b) is not
an option.

thanks
Dan

[1]http://play.golang.org/p/bedLbSQEjj

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Search Discussions

  • Kevin Malachowski at Mar 4, 2014 at 12:03 am
    Do you know some sort of maximum size for your bool slice? Arrays are hashable but are fixed length.

    Also I'm not sure if memory concerns are the type of concerns you're worried about, but note that every bool entry takes up a whole byte. Switching to a byte array/slice and wrapping it in a type which does bitwise operations via methods would save you 8 times the space and will likely be faster to hash.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Dan Kortschak at Mar 4, 2014 at 12:06 am

    On Mon, 2014-03-03 at 16:03 -0800, Kevin Malachowski wrote:
    Do you know some sort of maximum size for your bool slice? Arrays are
    hashable but are fixed length.
    Not knowable a priori.
    Also I'm not sure if memory concerns are the type of concerns you're
    worried about, but note that every bool entry takes up a whole byte.
    Switching to a byte array/slice and wrapping it in a type which does
    bitwise operations via methods would save you 8 times the space and
    will likely be faster to hash.
    Yeah, I thought about that, but there's more work in the manipulation of
    memberships than in the final culling of solutions, so the minor hit in
    hashing time probably not worth the extra complexity elsewhere. I'll
    keep it in mind though when I scale up.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Kevin Malachowski at Mar 4, 2014 at 12:05 am
    Do you know some sort of maximum size for your bool slice? Arrays are hashable but are fixed length.

    Also I'm not sure if memory concerns are the type of concerns you're worried about, but note that every bool entry takes up a whole byte. Switching to a byte array/slice and wrapping it in a type which does bitwise operations via methods would save you 8 times the space and will likely be faster to hash.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Kevin Malachowski at Mar 4, 2014 at 12:05 am
    Do you know some sort of maximum size for your bool slice? Arrays are hashable but are fixed length.

    Also I'm not sure if memory concerns are the type of concerns you're worried about, but note that every bool entry takes up a whole byte. Switching to a byte array/slice and wrapping it in a type which does bitwise operations via methods would save you 8 times the space and will likely be faster to hash.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Tamás Gulácsi at Mar 4, 2014 at 5:27 am
    math/big.Int ? It has bit set/get functions and a String(), too.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Dan Kortschak at Mar 4, 2014 at 10:24 pm
    I thought about that, but I was trying to avoid the string allocation
    and the internal slice growing, but if I preallocate using SetBits it
    then only has the string allocation. Might be worth trying.

    thanks
    On Mon, 2014-03-03 at 21:27 -0800, Tamás Gulácsi wrote:
    math/big.Int ? It has bit set/get functions and a String(), too.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Egon at Mar 4, 2014 at 5:30 am
    What are the exact constraints on the set? Do you need referencing the set
    members? How is used only add/remove or both interleaved in a short amount
    of time? Can you add them items into the set in order? (If you can keep
    them ordered you can get nice speed benefits).

    I don't quite understand what the "unique" func should do (I didn't try too
    hard either :)). Are you trying to compute the intersection/union of the
    sets or remove the duplicate solutions?

    Also, I have different set implementations here:
    https://github.com/egonelbre/spexs/tree/dev/src/set

    + egon
    On Tuesday, March 4, 2014 1:44:43 AM UTC+2, kortschak wrote:

    I'm working on a machine learning problem at the moment, translating
    some MATLAB to Go. The final stage of handling the problem is to reduce
    the solution set size so that only unique memberships are included in
    the final result. In MATLAB you can use the unique function, but because
    membership sets are []bool in my implementation (comments on this
    welcome) I cannot use my normal approach to getting sets (using maps)...
    unless I abuse unsafe, which is what I have done [1].

    The approach is to define a func:

    func unsafeKey(b []bool) string {
    return *(*string)(unsafe.Pointer(&b))
    }

    and key on the returned string. The map is only temporarily kept, but
    can anyone see any significant problems with this approach (other than
    limitation of platform the code can run on).

    As expected, on small test data sets it works fine and the data are not
    concurrently accessed. Performance is critical, so fmt.Sprint(b) is not
    an option.

    thanks
    Dan

    [1]http://play.golang.org/p/bedLbSQEjj
    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Dan Kortschak at Mar 4, 2014 at 10:33 pm

    On Mon, 2014-03-03 at 21:30 -0800, egon wrote:
    What are the exact constraints on the set?
    Unique membership.
    Do you need referencing the set members? Yes.
    How is used only add/remove or both interleaved in a short amount
    of time?
    Add/ClearAll.
    Can you add them items into the set in order? (If you can keep
    them ordered you can get nice speed benefits).
    No.

    I don't quite understand what the "unique" func should do (I didn't try too
    hard either :)). Are you trying to compute the intersection/union of the
    sets or remove the duplicate solutions?
    Basically the algorithm is doing feature detection and allocation input
    components to discovered features. Each feature has an associated set of
    input components (sets are not disjoint) that have been used to discover
    the feature. At the end of the process there are more features than
    necessary, so similar features that have the same input component set
    are merged.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Egon at Mar 5, 2014 at 5:28 am

    On Wednesday, March 5, 2014 12:33:08 AM UTC+2, kortschak wrote:
    On Mon, 2014-03-03 at 21:30 -0800, egon wrote:
    What are the exact constraints on the set?
    Unique membership.
    Yeah, the word "constraint" wasn't probably the one I should've used... I
    was thinking more in the lines of properties... sparsity etc.

    Do you need referencing the set members? Yes.
    How is used only add/remove or both interleaved in a short amount
    of time?
    Add/ClearAll.
    Can you add them items into the set in order? (If you can keep
    them ordered you can get nice speed benefits).
    No.

    I don't quite understand what the "unique" func should do (I didn't try too
    hard either :)). Are you trying to compute the intersection/union of the
    sets or remove the duplicate solutions?
    Basically the algorithm is doing feature detection and allocation input
    components to discovered features. Each feature has an associated set of
    input components (sets are not disjoint) that have been used to discover
    the feature. At the end of the process there are more features than
    necessary, so similar features that have the same input component set
    are merged.
    Here's my first idea... store your bits as []uint32 and then do a divide
    and conquer for finding the unique items
    http://play.golang.org/p/Z7etvORaGP (untested code)

    The second idea I had was to do it in two passes... if you have your bits
    in []uint32 and then based on the hash allocate into buckets... then make
    the buckets into unique elements with some trivial approach.

    Oh... just had another idea... essentially what you are trying to do is
    sort the bitsets by some ordering... if you have that order you can remove
    the duplicates quite easily. In other words quick sort the array of bitsets
    and you should be able to remove the duplicates quite easily, without using
    maps. I.e. first sort by bits[0], then sort the subsets by bits[1], then by
    bits[2]... etc.

    I'm not sure about the performance, but all of them should be safer...

    + egon

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Nick Craig-Wood at Mar 4, 2014 at 5:16 pm

    On 03/03/14 23:44, Dan Kortschak wrote:
    I'm working on a machine learning problem at the moment, translating
    some MATLAB to Go. The final stage of handling the problem is to reduce
    the solution set size so that only unique memberships are included in
    the final result. In MATLAB you can use the unique function, but because
    membership sets are []bool in my implementation (comments on this
    welcome) I cannot use my normal approach to getting sets (using maps)...
    unless I abuse unsafe, which is what I have done [1].

    The approach is to define a func:

    func unsafeKey(b []bool) string {
    return *(*string)(unsafe.Pointer(&b))
    }
    That is horrible!

    Essentially you are casting

    struct Slice
    { // must not move anything
      byte* array; // actual data
      uintgo len; // number of elements
      uintgo cap; // allocated number of elements
    };

    To

    struct String
    {
      byte* str;
      intgo len;
    };

    Which works but the fact that it works is surely an implementation detail.

    If you wanted to be a little safer you could define a go (not cgo) c
    file like this

    #include "runtime.h"

    void ·unsafeKey2(Slice b, String s) {
         s.str = b.array;
         s.len = b.len;
         FLUSH(&s);
    }

    Along with a forward definition

    // Defined in C
    func unsafeKey(b []bool) string

    See Dave Cheneys great blog post for more info
    http://dave.cheney.net/2013/09/07/how-to-include-c-code-in-your-go-package

    --
    Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Dan Kortschak at Mar 4, 2014 at 10:27 pm

    On Tue, 2014-03-04 at 17:16 +0000, Nick Craig-Wood wrote:
    The approach is to define a func:

    func unsafeKey(b []bool) string {
    return *(*string)(unsafe.Pointer(&b))
    }
    That is horrible!
    Thanks! It's a talent I guess. ;)
    Essentially you are casting

    struct Slice
    { // must not move anything
    byte* array; // actual data
    uintgo len; // number of elements
    uintgo cap; // allocated number of elements
    };

    To

    struct String
    {
    byte* str;
    intgo len;
    };

    Which works but the fact that it works is surely an implementation
    detail.
    It's implementation details the whole way down - I'm also assuming that
    true and false are both uniquely defined in the binary representation,
    which is currently true and likely to stay so, but not guaranteed.
    If you wanted to be a little safer you could define a go (not cgo) c
    file like this

    #include "runtime.h"

    void ·unsafeKey2(Slice b, String s) {
    s.str = b.array;
    s.len = b.len;
    FLUSH(&s);
    }

    Along with a forward definition

    // Defined in C
    func unsafeKey(b []bool) string

    See Dave Cheneys great blog post for more info
    http://dave.cheney.net/2013/09/07/how-to-include-c-code-in-your-go-package
    Thanks, I'll have a look at that.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Dan Kortschak at Mar 4, 2014 at 11:07 pm
    What does the 2 do in the name - Dave's article doesn't add that, and if
    I model the code after his examples (with the names matching exactly) I
    get a duplicate symbol error. In fact it doesn't seem the name makes any
    difference at all so long as there is no collision - so how does the
    compiler associate the correct implementation, particularly if say there
    are two funcs with same parameter lists?

    thanks
    On Tue, 2014-03-04 at 17:16 +0000, Nick Craig-Wood wrote:
    #include "runtime.h"

    void ·unsafeKey2(Slice b, String s) {
    s.str = b.array;
    s.len = b.len;
    FLUSH(&s);
    }

    Along with a forward definition

    // Defined in C
    func unsafeKey(b []bool) string

    See Dave Cheneys great blog post for more info
    http://dave.cheney.net/2013/09/07/how-to-include-c-code-in-your-go-package

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Dan Kortschak at Mar 4, 2014 at 11:14 pm
    Ignore that - brain fart.
    On Wed, 2014-03-05 at 09:37 +1030, Dan Kortschak wrote:
    What does the 2 do in the name - Dave's article doesn't add that, and
    if
    I model the code after his examples (with the names matching exactly)
    I
    get a duplicate symbol error. In fact it doesn't seem the name makes
    any
    difference at all so long as there is no collision - so how does the
    compiler associate the correct implementation, particularly if say
    there
    are two funcs with same parameter lists?

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Nick Craig-Wood at Mar 5, 2014 at 4:18 pm

    On 04/03/14 23:07, Dan Kortschak wrote:
    On Tue, 2014-03-04 at 17:16 +0000, Nick Craig-Wood wrote:
    #include "runtime.h"

    void ·unsafeKey2(Slice b, String s) {
    s.str = b.array;
    s.len = b.len;
    FLUSH(&s);
    }

    Along with a forward definition

    // Defined in C
    func unsafeKey(b []bool) string

    See Dave Cheneys great blog post for more info
    http://dave.cheney.net/2013/09/07/how-to-include-c-code-in-your-go-package
    What does the 2 do in the name - Dave's article doesn't add that, and if
    I model the code after his examples (with the names matching exactly) I
    get a duplicate symbol error. In fact it doesn't seem the name makes any
    difference at all so long as there is no collision - so how does the
    compiler associate the correct implementation, particularly if say there
    are two funcs with same parameter lists?
    Sorry I messed up with my copy and paste - remove the 2!

    --
    Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Minux at Mar 5, 2014 at 4:49 am

    On Mon, Mar 3, 2014 at 6:44 PM, Dan Kortschak wrote:
    The approach is to define a func:

    func unsafeKey(b []bool) string {
    return *(*string)(unsafe.Pointer(&b))
    }
    Yes, this is very hacky, but did you know that Brad once mentioned (in a
    thread about
    readonly view of byte slices, IIRC), he used something like this to use
    []byte as string
    for map keys, so it's not without precedence.

    well, not quite the same, because your function assumes more than the
    (one-way) compatibility
    of []bool and string: e.g. each bool is stored canonically in 1 byte (not
    e.g. non-zero for true and
    0 for false). (Note: the spec doesn't guarantee that each bool is store in
    1 byte, it just says
    each bool takes at least 1 byte, but the compiler is free to make it as
    large as it pleases).

    If you want to be safer, use the explicit StringHeader approach and switch
    to []byte instead
    of []bool (depending on your operations, it might actually be faster, be
    sure to write your
    operation without using jmp, normally it's possible)

    This Go version:
    func unsafeKey(b []byte) string {
             var a reflect.StringHeader
             a.Data = uintptr(unsafe.Pointer(&b[0]))
             a.Len = len(b)
             return *(*string)(unsafe.Pointer(&a))
    }

    is still inlineable and but has b escaped. The way to solve this is to
    rewrite it in Plan 9 C,
    and label it //go:noescape. having escaped parameter is somewhat worse than
    non-inlinable.
    but of course, this still depends on the caller: if b is already on heap
    (which is almost always
    the case), then I'd prefer the Go version.

    Yes, it's complex trade-off here. You should definitely measure.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-nuts @
categoriesgo
postedMar 3, '14 at 11:44p
activeMar 5, '14 at 4:18p
posts16
users6
websitegolang.org

People

Translate

site design / logo © 2022 Grokbase