I just wanted to follow up on this. My basic question is: should I expect
Serializers registered with "cascading.kryo.registrations" to work for
objects that are passed through the job-conf, as in local-date (an instance
of org.joda.time.LocalDate):

(defmapcatop [my-op [local-date]] ...)


thanks,
Marc
On Tue, Mar 20, 2012 at 5:56 PM, Marc Limotte wrote:

Hi Sam.

I'm trying this solution out, but for a slightly different scenario. I'm
using Joda LocalTime. Also in John's example, the data to be serialized is
in tuple data. I have a case where it's part of the custom operation
parameters. E.g.:

(defmapcatop [map-normalize [the-local-time-instance-is-passed-here]]
...)

I wrote a simple JodaLocalTimeSerializer along the lines of your examples,
and adjusted the job-conf as suggested. From a Kryo TRACE log, it looks like
Kryo is finding the serializer, but not using it at the right point.

As I understand it, the mapcatop parameters are serialized into the
jobconf. So I wonder if maybe the "cascading.kryo.registrations" have not
taken effect at that point?

I can provide the full kryo trace output if that helps.

Marc
On Tue, Mar 20, 2012 at 1:04 PM, Sam Ritchie wrote:

Jack, based on my test this is right. John, you're going to have to
write a custom Kryo serializer for LocalDate. I've got a bunch of examples
here:


https://github.com/Cascading/meat-locker/tree/master/src/jvm/com/twitter/meatlocker/kryo
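
For readers without those examples at hand, the core of such a serializer is a lossless round trip between the date and a handful of bytes. The sketch below shows only that round trip and is not a real Kryo serializer. It assumes the date is encoded as year, month and day in a java.nio.ByteBuffer. It uses java.time.LocalDate as a stand-in for org.joda.time.LocalDate so that it runs on a plain JDK (with Joda the accessors would be getYear, getMonthOfYear and getDayOfMonth). The class name LocalDateCodec and both method names are hypothetical. The real Kryo Serializer base class and its exact signatures are not reproduced here.

```java
import java.nio.ByteBuffer;
import java.time.LocalDate;

// Hypothetical helper: the read/write logic a LocalDate serializer would wrap.
final class LocalDateCodec {

    // Write the date as three fields: year (4 bytes), month (1 byte), day (1 byte).
    static void writeObjectData(ByteBuffer buffer, LocalDate date) {
        buffer.putInt(date.getYear());
        buffer.put((byte) date.getMonthValue());
        buffer.put((byte) date.getDayOfMonth());
    }

    // Read the fields back in the same order and rebuild the date.
    static LocalDate readObjectData(ByteBuffer buffer) {
        int year = buffer.getInt();
        int month = buffer.get();
        int day = buffer.get();
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        LocalDate original = LocalDate.of(2012, 3, 20);

        writeObjectData(buffer, original);
        buffer.flip(); // switch the buffer from writing to reading

        LocalDate copy = readObjectData(buffer);
        System.out.println("round trip ok: " + original.equals(copy));
    }
}
```

Writing the fields explicitly avoids relying on reflection over the date class's internal fields, which is a plausible source of the NullPointerException reported at the start of this thread, though the thread does not confirm that.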

Once you've got that done you can register the serializer with the
"cascading.kryo.registrations" JobConf option, as detailed in
Cascading.Kryo:

https://github.com/Cascading/cascading.kryo

The simplest way to do this is to create a "job-conf.clj" file under the
"resources" directory in your project. The file should contain the
following:

{"cascading.kryo.registrations"
"org.joda.time.LocalDate,your.kryo.LocalDateSerializer"}

This should get everything working again.
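
A typo in the registration string is easy to make and hard to spot, so it can help to check the value before submitting a job. The sketch below is a standalone checker, not part of Cascading.Kryo. It assumes each entry has the "class,serializer" shape shown above. It also assumes that several entries are separated by ":", which should be verified against the Cascading.Kryo README. The names KryoRegistrations, parse and your.kryo.LocalTimeSerializer are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sanity check for a "cascading.kryo.registrations" value.
final class KryoRegistrations {

    // Split the value into class -> serializer pairs.
    // ASSUMPTION: entries are separated by ':' and each entry is "class,serializer".
    static Map<String, String> parse(String value) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String entry : value.split(":")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a stray separator
            }
            String[] parts = trimmed.split(",");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                        "Expected class,serializer but got: " + trimmed);
            }
            pairs.put(parts[0].trim(), parts[1].trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        String value = "org.joda.time.LocalDate,your.kryo.LocalDateSerializer"
                + ":org.joda.time.LocalTime,your.kryo.LocalTimeSerializer";
        parse(value).forEach((cls, ser) -> System.out.println(cls + " -> " + ser));
    }
}
```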

Cheers,
--
Sam Ritchie

On Monday, March 19, 2012 at 7:46 PM, Thomas Jack wrote:

Can you share more of the stacktrace?

My wild guess is that the default kryo serialization doesn't work for
LocalDate. You may need to register a different serializer or create a new
one.
On Mar 19, 2012 8:52 PM, "John Bates" wrote:

Hi, all.

A few of our engineers have been trying to debug this issue to no
avail:

Our dataset includes dates stored as JODA LocalDate. Even the
simplest queries fail, e.g.

(??<- [?a] ([[(LocalDate.)]] ?a))

The stack trace reports a NullPointerException in
org.joda.time.LocalDate.getValue() as being the culprit, but
inspection of the joda-time-1.6.2 source showed no obvious flaws.

We're running cascalog version 1.8.6 and joda-time 1.6.2.

Any insight you might be able to offer would be most appreciated.

Thanks in advance!
-John Bates


  • Andrew Xue at Mar 22, 2012 at 6:03 pm
    i do this and it works
  • Sam Ritchie at Mar 23, 2012 at 2:37 am
    Hey Marc,

    That's a great question. This is actually a bug with the current
    implementation -- it looks like op parameters aren't picking up the custom
    serializations. I've opened a bug here to track this --

    https://github.com/nathanmarz/cascalog/issues/59

    I'll try to get this out as 1.8.7 soon. If you want to take a stab at it,
    the issue has some details and I'm happy to supply more info!

    Cheers,
    Sam
  • Andrew Xue at Mar 23, 2012 at 4:26 am
    hmm, it seems to work for me but i guess ... here is what i do

    (defn- incant* [func databag]
      (let [list (.getData databag)]
        (func list)))

    (defmapop [incant [func]] [databag]
      (incant* func databag))


    where databag is this custom wrapper i put around ArrayList (just as a
    way to pass grouping vals without cascalog trying to destructure it)

    import java.util.ArrayList;
    import java.util.Collection;

    public class DataBag {
        private ArrayList data;

        public DataBag(Collection c) { data = new ArrayList(c); }
        public Collection getData() { return data; }

    ... etc etc

    and i have a custom serializer for this too




  • Andrew Xue at Mar 23, 2012 at 4:26 am
    BUT, i do see this issue

    java.lang.UnsupportedOperationException
        at clojure.lang.ASeq.add(ASeq.java:136)
        at com.esotericsoftware.kryo.serialize.CollectionSerializer.readObjectData(CollectionSerializer.java:113)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:566)


    Is CollectionSerializer having trouble deserializing a Clojure ASeq?
  • Marc Limotte at Mar 23, 2012 at 1:51 pm
    Hi Sam.

    I can do that. Where would 1.8.7-SNAPSHOT be... should I work off the
    'develop' branch?

    I glanced through the code just now. I think the salient part is
    in TupleMemoryInputFormat, where you look up the right Serializer from the
    factory:

    Serializer<Tuple> serializer = factory.getSerializer(Tuple.class);

    So, in KryoService, I would want to do that in the serialize/deserialize
    methods? If so, what I don't get is why this would be any different for op
    parameters vs. tuple data?

    Marc
  • Andrew Xue at Mar 23, 2012 at 8:00 pm
    I think I figured out what happened in my ASeq issue

    I had code that looks similar to this:

    (defn test-map []
      (let [src [[11 1] [11 2] [11 3] [11 4]
                 [12 5] [12 6] [12 7] [12 8] [12 9]]
            map-src [[11 "a"] [12 "a"]]
            map-tuples (??<- [?id ?alpha] (map-src ?id ?alpha))
            my-map (into {} (map #(vector (first %) (rest %)) map-tuples))
            my-map-val (first (vals my-map))]
        (prn (class my-map-val) my-map-val)
        (?<- (stdout)
             [?id ?alpha]
             (src ?id _)
             (get my-map ?id [nil] :> ?alpha))))

    The interesting part is that the vals of my-map are seqs
    (specifically a clojure.lang.PersistentVector$ChunkedSeq) ... and I
    think this is causing the serialization issue.

    It can be fixed by forcing the value into a vector, i.e. changing

    my-map (into {} (map #(vector (first %) (rest %)) map-tuples))

    to

    my-map (into {} (map #(vector (first %) (vec (rest %))) map-tuples))


    I'm trying to confirm this by rolling back to 1.8.5 to test, but I'm
    having some trouble doing that ... I get an "unable to load
    cascalog.op.KryoInsert" error when I try.
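
The stack trace earlier in the thread shows CollectionSerializer.readObjectData calling add on a clojure.lang.ASeq, which throws UnsupportedOperationException. One plausible reading is that a generic collection deserializer rebuilds the collection by creating an instance of the original type and adding elements to it, which cannot work for an immutable type. The sketch below is a JDK-only analogy of that failure mode and is not Kryo's actual code. The names RebuildByAddDemo and rebuild are hypothetical, and an immutable JDK list stands in for the Clojure seq.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical analogy: rebuilding a collection element by element via add().
final class RebuildByAddDemo {

    // Create an empty instance of the target type, then add() each element,
    // the way a generic collection deserializer might.
    static <T> Collection<T> rebuild(Supplier<Collection<T>> emptyInstance, List<T> elements) {
        Collection<T> target = emptyInstance.get();
        for (T element : elements) {
            target.add(element); // throws for immutable collection types
        }
        return target;
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("a", "b");

        // A mutable target type can be rebuilt without trouble.
        System.out.println("mutable target: " + rebuild(ArrayList::new, elements));

        // An immutable target type rejects add(), as ASeq does in the trace.
        try {
            rebuild(Collections::emptyList, elements);
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable target rejected add()");
        }
    }
}
```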
  • Marc Limotte at Mar 26, 2012 at 3:37 pm
    Hi Sam.

    I found the answer to my last question: KryoService is called explicitly to
    serialize the fn_spec, whereas the tuple data serialization is handled
    implicitly by the Cascading and Hadoop codebase.
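
The difference between the two paths can be pictured with a toy model. This is only an illustration of how an explicit serialization call could miss registrations that the framework-driven path sees, and it is not Cascalog's or Cascading's actual code. Registry, TwoPathsDemo and every method name are hypothetical, and java.time.LocalDate stands in for the Joda types.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy model of two serialization paths sharing (or not sharing) config.
final class TwoPathsDemo {

    // Stand-in for a serializer registry built from the job configuration.
    static final class Registry {
        private final Map<Class<?>, Function<Object, String>> custom = new HashMap<>();

        void register(Class<?> type, Function<Object, String> serializer) {
            custom.put(type, serializer);
        }

        // Use the registered serializer if there is one, else a default encoding.
        String serialize(Object value) {
            Function<Object, String> s = custom.get(value.getClass());
            return s != null ? s.apply(value) : "default:" + value;
        }
    }

    // Tuple data: the framework serializes with the configured registry.
    static String serializeTupleValue(Registry configured, Object value) {
        return configured.serialize(value);
    }

    // Op parameters, buggy variant: the explicit call builds a fresh registry
    // and never consults the configured one.
    static String serializeOpParamBuggy(Registry configured, Object value) {
        return new Registry().serialize(value);
    }

    // Op parameters, fixed variant: the explicit call reuses the configured registry.
    static String serializeOpParamFixed(Registry configured, Object value) {
        return configured.serialize(value);
    }

    public static void main(String[] args) {
        Registry configured = new Registry();
        configured.register(LocalDate.class, v -> "custom:" + v);

        LocalDate date = LocalDate.of(2012, 3, 20);
        System.out.println("tuple value:      " + serializeTupleValue(configured, date));
        System.out.println("op param (buggy): " + serializeOpParamBuggy(configured, date));
        System.out.println("op param (fixed): " + serializeOpParamFixed(configured, date));
    }
}
```

In this model the fix is simply to hand the configured registry to the explicit path, which is in the same spirit as threading the JobConf through to the code that serializes op parameters.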

    Anyway, working off the develop branch, I've posted a pull request
    https://github.com/nathanmarz/cascalog/pull/63

    This seems to work... unit tests pass. There's a bit of ugliness around
    handling the JobConf (see comments in the request). Let me know if you
    have ideas to improve it.

    Marc
  • Sam Ritchie at Apr 2, 2012 at 4:48 pm
    Hey Guys,

    I'm working with Marc to get this commit merged within the next day or
    two. We should be able to cut a release shortly with some of these issues
    resolved. Thanks!

    Sam
    On Mon, Mar 26, 2012 at 11:36 AM, Marc Limotte wrote:

    Hi Sam.

    I found the answer to my last question: KryoService is called explicitly
    to serialize the fn_spec, whereas the tuple data serialization is handled
    implicitly by the Cascading and Hadoop codebase.
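
A minimal sketch of that distinction, under heavy assumptions: one code path consults the registered serializers and the other always falls back to a default. That is the kind of mismatch that would explain the bug reported in this thread, where op parameters do not pick up custom serializations. None of these names come from Cascalog or Cascading; SerializerRegistry and its methods are hypothetical stand-ins, and "serialization" is reduced to producing a string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in; does not mirror the real KryoService or Cascading classes.
public class SerializerRegistry {
    private final Map<Class<?>, Function<Object, String>> custom = new HashMap<>();

    public void register(Class<?> type, Function<Object, String> serializer) {
        custom.put(type, serializer);
    }

    // Path that honours registrations (what tuple data gets in this sketch).
    public String serializeWithRegistrations(Object value) {
        Function<Object, String> s = custom.get(value.getClass());
        return s != null ? s.apply(value) : defaultSerialize(value);
    }

    // Path that never looks at the registry (the suspected op-parameter behaviour).
    public String serializeIgnoringRegistrations(Object value) {
        return defaultSerialize(value);
    }

    private static String defaultSerialize(Object value) {
        return "default:" + value;
    }

    public static void main(String[] args) {
        SerializerRegistry registry = new SerializerRegistry();
        registry.register(java.time.LocalDate.class, v -> "custom:" + v);
        Object date = java.time.LocalDate.of(2012, 3, 26);
        System.out.println(registry.serializeWithRegistrations(date));     // custom:2012-03-26
        System.out.println(registry.serializeIgnoringRegistrations(date)); // default:2012-03-26
    }
}
```

In this sketch the fix is simply to route both paths through the same registry lookup.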

    Anyway, working off the develop branch, I've posted a pull request
    https://github.com/nathanmarz/cascalog/pull/63

    This seems to work... unit tests pass. There's a bit of ugliness around
    handling the JobConf (see comments in the request). Let me know if you
    have ideas to improve it.

    Marc
    On Fri, Mar 23, 2012 at 9:51 AM, Marc Limotte wrote:

    Hi Sam.

    I can do that. Where would 1.8.7-SNAPSHOT be... should I work off the
    'develop' branch?

    I glanced through the code just now. I think the salient part is
    in TupleMemoryInputFormat, where you look up the right Serializer from the
    factory:

    Serializer<Tuple> serializer = factory.getSerializer(Tuple.class);

    So, in KryoService, I would want to do that in the serialize/deserialize
    methods? If so, what I don't get is why this would be any different for op
    parameters vs. tuple data?

    Marc

    On Thu, Mar 22, 2012 at 10:37 PM, Sam Ritchie wrote:

    Hey Marc,

    That's a great question. This is actually a bug with the current
    implementation -- it looks like op parameters aren't picking up the custom
    serializations. I've opened a bug here to track this --

    https://github.com/nathanmarz/cascalog/issues/59

    I'll try to get this out as 1.8.7 soon. If you want to take a stab at
    it, the issue has some details and I'm happy to supply more info!

    Cheers,
    Sam

    --
    Sam Ritchie, Twitter Inc
    703.662.1337
    @sritchie09

    (Too brief? Here's why! http://emailcharter.org)


Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Mar 22, '12 at 3:09p
active: Apr 2, '12 at 4:48p
posts: 9
users: 3
website: clojure.org
irc: #clojure
