FAQ
I'm trying to use a PailTap from dfs-datastores-cascading in a Cascalog
query and getting a ClassCastException. What am I doing wrong?

I'm creating the tap with a simple (PailTap. path)

sourcing from: SplittableMemorySourceTap["SplittableMemorySourceScheme[[UNKNOWN]->[ALL]]"]["/948307a7-82e5-474a-b7db-5d7084943d13"]"]
sinking to: PailTap["PailScheme[['pail_root', 'bytes']->['?practiceid', '?invoiceid', '?clientid', '?patientid', '?txn_date', '?quantity', '?price', '?itemtype', '?species', '?lineitem', '?sublineitem', '?transactionnumber']]"]["/VisitData/incoming/"]"]
trapping to: Hfs["TextLine[['line']->[ALL]]"]["/PracticeVisitData/errors"]"]
Creating processor/data/VisitData/7698
caught Throwable, no trap available, rethrowing
cascading.tuple.TupleException: unable to sink into output identifier: /VisitData/incoming/
at cascading.tuple.TupleEntrySchemeCollector.collect(TupleEntrySchemeCollector.java:160)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:71)
at cascading.tuple.TupleEntrySchemeCollector.add(TupleEntrySchemeCollector.java:134)
at cascading.flow.stream.SinkStage.receive(SinkStage.java:90)
at cascading.flow.stream.SinkStage.receive(SinkStage.java:37)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
at cascading.operation.Identity.operate(Identity.java:110)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:60)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
at cascalog.ClojureMapcat.operate(Unknown Source)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
at cascading.operation.Identity.operate(Identity.java:110)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:60)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
at cascading.operation.Identity.operate(Identity.java:110)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:124)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to [B
at backtype.hadoop.pail.BinaryPailStructure.serialize(BinaryPailStructure.java:3)
at backtype.cascading.tap.PailTap$PailScheme.serialize(PailTap.java:89)
at backtype.cascading.tap.PailTap$PailScheme.sink(PailTap.java:164)
at cascading.tuple.TupleEntrySchemeCollector.collect(TupleEntrySchemeCollector.java:153)
... 43 more


  • Soren Macbeth at Oct 3, 2012 at 8:11 pm
    Looks like you're trying to pass a String into a BinaryPailStructure,
    which only accepts a byte array. You probably want to do something
    like (.getBytes "mystring") before writing it into the pail.
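
    Applied inside a Cascalog query, that conversion might look roughly like
    the sketch below. This is hedged: `to-bytes` is a hypothetical helper,
    `source-gen` and `hfs-trap` come from the original poster's code, and it
    assumes the default PailTap sinks a single byte array per record.

```clojure
;; Sketch only: assumes the Cascalog 1.x-era API (defmapop, ?<-) and that
;; the default PailTap/BinaryPailStructure expects one byte[] field.
(use 'cascalog.api)
(import 'backtype.cascading.tap.PailTap)

(defmapop to-bytes
  "Hypothetical helper: serialize a field value to a UTF-8 byte array."
  [v]
  (.getBytes (str v) "UTF-8"))

(let [sink-out (PailTap. "/tmp/pail")]
  (?<- sink-out [?record-bytes]
       (source-gen ?field1 ?field2)
       (str ?field1 "\t" ?field2 :> ?record)   ; naive tab-joined record
       (to-bytes ?record :> ?record-bytes)
       (:trap (hfs-trap error-path))))
```

    The key point is just that every value reaching the PailTap sink field
    is already a byte[], so BinaryPailStructure.serialize never sees a String.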
    --
    http://about.me/soren
  • David Kincaid at Oct 3, 2012 at 8:26 pm
    I'm just using it as a sink in a Cascalog query so I'm not writing it
    directly to the pail. Something like this:

    (let [sink-out (PailTap. "/tmp/pail")]
      (?<- sink-out [?field1 ?field2]
           (source-gen ?field1 ?field2)
           (:trap (hfs-trap error-path))))

  • Soren Macbeth at Oct 3, 2012 at 8:30 pm
    Are ?field1 and ?field2 strings or byte arrays?
  • David Kincaid at Oct 3, 2012 at 8:35 pm
    They are strings in most cases. Sometimes doubles or integers. But when I
    use any other taps I don't have to do any type conversion. Does PailTap
    require that I convert all of the fields to byte arrays in my query?
  • Soren Macbeth at Oct 3, 2012 at 8:52 pm
    I think by default it uses a BinaryPailStructure that requires byte arrays. You can/should implement your own PailStructure class to sink into.
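
    A custom structure might look roughly like this sketch. Hedged: the
    method names and signatures below are one reading of the
    backtype.hadoop.pail.PailStructure interface in dfs-datastores at the
    time; verify them against the library source before relying on this.

```clojure
;; Sketch only: a PailStructure that stores plain strings as UTF-8 bytes.
;; Signatures are my reading of backtype.hadoop.pail.PailStructure --
;; check the dfs-datastores source for the exact contract.
(deftype StringPailStructure []
  java.io.Serializable                    ; Hadoop ships structures around
  backtype.hadoop.pail.PailStructure
  (getType [_] String)
  (serialize [_ obj] (.getBytes ^String obj "UTF-8"))
  (deserialize [_ bs] (String. ^bytes bs "UTF-8"))
  (getTarget [_ obj] [])                  ; no vertical partitioning
  (isValidTarget [_ dirs] true))
```

    Wiring the structure into a PailTap goes through a PailSpec and
    PailTapOptions; the exact constructor varies by version, so consult the
    dfs-datastores-cascading source.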

    --
    http://about.me/soren

    On Wednesday, October 3, 2012 at 1:35 PM, David Kincaid wrote:

    They are strings in most cases. Sometimes doubles or integers. But when I use any other taps I don't have to do any type conversion. Does PailTap require that I convert all of the fields to byte arrays in my query?
    On Wednesday, October 3, 2012 3:30:18 PM UTC-5, Soren Macbeth wrote:
    Are ?field1 and ?field2 strings or byte arrays?
    On Wed, Oct 3, 2012 at 1:26 PM, David Kincaid wrote:
    I'm just using it as a sink in a Cascalog query so I'm not writing it
    directly to the pail. Something like this:

    (let [sink-out (PailTap. "/tmp/pail")]
    (?<- sink-out [?field1 ?field2]
    (source-gen ?field1 ?field2)
    (:trap (hfs-trap error-path)))

    On Wednesday, October 3, 2012 3:11:03 PM UTC-5, Soren Macbeth wrote:

    Looks like you're trying to pass a String into a BinaryPailStructure
    that only accepts a byte arrary. you probably want to do something
    like (.getBytes "mystring") before writing it into the pail.

    On Wed, Oct 3, 2012 at 12:46 PM, David Kincaid <kincai...@gmail.com>
    wrote:
    I'm trying to use a PailTap from dfs-datastores-cascading in a Cascalog
    query and getting a ClassCastException. What am I doing wrong?

    I'm creating the tap with a simple (PailTap. path)

    sourcing from:

    SplittableMemorySourceTap["SplittableMemorySourceScheme[[UNKNOWN]->[ALL]]"]["/948307a7-82e5-474a-b7db-5d7084943d13"]"]
    sinking to: PailTap["PailScheme[['pail_root', 'bytes']->['?practiceid',
    '?invoiceid', '?clientid', '?patientid', '?txn_date', '?quantity',
    '?price',
    '?itemtype', '?species', '?lineitem', '?sublineitem',
    '?transactionnumber']]"]["/VisitData/incoming/"]"]
    trapping to:
    Hfs["TextLine[['line']->[ALL]]"]["/PracticeVisitData/errors"]"]
    Creating processor/data/VisitData/7698
    caught Throwable, no trap available, rethrowing
    cascading.tuple.TupleException: unable to sink into output identifier:
    /VisitData/incoming/
    at

    cascading.tuple.TupleEntrySchemeCollector.collect(TupleEntrySchemeCollector.java:160)
    at

    cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
    at
    cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:71)
    at

    cascading.tuple.TupleEntrySchemeCollector.add(TupleEntrySchemeCollector.java:134)
    at cascading.flow.stream.SinkStage.receive(SinkStage.java:90)
    at cascading.flow.stream.SinkStage.receive(SinkStage.java:37)
    at

    cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
    at

    cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
    at
    cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
    at cascading.operation.Identity.operate(Identity.java:110)
    at

    cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
    at

    cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
    at
    cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:60)
    at
    cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33)
    at

    cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
    at

    at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
    at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
    at cascalog.ClojureMapcat.operate(Unknown Source)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
    at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
    at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
    at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
    at cascading.operation.Identity.operate(Identity.java:110)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
    at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:60)
    at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33)
    at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
    at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
    at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
    at cascading.operation.Identity.operate(Identity.java:110)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
    at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
    at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
    at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:124)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
    Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to [B
    at backtype.hadoop.pail.BinaryPailStructure.serialize(BinaryPailStructure.java:3)
    at backtype.cascading.tap.PailTap$PailScheme.serialize(PailTap.java:89)
    at backtype.cascading.tap.PailTap$PailScheme.sink(PailTap.java:164)
    at cascading.tuple.TupleEntrySchemeCollector.collect(TupleEntrySchemeCollector.java:153)
    ... 43 more
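    The `[B` in the `Caused by` line is the JVM's runtime name for the
    `byte[]` class: the sink tried to cast a String tuple field to a byte
    array. A minimal, hypothetical illustration (not code from this thread)
    that reproduces the same cast failure in isolation:

    ```java
    public class ArrayNameDemo {
        public static void main(String[] args) {
            // "[B" is the JVM's internal name for byte[], which is why the
            // ClassCastException message above ends in "[B".
            System.out.println(byte[].class.getName());  // prints "[B"

            Object field = "mystring";  // a tuple field that is a String
            try {
                byte[] bytes = (byte[]) field;  // the cast BinaryPailStructure attempts
            } catch (ClassCastException e) {
                System.out.println("caught ClassCastException");
            }
        }
    }
    ```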


    --
    http://about.me/soren
  • David Kincaid at Oct 3, 2012 at 9:05 pm
    I thought Cascalog was more flexible than that and that taps could be
    swapped out more easily. I've created a PailStructure before in Java and
    written to it directly, so that won't be a problem, but I don't see how to
    get that into a PailTap. I think I've been down this road before and got
    lost in PailStructures, PailTapOptions, and PailSpecs.

    After this last week of living in the guts of the source code for Hadoop,
    Cascading, and various extensions and tools, I just want to get something
    accomplished. I'm beginning to see why people are telling me to stay away
    from Hadoop. There is surprisingly little structured documentation.

    Anyway, thanks for the help and advice. Looks like I'll be reading through
    more of other people's source code tonight.
    On Wednesday, October 3, 2012 3:52:53 PM UTC-5, Soren Macbeth wrote:

    I think by default it uses a BinaryPailStructure that requires byte
    arrays. You can/should implement your own PailStructure class to sink
    into.
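    A rough sketch of what Soren suggests. Hedged: the `PailStructure`
    interface below is a local stand-in reproduced from memory of
    dfs-datastores so the sketch compiles on its own; it may not match your
    version exactly, so check `backtype.hadoop.pail.PailStructure` in the jar
    you actually depend on. A string-backed structure might look like:

    ```java
    import java.io.Serializable;
    import java.nio.charset.StandardCharsets;
    import java.util.Collections;
    import java.util.List;

    // Local stand-in for backtype.hadoop.pail.PailStructure; use the real
    // interface in practice.
    interface PailStructure<T> extends Serializable {
        boolean isValidTarget(String... dirs);
        T deserialize(byte[] serialized);
        byte[] serialize(T object);
        List<String> getTarget(T object);
        Class getType();
    }

    // Stores plain Strings as UTF-8 bytes, so String tuple fields can be
    // sunk without hand-converting them to byte[] in the query.
    class StringPailStructure implements PailStructure<String> {
        public boolean isValidTarget(String... dirs) { return true; }  // no vertical partitioning
        public String deserialize(byte[] bytes) { return new String(bytes, StandardCharsets.UTF_8); }
        public byte[] serialize(String s) { return s.getBytes(StandardCharsets.UTF_8); }
        public List<String> getTarget(String s) { return Collections.emptyList(); }
        public Class getType() { return String.class; }
    }

    public class StringPailStructureDemo {
        public static void main(String[] args) {
            StringPailStructure structure = new StringPailStructure();
            byte[] bytes = structure.serialize("mystring");
            System.out.println(structure.deserialize(bytes));  // prints "mystring"
        }
    }
    ```

    To wire a structure like this into a tap you would typically go through a
    PailSpec / PailTapOptions rather than the bare (PailTap. path)
    constructor; the exact plumbing varies by dfs-datastores version, so
    treat the names here as a sketch to verify against your source.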

    --
    http://about.me/soren

    On Wednesday, October 3, 2012 at 1:35 PM, David Kincaid wrote:

    They are strings in most cases. Sometimes doubles or integers. But when I
    use any other taps I don't have to do any type conversion. Does PailTap
    require that I convert all of the fields to byte arrays in my query?

    On Wednesday, October 3, 2012 3:30:18 PM UTC-5, Soren Macbeth wrote:

    Are ?field1 and ?field2 strings or byte arrays?
    On Wed, Oct 3, 2012 at 1:26 PM, David Kincaid wrote:
    I'm just using it as a sink in a Cascalog query so I'm not writing it
    directly to the pail. Something like this:

    (let [sink-out (PailTap. "/tmp/pail")]
      (?<- sink-out [?field1 ?field2]
           (source-gen ?field1 ?field2)
           (:trap (hfs-trap error-path))))

    On Wednesday, October 3, 2012 3:11:03 PM UTC-5, Soren Macbeth wrote:

    Looks like you're trying to pass a String into a BinaryPailStructure
    that only accepts a byte array. You probably want to do something
    like (.getBytes "mystring") before writing it into the pail.
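    The suggestion amounts to encoding each String field to bytes before it
    reaches the sink. A minimal round-trip sketch (names are illustrative,
    not from the thread); specifying the charset explicitly is safer than the
    no-arg getBytes(), which uses the platform default and can differ across
    cluster nodes:

    ```java
    import java.nio.charset.StandardCharsets;

    public class GetBytesDemo {
        public static void main(String[] args) {
            String field = "mystring";
            // Encode explicitly as UTF-8 so every node agrees on the bytes.
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            // Decoding with the same charset recovers the original value.
            String back = new String(bytes, StandardCharsets.UTF_8);
            System.out.println(back.equals(field));  // prints "true"
        }
    }
    ```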

    On Wed, Oct 3, 2012 at 12:46 PM, David Kincaid <kincai...@gmail.com>
    wrote:
    I'm trying to use a PailTap from dfs-datastores-cascading in a Cascalog
    query and getting a ClassCastException. What am I doing wrong?

    I'm creating the tap with a simple (PailTap. path)



    --
    http://about.me/soren

  • David Kincaid at Oct 3, 2012 at 9:06 pm

Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Oct 3, '12 at 7:46p
active: Oct 3, '12 at 9:06p
posts: 8
users: 2
website: clojure.org
irc: #clojure

2 users in discussion:
David Kincaid (5 posts), Soren Macbeth (3 posts)
