I'd like to get the following nonstrict behavior for hfs-delimited:

*When using a TextDelimited scheme to pull a delimited file into a source
Tap, if the scheme expects N fields from the file, it will only split an
incoming line using the delimiter N - 1 times. That is, you *should* be
left with N delimited fields. However, if the scheme encounters a line
with more than N delimited fields, it will simply concatenate the remainder
of the line to the last field. *


By my reading of the documentation, I should be able to do this using
something like this, but it isn't working for me.



((hfs-delimited in :delimiter "|" :strict? false) ?a ?b)


What do I need to do to allow the extra fields in the input tap?

Here's a gist with my code and a transcript:
https://gist.github.com/royseto/5227039

Thanks!


--
You received this message because you are subscribed to the Google Groups "cascalog-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascalog-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


  • Paco Nathan at Mar 23, 2013 at 2:40 pm
    From my read, the ":strict" keyword option in Cascading wasn't implemented
    for hfs-delimited in the "cascalog-contrib" library:
    https://github.com/nathanmarz/cascalog-contrib/blob/master/cascalog.more-taps/src/cascalog/more_taps.clj

    Those taps are being moved into Cascalog, and the ":strict" keyword option
    is implemented in the "develop" branch:
    https://github.com/nathanmarz/cascalog/blob/develop/cascalog-more-taps/src/cascalog/more_taps.clj

    OTOH, if you need to handle a variable number of fields, would it work to
    read the input as a text line, split it into a vector, then use some
    business logic to handle the variable length vector?
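
A minimal sketch of that line-splitting idea in plain Clojure, assuming each record arrives as one string. The function name `split-nonstrict` is hypothetical and is not part of Cascalog or more-taps; it only illustrates splitting on the delimiter at most N - 1 times so that surplus fields stay attached to the last one.

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

(defn split-nonstrict
  "Splits `line` on the literal `delimiter` at most (dec n) times, so any
  surplus delimited text stays attached to the last element.
  Hypothetical helper for illustration only."
  [line delimiter n]
  ;; Pattern/quote makes regex metacharacters such as | match literally.
  (str/split line (re-pattern (Pattern/quote delimiter)) n))

(split-nonstrict "a|b|c|d" "|" 2) ;; => ["a" "b|c|d"]
(split-nonstrict "a|b" "|" 2)     ;; => ["a" "b"]
```

A line with fewer than N fields yields a shorter vector, so padding or filtering logic would still be needed for that case. In a Cascalog query the lines would presumably come from a text-line tap such as `hfs-textline`; that wiring is not shown here.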


  • Roy Seto at Mar 24, 2013 at 6:06 am
    Thank you Paco. For now, I'll follow your suggestion to read the input as a
    text line and write my own field splitting logic.

    It looks like there was a Cascalog 1.10.1 release on Clojars today that may
    include the enhancement from the "develop" branch you mentioned, but I
    can't get my project to build when I update my project.clj to depend on
    [cascalog "1.10.1"]. I'll start a separate thread about that.

    Roy
  • Paul Lam at Mar 24, 2013 at 5:21 pm
    Yes, more-taps 1.10.1 includes those options for hfs-delimited.



Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Mar 23, '13 at 10:12a
active: Mar 24, '13 at 5:21p
posts: 4
users: 3
website: clojure.org
irc: #clojure

3 users in discussion

Roy Seto: 2 posts
Paul Lam: 1 post
Paco Nathan: 1 post
