FAQ
Hi,

I need to join two files. One is a compressed sequence file (maybe I should
use the hfs-seqfile tap) and the other is an uncompressed, tab-delimited
file (maybe I should use hfs-delimited).

Can I do this in Cascalog?

Thanks in advance

Kang

--
You received this message because you are subscribed to the Google Groups "cascalog-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascalog-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

  • David Kincaid at Mar 7, 2013 at 10:01 pm
    I think you answered your own question. Create two taps: hfs-seqfile
    for the compressed file and hfs-delimited for the tab-delimited file. Then
    create a query that uses the two taps and does your join.

    Dave
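
    The two-tap join described above might look roughly like the sketch
    below. It assumes Cascalog 1.x with the cascalog-more-taps library on the
    classpath (the library that provides hfs-delimited). The namespace, the
    paths, the field names, and the assumption that each record in both files
    has exactly two fields with the join key first are all hypothetical
    placeholders, not details from this thread.

    ```clojure
    (ns example.join
      (:use cascalog.api)
      (:require [cascalog.more-taps :refer [hfs-delimited]]))

    (defn join-query
      "Joins the two sources on ?key.
      Assumption: both files hold two fields per record, join key first."
      [seq-path tsv-path]
      (let [;; Assumption: the sequence file holds Cascading tuples,
            ;; e.g. it was written by an earlier Cascalog job.
            seq-tap (hfs-seqfile seq-path)
            ;; Tab-delimited text source.
            tsv-tap (hfs-delimited tsv-path :delimiter "\t")]
        (<- [?key ?seq-val ?tsv-val]
            (seq-tap ?key ?seq-val)
            ;; Reusing ?key in both predicates is what expresses the join.
            (tsv-tap ?key ?tsv-val))))

    ;; Hypothetical paths; runs the query and prints joined tuples to stdout.
    (?- (stdout) (join-query "/data/input.seq" "/data/input.tsv"))
    ```

    Because both predicates bind the same ?key variable, Cascalog treats the
    query as an inner join on that field.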
  • Kang Tu at Mar 7, 2013 at 10:47 pm
    Hi Dave,

    Thanks for replying. What I am not sure about is:

    Is hfs-seqfile compressed by default?

    If it is not, how can I set a "compressed" option for one tap and a
    "non-compressed" option for another? I know there might be some option
    in with-job-conf, but it looks like a global option that cannot be
    applied to an individual tap.

    Thanks

    Kang
  • Paul Lam at Mar 10, 2013 at 8:27 am
    Hi Kang,

    hfs-delimited is uncompressed by default. For a general solution, say you
    have one hfs-seqfile that is compressed and another hfs-seqfile that is
    not compressed or uses a different compression method: you can use
    cascalog-checkpoint and have each sourcing step use its own
    (with-job-conf) to set compression properties.

    Paul
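
    One way to picture the per-step configuration suggested above is the
    sketch below. It runs two separate jobs, each wrapped in its own
    with-job-conf, so the compression settings apply only to the job in that
    body. The property names are the Hadoop 1.x output-compression keys; the
    namespace, paths, field names, and the copy-step helper are hypothetical.
    The cascalog-checkpoint library could chain such steps into one workflow,
    which is not shown here.

    ```clojure
    (ns example.per-step-conf
      (:use cascalog.api))

    ;; Hadoop 1.x property names (assumption: newer Hadoop versions
    ;; use different keys for the same settings).
    (def gzip-output
      {"mapred.output.compress"          "true"
       "mapred.output.compression.type"  "BLOCK"
       "mapred.output.compression.codec" "org.apache.hadoop.io.compress.GzipCodec"})

    (def plain-output
      {"mapred.output.compress" "false"})

    (defn copy-step
      "Hypothetical helper: runs one job that copies a two-field sequence
      file to a new sequence file, using conf for that job only."
      [conf in-path out-path]
      (let [src (hfs-seqfile in-path)]
        (with-job-conf conf
          (?<- (hfs-seqfile out-path)
               [?key ?val]
               (src ?key ?val)))))

    ;; Each call gets its own settings (hypothetical paths).
    (copy-step gzip-output  "/data/a.seq" "/tmp/a-gzip.seq")
    (copy-step plain-output "/data/b.seq" "/tmp/b-plain.seq")
    ```

    As far as I know, Hadoop records the codec in the sequence file header,
    so these settings matter when a step writes its output; reading a
    compressed sequence file should not need them.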
  • Kang Tu at Mar 11, 2013 at 12:12 am
    Great suggestion. Thank you Paul.
  • Sam Ritchie at Mar 7, 2013 at 10:01 pm
    Yup, totally possible.
    --
    Sam Ritchie, Twitter Inc
    703.662.1337
    @sritchie


Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Mar 7, '13 at 9:49p
active: Mar 11, '13 at 12:12a
posts: 6
users: 4
website: clojure.org
irc: #clojure
