Using data with Zebra
Hi there, I have some doubts about Zebra usage.
The thing is that all my data is already in HDFS, and I want to use the Zebra
storers and loaders, but I don't want to reprocess all my data just to get
the .meta, .schema, and .btschema files. By the way, how are those files
related? I mean, they all keep a file's metadata, right?
Is there any way I can create the files necessary to use Zebra's loader and
storer functionality? Any advice or suggestion is highly appreciated.
Thanks in advance.


Renato M.


  • Yan Zhou at Oct 25, 2010 at 5:12 pm
    .schema is a column group's schema file; .btschema is the Zebra table's schema file; .meta is a column group's index file.

    The bottom line is that they are all internal files maintained by Zebra, and users should not access or manipulate them directly. Also, the storage format used by Zebra is probably different from that of your data already on HDFS.

    In summary, you have to use Zebra to generate Zebra data and no other data format can be used by Zebra.

    Yan
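
    As an illustration of that point, a minimal Pig Latin sketch of writing and
    reading a Zebra table follows; the zebra.jar path, table paths, field names,
    and the column-group hint are hypothetical and not taken from this thread.
    The files in question are created by the storer itself, never by hand.

    -- Minimal sketch; the jar path, paths, and schema below are assumptions.
    register /path/to/zebra.jar;

    -- A is any relation with a known schema (default PigStorage load shown here).
    A = LOAD 'events' AS (user:chararray, ts:long, bytes:int);

    -- TableStorer writes the table's .btschema plus a .schema and .meta file per
    -- column group; the string argument is an optional column-group storage hint.
    STORE A INTO 'zebra/events' USING
        org.apache.hadoop.zebra.pig.TableStorer('[user, ts]; [bytes]');

    -- Reading goes through TableLoader; the argument is a projection, so only
    -- the needed column groups are scanned.
    B = LOAD 'zebra/events' USING
        org.apache.hadoop.zebra.pig.TableLoader('user, bytes');
    DUMP B;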

  • Renato Marroquín Mogrovejo at Oct 27, 2010 at 4:41 pm
    Thanks Yan!

    Just a couple of questions. The thing is that I have too much data to just
    delete and reprocess it all, and if I reprocessed all my HDFS data, I would
    end up with the same amount of data duplicated: one copy in Zebra format and
    one in the regular HDFS format. What would be the best approach that you
    would suggest? And would it be better to use Pig or raw MapReduce?

    Renato M.

  • Yan Zhou at Oct 27, 2010 at 5:09 pm
    If you cannot change your input data generation process to generate the input directly in Zebra, I can't see any alternative to keeping two sets of data.

    Regarding generating Zebra data, Pig is simpler than raw MapReduce, and the performance should be fine too, provided there is a Pig loader for your input data format.

    Yan
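
    To make that concrete, a one-time Pig job along the following lines could
    convert an existing plain-text dataset on HDFS into a Zebra table; once the
    Zebra copy is verified, the original files can be deleted, so the duplication
    is only transient. The jar path, input path, delimiter, and field names are
    assumptions for illustration only.

    -- One-off conversion sketch: existing tab-delimited HDFS data -> Zebra table.
    register /path/to/zebra.jar;

    -- Read the existing data with a regular Pig loader (PigStorage here); any
    -- input format that has a Pig loader would work the same way.
    raw = LOAD '/data/logs/2010' USING PigStorage('\t')
          AS (ip:chararray, url:chararray, status:int, bytes:long);

    -- Write it out as a Zebra table; the .meta, .schema, and .btschema files are
    -- generated by the storer. After verifying the new table, the original
    -- directory can be removed so only one copy remains.
    STORE raw INTO '/data/zebra/logs_2010' USING
          org.apache.hadoop.zebra.pig.TableStorer('[ip, url]; [status, bytes]');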

  • Renato Marroquín Mogrovejo at Oct 28, 2010 at 3:51 am
    Thanks for the pointers Yan!

    Renato M.

