If you cannot change your input data generation process to write its output directly in Zebra, I can't see any alternative to keeping two sets of data.

Regarding generating Zebra data, Pig is simpler than raw MapReduce and the performance should be fine too, provided there is a Pig loader for your input data format.
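A minimal sketch of such a conversion in Pig Latin, assuming the existing data is tab-delimited text that PigStorage can read. The jar path, input and output paths, field names, and column-group layout are all hypothetical placeholders, not details from this thread:

```pig
-- Hypothetical jar location; use the Zebra jar that matches your Pig release.
REGISTER /path/to/zebra.jar;

-- Load the existing HDFS data with a loader that fits its format.
-- PigStorage with a tab delimiter and this schema are assumptions for illustration.
raw = LOAD '/data/existing/input' USING PigStorage('\t')
      AS (name:chararray, age:int, gpa:double);

-- Write a Zebra table; the argument is a storage hint that defines column groups.
STORE raw INTO '/data/zebra/output'
      USING org.apache.hadoop.zebra.pig.TableStorer('[name, age]; [gpa]');
```

The source files are left untouched by this script, so both copies exist until the originals are removed.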

Yan

________________________________
From: Renato Marroquín Mogrovejo
Sent: Wednesday, October 27, 2010 9:29 AM
To: Yan Zhou; user@pig.apache.org
Subject: Re: Using data with Zebra

Thanks Yan!

Just a couple of questions. I have too much data to simply delete it and reprocess it all, and if I reprocessed all my HDFS data, I would end up with the same amount of data duplicated: one copy in Zebra format and one as regular HDFS data. What approach would you suggest? And would it be better to use Pig or raw MapReduce?

Renato M.
2010/10/25 Yan Zhou <yanz@yahoo-inc.com>
.schema is the column group's schema file; .btschema is the Zebra table's schema file; .meta is the column group's index file.

The bottom line is that they are all internal files maintained by Zebra, and users should not access or manipulate them directly. Also, the storage format used by Zebra is probably different from that used by your data already on HDFS.

In summary, you have to use Zebra to generate Zebra data and no other data format can be used by Zebra.

Yan

-----Original Message-----
From: Renato Marroquín Mogrovejo
Sent: Sunday, October 24, 2010 1:15 PM
To: user@pig.apache.org
Subject: Using data with Zebra

Hi there, I have some questions about Zebra usage.
All my data is already in HDFS, and I want to use the Zebra
storers and loaders, but I don't want to reprocess all my data just to get
the .meta, .schema and .btschema files. By the way, how are those
files related? They all keep the files' metadata, right?
Is there any way I can create the necessary files to use Zebra's loaders and
storers? Any advice or suggestion is highly appreciated.
Thanks in advance.


Renato M.
