Grokbase Groups Pig user October 2010
Thanks, Yan!

Just a couple of questions. I have too much data to simply delete it and
reprocess it all, and if I reprocess all of my HDFS data, I will end up with
the same data stored twice: once in Zebra format and once as regular HDFS
files. What approach would you suggest? And would it be better to use Pig or
raw MapReduce?

Renato M.

2010/10/25 Yan Zhou <yanz@yahoo-inc.com>
.schema is column group's schema file; .btschema is Zebra table's schema
file; .meta is column group's index file.

The bottom line is that they are all internal files maintained by Zebra, and
users should not access or manipulate them directly. Also, the storage
format used by Zebra is probably different from the one used by your data
already on HDFS.

In summary, you have to use Zebra to generate Zebra data; Zebra cannot use
data in any other format.

Yan

-----Original Message-----
From: Renato Marroquín Mogrovejo
Sent: Sunday, October 24, 2010 1:15 PM
To: user@pig.apache.org
Subject: Using data with Zebra

Hi there, I have some questions about Zebra usage.
All my data is already in HDFS, and I want to use the Zebra storers and
loaders, but I don't want to reprocess all my data just to get the .meta,
.schema and .btschema files. By the way, how are those files related? They
all keep file metadata, right?
Is there any way I can create the necessary files to use Zebra's loaders and
storers? Any advice or suggestion is highly appreciated.
Thanks in advance.


Renato M.

Discussion Overview
group: user @
categories: pig, hadoop
posted: Oct 24, '10 at 8:15p
active: Oct 28, '10 at 3:51a
posts: 5
users: 2
website: pig.apache.org
