Grokbase Groups Pig user August 2010
What do you mean by "multiple relations with many tuples"? Do you mean
joining multiple data sets?
Also, Pig uses BinStorage for storing intermediate data.


On Fri, Aug 20, 2010 at 2:42 PM, Defenestrator wrote:
Thanks, Jeff.

A quick follow-up question about loading and storing data: what is the
best practice when dealing with multiple relations with many tuples? Do
people typically STORE intermediate relations to minimize memory usage and
then LOAD the intermediate data again for use later in the same script? I
ask because I noticed that tuples are written out using the TupleFormat,
which outputs text with additional parentheses. A subsequent PigStorage
LOAD would then pick up the extra parenthesis characters, right?
On Thu, Aug 19, 2010 at 1:50 AM, Jeff Zhang wrote:

I am afraid you will have to write your own LoadFunc to interpret the text.
As of Pig 0.7, local mode uses Hadoop's standalone local mode, so it won't
store all the data in memory; the data is read as a stream. However, this
mode needs more memory because each task is executed in another JVM.
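
To make the LoadFunc suggestion concrete, here is a minimal sketch of the parsing step such a loader would need: splitting one line on commas while leaving commas inside double quotes alone. The class name QuotedCsvSplitter and the method splitLine are hypothetical names chosen for illustration, not part of Pig. The sketch assumes fields are quoted with double quotes, that a doubled quote ("") inside a quoted field stands for one literal quote, and that a record never spans more than one line. In a real loader, something like this would presumably be called on each line read in the LoadFunc's getNext() method, with the resulting fields copied into a Tuple.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: quote-aware splitting of one CSV line.
final class QuotedCsvSplitter {

    // Splits a single line into fields. Commas inside double quotes stay in
    // the field; a doubled quote inside a quoted field becomes one quote.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a field
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // includes embedded commas
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // separator ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        // Print each parsed field on its own line.
        for (String field : splitLine("1,\"Smith, John\",NY")) {
            System.out.println(field);
        }
    }
}
```

The splitter is kept free of Pig and Hadoop dependencies so it can be tested on its own before being wired into a loader.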


On Thu, Aug 19, 2010 at 12:48 AM, Defenestrator wrote:
What loader should I use on CSV files with quoted strings that contain
embedded commas?  (i.e., embedded commas should not be treated as a
separator.)

And when LOADing large files in local mode, does Pig just store it all
in memory?  Or does it have memory management à la the buffer managers
in DBMSs?


--
Best Regards

Jeff Zhang



Discussion Overview
group: user @
categories: pig, hadoop
posted: Aug 19, '10 at 7:49a
active: Aug 20, '10 at 2:26p
posts: 7
users: 3
website: pig.apache.org
