Pig user mailing list, January 2011
Yes, you would have to distribute Ruby (though it's typically
installed by default) as well as the wukong and json libraries to all
the nodes in the cluster. Unfortunately this isn't something wukong
gives you for free at the moment, though it is planned.

As far as I know, Pig doesn't do anything more complex than launch a
Hadoop streaming job and use the output in the subsequent steps.
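
(For illustration, the Pig side of that looks roughly like the sketch below. The script name, paths, and output schema are invented, and note that SHIP only distributes the script itself, not Ruby or the wukong/json gems, which is the gap mentioned above.)

    -- Ship the wukong script to each task node and run it as a streaming mapper.
    -- Assumes the script responds to a --map flag, as wukong scripts of this era did.
    DEFINE parse_json `ruby parse_json.rb --map` SHIP('parse_json.rb');

    raw    = LOAD 'input/events' USING TextLoader() AS (json_line:chararray);
    parsed = STREAM raw THROUGH parse_json AS (user_id:chararray, event:chararray, ts:long);
    STORE parsed INTO 'output/parsed_events';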

btw I write 90% of my mr jobs using either wukong or Pig. Only when
it's absolutely required do I use a language with as much overhead as
java :)

--jacob
@thedatachef

Sent from my iPhone
On Jan 30, 2011, at 2:09 PM, Alex McLintock wrote:
On 29 January 2011 13:43, Jacob Perkins wrote:

Write a map-only wukong script that parses the JSON as you want it. See
the example here:

http://thedatachef.blogspot.com/2011/01/processing-json-records-with-hadoop-and.html
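
(The pattern from that post looks roughly like the following; a sketch only, with invented field names, assuming the Wukong::Streamer API of the time.)

    #!/usr/bin/env ruby
    require 'rubygems'
    require 'wukong'
    require 'json'

    # Map-only streamer: parse one JSON record per input line and emit
    # the fields of interest as a tab-separated record.
    class JsonLineParser < Wukong::Streamer::LineStreamer
      def process(line)
        record = JSON.parse(line) rescue nil
        return unless record
        # Field names here are illustrative only.
        yield [record['user_id'], record['event'], record['timestamp']]
      end
    end

    # A nil reducer class makes this a map-only job.
    Wukong::Script.new(JsonLineParser, nil).run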
Hi Jacob,

Thanks very much for helping me out. I haven't heard of Wukong before.
I am a bit concerned, though, about adding Ruby into my tool stack as well as
Pig. It seems like a step too far.
Presumably I have to distribute Ruby and Wukong across all my job nodes in
the same way as if I were writing Perl or C++ streaming programs.

With STREAMing - the script is launched once per file, right, not once per record?

Alex
