Forwarding this to pig-user, as many pig users may want to give
feedback on this issue.

Alan.

Begin forwarded message:
From: "Alan Gates (JIRA)" <jira@apache.org>
Date: October 26, 2009 3:18:59 PM PDT
To: <pig-dev@hadoop.apache.org>
Subject: [jira] Commented: (PIG-1053) Consider moving to Hadoop for
local mode
Reply-To: pig-dev@hadoop.apache.org


[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770237#action_12770237 ]

Alan Gates commented on PIG-1053:
---------------------------------

Currently Pig has its own backend implementation framework that it
uses for executing Pig Latin scripts on a single box (as opposed to
in a Hadoop cluster), referred to as local mode. Having a separate
implementation has several drawbacks:

1) It does not offer the same functionality as Hadoop. A number of
things, such as counters and slicers, do not work.
2) UDFs (both eval and load/store functions) are often forced to
understand both contexts, and test whether they are working in local
or hadoop mode.
3) Additional code maintenance, as Pig is forced to maintain its own
framework. Going forward, as Pig attempts to leverage more Map
Reduce functionality (see for example PIG-966) maintaining this
separate mode is becoming a larger and larger effort.
4) It makes debugging harder for users and UDF writers, as the
execution environment on a local box differs from that on the
production cluster.

Pig's local mode has one very serious advantage over Hadoop in local
mode: it is about 15 times faster. Hadoop is designed for large data
sets and thus is not optimized to handle the startup and teardown
involved in small data jobs.

For debugging of code, this performance factor should not be that
big an issue. Where the performance becomes prohibitive is
functionality like ILLUSTRATE. Taking 30 seconds to give a sample
of data running through your script is excessive compared to 2
seconds.
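
As a rough illustration of why a fixed startup and teardown cost matters
mostly for small jobs, here is a minimal sketch of a fixed-plus-linear
cost model. The model itself, the per-record cost, and the assumption
that both modes process records at the same rate are hypothetical; only
the 30-second and 2-second figures come from the comment above.

```python
def job_seconds(records, fixed_overhead_s, per_record_s):
    """Total run time under a simple fixed-plus-linear cost model."""
    return fixed_overhead_s + records * per_record_s


# Hypothetical parameters: the fixed overheads mirror the 30 s vs 2 s
# figures quoted above; the per-record cost is an assumed value and is
# taken to be identical in both modes.
HADOOP_LOCAL = {"fixed_overhead_s": 30.0, "per_record_s": 1e-5}
PIG_LOCAL = {"fixed_overhead_s": 2.0, "per_record_s": 1e-5}


def slowdown(records):
    """How many times slower the high-overhead mode is for a given input size."""
    return job_seconds(records, **HADOOP_LOCAL) / job_seconds(records, **PIG_LOCAL)


if __name__ == "__main__":
    for n in (100, 1_000_000, 1_000_000_000):
        print(f"{n:>13,} records: {slowdown(n):.2f}x slower")
```

Under these assumptions the gap is close to 15x for a handful of
records (the ILLUSTRATE case) and shrinks toward 1x as the input grows,
which is consistent with the overhead mattering little on large data.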

So, which of these pain points is worse? Originally we felt the
performance was more important. But as we see many user complaints
about the above-listed drawbacks and relatively few users using
local mode in performance-intensive ways, we are wondering if we
made that choice correctly. Please give your feedback one way or
another.

Consider moving to Hadoop for local mode
----------------------------------------

Key: PIG-1053
URL: https://issues.apache.org/jira/browse/PIG-1053
Project: Pig
Issue Type: Improvement
Reporter: Alan Gates

We need to consider moving Pig to use Hadoop's local mode instead
of its own.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
