Grokbase Groups Pig user June 2009
Hi all,

First question: I use a single PigServer to run several Pig scripts
in multiple threads, but exceptions are sometimes thrown. So Pig does not
support multithreading, is that right? I just want to make sure.

Second question: I submit several Pig scripts to the Hadoop cluster at the same
time, but in the JobTracker I notice that the jobs are run one by one. So does
Hadoop not support executing multiple jobs at the same time? If so, I think
Hadoop cannot leverage the full power of all its machines.


Thank you

Jeff Zhang


  • Dmitriy Ryaboy at Jun 19, 2009 at 1:45 pm
    Jeff, in regard to your second question: Hadoop schedules tasks as
    slots become available; if there are more tasks than slots, the tasks are
    queued. If you want multiple jobs to run at the same time
    (sacrificing some performance on individual jobs, as they will have access
    to fewer resources while unrelated tasks are running on some of the nodes),
    you can look into enabling the FairScheduler or CapacityScheduler. An
    explanation of the FairScheduler, as well as links to the relevant JIRAs,
    etc., can be found here: http://www.cloudera.com/blog/tag/scheduling/

    -D
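
The difference between queueing jobs one after another and sharing slots between them can be illustrated with a toy model. The sketch below is not Hadoop code and does not reflect how the FairScheduler or CapacityScheduler is actually implemented (pools, weights and capacities are ignored); the class and method names are made up for illustration. It only shows how a fixed number of slots might be divided among jobs under a first-come-first-served policy versus a simple round-robin fair share.

```java
import java.util.Arrays;

public class SlotSchedulingSketch {

    /** FIFO: earlier jobs take as many slots as they have pending tasks. */
    static int[] fifo(int slots, int[] pendingTasks) {
        int[] assigned = new int[pendingTasks.length];
        for (int j = 0; j < pendingTasks.length && slots > 0; j++) {
            assigned[j] = Math.min(slots, pendingTasks[j]);
            slots -= assigned[j];
        }
        return assigned;
    }

    /** Fair share: hand out one slot at a time, round-robin, to jobs that still have pending tasks. */
    static int[] fair(int slots, int[] pendingTasks) {
        int[] assigned = new int[pendingTasks.length];
        boolean progress = true;
        while (slots > 0 && progress) {
            progress = false;
            for (int j = 0; j < pendingTasks.length && slots > 0; j++) {
                if (assigned[j] < pendingTasks[j]) {
                    assigned[j]++;
                    slots--;
                    progress = true;
                }
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Three jobs with 100, 20 and 20 pending tasks competing for 10 slots.
        int[] pending = {100, 20, 20};
        System.out.println("FIFO: " + Arrays.toString(fifo(10, pending))); // [10, 0, 0]
        System.out.println("Fair: " + Arrays.toString(fair(10, pending))); // [4, 3, 3]
    }
}
```

In the FIFO case the first large job occupies every slot, which matches the "one by one" behaviour seen in the JobTracker; in the fair case each job makes progress, at the cost of the first job getting fewer slots.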
  • Zjffdu at Jun 19, 2009 at 3:07 pm
    Hi Dmitriy,

    Thank you for your help. I will look into the schedulers you mentioned for
    more details.


    Jeff Zhang


    -----Original Message-----
    From: Dmitriy Ryaboy
    Sent: June 19, 2009 6:45
    To: [email protected]
    Subject: Re: Does Pig support multi thread and Hadoop do not support execute
    multiple jobs at the same time?

  • Alan Gates at Jun 19, 2009 at 3:16 pm

    On Jun 19, 2009, at 2:06 AM, zhang jianfeng wrote:

    First question: I use a single PigServer to run several Pig scripts
    in multiple threads, but exceptions are sometimes thrown. So Pig does not
    support multithreading, is that right? I just want to make sure.

    Correct, Pig is not multi-threaded at this point. We have tried to
    design it such that changing that in the future would not be too much
    work. But as we don't test multi-threaded use, I'm sure there are some
    shared static variables, etc.

    Alan.
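
One way an application might cope with a library that is not thread safe is to let many caller threads submit work while a single worker thread executes it. The sketch below shows that pattern with plain `java.util.concurrent`; it is an assumption, not something suggested in this thread, and it has not been tested against Pig. The class name `SerializedJobRunner` and the placeholder job are hypothetical; real Pig calls would go where the placeholder is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedJobRunner {
    // One worker thread: submitted jobs run strictly one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Callers on any thread may submit; execution happens on the single worker. */
    public <T> Future<T> submit(Callable<T> job) {
        return worker.submit(job);
    }

    public void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(1, TimeUnit.MINUTES);
    }

    /** Runs placeholder jobs and returns the highest number observed running at once. */
    static int maxConcurrency(int jobs) throws Exception {
        SerializedJobRunner runner = new SerializedJobRunner();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        // Placeholder for work that must not overlap (hypothetical stand-in for Pig calls).
        Callable<Integer> placeholderJob = () -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(20);
            return running.decrementAndGet();
        };

        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < jobs; i++) {
            results.add(runner.submit(placeholderJob));
        }
        for (Future<Integer> result : results) {
            result.get(); // wait for each job to finish
        }
        runner.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak concurrency: " + maxConcurrency(5)); // peak concurrency: 1
    }
}
```

The trade-off is that jobs submitted from one JVM no longer overlap, so this keeps the caller code concurrent without making the underlying work parallel.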
  • Zhang jianfeng at Jun 22, 2009 at 8:12 am
    Hi Alan,

    I have tested the following code, but sometimes an exception is
    thrown:

    /*********************************************************************************************************
    ExecutorService executor = new ScheduledThreadPoolExecutor(10);

    // Each task creates its own PigServer instance.
    Callable<Long> totalVisitorTask = new Callable<Long>() {
        @Override
        public Long call() throws Exception {
            PigServer pig = new PigServer(ExecType.MAPREDUCE);
            pig.register(..........);
        }
    };

    Callable<Map<String, Long>> visitorBySourceTask = new Callable<Map<String, Long>>() {
        @Override
        public Map<String, Long> call() throws Exception {
            PigServer pig = new PigServer(ExecType.MAPREDUCE);
            pig.register(..........);
        }
    };

    // Submit both tasks so that they run in different threads at the same time.
    Future<Long> totalVisitor = executor.submit(totalVisitorTask);
    Future<Map<String, Long>> visitBySource = executor.submit(visitorBySourceTask);
    /*********************************************************************************************************

    It seems I cannot execute two Pig scripts in different threads at the same
    time. I think it would be better if Pig supported this, because running
    one job at a time cannot use the resources efficiently. What do you
    think?


    Jeff zhang.
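
Static variables are per JVM, so one possible way to run two scripts at the same time is to start each in its own process instead of its own thread. The sketch below is an assumption rather than something proposed in this thread: it assumes the `pig` command-line launcher is on the PATH and that each script is saved to a file. The class name and the script file names in the usage note are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PigProcessLauncher {

    /** Builds the command line for one script; "pig" is assumed to be on the PATH. */
    static List<String> commandFor(String scriptPath) {
        return Arrays.asList("pig", "-x", "mapreduce", scriptPath);
    }

    /** Starts one child process (one JVM) per script and returns without waiting. */
    static List<Process> launchAll(List<String> scriptPaths) throws IOException {
        List<Process> children = new ArrayList<>();
        for (String script : scriptPaths) {
            ProcessBuilder builder = new ProcessBuilder(commandFor(script));
            builder.inheritIO(); // show each child's output on this console
            children.add(builder.start());
        }
        return children;
    }

    public static void main(String[] args) throws Exception {
        // Script paths come from the command line; with no arguments nothing is launched.
        List<Process> children = launchAll(Arrays.asList(args));
        for (Process child : children) {
            System.out.println("exit code: " + child.waitFor());
        }
    }
}
```

It could be invoked as `java PigProcessLauncher total_visitors.pig visitors_by_source.pig`. Whether the resulting jobs actually overlap on the cluster still depends on the scheduler discussed earlier in the thread, and results would have to be read back from the scripts' output locations rather than returned as Java objects.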


  • Alan Gates at Jun 22, 2009 at 4:11 pm
    Pig being thread safe would be great. In particular it would be good
    because it would make it much easier to construct a Pig server that
    accepted requests via web services (or whatever) (see
    https://issues.apache.org/jira/browse/PIG-603). I am not aware of anyone
    working on this. It would be good to file a JIRA and vote for it, so that
    Pig developers know what is of interest to users.

    Alan.
  • Dmitriy Ryaboy at Jun 22, 2009 at 4:58 pm
    There is already a jira for this:

    https://issues.apache.org/jira/browse/PIG-240

    -Dmitriy

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 19, '09 at 9:07a
active: Jun 22, '09 at 4:58p
posts: 7
users: 3
website: pig.apache.org
