Grokbase Groups Pig user August 2010
Hi all,
I am new to Pig. Is there a recommended way to call Pig code from Java?
Is there a Java interface that can be called directly from Java and works
smoothly? It seems each keyword (filter, group, cogroup, generate) and each
data type in Pig could have a Java counterpart expressed as a class,
interface, or data type. Are such Java interfaces available to Java
programmers? If not, why not?
Thanks very much for your help!

regards,
Wenhao

--
~_~

  • Harsh J at Aug 5, 2010 at 4:02 am
    You need to use the class PigServer.

    import org.apache.pig.PigServer;

    PigServer pigServer = new PigServer("mapreduce"); // Or "local" for local mode
    pigServer.registerQuery("A = LOAD ...");
    // ... register your remaining statements here.
    pigServer.store("A", "filename");

    --
    Harsh J
    www.harshj.com
  • Harsh J at Aug 5, 2010 at 4:03 am
    Sorry, forgot the API link:
    http://hadoop.apache.org/pig/docs/r0.7.0/api/org/apache/pig/PigServer.html

    --
    Harsh J
    www.harshj.com
  • Wenhao Xu at Aug 5, 2010 at 4:09 am
    Thanks!
    Can PigServer handle concurrent requests? Also, since store is a
    synchronous call, is there an asynchronous alternative?

    cheers,
    W.

    --
    ~_~
  • Wenhao Xu at Aug 5, 2010 at 4:14 am
    BTW, I am considering using it to speed up (parallelize) online queries over
    a large dataset. Is Pig suitable for this, or only for offline analysis of
    large data? Would it be a better choice than a distributed (parallel)
    database in terms of scalability and latency?

    I really like Pig's programming interface, so I want to try using it
    instead of a parallel database.

    Thanks!

    cheers,
    W.

    --
    ~_~
  • Dmitriy Lyubimov at Aug 5, 2010 at 4:21 am
    No, Pig (like anything built on MapReduce) is not really suited to real-time
    queries, unless you can wait at least a couple of minutes per query.

    It sounds like you should look at HBase, Cassandra, and similar systems under
    the 'NoSQL' umbrella.


  • Jeff Zhang at Aug 5, 2010 at 4:26 am
    Currently, PigServer is not thread-safe. You can try patches in
    http://issues.apache.org/jira/browse/PIG-240


    --
    Best Regards

    Jeff Zhang
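
Regarding the request for an asynchronous interface: one option, sketched here as an assumption rather than anything Pig provides, is to funnel every blocking call through a single worker thread and hand callers a future. Because only one thread ever runs the jobs, this does not require the wrapped object to be thread-safe. `SerialJobRunner` is a hypothetical helper name, and the lambda in `main` is a stand-in for a blocking call such as `pigServer.store("A", "filename")`. It does nothing about the shared global state or memory growth reported elsewhere in this thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper: runs blocking jobs one at a time and exposes them as futures. */
class SerialJobRunner implements AutoCloseable {
    // A single worker thread guarantees jobs never overlap.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues a blocking job and returns immediately with a future for its result. */
    <T> CompletableFuture<T> submit(Callable<T> job) {
        CompletableFuture<T> result = new CompletableFuture<>();
        worker.execute(() -> {
            try {
                result.complete(job.call());
            } catch (Throwable t) {
                // Propagate failures to whoever is waiting on the future.
                result.completeExceptionally(t);
            }
        });
        return result;
    }

    /** Stops accepting new jobs; already queued jobs still run to completion. */
    @Override
    public void close() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        try (SerialJobRunner runner = new SerialJobRunner()) {
            // Stand-in for a blocking call such as pigServer.store("A", "filename").
            CompletableFuture<String> done = runner.submit(() -> "stored A");
            System.out.println(done.join());
        }
    }
}
```

Jobs are executed in submission order, so a caller can queue several stores and wait on each future separately.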
  • Dmitriy Lyubimov at Aug 5, 2010 at 4:19 am
    In my personal opinion, based on fairly short experience, PigServer is not
    very useful because it doesn't run Pig scripts directly.

    I integrated the Grunt setup into a Spring bean and was able to run Pig
    scripts that way, loaded as a resource. It also takes care of the project
    classpath on the Hadoop side, so no jars need to be registered: anything on
    our project classpath (as built by Maven) is automatically added to the
    backend classpaths. Feeding in script parameters through Spring injection is
    also consistent with how we use Spring, and useful.

    This requires some work (a couple of days) to dig up the various Grunt
    parameters and PigContext parameters, but I think it pays off in the
    convenience of using regular Grunt scripts and in ease of UDF access.

    -Dmitriy



  • Gerrit van Vuuren at Aug 5, 2010 at 6:28 am
    Yep, I can confirm that if you call it enough times within the same Java
    process, you will eventually run out of memory. I tried this before,
    monitored it with jconsole, and saw memory gradually increasing over 50 or
    so iterations. Each iteration also created its own set of threads that never
    died, though this might be in the Hadoop client itself.
    I even tried using a whole different set of classloaders to unload classes
    after each call, but that did not work either.


    ----- Original Message -----
    From: Vincent Barat <vbarat@ubikod.com>
    To: pig-user@hadoop.apache.org <pig-user@hadoop.apache.org>
    Sent: Thu Aug 05 07:08:06 2010
    Subject: Re: Call Pig from Java

    No. PigServer is not reentrant at this time, AFAIK, and even if you create
    several PigServer objects you will run into trouble, because a small set of
    global data is shared between them. It may work for a while, but it will fail
    at some point. The only way is to create separate processes to handle your
    requests.
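
The process-per-request approach suggested above could be sketched roughly as follows. This is an illustration under assumptions, not a tested deployment recipe: it assumes the `pig` launcher script is on the PATH, and `PigProcessLauncher`, `daily_report.pig`, and the parameter names are hypothetical. The `-x`, `-param`, and `-f` options are those of the Pig command line.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: runs each Pig script in its own OS process. */
class PigProcessLauncher {

    /** Builds the command line: pig -x <execType> [-param k=v ...] -f <script>. */
    static List<String> buildCommand(String execType, String script, Map<String, String> params) {
        List<String> command = new ArrayList<>();
        command.add("pig");      // assumption: the pig launcher is on the PATH
        command.add("-x");
        command.add(execType);   // "local" or "mapreduce"
        // Sort the parameters so the generated command line is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            command.add("-param");
            command.add(e.getKey() + "=" + e.getValue());
        }
        command.add("-f");
        command.add(script);
        return command;
    }

    /** Starts the command, forwards its output to this process, and waits for the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        List<String> command = buildCommand("local", "daily_report.pig",
                Map.of("input", "/data/in", "output", "/data/out"));
        System.out.println(String.join(" ", command));
        // Call run(command) on a machine where Pig is installed.
    }
}
```

Each request pays the JVM start-up cost, but all Pig state is discarded when the child process exits.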

  • Dmitriy Lyubimov at Aug 5, 2010 at 5:14 pm
    Gerrit,

    What you are describing is much more serious than a lack of reentrancy. Are
    you saying that you could not run the same Grunt or PigServer instance through
    a series of scripts without eventually hitting an OOM? Can you share which
    version of Pig that was?

    So far I haven't seen that issue, but then I run only 4 Pig jobs a day, and
    the process has been up for perhaps a couple of weeks.

    Thanks.
    -Dmitriy
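
To put numbers on the memory and thread growth discussed here, a small probe along these lines might help. It is only a sketch using the standard `java.lang.management` beans; `LeakProbe` is a hypothetical name, and the `Runnable` passed to `measure` stands in for one in-process Pig run. Heap readings after a requested GC are still approximate, so look at the trend over many iterations rather than at single values.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical probe: records heap usage and live thread count after each run of a job. */
class LeakProbe {

    /** One measurement taken after an iteration. */
    static final class Sample {
        final int iteration;
        final long heapUsedBytes;
        final int liveThreads;

        Sample(int iteration, long heapUsedBytes, int liveThreads) {
            this.iteration = iteration;
            this.heapUsedBytes = heapUsedBytes;
            this.liveThreads = liveThreads;
        }
    }

    /** Runs the job the given number of times and samples the JVM after each run. */
    static List<Sample> measure(Runnable job, int iterations) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<Sample> samples = new ArrayList<>();
        for (int i = 1; i <= iterations; i++) {
            job.run();
            memory.gc(); // request a collection so the heap reading is less noisy
            samples.add(new Sample(i,
                    memory.getHeapMemoryUsage().getUsed(),
                    threads.getThreadCount()));
        }
        return samples;
    }

    public static void main(String[] args) {
        // Stand-in job; replace the lambda body with one in-process Pig run.
        for (Sample s : measure(() -> { }, 5)) {
            System.out.printf("iteration=%d heapUsed=%d threads=%d%n",
                    s.iteration, s.heapUsedBytes, s.liveThreads);
        }
    }
}
```

A steadily rising thread count across iterations would point at threads that are never shut down, which matches the behavior reported above.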

Discussion Overview
group: user @
categories: pig, hadoop
posted: Aug 5, '10 at 3:57a
active: Aug 5, '10 at 5:14p
posts: 10
users: 5
website: pig.apache.org
