Grokbase Groups Pig user May 2011
Hi,

Is the LOAD command, which reads data from files, the only way to load data
into Pig? Can I load data from memory, for example by creating a table in
embedded code and storing data into it?

Thanks,

Jianting Cao


  • Mark Laczin at May 13, 2011 at 5:15 pm
    Technically speaking, yes: you could store data in memory and keep it
    there, then have your program present some interface to that data
    (shared memory, reading from stdin, or something similar), but I'm not
    sure why you'd want to do this.

    Maybe I'm misunderstanding your question, but it sounds like you want
    to run using a filesystem that's in memory as opposed to on disk.

    -Mark
  • Jianting Cao at May 13, 2011 at 5:32 pm
    Thank you, Mark. Sorry that I wasn't clear enough. What I want is this:
    some programs are running and generating a lot of data. Instead of putting
    this data into a relational database, I want to output it directly to Pig
    and do some analysis along the way or afterwards. So I'm asking whether
    there is a JDBC-like interface with which I could load this newly generated
    data into Pig and run analytics. All of this happens within a Java process.

    Jianting
  • Mark Laczin at May 13, 2011 at 5:49 pm
    I'm not sure if Pig can do this. It's designed to follow the
    MapReduce/Hadoop paradigm, which typically involves data on disk ->
    MapReduce jobs -> data on disk.

    You could try to create a custom InputSplit/RecordReader that reads from
    a program's standard output or something similar, but this is kind of
    hacky. There are RecordReaders that read from SQL databases. There's also
    something like this, which can be used with Hadoop streaming:
    http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/streaming/StreamBaseRecordReader.html

    But this is all somewhat involved and would require a bit of work (if
    it's even possible). I don't think Pig has direct support yet for the
    kind of interface you're looking for.

    That being said, I'm somewhat new to Pig/Hadoop so if there's anyone
    else who can chime in with comments or agreements/disagreements, I'd
    appreciate it.

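As an illustration of the "data on disk -> MapReduce jobs -> data on disk" point made in the thread, here is a minimal Java sketch of one possible workaround: stage the in-memory records as a tab-delimited file and let Pig read that file with an ordinary LOAD. This is not something proposed in the thread. The class name `PigStaging`, its methods, and the sample schema are hypothetical, and the code uses only the Java standard library. The generated statement relies on `PigStorage`, whose default field delimiter is a tab.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: stages in-memory records as a tab-delimited file for Pig's LOAD. */
final class PigStaging {

    /** Joins one record's fields with tabs; rejects fields that contain a delimiter. */
    static String toLine(String... fields) {
        for (String f : fields) {
            if (f.indexOf('\t') >= 0 || f.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("field contains a delimiter: " + f);
            }
        }
        return String.join("\t", fields);
    }

    /** Writes all records to a new temporary file, one record per line, and returns its path. */
    static Path stageToTempFile(List<String[]> records) {
        List<String> lines = new ArrayList<>();
        for (String[] record : records) {
            lines.add(toLine(record));
        }
        try {
            Path file = Files.createTempFile("pig-staging-", ".tsv");
            Files.write(file, lines, StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the Pig Latin LOAD statement for a staged file (schema is supplied by the caller). */
    static String loadStatement(String alias, String path, String schema) {
        return alias + " = LOAD '" + path + "' USING PigStorage('\\t') AS (" + schema + ");";
    }

    public static void main(String[] args) {
        // Sample records standing in for data generated inside the Java process.
        List<String[]> records = Arrays.asList(
            new String[] {"alice", "3"},
            new String[] {"bob", "5"});
        Path staged = stageToTempFile(records);
        System.out.println(loadStatement("events", staged.toString(), "name:chararray, hits:int"));
    }
}
```

The printed statement could then be handed to embedded Pig, for example through `PigServer.registerQuery` in local mode. In MapReduce mode the staged file would first have to be copied to HDFS, so this remains a file-based load rather than the JDBC-like interface asked about.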
