May 13, 2011 at 5:49 pm
I'm not sure if Pig can do this. It's designed to follow the
MapReduce/Hadoop paradigm which typically involves data on disk ->
MapReduce Jobs -> data on disk.
You could try to create a custom InputSplit/RecordReader to read from
a program's standard output or something similar, but this is kind of hacky.
There are RecordReaders which read from SQL databases. There's also
something like this: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/streaming/StreamBaseRecordReader.html
which can be used with Hadoop streaming.
But this is all somewhat intensive and would require a bit of work (if
it's even possible) - I don't think Pig has direct support yet for the
kind of interface you're looking for.
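
As a rough illustration of the "data on disk" route described above, here is a minimal Java sketch that stages in-memory records to a temporary tab-separated file and builds the matching LOAD statement. The class name, method names, field names and schema are all made up for this example. It assumes Pig runs in local mode so that a local file path is readable, and that the statement would then be handed to an embedded PigServer via registerQuery, which the sketch itself does not do.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of Pig, just a staging step in front of LOAD.
public class InMemoryToPig {

    // Write each in-memory record as one tab-separated line. Tab is the
    // default field delimiter of Pig's default loader (PigStorage).
    static Path stage(List<String[]> records) throws IOException {
        Path file = Files.createTempFile("pig-input-", ".tsv");
        List<String> lines = new ArrayList<>();
        for (String[] fields : records) {
            lines.add(String.join("\t", fields));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
        return file;
    }

    // Build the Pig Latin statement that loads the staged file.
    // Assumption: local mode, so an absolute local path is acceptable.
    static String loadStatement(String alias, Path file, String schema) {
        return alias + " = LOAD '" + file.toAbsolutePath()
                + "' AS (" + schema + ");";
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for data produced inside the running Java process.
        List<String[]> records = Arrays.asList(
                new String[] {"sensor-a", "17"},
                new String[] {"sensor-b", "42"});

        Path staged = stage(records);
        String query = loadStatement("readings", staged,
                "name:chararray, value:int");

        // With Pig on the classpath, this string could be passed to
        // PigServer.registerQuery(...) in embedded mode (not done here).
        System.out.println(query);
    }
}
```

This still goes through the filesystem, so it is a workaround and not the JDBC-like interface being asked about.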
That being said, I'm somewhat new to Pig/Hadoop so if there's anyone
else who can chime in with comments or agreements/disagreements, I'd
On Fri, May 13, 2011 at 1:32 PM, Jianting Cao wrote:
Thank you, Mark. Sorry that I wasn't clear enough. What I want is this: there
are some programs running and generating a lot of data. Instead of putting
this data into a relational database, I want to output it directly to Pig
and do some analysis along the way or afterwards. So I'm asking if there is
a JDBC-like interface with which I could load this newly generated data
into Pig and analyze it. All of this is happening within a Java process.
On Fri, May 13, 2011 at 10:14 AM, Mark Laczin wrote:
Technically speaking, yes, you could store data in memory and keep it
there, then have your program present some interface to store data
(shared memory, reading from stdin, or something similar), but I'm not
sure why you'd want to do this.
Maybe I'm misunderstanding your question, but it sounds like you want
to run using a filesystem that's in memory as opposed to on disk.
On Fri, May 13, 2011 at 1:08 PM, Jianting Cao <email@example.com> wrote:
Is there only one way to load data into Pig, i.e. using the LOAD command to load
data from files? Can I load data from memory, for example by creating a table
in embedded code and storing data into it?