I think this is an interesting project, but it is not core to "Pig" itself --
it may be more interesting / viable as a standalone project on GitHub that
uses Pig to implement graph algorithms.
At this point in its development, I feel that Pig needs to concentrate on
doing the things it already does, and doing them better (operator efficiency,
storage efficiency, better MR plan generation, etc.), rather than expanding
into specific verticals; we should allow our users to create their own
solution suites that use Pig for specific purposes. A successful example of
such a standalone project is PacketPig
(https://github.com/packetloop/packetpig), a PCAP network capture analysis
tool.
D
On Tue, Apr 2, 2013 at 9:48 AM, burakkk wrote:
I know that, but Giraph uses BSP. What I'm describing is a shared-nothing
model, except for the reducers. Besides, I don't want to split up the
iteration: one phase is still responsible for the whole iteration, and
every distinct origin vertex will be processed in parallel.
Thanks
Best regards...
--
BURAK ISIKLI | http://burakisikli.wordpress.com
On Tue, Apr 2, 2013 at 7:20 PM, Gianmarco De Francisci Morales <gdfm@gdfm.me> wrote:
FYI, Giraph has a Random Walk implementation.
Pig does not support iteration natively, so any iterative algorithm is not
a very good fit for it. Just my 2c.
Cheers,
--
Gianmarco
On Tue, Apr 2, 2013 at 10:04 AM, burakkk wrote:
So what do you suggest? Is it clear?
On Mon, Apr 1, 2013 at 9:35 PM, burakkk wrote:
I'm using only the WTF graph representation so that the graph fits in
memory. By the way, I haven't seen any explanation of WTF graph models on
the Pig 0.11 release page.
I don't want to use Cassovary. I believe it can be done with Pig. I'll
implement a graph representation in Pig based on the WTF paper, and then
I'll use it to implement a random walk algorithm. To do that I may need to
improve some features such as joins (fuzzy join), or implement a new
operator. I can implement it using either existing operators or new
operators. That's up to us and it doesn't really matter. If there is
already an implementation of the random walk algorithm, please feel free to
tell me, because I haven't found one.
Are you proposing to create an open-source implementation of those
algorithms?
Yes, I'm proposing to implement a random walk algorithm and a new data
model that represents a graph. After that, people can use it when coding in
Pig.
Do you suggest they should be Pig scripts added to the Pig project, or do
you want to create some new operators?
Maybe, it can be a UDF or a new operator.
I made a quick example. It may not be completely accurate; I've just tried
to explain the idea.
Suppose you have a graph file like this:
user_id follower
1 2
1 3
1 10
2 3
3 4
3 5
...
The vertex list is an array of sorted vertex ids.
The node list is a matrix holding each vertex id and its starting position.
-- load the graph file
graph = LOAD 'graph' USING PigStorage() AS (vertex:int, follower:int);
vertex = COGROUP graph BY (vertex);
-- load all the vertices from HDFS into memory
list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) AS vertexList;
-- load all the vertices from HDFS into memory
list = FOREACH graph GENERATE org.apache.pig.generateNode(list) AS nodeList;
-- generate a score: using the node list, you can traverse the graph to
-- your finishing position
randomWalk = FOREACH vertex GENERATE
    FLATTEN(org.apache.pig.RandomWalk(list, endVertex)) AS score;
store...
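
A minimal Python sketch of the layout described above, with the vertex list as a sorted array of ids and the node list as each vertex's starting position in one flat array of followers. The function names (build_csr, neighbors, random_walk_scores), the restart probability, and the use of visit counts as the "score" are assumptions made for illustration; none of them come from the WTF paper or from Pig.

```python
import random
from bisect import bisect_left
from collections import Counter


def build_csr(edges):
    """Build the compact layout from (vertex, follower) pairs.

    Returns (vertex_ids, offsets, targets):
      vertex_ids: sorted ids of vertices that have outgoing edges
      offsets[i]: starting position of vertex_ids[i]'s edges in targets
                  (one extra trailing entry equal to len(targets))
      targets:    all follower ids, grouped by source vertex
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    vertex_ids = sorted(adjacency)
    offsets, targets = [], []
    for v in vertex_ids:
        offsets.append(len(targets))
        targets.extend(sorted(adjacency[v]))
    offsets.append(len(targets))
    return vertex_ids, offsets, targets


def neighbors(csr, v):
    """Return the followers of v, or [] if v has no outgoing edges."""
    vertex_ids, offsets, targets = csr
    i = bisect_left(vertex_ids, v)  # binary search in the sorted vertex list
    if i == len(vertex_ids) or vertex_ids[i] != v:
        return []
    return targets[offsets[i]:offsets[i + 1]]


def random_walk_scores(csr, origin, steps, restart=0.15, seed=0):
    """Walk from origin and count how often each vertex is reached.

    Assumption: the walk jumps back to origin at a dead end, or with
    probability `restart` at every step.
    """
    rng = random.Random(seed)
    visits = Counter()
    current = origin
    for _ in range(steps):
        out = neighbors(csr, current)
        if not out or rng.random() < restart:
            current = origin
        else:
            current = rng.choice(out)
            visits[current] += 1
    return visits
```

Keeping the vertex ids sorted lets a lookup use binary search instead of a hash table, which is one way to keep the in-memory footprint small.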
Thanks
Best Regards...
On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:
I'm somewhat familiar with WTF code (my day job is managing the analytics
infrastructure team at Twitter). WTF is implemented using Pig 0.11 (in fact
some of the Pig 11 features/improvements are directly due to this
project...), and mostly has to do with clever algorithms implemented in Pig
(an earlier version of WTF loaded the graph into main memory on large-mem
machines -- that system is open sourced, too, under
github.com/twitter/cassovary). Are you proposing to create an open-source
implementation of those algorithms? Do you suggest they should be Pig
scripts added to the Pig project, or do you want to create some new
operators? I'm not totally sure where you are going here.
GSoC proposals for Pig are usually made by students who want to work on
issues labeled as GSoC candidates on the apache jira. The students spend
some time to understand the problem stated in the jira, familiarize
themselves with the existing codebase, and put a basic technical
implementation plan and schedule into their proposal. Since in this case
you are proposing something we haven't scoped or defined well for
ourselves, we need you to be very clear and specific about what you are
trying to do, and how you plan to go about it. I think that Graph
processing in Pig (or other Hadoop-based systems) is a really interesting
topic and there is a lot of work to be done, but we really need you to be
far more detailed to be able to give you good guidance with regards to
GSoC.
Best,
Dmitriy
On Sat, Mar 30, 2013 at 10:12 AM, burakkk <burak.isikli@gmail.com> wrote:
Sure. We can implement a graph model based on the "WTF: The Who to Follow
Service at Twitter" article. The article says that, represented this way,
the graph can be stored in one machine's memory, so every node will read
the graph from HDFS and cache it in memory. Every node is responsible for
processing its own bucket of edges; I mean the work can be split. Every
node can process its bucket using a random walk algorithm, for instance.
Finally, the partial results can be reduced to get the final results. I
hope it's clear :)
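
A minimal Python sketch of the split-and-reduce flow described above. Origin vertices are divided into buckets, each bucket is walked against the same in-memory graph, and the partial visit counts are summed at the end. The round-robin bucketing, the fixed walk length, the jump back to the origin at a dead end, and all function names are assumptions for illustration; on a cluster each bucket would be a separate task, which the plain list comprehension here only stands in for.

```python
import random
from collections import Counter
from functools import reduce


def walk_from(adjacency, origin, steps, rng):
    """Count the vertices reached by one fixed-length walk from origin."""
    visits = Counter()
    current = origin
    for _ in range(steps):
        out = adjacency.get(current, [])
        if not out:
            current = origin  # dead end: jump back to the origin
            continue
        current = rng.choice(out)
        visits[current] += 1
    return visits


def split_into_buckets(vertices, n_buckets):
    """Assign origin vertices to buckets round-robin."""
    return [vertices[i::n_buckets] for i in range(n_buckets)]


def process_bucket(adjacency, bucket, steps, seed):
    """One worker: walk from every origin in its bucket, using the cached graph."""
    rng = random.Random(seed)
    partial = Counter()
    for origin in bucket:
        partial.update(walk_from(adjacency, origin, steps, rng))
    return partial


def run(adjacency, n_buckets=2, steps=100):
    """Walk every bucket, then reduce the partial counts into one result."""
    origins = sorted(adjacency)
    buckets = split_into_buckets(origins, n_buckets)
    partials = [
        process_bucket(adjacency, bucket, steps, seed=i)
        for i, bucket in enumerate(buckets)
    ]
    # Reduce step: sum the per-bucket visit counts for each vertex.
    return reduce(lambda a, b: a + b, partials, Counter())
```

Because each bucket only reads the shared graph and writes its own counter, the buckets do not need to communicate until the final reduce.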
Thanks
Best Regards...
On Fri, Mar 29, 2013 at 6:10 PM, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:
Hi Burakk,
The general idea of making graph processing easier is a good one. I'm not
sure what exactly you are proposing to do, though. Could you be more
detailed about what you are thinking?
On Thu, Mar 28, 2013 at 1:28 PM, burakkk <burak.isikli@gmail.com> wrote:
Hi,
I might be a little bit late; I came up with a new idea at the last minute.
I'm currently working on social graph processing, and I think we can
implement a solution for Pig. With this idea I'm thinking of applying to
GSoC 2013 so that I can work on some tasks around it. Is there any mentor
who would do it with me? Any suggestions? :)
Details:
Of course I can improve some join operations. I'm not sure whether there is
any implementation of fuzzy joins, for instance. These are the papers that
I found:
Fuzzy Joins Using MapReduce
http://ilpubs.stanford.edu:8090/1006/
Dimension independent similarity computation
http://arxiv.org/abs/1206.2082
MapReduce is Good Enough? If All You Have is a Hammer, Throw Away
Everything That’s Not a Nail!
http://arxiv.org/pdf/1209.2191.pdf
Large Graph Processing in the Cloud
http://www.ntu.edu.sg/home/bshe/sigmod10_demo.pdf
..etc
Thanks
Best regards..