Grokbase Groups Pig user April 2013
I'm somewhat familiar with WTF code (my day job is managing the analytics
infrastructure team at Twitter). WTF is implemented using Pig 0.11 (in fact
some of the Pig 0.11 features/improvements are directly due to this
project...), and mostly has to do with clever algorithms implemented in Pig
(an earlier version of WTF loaded the graph into main memory on large-mem
machines -- that system is open sourced, too, under
github.com/twitter/cassovary). Are you proposing to create an open-source
implementation of those algorithms? Do you suggest they should be Pig
scripts added to the Pig project, or do you want to create some new
operators? I'm not totally sure where you are going here.

GSoC proposals for Pig are usually made by students who want to work on
issues labeled as GSoC candidates on the Apache JIRA. The students spend
some time to understand the problem stated in the jira, familiarize
themselves with the existing codebase, and put a basic technical
implementation plan and schedule into their proposal. Since in this case
you are proposing something we haven't scoped or defined well for
ourselves, we need you to be very clear and specific about what you are
trying to do, and how you plan to go about it. I think that graph
processing in Pig (or other Hadoop-based systems) is a really interesting
topic and there is a lot of work to be done, but we really need you to be
far more detailed to be able to give you good guidance with regards to GSoC.

Best,
Dmitriy

On Sat, Mar 30, 2013 at 10:12 AM, burakkk wrote:

Sure. We can implement a graph model based on the article "WTF: The Who to
Follow Service at Twitter". The article says that the graph can be stored in
one machine's memory, so every node can read the graph from HDFS and cache
it in memory. Each node is responsible for processing its own bucket of
edges; I mean the work can be split. Each node can process its bucket
using a random walk algorithm, for instance. Finally, the partial results
can be reduced to get the final results. I hope it's clear :)
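
As a rough illustration of the scheme described above (not the actual WTF or
Cassovary implementation), here is a minimal Python sketch. It assumes integer
node IDs, a toy adjacency-list graph, and a modulo bucketing rule; the function
names partition, walk_bucket, and reduce_counts are hypothetical. Every worker
holds the whole graph in memory, runs random walks only from the start nodes
in its bucket, and the per-bucket visit counts are summed at the end.

```python
import random
from collections import Counter


def partition(nodes, num_buckets):
    """Assign each start node to a bucket (assumed modulo scheme)."""
    buckets = [[] for _ in range(num_buckets)]
    for node in nodes:
        buckets[node % num_buckets].append(node)
    return buckets


def walk_bucket(graph, bucket, walks_per_node, walk_length, seed):
    """'Map' step: a worker with the full graph cached in memory runs
    random walks only from the start nodes in its own bucket."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    visits = Counter()
    for start in bucket:
        for _ in range(walks_per_node):
            current = start
            for _ in range(walk_length):
                neighbors = graph.get(current, [])
                if not neighbors:
                    break  # dead end: stop this walk early
                current = rng.choice(neighbors)
                visits[current] += 1
    return visits


def reduce_counts(partials):
    """'Reduce' step: sum the per-bucket visit counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Toy directed graph (illustrative data only): node -> followed nodes.
graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}

buckets = partition(graph.keys(), num_buckets=2)
partials = [
    walk_bucket(graph, bucket, walks_per_node=10, walk_length=5, seed=i)
    for i, bucket in enumerate(buckets)
]
totals = reduce_counts(partials)
```

In a real Hadoop or Pig job, each call to walk_bucket would run on a separate
machine and the reduce step would be a group-and-sum over node IDs; here the
loop simply stands in for that.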

Thanks
Best Regards...

On Fri, Mar 29, 2013 at 6:10 PM, Dmitriy Ryaboy wrote:

Hi Burak,
The general idea of making graph processing easier is a good one. I'm not
sure what exactly you are proposing to do, though. Could you be more
detailed about what you are thinking?

On Thu, Mar 28, 2013 at 1:28 PM, burakkk wrote:

Hi,
I might be a little late; I came up with a new idea at the last minute.
I'm currently working on social graph processing, and I think we can
implement a solution for it in Pig. I'd like to apply to GSoC 2013 with
this idea so that I can work on some related tasks. Is there a mentor who
could work on it with me? Are there any suggestions? :)

Details:
Of course, I could also improve some join operations. I'm not sure whether
there is any implementation of fuzzy joins, for instance. These are the
papers that I found:

Fuzzy Joins Using MapReduce
http://ilpubs.stanford.edu:8090/1006/

Dimension independent similarity computation
http://arxiv.org/abs/1206.2082

MapReduce is Good Enough? If All You Have is a Hammer, Throw Away
Everything That’s Not a Nail!
http://arxiv.org/pdf/1209.2191.pdf

Large Graph Processing in the Cloud
http://www.ntu.edu.sg/home/bshe/sigmod10_demo.pdf

..etc

Thanks
Best regards..


--

BURAK ISIKLI | http://burakisikli.wordpress.com



Discussion Overview
group: user @
categories: pig, hadoop
posted: Apr 1, '13 at 4:20p
active: Apr 9, '13 at 7:11a
posts: 8
users: 5
website: pig.apache.org
