Hello Hadoop Users,

A friend and I are looking for Hadoop-based project ideas as part of our
curriculum.

Can you give us some pointers, please?

Thanks in advance!

Regards,
~Sid~


  • Sudha sadhasivam at Oct 14, 2009 at 11:21 am
    Some possible projects include:
    1) Categorise URLs based on domains
    2) Content-based searching
    3) P2P information retrieval
    4) Performance enhancements in MapReduce
    5) Sort and shuffle optimisations in the MR framework
    6) Enhancements of scheduling strategies in Hadoop
    7) Document classification
    8) Document ranking

    In fact, all batch applications that can be parallelised are suitable for Hadoop.
    G Sudha Sadasivam
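
As an illustration of idea 1 in the list above (categorising URLs by domain), here is a minimal single-process sketch of the map, shuffle, and reduce steps. It is plain Python rather than Hadoop code, and the function names (`map_url`, `reduce_domain`, `categorise`) are made up for this example; a real job would express the same two steps through Hadoop's own APIs.

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_url(url):
    """Map step: emit a (domain, url) pair for one input URL."""
    domain = urlparse(url).netloc.lower()
    yield domain, url


def reduce_domain(domain, urls):
    """Reduce step: collect all URLs that share one domain."""
    return domain, sorted(urls)


def categorise(urls):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for url in urls:
        for domain, value in map_url(url):
            groups[domain].append(value)
    return dict(reduce_domain(d, v) for d, v in groups.items())
```

The grouping dictionary stands in for the shuffle phase, which on a cluster would route all pairs with the same domain key to the same reducer.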



    --- On Wed, 10/14/09, Siddu wrote:


    From: Siddu <siddu.sjce@gmail.com>
    Subject: Project ideas !
    To: common-user@hadoop.apache.org
    Cc: core-user@hadoop.apache.org
    Date: Wednesday, October 14, 2009, 3:38 PM


  • Tim robertson at Oct 14, 2009 at 11:35 am
    I would like to see more spatial processing carried out on Hadoop.
    I have done basic spatial joins intersecting hundreds of millions of
    points with hundreds of thousands of polygons, but that is all. It's
    something I'd like to spend time researching, but I don't have the
    time... It could be a nice piece of research, since everybody loves maps.

    Cheers,
    Tim
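
For readers unfamiliar with the term, a basic point-in-polygon spatial join can be sketched as below. This is only an illustration, not the code used in the work described above: it assumes simple polygons given as lists of (x, y) vertices, uses the standard ray-casting test, and compares every point with every polygon, which is exactly the cost that makes the problem interesting at scale. The names `point_in_polygon` and `spatial_join` are invented for this example.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices. Points lying exactly on an
    edge are not treated specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons):
    """Naive join: yield (point_id, polygon_id) for every containing pair."""
    for point_id, point in points.items():
        for polygon_id, polygon in polygons.items():
            if point_in_polygon(point, polygon):
                yield point_id, polygon_id
```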








  • Siddu at Oct 17, 2009 at 9:59 am

    On Wed, Oct 14, 2009 at 5:05 PM, tim robertson wrote:

    I am interested to see more spatial processing carried out on hadoop.
    I have done basic spatial joins intersecting hundreds of millions of
    points with hundreds of thousands of polygons but this is all. It's
    something I'd like to spend time researching, but don't have that
    time... could be a nice piece of research since everybody loves maps.

    Yes Tim, that sounds interesting. Do you have a link to anything of yours
    detailing the work?
    I would be happy to go through it!





    --
    Regards,
    ~Sid~
    I have never met a man so ignorant that I couldn't learn something from him.
  • Sudha sadhasivam at Oct 17, 2009 at 3:25 pm
    Have any publications been done in this area?
    G Sudha Sadasivam

    --- On Sat, 10/17/09, Siddu wrote:


    From: Siddu <siddu.sjce@gmail.com>
    Subject: Re: Project ideas !
    To: common-user@hadoop.apache.org
    Date: Saturday, October 17, 2009, 3:29 PM

  • Tim robertson at Oct 18, 2009 at 7:00 pm

    Have any publications been done in this area? (The spatial processing on Hadoop)
    G Sudha Sadasivam

    I saw Chris, who does Cascading, tweet recently about someone building
    R-tree indexes in Hadoop, and I wanted to follow up with him about who
    was doing that.

    For my part, I very hastily wrote a blog post
    (http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html)
    but really I did not get far. I could imagine running an initial MR
    job on each side of a huge join before actually doing the join, to
    determine the best join order and potentially a (spatial)
    partitioning strategy (e.g. build some R-trees, or perhaps subset the
    data for different areas of the world and do the join in multiple
    jobs), and then using the output of this analysis stage to actually
    run the join.

    There are some nice species observation and specimen datasets
    (hundreds of millions of point-based records) that we often want to
    join with polygon datasets (e.g. protected areas of the world). If you
    wanted a real-world dataset and fancied doing something worthwhile
    (helping protect and understand species), it could be arranged.

    Cheers,
    Tim
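
The partitioning idea in the post above can be sketched as follows. This is a simplified stand-in, not the approach from the blog post: it assumes a fixed uniform grid instead of R-trees, a made-up cell size (`CELL`), and invented helper names. Each record is keyed by grid cell in a "map" step, and the point-in-polygon test runs only between records that land in the same cell, which is where a reducer would do its work.

```python
import math
from collections import defaultdict

CELL = 10.0  # hypothetical grid cell size, in the same units as the coordinates


def cell_of(x, y):
    """Grid cell (column, row) that contains the coordinate."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def cells_for_polygon(polygon):
    """All grid cells overlapped by the polygon's bounding box."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    cx0, cy0 = cell_of(min(xs), min(ys))
    cx1, cy1 = cell_of(max(xs), max(ys))
    return [(cx, cy) for cx in range(cx0, cx1 + 1) for cy in range(cy0, cy1 + 1)]


def contains(polygon, x, y):
    """Ray-casting point-in-polygon test for a list of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def partitioned_join(points, polygons):
    """Join points to containing polygons, comparing only within a grid cell."""
    # "Map" phase: key every record by the grid cell(s) it touches.
    buckets = defaultdict(lambda: ([], []))
    for point_id, (x, y) in points.items():
        buckets[cell_of(x, y)][0].append(point_id)
    for polygon_id, polygon in polygons.items():
        for cell in cells_for_polygon(polygon):
            buckets[cell][1].append(polygon_id)
    # "Reduce" phase: test only the pairs that share a cell.
    matches = set()
    for point_ids, polygon_ids in buckets.values():
        for point_id in point_ids:
            for polygon_id in polygon_ids:
                if contains(polygons[polygon_id], *points[point_id]):
                    matches.add((point_id, polygon_id))
    return matches
```

Because each point falls in exactly one cell, no pair is tested twice; polygons are replicated to every cell their bounding box overlaps, which is the price paid for keeping each reducer's work local.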






  • Patterson, Josh at Oct 14, 2009 at 3:19 pm
    Siddu,
    If this is for an undergraduate class, I would suggest something that
    allows you to get some work in with basic data structures such as
    building an inverted index over a few million documents (maybe Wikipedia
    pages?). You will also need to get a general feel for Hadoop.
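
To give a feel for the inverted-index suggestion, here is a small single-process sketch of the map and reduce logic. It is plain Python, not Hadoop code; the tokenisation rule (lowercase runs of letters and digits) and the function names are assumptions made for this example.

```python
import re
from collections import defaultdict


def map_doc(doc_id, text):
    """Map step: emit (term, doc_id) once per distinct term in a document."""
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id


def build_index(docs):
    """Group the emitted pairs by term and return sorted posting lists."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term, emitted_id in map_doc(doc_id, text):
            postings[term].add(emitted_id)
    # Reduce step: one sorted list of document ids per term.
    return {term: sorted(ids) for term, ids in postings.items()}
```

Looking up a term in the returned dictionary gives the ids of the documents that contain it, which is the core data structure behind keyword search.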

    The University of Washington has some really nice project ideas for
    their distributed systems class:

    http://www.cs.washington.edu/education/courses/cse490h/09wi/projects/490H.project.ideas.pdf

    If you wanted to tackle something a little more advanced, then you could
    take a look at Pete Skomoroch's article on finding trends with Hadoop
    and Hive:

    http://www.cloudera.com/blog/2009/07/31/tracking-trends-with-hadoop-and-hive-on-ec2/

    http://www.cloudera.com/blog/2009/09/28/grouping-related-trends-with-hadoop-and-hive/

    Things to keep in mind:

    1.) Hadoop won't be as simple as writing a single Java app
    2.) There will be some overhead involved in rewriting algorithms in
    MapReduce
    3.) There will also be some overhead involved in setup and maintenance
    of the Hadoop cluster

    Take these three things into account when planning how to manage your
    time for the project during the semester. Semesters can seem a lot
    shorter when you spend too much time on things not related to
    implementing and testing your algorithm.

    Good luck!

    Josh Patterson
    TVA



  • Amund Tveit at Oct 17, 2009 at 9:13 pm
    2009/10/14 Patterson, Josh <jpatterson0@tva.gov>
    If you wanted to tackle something a little more advanced, then you could
    take a look at Pete Skomoroch's article on finding trends with Hadoop
    and Hive:

    Related: CUSUM charts are used for interpreting (noisy) time series
    (e.g. collected from sensor measurements); perhaps these could be created
    efficiently with Hadoop?
    http://www.variation.com/cpa/help/hs108.htm
    http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc323.htm
    Note: R, Pig, and Hive could be relevant to (or overlap with) this.
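
As a rough sketch of the CUSUM idea mentioned above, the function below computes the usual tabular (two-sided) CUSUM for a single series and reports where either cumulative sum exceeds a decision threshold. The parameter names (`target`, `k` for the allowance, `h` for the threshold) follow common textbook usage and are assumptions here; in a Hadoop setting one could imagine grouping readings by sensor in the shuffle and running a function like this per sensor in the reducer.

```python
def cusum(values, target, k, h):
    """Tabular CUSUM: return the indices at which an alarm is raised.

    target: expected mean of the series
    k:      allowance (slack) subtracted from each deviation
    h:      decision threshold for the cumulative sums
    """
    s_hi = 0.0  # accumulates upward deviations from the target
    s_lo = 0.0  # accumulates downward deviations from the target
    alarms = []
    for i, x in enumerate(values):
        s_hi = max(0.0, s_hi + (x - target - k))
        s_lo = max(0.0, s_lo + (target - k - x))
        if s_hi > h or s_lo > h:
            alarms.append(i)
    return alarms
```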

    Amund
    http://atbrox.com

Discussion Overview
group: common-user @
categories: hadoop
posted: Oct 14, '09 at 10:09a
active: Oct 18, '09 at 7:00p
posts: 8
users: 5
website: hadoop.apache.org...
irc: #hadoop
