Good day everyone!

First, I want to congratulate the group on this wonderful project. It has
opened up new ideas and solutions in computing and technology. I'm
excited to learn more about it and discover the possibilities of Hadoop and
its components.

I have a question related to my studies. I'm currently pursuing a PhD in
Bioinformatics, and I'd like to ask whether you can give me a (rough) idea
of whether a Hadoop cluster can be used to perform DNA sequence alignment.
My basic idea is something like a string search over huge data files stored
in HDFS, with the application using MapReduce for the searching and
computing. As the Hadoop paradigm implies, it doesn't serve interactive
applications well, and I think this kind of searching is a "write-once,
read-many" application.

I hope you don't mind my question. It would be great to hear your comments
or suggestions about this.

Thanks and more power!

Franco


  • Bibek Paudel at Mar 29, 2011 at 1:10 pm

    Are you looking for something like a "distributed grep"? The Hadoop
    package comes with some examples, and 'grep' is one of them.

    Please see: http://wiki.apache.org/hadoop/Grep and
    http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html .

    Let us know if you are looking for something else.

    -b
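To make the "distributed grep" suggestion concrete, here is a minimal sketch of the map, shuffle, and reduce steps for counting regex matches in sequence records. It runs in a single Python process and only imitates what a cluster would do. The function names, the sample reads, and the motif pattern are all assumptions made for illustration; none of this is the code of the bundled Hadoop example.

```python
import re
from collections import defaultdict


def map_grep(record, pattern):
    """Map step: emit (matched_text, 1) for every match in one record."""
    _record_id, sequence = record  # the id is not needed for counting
    for match in pattern.finditer(sequence):
        yield match.group(0), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_count(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)


def distributed_grep(records, regex):
    """Run the three steps in-process and return {match: count}."""
    pattern = re.compile(regex)
    mapped = [pair for record in records for pair in map_grep(record, pattern)]
    return dict(reduce_count(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input: (read id, DNA sequence) pairs.
records = [
    ("read1", "ACGTTGACAGG"),
    ("read2", "TTGACCGTAC"),
    ("read3", "GGTTGACATTGACA"),
]
counts = distributed_grep(records, r"TTGAC[AC]")
print(counts)
```

On a real cluster each mapper would see only its own input split, and the framework would perform the grouping between the map and reduce phases.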


  • Kiss Tibor at Mar 29, 2011 at 1:33 pm
    Hi Franco,

    We are using Hadoop for next-gen sequence alignment.
    Earlier we had a solution built on a classic programming model, but we are
    currently upgrading our software services to an M/R model based on Hadoop.
    We have moved most of our classic algorithms to Hadoop, and I can say that
    everything is becoming more manageable.

    We are running Hadoop in the cloud and/or in the datacenter. Another
    challenge, especially in the cloud, is how to transfer the data, because
    in bioinformatics the amount of data is usually very large.
    I am currently working on an open-source version of Amazon multipart upload,
    which will be available in the next release of
    JClouds <http://code.google.com/p/jclouds/wiki/BlobStore>.
    Here are the starting
    ideas <http://www.slideshare.net/jclouds/big-data-in-real-life-a-study-on-s3-multipart-uploads> and
    also a sample
    client app <https://github.com/jclouds/jclouds-examples/tree/master/blobstore-largeblob>.
    If you want to follow new results on
    Twitter <http://twitter.com/#%21/tiborkisstibor>,
    you are invited. I plan to release a paper with the results of the data transfer
    operations based on this open-source approach.

    Also, we will soon release a version of our cloud-based service stack
    that is fully based on Hadoop.

    Tibor
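As a rough illustration of the idea behind a multipart upload, the sketch below splits a large payload into numbered parts, checksums each part, and reassembles the parts even when they arrive out of order. It is not the jclouds or Amazon S3 API: the function names, the part size, and the sample payload are assumptions chosen only to show the principle, and no network transfer takes place.

```python
import hashlib


def split_into_parts(data, part_size):
    """Cut data into (part_number, chunk) pairs of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    offsets = range(0, len(data), part_size)
    return [(number, data[offset:offset + part_size])
            for number, offset in enumerate(offsets, start=1)]


def checksum(chunk):
    """Per-part digest, so a damaged part can be re-sent on its own."""
    return hashlib.md5(chunk).hexdigest()


def reassemble(parts):
    """Concatenate the chunks in part-number order, whatever the arrival order."""
    ordered = sorted(parts, key=lambda part: part[0])
    return b"".join(chunk for _number, chunk in ordered)


# Hypothetical payload standing in for a large sequence file.
payload = b"ACGT" * 2500            # 10,000 bytes
parts = split_into_parts(payload, 4096)
digests = {number: checksum(chunk) for number, chunk in parts}

# Simulate parts arriving in reverse order, then verify and rebuild.
received = list(reversed(parts))
assert all(checksum(chunk) == digests[number] for number, chunk in received)
restored = reassemble(received)
print(len(parts), restored == payload)
```

Because each part is independent, parts can be sent in parallel and only a failed part needs to be retried, which is what makes this approach attractive for very large files.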

  • Luca Pireddu at Mar 29, 2011 at 1:39 pm

    On March 28, 2011 04:51:14 Franco Nazareno wrote:
    Good day everyone!
    And a good day to you Franco!
    The short answer is yes! At CRS4 we are working on this very problem.

    We have implemented a Hadoop-based workflow to perform short read alignment to
    support DNA sequencing activities in our lab. Its alignment operation is
    based on (and therefore equivalent to) BWA. We have written a paper about it
    which will appear in the coming months, and we are working on an open source
    release, but alas we haven't completed that task yet.

    We have also implemented a Hadoop-based distributed BLAST alignment program,
    in case you're working with long fragments. It's currently being used by our
    collaborators to align viral DNA segments.


    In either case, if you're interested we can let you have an advance release of
    either program so you can try them out.


    --
    Luca Pireddu
    CRS4 - Distributed Computing Group
    Loc. Pixina Manna Edificio 1
    Pula 09010 (CA), Italy
    Tel: +39 0709250452
  • Evert Lammerts at Mar 31, 2011 at 7:18 am

    Hi Luca,

    Could you send me an advance release of your software? I work for the Dutch national center for scientific computing, and I will give a workshop on Hadoop for bioinformaticians at a large bioinformatics conference (http://www.nbic.nl/about-nbic/nbic-conferences/nbic-conference-2011/). Lots of people there work with BWA- and BLAST-type applications (among others in the BBMRI project, which I think CRS4 is involved in as well). So BWA on Hadoop could be a great case study.

    Let me know!
    Cheers,
    Evert

  • Luca Pireddu at Mar 29, 2011 at 1:49 pm

    I'll add some relevant citations:

    An overview of the Hadoop/MapReduce/HBase framework and its current
    applications in bioinformatics
    http://www.biomedcentral.com/1471-2105/11/S12/S1


    Biodoop: Bioinformatics on Hadoop
    http://www.computer.org/portal/web/csdl/doi/10.1109/ICPPW.2009.37


    CloudBurst: highly sensitive read mapping with MapReduce
    http://bioinformatics.oxfordjournals.org/content/25/11/1363.short


    CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources
    for Bioinformatics Applications
    http://www.computer.org/portal/web/csdl/doi/10.1109/eScience.2008.62
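To give a feel for how read mapping can be phrased as a MapReduce job, here is a toy sketch of a seed-and-verify scheme: the map step emits k-mer seeds for both the reference and the reads, the shuffle groups them by seed, and the reduce step pairs each read with candidate reference positions, which are then checked by exact comparison. This is a simplified illustration only. It is not the algorithm of any of the cited tools, it is not BWA (which uses a Burrows-Wheeler index), and it ignores mismatches and gaps. All names and sample sequences are hypothetical, everything runs in one process, and reads are assumed to be at least k bases long.

```python
from collections import defaultdict


def map_reference(reference, k):
    """Map step for the reference: emit (seed, ("ref", position)) per k-mer."""
    for position in range(len(reference) - k + 1):
        yield reference[position:position + k], ("ref", position)


def map_read(read_id, read, k):
    """Map step for a read: emit its first k bases as the seed."""
    yield read[:k], ("read", read_id)


def shuffle(pairs):
    """Shuffle step: group emitted values by seed."""
    groups = defaultdict(list)
    for seed, value in pairs:
        groups[seed].append(value)
    return groups


def reduce_seed(values):
    """Reduce step: pair every read with every reference position of a seed."""
    positions = [v for tag, v in values if tag == "ref"]
    read_ids = [v for tag, v in values if tag == "read"]
    for read_id in read_ids:
        for position in positions:
            yield read_id, position


def align_exact(reference, reads, k):
    """Return {read_id: sorted positions where the whole read matches}."""
    pairs = list(map_reference(reference, k))
    for read_id, read in reads.items():
        pairs.extend(map_read(read_id, read, k))
    hits = defaultdict(list)
    for values in shuffle(pairs).values():
        for read_id, position in reduce_seed(values):
            # Verification: keep the candidate only if the full read matches.
            if reference.startswith(reads[read_id], position):
                hits[read_id].append(position)
    return {read_id: sorted(found) for read_id, found in hits.items()}


reference = "ACGTACGTTAGCACGTTAGG"
reads = {"r1": "ACGTTAG", "r2": "TAGCAC", "r3": "GGGGGG"}
alignments = align_exact(reference, reads, k=4)
print(alignments)
```

In a real job the verification would have to happen inside the reducer with the needed sequence context shipped alongside each seed, since no single node holds the whole reference in a shared variable as this sketch does.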


  • Tsz Wo (Nicholas), Sze at Mar 29, 2011 at 5:51 pm
    Hi Franco,

    I recall that there are some Hadoop-BLAST research projects. For example,
    see


    - http://www.cs.umd.edu/Grad/scholarlypapers/papers/MichaelSchatz.pdf
    - http://salsahpc.indiana.edu/tutorial/hadoopblast.html

    Nicholas



    ________________________________
    From: Franco Nazareno <franco.nazareno@gmail.com>
    To: common-user@hadoop.apache.org
    Sent: Sun, March 27, 2011 7:51:14 PM
    Subject: Hadoop for Bioinformatics

