Hello All,

I am planning to start a project in which I have to store a large number of XML
and text files. On top of that, I have to implement an efficient algorithm for
searching over thousands or millions of files, and also build indexes to
make searches faster the next time.

I looked into Oracle Database, but it delivered very poor results. Can I use
Hadoop for this? Which Hadoop project would be the best fit?

Is there anything from Google I can use?

Thanks a lot in advance.
--
View this message in context: http://old.nabble.com/trying-to-select-technology-tp31743063p31743063.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


  • Jagaran das at May 31, 2011 at 6:16 pm
    Take a look at Lucene and Apache Solr.

    Cheers,
    Jagaran



    ________________________________
    From: cs230 <chintanjshah@gmail.com>
    To: core-user@hadoop.apache.org
    Sent: Tue, 31 May, 2011 10:50:49 AM
    Subject: trying to select technology


  • Matthew Foley at May 31, 2011 at 6:57 pm
    Sounds like you're looking for a full-text inverted index. Lucene is a good open-source implementation of that. I believe it has an option for storing the original full text as well as the indexes.
    --Matt

  • Ted Dunning at May 31, 2011 at 7:00 pm
    To pile on, thousands or millions of documents are well within the range
    that Lucene handles comfortably.

    Solr may be an even better option than bare Lucene since it handles lots of
    the boilerplate problems like document parsing and index update scheduling.

  • Jane Chen at Jun 1, 2011 at 4:20 am
    Hi,

    I think you should check out MarkLogic, a product with database and search capabilities especially designed for XML and unstructured data. We also allow you to run Hadoop MapReduce jobs on top of data stored in MarkLogic.

    For more information on MarkLogic, please check out:
    http://www.marklogic.com/products/overview.html

    Thanks,
    Jane

  • Medcl at Jun 1, 2011 at 5:58 am
    My suggestion: ElasticSearch (http://elasticsearch.org)


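
For readers unfamiliar with the "full-text inverted index" that Matthew Foley mentions, here is a minimal Python sketch of the idea. It is not how Lucene or Solr are implemented; the class name InvertedIndex, the helper extract_text, and the sample file names are all hypothetical and chosen only for illustration. It maps each term to the set of files containing it, keeps the original text alongside the index, and strips XML markup before indexing, which is roughly the kind of document parsing Ted Dunning says Solr handles for you.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_text(name, raw):
    """Return searchable text: element text for .xml files, raw text otherwise."""
    if name.endswith(".xml"):
        # itertext() yields the text inside elements, skipping tag names.
        return " ".join(ET.fromstring(raw).itertext())
    return raw


class InvertedIndex:
    """Toy full-text index: maps each term to the files that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of file names
        self.stored = {}                  # file name -> original content

    @staticmethod
    def tokenize(text):
        # Lowercase, then split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, name, raw):
        # Keep the original content and index only the extracted text.
        self.stored[name] = raw
        for term in self.tokenize(extract_text(name, raw)):
            self.postings[term].add(name)

    def search(self, query):
        """Return the sorted names of files containing every query term."""
        terms = self.tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = InvertedIndex()
index.add("a.txt", "Hadoop stores large files")
index.add("b.xml", "<doc><title>Lucene</title><body>indexes text files</body></doc>")

print(index.search("files"))         # ['a.txt', 'b.xml']
print(index.search("lucene files"))  # ['b.xml']
```

A lookup touches only the posting sets for the query terms instead of scanning every file, which is why this structure scales to the "thousands or millions of files" in the question. A real engine adds ranking, on-disk segments, and incremental updates on top of it.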

