FAQ
Hi,

We're interested in using Hadoop for our application for purposes of replication and distribution of query execution. But I have some questions as to whether it's a good fit. We have essentially written a search engine using Jena (Semantic Web framework) and its accompanying Lucene interface called LARQ (Lucene ARQ) to allow for free-text search over the RDF graphs stored in Jena.

We expect the Lucene indexes to get very large, hence the need for Hadoop. I have gone through the documentation provided on the site, but we want to clarify some points that we could not answer from the wiki, FAQ, etc.:

1. We're not using Nutch, but the documentation seems to reference it frequently. Is this a problem? Can Lucene indexes alone be used with Hadoop without using Nutch?

2. Are there any best practices for using Hadoop behind such a setup for creating, querying, and managing the Lucene indexes? I found this thread ( http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg00573.html ), but could use some clarification on several of the points mentioned.

3. How does Hadoop access, process, and replicate the Lucene indexes if we generate them in our local file system rather than in HDFS?

4. Please describe the standard flow of execution, showing how Hadoop works when Lucene is queried.


Thanks,
Vinaya




  • Vinaya Shastrakar at Apr 18, 2007 at 8:05 am

group: common-user @
categories: hadoop
posted: Apr 18, '07 at 8:01a
active: Apr 18, '07 at 8:05a
posts: 2
users: 1
website: hadoop.apache.org...
irc: #hadoop
