FAQ
Hi!

We're trying to use Nutch as the web spider and Lucene as the indexer and searcher.
Nutch and Lucene each work well separately, but we haven't managed to point Lucene at the index created by Nutch.
We modified the file "configuring.jsp", setting "indexLocation" to the index created by Nutch, but we always get 0 results in the result list.

We can't find any documentation on how Nutch and Lucene work together.


Thanks for your help

Best Regards


Giovanni


  • Andrzej Bialecki at Jun 22, 2005 at 12:37 pm

    Giovanni Dima wrote:
    > Hi!
    >
    > We're trying to use Nutch as the web spider and Lucene as the indexer and searcher.
    > Nutch and Lucene each work well separately, but we haven't managed to point Lucene at the index created by Nutch.
    > We modified the file "configuring.jsp", setting "indexLocation" to the index created by Nutch, but we always get 0 results in the result list.
    >
    > We can't find any documentation on how Nutch and Lucene work together.

    Nutch already uses Lucene; in fact, it couldn't work without it.
    Nutch creates so-called "segments" (the meaning is different from Lucene
    segments), which can be indexed to produce Lucene indexes, which are
    then used for searching.

    I suggest using Luke (http://www.getopt.org/luke) to investigate such an
    index first, before using it in a separate application. Nutch uses a
    non-standard analyzer, so the terms in the index are sometimes
    different (e.g. some words are combined into bi-grams).
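
To make the bi-gram remark concrete, here is a minimal sketch in plain Java of an analyzer that fuses common words with the following word. It is a toy model, not Nutch's actual analyzer: the class name, the common-word list, and the "-" joiner are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: NOT Nutch's analyzer.
// The common-word list and the "-" joiner are assumptions.
final class ToyBigramAnalyzer {
    private static final Set<String> COMMON =
            new HashSet<>(Arrays.asList("the", "of", "a"));

    // Lowercases, splits on whitespace, and replaces each common word
    // with a bi-gram made of that word and the word that follows it.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return terms;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (COMMON.contains(tokens[i]) && i + 1 < tokens.length) {
                terms.add(tokens[i] + "-" + tokens[i + 1]);
            } else {
                terms.add(tokens[i]);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [the-history, history, of-pisa, pisa]
        System.out.println(analyze("The history of Pisa"));
    }
}
```

In this toy model a searcher that looks up the plain term "of" finds nothing, because only "of-pisa" was stored. The same kind of surprise can occur when an index is queried without knowing how its terms were produced.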

    --
    Best regards,
    Andrzej Bialecki <><
    ___. ___ ___ ___ _ _ __________________________________
    [__ || __|__/|__||\/| Information Retrieval, Semantic Web
    ___|||__|| \| || | Embedded Unix, System Integration
    http://www.sigram.com Contact: info at sigram dot com


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: [email protected]
    For additional commands, e-mail: [email protected]
  • Giovanni Dima at Jun 22, 2005 at 1:28 pm
    Andrzej, thanks for the reply.
    I'm sorry, but I have another (similar) question...
    Do Lucene and Nutch use the same parser and analyzer? As I understand it, the segments created by Nutch are different from those created by Lucene.
    I've installed Nutch and created the folders db and segments. Then, as explained in the tutorial, I created a new database, injected URLs into it, generated a fetchlist from the database, and indexed the segment with the command
    bin/nutch index ..
    May I use the index folder created this way as Lucene's "indexLocation"?
    If not, how can I make a valid index for Lucene?

    Thanks in advance.

    Giovanni
    Atitlan Engineering, Pisa
  • Andrzej Bialecki at Jun 22, 2005 at 3:37 pm

    Giovanni Dima wrote:
    > Andrzej, thanks for the reply. I'm sorry, but I have another (similar)
    > question... Do Lucene and Nutch use the same parser and analyzer?

    No, they don't.

    > As I understand it, the segments created by Nutch are
    > different from those created by Lucene.

    They are not simply different: they represent completely different data
    structures that only happen to share a name...

    > I've installed Nutch and created the folders db and segments. Then, as
    > explained in the tutorial, I created a new database, injected URLs into it,
    > generated a fetchlist from the database, and indexed the segment with
    > the command bin/nutch index .. May I use the index folder created this
    > way as Lucene's "indexLocation"? If not, how can I make a
    > valid index for Lucene?

    In the Lucene API there is no such thing as "indexLocation". You are probably
    referring to the search demo application included in the Lucene distribution.
    If that's the case, then the answer is no: it won't work with indexes
    created by Nutch. However, if you use the Lucene API, you can work with them
    just fine. They are normal Lucene indexes; only the field names and the
    analyzers used are different.

    I suggest you look inside the demo app and see how Lucene's API is
    used.
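
The point about field names can be illustrated with a small self-contained sketch. It is a toy inverted index in plain Java, not the Lucene API; the class name and the field names "content" and "contents" are placeholders chosen for illustration, not confirmed names from either project.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index keyed by field name, then by term. Not the Lucene API.
final class ToyFieldIndex {
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Indexes the lowercased, whitespace-separated words of text under one field.
    void add(int docId, String field, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new TreeSet<>())
                    .add(docId);
        }
    }

    // Looks a term up in exactly one field; an unknown field simply has no hits.
    Set<Integer> search(String field, String term) {
        Map<String, Set<Integer>> byTerm = postings.get(field);
        if (byTerm == null) {
            return Collections.emptySet();
        }
        Set<Integer> docs = byTerm.get(term);
        return docs == null ? Collections.<Integer>emptySet() : docs;
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.add(1, "content", "Nutch crawls the web"); // hypothetical field name
        System.out.println(index.search("contents", "nutch")); // prints []
        System.out.println(index.search("content", "nutch"));  // prints [1]
    }
}
```

Note that the lookup against the wrong field raises no error; it just returns an empty result, which matches the symptom of a search page that always shows 0 results.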

    --
    Best regards,
    Andrzej Bialecki <><


  • Giovanni Dima at Jun 23, 2005 at 4:01 pm
    Andrzej,
    I'm so sorry to keep disturbing you!

    I followed your suggestion and found this code:

        try {
            searcher = new IndexSearcher(
                IndexReader.open(indexName) // create an IndexSearcher for our page
            );
        } catch (Exception e) {

    IndexSearcher is a class of the Lucene API, isn't it?

    I used the index created by Nutch as "indexName". When I access the Lucene web page of my application, the system doesn't throw any exception, but the result list is always empty (for any search keyword).

    What's wrong?


    Thanks in advance.

    Giovanni
    Atitlan Engineering, Pisa
  • Andrzej Bialecki at Jun 23, 2005 at 4:16 pm

    Giovanni Dima wrote:
    > Andrzej, I'm so sorry to keep disturbing you!
    >
    > I followed your suggestion and found this code:
    >
    > try {
    >     searcher = new IndexSearcher(
    >         IndexReader.open(indexName) // create an IndexSearcher for our page
    >     );
    > } catch (Exception e) {
    >
    > IndexSearcher is a class of the Lucene API, isn't it?

    Yes.

    > I used the index created by Nutch as "indexName". When I access
    > the Lucene web page of my application, the system doesn't throw any
    > exception, but the result list is always empty (for any search
    > keyword).

    Lucene doesn't have any "web page": Lucene is a library. This means
    that you are using some kind of web application based on Lucene. The
    exact details of how a query is run against the index (e.g. the default
    field, the analyzer, etc.) depend on the application. You cannot just blindly
    use a specific application (which assumes specific things about the
    index) with any random index and hope that it works...

    Well, perhaps the only exception to that rule would be Luke
    (http://www.getopt.org/luke), which you should try anyway to get a
    better understanding of what's in the index.

    > What's wrong?

    What is the query? What is the analyzer?
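
Why the analyzer question matters can be shown with a toy model of index-time versus query-time analysis. This is a sketch in plain Java, not the Lucene API; the class and both "analyzers" are hypothetical and deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy model of index-time vs. query-time analysis. Not the Lucene API.
final class ToyAnalyzerMismatch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Assumed index-time analyzer: split on whitespace and lowercase.
    static List<String> lowercasing(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A different analyzer: split on whitespace but keep the original case.
    static List<String> caseKeeping(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Documents are always indexed with the lowercasing analyzer.
    void add(int docId, String text) {
        for (String term : lowercasing(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the documents containing every term the query analyzer produces.
    Set<Integer> search(String query, Function<String, List<String>> queryAnalyzer) {
        Set<Integer> hits = null;
        for (String term : queryAnalyzer.apply(query)) {
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<Integer>());
            if (hits == null) {
                hits = new TreeSet<>(docs);
            } else {
                hits.retainAll(docs);
            }
        }
        return hits == null ? new TreeSet<Integer>() : hits;
    }

    public static void main(String[] args) {
        ToyAnalyzerMismatch index = new ToyAnalyzerMismatch();
        index.add(1, "Nutch builds Lucene indexes");
        System.out.println(index.search("Lucene", ToyAnalyzerMismatch::caseKeeping)); // prints []
        System.out.println(index.search("Lucene", ToyAnalyzerMismatch::lowercasing)); // prints [1]
    }
}
```

The same query text finds the document only when it is analyzed the way the index was built, so an application has to know which analyzer produced the index it is searching.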


    --
    Best regards,
    Andrzej Bialecki <><



Discussion Overview
group: java-user
category: lucene
posted: Jun 22, 2005 at 11:04 am
active: Jun 23, 2005 at 4:16 pm
posts: 6
users: 2
website: lucene.apache.org
