Yes, Erik, I'm instantiating a new IndexSearcher for every search.


-----Original Message-----
From: Erik Hatcher
Sent: May 18, 2006 12:08
To: java-user@lucene.apache.org
Subject: Re: SV: Sort problematics

On May 18, 2006, at 4:52 AM, Marcus Falck wrote:
I have slow subsequent searches.
And if I get the cache up and running, is it persisted to disk?
No, Lucene's caches are not persisted; they live only in RAM. Are you using a
new IndexReader/IndexSearcher for your subsequent searches? If so,
you're likely not leveraging any caches at all.

Erik


/Marcus


________________________________

From: Yonik Seeley
Sent: Wed 2006-05-17 16:31
To: java-user@lucene.apache.org
Subject: Re: Sort problematics


On 5/17/06, Marcus Falck wrote:
I noticed something quite interesting: if I search for IndexId:x
(IndexId is unique) with a sort, it still takes a very long time,
which it doesn't without the sort.
This will only be the case the first time you sort on a field because
a FieldCache entry is created for that field and then cached for
subsequent sorts.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene
search server
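
The first-sort cost described above can be illustrated with a minimal sketch. It does not use Lucene itself: `LazySortCache`, its in-memory list of documents, and the `loads` counter are hypothetical stand-ins, assumed only for illustration. The idea is that the first sort on a field pays for one pass over every document to collect that field's values, and later sorts on the same field reuse the array.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an index reader with a per-field sort cache. */
class LazySortCache {
    /** One field -> value map per document; the list position is the document id. */
    private final List<Map<String, String>> docs;
    /** Field name -> values for all documents, filled on the first sort on that field. */
    private final Map<String, String[]> cache = new HashMap<>();
    /** Counts full passes over the documents; present only to make the effect visible. */
    int loads = 0;

    LazySortCache(List<Map<String, String>> docs) {
        this.docs = docs;
    }

    /** Returns the cached values for a field, loading them from every document if absent. */
    private String[] valuesFor(String field) {
        String[] values = cache.get(field);
        if (values == null) {
            loads++; // the expensive part: touches every document, not just the hits
            values = new String[docs.size()];
            for (int i = 0; i < docs.size(); i++) {
                values[i] = docs.get(i).get(field);
            }
            cache.put(field, values);
        }
        return values;
    }

    /** Sorts the given document ids by the field value, ascending, missing values first. */
    List<Integer> sortHits(List<Integer> hits, String field) {
        String[] values = valuesFor(field);
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Integer id) -> values[id],
                Comparator.nullsFirst(Comparator.naturalOrder())));
        return sorted;
    }
}
```

Because the cache in this sketch belongs to the object, throwing the object away and building a new one for each search means every sort is a "first" sort, even when the query matches a single document.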

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Erik Hatcher at May 18, 2006 at 10:52 am

    On May 18, 2006, at 6:41 AM, Marcus Falck wrote:
    Yes, Erik, I'm instantiating a new IndexSearcher for every search.
    Then don't :) You only need a new IndexSearcher instance when the
    index itself has changed.
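
A minimal sketch of that advice, assuming a hypothetical `SearcherHolder` that is not part of Lucene. The searcher type is left generic, and the "index version" is whatever number the application uses to detect that the index changed. Lucene's IndexReader may be able to supply such a version, but that is an assumption to check against the API docs for your release.

```java
import java.util.function.Function;

/** Hands out one shared searcher and reopens it only when the index version changes. */
class SearcherHolder<S> {
    /** Hypothetical factory: given the current index version, opens a searcher. */
    private final Function<Long, S> opener;
    private S current;
    private long openedVersion;
    /** Counts how many searchers were opened; present only to make reuse visible. */
    int opens = 0;

    SearcherHolder(Function<Long, S> opener) {
        this.opener = opener;
    }

    /** Returns the shared searcher, opening a new one only if the index has changed. */
    synchronized S get(long indexVersion) {
        if (current == null || indexVersion != openedVersion) {
            current = opener.apply(indexVersion);
            openedVersion = indexVersion;
            opens++;
        }
        return current;
    }
}
```

Every search would call `get(...)` instead of constructing a searcher, so caches built by earlier searches stay warm until the index really changes. The sketch leaves out closing the old searcher, which a real application would have to do once in-flight searches have finished.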

  • Marcus Falck at May 18, 2006 at 9:01 pm
    Well that book is cool =)


    ________________________________

    From: Erik Hatcher
    Sent: Thu 2006-05-18 22:56
    To: java-user@lucene.apache.org
    Subject: Re: SV: Sort problematics



    On May 18, 2006, at 4:25 PM, Marcus Falck wrote:
    Where can I read more about the Lucene sort implementation?
    Is there any documentation on sorting besides the
    Lucene API docs?
    Well, there is "Lucene in Action" which covers sorting in a fair bit
    of detail. I hear that book is pretty cool ;)

    Erik


  • Marcus Falck at May 18, 2006 at 10:23 pm
    Hi Gunther.

    We thought in terms of an index containing the search profiles, and searching that index using the documents as queries. But we couldn't really figure it out. We have an alert service up and running today using Verity's implementation of alerts. So we looked at the Verity documentation and realised that it doesn't handle alerts using an inverted index. So we implemented our new alert service the same way the Verity service works today.
    That seems to work nicely, but if you have any concrete solution for how to achieve an inverted index storing pretty complex queries, you are more than welcome to share it.

    -

    What I want to accomplish is a central index for a lot of large backend systems containing a lot of articles. For example, news polled from the web, newspapers delivered to us in electronic form, and third-party document databases.
    So what we have done is implement a search engine using Lucene as the core. This engine is scalable both in terms of range and round-robin/range. Fetcher applications fetch documents from different storages, transform those documents into a more common format, and then distribute them to all search machines matching that range.
    The range clustering is built on date ranges. Since we are going to buy document databases from other companies, we can't guarantee that all data will be added in date order.
    The volume of data we are talking about is around 500 million news articles.
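
The date-range distribution described above could look roughly like the following sketch. The class `DateRangeRouter`, the node names, and the simplification of one search machine per range are all assumptions made for illustration. The post mentions distributing to all machines matching a range, which would need a list of nodes per range instead.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

/** Routes an article to the search machine that owns the date range it falls in. */
class DateRangeRouter {
    /** Range start date (inclusive) -> hypothetical name of the node owning that range. */
    private final TreeMap<LocalDate, String> nodeByRangeStart = new TreeMap<>();

    /** Registers a range starting at the given date; it runs until the next range starts. */
    void addRange(LocalDate startInclusive, String node) {
        nodeByRangeStart.put(startInclusive, node);
    }

    /** Returns the node whose range contains the date, or null if the date precedes all ranges. */
    String route(LocalDate articleDate) {
        Map.Entry<LocalDate, String> entry = nodeByRangeStart.floorEntry(articleDate);
        return entry == null ? null : entry.getValue();
    }
}
```

Routing on the article's own date rather than its arrival time means that data bought later and loaded out of order still lands on the machine holding its range.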

    The end user, and a lot of our internal processes for value-added services, then define a search query for the things they want to monitor. In the end user's case this is called an "agent". When the user logs in to the system and clicks on an agent, the matching articles are presented in DATE order (newest first). The date order is critical. Relevance is not important since we have value-added services such as quality control of the hits.

    So the last thing to do in order to get a fully functional proof of concept up is to fix the date-order presentation. And since it's a lot of data and the IndexSearcher will be recreated pretty often, we will need to change the Lucene scoring/ranking. I can't understand why this should be so hard, but I don't have a clue what the best practices for doing so are.
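
The presentation requirement itself (newest first, relevance ignored) is small when stated on its own. The sketch below assumes a hypothetical `Hit` type carrying a document id, a score, and a date. It is not Lucene's API, and it sorts an already collected list rather than changing scoring.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DateOrderedHits {
    /** Hypothetical hit: a document id, its relevance score, and its publication date. */
    static final class Hit {
        final int docId;
        final float score;
        final LocalDate date;

        Hit(int docId, float score, LocalDate date) {
            this.docId = docId;
            this.score = score;
            this.date = date;
        }
    }

    /** Returns the hits newest first; the score is ignored and ties fall back to the doc id. */
    static List<Hit> newestFirst(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Hit h) -> h.date).reversed()
                .thenComparingInt(h -> h.docId));
        return sorted;
    }
}
```

Since the score never enters the comparison, a document matching both query terms ranks no higher than one matching a single term, which is the behaviour asked for. In Lucene the same ordering would normally come from sorting on a date field, with the first-sort cost and searcher-reuse caveats discussed earlier in the thread.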

    /
    Regards
    Marcus



    ________________________________

    From: Günther Starnberger
    Sent: Thu 2006-05-18 23:22
    To: java-user@lucene.apache.org
    Subject: Re: SV: Sort problematics



    On Thu, May 18, 2006 at 10:53:23PM +0200, Marcus Falck wrote:

    Hello,
    The term scorer will give a higher score to documents containing both
    terms. This is a problem (in our application) since in this case we want
    the same score for documents as long as they contain one of the terms
    (since we are dealing with newsletter observation for companies, they
    want to get the hits ordered by date to get the complete overview). I
    tried rewriting the TermScorer to give me the same score, with
    success. So my question is.
    What exactly do you want to achieve with your application?

    You speak of "immediate alerts". I understand this as: Your users
    specify keywords or queries and when you receive a new document which
    matches a query you alert the user.

    Is this what you want to do? If so, I don't think that Lucene is useful
    for this kind of real-time query. Instead of using an inverted index
    it would make more sense to use a normal index which contains the
    terms you search for. When you receive a new document, look up
    each term of the document in the index. It _might_ be possible to
    do this with Lucene by storing the search terms as documents and using
    the documents which you receive as queries, but I guess it isn't
    that trivial.
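
The term lookup suggested above can be sketched as follows. `AlertIndex` and the alert ids are hypothetical, and the sketch only covers the simplest case: an alert fires when the document contains any one of its terms. Phrase, boolean, or other complex queries are out of scope here. The tokenizer is a plain ASCII word split, which is also an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Maps each watched term to the alerts interested in it, then matches incoming documents. */
class AlertIndex {
    private final Map<String, Set<String>> alertsByTerm = new HashMap<>();

    /** Registers an alert that should fire when a document contains any of the terms. */
    void register(String alertId, String... terms) {
        for (String term : terms) {
            alertsByTerm
                    .computeIfAbsent(term.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                    .add(alertId);
        }
    }

    /** Looks up every token of the document and returns the ids of all alerts that match. */
    Set<String> match(String documentText) {
        Set<String> matched = new TreeSet<>();
        // Naive tokenizer: lower-case, split on runs of non-word (ASCII) characters.
        for (String token : documentText.toLowerCase(Locale.ROOT).split("\\W+")) {
            Set<String> ids = alertsByTerm.get(token);
            if (ids != null) {
                matched.addAll(ids);
            }
        }
        return matched;
    }
}
```

The cost of `match` grows with the length of the incoming document rather than with the number of registered alerts, which is the appeal of this layout for immediate alerts.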

    If you need a combination of traditional search and real-time alerts a
    hybrid solution may make sense. But using Lucene for real-time search
    isn't a good idea (at least IMO).

    bye,
    /gst

Discussion Overview
group: java-user
categories: lucene
posted: May 18, '06 at 10:41a
active: May 18, '06 at 10:23p
posts: 4
users: 2
website: lucene.apache.org

2 users in discussion

Marcus Falck: 3 posts
Erik Hatcher: 1 post
