Hi,

I'm going to replace an old reader/writer synchronization mechanism we had
implemented with the new near realtime search facilities in Lucene 2.9.
However, it's still a bit unclear to me how to do it efficiently.

Is the following implementation the right way to achieve it? The context
is concurrent reads/writes on an index:

1. create a Directory instance
2. create a writer on this directory
3. on each write request, add document to the writer
4. on each read request,
   a. use writer.getReader() to obtain an up-to-date reader
   b. create an IndexSearcher with that reader
   c. perform Query
   d. close IndexSearcher
5. on application close
   a. close writer
   b. close directory

While this seems to work, I'm really wondering about the performance of
opening a searcher for each request. I could introduce some kind of delay
and cache a searcher for a few seconds, but I'm not sure that's the best
thing to do.

Thanks,

Cedric


--
View this message in context: http://www.nabble.com/Realtime-search-best-practices-tp25852756p25852756.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
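Cedric's idea of caching a searcher for a few seconds can be sketched without Lucene on the classpath. In this minimal sketch the `Supplier` stands in for whatever opens a fresh searcher (in 2.9, presumably `writer.getReader()` followed by `new IndexSearcher(reader)`); `CachedSearcher`, `RefCounted`, `FakeSearcher` and the injectable clock are hypothetical names, not Lucene classes. Reference counting keeps a retired searcher open until in-flight queries release it. One detail relevant to step 4d: in Lucene 2.9, `IndexSearcher.close()` does not close an `IndexReader` that was passed to its constructor, so the reader returned by `getReader()` must be closed separately.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Wraps a closeable resource; it is closed when the last reference is released. */
final class RefCounted<T extends AutoCloseable> {
    private final T resource;
    // Starts at 1: the cache itself holds one reference while this is the current searcher.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource) { this.resource = resource; }

    T get() { return resource; }

    void incRef() { refs.incrementAndGet(); }

    void decRef() {
        if (refs.decrementAndGet() == 0) {
            try {
                resource.close();
            } catch (Exception e) {
                throw new IllegalStateException("failed to close resource", e);
            }
        }
    }
}

/** Reuses one searcher for up to maxAgeMillis, then opens a fresh one on the next acquire(). */
final class CachedSearcher<T extends AutoCloseable> {
    private final Supplier<T> opener;   // hypothetical: would wrap writer.getReader() + new IndexSearcher(reader)
    private final long maxAgeMillis;
    private final LongSupplier clock;   // injectable so the refresh policy can be tested
    private RefCounted<T> current;
    private long openedAt;

    CachedSearcher(Supplier<T> opener, long maxAgeMillis, LongSupplier clock) {
        this.opener = opener;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /** Returns the current searcher with an extra reference; callers must decRef() when done. */
    synchronized RefCounted<T> acquire() {
        long now = clock.getAsLong();
        if (current == null || now - openedAt >= maxAgeMillis) {
            RefCounted<T> old = current;
            current = new RefCounted<>(opener.get());
            openedAt = now;
            if (old != null) {
                old.decRef(); // drop the cache's reference; closes once in-flight queries release theirs
            }
        }
        current.incRef();
        return current;
    }

    /** Releases the cache's own reference (e.g. on application shutdown). */
    synchronized void close() {
        if (current != null) {
            current.decRef();
            current = null;
        }
    }
}

/** Stand-in for an IndexSearcher in this sketch. */
final class FakeSearcher implements AutoCloseable {
    volatile boolean closed;

    @Override
    public void close() { closed = true; }
}
```

Within one refresh window every query shares a single searcher, so the cost of opening one is paid at most once per window; the tradeoff is that results can lag the writer by up to `maxAgeMillis`.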


  • Jake Mannix at Oct 12, 2009 at 5:28 pm
    Hi Cedric,

    I don't know of anyone running a high-throughput production system who
    is doing realtime search with the 2.9 improvements yet (in fact, no
    serious performance analysis has been done on them even "in the lab", so
    to speak; follow https://issues.apache.org/jira/browse/LUCENE-1577 to
    track that work), so some experimentation will be necessary to know how
    well it fits your environment.

    Your approach has the basic components of 2.9 NRT search, but it leaves
    out when you make your commit() calls. The choice depends on some
    tradeoffs, as Lucene provides ACID-like transactional semantics: if you
    decide to commit() after every add(), then yes, getReader() will be
    up-to-date with the most recent commit(), but at a cost in indexing
    throughput (and much more frequent segment merges) compared with calling
    commit() at a slower rate. Calling commit() less frequently, of course,
    means your readers are only as fresh as your most recent commit.
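If commits are batched rather than issued after every add(), the decision of when to commit can be kept out of the indexing path. Below is a minimal sketch of a "commit after N documents or T milliseconds, whichever comes first" trigger; `CommitPolicy` and its thresholds are hypothetical names for illustration, and the caller would be expected to invoke `IndexWriter.commit()` whenever `onAdd()` returns true. As Michael McCandless clarifies further down the thread, a reader from getReader() also sees uncommitted changes, so with NRT a policy like this mainly trades crash durability against indexing throughput rather than controlling search freshness.

```java
import java.util.function.LongSupplier;

/** Decides when to commit: after maxDocs adds, or maxAgeMillis since the last commit. */
final class CommitPolicy {
    private final int maxDocs;
    private final long maxAgeMillis;
    private final LongSupplier clock;   // injectable clock so the policy is testable
    private int pendingDocs;
    private long lastCommitAt;

    CommitPolicy(int maxDocs, long maxAgeMillis, LongSupplier clock) {
        this.maxDocs = maxDocs;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.lastCommitAt = clock.getAsLong();
    }

    /** Call after each add; returns true when the caller should commit now. */
    boolean onAdd() {
        pendingDocs++;
        return pendingDocs >= maxDocs || clock.getAsLong() - lastCommitAt >= maxAgeMillis;
    }

    /** Call after a successful commit to reset both counters. */
    void committed() {
        pendingDocs = 0;
        lastCommitAt = clock.getAsLong();
    }
}
```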

    Also, be aware that 2.9 NRT makes no guarantees about how realtime it
    is. If an addIndexes() is running in another thread on your IndexWriter,
    your getReader() call won't block, but it also won't necessarily see all
    of the new segments if the addIndexes() hasn't completed yet.
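The addIndexes() caveat comes down to point-in-time semantics: opening a reader captures whatever segments exist at that moment instead of waiting for a copy to finish. The toy model below (a hypothetical `SegmentList` with segment names as plain strings; not Lucene's actual segment bookkeeping) shows a reader opened midway through a copy seeing only the segments copied so far.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Toy model: a segment list that a background "addIndexes" appends to one segment at a time. */
final class SegmentList {
    // Copy-on-write so a concurrent snapshot never observes a half-updated list.
    private final List<String> segments = new CopyOnWriteArrayList<>();

    /** Models copying one segment (local or foreign) into the index. */
    void addSegment(String name) { segments.add(name); }

    /** Models getReader(): does not wait for an in-progress copy, just snapshots what is there now. */
    List<String> openReaderSnapshot() {
        return Collections.unmodifiableList(new ArrayList<>(segments));
    }
}
```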

    Please post any results you find here. This is a very new feature, and
    seeing how it works in the wild would be very helpful to everyone else
    who is interested.

    -jake
  • Michael McCandless at Oct 12, 2009 at 7:12 pm
    Just to clarify: IndexWriter.getReader returns a reader that searches
    uncommitted changes as well, i.e., you need not call IndexWriter.commit
    to make the changes visible.

    However, if you're opening a reader the "normal" way
    (IndexReader.open) then it is necessary to first call
    IndexWriter.commit.

    Mike
  • Jake Mannix at Oct 12, 2009 at 7:17 pm
    Wait, so according to the javadocs, the IndexReader which you got from
    the IndexWriter forwards calls to reopen() back to IndexWriter.getReader(),
    which means that if the user has a NRT reader, and the user keeps calling
    reopen() on it, they're getting uncommitted changes as well, while if they
    call reopen() on a regular IndexReader, they do not?

    How does this play nicely with the transactional semantics given by
    commit()?
  • Michael McCandless at Oct 12, 2009 at 7:26 pm

    On Mon, Oct 12, 2009 at 3:17 PM, Jake Mannix wrote:

    > Wait, so according to the javadocs, the IndexReader which you got from
    > the IndexWriter forwards calls to reopen() back to IndexWriter.getReader(),
    > which means that if the user has a NRT reader, and the user keeps calling
    > reopen() on it, they're getting uncommitted changes as well, while if they
    > call reopen() on a regular IndexReader, they do not?

    That's right.

    > How does this play nicely with the transactional semantics given by
    > commit()?

    The transactional semantics are still intact... it's just that an NRT
    reader sees the uncommitted changes, i.e., all changes done since the
    last commit.

    If disaster strikes (machine/OS/JVM crash, power loss, kill -9,
    etc.), then on reboot/restart your index will still only show the last
    successful commit.

    Mike
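The visibility and durability rules in this exchange can be illustrated with a toy in-memory model. `ToyWriter` is hypothetical and deliberately simplistic (no segments, no disk, no fsync); it only shows which documents each kind of reader sees before and after commit(), and what survives a simulated crash.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of NRT visibility vs. commit durability; not how IndexWriter is implemented. */
final class ToyWriter {
    private final List<String> committed = new ArrayList<>(); // survives a crash
    private final List<String> pending = new ArrayList<>();   // added since the last commit

    synchronized void addDocument(String doc) { pending.add(doc); }

    synchronized void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    /** Like IndexReader.open(dir): sees only the last commit. */
    synchronized List<String> openCommittedReader() {
        return Collections.unmodifiableList(new ArrayList<>(committed));
    }

    /** Like writer.getReader(): sees committed plus uncommitted changes. */
    synchronized List<String> getReader() {
        List<String> all = new ArrayList<>(committed);
        all.addAll(pending);
        return Collections.unmodifiableList(all);
    }

    /** Models a crash and restart: uncommitted changes are lost, the last commit survives. */
    synchronized void crash() { pending.clear(); }
}
```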

  • Jake Mannix at Oct 12, 2009 at 7:39 pm

    On Mon, Oct 12, 2009 at 12:26 PM, Michael McCandless wrote:

    > On Mon, Oct 12, 2009 at 3:17 PM, Jake Mannix wrote:
    >
    >> Wait, so according to the javadocs, the IndexReader which you got from
    >> the IndexWriter forwards calls to reopen() back to IndexWriter.getReader(),
    >> which means that if the user has a NRT reader, and the user keeps calling
    >> reopen() on it, they're getting uncommitted changes as well, while if they
    >> call reopen() on a regular IndexReader, they do not?
    >
    > That's right.

    So maybe since it's an "expert" feature this is OK, but if users are used
    to calling isCurrent() on their reader instances, this seems like it might
    get confusing: now some readers are even more current than current. An NRT
    reader may be current w.r.t. the most recent commit, yet calling reopen()
    on it will still make it more current, in that it now gets a view of even
    more recent uncommitted changes...

    -jake
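Jake's "more current than current" point can be made concrete by tracking two version counters: one bumped for every buffered change and one recorded at each commit. The `VersionedIndex` model below is hypothetical and only illustrates the distinction he raises; it does not describe how Lucene 2.9 actually implements `IndexReader.isCurrent()` for NRT readers.

```java
/** Toy model of two notions of "current" for a reader snapshot. */
final class VersionedIndex {
    private long changeVersion; // bumped for every buffered add/delete in the writer
    private long commitVersion; // changeVersion as of the last commit()

    synchronized void change() { changeVersion++; }

    synchronized void commit() { commitVersion = changeVersion; }

    /** Like writer.getReader(): sees everything buffered so far. */
    synchronized Snapshot openNrt() { return new Snapshot(changeVersion); }

    /** Like IndexReader.open(dir): sees only the last commit. */
    synchronized Snapshot openCommitted() { return new Snapshot(commitVersion); }

    /** A reader remembers the version it was opened at. */
    final class Snapshot {
        final long seenVersion;

        Snapshot(long seenVersion) { this.seenVersion = seenVersion; }

        /** Nothing newer has been committed since this reader was opened. */
        boolean isCurrentWithCommit() {
            synchronized (VersionedIndex.this) { return seenVersion >= commitVersion; }
        }

        /** Nothing newer has even been buffered, so an NRT reopen would see no changes. */
        boolean isCurrentWithWriter() {
            synchronized (VersionedIndex.this) { return seenVersion == changeVersion; }
        }
    }
}
```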
  • John Wang at Oct 12, 2009 at 8:18 pm
    Oh, that is really good to know!
    Is this deterministic? E.g., as long as writer.addDocument() has been
    called, will the next getReader() reflect the change? Does it work with
    deletes, e.g., writer.deleteDocuments()?
    Thanks, Mike, for clarifying!

    -John
  • Yonik Seeley at Oct 12, 2009 at 8:26 pm
    Guys, please - you're not new at this... this is what JavaDoc is for:

    /**
    * Returns a readonly reader containing all
    * current updates. Flush is called automatically. This
    * provides "near real-time" searching, in that changes
    * made during an IndexWriter session can be made
    * available for searching without closing the writer.
    *
    * <p>It's near real-time because there is no hard
    * guarantee on how quickly you can get a new reader after
    * making changes with IndexWriter. You'll have to
    * experiment in your situation to determine if it's
    * fast enough. As this is a new and experimental
    * feature, please report back on your findings so we can
    * learn, improve and iterate.</p>
    *
    * <p>The resulting reader supports {@link
    * IndexReader#reopen}, but that call will simply forward
    * back to this method (though this may change in the
    * future).</p>
    *
    * <p>The very first time this method is called, this
    * writer instance will make every effort to pool the
    * readers that it opens for doing merges, applying
    * deletes, etc. This means additional resources (RAM,
    * file descriptors, CPU time) will be consumed.</p>
    *
    * <p>For lower latency on reopening a reader, you should
    * call {@link #setMergedSegmentWarmer} to
    * pre-warm a newly merged segment before it's committed
    * to the index. This is important for minimizing
    * index-to-search delay after a large merge. </p>
    *
    * <p>If an addIndexes* call is running in another thread,
    * then this reader will only search those segments from
    * the foreign index that have been successfully copied
    * over so far.</p>
    *
    * <p><b>NOTE</b>: Once the writer is closed, any
    * outstanding readers may continue to be used. However,
    * if you attempt to reopen any of those readers, you'll
    * hit an {@link AlreadyClosedException}.</p>
    *
    * <p><b>NOTE:</b> This API is experimental and might
    * change in incompatible ways in the next release.</p>
    *
    * @return IndexReader that covers entire index plus all
    * changes made so far by this IndexWriter instance
    *
    * @throws IOException
    */
    public IndexReader getReader() throws IOException {


    -Yonik
    http://www.lucidimagination.com

  • Jake Mannix at Oct 12, 2009 at 8:36 pm
    Thanks Yonik,

    It may be surprising, but in fact I have read that
    javadoc. It talks about not needing to close the
    writer, but doesn't specifically say what the
    relationship between commit() calls and
    getReader() calls is. I suppose I should have
    interpreted:

    "@returns a new reader which contains all
    changes..."

    to mean "all uncommitted changes", but why is
    that so obviously the right reading, rather than
    that it only "returns all changes since the last
    commit, but without touching disk because it
    has docs in memory as well"?

    -jake
  • Michael McCandless at Oct 12, 2009 at 8:56 pm
    I agree, the javadocs could be improved. How about something like
    this for the first 2 paragraphs:

    * Returns a readonly reader, covering all committed as
    * well as uncommitted changes to the index. This
    * provides "near real-time" searching, in that changes
    * made during an IndexWriter session can be quickly made
    * available for searching without closing the writer or
    * calling {@link #commit}.
    *
    * <p>Note that this is functionally equivalent to calling
    * {@link #commit} and then using {@link IndexReader#open} to
    * open a new reader. But the turnaround time of this
    * method should be faster since it avoids the potentially
    * costly {@link #commit}.</p>

    Mike
  • Jake Mannix at Oct 12, 2009 at 9:01 pm
    That seems a lot more straightforward, Mike, thanks.

    -jake
    On Mon, Oct 12, 2009 at 1:56 PM, Michael McCandless wrote:

    I agree, the javadocs could be improved. How about something like
    this for the first 2 paragraphs:

    * Returns a readonly reader, covering all committed as
    * well as un-committed changes to the index. This
    * provides "near real-time" searching, in that changes
    * made during an IndexWriter session can be quickly made
    * available for searching without closing the writer or
    * calling {@link #commit}.
    *
    * <p>Note that this is functionally equivalent to calling
    * {@link #commit} and then using {@link IndexReader#open} to
    * open a new reader. But the turnaround time of this
    * method should be faster since it avoids the potentially
    * costly {@link #commit}.</p>

    Mike
    On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix wrote:
    Thanks Yonik,

    It may be surprising, but in fact I have read that
    javadoc. It talks about not needing to close the
    writer, but doesn't specifically talk about what
    the relationship between commit() calls and
    getReader() calls is. I suppose I should have
    interpreted:

    "@returns a new reader which contains all
    changes..."

    to mean "all uncommitted changes", but why
    is it so obvious that what could be happening
    is that it only "returns all changes since the last
    commit, but without touching disk because it
    has docs in memory as well"?

    -jake

    On Mon, Oct 12, 2009 at 1:26 PM, Yonik Seeley <
    yonik@lucidimagination.com>wrote:
    Guys, please - you're not new at this... this is what JavaDoc is for:

    /**
    * Returns a readonly reader containing all
    * current updates. Flush is called automatically. This
    * provides "near real-time" searching, in that changes
    * made during an IndexWriter session can be made
    * available for searching without closing the writer.
    *
    * <p>It's near real-time because there is no hard
    * guarantee on how quickly you can get a new reader after
    * making changes with IndexWriter. You'll have to
    * experiment in your situation to determine if it's
    * fast enough. As this is a new and experimental
    * feature, please report back on your findings so we can
    * learn, improve and iterate.</p>
    *
    * <p>The resulting reader supports {@link
    * IndexReader#reopen}, but that call will simply forward
    * back to this method (though this may change in the
    * future).</p>
    *
    * <p>The very first time this method is called, this
    * writer instance will make every effort to pool the
    * readers that it opens for doing merges, applying
    * deletes, etc. This means additional resources (RAM,
    * file descriptors, CPU time) will be consumed.</p>
    *
    * <p>For lower latency on reopening a reader, you should
    * call {@link #setMergedSegmentWarmer} to
    * pre-warm a newly merged segment before it's committed
    * to the index. This is important for minimizing
    * index-to-search delay after a large merge. </p>
    *
    * <p>If an addIndexes* call is running in another thread,
    * then this reader will only search those segments from
    * the foreign index that have been successfully copied
    * over so far.</p>
    *
    * <p><b>NOTE</b>: Once the writer is closed, any
    * outstanding readers may continue to be used. However,
    * if you attempt to reopen any of those readers, you'll
    * hit an {@link AlreadyClosedException}.</p>
    *
    * <p><b>NOTE:</b> This API is experimental and might
    * change in incompatible ways in the next release.</p>
    *
    * @return IndexReader that covers entire index plus all
    * changes made so far by this IndexWriter instance
    *
    * @throws IOException
    */
    public IndexReader getReader() throws IOException {


    -Yonik
    http://www.lucidimagination.com

    On Mon, Oct 12, 2009 at 4:18 PM, John Wang wrote:
    Oh, that is really good to know!
    Is this deterministic? e.g. as long as writer.addDocument() is called, next
    getReader reflects the change? Does it work with deletes? e.g.
    writer.deleteDocuments()?
    Thanks Mike for clarifying!

    -John

    On Mon, Oct 12, 2009 at 12:11 PM, Michael McCandless <
    lucene@mikemccandless.com> wrote:
    Just to clarify: IndexWriter.getReader returns a reader that searches
    uncommitted changes as well. I.e., you need not call IndexWriter.commit
    to make the changes visible.

    However, if you're opening a reader the "normal" way
    (IndexReader.open) then it is necessary to first call
    IndexWriter.commit.

    Mike

    On Mon, Oct 12, 2009 at 5:24 AM, melix <cedric.champeau@lingway.com>
    wrote:
    Hi,

    I'm going to replace an old reader/writer synchronization mechanism we had
    implemented with the new near realtime search facilities in Lucene 2.9.
    However, it's still a bit unclear how to do it efficiently.

    Is the following implementation the right way to achieve it? The context
    is concurrent reads/writes on an index:

    1. create a Directory instance
    2. create a writer on this directory
    3. on each write request, add document to the writer
    4. on each read request,
    a. use writer.getReader() to obtain an up-to-date reader
    b. create an IndexSearcher with that reader
    c. perform Query
    d. close IndexSearcher
    5. on application close
    a. close writer
    b. close directory

    While this seems to be ok, I'm really wondering about the performance of
    opening a searcher for each request. I could introduce some kind of delay
    and cache a searcher for some seconds, but I'm not sure it's the best thing
    to do.

    Thanks,

    Cedric


    --
    View this message in context:
    http://www.nabble.com/Realtime-search-best-practices-tp25852756p25852756.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.
  • Michael McCandless at Oct 12, 2009 at 9:06 pm
    OK I just committed it -- thanks!

    Mike
  • Jake Mannix at Oct 12, 2009 at 9:47 pm
    I still see some things we might want to document or explain:

    We still need to be careful about what the call to "isCurrent()"
    will mean in the future for IndexReaders - as now there is another
    kind of "current" - "current even up to uncommitted changes".

    Imagine the following set of IndexReaders floating around an application:
    ------
    1) IndexReader reader = IndexReader.open(diskDir);

    // this reader is certainly current.
    2) assert(reader.isCurrent());

    3) IndexWriter writer = new IndexWriter(diskDir);
    4) writer.addDocument(doc);

    // this reader has access to that doc
    5) IndexReader nrtReader = writer.getReader();

    6) writer.addDocument(doc2);

    // now for the isCurrent() semantics... the disk reader is
    // still current, as of last commit:
    7) assert(reader.isCurrent());

    // as is the nrtReader, even though it has information
    // *past* the most recent commit, but not all of it!
    8) assert(nrtReader.isCurrent());

    // reopen the nrtReader and get access to doc2
    9) nrtReader = writer.getReader();

    // now nrtReader is not only current, but "maximally current"
    10) assert(nrtReader.isCurrent());

    // but what about now?
    11) writer.commit();

    // the disk index reader follows the old ways:
    12) assert(!reader.isCurrent());

    // but what does the nrtReader say?
    // it does not have access to the most recent commit
    // state, as there's been a commit (with documents)
    // since it was opened. But the nrtReader *has* those
    // documents.

    13) assert(!nrtReader.isCurrent());
    -----

    The results of lines 8 and 13 especially seem to show how
    one could get confused about what is meant by current - but
    it may just be a naming issue (although line 13 seems
    to be more than that: the nrtReader in that case really is
    up-to-date with disk at this point, and would show exactly
    the results which a freshly opened reader would).

    Maybe people should be advised to not mix and match
    disk readers and IndexWriter supplied ones, and if they
    want NRT search with Lucene 2.9+, they grab a reader from
    the IndexWriter upon opening said writer, and then just
    continually call reopen() on it as queries come in
    throughout the life of their application (being careful not
    to close() their writer and thus trigger an
    AlreadyClosedException)?

    -jake
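    Jake's suggestion (open one reader from the writer, then keep calling reopen() on it) boils down to a swap-and-close pattern. The sketch below is stdlib-only so it compiles without Lucene on the classpath; `ReaderHolder`, `IOFunction` and `IOCloser` are hypothetical names, not Lucene API. With Lucene 2.9 one would presumably pass `r -> r.reopen()` and `r -> r.close()` for an `IndexReader`. It relies on reopen() returning the same instance when nothing changed, and it assumes no in-flight search is still using the old reader when that reader is closed; real multi-threaded code would need reference counting or similar.

```java
import java.io.IOException;

/** Hypothetical: reopens a reader, e.g. r -> r.reopen() for a Lucene IndexReader. */
interface IOFunction<T> {
    T apply(T t) throws IOException;
}

/** Hypothetical: releases a reader, e.g. r -> r.close(). */
interface IOCloser<T> {
    void close(T t) throws IOException;
}

/** Holds the current reader and swaps in a reopened one when it changes (sketch). */
final class ReaderHolder<R> {
    private final IOFunction<R> reopen;
    private final IOCloser<R> closer;
    private R current;

    ReaderHolder(R initial, IOFunction<R> reopen, IOCloser<R> closer) {
        this.current = initial;
        this.reopen = reopen;
        this.closer = closer;
    }

    /** Returns the reader to search with right now. */
    synchronized R get() {
        return current;
    }

    /**
     * Reopens the current reader. If reopen returned a different instance, the
     * new one is installed first and the old one is then closed.
     * Assumes no search is still using the old reader.
     */
    synchronized R refresh() throws IOException {
        R next = reopen.apply(current);
        if (next != current) {
            R old = current;
            current = next;
            closer.close(old);
        }
        return current;
    }
}
```

    Installing the new reader before closing the old one means that, if close() throws, the holder still exposes a usable reader.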

  • Yonik Seeley at Oct 12, 2009 at 10:13 pm
    Good point on isCurrent - I think it should only be with respect to
    the latest index commit point? and we should clarify that in the
    javadoc.

    [...]
    // but what does the nrtReader say?
    // it does not have access to the most recent commit
    // state, as there's been a commit (with documents)
    // since it was opened.  But the nrtReader *has* those
    // documents.
    I think we keep it simple - the nrtReader.isCurrent() would return
    false after a commit is called.
    Yes, isCurrent() is no longer such a great name.

    -Yonik
    http://www.lucidimagination.com
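    To make the semantics proposed here concrete (isCurrent() answers only relative to the latest commit point), the following toy model replays Jake's steps. `ToyIndex` and `Snapshot` are hypothetical classes written for illustration; this is not Lucene code and it models the proposal, not how Lucene 2.9 is actually implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model (not Lucene code): a writer with buffered docs and numbered commits. */
final class ToyIndex {
    private final List<String> committed = new ArrayList<>();
    private final List<String> buffered = new ArrayList<>();
    private long commitGeneration = 0;

    /** Stands in for IndexWriter.addDocument: buffered, not yet committed. */
    void addDocument(String doc) {
        buffered.add(doc);
    }

    /** Stands in for IndexWriter.commit: makes buffered docs durable, new commit point. */
    void commit() {
        committed.addAll(buffered);
        buffered.clear();
        commitGeneration++;
    }

    /** Stands in for IndexReader.open(dir): sees only committed documents. */
    Snapshot openCommitted() {
        return new Snapshot(committed);
    }

    /** Stands in for IndexWriter.getReader(): sees committed plus uncommitted documents. */
    Snapshot openNearRealTime() {
        List<String> all = new ArrayList<>(committed);
        all.addAll(buffered);
        return new Snapshot(all);
    }

    /** A point-in-time view that remembers which commit point it was opened against. */
    final class Snapshot {
        final List<String> docs;
        private final long openedAtGeneration = commitGeneration;

        private Snapshot(List<String> docs) {
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }

        /** Proposed semantics: current iff no commit has happened since it was opened. */
        boolean isCurrent() {
            return openedAtGeneration == commitGeneration;
        }
    }
}
```

    Under this model step 13 comes out false, even though the NRT snapshot already holds every committed document, which is exactly the naming tension Jake describes.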

  • Ganesh at Oct 13, 2009 at 9:24 am
    Hello all,

    In 2.4.1, the reader is warmed after reopen, before actual use. In 2.9, public void setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer warmer) does the warming when we call getReader().

    If we call getReader() for every request, will it reduce search performance?

    Is warming necessarily required in 2.9? Isn't warming once, the very first time, enough? Do we need to do it on every request?

    Regards
    Ganesh

  • Michael McCandless at Oct 13, 2009 at 10:21 am

    On Tue, Oct 13, 2009 at 5:23 AM, Ganesh wrote:

    In case of 2.4.1, the reader after reopen, will be warmed before actual use.
    You mean you must warm it after you call reopen, before using it, right?
    In 2.9, public void setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer warmer), does warming when we do getReader().
    Right, and this is better than doing your own warming after calling
    getReader because warming of newly merged segments won't block your
    near real-time turnaround.
    If we do getReader() for every request then whether it will reduce the search performance?
    For every search request? Yes, this will always reduce performance,
    even worse than simply calling reopen for every search request,
    because getReader() forces the writer to flush a new segment.
    Does warming necessarily required in 2.9? If we do warming for the very first time is not enough? Do we need to do it on every request?
    It's not "required", but if you don't do it it means the first search
    to land after a getReader will pay that warming cost.

    Often this cost is negligible. But, rarely, once a very large segment
    merge has completed, the cost of warming that newly merged segment could
    be very large. This is heavily dependent on the size of your index,
    whether your queries use the FieldCache (field sorting or function
    queries), etc.

    Mike
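    Since calling getReader() on every search request forces a flush, one option is the one Cedric raised: cache a searcher and reopen it only periodically. Below is a minimal, stdlib-only sketch built around a hypothetical `ThrottledRefresher` class. The opener would presumably wrap `new IndexSearcher(writer.getReader())`, which throws IOException and would need wrapping to fit a `Supplier`. The clock is injectable so the logic can be tested. It reopens at most once per `maxStalenessMillis`, and only if writes were recorded since the last open. It deliberately omits releasing the previous reader; real code must close that once no search is still using it.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch: a cached value that is reopened only when stale AND dirty. */
final class ThrottledRefresher<T> {
    private final Supplier<T> opener;        // e.g. wraps new IndexSearcher(writer.getReader())
    private final LongSupplier clockMillis;  // e.g. System::currentTimeMillis
    private final long maxStalenessMillis;
    private T current;
    private long openedAtMillis;
    private long writesSinceOpen;

    ThrottledRefresher(Supplier<T> opener, LongSupplier clockMillis, long maxStalenessMillis) {
        this.opener = opener;
        this.clockMillis = clockMillis;
        this.maxStalenessMillis = maxStalenessMillis;
        this.openedAtMillis = clockMillis.getAsLong();
        this.current = opener.get();
    }

    /** Call after each addDocument/deleteDocuments on the writer. */
    synchronized void recordWrite() {
        writesSinceOpen++;
    }

    /** Returns the cached value, reopening it only if writes happened and it is old enough. */
    synchronized T acquire() {
        long now = clockMillis.getAsLong();
        boolean stale = now - openedAtMillis >= maxStalenessMillis;
        if (writesSinceOpen > 0 && stale) {
            T fresh = opener.get();   // NOTE: the previous value is not released here
            current = fresh;
            openedAtMillis = now;
            writesSinceOpen = 0;
        }
        return current;
    }
}
```

    A request handler would call `recordWrite()` after each write and `acquire()` before searching. The interval trades freshness for fewer flushes; as noted above, the right value depends on your index and has to be measured.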

  • Michael McCandless at Oct 13, 2009 at 10:05 am
    I agree isCurrent doesn't work right for an NRT reader. Right now, it
    will always return "true" because it's sharing the segmentInfos in use
    by the writer.

    Similarly, getVersion will lie.

    I'll open an issue to track how to fix it.

    Mike

  • Michael McCandless at Oct 13, 2009 at 10:12 am
    OK I opened https://issues.apache.org/jira/browse/LUCENE-1976.

    Mike

  • Yonik Seeley at Oct 12, 2009 at 8:57 pm

    On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix wrote:
    It may be surprising, but in fact I have read that
    javadoc.
    It was not your email I responded to.
    It talks about not needing to close the
    writer, but doesn't specifically talk about what
    the relationship between commit() calls and
    getReader() calls is.
    Do you have a suggestion of how to update the JavaDoc?
    I'm not sure I understand the relationship between commit and
    getReader that you refer to.
    , but why
    is it so obvious that what could be happening
    is that it only "returns all changes since the last
    commit, but without touching disk because it
    has docs in memory as well"?
    Sorry, this seems confusing - I'm not sure what you're trying to say.
    Perhaps we should approach this as proposed javadoc changes?

    -Yonik
    http://www.lucidimagination.com

  • Jake Mannix at Oct 12, 2009 at 9:09 pm

    On Mon, Oct 12, 2009 at 1:57 PM, Yonik Seeley wrote:
    On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix wrote:
    It may be surprising, but in fact I have read that
    javadoc.
    It was not your email I responded to.
    Sorry, my bad then - you said "guys" and John and I were the last two to be
    asking questions / commenting on this thread.

    It talks about not needing to close the
    writer, but doesn't specifically talk about what
    the relationship between commit() calls and
    getReader() calls is.
    Do you have a suggestion of how to update the JavaDoc?
    I'm not sure I understand the relationship between commit and
    getReader that you refer to.
    I like Mike's clarification of the first two javadoc paragraphs he just
    posted, very concise.

    -jake
  • John Wang at Oct 12, 2009 at 9:10 pm
    I think it was my email Yonik responded to, and he is right: I was being lazy
    and didn't read the javadoc very carefully. My bad.
    Thanks for the javadoc change.

    -John
  • Melix at Oct 12, 2009 at 10:04 pm
    Ok, thanks for the details. I see I'm not the only one finding the javadoc
    hard to understand. While this is well documented, it's still not clear
    enough about the exact semantics of "changes": at first I thought it
    returned an IndexReader on the *uncommitted changes only*, which meant it did
    not include committed ones. Well, it should have been obvious that I couldn't
    do anything with such a reader, but you know ;)

    I'll try to implement something along those lines. I think it won't be so
    difficult, as I've got many writes and fewer reads. It means that the
    performance penalty of creating a searcher should be acceptable. However,
    I'll keep you posted.


    Jake Mannix wrote:
    Thanks Yonik,

    It may be surprising, but in fact I have read that
    javadoc. It talks about not needing to close the
    writer, but doesn't specifically talk about what
    the relationship between commit() calls and
    getReader() calls is. I suppose I should have
    interpreted:

    "@returns a new reader which contains all
    changes..."

    to mean "all uncommitted changes", but why
    is it so obvious that what could be happening
    is that it only "returns all changes since the last
    commit, but without touching disk because it
    has docs in memory as well"?

    -jake

    On Mon, Oct 12, 2009 at 1:26 PM, Yonik Seeley
    wrote:
    Guys, please - you're not new at this... this is what JavaDoc is for:

    /**
    * Returns a readonly reader containing all
    * current updates. Flush is called automatically. This
    * provides "near real-time" searching, in that changes
    * made during an IndexWriter session can be made
    * available for searching without closing the writer.
    *
    * <p>It's near real-time because there is no hard
    * guarantee on how quickly you can get a new reader after
    * making changes with IndexWriter. You'll have to
    * experiment in your situation to determine if it's
    * fast enough. As this is a new and experimental
    * feature, please report back on your findings so we can
    * learn, improve and iterate.</p>
    *
    * <p>The resulting reader supports {@link
    * IndexReader#reopen}, but that call will simply forward
    * back to this method (though this may change in the
    * future).</p>
    *
    * <p>The very first time this method is called, this
    * writer instance will make every effort to pool the
    * readers that it opens for doing merges, applying
    * deletes, etc. This means additional resources (RAM,
    * file descriptors, CPU time) will be consumed.</p>
    *
    * <p>For lower latency on reopening a reader, you should
    * call {@link #setMergedSegmentWarmer} to
    * pre-warm a newly merged segment before it's committed
    * to the index. This is important for minimizing
    * index-to-search delay after a large merge. </p>
    *
    * <p>If an addIndexes* call is running in another thread,
    * then this reader will only search those segments from
    * the foreign index that have been successfully copied
    * over, so far.</p>
    *
    * <p>NOTE: Once the writer is closed, any
    * outstanding readers may continue to be used. However,
    * if you attempt to reopen any of those readers, you'll
    * hit an {@link AlreadyClosedException}.</p>
    *
    * <p>NOTE: This API is experimental and might
    * change in incompatible ways in the next release.</p>
    *
    * @return IndexReader that covers entire index plus all
    * changes made so far by this IndexWriter instance
    *
    * @throws IOException
    */
    public IndexReader getReader() throws IOException {


    -Yonik
    http://www.lucidimagination.com

    On Mon, Oct 12, 2009 at 4:18 PM, John Wang wrote:
    Oh, that is really good to know!
    Is this deterministic? E.g., as long as writer.addDocument() has been called,
    does the next getReader() reflect the change? Does it work with deletes, e.g.
    writer.deleteDocuments()?
    Thanks Mike for clarifying!

    -John

    On Mon, Oct 12, 2009 at 12:11 PM, Michael McCandless <
    lucene@mikemccandless.com> wrote:
    Just to clarify: IndexWriter.getReader returns a reader that searches
    uncommitted changes as well. I.e., you need not call IndexWriter.commit
    to make the changes visible.

    However, if you're opening a reader the "normal" way
    (IndexReader.open) then it is necessary to first call
    IndexWriter.commit.

    Mike

    --
    View this message in context: http://www.nabble.com/Realtime-search-best-practices-tp25852756p25863095.html
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


  • Jason Rutherglen at Oct 12, 2009 at 8:49 pm
    Hi Cedric,

    There is a wiki page on NRT at:
    http://wiki.apache.org/lucene-java/NearRealtimeSearch

    Feel free to ask questions if there's not enough information.

    -J
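Much of the confusion above is about which changes `getReader()` can see. As a minimal sketch of the rule stated in the quoted javadoc and in Mike's reply, here is a toy model in plain Java with no Lucene dependency. It is not Lucene code: `ToyWriter`, `nrtView()` and `committedView()` are hypothetical names standing in for `IndexWriter.getReader()` and for a reader opened the normal way with `IndexReader.open`. The NRT view covers the committed index plus every add and delete made so far by the writer, while the normal view only sees the last commit.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Toy model of the visibility rules discussed in this thread. Not Lucene code:
 * ToyWriter, nrtView() and committedView() are hypothetical names.
 */
public class NrtVisibilityModel {

    static final class ToyWriter {
        // Document ids as of the last commit(): what a normally opened reader sees.
        private final Set<String> committed = new LinkedHashSet<>();
        // Committed ids plus every add/delete made since, as the writer sees them.
        private final Set<String> current = new LinkedHashSet<>();

        void addDocument(String id) { current.add(id); }

        void deleteDocument(String id) { current.remove(id); }

        /** Makes the writer's current state the new committed state. */
        void commit() {
            committed.clear();
            committed.addAll(current);
        }

        /** Models writer.getReader(): committed index plus all changes made so far. */
        Set<String> nrtView() {
            return Collections.unmodifiableSet(new LinkedHashSet<>(current));
        }

        /** Models a reader opened via IndexReader.open: only the last commit. */
        Set<String> committedView() {
            return Collections.unmodifiableSet(new LinkedHashSet<>(committed));
        }
    }

    public static void main(String[] args) {
        ToyWriter w = new ToyWriter();
        w.addDocument("a");
        w.commit();
        w.addDocument("b");    // uncommitted add
        w.deleteDocument("a"); // uncommitted delete

        System.out.println("nrtView:       " + w.nrtView());       // [b]
        System.out.println("committedView: " + w.committedView()); // [a]
        w.commit();
        System.out.println("after commit:  " + w.committedView()); // [b]
    }
}
```

The model deliberately ignores flushing, segments and merging; it only captures visibility, which is what John's question about adds and deletes turns on.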

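Cedric's fallback idea, caching a searcher for a few seconds rather than opening one per request, can be sketched without Lucene. `MaxStalenessCache` below is a hypothetical helper, not a Lucene class: it returns a cached value and asks a caller-supplied opener for a fresh one once the cached value is `maxAgeMillis` old. In a Lucene 2.9 application the value would presumably be an `IndexSearcher` built over `writer.getReader()`, and `maxAgeMillis` becomes the upper bound on how stale a search may be. The clock is injected so the behavior can be checked without sleeping.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not a Lucene class): caches a value and reopens it once it is
 * at least maxAgeMillis old, so the value handed out was opened less than maxAgeMillis ago.
 */
public final class MaxStalenessCache<T> {
    private final Supplier<T> opener;  // would open e.g. a searcher over writer.getReader()
    private final Consumer<T> closer;  // releases a value that has been replaced
    private final LongSupplier clock;  // System::currentTimeMillis in production
    private final long maxAgeMillis;

    private T current;                 // null until the first get()
    private long openedAt;

    public MaxStalenessCache(Supplier<T> opener, Consumer<T> closer,
                             LongSupplier clock, long maxAgeMillis) {
        this.opener = opener;
        this.closer = closer;
        this.clock = clock;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Returns the cached value, replacing it first if it is missing or too old. */
    public synchronized T get() {
        long now = clock.getAsLong();
        if (current == null || now - openedAt >= maxAgeMillis) {
            T old = current;
            current = opener.get();
            openedAt = now;
            if (old != null) {
                // Safe only if no other thread is still using `old`.
                closer.accept(old);
            }
        }
        return current;
    }
}
```

Closing the replaced value immediately is only safe if no other thread is still searching it. With concurrent queries, the old reader should be released only after in-flight searches finish; Lucene's `IndexReader.incRef()` and `decRef()` exist for this kind of reference counting.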
Discussion Overview
group: java-user @
categories: lucene
posted: Oct 12, '09 at 9:25a
active: Oct 13, '09 at 10:21a
posts: 23
users: 7
website: lucene.apache.org

site design / logo © 2022 Grokbase