When rc3 came out, I modified the sorting classes to
support Long values in addition to Integer, Float and
String-based sort keys. All I did was add extra
statements in two classes (SortField and
FieldSortedHitQueue) that special-case longs, and
create a LongSortedHitQueue identical to the
IntegerSortedHitQueue, only using longs.

This worked as expected; Long values converted to
strings and stored in Field.Keyword type fields would
be sorted according to Long order. The initial query
would take a while, to build the sorted array, but
subsequent queries would take little to no time at
all.
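
As a rough illustration of why a Long sort type matters for numbers stored as strings, here is a minimal sketch in plain Java. It is not Lucene code: the class and method names (LongSortSketch, buildLookup, sortDocs) are hypothetical, and the array of stored values stands in for the per-document Field.Keyword values. It parses each value once into a long[] (the part that makes the first query slow) and then orders document numbers by that array.

```java
import java.util.Arrays;

public class LongSortSketch {
    /** Parses each document's stored string once; array index = document number. */
    static long[] buildLookup(String[] storedValues) {
        long[] lookup = new long[storedValues.length];
        for (int doc = 0; doc < storedValues.length; doc++) {
            lookup[doc] = Long.parseLong(storedValues[doc]);
        }
        return lookup;
    }

    /** Returns document numbers ordered by their long value, ascending. */
    static Integer[] sortDocs(long[] lookup) {
        Integer[] docs = new Integer[lookup.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, (a, b) -> Long.compare(lookup[a], lookup[b]));
        return docs;
    }

    public static void main(String[] args) {
        // Hypothetical stored values; note they differ in length.
        String[] stored = {"1090416120000", "999", "20040721"};

        String[] asStrings = stored.clone();
        Arrays.sort(asStrings); // lexicographic: compares character by character
        System.out.println("String order: " + Arrays.toString(asStrings));

        long[] lookup = buildLookup(stored); // the expensive step worth caching
        System.out.println("Docs by long value: " + Arrays.toString(sortDocs(lookup)));
    }
}
```

With these sample values the string sort puts "1090416120000" first and "999" last, while the long-based sort orders the documents 1, 2, 0. Reusing the parsed array across queries is what makes subsequent sorts cheap.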

I went back to look at 1.4 final, and noticed the Sort
implementation has changed quite a bit. I tried the
same type of modifications to the existing source
files, but was unable to achieve similar results.
Each subsequent query seems to take a significant
amount of time, as if the Sorted array is being
rebuilt each time. Also, I tried sorting on an
Integer field and got similar results, which leads me
to believe there might be a caching problem somewhere.

Has anyone else seen this in 1.4-final? Also, I would
like it if Long sorted fields could become a part of
the API; it makes sorting by date a breeze.

Thanks!

Greg Gershman




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


  • Aviran at Jul 21, 2004 at 1:35 pm
    Since I had to implement sorting in Lucene 1.2, I had to write my own sorting
    using something similar to a Lucene contribution called SortField.
    Yesterday I did some tests, trying to use Lucene 1.4 Sort objects, and I
    realized that my old implementation works 40% faster than Lucene's
    implementation. My guess is that you are right and there is a problem with
    the cache, although I couldn't find what it is yet.

    Aviran

  • Greg Gershman at Jul 21, 2004 at 5:12 pm
    I've done a bit more snooping around; it seems that in
    FieldSortedHitQueue.getCachedComparator (line 153),
    calls to look up a stored comparator in the cache
    always return null. This occurs even for the built-in
    sort types (I tested it on integers and my code for
    longs). The comparators don't even appear to be
    stored in the HashMap to begin with.

    Any ideas?

    Greg



  • Greg Gershman at Jul 21, 2004 at 5:30 pm
    I switched the Comparators and FieldCache classes to
    use java.util.HashMap instead of
    java.util.WeakHashMap, and got the performance boost I
    was looking for. On a test index of 100K documents, the
    initial search took 991 ms and all subsequent searches
    took < 90 ms. Before, I was seeing an initial query of
    ~1 sec and subsequent queries between 500 and 700 ms,
    with the comparator and field lookup table computed
    each time.

    I guess the question is why use a WeakHashMap here as
    opposed to a HashMap?

    Greg

  • Aviran at Jul 21, 2004 at 5:45 pm
    I just saw this post; I guess we both came to the same conclusion.
    The only problem with a plain HashMap is that the cached object never gets
    released, and a new one will get created every time you open a new IndexReader.

    Aviran

  • Aviran at Jul 21, 2004 at 5:41 pm
    I think I found the problem.
    FieldCacheImpl uses a WeakHashMap to store the cached objects, but since
    nothing else holds a reference to the keys, the entries are getting released.
    Switching to HashMap solves it.
    The only problem is that I don't see anywhere the cached object would
    get released when you open a new IndexReader.

    Aviran
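
The behaviour described above can be reproduced outside Lucene. The following is a minimal sketch, assuming a key object that is created fresh for each cache operation and referenced by nothing else; WeakKeyDemo, Key and sizeAfterGc are hypothetical names, not Lucene classes. Note that System.gc() is only a request, so the WeakHashMap result depends on the JVM actually collecting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakKeyDemo {
    /** Hypothetical stand-in for a cache key that is built anew for every lookup. */
    static final class Key {
        final String field;
        Key(String field) { this.field = field; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).field.equals(field);
        }
        @Override public int hashCode() { return field.hashCode(); }
    }

    /**
     * Stores one entry whose key has no other strong reference, then
     * repeatedly asks the JVM to collect garbage and reports how many
     * entries the map still holds.
     */
    static int sizeAfterGc(Map<Key, int[]> cache) {
        cache.put(new Key("date"), new int[1000]);
        for (int i = 0; i < 20 && !cache.isEmpty(); i++) {
            System.gc(); // a hint only; the JVM may ignore it
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("HashMap entries after GC:     " + sizeAfterGc(new HashMap<>()));
        System.out.println("WeakHashMap entries after GC: " + sizeAfterGc(new WeakHashMap<>()));
    }
}
```

The HashMap always keeps its entry, which is the leak concern raised above. The WeakHashMap is free to drop the entry as soon as the key is unreachable, which would explain a cache that appears to be empty on every query.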

  • Doug Cutting at Jul 21, 2004 at 5:57 pm
    The key in the WeakHashMap should be the IndexReader, not the Entry. I
    think this should become a two-level cache, a WeakHashMap of HashMaps,
    the WeakHashMap keyed by IndexReader, the HashMap keyed by Entry. I
    think the Entry class can also be changed to not include an IndexReader
    field. Does this make sense? Would someone like to construct a patch
    and submit it to the developer list?

    Doug
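
The two-level layout proposed above could look roughly like the sketch below. This is an illustration under assumptions, not the patch that was eventually submitted: TwoLevelCache, lookup and store are hypothetical names, the reader is typed as Object so the example compiles without Lucene, and the Entry fields (field name plus an int sort type) are a guess at what the inner key needs once the IndexReader field is dropped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;

public class TwoLevelCache {
    /** Inner key: field name plus sort type; deliberately holds no reader reference. */
    static final class Entry {
        final String field;
        final int type;
        Entry(String field, int type) {
            this.field = field;
            this.type = type;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Entry)) return false;
            Entry other = (Entry) o;
            return type == other.type && field.equals(other.field);
        }
        @Override public int hashCode() {
            return Objects.hash(field, type);
        }
    }

    // Outer map: weakly keyed by the reader, so the whole inner map becomes
    // collectable once the application drops its last reference to the reader.
    private final Map<Object, Map<Entry, Object>> cache = new WeakHashMap<>();

    /** Returns the cached value for this reader/field/type, or null if absent. */
    synchronized Object lookup(Object reader, String field, int type) {
        Map<Entry, Object> inner = cache.get(reader);
        return inner == null ? null : inner.get(new Entry(field, type));
    }

    /** Stores a value under the reader's inner map, creating that map on demand. */
    synchronized void store(Object reader, String field, int type, Object value) {
        cache.computeIfAbsent(reader, r -> new HashMap<>())
             .put(new Entry(field, type), value);
    }
}
```

Because the application holds the reader for as long as it searches with it, the weak outer key stays alive and lookups hit the cache; the inner HashMap holds its Entry keys strongly, so they no longer vanish between queries. One caveat from the WeakHashMap contract: cached values must not strongly reference the reader themselves, or the outer entry can never be collected.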

  • Aviran at Jul 21, 2004 at 6:16 pm
    I will post a patch soon.

    Aviran

  • Rob Jose at Jul 21, 2004 at 6:37 pm
    Sorry for the slightly off-topic post, but I need to use Luke with my
    Analyzer. Has anyone done this? I have added a jar file to my classpath,
    but that didn't help.

    Thanks in advance
    Rob


  • Chellappa, Kannan at Jul 21, 2004 at 7:16 pm
    Worked for me.
    I added my jar to the classpath and my analyzer appeared in the analyzers list in the search tab as well as in the analyzers list in the plugins tab.

    I am using Luke v 0.5 (2004-05-25)

    Kannan
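
When a jar that is supposedly on the classpath is not picked up, one thing worth ruling out is how the tool is started: the `java` launcher ignores `-cp` and the `CLASSPATH` variable when it is run with `-jar`, so both jars may need to be listed on the classpath and the tool started by its main class instead. The sketch below is a small, hypothetical diagnostic that reports whether a class can be loaded from the classpath the JVM actually sees. The name `ClasspathCheck` and the example analyzer name are made up for illustration and are not part of Luke or Lucene.

```java
// Hypothetical helper: checks whether a class name resolves on the
// classpath of the running JVM. Not part of Luke or Lucene.
final class ClasspathCheck {

    /** Returns true if the named class can be found by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // 'false' = locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass fully qualified class names, e.g. com.example.MyAnalyzer
        // (an assumed name used only for illustration).
        for (String name : args) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "NOT found"));
        }
    }
}
```

Running it with the same classpath settings used to start Luke shows whether the analyzer jar is visible at all.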


  • Chellappa, Kannan at Jul 21, 2004 at 7:19 pm
    Sorry, typo in the version date in my previous mail -- I meant Luke v 0.5 (2004-06-25).

  • Rob Jose at Jul 22, 2004 at 1:46 pm
    Thanks Kannan

    Rob
  • Sergiu Gordea at Jul 22, 2004 at 2:48 pm
    Hi all,

    I have a question about reindexing documents with Lucene.
    We want to implement a "rebuild index" function.
    That means I want to delete all documents in the index and add newer
    versions.
    All the information I need to reindex is kept in the database, so I have
    a Term ID, which is unique.

    My problem is that IndexReader has no deleteAll() method and no
    undelete(int) or undelete(Term) methods. It has only delete(Term) and
    undeleteAll(), which can be used for this.

    I would like to delete all documents (just mark them as deleted), add the
    new documents to the index, and create a list of the documents that were
    not successfully indexed (for various reasons, which may depend on Lucene
    or on our code). At the end I would like to restore (mark as undeleted)
    the documents in the list and optimize the index, so that the changes are
    permanently committed to the index.

    Is this possible without hacking the Lucene code? Any ideas?

    Thanks in advance,

    Sergiu





  • Aviran at Jul 22, 2004 at 2:59 pm
    Why don't you just build a new index in a different location, add the
    missing documents from the old index to the new one at the end, and then
    delete the old index?

    Aviran

  • Sergiu Gordea at Jul 22, 2004 at 3:14 pm
    Because, on the other hand, I want to have a clean index, without any kind
    of garbage.

    This is the requested functionality of the rebuild index function:
    a clean index and no lost data.

    I was also thinking that I could delete the index location and create a
    new index; this may have the same effect as the missing
    deleteAll() method. But in that case I lose all the data in the index
    forever, and if I get an error because of a write lock,
    I may have no index at all, which is unacceptable for a production system.

    Anyway, thanks for the idea. It may work if I merge the indexes in my code,
    but I don't feel that this is the right way to solve the problem.

    Sergiu
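
The worry about ending up with no index at all can be reduced by leaving the live index untouched until a complete replacement exists, and only then swapping directories. The sketch below shows that idea with plain `java.nio.file` calls. It is a minimal illustration under stated assumptions, not Lucene code: `IndexSwap` and `IndexBuilder` are hypothetical names, the actual indexing is hidden behind the callback, and it assumes no reader or writer holds the live directory open during the swap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: build a replacement index next to the live one,
// then swap directories only if the build succeeded.
final class IndexSwap {

    /** Hypothetical callback that writes a complete index into the given directory. */
    interface IndexBuilder {
        void buildInto(Path dir) throws IOException;
    }

    static void rebuild(Path liveDir, IndexBuilder builder) throws IOException {
        Path parent = liveDir.toAbsolutePath().getParent();
        // Same parent directory, so the later moves are simple renames.
        Path fresh = Files.createTempDirectory(parent, "index-new-");
        try {
            builder.buildInto(fresh);
        } catch (IOException | RuntimeException e) {
            deleteRecursively(fresh); // failed build: the live index is untouched
            throw e;
        }
        Path old = parent.resolve(liveDir.getFileName() + ".old");
        deleteRecursively(old); // clear leftovers from an earlier interrupted run
        if (Files.exists(liveDir)) {
            Files.move(liveDir, old);
        }
        // Short window here in which nothing exists at the live path.
        Files.move(fresh, liveDir);
        deleteRecursively(old);
    }

    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order deletes children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Documents that fail to index would still have to be copied from the old index into the new one before the swap, as suggested earlier in the thread; that step belongs inside the callback.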




Discussion Overview
group: java-user
categories: lucene
posted: Jul 21, '04 at 1:22p
active: Jul 22, '04 at 3:14p
posts: 15
users: 6
website: lucene.apache.org
