Last week I implemented an approach for GeoSort in Lucene. Inspired by
"Lucene in Action", I modified the algorithm in the following way. When
an IndexReader for a certain index is created, a cache for
geoinformation is created: this is simply a two-dimensional int array.
This makes it possible to cache geoinformation for 1,000,000 docs in around
8 MB. Every time the ScoreDocComparator.compare(ScoreDoc i, ScoreDoc j)
method is called, I fetch the int array with the geoinfo from the cache
and calculate the distance.
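Below is a minimal, Lucene-free sketch of the cache-plus-comparator idea described above. It assumes coordinates are stored as microdegrees (degrees × 10^6) so they fit in ints, and it assumes the haversine formula for the distance, since the post does not say which formula is used. The class name and the `put`, `distanceKm` and `byDistanceFrom` methods are hypothetical. In a real comparator the doc ids would come from `ScoreDoc.doc` inside `ScoreDocComparator.compare(ScoreDoc i, ScoreDoc j)`.

```java
import java.util.Comparator;

/**
 * Hypothetical per-reader geo cache: one int[2][maxDoc] array holding
 * latitude and longitude in microdegrees (degrees * 1e6) for every doc id.
 */
class GeoCacheSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // coords[0][doc] = latitude, coords[1][doc] = longitude (microdegrees)
    private final int[][] coords;

    GeoCacheSketch(int maxDoc) {
        coords = new int[2][maxDoc];
    }

    /** Fills the cache entry for one doc (done once, when the reader is opened). */
    void put(int doc, double lat, double lon) {
        coords[0][doc] = (int) Math.round(lat * 1e6);
        coords[1][doc] = (int) Math.round(lon * 1e6);
    }

    /** Haversine (great-circle) distance in km from the query point to a doc. */
    double distanceKm(int doc, double lat, double lon) {
        double docLat = Math.toRadians(coords[0][doc] / 1e6);
        double docLon = Math.toRadians(coords[1][doc] / 1e6);
        double qLat = Math.toRadians(lat);
        double qLon = Math.toRadians(lon);
        double sinDLat = Math.sin((docLat - qLat) / 2);
        double sinDLon = Math.sin((docLon - qLon) / 2);
        double h = sinDLat * sinDLat
                 + Math.cos(qLat) * Math.cos(docLat) * sinDLon * sinDLon;
        // clamp to guard against rounding pushing sqrt(h) slightly above 1
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** Orders doc ids nearest first; distances are computed only when compared. */
    Comparator<Integer> byDistanceFrom(double lat, double lon) {
        return (a, b) -> Double.compare(distanceKm(a, lat, lon), distanceKm(b, lat, lon));
    }
}
```

The `int[2][maxDoc]` layout (two large arrays) is what makes the 8 MB estimate work: 1,000,000 docs × 2 ints × 4 bytes = 8,000,000 bytes. An `int[maxDoc][2]` layout would instead allocate a million tiny array objects, each with its own object header, and would use noticeably more heap.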
I think this is quite a good solution:
1. Only the distances of actual hits are calculated, so only the needed
operations are done.
2. The geoinformation is not fetched via IndexReader.doc(i) but
directly from the cache, which is held in RAM.
3. All hits are returned, because this approach does not use a
bounding-box model that excludes documents outside a certain
radius (which is very annoying if there is a hit at a distance of 51
km and the radius is 50 km).
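Point 3 can be illustrated with a small hypothetical comparison between sorting every hit by distance and applying a radius cutoff. The distances are passed in as a plain array indexed by doc id (in practice they would be computed from the cache); `SortAllHitsSketch`, `sortByDistance` and `filterByRadius` are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical demo: sort every hit by distance instead of filtering by radius. */
class SortAllHitsSketch {

    /** Returns hit ids ordered nearest first; no hit is dropped. */
    static List<Integer> sortByDistance(List<Integer> hits, double[] distanceKm) {
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(doc -> distanceKm[doc]));
        return sorted;
    }

    /** What a radius/bounding-box model returns: hits beyond radiusKm are lost. */
    static List<Integer> filterByRadius(List<Integer> hits, double[] distanceKm, double radiusKm) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : hits) {
            if (distanceKm[doc] <= radiusKm) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

With distances of 12, 51 and 3 km for docs 0, 1 and 2 and a 50 km radius, `filterByRadius` drops the 51 km hit, while `sortByDistance` keeps it and simply ranks it last (`[2, 0, 1]`). The price is computing a distance for every hit rather than only for those inside the box.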

What do you think about this approach? The only possible disadvantage,
I think, is the cache, because I do not really know whether the JVM is
good at handling 10 MB of data in RAM.


Sascha Fahl


Discussion overview
Group: java-user
Posted: Jul 19, '08 at 9:54 a.m.
Active: Jul 21, '08 at 10:43 a.m.
2 users in discussion: Toke Eskildsen (1 post), Sascha Fahl (1 post)


