FAQ
Hi Stéphane:
If you are using Oracle Spatial I assume that you are using Oracle
too for storing text :)
Have you take a look at Oracle-Lucene integration project sponsored
by LendingClub.com?
http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg
http://sourceforge.net/project/showfiles.php?group_id=56183&package_id=255524&release_id=589900
Its a new domain index for Oracle using Lucene inside the Oracle JVM.
By doing that We can use Lucene as Oracle Text, but with many other
features, and using inline pagination We can get better perfomance
than latest 11g Text Counpound Domain Index.
If you are interested in this implementation simply drop me an email.
Best regards, Marcelo.

On Fri, May 2, 2008 at 3:58 AM, Stephane Nicoll
wrote:
Well for the moment we don't. The lucene index only contains the full
text content (indexed, not stored). We use lucene to perform full text
and fuzzy searches on the keywords field. Once we have the result, we
match them with the geospatial box provided by the user (we use Oracle
Spatial for that). We have no notion of city, state or zip code. Date
overlaps more than one countries most of the time actually.

We are thinking of reimplementing a quad tree in lucene to flag each
item with a spatial area. That way we will be able to pre-filter the
zone accordingly.

Still, this does not explain the deadlock on SegmentReader. If anyone
has an idea...

Thanks,
Stéphane


On Thu, May 1, 2008 at 8:50 PM, Michael Stoppelman wrote:
Stephane,

Could you describe how you setup the spatial area? Having BooleanQuery with
200 terms in it definitely slows things down (I'm not sure exactly why yet
-- it seems like it shouldn't be "that" slow). If you can describe your
spatial area in fewer terms you can get much better performance. It just
depends on how you're describing your spatial areas and the number of
results in each zipcode. If you had a field like "city,state" in your index
you would have far less terms in your query than if that query had all the
zipcodes in a "city,state" combo, thus making your query much faster.

M

On Thu, May 1, 2008 at 2:15 AM, mark harwood <markharw00d@yahoo.co.uk>
wrote:


The issue here is a general one of trying to perform an efficient join
between an external resource (rdbms) and Lucene.
This experiment may be of interest:
http://issues.apache.org/jira/browse/LUCENE-434

KeyMap.java embodies the core service which translates from lucene doc ids
to DB primary keys or vice versa.
There are a couple of implementations of KeyMap that are not optimal (they
pre-date Lucene's FieldCache) but it may give you food for thought.

Cheers
Mark


----- Original Message ----
From: Stephane Nicoll <stephane.nicoll@gmail.com>
To: java-user@lucene.apache.org
Sent: Thursday, 1 May, 2008 9:00:33 AM
Subject: hybrid query (lucene + db)

Hi there,

We're using lucene with Hibernate search and we're very happy so far
with the performance and the usability of lucene. We have however a
specific use cases that prevent us to use only lucene: spatial
queries. I already sent a mail on this list a while back about the
problem and we started investigating multiple solutions.

When the user selects a geographic area and some keywords we do the
following:

* Perform a search on the lucene index for the keywords with a
projection that returns only the primaryKey of the element sorted by
primary key
* Perform a search on the database with other criterias and a
projection that returns only the primary key of the elements
* Iterate on both list to find N matching IDs, optionally with paging
(some from X to X + N where X is the first result of the page)
* Run a query on the database to return the actual objects (select a
from MyClass a where a.id IN (the list of matching IDs) ) We limit the
page to 1000 results

We have searched a way to optimize the queries and to avoid to consume
too much memory, knowing that we must support paging.

With a single user a search by kewyords takes 30msec to complete, a
search by box takes 45msec. With both (keywords + spatial area) it
takes 300msec

With 10 concurrent users, a search by keywords takes 150msec/user but
for both it takes 3 sec/user !!!

I had the profiler running on this scenario and I've found that *all*
threads are waiting on org.apache.lucene.index.SegmentReader. I then
configured Hibernate Search to use a separate index reader per thread.
The deadlocks disappeared but it's still very slow (2.8sec).

Some questions:

* Does anyone knows where the deadlocks on SegmentReader are coming from?
* Is the sorting on the primary keys a bad idea regarding performance
and memory usage?
* Does anyone has an idea to perform this kind of hybrid query in an
efficient way?

I am using lucene 2.3.1 and Hibernate Search 3.0.1. I already ask for
support on the Hibernate Search forum but did not get any answer so
far.

Thanks,
Stéphane

--
Large Systems Suck: This rule is 100% transitive. If you build one,
you suck" -- S.Yegge

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org






__________________________________________________________
Sent from Yahoo! Mail.
A Smarter Email http://uk.docs.yahoo.com/nowyoucan.html

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


--


Large Systems Suck: This rule is 100% transitive. If you build one,
you suck" -- S.Yegge

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 6 of 7 | next ›
Discussion Overview
groupjava-user @
categorieslucene
postedMay 1, '08 at 8:01a
activeMay 2, '08 at 4:59p
posts7
users4
websitelucene.apache.org

People

Translate

site design / logo © 2021 Grokbase