I've been wondering whether anyone has compared the performance of
a 'native' Java DB as an index storage mechanism against Lucene's custom
implementation. I'm assuming that DB products should provide some
functionality for 'free' right out of the box (correct me if I'm wrong):
- easily manageable and maintainable index (accessible through any SQL
client tool)
- efficient access to large volumes of data
* potential support for a 'distributed' DB, which can span
multiple boxes transparently to the client app (the Lucene engine
generating the queries)
- much less hassle integrating Lucene into applications backed by
a DB (e.g. many stores, 'city sites' and portals which already have all
their data in relational tables and only need efficient fuzzy
searches across this data)
* no need to keep Lucene index in sync with data, since Lucene will
reuse PKs and indexes from the DB
So, I think the main question is whether Lucene's custom way of
maintaining _and accessing_ the index is (much?) more efficient than
that of the available open source native Java DBs (Derby, etc.).
You may be interested in the Compass Framework. It is built on top of Lucene and
implements JDBC-based storage as well as synchronization with things like Hibernate.
In my apps, I have to use both Lucene and relational databases since each
has unique querying characteristics. That is, there are requests which can be
implemented in an RDB but not in Lucene, and requests which can be implemented in
Lucene but not in an RDB. There are also queries which run on both.
Your idea of using RDBs to store Lucene indexes looks quite nice at first
glance. You probably imagine something like
select id from tbl_index where value like 'te%st' or value like 'f_ne'
for a query like
Yes, this looks quite nice at first glance.
But if you take a closer look, you'll quickly find out that only a subset of
Lucene queries can be converted into such SQL statements.
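To make that concrete, here is a minimal sketch of the part that does translate: mapping a single wildcard term onto a LIKE pattern ('*' becomes '%', '?' becomes '_'). The class and method names are made up for illustration and are not part of Lucene or Compass. I'm also assuming the DB accepts the standard "ESCAPE" clause, and I ignore Lucene's own backslash escaping. Fuzzy terms, proximity searches and relevance scoring have no comparably simple LIKE equivalent.

```java
// Hypothetical helper (not part of Lucene or Compass): maps simple
// wildcard terms onto SQL LIKE patterns for use with a PreparedStatement.
final class WildcardToLike {
    private WildcardToLike() {}

    /**
     * Converts one wildcard term to a LIKE pattern.
     * '*' -> '%', '?' -> '_'; literal '%', '_' and '\' in the term are
     * prefixed with '\' so they are not treated as SQL wildcards.
     */
    static String toLikePattern(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            switch (c) {
                case '*':
                    sb.append('%');
                    break;
                case '?':
                    sb.append('_');
                    break;
                case '%':
                case '_':
                case '\\':
                    sb.append('\\').append(c);
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Builds an OR-ed WHERE fragment with one '?' placeholder per term,
     * assuming the database supports "LIKE ? ESCAPE '\'".
     */
    static String whereClause(String column, int termCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < termCount; i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(column).append(" LIKE ? ESCAPE '\\'");
        }
        return sb.toString();
    }
}
```

The patterns would be bound as parameters rather than concatenated into the statement, so user input never ends up in the SQL text itself.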
The next problem is the index format. Lucene indexes are (a bit ;) ) more complex than
simple index tables. So there's no "easy" index format which would make sense
for "any SQL client tool".
There will also be problems if you try to reuse PKs and DB indexes. You'll end up
with a lot of constraint exceptions and stale indexes - and someone still HAS to
sync the full text index - even if it's in its own table.
Finally, I have no numbers, but my gut feeling is that Lucene over
HSQLDB or Derby will not be much more performant than Lucene on its own. Seriously.
And still I like your idea. I work a lot with queries which currently require
evaluation in both Lucene and an RDB. I would be fine with a limited Lucene query
syntax which would allow queries to be processed homogeneously in an RDB only.