Hi,

I've been wondering whether anyone has compared the performance of
a 'native' Java DB as the index storage mechanism against Lucene's custom
implementation. I'm assuming that DB products should provide some
functionality for 'free' right out of the box (correct me if I'm wrong):

- easily manageable and maintainable index (accessible through any SQL
client tool)
- efficient access to large volumes of data
* potential support for a 'distributed' DB, which can span
multiple boxes transparently to the client app (the Lucene engine
generating the queries)
- much less hassle integrating Lucene into applications backed by
the DB (e.g., many stores, 'city sites', portals which already have all
their data in relational tables and only need efficient fuzzy
searches across this data)
* no need to keep the Lucene index in sync with the data, since Lucene will
reuse PKs and indexes from the DB

So I think the main question is whether Lucene's custom way of
maintaining _and accessing_ the index is (much?) more efficient than
that of the available open source native Java DBs (Derby, etc.).

Thanks!

Vladimir Olenin
Software Architect


  • Aleksei Valikov at Oct 4, 2006 at 4:15 pm
    Hi.

    You may be interested in the Compass Framework. It is built on top of Lucene,
    implements JDBC-based storage, and supports synchronization with tools like Hibernate.

    In my apps, I have to use both Lucene and relational databases, since each
    has unique querying characteristics. That is, there are requests that can be
    implemented on an RDB but not in Lucene, requests that can be implemented in
    Lucene but not in an RDB, and queries that run on both.


    Your idea of using RDBs to store Lucene indexes looks quite nice at first
    glance. You probably imagine something like

    select id from tbl_index where value like 'te%st' or value like 'f_ne'

    for a query like

    "te*st f?ne"

    Yes, this looks quite nice at first glance.

    But if you take a closer look, you'll quickly find that only a subset of
    Lucene queries can be converted into such SQL statements.
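
    To make the mapping above concrete, here is a minimal sketch of the
    wildcard-to-LIKE translation. The class and method names (WildcardToLike,
    toLikePattern, toLikePatterns) are hypothetical and not part of Lucene or
    any JDBC driver. The sketch handles only the `*` and `?` wildcards; other
    Lucene query features, for example fuzzy terms, proximity searches, or
    boosts, have no direct LIKE equivalent, which is the limitation described
    above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps Lucene-style wildcard terms to SQL LIKE patterns. */
public final class WildcardToLike {
    private WildcardToLike() {}

    /**
     * Converts one wildcard term: '*' becomes '%', '?' becomes '_'.
     * Assumption: the term contains no literal '%' or '_' characters;
     * this sketch does not escape them.
     */
    public static String toLikePattern(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*') {
                sb.append('%');
            } else if (c == '?') {
                sb.append('_');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Splits a whitespace-separated query and converts each term. */
    public static List<String> toLikePatterns(String query) {
        List<String> patterns = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                patterns.add(toLikePattern(term));
            }
        }
        return patterns;
    }
}
```

    Each resulting pattern would then be bound as a parameter to a
    "value like ?" condition (for instance through a JDBC PreparedStatement)
    rather than concatenated into the SQL string.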

    The next problem is the index format. Lucene indexes are (a bit ;) ) more
    complex than simple index tables. So there's no "easy" index format that would
    make sense for "any SQL client tool".

    There will also be problems if you try to reuse PKs and DB indexes. You'll end
    up with a lot of constraint exceptions and stale indexes, and someone still HAS
    to sync the full-text index, even if it's in its own table.

    Finally, I have no numbers, but my gut feeling is that Lucene over HSQLDB or
    Derby will not be much faster than Lucene on its own. I seriously doubt it.

    And still I like your idea. I work a lot with queries that currently require
    evaluation in both Lucene and an RDB. I would be fine with a limited Lucene
    query syntax that would allow queries to be processed homogeneously in an RDB only.

    Bye.
    /lexi

Discussion Overview
group: java-user
categories: lucene
posted: Oct 3, '06 at 4:00p
active: Oct 4, '06 at 4:15p
posts: 2
users: 2
website: lucene.apache.org
