FAQ
Hy,
i use Lucene to index a SQL-Table which contains three fields: a index-field,
the text to search in and another field. When adding a lucene document I let
Lucene index the search-field and also save the id along with it in the
lucene index.

Uppon searching I collect all ids and add them to a java-string with commas in
between to issue a SQL-Query like this one:

SELECT id,addfield FROM table WHERE id IN ([LUCENERESULT]);

Where LUCENERESULT is like 2,3,19,3,5.

This works fine but got one problem: The Search-Result of Lucene is order by
relevance and so the id-list is also sorted by relevance. But the result of
the SQL-Query is sorted by the id which destroys the relevance-sorting.

Does anybody know a work-arround?


Thanks
--
Dominik Bruhn
mailto: dominik@dbruhn.de
http://www.dbruhn.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Karl wettin at Jul 1, 2006 at 8:10 am

    On Sat, 2006-07-01 at 01:10 +0200, Dominik Bruhn wrote:

    SELECT id,addfield FROM table WHERE id IN ([LUCENERESULT]);

    Where LUCENERESULT is like 2,3,19,3,5.

    This works fine but got one problem: The Search-Result of Lucene is order by
    relevance and so the id-list is also sorted by relevance. But the result of
    the SQL-Query is sorted by the id which destroys the relevance-sorting.

    Does anybody know a work-arround?
    This is really a question you should ask in the forum of your RDBMS. You
    could always execute multiple SQL-queries within the same statement
    without too much loss. But I'm certain there is a way to enforce the
    order as you specified it in the WHERE-clause.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Aleksander M. Stensby at Jul 1, 2006 at 8:27 am
    Well, it is common in most databasesystems, that if you dont specify a
    sort, you get the results sorted by id, or by when the rows are inserted
    into the db.

    The quickest way for you is to write around your query.
    instead of doing one query, just do where queries with equals. this would
    produce a bit overflow if you have your database on an external server,
    but again, the quickest query to ever run in sql are equals-queries. Since
    the ID is unique (i hope so), it should be no problem at all.
    ... WHERE id = 2;
    ... WHERE id = 3
    ... WHERE id = 19
    and so on would give you the correct relevance-order.
    On Sat, 01 Jul 2006 10:09:01 +0200, karl wettin wrote:
    On Sat, 2006-07-01 at 01:10 +0200, Dominik Bruhn wrote:

    SELECT id,addfield FROM table WHERE id IN ([LUCENERESULT]);

    Where LUCENERESULT is like 2,3,19,3,5.

    This works fine but got one problem: The Search-Result of Lucene is
    order by
    relevance and so the id-list is also sorted by relevance. But the
    result of
    the SQL-Query is sorted by the id which destroys the relevance-sorting.

    Does anybody know a work-arround?
    This is really a question you should ask in the forum of your RDBMS. You
    could always execute multiple SQL-queries within the same statement
    without too much loss. But I'm certain there is a way to enforce the
    order as you specified it in the WHERE-clause.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    --
    Aleksander M. Stensby
    Software Developer
    Integrasco A/S
    aleksander.stensby@integrasco.no
    Tlf.: +47 41 22 82 72

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Michael D. Curtin at Jul 1, 2006 at 12:36 pm

    Dominik Bruhn wrote:
    Hy,
    i use Lucene to index a SQL-Table which contains three fields: a index-field,
    the text to search in and another field. When adding a lucene document I let
    Lucene index the search-field and also save the id along with it in the
    lucene index.

    Uppon searching I collect all ids and add them to a java-string with commas in
    between to issue a SQL-Query like this one:

    SELECT id,addfield FROM table WHERE id IN ([LUCENERESULT]);

    Where LUCENERESULT is like 2,3,19,3,5.

    This works fine but got one problem: The Search-Result of Lucene is order by
    relevance and so the id-list is also sorted by relevance. But the result of
    the SQL-Query is sorted by the id which destroys the relevance-sorting.

    Does anybody know a work-arround?
    Sounds like you need to save the score from your Lucene search along with the
    ids, and use them in an ORDER BY clause in your SQL statement. One way to do
    this would be to put the Lucene results into a working / temporary table, join
    that table into your real table in the query, and ORDER BY score. Something
    along these lines:

    TRUNCATE TABLE tmptable;
    INSERT INTO tmptable (id, score) VALUES ([LUCENEID1], [LUCENESCORE1]);
    INSERT INTO tmptable (id, score) VALUES ([LUCENEID2], [LUCENESCORE2]);
    ...

    SELECT b.id, b.addfield
    FROM tmptable a, table b
    WHERE (b.id = a.id)
    ORDER BY a.score DESC;

    ORDER BY is the normal way in SQL to get ordered results. In general, there
    is no guarantee of ordering from a SELECT statement without ORDER BY (although
    most RDBMS engines tend to return the rows in the order they were INSERTed,
    possibly perturbed by subsequent DELETEs and UPDATEs).

    Good luck!

    --MDC

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Monsur Hossain at Jul 3, 2006 at 9:26 pm

    On 6/30/06, Dominik Bruhn wrote:
    SELECT id,addfield FROM table WHERE id IN ([LUCENERESULT]);

    Where LUCENERESULT is like 2,3,19,3,5.

    This works fine but got one problem: The Search-Result of Lucene is order by
    relevance and so the id-list is also sorted by relevance. But the result of
    the SQL-Query is sorted by the id which destroys the relevance-sorting.

    Does anybody know a work-arround?
    We have the same issue and solve this through code. As you're
    generating the list of IDs for the SQL query, also generate a hash
    that maps the ID to its position. When reading back the data from the
    database, read the position hash, and insert the item into the
    appropriate position in an array. Yeah, you're doing the sorting
    yourself, but in our case, we're only returning 10 items per page, so
    its not a huge performance hit.

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Lukas Vlcek at Jul 4, 2006 at 4:53 am
    Hi,

    Looking at your problem I can think of one solution for small and
    *midsize* result sets. (And I have to say it may be similar to what
    Aleksander proposes).
    Write workaround query in the following form:

    select addfield from (
    select addfield, generated_counter from table where id = 2
    union
    select addfield, generated_counter from table where id = 4
    union
    select addfield, generated_counter from table where id = 1
    ...
    select addfield, generated_counter from table where id = 3
    ) order by generated_counter;

    I am not sure but that generated counter could be taken from DB
    sequence (something like select seq.nextvalue() from dual) or
    generated in Java while you compose the SQL statement.

    Well, this is what I have on my mind while I've just got up and my
    brain is still booting...

    Lukas
    On 7/3/06, Monsur Hossain wrote:
    On 6/30/06, Dominik Bruhn wrote:
    SELECT id,addfield FROM table WHERE id IN ([LUCENERESULT]);

    Where LUCENERESULT is like 2,3,19,3,5.

    This works fine but got one problem: The Search-Result of Lucene is order by
    relevance and so the id-list is also sorted by relevance. But the result of
    the SQL-Query is sorted by the id which destroys the relevance-sorting.

    Does anybody know a work-arround?
    We have the same issue and solve this through code. As you're
    generating the list of IDs for the SQL query, also generate a hash
    that maps the ID to its position. When reading back the data from the
    database, read the position hash, and insert the item into the
    appropriate position in an array. Yeah, you're doing the sorting
    yourself, but in our case, we're only returning 10 items per page, so
    its not a huge performance hit.

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Mike Streeton at Jul 4, 2006 at 8:16 am
    The simplest solution to this I would suggest is to decode the id to
    relevance score e.g.

    Select id, addfield
    From mytable
    Where id in (1,2,3,4,5,50,60,70)
    Order by case id when 1 then 0.9 when 2 then 0.8 when 3 then 0.7 ....
    end desc

    You will have to generate the in () and the case statement but this will
    do it. It will still go pear shaped when two ids have the same score
    unless you fiddle this.

    Mike


    www.ardentia.com the home of NetSearch
    -----Original Message-----
    From: Lukas Vlcek
    Sent: 04 July 2006 05:54
    To: java-user@lucene.apache.org
    Subject: Re: Sorting & SQL-Database

    Hi,

    Looking at your problem I can think of one solution for small and
    *midsize* result sets. (And I have to say it may be similar to what
    Aleksander proposes).
    Write workaround query in the following form:

    select addfield from (
    select addfield, generated_counter from table where id = 2
    union
    select addfield, generated_counter from table where id = 4
    union
    select addfield, generated_counter from table where id = 1
    ...
    select addfield, generated_counter from table where id = 3
    ) order by generated_counter;

    I am not sure but that generated counter could be taken from DB
    sequence (something like select seq.nextvalue() from dual) or
    generated in Java while you compose the SQL statement.

    Well, this is what I have on my mind while I've just got up and my
    brain is still booting...

    Lukas
    On 7/3/06, Monsur Hossain wrote:
    On 6/30/06, Dominik Bruhn wrote:
    SELECT id,addfield FROM table WHERE id IN ([LUCENERESULT]);

    Where LUCENERESULT is like 2,3,19,3,5.

    This works fine but got one problem: The Search-Result of Lucene is
    order by
    relevance and so the id-list is also sorted by relevance. But the
    result of
    the SQL-Query is sorted by the id which destroys the
    relevance-sorting.
    Does anybody know a work-arround?
    We have the same issue and solve this through code. As you're
    generating the list of IDs for the SQL query, also generate a hash
    that maps the ID to its position. When reading back the data from the
    database, read the position hash, and insert the item into the
    appropriate position in an array. Yeah, you're doing the sorting
    yourself, but in our case, we're only returning 10 items per page, so
    its not a huge performance hit.

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • George Abraham at Jul 8, 2006 at 3:06 pm
    Dominik,
    Sorry I saw this a little late, but I asked a similar question back in Dec
    2005 and Jeff Rodenburg gave me some splendid help which we are implementing
    even now. Here is the gist of his solution.

    - Include the Lucene Doc's score (or sequential order, as an int) with the
    list of id values.
    - Create an XML string, then pass as it as a text parameter to a stored
    procedure. The XML string contains a set of id values and their associated
    document score or order value.
    - Read the text parameter within a stored procedure into a declared table
    using OPENXML
    - Join the declared table with your existing resultset code on the id values
    - Order the resultset by the order value included in the XML

    Here is some sample code for the stored procedure:

    /*******************************************************/
    create procedure dbo.Get_Records_ByXml (
    @xmlSource text
    )
    as

    /*
    <Documents>
    <Doc>
    <Id>123</Id>
    <ScoreOrder>1</ScoreOrder>
    </Doc>
    <Doc>
    <Id>456</Id>
    <ScoreOrder>2</ScoreOrder>
    </Doc>
    <Doc>
    <Id>789</Id>
    <ScoreOrder>3</ScoreOrder>
    </Doc>
    </Documents>
    */

    set nocount on
    /* xmlHandle */
    declare @xmlHandle int

    -- SQL's internal stored procedure to prep the @xmlSource stream for reading
    by OPENXML
    exec sp_xml_preparedocument @xmlHandle OUTPUT, @xmlSource

    declare @docs table (
    Id int,
    ScoreOrder int
    )

    -- Use OPENXML to populate declared table
    insert into @docs
    select Id, ScoreOrder
    from openxml (@xmlHandle, '/Documents/Doc',1)
    with (
    Id int 'Id',
    ScoreOrder int 'ScoreOrder'
    )

    -- Declared table is populated, ready to be joined with other tables
    select Id, ScoreOrder
    from @docs
    order by ScoreOrder asc

    -- SQL's internal stored procedure to drop the @xmlSource stream handle from
    memory
    exec sp_xml_removedocument @xmlHandle

    set nocount off
    go


    declare @text varchar(8000)

    select @text =
    '<Documents>' +
    ' <Doc>' +
    ' <Id>123</Id>' +
    ' <ScoreOrder>1</ScoreOrder>' +
    ' </Doc>' +
    ' <Doc>' +
    ' <Id>456</Id>' +
    ' <ScoreOrder>2</ScoreOrder>' +
    ' </Doc>' +
    ' <Doc>' +
    ' <Id>789</Id>' +
    ' <ScoreOrder>3</ScoreOrder>' +
    ' </Doc>' +
    '</Documents>'

    exec Get_Records_ByXml @xmlSource=@text

    /*
    -- Returns this result

    Id ScoreOrder
    ---- ----------
    123 1
    456 2
    789 3
    */

    Hope that helps!

    George

    On 6/30/06, Dominik Bruhn wrote:

    Hy,
    i use Lucene to index a SQL-Table which contains three fields: a
    index-field,
    the text to search in and another field. When adding a lucene document I
    let
    Lucene index the search-field and also save the id along with it in the
    lucene index.

    Uppon searching I collect all ids and add them to a java-string with
    commas in
    between to issue a SQL-Query like this one:

    SELECT id,addfield FROM table WHERE id IN ([LUCENERESULT]);

    Where LUCENERESULT is like 2,3,19,3,5.

    This works fine but got one problem: The Search-Result of Lucene is order
    by
    relevance and so the id-list is also sorted by relevance. But the result
    of
    the SQL-Query is sorted by the id which destroys the relevance-sorting.

    Does anybody know a work-arround?


    Thanks
    --
    Dominik Bruhn
    mailto: dominik@dbruhn.de
    http://www.dbruhn.de

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJun 30, '06 at 11:11p
activeJul 8, '06 at 3:06p
posts8
users8
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase