TermQuery - ExactMatching, Lucene 3.1.0 vs. 3.3.0, special character behavior
Hello,

I have an index with a lot of docs; two of them are:

doc1:

1.Field=id ITSVopfOLB=ITS---f0-- Value= 192
2.Field=name ITSVopfOLB=ITS----0-- Value= queen

doc2:

1.Field=id ITSVopfOLB=ITS---f0-- Value= 701492
2.Field=name ITSVopfOLB=ITS----0-- Value= queen板野友美 (these are
Chinese characters; hopefully you can see them)

If I search the index with a TermQuery, Lucene 3.1.0 and 3.3.0 behave
differently:

Query:

Term:field='name' text='queen'

Result Lucene 3.1.0:

0 Score=13,2132 Doc.Id=176002 id=192 name=queen
1 Score=13,2132 Doc.Id=523407 id=701492 name=queen板野友美

Result Lucene 3.3.0:

0 Score=13,2132 Doc.Id=523407 id=701492 name=queen板野友美
1 Score=13,2132 Doc.Id=176002 id=192 name=queen

The result from Lucene 3.1.0 is what I would expect from an
'exact matching' TermQuery.
Each index was built with its corresponding Lucene version.
I tested with Luke and with my own code; the result was always the
same.

Is this a new feature in Lucene 3.3.0 or a bug?

Thanks in advance!
Thomas


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Ian Lea at Jul 18, 2011 at 8:09 am
    I'm not sure what you are getting at. A search using 3.1.0 and 3.3.0
    returns the same docs with identical scores, except that one gives
    them in order A,B and the other in order B,A? What search method are
    you using? Does it guarantee anything about the order of returning
    docs with identical scores?


    --
    Ian.

  • Thomas Rewig at Jul 18, 2011 at 10:06 am
    Hi Ian,

    Yes, the scores are identical, but the ordering among documents with
    the same score seems to differ between the versions.

    In Lucene 3.3.0 it seems that terms with special characters are
    ranked ahead of the exact hit.

    My code is:

    PhraseQuery query = new PhraseQuery();
    query.add(new Term("name", strQueryName));

    //topDocs = this.indexSeacher.search(query, 10);
    //topDocs = this.indexSeacher.search(query, 10, Sort.RELEVANCE);
    topDocs = this.indexSeacher.search(query, 10, Sort.INDEXORDER);

    All variants show similar ordering problems, although not always for
    the same query.
    E.g. if I sort by Sort.RELEVANCE the "queen" problem doesn't occur,
    but the ordering is wrong for the token "aim" (query name:aim):


    0 Score=12,2324 Doc.Id=8060 id=709579 name=aim溝脇しほみ
    1 Score=12,2324 Doc.Id=227606 id=716893 name=aim


    Is there a way to guarantee the ordering among documents with the
    same score? Or how can I prevent documents with special characters
    from getting the same score as exact matches?

    Thanks in advance!
    Thomas




  • Uwe Schindler at Jul 18, 2011 at 10:13 am
    Hi Thomas,

    Just one question: are these docIds from Lucene or your own? And second, were the underlying indexes also built with the corresponding Lucene versions?

    The reason I ask: nothing in Lucene guarantees the order of docIds for identical scores; it can be arbitrary. One change in Lucene 3.3, for example, is the use of TieredMergePolicy, which reorders documents during indexing for more efficient merging. So if you also indexed with Lucene 3.3 and the displayed document IDs are your own application-specific ones (not the internal Lucene ones), the different order of search results can simply be caused by the indexer in 3.3 reordering the documents during merging (TieredMergePolicy). There is nothing wrong with that.
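To make the point about ties concrete, here is a minimal sketch in plain Java (no Lucene classes). `Hit` and `rank` are hypothetical names invented for illustration. It sorts by score and falls back to the application-level `id` when scores are equal, so the order of tied hits no longer depends on how the documents happen to be laid out in the index. In Lucene 3.x the same idea would presumably be expressed with a `Sort` that lists `SortField.FIELD_SCORE` first and a second `SortField` on a suitable field; that mapping is an assumption, not something stated in this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java illustration; Hit and rank() are hypothetical names, not Lucene API. */
final class TieBreakSketch {

    /** One search hit: relevance score plus the application-level "id" and "name" fields. */
    static final class Hit {
        final float score;
        final int id;
        final String name;

        Hit(float score, int id, String name) {
            this.score = score;
            this.id = id;
            this.name = name;
        }
    }

    /** Orders hits by score (highest first); equal scores fall back to ascending id. */
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
        sorted.sort(byScoreDesc.thenComparingInt((Hit h) -> h.id));
        return sorted;
    }

    public static void main(String[] args) {
        // The two "queen" hits from the original post, in the 3.3.0 order.
        List<Hit> hits = new ArrayList<Hit>();
        hits.add(new Hit(13.2132f, 701492, "queen板野友美"));
        hits.add(new Hit(13.2132f, 192, "queen"));
        for (Hit h : rank(hits)) {
            System.out.println(h.id + " " + h.name);
        }
    }
}
```

With the two "queen" hits from the original post, `rank` returns id=192 before id=701492 whichever order they arrive in, because the tie on score is resolved by the second key.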

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

  • Thomas Rewig at Jul 19, 2011 at 7:38 am
    Hi Uwe,

    The docIds are the ones from Lucene (and each index was built with
    its own Lucene version).

    So if I sort with e.g. 'Sort.INDEXORDER', and I understand the sorting
    principle correctly, the following is correct because the aim溝脇しほみ
    document comes first in the index:

    0 Score=12,2324 Doc.Id=8060 id=709579 name=aim溝脇しほみ
    1 Score=12,2324 Doc.Id=227606 id=716893 name=aim
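A minimal sketch of that reading of Sort.INDEXORDER, in plain Java with hypothetical names (`Hit`, `inIndexOrder`) rather than Lucene classes: index order means ascending Lucene document number, and neither the score nor the field text plays any role.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java illustration; Hit and inIndexOrder() are hypothetical names, not Lucene API. */
final class IndexOrderSketch {

    /** One search hit: Lucene's internal document number plus the stored name. */
    static final class Hit {
        final int docId;
        final String name;

        Hit(int docId, String name) {
            this.docId = docId;
            this.name = name;
        }
    }

    /** Orders hits by ascending document number only. */
    static List<Hit> inIndexOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.docId));
        return sorted;
    }

    public static void main(String[] args) {
        // The two "aim" hits from this message.
        List<Hit> hits = new ArrayList<Hit>();
        hits.add(new Hit(227606, "aim"));
        hits.add(new Hit(8060, "aim溝脇しほみ"));
        for (Hit h : inIndexOrder(hits)) {
            System.out.println(h.docId + " " + h.name);
        }
    }
}
```

Doc 8060 comes out ahead of doc 227606 simply because its document number is smaller, which matches the listing above.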

    To avoid these problems from the start, do I need to use a different
    analyzer for indexing (so that the docs 'aim溝脇しほみ' and 'aim' get
    different scores)?

    Thanks
    Thomas








Discussion Overview
group: java-user
category: lucene
posted: Jul 15, '11 at 2:02p
active: Jul 19, '11 at 7:38a
posts: 5
users: 3
website: lucene.apache.org
