Dear All,

I have two documents. Their contents after analysis and tokenization are
listed below.

*Document 1 :*

*when*, null_1, *my*, null_1, money,

fund, amount, payment, creditcard, credit,

card, *bank, account*, debit, deduct,

*charge*, null_1, my, mobile, usage,

*service*, connection


*Document 2:*

*when*, what, time, what, day,

null_1, money, fund, cash, payment,

null_1, i, do, you, i,

null_1, deduct, *charge*, reduce, debit,

from, *my*, *bank, account*, credit,

card, null_1, *adsl*, adsl1, adsl-2,

adsl-1, adsl2, adsl, 1, adsl,

2, usage, connection, *service*


Then, I searched for the following text.

*Query:* when my bank account charge adsl service

*Scores*

Document 1 = 0.74406385

Document 2 = 0.66067594

I was expecting Document 2 to be the top-ranked document. But I get
Document 1 as the top-ranked one, even though it does not contain the term “adsl”.

The word order of Document 1 matches the query very well. Can that
be the reason?

If it is, how can I ignore word order when searching? (I am not using
phrase queries.)

My search code looks like the following and is very simple.


QueryParser parser = new QueryParser(Version.LUCENE_30,
        "pattern",
        new StandardAnalyzer(Version.LUCENE_30));

org.apache.lucene.search.Query query1 =
        parser.parse(this.query.getQuestion());

// "is" is the searcher opened elsewhere
TopDocs hits = is.search(query1, 10);

Please advise.


Thanks,

Lahiru


  • Ian Lea at Jan 18, 2011 at 12:29 pm
    See what Searcher.explain() says for each hit. I don't think that word
    order will matter with the query you give. There are several factors
    in scoring: see org.apache.lucene.search.Similarity or google "lucene scoring".

    Or have a play with Luke: it is invaluable for investigating things with
    Lucene and will tell you everything about your index.


    --
    Ian.

  • Umesh Prasad at Jan 18, 2011 at 12:33 pm
    Hi Lahiru,
    Comments are inline:

    On Tue, Jan 18, 2011 at 5:42 PM, Lahiru Samarakoon wrote:

    *Scores*

    Document 1 = 0.74406385

    Document 2 = 0.66067594

    Please read the Lucene scoring documentation:
    http://lucene.apache.org/java/2_9_1/scoring.html
    That will help you understand the bigger picture.

    I was expecting Document 2 to be the top-ranked document. But I get
    Document 1 as the top-ranked one, even though it does not contain the term “adsl”.

    The word order of Document 1 matches the query very well. Can that
    be the reason?

    Word order doesn't matter. However, tf/idf, norms and other factors do
    matter, as described in the link above.

    You can see how each document was assigned its score by using

    IndexSearcher.explain(query, docId), as described in
    http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Searcher.html#explain%28org.apache.lucene.search.Query,%20int%29




    --
    ---
    Thanks & Regards
    Umesh Prasad
  • Lahiru Samarakoon at Jan 18, 2011 at 1:47 pm
    Hi Ian & Umesh,

    This is what I was looking for.
    Thanks a lot.

    Regards,
    Lahiru
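
The point made in the replies, that tf, norms and similar factors drive the score while word order does not, can be sketched in plain Java. This is only an illustration under stated assumptions: the formulas are the ones documented for Lucene's DefaultSimilarity (tf = sqrt(freq), lengthNorm = 1/sqrt(number of tokens), coord = matched query terms / total query terms), the token lists are copied from the original post, and the class name ScoringFactors and its helpers are made up for this example. idf, queryNorm and the lossy one-byte norm encoding are left out, so the numbers will not reproduce the 0.744 / 0.661 scores above; the output of explain() on the real index remains the authority.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of Lucene: prints a few per-document scoring factors. */
public class ScoringFactors {

    public static final List<String> QUERY = Arrays.asList(
            "when", "my", "bank", "account", "charge", "adsl", "service");

    // Token streams as listed in the original post.
    public static final List<String> DOC1 = Arrays.asList(
            "when", "null_1", "my", "null_1", "money",
            "fund", "amount", "payment", "creditcard", "credit",
            "card", "bank", "account", "debit", "deduct",
            "charge", "null_1", "my", "mobile", "usage",
            "service", "connection");

    public static final List<String> DOC2 = Arrays.asList(
            "when", "what", "time", "what", "day",
            "null_1", "money", "fund", "cash", "payment",
            "null_1", "i", "do", "you", "i",
            "null_1", "deduct", "charge", "reduce", "debit",
            "from", "my", "bank", "account", "credit",
            "card", "null_1", "adsl", "adsl1", "adsl-2",
            "adsl-1", "adsl2", "adsl", "1", "adsl",
            "2", "usage", "connection", "service");

    /** Term-frequency factor: sqrt of the number of occurrences. */
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Length normalization: longer fields get a smaller factor. */
    public static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    /** Fraction of the query terms that occur in the document at all. */
    public static double coord(List<String> query, List<String> doc) {
        int matched = 0;
        for (String term : query) {
            if (doc.contains(term)) {
                matched++;
            }
        }
        return (double) matched / query.size();
    }

    /** Sum of tf over the query terms; depends on counts only, never on positions. */
    public static double tfSum(List<String> query, List<String> doc) {
        double sum = 0.0;
        for (String term : query) {
            sum += tf(Collections.frequency(doc, term));
        }
        return sum;
    }

    static void report(String name, List<String> doc) {
        System.out.printf("%s: tokens=%d lengthNorm=%.4f coord=%.4f tfSum=%.4f%n",
                name, doc.size(), lengthNorm(doc.size()),
                coord(QUERY, doc), tfSum(QUERY, doc));
    }

    public static void main(String[] args) {
        report("Document 1", DOC1);
        report("Document 2", DOC2);

        // Reversing the token order changes none of the factors.
        List<String> reversed = new ArrayList<String>(DOC1);
        Collections.reverse(reversed);
        report("Document 1 (reversed)", reversed);
    }
}
```

Document 1 has 22 tokens and Document 2 has 39, so Document 1 gets the larger length norm (about 0.213 against 0.160), while Document 2 gets the better coord (7 of 7 query terms against 6 of 7). How these trade off in the actual index also depends on idf and the other omitted factors, which is exactly what explain() spells out per hit.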

Discussion Overview
group: java-user
category: lucene
posted: Jan 18, '11 at 12:12p
active: Jan 18, '11 at 1:47p
posts: 4
users: 3
website: lucene.apache.org
