I am using this code, with Snowball and TopDocScore.
The code: http://pastebin.com/3X3gbpXE

Example question:
- What is the role of PrnP in mad cow disease?

I am searching 11,638 documents, and this question returns 10,410 of them
(very low precision).
How can I optimize this?

Thanks,
Celso.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Erick Erickson at Nov 15, 2010 at 10:20 pm
    First question: What's the default operator? Out of
    the box, it's OR. See QueryParser.setDefaultOperator...

    Second, how are you forming your query? Just running
    it through the query parser? Query.toString() may be your friend.

    Best
    Erick
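To see why the default operator matters so much here, the following is a minimal toy model in plain Java, not Lucene code. The class `DefaultOperatorDemo`, the method `countMatches`, and the three sample documents are hypothetical, and the sketch assumes no stopword removal. It counts how many documents match when any term suffices (OR) versus when every term is required (AND).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of OR vs. AND matching; illustration only, not the Lucene API.
final class DefaultOperatorDemo {

    // Lowercases the text and splits it on runs of non-word characters.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
        result.remove(""); // a leading separator would produce an empty token
        return result;
    }

    // Counts documents matching the query.
    // requireAll == false: a document matches if it contains ANY query term (OR).
    // requireAll == true:  a document matches only if it contains ALL terms (AND).
    static int countMatches(String[] docs, String query, boolean requireAll) {
        Set<String> queryTerms = tokens(query);
        int matches = 0;
        for (String doc : docs) {
            Set<String> docTerms = tokens(doc);
            boolean hit;
            if (requireAll) {
                hit = docTerms.containsAll(queryTerms);
            } else {
                hit = false;
                for (String term : queryTerms) {
                    if (docTerms.contains(term)) {
                        hit = true;
                        break;
                    }
                }
            }
            if (hit) {
                matches++;
            }
        }
        return matches;
    }
}
```

With the full question as the query, OR matches every sample document that merely contains a word such as "the", while AND matches none of them. A query reduced to content words behaves far better under either operator. In Lucene itself, the switch is made with QueryParser.setDefaultOperator, and whether common words survive into the query depends on the analyzer configuration.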
  • Celso Fontes at Nov 15, 2010 at 11:01 pm
    Hi Erick,

    My queries come from the TREC 2006 Genomics topic list...

    Which operator do you recommend?

    Thanks,
    Celso


  • Ahmet Arslan at Nov 15, 2010 at 11:32 pm

    > Example of Question:
    > - What is the role of PrnP in mad cow disease?

    First, do not submit questions directly as queries. Formulate queries manually:
    remove 'what', 'is', 'the', 'of', '?', etc.

    For example, I would convert this question into:

    "mad cow"^5 "cow disease"^3 "mad cow disease"^15 "role PrnP"~5^2 "role mad cow disease"~45 mad^0.1 role^0.5 cow disease PrnP^10

    > I am searching 11,638 documents, and this question returns 10,410
    > of them (very low precision)

    Use OR as the default operator, and collect and evaluate only the top 1000 documents.

    And instead of Porter you can try KStem.
    http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi

    Try the different length normalization described here. Their Lucene query example (SpanNear) may also inspire you: http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
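The manual reformulation Ahmet describes can be roughly approximated in code. The sketch below is plain Java with no Lucene dependency. The class `QuestionToQuery`, the method `toQuery`, the stopword list, and the single uniform phrase boost are all assumptions made for illustration. It drops question words and emits boosted phrases only for content words that were adjacent in the original question. It does not reproduce the hand-tuned boosts and slop values from the example query above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Rough sketch: turn a natural-language question into a query-parser string.
final class QuestionToQuery {

    // Assumed stopword list; tune it for the actual topic set.
    private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "what", "is", "the", "of", "in", "a", "an", "how", "does", "do", "are"));

    // Returns boosted two-word phrases followed by the individual content terms.
    static String toQuery(String question, int phraseBoost) {
        List<String> phrases = new ArrayList<>();
        List<String> terms = new ArrayList<>();
        String previous = null; // previous token, if it was a content word

        for (String token : question.trim().split("[^A-Za-z0-9]+")) {
            if (token.isEmpty() || STOPWORDS.contains(token.toLowerCase(Locale.ROOT))) {
                previous = null; // a stopword breaks adjacency
                continue;
            }
            if (previous != null) {
                phrases.add("\"" + previous + " " + token + "\"^" + phraseBoost);
            }
            terms.add(token);
            previous = token;
        }

        List<String> parts = new ArrayList<>(phrases);
        parts.addAll(terms);
        return String.join(" ", parts);
    }
}
```

For the example question and a phrase boost of 5, this produces `"mad cow"^5 "cow disease"^5 role PrnP mad cow disease`, which could then be passed to the query parser with OR as the default operator. The per-term boosts and proximity clauses in the hand-written query would still need to be chosen by experiment.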
  • Lance Norskog at Nov 16, 2010 at 5:56 am
    First, to understand what your query looks like, go to
    admin/analysis.jsp. It lets you see what happens to your queries when
    they go in. Then, do the query with debugQuery=true. This will add some
    complex junk to the end of the XML page that describes in painful detail
    exactly how each document was scored.

    After all that, you might have a problem with terms like PrnP
    getting chopped up in weird ways. I don't know how people handle this in
    chemistry/bio search.

    Lance

