First, to understand what your query looks like, go to
admin/analysis.jsp. It shows what the analyzers do to your query text
as it goes in. Then run the query with debugQuery=true. This appends a
verbose explanation to the end of the XML response that describes in
painful detail exactly how each document was scored.
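As a rough sketch of the second step, the request only needs the extra debugQuery=true parameter. The host, port, and /solr/select path below are assumptions about a default local install, not something taken from your setup, and the class name is made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class DebugQueryUrl {
    // Hypothetical base URL: adjust host, port, and path to your own install.
    static final String BASE = "http://localhost:8983/solr/select";

    // Builds a select URL that asks for the scoring explanation as well.
    static String build(String userQuery) {
        String q = URLEncoder.encode(userQuery, StandardCharsets.UTF_8);
        return BASE + "?q=" + q + "&debugQuery=true";
    }

    public static void main(String[] args) {
        System.out.println(build("mad cow disease PrnP"));
    }
}
```

Open the printed URL in a browser and scroll to the end of the response to find the per-document explanation.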
After all that, you might have a problem with terms like PrnP
getting chopped up in weird ways. I don't know how people handle this in
chemistry/bio search.
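One plausible cause, assuming the field's analyzer includes something like Solr's WordDelimiterFilterFactory with case-change splitting turned on, is that a token is broken wherever a lowercase letter is followed by an uppercase one. The snippet below is only a plain-Java imitation of that single rule (the class name is made up), not the filter itself; admin/analysis.jsp will show what your analyzer really does.

```java
import java.util.Arrays;
import java.util.List;

final class CaseChangeSplit {
    // Split wherever a lowercase letter is directly followed by an uppercase one.
    static List<String> split(String token) {
        return Arrays.asList(token.split("(?<=\\p{Ll})(?=\\p{Lu})"));
    }

    public static void main(String[] args) {
        System.out.println(split("PrnP")); // prints [Prn, P]
    }
}
```

If that is what is happening, a query for PrnP ends up matching the fragments rather than the gene name as a whole.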
Lance
Ahmet Arslan wrote:
Example of Question:
- What is the role of PrnP in mad cow disease?
First, do not submit questions directly as queries. Formulate the queries manually:
remove 'what', 'is', 'the', 'of', '?', etc.
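The clean-up step above can be sketched in a few lines of plain Java. The stop list here is a small illustrative one, not Lucene's built-in list, and the class and method names are made up. The second method shows one possible way to add boosted phrase pairs on top of the remaining terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class QuestionToQuery {
    // Illustrative stop list only; extend it for your own questions.
    static final Set<String> STOP = new HashSet<>(Arrays.asList(
            "what", "is", "the", "of", "in", "a", "an"));

    // Drops punctuation and stop words, keeps the remaining terms in order.
    static List<String> contentTerms(String question) {
        List<String> terms = new ArrayList<>();
        for (String tok : question.split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty() && !STOP.contains(tok.toLowerCase(Locale.ROOT))) {
                terms.add(tok);
            }
        }
        return terms;
    }

    // Emits each neighbouring pair as a boosted phrase, then the single terms.
    // Pairs that bridge a removed stop word were not adjacent in the question,
    // so a slop such as "role PrnP"~5 may suit them better than an exact phrase.
    static String boostedQuery(List<String> terms, int phraseBoost) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < terms.size(); i++) {
            sb.append('"').append(terms.get(i)).append(' ')
              .append(terms.get(i + 1)).append("\"^").append(phraseBoost).append(' ');
        }
        sb.append(String.join(" ", terms));
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> terms = contentTerms("What is the role of PrnP in mad cow disease?");
        System.out.println(terms); // prints [role, PrnP, mad, cow, disease]
        System.out.println(boostedQuery(terms, 3));
    }
}
```

The boosts still need hand tuning per question; the code only removes the mechanical part.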
For example, I would convert this question into:
"mad cow"^5 "cow disease"^3 "mad cow disease"^15 "role PrnP"~5^2 "role mad cow disease"~45 mad^0.1 role^0.5 cow disease PrnP^10
I am running against 11,638 documents and the result is 10,410
docs for this question (very low precision)
Use OR as the default operator, and collect and evaluate only the top 1000 documents.
Instead of the Porter stemmer, you can try KStem:
http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
Try the different length normalization described here. Their Lucene query example (SpanNear) may also give you ideas:
http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org