Hi,

I need to add a query operator '!' such that when it precedes a word or a
phrase in the query, that term contributes twice its weight if it is
positioned at an even offset in the document. The position of a phrase is
determined by the offset of its first word.

I guess it involves payloads...

Elias.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • AHMET ARSLAN at Dec 19, 2009 at 4:41 pm

    > Hi,
    >
    > I need to add a query operator '!' such that when it precedes a word or a
    > phrase in the query, that term will contribute twice its weight if it is
    > positioned in an even offset of the document. The position of a phrase is
    > determined by the offset of its first word.
    >
    > I guess it involves payloads...
    >
    > Elias.

    '!' is already a query operator: it is equivalent to NOT, so you cannot use it. Why not use the caret (boost) operator instead? For example: singleterm^2 "some phrase"^2

    [Boosting a Term] http://lucene.apache.org/java/3_0_0/queryparsersyntax.html





  • Elias Khsheibun at Dec 19, 2009 at 4:44 pm
    I want to override the operator; it is for a project.

  • AHMET ARSLAN at Dec 19, 2009 at 4:48 pm

    > I want to override the operator - it is for a project purpose.

    Can you explain your requirements in more detail? What do you mean by "an even offset of the document"?




  • Elias Khsheibun at Dec 19, 2009 at 4:54 pm
    Let's say I have a document that contains the following text:

    "Graph Algorithms is one of the most important topics in computer science"

    Given the query "!Graph Algorithms", the term Graph in the query should get
    double weight because the offset of Graph is 0, which is even. We apply this
    doubling only if a '!' operator precedes the term and the term's offset in
    the document is even.
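    To make "even offset" concrete: the sketch below assumes, as the replies in this thread do, that it means the 0-based token position after splitting on whitespace, not the character offset. EvenPositions is a made-up name and this is plain Java, not Lucene code:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    /**
     * Plain-Java illustration (not Lucene code): split text on whitespace,
     * number the tokens from 0, and keep the ones at an even position.
     */
    class EvenPositions {

        static List<String> evenPositionTokens(String text) {
            List<String> result = new ArrayList<>();
            String trimmed = text.trim();
            if (trimmed.isEmpty()) {
                return result; // no tokens at all
            }
            String[] tokens = trimmed.split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                if (pos % 2 == 0) {
                    result.add(tokens[pos]);
                }
            }
            return result;
        }

        public static void main(String[] args) {
            String doc = "Graph Algorithms is one of the most important topics in computer science";
            // "Graph" is at position 0, so "!Graph" would earn double weight in this document
            System.out.println(evenPositionTokens(doc)); // [Graph, is, of, most, topics, computer]
        }
    }
    ```

    With this reading, "Algorithms" (position 1) would not be boosted even if it were written as "!Algorithms".
    
    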


  • Uwe Schindler at Dec 19, 2009 at 5:01 pm
    Just a question: how big is this university course about Lucene? You are the
    third person asking for the same thing :-)

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
  • Elias Khsheibun at Dec 19, 2009 at 5:05 pm
    About 60 students, I think. If you have already given some answers, I would be
    grateful if you could link me to them or quote them again.

  • AHMET ARSLAN at Dec 19, 2009 at 6:34 pm

    > Let's say I have a document that contains the following text:
    >
    > "Graph Algorithms is one of the most important topics in computer science"
    >
    > And a query "!Graph Algorithms" then the term Graph in the query should have
    > a double weight because the offset of Graph is 0 (and it is even) - we apply
    > this doubling of weight only if a '!' operator precedes the term and if its
    > offset from the document is even.

    I modified TokenOffsetPayloadTokenFilter and created TermPositionPayloadTokenFilter.

    At index time you can use WhitespaceTokenizer + TermPositionPayloadTokenFilter to assign a payload value of 2.0f to the tokens that have an even term position.

    Modifying QueryParser to change the meaning of the ! operator is very troublesome.
    If you can convert your query "!Graph Algorithms" to "Graph|2.0 Algorithms", you can use DelimitedPayloadTokenFilter to set the payload of the marked term.
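    One way to do that conversion is a small string rewrite before the query string reaches QueryParser. This is a hypothetical helper (BangRewriter is not part of Lucene) and it assumes terms are separated by whitespace; it does not handle quoted phrases, parentheses or other QueryParser syntax:

    ```java
    /**
     * Hypothetical pre-processing step (not part of Lucene): turn the custom
     * '!' marker into the "term|2.0" form that DelimitedPayloadTokenFilter
     * splits into a token plus a float payload.
     */
    class BangRewriter {

        static String rewrite(String query) {
            StringBuilder out = new StringBuilder();
            String[] tokens = query.trim().split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                String tok = tokens[i];
                if (i > 0) {
                    out.append(' ');
                }
                if (tok.length() > 1 && tok.charAt(0) == '!') {
                    // "!Graph" -> "Graph|2.0"
                    out.append(tok.substring(1)).append("|2.0");
                } else {
                    out.append(tok); // unmarked terms pass through unchanged
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(rewrite("!Graph Algorithms")); // Graph|2.0 Algorithms
        }
    }
    ```

    The rewritten string would then be parsed with a query-time analyzer that includes DelimitedPayloadTokenFilter, so "|2.0" ends up as a payload rather than as part of the term text.
    
    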

    Additionally, you need to override QueryParser to return PayloadTermQuery,
    and override the scorePayload method of DefaultSimilarity.
    By doing so, payloads will be included in the score calculation.


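    // PayloadAnalyzer.java: query-time analyzer ("term|2.0" -> token "term" with a 2.0f payload)
    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
    import org.apache.lucene.analysis.payloads.FloatEncoder;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.util.Version;

    public class PayloadAnalyzer extends Analyzer {

        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream result = new WhitespaceTokenizer(reader);
            // Split "term|2.0" into the token "term" plus a float payload
            result = new DelimitedPayloadTokenFilter(result, '|', new FloatEncoder());
            return result;
        }

        public static void main(String[] args) throws ParseException {
            QueryParser qp = new QueryParser(Version.LUCENE_29, "f", new PayloadAnalyzer());
            // The stock parser builds plain TermQuerys, so the payload is not used for scoring yet
            System.out.println(qp.parse("Graph|2.0 Algorithms").toString());
        }
    }

    // TermPositionPayloadTokenFilter.java: index-time filter (2.0f payload at even positions)
    import java.io.IOException;
    import java.io.StringReader;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.analysis.payloads.PayloadHelper;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    import org.apache.lucene.index.Payload;

    public class TermPositionPayloadTokenFilter extends TokenFilter {

        protected PayloadAttribute payAtt;
        protected PositionIncrementAttribute posIncrAtt;

        private static final Payload evenPayload = new Payload(PayloadHelper.encodeFloat(2.0f));

        // Position of the current token; -1 until the first token has been read
        private int termPosition = -1;

        public TermPositionPayloadTokenFilter(TokenStream input) {
            super(input);
            payAtt = (PayloadAttribute) addAttribute(PayloadAttribute.class);
            posIncrAtt = (PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);
        }

        public final boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                return false;
            }
            // Advance first, so position gaps (e.g. removed stop words) are counted
            termPosition += posIncrAtt.getPositionIncrement();
            if (termPosition % 2 == 0) {
                payAtt.setPayload(evenPayload);
            }
            // Odd positions keep the upstream payload (null straight after WhitespaceTokenizer)
            return true;
        }

        public void reset() throws IOException {
            super.reset();
            termPosition = -1; // start counting again if the stream is reused
        }

        public static void main(String[] args) throws IOException {
            String test = "Graph Algorithms is one of the most important topics in computer science";
            TokenStream tokenStream =
                    new TermPositionPayloadTokenFilter(new WhitespaceTokenizer(new StringReader(test)));
            TermAttribute termAtt = (TermAttribute) tokenStream.getAttribute(TermAttribute.class);
            PayloadAttribute payloadAtt = (PayloadAttribute) tokenStream.getAttribute(PayloadAttribute.class);

            while (tokenStream.incrementToken()) {
                System.out.print(termAtt.term());
                Payload payload = payloadAtt.getPayload();
                if (payload != null) {
                    System.out.println(" Payload = " + PayloadHelper.decodeFloat(payload.toByteArray()));
                } else {
                    System.out.println(" Payload is null.");
                }
            }
        }
    }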
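    The 2.0f payload is stored as raw bytes. The sketch below uses java.nio.ByteBuffer to encode a float as its IEEE 754 bits, most significant byte first, which is my understanding of what PayloadHelper.encodeFloat and decodeFloat do; treat that equivalence as an assumption and check the PayloadHelper source before mixing the two. FloatPayloadCodec is a made-up name:

    ```java
    import java.nio.ByteBuffer;

    /**
     * Sketch of a 4-byte float payload: the float's IEEE 754 bits in
     * big-endian order (ByteBuffer's default byte order).
     */
    class FloatPayloadCodec {

        static byte[] encodeFloat(float value) {
            return ByteBuffer.allocate(4).putFloat(value).array();
        }

        /** Reads 4 bytes starting at offset, mirroring the (payload, offset) pair scorePayload receives. */
        static float decodeFloat(byte[] bytes, int offset) {
            return ByteBuffer.wrap(bytes, offset, 4).getFloat();
        }

        public static void main(String[] args) {
            byte[] two = encodeFloat(2.0f);
            System.out.println(java.util.Arrays.toString(two)); // [64, 0, 0, 0]
            System.out.println(decodeFloat(two, 0));            // 2.0
        }
    }
    ```

    The offset parameter matters because the payload bytes handed to scorePayload are not guaranteed to start at index 0 of the array.
    
    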




  • Elias Khsheibun at Dec 19, 2009 at 8:53 pm
    If I need to override the QueryParser to return PayloadTermQuery, which
    PayloadFunction should I pass to its constructor? (An example would help.)

    In your code I didn't see an indexer. Will this work with the regular
    IndexWriter, just with the new Analyzer that you overrode?

  • AHMET ARSLAN at Dec 19, 2009 at 9:19 pm

    > If I need to override the QueryParser to return PayloadTermQuery, what
    > function for PayloadFunction should I use in the constructor (If you can
    > show me an example).

    I am not sure about that. Maybe a custom one.

    > In your code I didn't see an indexer, will this work with the regular
    > IndexWriter but with the new Analyzer that you overloaded

    Not with PayloadAnalyzer. At index time [IndexWriter] you are going to use a different analyzer, one that uses WhitespaceTokenizer + TermPositionPayloadTokenFilter.

    PayloadAnalyzer will be used at query time [QueryParser].

    You need to call setSimilarity(new CustomSimilarity()) on both the indexer and the searcher.
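    On the "which PayloadFunction" question: Lucene 2.9 ships AveragePayloadFunction, MaxPayloadFunction and MinPayloadFunction in org.apache.lucene.search.payloads, and a custom one can be written by subclassing PayloadFunction. The choice decides how several occurrences of the same term in one document are combined. Below is a plain-Java model of the max and average ideas, not the Lucene implementation; also, positions with no stored payload may simply not be counted, which is worth checking in the PayloadTermQuery source:

    ```java
    /**
     * Plain-Java model (not the Lucene classes) of two ways a payload function
     * can fold the payload scores seen for one term in one document into a
     * single factor.
     */
    class PayloadAggregation {

        /** Largest payload score; 1.0 (neutral) when no payload was seen. */
        static float max(float[] payloadScores) {
            if (payloadScores.length == 0) {
                return 1.0f;
            }
            float best = payloadScores[0];
            for (float s : payloadScores) {
                best = Math.max(best, s);
            }
            return best;
        }

        /** Mean payload score; 1.0 (neutral) when no payload was seen. */
        static float average(float[] payloadScores) {
            if (payloadScores.length == 0) {
                return 1.0f;
            }
            float sum = 0f;
            for (float s : payloadScores) {
                sum += s;
            }
            return sum / payloadScores.length;
        }

        public static void main(String[] args) {
            // One occurrence at an even position (2.0) and one at an odd position (1.0)
            float[] scores = {2.0f, 1.0f};
            System.out.println(max(scores) + " " + average(scores)); // 2.0 1.5
        }
    }
    ```

    With max, a single even-position occurrence is enough to double the term; with average, the boost is diluted by odd-position occurrences that carry an explicit 1.0 payload. Which behaviour matches "contributes twice its weight" is a design decision.
    
    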




  • Elias Khsheibun at Dec 19, 2009 at 10:45 pm
    What do you mean by a custom one? Please explain. Must I use a
    PayloadTermQuery?

    Also, in TermPositionPayloadTokenFilter there is a method, incrementToken,
    that is only called from the main method. I don't see where the code checks
    whether the query term is at an even offset of the document. How is this all
    supposed to work together?

    Thank you.
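    Nothing in the filter looks at the query. The filter runs at index time: IndexWriter asks the analyzer for a token stream for each ANALYZED field and keeps calling incrementToken() on the outermost filter, which in turn pulls tokens from the tokenizer; the payload for each position is written to the index then. At search time a payload-aware query (BoostingTermQuery or PayloadTermQuery) reads those stored payloads back for the matched term. A toy model of that pull chain in plain Java (the interface and classes are made up and much simpler than Lucene's attribute-based TokenStream):

    ```java
    import java.util.ArrayList;
    import java.util.List;

    class PullChainDemo {

        /** Minimal stand-in for a token stream: one token at a time. */
        interface SimpleStream {
            boolean incrementToken(); // advance; false when exhausted
            String term();            // current token text
            Float payload();          // current payload, or null
        }

        /** Stand-in for WhitespaceTokenizer. */
        static final class SimpleTokenizer implements SimpleStream {
            private final String[] tokens;
            private int next = 0;
            private String current;

            SimpleTokenizer(String text) {
                String t = text.trim();
                tokens = t.isEmpty() ? new String[0] : t.split("\\s+");
            }

            public boolean incrementToken() {
                if (next >= tokens.length) {
                    return false;
                }
                current = tokens[next++];
                return true;
            }

            public String term() { return current; }
            public Float payload() { return null; }
        }

        /** Stand-in for TermPositionPayloadTokenFilter: payload 2.0 at even positions. */
        static final class EvenPositionFilter implements SimpleStream {
            private final SimpleStream input;
            private int position = -1;
            private Float payload;

            EvenPositionFilter(SimpleStream input) { this.input = input; }

            public boolean incrementToken() {
                if (!input.incrementToken()) {
                    return false; // upstream exhausted
                }
                position++; // every token advances the position by 1 in this toy
                payload = (position % 2 == 0) ? Float.valueOf(2.0f) : input.payload();
                return true;
            }

            public String term() { return input.term(); }
            public Float payload() { return payload; }
        }

        /** Plays the indexer's role: drains the stream and records term=payload pairs. */
        static List<String> index(SimpleStream stream) {
            List<String> postings = new ArrayList<>();
            while (stream.incrementToken()) {
                postings.add(stream.term() + "=" + stream.payload());
            }
            return postings;
        }

        public static void main(String[] args) {
            // [Graph=2.0, Algorithms=null, is=2.0, one=null]
            System.out.println(index(new EvenPositionFilter(new SimpleTokenizer("Graph Algorithms is one"))));
        }
    }
    ```
    
    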

  • Elias Khsheibun at Dec 20, 2009 at 1:51 pm
    I'm trying to run queries now. The problem is that BoostingTermQuery always
    gives double weight to even-positioned terms, regardless of what the query
    contains. Here is the code I'm using:


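    // DocumentAnalyzer.java (index-time analyzer)
    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;

    public class DocumentAnalyzer extends Analyzer {

        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream result = new WhitespaceTokenizer(reader);
            result = new TermPositionPayloadTokenFilter(result);
            return result;
        }
    }


    // TermPositionPayloadTokenFilter.java
    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.payloads.PayloadHelper;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
    import org.apache.lucene.index.Payload;

    public class TermPositionPayloadTokenFilter extends TokenFilter {

        protected PayloadAttribute payAtt;
        protected PositionIncrementAttribute posIncrAtt;

        private static final Payload evenPayload = new Payload(PayloadHelper.encodeFloat(2.0f));

        private int termPosition = 0;

        public TermPositionPayloadTokenFilter(TokenStream input) {
            super(input);
            payAtt = (PayloadAttribute) addAttribute(PayloadAttribute.class);
            posIncrAtt = (PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);
        }

        @Override
        public final boolean incrementToken() throws IOException {
            if (input.incrementToken()) {
                if ((termPosition % 2) == 0) {
                    payAtt.setPayload(evenPayload);
                }
                termPosition += posIncrAtt.getPositionIncrement();
                return true;
            } else {
                return false;
            }
        }
    }


    // BoostingSimilarity.java
    import org.apache.lucene.analysis.payloads.PayloadHelper;
    import org.apache.lucene.search.DefaultSimilarity;

    public class BoostingSimilarity extends DefaultSimilarity {

        // 2.9-era signature used by BoostingTermQuery; newer releases use an overload
        // that also takes docId, start and end
        @Override
        public float scorePayload(String fieldName, byte[] payload, int offset, int length) {
            if (payload != null) {
                return PayloadHelper.decodeFloat(payload, offset);
            }
            return 1.0F; // no payload stored for this position
        }
    }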

    And here is a test I've written. If you look at the scores, you will notice
    that BoostingTermQuery always gives double weight to even-positioned terms,
    whether or not they appear in the query (this is my current problem):

    // PayloadsTest.java
    import java.io.IOException;

    import junit.framework.TestCase;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.payloads.BoostingTermQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class PayloadsTest extends TestCase {
        Directory dir;
        IndexWriter writer;
        DocumentAnalyzer analyzer;

        protected void setUp() throws Exception {
            super.setUp();
            dir = new RAMDirectory();
            analyzer = new DocumentAnalyzer();
            writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
        }

        protected void tearDown() throws Exception {
            writer.close();
            super.tearDown();
        }

        void addDoc(String title, String contents) throws IOException {
            Document doc = new Document();
            doc.add(new Field("title", title, Field.Store.YES, Field.Index.NO));
            doc.add(new Field("contents", contents, Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(doc);
        }

        public void testBoostingTermQuery() throws Throwable {
            addDoc("Hurricane warning",
                    "A hurricane warning was issued at 6 AM for the outer great banks");
            addDoc("Warning label maker",
                    "The warning label maker is a delightful toy for your precocious six year old's warning needs");
            addDoc("Tornado warning",
                    "There is a tornado warning for Worcester county until 6 PM today");
            writer.commit();
            IndexSearcher searcher = new IndexSearcher(dir);
            searcher.setSimilarity(new BoostingSimilarity());
            Term term = new Term("contents", "tornado");

            Query query1 = new TermQuery(term);
            System.out.println("\nTermQuery results:");
            ScoreDoc[] hits = searcher.search(query1, 10).scoreDocs;
            for (int i = 0; i < hits.length; i++) {
                Document hitDoc = searcher.doc(hits[i].doc);
                // Print the score too, since the scores are what is being compared
                System.out.println(hits[i].score + " " + hitDoc.get("title"));
            }

            Query query2 = new BoostingTermQuery(term);
            System.out.println("\nBoostingTermQuery results:");
            ScoreDoc[] hits2 = searcher.search(query2, 10).scoreDocs;
            for (int i = 0; i < hits2.length; i++) {
                Document hitDoc = searcher.doc(hits2[i].doc);
                System.out.println(hits2[i].score + " " + hitDoc.get("title"));
            }
            searcher.close();
        }
    }


  • Uwe Schindler at Dec 20, 2009 at 3:07 pm
    The problem was already solved in the #lucene IRC channel. The behaviour of
    PayloadTermQuery was correct if you compare the scores of a document with an
    even-position match and a document with an odd-position match in the *same* query.

    In general, you cannot compare scores across different queries or different
    indexes.
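    Uwe's point in a toy model: the position boost only means something when you compare documents under one query. The model also shows where the '!' distinction has to live. Since every even position gets a payload at index time, the query side decides which terms use it, for example a PayloadTermQuery for '!' terms and a plain TermQuery for the rest (my reading of the earlier suggestion in this thread). BangScoringModel is made up, and its scoring ignores tf-idf, norms and everything else Lucene's real score includes:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    /**
     * Toy scoring model (not Lucene's formula): every query term found in the
     * document adds 1.0, except that a term written with '!' adds 2.0 when one
     * of its occurrences is at an even position.
     */
    class BangScoringModel {

        /** 0-based positions at which term occurs in whitespace-tokenized text. */
        static List<Integer> positions(String text, String term) {
            List<Integer> result = new ArrayList<>();
            String trimmed = text.trim();
            if (trimmed.isEmpty()) {
                return result;
            }
            String[] tokens = trimmed.split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                if (tokens[i].equals(term)) {
                    result.add(i);
                }
            }
            return result;
        }

        static float score(String query, String text) {
            float total = 0f;
            for (String raw : query.trim().split("\\s+")) {
                boolean bang = raw.length() > 1 && raw.charAt(0) == '!';
                String term = bang ? raw.substring(1) : raw;
                List<Integer> pos = positions(text, term);
                if (pos.isEmpty()) {
                    continue; // term does not occur: contributes nothing
                }
                boolean even = false;
                for (int p : pos) {
                    if (p % 2 == 0) {
                        even = true;
                    }
                }
                // Only '!' terms look at the position; plain terms are scored as usual
                total += (bang && even) ? 2.0f : 1.0f;
            }
            return total;
        }

        public static void main(String[] args) {
            String evenDoc = "tornado warning for Worcester county";
            String oddDoc = "There is a tornado warning for Worcester county";
            // Same query, two documents: the even-position match ranks higher
            System.out.println(score("!tornado", evenDoc) + " vs " + score("!tornado", oddDoc)); // 2.0 vs 1.0
        }
    }
    ```
    
    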

  • Elias Khsheibun at Dec 21, 2009 at 6:52 pm
    Thank you, I managed to do that for terms, but for a phrase like the
    earlier example ("!Graph Algorithms") I still don't know how to do it...
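    The stated rule for phrases is that a phrase sits at the position of its first word, so with the index-time payloads described above it comes down to looking only at the payload of the phrase's first term. A plain-Java sketch of the check (made-up class, exact token matching, no analysis). On the Lucene side, PayloadNearQuery in org.apache.lucene.search.payloads might be a starting point, though I have not checked whether it can be restricted to the first word's payload only:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    /**
     * Plain-Java sketch: find where a phrase starts in whitespace-tokenized
     * text and double its weight when one of those starts is even.
     */
    class PhraseStartPositions {

        private static String[] tokenize(String s) {
            String t = s.trim();
            return t.isEmpty() ? new String[0] : t.split("\\s+");
        }

        /** 0-based positions where the whole phrase matches consecutively. */
        static List<Integer> phraseStarts(String text, String phrase) {
            String[] doc = tokenize(text);
            String[] words = tokenize(phrase);
            List<Integer> starts = new ArrayList<>();
            if (words.length == 0) {
                return starts;
            }
            for (int i = 0; i + words.length <= doc.length; i++) {
                boolean match = true;
                for (int j = 0; j < words.length && match; j++) {
                    match = doc[i + j].equals(words[j]);
                }
                if (match) {
                    starts.add(i);
                }
            }
            return starts;
        }

        /** 2.0 if any occurrence of the phrase starts at an even position, else 1.0. */
        static float phraseFactor(String text, String phrase) {
            for (int start : phraseStarts(text, phrase)) {
                if (start % 2 == 0) {
                    return 2.0f;
                }
            }
            return 1.0f;
        }

        public static void main(String[] args) {
            String doc = "Graph Algorithms is one of the most important topics in computer science";
            // [0] 2.0
            System.out.println(phraseStarts(doc, "Graph Algorithms") + " " + phraseFactor(doc, "Graph Algorithms"));
        }
    }
    ```
    
    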


Discussion Overview
group: java-user
categories: lucene
posted: Dec 19, '09 at 1:07p
active: Dec 21, '09 at 6:52p
posts: 14
users: 3
website: lucene.apache.org
