FAQ
Hi All,

I use solr 3.x and put excel documents into an index.
I have my own query parser and use SpanQueries to provide a proximity search feature. It works really good.
Most often than not its better to limit the proxmity to one or two line's, not to X words.
I try to find a NewLine indicator... unsuccessfully.
How can use the number of lines as proximity?

Thx!
___________________________________________________________
WEB.DE DSL Doppel-Flat ab 19,99 €/mtl.! Jetzt mit
gratis Handy-Flat! http://produkte.web.de/go/DSL_Doppel_Flatrate/2

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Pierre GOSSE at Feb 8, 2011 at 8:38 am
    Hi Livia,

    One way of doing this line slope would be to implement a custom tokenizer that could tokenize on new line, and split each token into the words it contains. I.e. Each word of a line would be seen as being at the same position (and having same offset and length as the complete line).

    I don't think usual queries would respond well with that field, so maybe you will have to have two fields, one standard for searching and scoring and one custom for filtering your results.

    But maybe that's overkill, and there's a simpler manner to achieve this line slope, I'm quite new to solr. :)

    Pierre

    -----Message d'origine-----
    De : Livia Hauser
    Envoyé : lundi 7 février 2011 20:59
    À : java-user@lucene.apache.org
    Objet : How to implement a proximity search using LINES as slop

    Hi All,

    I use solr 3.x and put excel documents into an index.
    I have my own query parser and use SpanQueries to provide a proximity search feature. It works really good.
    Most often than not its better to limit the proxmity to one or two line's, not to X words.
    I try to find a NewLine indicator... unsuccessfully.
    How can use the number of lines as proximity?

    Thx!
    ___________________________________________________________
    WEB.DE DSL Doppel-Flat ab 19,99 €/mtl.! Jetzt mit
    gratis Handy-Flat! http://produkte.web.de/go/DSL_Doppel_Flatrate/2

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Livia Hauser at Feb 8, 2011 at 9:51 pm
    Hi Pierre,

    many thanks for your idear.
    I had a look to Payloads ... should it possible to store the line number as payload?

    Best Regards,
    Livia


    -----Ursprüngliche Nachricht-----
    Von: "Pierre GOSSE" <pierre.gosse@arisem.com>
    Gesendet: 08.02.2011 09:37:53
    An: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Betreff: RE: How to implement a proximity search using LINES as slop
    Hi Livia,

    One way of doing this line slope would be to implement a custom tokenizer that could tokenize on new line, and split each token into the words it contains. I.e. Each word of a line would be seen as being at the same position (and having same offset and length as the complete line).

    I don't think usual queries would respond well with that field, so maybe you will have to have two fields, one standard for searching and scoring and one custom for filtering your results.

    But maybe that's overkill, and there's a simpler manner to achieve this line slope, I'm quite new to solr. :)

    Pierre

    -----Message d'origine-----
    De : Livia Hauser
    Envoyé : lundi 7 février 2011 20:59
    À : java-user@lucene.apache.org
    Objet : How to implement a proximity search using LINES as slop

    Hi All,

    I use solr 3.x and put excel documents into an index.
    I have my own query parser and use SpanQueries to provide a proximity search feature. It works really good.
    Most often than not its better to limit the proxmity to one or two line's, not to X words.
    I try to find a NewLine indicator... unsuccessfully.
    How can use the number of lines as proximity?

    Thx!
    ___________________________________________________________
    WEB.DE DSL Doppel-Flat ab 19,99 €/mtl.! Jetzt mit
    gratis Handy-Flat! http://produkte.web.de/go/DSL_Doppel_Flatrate/2

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ___________________________________________________________
    Neu: WEB.DE De-Mail - Einfach wie E-Mail, sicher wie ein Brief!
    Jetzt De-Mail-Adresse reservieren: https://produkte.web.de/go/demail02

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Pierre GOSSE at Feb 9, 2011 at 8:16 am
    I've just read about payload, and I'm not sure it would be easy to use that feature in calculating distances for SpanQueries. I guess You would have to build your own SpanQuery, using payload instead of term position. But it would be much lighter for index size if it were possible, indeed.

    I hope some lucene expert can give his insight about all this. :)

    Pierre

    -----Message d'origine-----
    De : Livia Hauser
    Envoyé : mardi 8 février 2011 22:51
    À : java-user@lucene.apache.org
    Objet : RE: How to implement a proximity search using LINES as slop

    Hi Pierre,

    many thanks for your idear.
    I had a look to Payloads ... should it possible to store the line number as payload?

    Best Regards,
    Livia


    -----Ursprüngliche Nachricht-----
    Von: "Pierre GOSSE" <pierre.gosse@arisem.com>
    Gesendet: 08.02.2011 09:37:53
    An: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Betreff: RE: How to implement a proximity search using LINES as slop
    Hi Livia,

    One way of doing this line slope would be to implement a custom tokenizer that could tokenize on new line, and split each token into the words it contains. I.e. Each word of a line would be seen as being at the same position (and having same offset and length as the complete line).

    I don't think usual queries would respond well with that field, so maybe you will have to have two fields, one standard for searching and scoring and one custom for filtering your results.

    But maybe that's overkill, and there's a simpler manner to achieve this line slope, I'm quite new to solr. :)

    Pierre

    -----Message d'origine-----
    De : Livia Hauser
    Envoyé : lundi 7 février 2011 20:59
    À : java-user@lucene.apache.org
    Objet : How to implement a proximity search using LINES as slop

    Hi All,

    I use solr 3.x and put excel documents into an index.
    I have my own query parser and use SpanQueries to provide a proximity search feature. It works really good.
    Most often than not its better to limit the proxmity to one or two line's, not to X words.
    I try to find a NewLine indicator... unsuccessfully.
    How can use the number of lines as proximity?

    Thx!
    ___________________________________________________________
    WEB.DE DSL Doppel-Flat ab 19,99 €/mtl.! Jetzt mit
    gratis Handy-Flat! http://produkte.web.de/go/DSL_Doppel_Flatrate/2

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ___________________________________________________________
    Neu: WEB.DE De-Mail - Einfach wie E-Mail, sicher wie ein Brief!
    Jetzt De-Mail-Adresse reservieren: https://produkte.web.de/go/demail02

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Doron Cohen at Feb 10, 2011 at 1:54 pm
    IIUC what you are trying to achieve I think the following could help,
    without setting all words in a line to be in the same position:
    At indexing, set a position increment of N (e.g. 100) at line start tokens.
    This would set a position gap of N between last token of line x to first
    token of line x+1.
    At search, if using span near queries (or sloppy phrase queries), set the
    acceptable span as N, to only accept matches from the same line.
    If this makes sense, take a look at Analyzer.getPositionIncrementGap().
    If you also want scores in these cases to treat distances based on line
    numbers, in your custom Similarity change sloppyFreq(distance) to be aware
    of N, e.g. use (1+distance/N) instead of distance for the computation.
    HTH,
    Doron
    On Wed, Feb 9, 2011 at 10:15 AM, Pierre GOSSE wrote:

    I've just read about payload, and I'm not sure it would be easy to use that
    feature in calculating distances for SpanQueries. I guess You would have to
    build your own SpanQuery, using payload instead of term position. But it
    would be much lighter for index size if it were possible, indeed.

    I hope some lucene expert can give his insight about all this. :)

    Pierre

    -----Message d'origine-----
    De : Livia Hauser
    Envoyé : mardi 8 février 2011 22:51
    À : java-user@lucene.apache.org
    Objet : RE: How to implement a proximity search using LINES as slop

    Hi Pierre,

    many thanks for your idear.
    I had a look to Payloads ... should it possible to store the line number as
    payload?

    Best Regards,
    Livia


    -----Ursprüngliche Nachricht-----
    Von: "Pierre GOSSE" <pierre.gosse@arisem.com>
    Gesendet: 08.02.2011 09:37:53
    An: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
    Betreff: RE: How to implement a proximity search using LINES as slop
    Hi Livia,

    One way of doing this line slope would be to implement a custom tokenizer
    that could tokenize on new line, and split each token into the words it
    contains. I.e. Each word of a line would be seen as being at the same
    position (and having same offset and length as the complete line).
    I don't think usual queries would respond well with that field, so maybe
    you will have to have two fields, one standard for searching and scoring and
    one custom for filtering your results.
    But maybe that's overkill, and there's a simpler manner to achieve this
    line slope, I'm quite new to solr. :)
    Pierre

    -----Message d'origine-----
    De : Livia Hauser
    Envoyé : lundi 7 février 2011 20:59
    À : java-user@lucene.apache.org
    Objet : How to implement a proximity search using LINES as slop

    Hi All,

    I use solr 3.x and put excel documents into an index.
    I have my own query parser and use SpanQueries to provide a proximity
    search feature. It works really good.
    Most often than not its better to limit the proxmity to one or two line's,
    not to X words.
    I try to find a NewLine indicator... unsuccessfully.
    How can use the number of lines as proximity?

    Thx!
    ___________________________________________________________
    WEB.DE DSL Doppel-Flat ab 19,99 €/mtl.! Jetzt mit
    gratis Handy-Flat! http://produkte.web.de/go/DSL_Doppel_Flatrate/2

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ___________________________________________________________
    Neu: WEB.DE De-Mail - Einfach wie E-Mail, sicher wie ein Brief!
    Jetzt De-Mail-Adresse reservieren: https://produkte.web.de/go/demail02

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedFeb 7, '11 at 7:59p
activeFeb 10, '11 at 1:54p
posts5
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase