Hi,

I would like to provide an exact "PrefixField Search", i.e. a search for
exactly the first words in a field.
I think I can't use a PrefixQuery because it would also match words
later in the field, e.g.
action* would find titles like "Action and knowledge" but also (and that's
what I don't want it to find)
"Lucene in Action"

As a regex it would be something like /^Action and.*/

Now the question for me is how to implement this functionality. I see
these options:

1) Some kind of TermEnum over all docs (or the PrefixQuery results?) and
string comparison
2) Using the regex contribution
3) A super-fast Lucene function I have overlooked :)

With 2) I am worried about performance. Does anybody have experience with
regex queries?

The same goes for 1): has anybody already implemented this and could
give some code samples / hints?

tia,

martin





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Erik Hatcher at Nov 14, 2006 at 1:38 pm
    Martin,

    SpanFirstQuery is what you're after.

    Erik

  • Martin Braun at Nov 14, 2006 at 4:19 pm
    Hi Erik,

    > SpanFirstQuery is what you're after.

    thanks for this hint (@Erick: thanks for the good explanation of my problem).

    I read the chapter on SpanFirstQuery in LIA, but what I don't
    understand is how to build a "phrase" SpanFirstQuery.
    I found a message with example code (
    http://www.nabble.com/Speedup-indexing-process-tf1140025.html#a3034612 ):

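    Here's my JRuby snippet:

    require 'java'  # assumes the Lucene jar is on the classpath

    # `search` (user input) and `searcher` are assumed to be defined elsewhere
    SpanFirstQuery = org.apache.lucene.search.spans.SpanFirstQuery
    SpanTermQuery = org.apache.lucene.search.spans.SpanTermQuery
    Term = org.apache.lucene.index.Term

    # Treat the whole input as ONE term that must end within the first 2 positions of "TI"
    sp = SpanFirstQuery.new(SpanTermQuery.new(Term.new("TI", search)), 2)
    hits = searcher.search(sp)
    for i in 0...hits.length
      puts hits.doc(i).getField("kurz")
    end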

    I get no results for "action and" (there are some docs whose title
    begins with "action and"), but I do get (correct) results for "action".
    What am I doing wrong here?

    tia,
    martin

    --
    Universitaetsbibliothek Heidelberg Tel: +49 6221 54-2580
    Ploeck 107-109, D-69117 Heidelberg Fax: +49 6221 54-2623

  • Erik Hatcher at Nov 14, 2006 at 4:46 pm

    On Nov 14, 2006, at 11:18 AM, Martin Braun wrote:

    > I get no results for "action and" (there are some docs whose title
    > begins with "action and"), but I do get (correct) results for
    > "action".
    > What am I doing wrong here?

    "action and" is likely not a single Term, so you'll want to create a
    SpanNearQuery of the individual terms (in the form they had after being
    analyzed and indexed, mind you) and use that SpanNearQuery inside a
    SpanFirstQuery. Make sense?

    JRuby! Yeehaw!

    Erik
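Erik's point about "action and" not being a single Term can be illustrated without Lucene at all. The following is a minimal plain-Ruby sketch, not Lucene code: `tokenize` is a hypothetical stand-in for whatever analyzer was used at index time, and `span_first_match?` only approximates what a SpanFirstQuery wrapped around an in-order SpanNearQuery with slop 0 asks for (the terms appear consecutively and the match ends within the first `end_pos` token positions).

```ruby
# Plain-Ruby illustration only; no Lucene classes are used here.

# Hypothetical stand-in for an analyzer: lowercase and split on whitespace.
def tokenize(text)
  text.downcase.split
end

# True if `terms` occur consecutively and in order in `field_tokens`,
# and the match ends at or before token position `end_pos`.
def span_first_match?(field_tokens, terms, end_pos)
  return false if terms.empty?
  last_start = [field_tokens.length, end_pos].min - terms.length
  (0..last_start).any? { |start| field_tokens[start, terms.length] == terms }
end

titles = ["Action and knowledge", "Lucene in Action", "Knowledge and action"]

# The whole input as one term: no single token equals "action and".
single_term = titles.select do |t|
  span_first_match?(tokenize(t), ["action and"], 2)
end

# One term per word: only the title that starts with the phrase matches.
per_token = titles.select do |t|
  span_first_match?(tokenize(t), tokenize("action and"), 2)
end

puts single_term.inspect
puts per_token.inspect
```

In this toy model the single-term variant finds nothing, which mirrors the empty result Martin saw, while the per-word variant finds only "Action and knowledge".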



  • Martin Braun at Nov 16, 2006 at 3:14 pm
    Hi Erik,

    > "action and" is likely not a single Term, so you'll want to create a
    > SpanNearQuery of those individual terms (that match the way they were
    > when analyzed and indexed, mind you) and use a SpanNearQuery inside a
    > SpanFirstQuery. Make sense?

    Yes, it works (see below)!

    ... but in my Java app I have the problem that I need to combine this
    SpanFirstQuery with a query from the QueryParser,
    i.e. from the web form I get the input for the SpanFirstQuery (which I am
    just splitting, as in the JRuby sample) and a second input field with a
    query that I normally parse with the QueryParser.

    Is there a way to merge these two query classes?

    tia,
    martin


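    require 'java'  # assumes the Lucene jar is on the classpath

    SpanFirstQuery = org.apache.lucene.search.spans.SpanFirstQuery
    SpanTermQuery = org.apache.lucene.search.spans.SpanTermQuery
    SpanQuery = org.apache.lucene.search.spans.SpanQuery
    SpanNearQuery = org.apache.lucene.search.spans.SpanNearQuery
    Term = org.apache.lucene.index.Term

    # `search` (user input) and `searcher` are assumed to be defined elsewhere.
    # Split on runs of whitespace; the terms must match the indexed (analyzed) form.
    qs = search.split

    # One SpanTermQuery per word, collected in a Java SpanQuery[] array
    spanq_ar = SpanQuery[].new(qs.length)
    for i in 0...qs.length
      spanq_ar[i] = SpanTermQuery.new(Term.new("TI", qs[i]))
    end

    # In-order near match (slop 1) that must end within the first qs.length positions
    sp = SpanFirstQuery.new(SpanNearQuery.new(spanq_ar, 1, true), spanq_ar.length)
    hits = searcher.search(sp)
    for i in 0...hits.length
      puts hits.doc(i).getField("kurz")
    end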



  • Erick Erickson at Nov 14, 2006 at 1:51 pm
    What Erik said <G>...

    But I thought I'd add that I was pleasantly surprised by how fast the
    regex contribution was when creating a filter. And you can cache the
    filters. Don't be afraid <G>.

    But in this case I don't think that would help either. Your basic problem is
    probably that you're indexing discrete tokens because the analyzer breaks
    the text up, i.e. "Lucene in Action" gets indexed as three tokens, and then
    you have a hard time searching them other than as individual tokens. Regex
    wouldn't help you here, since there's really no concept of a "line" after
    things are tokenized (except that you can do interesting things with the
    positions of the tokens in the field, which is what I believe SpanFirstQuery
    is doing for you).

    So don't go the regex route; use Erik's suggestion instead. And if that
    doesn't do exactly what you want, consider indexing a separate field that
    contains only, say, the first word, and doing your "prefix field search" on
    that field. You wouldn't even have to store that field.

    Best
    Erick@ICanTypeMoreThanErikAndBeLessHelp.....
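Erick's fallback idea (an extra field holding only the start of the title) can be sketched in plain Ruby as well. This is an illustration under assumptions, not Lucene code: the hash stands in for an index, and `leading_words` and `prefix_search` are hypothetical helper names. In a real index the extra field would presumably be indexed as a single untokenized value so that a prefix query sees it as one term.

```ruby
# Plain-Ruby illustration only; the hash below stands in for an index.

# Hypothetical helper: the value that would go into the extra field,
# i.e. the first `count` words of the title, lowercased.
def leading_words(title, count = 1)
  title.downcase.split.first(count).join(" ")
end

# A prefix search on the extra field sees only the start of each title.
def prefix_search(field_by_doc, prefix)
  field_by_doc.select { |_doc, value| value.start_with?(prefix.downcase) }.keys
end

titles = ["Action and knowledge", "Lucene in Action", "Actions speak louder"]

# Map each title to its extra field value (here: the first word only).
first_word_field = titles.map { |t| [t, leading_words(t)] }.to_h

puts prefix_search(first_word_field, "action").inspect
```

With only the first word stored, "Lucene in Action" can no longer match a search for action*, because "action" never reaches the extra field for that title. Storing more leading words (a larger `count`) would allow multi-word prefixes such as "action and".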
