Hi,

I am having trouble searching tokenized fields.
Initially each advertisement had 10 associated terms; each term was stored in its own field in my index, and the query ORed the 10 fields together.
Now the advertisements have more than 2,000 terms, and the current solution (creating 2,000 fields) does not work.
I am thinking of creating a single field that contains all the terms, separated by ";" for example. How can I search a field that contains several tokenized terms, or is there another solution to this problem?

Example:
advertise_id = "00001"
terms[2000]:
1- "home work"
2- "house"
3- "yellow green ball sell"
4- "star sports"
5- "tennis ball new"
...
2000- "xyz"
My single field contains: "home work; house; yellow green ball sell; star sports; tennis ball new; ... ; xyz;"
If my query is:
query= "house" -> result = 1
query= "yellow ball" -> result = 1
query= "yellow sell" -> result = 1
query= "ball star" -> result = 0 (no result)
query= "home xyz" -> result = 0 (no result)

Haroldo



  • Haroldo Nascimento at Jan 17, 2009 at 7:48 pm
    Hi,

    I sent my first e-mail to java-user@lucene.apache.org and received the error e-mail below.

    Why was my message not sent to the list?

    Thanks

    Haroldo

    > To: haroldo_lucene@hotmail.com
    > Subject: RE: Search on tokenized field
    > From: support@magentanews.com
    > Date: Sat, 17 Jan 2009 14:35:40 -0500
    >
    > Dear sender,
    >
    > Delivery of your message has failed. This is an automatic reply.
    >
    > The domain magentanews.com has been changed and is no longer in use. Please resend your email to the same user in the format "name@meltwater.com" and be sure to update your address book.
    >
    > If this is a support request or you require further assistance please resend your mail to support@meltwaternews.com
    >
    > Best regards,
    > Meltwater Group
  • Michael McCandless at Jan 19, 2009 at 10:45 am
    Your message did go through to java-user.

    It's just that one subscriber to java-user failed to receive your
    message, and the bounce from that subscriber was then forwarded back
    to you. It's rather alarming the first time you see it....

    Mike


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Erick Erickson at Jan 18, 2009 at 12:20 am
    Probably the easiest way to do this would be to index all
    the terms in the same field with a large increment gap between.
    See Analyzer.getPositionIncrementGap (you'll have to create
    your own analyzer here, probably just subclassing one of the
    existing ones).

    Once things are indexed that way, then you can do, say, SpanQueries
    or even proximity queries (i.e. "yellow sell"~5).

    This sounds a bit like gibberish, but bear with me. Let's say you have
    overridden an analyzer and return an increment gap of 100. Now say you
    index as follows (pseudo code).

    Document doc = new Document();
    doc.add(new Field("field", "house", ...));
    doc.add(new Field("field", "yellow ball", ...));
    doc.add(new Field("field", "yellow sell", ...));
    doc.add(new Field("field", "ball star", ...));
    doc.add(new Field("field", "home xyz", ...));
    indexWriter.addDocument(doc);  // indexWriter is your IndexWriter instance


    Now, here are (roughly) your term positions:
    house - 1
    yellow - 102
    ball - 103
    yellow - 204
    sell - 205
    ball - 306
    star - 307
    home - 408
    xyz - 409

    The bump comes because each time you call doc.add for a field name that
    has already been added to that document, a call is made to
    getPositionIncrementGap and the return value is added to the position of
    the first token of the new value.
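
The position bump described above can be modelled in a few lines of plain Java. This is only a simplified sketch, not Lucene's code: the class name GapPositions and the method assign are made up for illustration, tokens are split on whitespace, and counting starts at 0, so the absolute numbers come out shifted by one from the rough list above. The point is the jump of 100 between consecutive values of the same field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of position assignment for a multi-valued field (hypothetical, not Lucene code). */
class GapPositions {

    /** Returns "token@position" entries for the given values, in indexing order. */
    static List<String> assign(List<String> values, int gap) {
        List<String> out = new ArrayList<>();
        int position = 0;
        boolean first = true;
        for (String value : values) {
            if (!first) {
                position += gap; // the jump added before every value after the first
            }
            first = false;
            for (String token : value.trim().split("\\s+")) {
                out.add(token + "@" + position);
                position++; // tokens inside one value are adjacent
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "house", "yellow ball", "yellow sell", "ball star", "home xyz");
        // prints [house@0, yellow@101, ball@102, yellow@203, sell@204,
        //         ball@305, star@306, home@407, xyz@408] (on one line)
        System.out.println(assign(values, 100));
    }
}
```

Tokens from the same value stay one position apart, while tokens from different values are always more than 100 positions apart.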

    Now if you choose a large enough increment gap and make your proximity
    searches require that all the terms are within *less* than that gap of
    each other, you should be fine.
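
To see why a distance limit smaller than the gap keeps matches inside a single term, here is a small self-contained sketch using the terms from the original question. It is an illustration under assumptions, not Lucene's implementation: GapProximity, index and near are hypothetical names, and maxDistance is a plain difference of positions, which is simpler than how Lucene computes slop for a phrase or span query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy proximity check over gap-separated positions (hypothetical, not Lucene code). */
class GapProximity {

    /** Maps each token to the positions where it occurs, adding 'gap' between values. */
    static Map<String, List<Integer>> index(List<String> values, int gap) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int position = 0;
        boolean first = true;
        for (String value : values) {
            if (!first) {
                position += gap;
            }
            first = false;
            for (String token : value.trim().split("\\s+")) {
                positions.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
                position++;
            }
        }
        return positions;
    }

    /** True if some occurrence of a lies within maxDistance positions of some occurrence of b. */
    static boolean near(Map<String, List<Integer>> positions, String a, String b, int maxDistance) {
        for (int pa : positions.getOrDefault(a, Collections.emptyList())) {
            for (int pb : positions.getOrDefault(b, Collections.emptyList())) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> p = index(Arrays.asList(
                "home work", "house", "yellow green ball sell", "star sports", "tennis ball new"), 100);
        System.out.println(near(p, "yellow", "sell", 5)); // true: both in the same term
        System.out.println(near(p, "ball", "star", 5));   // false: neighbouring terms, over 100 apart
        System.out.println(near(p, "home", "xyz", 5));    // false: "xyz" is not indexed here
    }
}
```

The limit of 5 works here because the longest term has four words; in general it has to be at least as large as the longest term minus one and smaller than the gap.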

    Best
    Erick

    P.S. Both messages came through, so I have no idea why you got that error
    message; you might check your local server.

Discussion Overview
group: java-user
category: lucene
posted: Jan 17, '09 at 7:35p
active: Jan 19, '09 at 10:45a
posts: 4
users: 3
website: lucene.apache.org
