about analyzer for searching location
Hi All,

I am implementing an address search with Hibernate Search, which is based on Lucene. The class is defined as follows:

@Indexed
public class Address implements Cloneable
{
    @DocumentId
    private int id;
    @Field
    private String addrCountry;
    private String addrDesc;
    @Field
    private String addrLineOne;
    private String addrLineTwo;
    @Field
    private String addrCity;
    ......

As you can see, addrCountry, addrLineOne, and addrCity are the searchable fields. I am using the default analyzer for both indexing and searching, so I think a country name like "United States" is indexed as two terms, "united" and "states".

Likewise, at search time a keyword such as "United States" or "Salt Lake City" is tokenized into two or three single words.

As a result, any address whose fields contain "united" or "city" is returned, for example "United Kingdom", when what I actually want is "United States".
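The behaviour described above can be illustrated with a minimal sketch in plain Java. It assumes a simplified tokenizer (split on non-alphanumeric characters, then lowercase) and any-term matching; it is only an approximation for illustration, not Lucene's StandardAnalyzer or Hibernate Search itself, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for analyzer behaviour (an assumption, not StandardAnalyzer).
class TokenizeSketch {

    // Splits on non-alphanumeric characters and lowercases each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // True if the field shares at least one token with the query (OR semantics).
    static boolean matchesAnyTerm(String fieldValue, String query) {
        List<String> fieldTokens = tokenize(fieldValue);
        for (String term : tokenize(query)) {
            if (fieldTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("United States"));
        // "United Kingdom" matches because it shares the token "united".
        System.out.println(matchesAnyTerm("United Kingdom", "united states"));
    }
}
```

With any-term matching, the query "united states" hits "United Kingdom" through the shared token "united", which is the unwanted result described in the question.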

My expected results are as follows:

If someone searches for "united", it should return "united states" and "united kingdom".

If someone searches for "united states", it should return "united states" and not "united kingdom".

I would like the analyzer to generate terms with multiple words, so that "united states" is kept as the single term "united states". I think StandardAnalyzer would analyze "united states" into "united" and "states"?

A different example: if the search keyword is "parking lot in Salt Lake City", the generated search terms need to be "parking lot" and "Salt Lake City", not "parking", "lot", "salt", "lake", and "city".

Is there an analyzer that can help me implement this requirement? A dictionary-based solution would be best, so that I can manage the search terms that have multiple words.
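The dictionary-based idea can be sketched in plain Java as a greedy longest-match over a set of known phrases. This is a standalone illustration under stated assumptions, not a Lucene analyzer or token filter: the class name PhraseDictionarySketch, its methods, and the sample phrases are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: merges words into multi-word terms found in a dictionary.
class PhraseDictionarySketch {

    private final Set<String> phrases = new HashSet<>();
    private int maxWords = 1; // length in words of the longest known phrase

    PhraseDictionarySketch(String... knownPhrases) {
        for (String phrase : knownPhrases) {
            List<String> w = words(phrase);
            phrases.add(String.join(" ", w));
            maxWords = Math.max(maxWords, w.size());
        }
    }

    // Lowercases and splits on non-alphanumeric characters.
    private static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // At each position, emits the longest dictionary phrase, else a single word.
    List<String> tokenize(String text) {
        List<String> in = words(text);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            int matched = 1; // fall back to a single word
            for (int len = Math.min(maxWords, in.size() - i); len > 1; len--) {
                if (phrases.contains(String.join(" ", in.subList(i, i + len)))) {
                    matched = len;
                    break;
                }
            }
            out.add(String.join(" ", in.subList(i, i + matched)));
            i += matched;
        }
        return out;
    }

    public static void main(String[] args) {
        PhraseDictionarySketch dict =
                new PhraseDictionarySketch("parking lot", "Salt Lake City", "United States");
        System.out.println(dict.tokenize("parking lot in Salt Lake City"));
    }
}
```

Note that if only the merged term "united states" were indexed, a search for "united" alone would no longer match it, so the single words would probably need to be indexed as well to meet the first expectation above.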

thanks

Ian


  • Samarendra Pratap at Apr 16, 2010 at 1:03 pm
    Hi. I don't think you need a different analyzer. Read about PhraseQuery
    (http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/PhraseQuery.html).
    If you are using the parse() method of QueryParser, enclose the search string
    in an extra pair of double quotes, which must be escaped inside a Java string literal:

    Query q = qp.parse("\"united states\"");
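When the search string comes from user input rather than a literal, the quotes can be added at runtime. The sketch below is plain Java with a hypothetical helper name (asPhrase); it assumes the usual QueryParser convention that a backslash escapes special characters, so it escapes any backslashes and double quotes already present in the input before wrapping it.

```java
// Hypothetical helper: turns raw user input into a quoted phrase for a query parser.
class PhraseQuoting {

    // Escapes backslashes and double quotes, then wraps the text in double quotes.
    static String asPhrase(String userInput) {
        String escaped = userInput.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // The result would then be handed to the parser, e.g. qp.parse(asPhrase(input)).
        System.out.println(asPhrase("united states"));
    }
}
```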


    --
    Regards,
    Samar
  • Ian.huang at Apr 19, 2010 at 10:03 am
    Does a token "united states" exist in the index when using the standard analyzer?
    My understanding is that "united" and "states" are stored separately in the index,
    not as "united states". So if I build a query like
    Query q = qp.parse("\"united states\""); it would not return any results. Am I right?

    Ian

  • Samarendra Pratap at Apr 19, 2010 at 10:32 am
    Well... you are 50% right.

    When you write

    Query q = qp.parse("\"united states\"");

    it does search for the two separate tokens "united" and "states", but it
    also checks that they occur consecutively. So the search above finds
    documents where the token "states" comes directly after "united".

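    Note that because it compares token positions, it may also find documents
    where non-tokenizable characters or stop words sit between "united" and
    "states", e.g. "united and states" (here "and" is a stop word). Whether a
    removed stop word still blocks the match depends on whether the analyzer
    preserves position increments for it.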
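The position check can be sketched with a tiny positional index in plain Java. This is an illustration of the idea only, not Lucene's PhraseQuery implementation: the class name, the stop-word list, and the keepGaps flag (standing in for whether removed stop words still consume a position) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical positional index used to illustrate exact phrase matching.
class PhrasePositionSketch {

    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("and", "the", "of"));

    // Maps each term to the positions where it occurs. Stop words are dropped;
    // if keepGaps is true, a dropped stop word still consumes a position.
    static Map<String, List<Integer>> index(String text, boolean keepGaps) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (STOP_WORDS.contains(word)) {
                if (keepGaps) {
                    position++;
                }
                continue;
            }
            postings.computeIfAbsent(word, k -> new ArrayList<>()).add(position);
            position++;
        }
        return postings;
    }

    // Exact phrase match: the terms must occur at consecutive positions.
    static boolean matchesPhrase(Map<String, List<Integer>> postings, String... terms) {
        List<Integer> starts = postings.getOrDefault(terms[0], new ArrayList<>());
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < terms.length; i++) {
                List<Integer> positions = postings.get(terms[i]);
                if (positions == null || !positions.contains(start + i)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrase(index("United States", true), "united", "states"));
        System.out.println(matchesPhrase(index("United Kingdom", true), "united", "states"));
        // Stop word removed without a gap: the phrase matches across it.
        System.out.println(matchesPhrase(index("united and states", false), "united", "states"));
        // Stop word removed but its position kept: the phrase no longer matches.
        System.out.println(matchesPhrase(index("united and states", true), "united", "states"));
    }
}
```

In this sketch a single-term lookup such as "united" still finds both "United States" and "United Kingdom", while the phrase check keeps only the former.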

    A TermQuery would work the way you described in your reply, i.e. it would
    search for the single token "united states", which is not what you want.


    --
    Regards,
    Samar

Discussion Overview
group: java-user
categories: lucene
posted: Apr 15, '10 at 10:30a
active: Apr 19, '10 at 10:32a
posts: 4
users: 2
website: lucene.apache.org
