Hi there,
I am using WhitespaceAnalyzer to index documents because I need to split
tokens on whitespace only. I have also set Tokenized=true.
While indexing, what does it do with special characters like + - && || ! ( )
{ } [ ] ^ " ~ * ? : \ ? Will these characters be indexed, or will they be
stripped? I am confused about this.

I am also having a problem while searching.
For query strings like "jason dartling (e-mail)" and "re: fyi.dat", I don't
have to escape the special characters ( , ) and :, but for input such as
"re:" QueryParser produces an error, so I have escaped the characters there.
So it seems I have two cases to deal with.
Can anyone suggest one generic way to handle both cases?

Basically, my general question is: how do I index and search strings that
contain special characters?


Please help me
miztaken





--
View this message in context: http://www.nabble.com/Issues-with-Special-Characters-tp19511428p19511428.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Erick Erickson at Sep 16, 2008 at 1:46 pm
    You can easily answer the questions about what WhitespaceTokenizer
    produces by getting a copy of Luke and looking at your index. Or writing
    a really simple test program that prints out tokens.

    At the bottom of this page is a list of special characters for escaping:
    http://lucene.apache.org/java/docs/queryparsersyntax.html

    Best
    Erick
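
As a rough illustration of the "simple test program" idea, the sketch below splits text on whitespace only, which is how WhitespaceAnalyzer is documented to tokenize. It is plain Java and does not call the Lucene API; the class and method names (WhitespaceTokenSketch, tokens) are invented for this example. Luke or the real analyzer remains the authority on what actually ends up in an index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a whitespace-only tokenizer; this is NOT the Lucene class.
class WhitespaceTokenSketch {

    // Splits on runs of whitespace and leaves every other character untouched.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample strings taken from the question in this thread.
        System.out.println(tokens("jason dartling (e-mail)"));
        System.out.println(tokens("re: fyi.dat"));
    }
}
```

Under this assumption, punctuation such as ( ) - and : stays inside the tokens, so "(e-mail)" and "re:" would be kept as written rather than stripped.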
  • Miztaken at Sep 16, 2008 at 1:50 pm
    Hi there,
    I will check that out, but what do you suggest for searching?
    Searching without escaping works for the query string "fw: fyi.dat", but I
    have to escape the : character for the query string "fw:", so I have two
    cases to handle.

    Please help me




  • Erick Erickson at Sep 16, 2008 at 2:15 pm
    I question whether your example actually searches what you
    think. What I suggest is that you get a copy of Luke and
    look at what your queries actually produce. That'll give you
    a much better idea of what happens under the covers.

    Also, query.toString() is your friend. Try printing out your
    queries (again, I recommend a small test program) to get
    a better feel for what each analyzer does to your queries.

    Of course you have to escape the ':' character when it does
    not refer to a field. How else could you imagine that the
    query parser can distinguish between a field designation
    and a field value? Consider the query:

    f:stuff g:morestuff

    The parser has to understand that the ':' separates the
    field 'f' from the value 'stuff' and that 'g' is another field. So
    if you want to parse

    f:stuff g:somemore:stuff

    the parser cannot tell whether the ':' between somemore and
    stuff is a field delimiter or part of the value, unless
    it's escaped.

    Best
    Erick
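
One generic way to handle both cases is to escape every special character in the user's text before building the query string, so a ':' inside a value is never read as a field delimiter. The sketch below does this in plain Java; the class name QueryEscapeSketch, its methods, and the character list (copied from the question in this thread) are assumptions for illustration. Lucene's Java QueryParser also has a static escape(String) method; whether a given version or port provides it is worth checking.

```java
// Hypothetical helper, not part of Lucene. The character list is an assumption
// taken from the special characters named in the original question.
class QueryEscapeSketch {

    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Puts a backslash in front of every character found in SPECIAL.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds field:value so that only the first ':' acts as the field delimiter.
    static String termQuery(String field, String rawValue) {
        return field + ":" + escape(rawValue);
    }

    public static void main(String[] args) {
        System.out.println(termQuery("g", "somemore:stuff")); // prints g:somemore\:stuff
        System.out.println(escape("re:"));                    // prints re\:
        System.out.println(escape("(e-mail)"));               // prints \(e\-mail\)
    }
}
```

This sketch does not touch whitespace, so a value containing spaces would still need to be quoted or otherwise handled before it is passed to the query parser.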
  • Miztaken at Sep 16, 2008 at 4:21 pm
    Hi,
    I tested a sample application with Luke as well.
    I am using the .NET version of Lucene (2.0.0.4), and I think that is the
    cause of the errors.

    When I tested my queries with Luke, they worked fine and returned the
    desired output, but when I used the Lucene API available for .NET, it
    produced errors.

    I have a field named key, and I have inserted 4 documents with the
    following values:
    fw:
    fw: fyi.msg
    hatti ghoda
    hatti ghoda (e-mail)

    In Luke I ran the following queries:

    Q1: key:"fw\: fyi.msg" //with escape char //OK
    Q2: key:"fw: fyi.msg" //without escape char //OK
    Q3: key:fw\: //with escape char //OK
    Q4: key:fw: //without escape char //NOT OK
    Q5: key:"hatti ghoda (e-mail)" //without escape char //OK
    Q6: key:"hatti ghoda \(e\-mail\)" //with escape char //OK

    With the .NET API the behaviour is different:
    Q1 does not work
    Q2 works
    Q3 works
    Q4 throws an exception
    Q5 works
    Q6 does not work

    Any suggestions?



  • Erick Erickson at Sep 16, 2008 at 4:28 pm
    Um, ask over on the .NET user group?

    Erick

Discussion Overview
group: java-user
category: lucene
posted: Sep 16, '08 at 1:05p
active: Sep 16, '08 at 4:28p
posts: 6
users: 2
website: lucene.apache.org
