I am indexing some text from a Java object, the string "%772B", with the
StandardAnalyzer and Lucene 2.

Should I be able to search for this with the same text as the query, or
do I need to do any escaping of characters?

Thanks

Adrian

-----------------------------------------
This message (including any attachments) may contain confidential
information intended for a specific individual and purpose. If you
are not the intended recipient, delete this message. If you are
not the intended recipient, disclosing, copying, distributing, or
taking any action based on this message is strictly prohibited.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


  • Erick Erickson at Aug 10, 2006 at 5:34 pm
    See below...
    On 8/10/06, Pillinger, Adrian wrote:

    > I am indexing some text in a java object that is "%772B" with the
    > standard analyser and Lucene 2.
    >
    > Should I be able to search for this with the same text as the query, or
    > do I need to do any escaping of characters?

    Probably not, because I doubt that you'll have the '%' in the index (but I
    admit I don't know for sure). Get Luke (http://www.getopt.org/luke/) and
    check; it will tell you exactly what is in the index. I suspect you'll
    find "772B" but the '%' will simply be absent.
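
On the escaping half of the question: '%' is not among the special characters listed in the Lucene query syntax documentation (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \), so it should not need escaping in a query string. Lucene's QueryParser has a static escape method for strings that do contain those characters. The plain-Java sketch below is only a stand-in to show the idea; the class QueryEscapeSketch is hypothetical, and escaping '&' and '|' individually is a conservative assumption rather than Lucene's exact behaviour.

```java
// Hypothetical helper, not part of Lucene: backslash-escapes query-syntax characters.
class QueryEscapeSketch {
    // Single characters drawn from the operator list in the query syntax docs.
    // '&' and '|' are escaped one at a time here, which is an assumption.
    private static final String SPECIAL = "\\+-!(){}[]^\"~*?:&|";

    static String escape(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\'); // prefix each special character with a backslash
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("%772B"));   // '%' is not special, so nothing changes
        System.out.println(escape("(1+1):2")); // parentheses, '+' and ':' get a backslash
    }
}
```

Escaping only affects how the query string is parsed. It does not change what the analyzer does to the text afterwards, so an unescaped '%' can still be dropped during analysis.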

    Also, watch capitalization. The StandardAnalyzer lowercases your stream, as I
    remember.

    You probably want a different analyzer for *both* indexing and searching if
    you really need to search such strings. Try WhitespaceAnalyzer, and perhaps
    index your values UN_TOKENIZED (but be careful with the latter: it assumes
    you're controlling your tokens yourself and not relying on the analyzer to
    break up your input stream).

    And if you want to treat different fields differently, think about
    PerFieldAnalyzerWrapper.

    Best
    Erick
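
The advice above comes down to one rule: whatever turns text into terms at index time must also be applied to the query text, or the terms will not line up. The following is a minimal plain-Java sketch of that rule, not Lucene code. The class AnalysisMatchSketch and its methods are hypothetical, and standardLike is only a rough approximation of what StandardAnalyzer does to this particular input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Toy term index, not Lucene: shows why both sides must be analyzed the same way.
class AnalysisMatchSketch {
    // Rough StandardAnalyzer stand-in: keep letter/digit runs and lowercase them.
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) tokens.add(part);
        }
        return tokens;
    }

    // Stand-in for an UN_TOKENIZED field: the whole value is a single term.
    static List<String> untokenized(String text) {
        return Collections.singletonList(text);
    }

    // Collects the terms an index would hold for the given values.
    static Set<String> indexTerms(List<String> values,
                                  Function<String, List<String>> analyzer) {
        Set<String> terms = new HashSet<>();
        for (String value : values) {
            terms.addAll(analyzer.apply(value));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("%772B");
        Set<String> analyzed = indexTerms(values, AnalysisMatchSketch::standardLike);
        Set<String> exact = indexTerms(values, AnalysisMatchSketch::untokenized);

        // Same analysis on the query text: both sides reduce to "772b".
        System.out.println(analyzed.containsAll(standardLike("%772B"))); // true
        // Raw, unanalyzed term against the analyzed index: nothing to match.
        System.out.println(analyzed.contains("%772B"));                  // false
        // Untokenized value looked up with exactly the same string.
        System.out.println(exact.contains("%772B"));                     // true
    }
}
```

In real Lucene code the same pairing would mean passing one analyzer to both the index writer and the query parser, or using an exact term query against an untokenized field.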


  • Martin Braun at Aug 11, 2006 at 5:24 am
    Hello Adrian,
    > I am indexing some text in a java object that is "%772B" with the
    > standard analyser and Lucene 2.
    >
    > Should I be able to search for this with the same text as the query, or
    > do I need to do any escaping of characters?

    Besides Luke, there is AnalyzerUtils from the Lucene in Action (LIA) book
    (you can download the source code examples here:
    http://www.lucenebook.com/LuceneInAction.zip).

    You'll just have to customize the test class, and you'll get output
    like this:


    Analyzing "%772B"
    org.apache.lucene.analysis.standard.StandardAnalyzer:
    [772b]


    1: [772b:1->5:<ALPHANUM>]

    1: [772b]


    Analyzing "%772B"
    org.apache.lucene.analysis.KeywordAnalyzer:
    [%772B]


    1: [%772B:0->5:word]

    1: [%772B]

    hth,
    martin


  • Erik Hatcher at Aug 11, 2006 at 1:33 pm

    On Aug 11, 2006, at 1:23 AM, Martin Braun wrote:
    > Hello Adrian,
    >> I am indexing some text in a java object that is "%772B" with the
    >> standard analyser and Lucene 2.
    >>
    >> Should I be able to search for this with the same text as the
    >> query, or do I need to do any escaping of characters?
    > Besides Luke there are the AnalyzerUtils from the LIA book, (you can
    > download the source code examples here:
    > http://www.lucenebook.com/LuceneInAction.zip

    You can also try out analysis just using "ant AnalyzerDemo", like this:

    $ ant AnalyzerDemo
    Buildfile: build.xml

    check-environment:

    compile:

    build-test-index:

    build-perf-index:

    prepare:

    AnalyzerDemo:
    [echo]
    [echo] Demonstrates analysis of sample text.
    [echo]
    [echo] Refer to the "Analysis" chapter for much more on this
    [echo] extremely crucial topic.
    [echo]
    [input] Press return to continue...

    [input] String to analyze: [This string will be analyzed.]
    %772B
    [echo] Running lia.analysis.AnalyzerDemo...
    [java] Analyzing "%772B"
    [java] WhitespaceAnalyzer:
    [java] [%772B]

    [java] SimpleAnalyzer:
    [java] [b]

    [java] StopAnalyzer:
    [java] [b]

    [java] StandardAnalyzer:
    [java] [772b]


    BUILD SUCCESSFUL
    Total time: 7 seconds
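
The differences in the output above follow from how each analyzer chooses token boundaries. The plain-Java sketch below approximates three of them for this one input; it is not Lucene code, the class TokenizerSketch and its method names are hypothetical, and standardLike in particular is far simpler than the real StandardAnalyzer grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java approximations, not Lucene classes: they mimic the token output shown above.
class TokenizerSketch {
    // Splits on a delimiter regex and drops empty pieces.
    private static List<String> split(String text, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split(delimiterRegex)) {
            if (!part.isEmpty()) tokens.add(part);
        }
        return tokens;
    }

    // Like WhitespaceAnalyzer: whitespace is the only delimiter, case is preserved.
    static List<String> whitespaceLike(String text) {
        return split(text, "\\s+");
    }

    // Like SimpleAnalyzer: every non-letter is a delimiter, tokens are lowercased.
    static List<String> simpleLike(String text) {
        return split(text.toLowerCase(Locale.ROOT), "[^\\p{L}]+");
    }

    // Very rough StandardAnalyzer stand-in: letter/digit runs, lowercased.
    static List<String> standardLike(String text) {
        return split(text.toLowerCase(Locale.ROOT), "[^\\p{L}\\p{N}]+");
    }

    public static void main(String[] args) {
        String input = "%772B";
        System.out.println("whitespace-like: " + whitespaceLike(input)); // [%772B]
        System.out.println("simple-like:     " + simpleLike(input));     // [b]
        System.out.println("standard-like:   " + standardLike(input));   // [772b]
    }
}
```

StopAnalyzer gives the same [b] as SimpleAnalyzer here because it tokenizes the same way and "b" is not removed as a stop word.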


  • Adrian Pillinger at Aug 14, 2006 at 7:28 am
    Thanks for the replies on my question.

    In the end I've taken the StandardAnalyzer grammar, modified it, and
    generated a new analyzer with JavaCC. It seems to be working a treat!

    Adrian

Discussion Overview
group: java-user @
category: lucene
posted: Aug 10, '06 at 3:50p
active: Aug 14, '06 at 7:28a
posts: 5
users: 4
website: lucene.apache.org
