Hello,

I have the following problem with my lucene index.

When indexing fields containing special characters (like &), a blank space
is inserted before the special character. For example: the content
"L'article" is indexed as "L '" (with a blank space between 'L' and
'&').

Is there any way to avoid that?

The characteristics of my field are the following: Indexed, Tokenized,
Stored and Term Vector.

Thanks in advance for your help,

Leire

  • Leire Urcelay at Nov 5, 2007 at 12:11 pm
    Sorry, I made a mistake in my previous email.
    The field "L'article" is indexed as "L 'article". The blank space is
    inserted between 'L' and ''article'.

    Thanks,

    Leire

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Erick Erickson at Nov 5, 2007 at 6:35 pm
    There are several issues here:
    1> How are you getting the entity reference? You must be encoding
    the stream (or getting it encoded for you), so the first thing I'd do
    is un-encode it.
    2> After that, it's a question of which Filters/Analyzers you're using.
    Take a look at ISOLatin1AccentFilter. I'm not sure whether it "closes up" the
    case you're looking at, so be sure to check.
    3> Since my own situation can't use the Filter (the character
    set I'm using isn't standard), I've pre-processed the input (both at
    index and query time) to substitute the empty string for the
    apostrophe.

    Hope this helps
    Erick
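
The following is a minimal sketch of points 1 and 3 in Erick's reply, not code from the thread. It assumes the apostrophe arrives as an HTML/XML entity reference such as `&#39;` or `&apos;` (the thread does not say which one), decodes it, and then replaces the apostrophe with the empty string. The class and method names (`ApostrophePreprocessor`, `decodeEntities`, `normalize`) are hypothetical, and only a handful of named entities are handled. It uses plain Java with no Lucene dependency, so it would run before the text is handed to whatever analyzer is in use.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the names here are illustrative, not part of Lucene.
class ApostrophePreprocessor {

    // Numeric character references: decimal (&#39;) or hexadecimal (&#x27;).
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Point 1: un-encode entity references before the text reaches the analyzer.
    static String decodeEntities(String text) {
        Matcher m = NUMERIC_ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group(); // leave invalid references untouched
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        // Small, assumed set of named entities; "&amp;" goes last so that
        // text like "&amp;lt;" is not decoded twice.
        return out.toString()
                .replace("&apos;", "'")
                .replace("&quot;", "\"")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
    }

    // Point 3: substitute the empty string for the apostrophe.
    // Call this on both the indexed text and the query text.
    static String normalize(String text) {
        return decodeEntities(text).replace("'", "");
    }

    public static void main(String[] args) {
        System.out.println(decodeEntities("L&#39;article")); // L'article
        System.out.println(normalize("L&#39;article"));      // Larticle
        System.out.println(normalize("L'article"));          // Larticle
    }
}
```

Because `normalize` is applied to both the indexed content and the query string, a search for "L'article" and the stored "L&#39;article" end up as the same text before analysis. Whether dropping the apostrophe is acceptable depends on the application; decoding alone (via `decodeEntities`) may be enough if the analyzer already handles a plain apostrophe the way you want.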

Discussion Overview
group: java-user
categories: lucene
posted: Nov 5, '07 at 12:03p
active: Nov 5, '07 at 6:35p
posts: 3
users: 2
website: lucene.apache.org
