Hello,

I tried org.apache.lucene.analysis.fr.FrenchAnalyzer and got strange
search results for strings indexed in uppercase (for example, VEHICLE).
When I search for the string in lowercase, I get no results. I do get results if
I use "vehicle*", "vehiclE", "vehicLe", etc.

What is odd is that it affects only some of the strings, not all of them.
Has anyone else experienced this problem?

Thanks,
Florian
--
View this message in context: http://www.nabble.com/Very-odd-behaviour-of-FrenchAnalyzer-with-strings-in-capital-letters-tf3789153.html#a10715673
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Erick Erickson at May 21, 2007 at 1:27 pm
    First, have you gotten a copy of Luke to examine your index and see
    what's actually indexed?

    The default behavior is usually to lowercase everything, but I'm not
    entirely sure if the French analyzer does this. But I suspect so.

    Searches are case sensitive. To get caseless searching, you need
    to put everything in the same case. This is usually done for you with
    any of the standard analyzers, but check specifically.

    Are you using the same analyzer at index AND search time?

    Best
    Erick
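Erick's point about putting everything in the same case on both sides can be illustrated with a minimal sketch in plain Java, with no Lucene dependency. The class `TinyIndex` and its methods are hypothetical names invented for this illustration, and it assumes an analyzer is nothing more than a function from a raw token to a term.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy index: maps a normalized term to the ids of documents containing it.
public class TinyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // The single normalization step shared by indexing and searching.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Index every whitespace-separated token of the text under its normalized form.
    public void add(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(normalize(token), k -> new HashSet<>()).add(docId);
        }
    }

    // Normalize the query term the same way before looking it up.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(normalize(term), new HashSet<>());
    }

    // Lookup without normalization, to show what a mismatch looks like.
    public Set<Integer> searchRaw(String term) {
        return postings.getOrDefault(term, new HashSet<>());
    }
}
```

Because `add` and `search` share `normalize`, a document containing VEHICLE is found by "vehicle" or "VeHicle". `searchRaw("VEHICLE")` finds nothing, since only the lowercased form was stored.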
  • Jolinar13 at May 21, 2007 at 2:00 pm
    Hello,
    Thank you for your quick answer.
    I use Luke to examine the index, but since I switched to FrenchAnalyzer, it
    says 'Not a Lucene index'.
    If I open the index files in a text viewer, the strings are in UPPER case.
    I do use the same analyzer to index and search.
    So, do I have to configure FrenchAnalyzer to be case-insensitive? How do I
    do that?
    Thanks a lot
    Florian


  • Jolinar13 at May 28, 2007 at 12:59 pm
    Hello Erick,
    Still no idea about my problem?
    Anybody here using the FrenchAnalyzer?
    Thanks,
    Florian


  • Mark Miller at May 28, 2007 at 1:10 pm
    FrenchAnalyzer does lowercase, and using it would not in any way alter
    Luke's ability to read your index.

    - Mark

  • Jolinar13 at May 28, 2007 at 1:51 pm
    Hello Mark!
    Thanks a lot for your answer.
    You are right about Luke: my version was too old. My bad.
    But with Luke I still observe the problem I described.
    Any idea how to sort this out?
    Maybe this has to do with the fact that I use Compass?
    Thank you
    Florian
  • Jolinar13 at May 28, 2007 at 2:01 pm
    Thanks to Luke, I realized my terms were not parsed correctly, and this has
    nothing to do with upper case!
    It seems to happen when the word ends with "ni". For example, "giovanni" is
    parsed as "giovann".
    Any idea what causes this?
    Florian


  • Jolinar13 at May 28, 2007 at 2:15 pm
    Some terms I tested:
    vehicle => all:vehicl
    vehiCle => all:vehicle
    Vehicle => all:vehicl
    VeHicle => all:vehicle
    VEHICLE => all:vehicle
    vehicles => all:vehicl
    paris => all:par
    :S
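The pattern in this list is consistent with a chain that stems before it lowercases, where the stemmer leaves alone any token that has an uppercase letter after its first character. That ordering is an assumption inferred from the outputs above, not something confirmed in the thread, and `toyStem` is a made-up two-rule suffix stripper rather than the real French stemming rules. A minimal Java sketch under those assumptions:

```java
import java.util.Locale;

public class StemThenLowercase {
    // Assumed rule: only all-letter tokens with no uppercase letter after the
    // first character are passed to the stemmer.
    static boolean isStemmable(String term) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (!Character.isLetter(c)) return false;
            if (i > 0 && Character.isUpperCase(c)) return false;
        }
        return true;
    }

    // Hypothetical stand-in for a stemmer: lowercase, drop a final "s", then a final "e".
    static String toyStem(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        if (t.endsWith("s")) t = t.substring(0, t.length() - 1);
        if (t.endsWith("e")) t = t.substring(0, t.length() - 1);
        return t;
    }

    // Stem first (when allowed), lowercase afterwards.
    static String analyze(String token) {
        String stemmed = isStemmable(token) ? toyStem(token) : token;
        return stemmed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] inputs = {"vehicle", "vehiCle", "Vehicle", "VeHicle", "VEHICLE", "vehicles"};
        for (String in : inputs) {
            System.out.println(in + " => " + analyze(in));
        }
    }
}
```

Under these assumptions the six vehicle rows come out as listed: a document term written VEHICLE is stored as `vehicle`, while a lowercase query is reduced to `vehicl`, so the two never meet. The `paris` row depends on the real stemming rules and is not reproduced by the toy stemmer.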


  • Jolinar13 at May 28, 2007 at 2:28 pm
    It looks like it removes the final letter if the word ends with 'a', 'e' or
    'i'.
    Femelles => all:femel
    Is this expected?
    How should FrenchAnalyzer be used?
    Thanks
    Florian


  • Jolinar13 at May 28, 2007 at 3:25 pm
    In the end, I switched to the standard analyzer with some custom stop words:
    le,la,les,l',un,une,des,d',à,au,de,et,en,dans,se,sont,qui,a,est,il,pour,que,du,sa,par,mais,sur,avec,aux,ce,d,s,l,ou,pas,ses
    Thanks anyway
    Florian
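The effect of this setup (lowercasing plus stop-word removal, with no stemming) can be sketched in plain Java. This is an illustration only, not the Lucene API: the class `StopWordSketch` is a hypothetical name, and its crude tokenizer, which splits on every run of non-letter characters, is an assumption that will not match the standard analyzer's tokenization in every case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Stop words copied from the post above; the escape in the list is "a" with a grave accent.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        ("le,la,les,l',un,une,des,d',\u00e0,au,de,et,en,dans,se,sont,qui,a,est,il,pour,"
        + "que,du,sa,par,mais,sur,avec,aux,ce,d,s,l,ou,pas,ses").split(",")));

    // Assumed tokenizer: lowercase, then split on every run of non-letter characters.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token); // kept whole: there is no stemming step
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Les VEHICLES de Paris"));
    }
}
```

With this tokenizer the apostrophe is a separator, so the entries `l'` and `d'` never match a token; the bare `l` and `d` entries do that work instead.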


  • Mark Miller at May 28, 2007 at 4:50 pm
    FrenchAnalyzer has a stemmer built in. You are seeing the result of that
    stemmer in action. If you do not want stemming, take a look at the code
    for FrenchAnalyzer and copy it to make your own analyzer; just
    remove the FrenchStemFilter.

    - Mark


Discussion Overview
group: java-user
categories: lucene
posted: May 21, '07 at 9:30a
active: May 28, '07 at 4:50p
posts: 11
users: 3
website: lucene.apache.org
