Hi,

We are trying to index HTML files with Japanese, Korean, and Chinese
content using the CJK analyzer. While indexing, we get a lexical
parse error ("Encountered unknown character"). We tried setting the string
encoding to UTF-8, but it does not help.

Can anyone please help? Any pointers would be highly appreciated.

Thanks
--
View this message in context: http://www.nabble.com/Chinese-Japanese-Korean-Indexing-issue-Version-2.4-tp25388003p25388003.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Asitag at Sep 10, 2009 at 5:58 pm
    To add some more context: I am able to index English and Western European
    languages.



Discussion Overview
group: java-user
category: lucene
posted: Sep 10, '09 at 5:53p
active: Sep 10, '09 at 5:58p
posts: 2
users: 1
website: lucene.apache.org
