FAQ
Hi all! I have a problem. When I index text documents in English there
is no problem, but when I index Spanish text documents (and they're big),
a lot of the information from the documents doesn't get indexed (I suppose
it is due to the Analyzer). However, I want to index ALL the strings in the
document, with no stop words. Is this possible?

Thanks in advance



  • Erick Erickson at Jun 10, 2006 at 12:04 am
    Couple of things.

    1> You can use a different analyzer that does NOT remove stop words.
    SimpleAnalyzer comes to mind (though watch out for case, since it
    lowercases every token). See Lucene in Action for an explanation of the
    several analyzers that are available.

    2> If memory serves, Lucene defaults to indexing only the first 10,000
    terms of a field, so it's quite possible that you are missing parts of
    your documents. I believe this is configurable, though I haven't had to
    delve into it yet; IndexWriter.setMaxFieldLength looks promising...

    Best
    Erick
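
The two effects Erick describes can be illustrated without Lucene at all. The following is only a plain-Java sketch, not Lucene's implementation: `AnalyzerSketch`, `tokenizeKeepAll`, and `truncate` are hypothetical names invented for this example. It assumes an analyzer that splits on non-letters and lowercases (roughly the behavior attributed to SimpleAnalyzer above) and a per-field cap on the number of indexed terms.

```java
import java.util.ArrayList;
import java.util.List;

public class AnalyzerSketch {

    // Hypothetical helper: split text at non-letter characters, lowercase
    // each token, and keep every token (no stop-word list is consulted).
    // Character.isLetter also accepts accented letters such as those in Spanish.
    static List<String> tokenizeKeepAll(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Hypothetical helper: keep only the first maxFieldLength tokens,
    // mimicking a cap on how many terms of a field get indexed.
    static List<String> truncate(List<String> tokens, int maxFieldLength) {
        int end = Math.min(tokens.size(), maxFieldLength);
        return new ArrayList<>(tokens.subList(0, end));
    }

    public static void main(String[] args) {
        List<String> all = tokenizeKeepAll("No hay de que, A LA ORDEN");
        System.out.println("all tokens:   " + all);
        // With a cap of 3, everything after the third token is silently dropped.
        System.out.println("after cap 3:  " + truncate(all, 3));
    }
}
```

With a small cap the tail of the token list simply disappears, which matches the symptom in the question: on big documents, text past the limit never reaches the index unless the limit is raised.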

Discussion Overview
group: java-user
category: lucene
posted: Jun 9, '06 at 5:32p
active: Jun 10, '06 at 12:04a
posts: 2
users: 2
website: lucene.apache.org

2 users in discussion

Erick Erickson: 1 post Manumohedano: 1 post
