I currently have a project that indexes multiple file formats. There is a
second index that I use to keep track of files (because queries against the
database are too slow, we query the index and use an ID field to get the
records out of the database).
However, I've started to run into some issues with the StandardAnalyzer. We
were using different analyzers at one point, so we moved all creation of
analyzers into this function:
public static Analyzer getAnalyzer()
{
    Hashtable htStopWords = new Hashtable();
    Analyzer analyzer = new StandardAnalyzer(
        Lucene.Net.Util.Version.LUCENE_29, htStopWords);
    return analyzer;
}
So all functions should now be using a StandardAnalyzer.
As far as I know, a StandardAnalyzer uses a LowerCaseFilter to convert all
strings to lower case, and in some cases that is what happens.
To get all documents in an index, we use a field called SelectAll and store
the word "SelectAll" in the index, then search for that.
The document to write is created in this function:
public Document getFileInfoDoc()
{
    Document doc = new Document();
    doc.Add(new Field("FieldId", this.FieldID, Field.Store.YES,
        Field.Index.NOT_ANALYZED));
    doc.Add(new Field("SelectAll", "SelectAll", Field.Store.NO,
        Field.Index.ANALYZED));
    doc.Add(new Field("FilePath", this.FilePath, Field.Store.YES,
        Field.Index.ANALYZED));
    return doc;
}
In one case we call this code:
Document doc = getFileInfoDoc();
Analyzer analyzer = getAnalyzer();
indexWriter.UpdateDocument(
    new Term("FileId", this.FileId.ToString()), doc, analyzer);
This code writes to the indexWriter, but DOES NOT ALWAYS apply the
LowerCaseFilter to the string stored in SelectAll.
To rebuild the index, we DeleteAllDocs from the index and loop through each
file to be stored. For each file we call getFileInfoDoc from above and then
call the following 2 lines of code:
Analyzer analyzer = getAnalyzer();
iwCurrent.UpdateDocument(
    new Term("FileId", iFileID.ToString()), doc, analyzer);
This USUALLY stores the SelectAll field as lower case, but sometimes it
still fails and writes it as upper case.
Is there anything that I am missing that would make sure the LowerCaseFilter
is applied? I don't particularly want to change the text to lower case in
my own code, because a second index we use may have the same issue. That
index contains the contents of the files, and lower-casing all of that
ourselves may have a major impact on performance.
Thanks in advance,
Trevor Watson