On Jan 5, 2011, at 1:00 PM, L Duperval wrote:
I also have two fields, one for indexing and another for display. How does the
above affect searching? If you type "brown do" will it find the title correctly
or do you have to type "brown dog" in order to get a match? Would "brown do"
match "The brown horse has a dog" or not? My understanding is that that Lucene
(BTW, I'm using 2.4.1 because it's the latest version to work with Compass)
matches the prefix first, and then combines the matching results with other
clauses as specified.
No. Typing "brown do" will match on "brown dog" but not match on "the brown dog" that way we don't care which way the user types it. In our system "brown do" will not match on "the brown horse has a dog". We only do the PrefixQuery which is against the keyword field ("brown dog" is a single term as is "the brown dog"). We don't have a BooleanQuery like you do, but I don't see why it wouldn't work.
We basically have a method that looks something like
List<Book> getBooksBeginningWithTitle(String prefix);
and that code looks something like (we use Hibernate Search and not Compass, but they are pretty similar) :
FullTextSession fullTextSession = Search.getFullTextSession(getSession());
PrefixQuery prefixQuery = new PrefixQuery(new Term("titlekeyword", TextNormailzationUtil.transformKeyword(prefix, LetterCaseTransform.Lower)));
FullTextQuery ftQuery = fullTextSession.createFullTextQuery(prefixQuery, Book.class);
The field creation for the keyword fields looks like (done in a Hibernate Search construct called a FieldBridge - can't remember if Compass has something similar)
document.add(new Field("titlekeyword", TextNormailzationUtil.transformKeyword(fullTitle, LetterCaseTransform.Lower), Store.NO, Index.NOT_ANALYZED_NO_NORMS));
document.add(new Field("titlekeyword", TextNormailzationUtil.transformKeyword(partialTitle, LetterCaseTransform.Lower), Store.NO, Index.NOT_ANALYZED_NO_NORMS));
The partialTitle is just the full title with leading articles removed ('A', 'An', 'The', 'L'', etc).
The TextNormalizationUtil.transformKeyword in this case removes punctuation and non-spacing marks from the text and then lowercases. This is a business decision because in a keyword the case matters and users might not type in the punctuation or have Caps Lock key on so we normalize things down. You have to be sure that the same normalization happens at index and at search time.
That's what I was planning to look at next. Why did you choose not to use this
approach? Is it because of the other things you want to do with those fields or
something about the way the SpanQuery classes work?
I needed the field for other things and the code to do the PrefixQuery against this field was pretty simple.
We use SpanQuery's (well, list of SpanRegexQuery clauses fed into a SpanNearQuery) when we do something similar with authors (user can type an author name in first/last or last/first order and then what about any additional parts of their name - which means we would have had to create a lot of keyword fields to handle all the combinations and would still have missed some).
To unsubscribe, e-mail: firstname.lastname@example.org
For additional commands, e-mail: email@example.com