Hy All

I'm using Lucene to extract keywords out of a text.
The Lucene Index is built over a set of defined words (we call them
keywords). Then a text is queried to search that index. The goal is to find
out which keywords appear in the given text.

This works fine as long as the defined keywords are only "single" words and
not phrases.
But what to do if a keyword is a phrase and has to appear as whole in the
text? (and no part of the phrase must match for it's own)
Searching of phrases in quotes would do the job, but I have no knowledge of
the text beeing searched.

To make my problem clearer:

Indexed: Document1 fieldName:economy
Text to find keywords in: An economy consists of the economic system of a
country or other area
search processed: fieldName: economy, fieldName:consists, fieldName:economic
... etc
result: Document1 -> keyword economy

But now:
Keyword phrase I want to index: Make peace not war
Text to find keywords in: Two or three eggs for breakfast make no difference

if indexed as before this text matches to "Make peace not war" because of
the "make" in the text.
This is what I'm trying to avoid.

Am I overseeing something or is there a different approach for finding
keywords and keywordPhrases in a text?

Thanks for your help!
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-and-searching-phrases-tp1084545p1084545.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
postedAug 11, '10 at 8:40a
activeAug 11, '10 at 8:40a

1 user in discussion

Christian S.: 1 post



site design / logo © 2022 Grokbase