The example code in
http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-td1501269.html
defines a custom standard analyzer:
public class MyStandardAnalyzer extends StandardAnalyzer implements
        IndexFields {

    public MyStandardAnalyzer(Version matchVersion) {
        super(matchVersion);
    }

    // Called between successive values of the same field name.
    // IFIELD_TEXT is presumably a constant declared in IndexFields.
    @Override
    public int getPositionIncrementGap(String fieldName) {
        int incrementGap = super.getPositionIncrementGap(fieldName);
        if (fieldName.equals(IFIELD_TEXT)) {
            incrementGap += 10;
        }
        return incrementGap;
    }
}
so if you used this analyzer and called
new Field(IFIELD_TEXT, value, ...) and
new Field("someothername", value, ...) the first field would get the
modified gaps and the second one wouldn't.
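To make the effect of the gap concrete, here is a minimal sketch in plain Java that does not use Lucene at all. It only models the idea: tokens get consecutive positions, and at each boundary between two values of the field the position jumps by an extra gap. The class PositionGapSketch, its methods positionsOf and near, and the whitespace tokenization are all made up for illustration, and the distance check is a simplified stand-in for a proximity query, not Lucene's exact slop arithmetic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, Lucene-free model of how a position increment gap
// separates the values (here: sentences) of a multi-valued field.
public class PositionGapSketch {

    // Returns the positions at which `term` occurs. Tokens are numbered
    // consecutively; `gap` extra positions are skipped before each value
    // after the first one.
    static List<Integer> positionsOf(String term, List<String> sentences, int gap) {
        List<Integer> positions = new ArrayList<Integer>();
        int position = -1; // position of the previously seen token
        for (int i = 0; i < sentences.size(); i++) {
            if (i > 0) {
                position += gap; // the jump between two field values
            }
            String sentence = sentences.get(i).trim().toLowerCase();
            if (sentence.isEmpty()) {
                continue;
            }
            for (String token : sentence.split("\\s+")) {
                position++;
                if (token.equals(term)) {
                    positions.add(position);
                }
            }
        }
        return positions;
    }

    // True if some occurrence of a and some occurrence of b are at most
    // maxDistance positions apart (simplified proximity check).
    static boolean near(String a, String b, List<String> sentences,
                        int gap, int maxDistance) {
        for (int pa : positionsOf(a, sentences, gap)) {
            for (int pb : positionsOf(b, sentences, gap)) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("the cat sat", "a dog barked");
        // Without a gap, terms from neighbouring sentences look close.
        System.out.println(near("sat", "dog", sentences, 0, 5));  // true
        // With a gap of 10, the same pair is pushed out of range.
        System.out.println(near("sat", "dog", sentences, 10, 5)); // false
        // Terms inside one sentence are unaffected by the gap.
        System.out.println(near("cat", "sat", sentences, 10, 5)); // true
    }
}
```

With Lucene itself the check would be a proximity query (for example a PhraseQuery with slop, or a SpanNearQuery) whose slop stays below the gap. Note that a gap of 10 only covers sentences in which the two terms are fewer than about 10 positions apart, so for long sentences you would probably want a larger gap and a correspondingly larger slop.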
Hope that helps.
--
Ian.
On Thu, Mar 10, 2011 at 4:34 PM, Michael Wiegand
wrote:
Conceptually, I think I know what to do. Unfortunately, with the given
interfaces of Lucene I have some difficulty.
If I add the content of a document sentence by sentence, i.e. line by line,
(using a multi-valued field), there are only two constructors possible:
Field(String name, String value, Field.Store store, Field.Index index)
or
Field(String name, String value, Field.Store store, Field.Index index,
Field.TermVector termVector)
The sentence comes as a string which I get from a BufferedReader object by
using the readLine() method.
But as far as I understand, I need access to a TokenStream object in
order to set the PositionIncrementAttribute. So how should that work?
Thank you in advance.
Ian Lea wrote:
You can use multi valued fields if you play with the position
increment gap. See e.g.
http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-td1501269.html
A Google search for "lucene indexing sentences" or similar finds that,
and more.
Different docs can have different fields/different numbers of fields,
but the position gap approach is probably better.
--
Ian.
On Fri, Mar 4, 2011 at 7:06 AM, Michael Wiegand
wrote:
Hi,
I would like to create an index with Lucene for a collection of text
files.
The index should be created in such a way that at search time I can
enforce that query term A and query term B are contained within the same
sentence.
How should I implement the index? Should I have a different field for
every sentence (making sure that it is not a multi-valued field, because
the values would get merged, which is exactly what I do not want)?
Would it be problematic that different documents would then end up
having different numbers of fields?
Thank you in advance!
Best,
Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org