We are trying to modify the positional encoding of a term occurrence for
experimentation purposes. One solution we adopt is to use payloads to
sotre our own positional information encoding, but with this solution,
it becomes difficult to measure the increase or decrease of index size.
It is why we would like to directly change the positional encoding.
I have seen that Michael McCandless recently refactored the
DocumentsWriter into a flexible indexer chain (see LUCENE-1301). By
analysing the code, I have noticed that only two classes should be
modified when writing a document:
When adding a document with IndexWriter.addDocument
- FieldInvertState (increment and store positions)
- FreqProxTermsWriterPerField.writeProx
When flushing documents:
- FreqProxTermsWriterPerField.appendPostings
I have noticed that only one class should be modified when reading a
document:
- SegmentTermPositions.nextPositions, and
- SegmentTermPositions.readDeltaPositions
Could a member of the Lucene team approve my modifications ? Do I forget
to modify some classes ?
Another question, since the lucene core classes are kind of close, what
is the best way to implement these modifications ? Make a branch of
lucene, and add my new classes to the lucene package
org.apache.lucene.index ? Or do a more elegant solution is possible ?
Thanks in advance,
Regards.
--
Renaud Delbru
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Renaud Delbru
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org