FAQ
[PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
-------------------------------------------------------------------------

Key: LUCENE-2133
URL: https://issues.apache.org/jira/browse/LUCENE-2133
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 3.1
Reporter: Christian Kohlschütter


Hi all,

up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open

The FieldCache is flawed because it is incorrect to assume that:
1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.

Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.

There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.

I now propose the following:
1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)

I have provided an patch which takes care of all these issues. It passes all JUnit tests.

The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.

In detail and besides the above mentioned improvements, the following is provided:
1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.

The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
- FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
- FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
- FieldCache (=> IndexFieldCache)
- FieldCacheImpl (=> IndexFieldCacheImpl)
- all classes in FieldCacheImpl (=> several package-level classes)
- all subclasses of FieldComparator (=> several package-level classes)

Final notes:
- The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
- The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
- I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
- I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
- The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.

Cheers,
Christian


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Search Discussions

  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 7:46 am
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: java.patch

    The main patch (for src/java)
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 3.1
    Reporter: Christian Kohlschütter
    Attachments: java.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 7:48 am
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: contrib.patch

    Patch for contrib code (src/contrib), may be applied after approval of main patch

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 3.1
    Reporter: Christian Kohlschütter
    Attachments: contrib.patch, demo.patch, java.patch, test.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 7:48 am
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: test.patch

    Patch for JUnit tests (src/test) to enable them to use the new API
    (may be applied after approval of main patch)

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 3.1
    Reporter: Christian Kohlschütter
    Attachments: contrib.patch, demo.patch, java.patch, test.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 7:48 am
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: demo.patch

    Patch for demo code (src/demo), may be applied after approval of main patch

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 3.1
    Reporter: Christian Kohlschütter
    Attachments: contrib.patch, demo.patch, java.patch, test.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Uwe Schindler (JIRA) at Dec 8, 2009 at 9:05 am
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Uwe Schindler updated LUCENE-2133:
    ----------------------------------

    Affects Version/s: (was: 3.1)
    2.9.1
    3.0

    Thanks for the patch and the explanation!

    Some notes, without a deep insight:
    - The problem with your current patch is the same as with the other propsals: backwards compatibility is not yet adressed, but is the most hairy problem with all other approaches.
    - The change of setNextReader() to use *only* the cache may not be the correct approach, as the collection of results is IndexReader-specific and not cache-specific (the Collectors in core for sorting only need that, but other collectors outside Lucene, e.g. in Solr need the IndexReader). It is e.g. needed for building filters and so on.
    - The solution for the above point is the same like with all other changes: IndexReader needs a method to get a cache instance not the other way round. By this the collector.setNextReader change is obsolete.
    - Also the patch file changes files, that are not related or removes import statements, that are clearly marked as "// javadocs only".
    - For version 3.1: The "affects version", setting should be 2.9/3.0, as 3.1 is not yet released. An fix version cannot be yet given, thats correct.

    There may be more problems, but I will review the patch more detailed!

    Uwe
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: contrib.patch, demo.patch, java.patch, test.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 9:20 am
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787378#action_12787378 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    Hi Uwe,

    thanks for reviewing. Just a quick response to your comment:

    # The problem with your current patch is the same as with the other propsals: backwards compatibility is not yet adressed, but is the most hairy problem with all other approaches.

    As I wrote, the patch *does* provide backwards compatibility. Please correct me if I am wrong :)
    I think actually over-stressed the backwards-compatibility issue because many classes were marked as "Expert; can be changed in incompatible ways in the next release" (e.g. SortField/FieldComparator, Collector)

    # The change of setNextReader() to use only the cache may not be the correct approach, as the collection of results is IndexReader-specific and not cache-specific (the Collectors in core for sorting only need that, but other collectors outside Lucene, e.g. in Solr need the IndexReader). It is e.g. needed for building filters and so on.

    You can access the IndexReader from IndexCache and vice versa. The patch actually contains a few changes for contrib where this is actually utilized.

    Passing IndexCache instead of IndexReader to setNextReader gives a slight gain in performance for most cases (i.e. whenever only the IndexCache is accessed), since there is no need to traverse the IndexReader-to-IndexCache method chain (IndexReader#getIndexCache() and the synchronized getOrCreateIndexCache()) for each call to setNextReader.

    In all the other cases, newCache.getIndexReader() gives full access to the IndexReader. There even is a default method which calls the (now-to-be-deprecated) newIndexReader(IndexReader,int) method, so there are actually zero changes necessary for existing code.

    # Also the patch file changes files, that are not related or removes import statements, that are clearly marked as "// javadocs only".

    The corresponding references in the javadocs comments have been removed (e.g. FieldCache has been replaced to IndexFieldCache), so it made sense to remove the imports as well.

    # For version 3.1: The "affects version", setting should be 2.9/3.0, as 3.1 is not yet released. An fix version cannot be yet given, thats correct.

    Indeed.


    Best regards,
    Christian

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: contrib.patch, demo.patch, java.patch, test.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 9:22 am
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787378#action_12787378 ]

    Christian Kohlschütter edited comment on LUCENE-2133 at 12/8/09 9:20 AM:
    -------------------------------------------------------------------------

    Hi Uwe,

    thanks for reviewing. Just a quick response to your comment:

    bq. The problem with your current patch is the same as with the other propsals: backwards compatibility is not yet adressed, but is the most hairy problem with all other approaches.

    As I wrote, the patch *does* provide backwards compatibility. Please correct me if I am wrong :)
    I think actually over-stressed the backwards-compatibility issue because many classes were marked as "Expert; can be changed in incompatible ways in the next release" (e.g. SortField/FieldComparator, Collector)

    bq. The change of setNextReader() to use only the cache may not be the correct approach, as the collection of results is IndexReader-specific and not cache-specific (the Collectors in core for sorting only need that, but other collectors outside Lucene, e.g. in Solr need the IndexReader). It is e.g. needed for building filters and so on.

    You can access the IndexReader from IndexCache and vice versa. The patch actually contains a few changes for contrib where this is actually utilized.

    Passing IndexCache instead of IndexReader to setNextReader gives a slight gain in performance for most cases (i.e. whenever only the IndexCache is accessed), since there is no need to traverse the IndexReader-to-IndexCache method chain (IndexReader#getIndexCache() and the synchronized getOrCreateIndexCache()) for each call to setNextReader.

    In all the other cases, newCache.getIndexReader() gives full access to the IndexReader. There even is a default method which calls the (now-to-be-deprecated) newIndexReader(IndexReader,int) method, so there are actually zero changes necessary for existing code.

    bq. Also the patch file changes files, that are not related or removes import statements, that are clearly marked as "// javadocs only".

    The corresponding references in the javadocs comments have been removed (e.g. FieldCache has been replaced to IndexFieldCache), so it made sense to remove the imports as well.

    bq. For version 3.1: The "affects version", setting should be 2.9/3.0, as 3.1 is not yet released. An fix version cannot be yet given, thats correct.

    Indeed.


    Best regards,
    Christian


    was (Author: ck@newsclub.de):
    Hi Uwe,

    thanks for reviewing. Just a quick response to your comment:

    # The problem with your current patch is the same as with the other propsals: backwards compatibility is not yet adressed, but is the most hairy problem with all other approaches.

    As I wrote, the patch *does* provide backwards compatibility. Please correct me if I am wrong :)
    I think actually over-stressed the backwards-compatibility issue because many classes were marked as "Expert; can be changed in incompatible ways in the next release" (e.g. SortField/FieldComparator, Collector)

    # The change of setNextReader() to use only the cache may not be the correct approach, as the collection of results is IndexReader-specific and not cache-specific (the Collectors in core for sorting only need that, but other collectors outside Lucene, e.g. in Solr need the IndexReader). It is e.g. needed for building filters and so on.

    You can access the IndexReader from IndexCache and vice versa. The patch actually contains a few changes for contrib where this is actually utilized.

    Passing IndexCache instead of IndexReader to setNextReader gives a slight gain in performance for most cases (i.e. whenever only the IndexCache is accessed), since there is no need to traverse the IndexReader-to-IndexCache method chain (IndexReader#getIndexCache() and the synchronized getOrCreateIndexCache()) for each call to setNextReader.

    In all the other cases, newCache.getIndexReader() gives full access to the IndexReader. There even is a default method which calls the (now-to-be-deprecated) newIndexReader(IndexReader,int) method, so there are actually zero changes necessary for existing code.

    # Also the patch file changes files, that are not related or removes import statements, that are clearly marked as "// javadocs only".

    The corresponding references in the javadocs comments have been removed (e.g. FieldCache has been replaced to IndexFieldCache), so it made sense to remove the imports as well.

    # For version 3.1: The "affects version", setting should be 2.9/3.0, as 3.1 is not yet released. An fix version cannot be yet given, thats correct.

    Indeed.


    Best regards,
    Christian

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: contrib.patch, demo.patch, java.patch, test.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Uwe Schindler (JIRA) at Dec 8, 2009 at 10:08 am
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787387#action_12787387 ]

    Uwe Schindler commented on LUCENE-2133:
    ---------------------------------------

    bq. The change of setNextReader() to use only the cache may not be the correct approach, as the collection of results is IndexReader-specific and not cache-specific (the Collectors in core for sorting only need that, but other collectors outside Lucene, e.g. in Solr need the IndexReader). It is e.g. needed for building filters and so on.

    This is one of the backwards breaks. As noted, the Collector abstract methods cannot be changed and should not, as the IndexReader is the important part used for collecting the results. The cache is only used by sorting but not in general. Use of IndexCache here would be a design flaw, because it combines unrelated things.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: contrib.patch, demo.patch, java.patch, test.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Uwe Schindler (JIRA) at Dec 8, 2009 at 10:10 am
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787387#action_12787387 ]

    Uwe Schindler edited comment on LUCENE-2133 at 12/8/09 10:08 AM:
    -----------------------------------------------------------------

    bq. The change of setNextReader() to use only the cache may not be the correct approach, as the collection of results is IndexReader-specific and not cache-specific (the Collectors in core for sorting only need that, but other collectors outside Lucene, e.g. in Solr need the IndexReader). It is e.g. needed for building filters and so on.

    This is one of the backwards breaks. As noted, the Collector abstract methods cannot be changed and should not, as the IndexReader is the important part used for collecting the results. The cache is only used by sorting but not in general. Use of IndexCache here would be a design flaw, because it combines unrelated things.

    And setNextReader is only called seldom (only once for each segment). It would have an impact if you would need to call synchronized methods inside collect().

    was (Author: thetaphi):
    bq. The change of setNextReader() to use only the cache may not be the correct approach, as the collection of results is IndexReader-specific and not cache-specific (the Collectors in core for sorting only need that, but other collectors outside Lucene, e.g. in Solr need the IndexReader). It is e.g. needed for building filters and so on.

    This is one of the backwards breaks. As noted, the Collector abstract methods cannot be changed and should not, as the IndexReader is the important part used for collecting the results. The cache is only used by sorting but not in general. Use of IndexCache here would be a design flaw, because it combines unrelated things.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: contrib.patch, demo.patch, java.patch, test.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 3:34 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787502#action_12787502 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    Hi Uwe,

    your arguments seem reasonable to me. I have added a new patch (src/java only), leaving all test and contrib classes intact (but still passes all tests).
    The new patch now keeps setNextReader as it is.

    Cheers,
    Christian

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: contrib.patch, demo.patch, java.patch, test.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 3:36 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: LUCENE-2133.patch

    Improved patch (src/java only), now without modification to newIndexReader.

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: contrib.patch, demo.patch, java.patch, LUCENE-2133.patch, test.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 3:38 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: (was: test.patch)
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 3:38 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: (was: demo.patch)
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 3:38 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: (was: contrib.patch)
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 3:38 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: (was: java.patch)
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 4:25 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: LUCENE-2133.patch

    Added code to automatically evict entries from IndexFieldCache upon close of IndexCache/IndexReader
    (see also LUCENE-2135)

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Michael McCandless (JIRA) at Dec 8, 2009 at 7:04 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787661#action_12787661 ]

    Michael McCandless commented on LUCENE-2133:
    --------------------------------------------

    Christian, could you roll up all patches into a single attachment? (Actually it looks like the separate demo, contrib, test, core patches were removed).
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 7:10 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787667#action_12787667 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    Michael, LUCENE-2133.patch now contains all the patches for src/java. I have removed the test, contrib and demo files because they actually do not belong to the patch (since LUCENE-2133.patch really provides complete backwards compatibility now). It would not make much sense to claim backwards compatibility and at the same time modify a bunch of test classes, would it? :)

    Nevertheless, I am now preparing an updated LUCENE-2133-complete.patch with all these additional patches included. In the meantime you may simply apply LUCENE-2133.patch to your local trunk copy and see if it works for you.

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Michael McCandless (JIRA) at Dec 8, 2009 at 7:41 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787699#action_12787699 ]

    Michael McCandless commented on LUCENE-2133:
    --------------------------------------------

    bq. I have removed the test, contrib and demo files because they actually do not belong to the patch (since LUCENE-2133.patch really provides complete backwards compatibility now).

    Ahh OK
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 7:51 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: LUCENE-2133-complete.patch

    The same patch as LUCENE-2133.patch plus the changes in src/test and contrib to fully utilize the new IndexFieldCache and SortField code.

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 8:09 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787711#action_12787711 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    There are a bunch or unrelated changes (imports/names/exception thrown) that should be pulled from this patch.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 8:19 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787715#action_12787715 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    Hmm ... nevermind. The exception is related and most of the imports are correct - brain spin.

    Didn't see that

    import org.apache.lucene.search.SortField; // for javadocs

    wasn't being used anymore anyway.

    import org.apache.lucene.search.fields.IndexFieldCache in NumericQuery should get a //javadoc so someone doesn't accidently remove it.

    And I guess the t to threadLocal change doesn't hurt with the amount your changing that anyway. Its a better name.

    This looks pretty nice overall.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 8:21 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787718#action_12787718 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    Mark, could you please name the changes you would like to be excluded?
    I thought I had only included necessary changes.

    Some dependent changes are not so obvious, but necessary.

    For example, Analyzer.close now needs to throw an IOException because of CloseableThreadLocal now closing an IOException because of doClose() throwing an IOException because it may close references from an IndexCache that might throw an IOException. Luckily this is covered by java.io.Closeable (which declared #close() to throw an IOException), and Analyzer implements that interface already.

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 8:23 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787720#action_12787720 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    bq. Hmm ... nevermind. The exception is related and most of the imports are correct - brain spin.

    OK, no problem.

    bq. This looks pretty nice overall.

    Thanks :-)

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 8:35 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787729#action_12787729 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    A couple more quick notes:

    I know the FieldComparator class is ugly, but I'm not sure we should pull the rug by putting the impls in a new package. On the other hand, its not likely to affect many and it was experimental - so its a tough call. Its a lot of classes in there ;)

    I'm also not sure if fields is the right package name? And do the Filters belong in that package?

    Also, almost a non issue, but extending a deprecated class is going to be an ultra minor back compat break when its removed. Not likely a problem though. But we might put a note to that affect to be clear. It is almost self documenting anyway though :)

    Rather then changing the tests to the new classes, we should prob copy them and make new ones - then remove them when the deprecations are removed.

    Also, you should pull the author tag(s) - all credit is through JIRA and Changes. (I only see it like once, so I bet thats eclipse?)

    I havn't done a thorough review it all, but this is pretty great stuff to appear so complete and out of nowhere :)
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 8:41 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787734#action_12787734 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    It looks like FieldCacheTermsFilterDocIdSet is using the wrong StringIndex? And I think the FieldCache import in that class can be removed.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 8:43 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787734#action_12787734 ]

    Mark Miller edited comment on LUCENE-2133 at 12/8/09 8:42 PM:
    --------------------------------------------------------------

    It looks like FieldCacheTermsFilterDocIdSet is using the wrong StringIndex? And I think the FieldCache import in that class can be removed (same with IndexFieldCacheRangeFilter).

    was (Author: markrmiller@gmail.com):
    It looks like FieldCacheTermsFilterDocIdSet is using the wrong StringIndex? And I think the FieldCache import in that class can be removed.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 8:55 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787740#action_12787740 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    I dont quite understand why we would have no more insanity? What happens when you get the cache from a multireader and then from each segment reader within it? You double up right? Insanity?
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 9:10 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787743#action_12787743 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    bq. I know the FieldComparator class is ugly, but I'm not sure we should pull the rug by putting the impls in a new package. On the other hand, its not likely to affect many and it was experimental - so its a tough call. Its a lot of classes in there

    I think it makes sense to move the stuff into another package because o.a.l.search is already filled with a lot of classes. Since there are no package-level-protection dependencies to o.a.l.search, I think it does not hurt either.

    bq. I'm also not sure if fields is the right package name? And do the Filters belong in that package?

    I couldn't really come up with a better name. It's all about the fields somehow (caches, comparators and value parsers).

    bq. Also, almost a non issue, but extending a deprecated class is going to be an ultra minor back compat break when its removed. Not likely a problem though. But we might put a note to that affect to be clear. It is almost self documenting anyway though

    Yes. I thought it's better to make it as clear as possibe.

    Also, almost a non issue, but extending a deprecated class is going to be an ultra minor back compat break when its removed. Not likely a problem though. But we might put a note to that affect to be clear. It is almost self documenting anyway though

    bq. Rather then changing the tests to the new classes, we should prob copy them and make new ones - then remove them when the deprecations are removed.

    Yes, good idea. This way we can test the old and the new behaviour at once. Maybe it is enough to add new test methods to the same class?

    bq. Also, you should pull the author tag(s) - all credit is through JIRA and Changes. (I only see it like once, so I bet thats eclipse?)

    Sorry, Eclipse defaults. I thought I had removed all of them.

    bq. I haven't done a thorough review it all, but this is pretty great stuff to appear so complete and out of nowhere

    Yes, it also just came over me the last weekend ;-)


    I will make a new patch tomorrow (CET time) with the new test cases incorporated.

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 9:22 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787748#action_12787748 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    bq. I think it does not hurt either.

    I didn't notice that you actually just deprecated the originals - I guess thats not a complete rug pull ...

    By the way, I don't think you need to deprecate something in a new class ( IndexFieldCacheImpl):

    {code}

    /**
    * @deprecated Use {@link #clear()} instead.
    */
    public void purgeAllCaches() {
    init();
    }
    {code}
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 9:26 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787750#action_12787750 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    bq. It looks like FieldCacheTermsFilterDocIdSet is using the wrong StringIndex? And I think the FieldCache import in that class can be removed (same with IndexFieldCacheRangeFilter).

    Good catch. Since o.a.l.search.field.StringIndex extends o.a.l.search.StringIndex it's just a matter of declaration. The imports can indeed be removed (they were only required for javadocs, which should now also refer to the new classes instead). I have made the changes locally and will put them into the next update for this patch.

    bq. I dont quite understand why we would have no more insanity? What happens when you get the cache from a multireader and then from each segment reader within it? You double up right? Insanity?

    The MultiReader has a different cache than the internal SegmentReaders, there is no interconnection between the two. While change the testcases, my patch initially even triggered a JUnit failure because of this -- o.a.l.util.TestFieldCacheSanityChecker expected a cache error which now will not happen anymore.

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 9:30 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787752#action_12787752 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    And what about the doubling up insanity? It looks like you just commented out that check? It appears to me that thats still an issue we want to check for - we want to make sure Lucene core and users have a way to be sure they are not using a toplevel reader and its sub readers for caches unless they *really* intend to.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 9:30 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787754#action_12787754 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    bq. I didn't notice that you actually just deprecated the originals - I guess thats not a complete rug pull ...

    Writing the new code was one day, then making it backwards compatible with all these deprecations another one :-b
    I actually wouldn't care so much about backwards compatibility in the most cases, but I think it is better now to allow a smooth transition to the new code.

    bq. By the way, I don't think you need to deprecate something in a new class ( IndexFieldCacheImpl):

    Indeed. This was a leftover from the early changes to src/test code. It's removed now (locally).

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Uwe Schindler (JIRA) at Dec 8, 2009 at 9:37 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787757#action_12787757 ]

    Uwe Schindler commented on LUCENE-2133:
    ---------------------------------------

    After removing the very problematic change to the Collector class (which is very central in Lucene and should not be changed again after 2.9), thatI told you to do in the morning, the other changes are no longer too intrusive. I am quite happy with your new patch and I now also like it very much. It is as a good candiate for replacing the very, very ugly FieldCache impl we currently have.

    I am not really sure, if the package name is good and I would like to also add Earwins changes and not bind the cache so hard to the IndexReader (which was also the problem with the last FieldCache), instead just make it a plugin component (like all the other flex parts). For the functionality of Lucene, FieldCache is not needed, sorting is just an addon on searching, but not realy basic functionality (lots of users have other sorting algos). Also not sure if all classes in search that contain "FieldCache" should be renamed. The FieldCacheTermsFilter and RangeFilter only use the cache internally, they should simply be changed to use the new API and maybe only get additional ctors for the other parser instance classes. So some stuff still needs to be changed.

    If it fits good together with the new flexible indexing branch, I see no problems with appling it soon. So its all good work. It is a pity, tht we heard not much from you in the past, the patch suddenly appeared in JIRA and almost nobody know you. I only found one introduction from you long time ago on java-dev. Are you still working at L3S? If yes, send nice greetings also to Jan Brase! :-)

    This patch and the flex branch together and the many deprecations would make a 3.9 soon and 4.0 without all ugly stuff soon would be nice. I again would do the cleaning heavy commiting police man!

    So, good work. Thanks!
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 9:37 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787752#action_12787752 ]

    Mark Miller edited comment on LUCENE-2133 at 12/8/09 9:34 PM:
    --------------------------------------------------------------

    And what about the doubling up insanity? It looks like you just commented out that check? It appears to me that thats still an issue we want to check for - we want to make sure Lucene core and users have a way to be sure they are not using a toplevel reader and its sub readers for caches unless they *really* intend to.

    *edit*

    This type of change actually even exaggerates that problem (though if we want to improve things here, its something we will have to deal with).

    Now you might have a mixture of old api/new api caches as well if you don't properly upgrade everything at once.

    was (Author: markrmiller@gmail.com):
    And what about the doubling up insanity? It looks like you just commented out that check? It appears to me that thats still an issue we want to check for - we want to make sure Lucene core and users have a way to be sure they are not using a toplevel reader and its sub readers for caches unless they *really* intend to.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 9:41 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787759#action_12787759 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    bq. And what about the doubling up insanity? It looks like you just commented out that check? It appears to me that thats still an issue we want to check for - we want to make sure Lucene core and users have a way to be sure they are not using a toplevel reader and its sub readers for caches unless they really intend to.

    The new IndexFieldCacheSanityChecker can be removed out-of-the-box (as documented in the class itself). It is just kind of a "showcase" class for this issue.
    The section I commented out is non-functional with the new change since there is no more ReaderKey in IndexFieldCache.CacheEntry.

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 9:48 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787763#action_12787763 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    But we still want its functionality - we still want to check for "doubling" up insanity.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 8, 2009 at 9:56 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787767#action_12787767 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    bq. not bind the cache so hard to the IndexReader (which was also the problem with the last FieldCache), instead just make it a plugin component

    At a minimum, you should be able to set the cache for the reader.

    bq. For the functionality of Lucene, FieldCache is not needed, sorting is just an addon on searching

    The way he has it, this is not just for the fieldache, but also the fieldsreader and vectorreader - if we go down that road, we should consider norms as well.

    bq. I see no problems with appling it soon

    I still think it might be a little early. This has a lot of consequences.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 9:56 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787768#action_12787768 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    Uwe, thanks a lot for your assessment!

    I will definitely look at the flex branch and see what needs to be changed.

    Yes, I still work at L3S, working hard for my PhD (planned to finish mid 2010). This is probably the main reason for not being so active in the Lucene community in the past. However, as I use Lucene for research and commercial systems over the last years, I always try to contribute back patches whenever possible.

    Jan Brase doesn't work at L3S anymore, but I will happily forward the greetings.

    Cheers,
    Christian

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Uwe Schindler (JIRA) at Dec 8, 2009 at 10:03 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787772#action_12787772 ]

    Uwe Schindler commented on LUCENE-2133:
    ---------------------------------------

    bq. I still think it might be a little early. This has a lot of consequences.

    I mean we should not wait for years like with LUCENE-831 :-) No heavy committing please *g*.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 10:35 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Christian Kohlschütter updated LUCENE-2133:
    -------------------------------------------

    Attachment: LUCENE-2133.patch

    New changes from the discussions incorporated into the src/java patch.
    The complete patch (including src/test and contrib) follows tomorrow.

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 10:43 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787791#action_12787791 ]

    Christian Kohlschütter edited comment on LUCENE-2133 at 12/8/09 10:42 PM:
    --------------------------------------------------------------------------

    bq. But we still want its functionality - we still want to check for "doubling" up insanity.

    Mark,
    the latest patch re-inserts the commented fragment.

    The "readerkey" in the original FieldCacheSanityChecker refers to the FieldCacheKey in IndexReader, which is now obsoleted by IndexReader#getIndexCache.getIndexReader(). I now use this as the reader key, even though I am still not quite sure if it is really necessary. Checking for insanity sounds so paranoid to me. If we know that there is a conceptional bug in the code, we should fix it, not circumvent and hotfix it using an insanity checker. But this is another story.


    was (Author: ck@newsclub.de):
    bq. But we still want its functionality - we still want to check for "doubling" up insanity.

    Mark,
    the latest patch re-inserts the commented fragment.

    The "readerkey" in the original FieldCacheSanityChecker refers to the FieldCacheKey in IndexReader, which is now obsoleted by IndexReader#getIndexCache.getIndexReader(). I now use this as the reader key, even though I am still not quite sure if it is really necessary. Checking for insanity sounds so paranoid to me. If we know that there is a conceptional bug in the code, we should fix it, not circumvent and hotfix it using a insanity checker. But this is another story.

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 8, 2009 at 10:43 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787791#action_12787791 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    bq. But we still want its functionality - we still want to check for "doubling" up insanity.

    Mark,
    the latest patch re-inserts the commented fragment.

    The "readerkey" in the original FieldCacheSanityChecker refers to the FieldCacheKey in IndexReader, which is now obsoleted by IndexReader#getIndexCache.getIndexReader(). I now use this as the reader key, even though I am still not quite sure if it is really necessary. Checking for insanity sounds so paranoid to me. If we know that there is a conceptional bug in the code, we should fix it, not circumvent and hotfix it using a insanity checker. But this is another story.

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Michael McCandless (JIRA) at Dec 10, 2009 at 5:24 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788798#action_12788798 ]

    Michael McCandless commented on LUCENE-2133:
    --------------------------------------------

    This patch is a good step forward -- it associates the cache directly
    with IndexReader, where it belongs; it cleanly decouples cache from
    reader (vs the hack we have today with IndexReader.getFieldCacheKey),
    so that eg cloned readers can share the same cache; it also preserves
    back compat, which is quite a stunning accomplishment :)

    But... there are many more things I don't like about FieldCache, that
    I'm not sure (?) the patch addresses:

    * Uninversion to derive eg an int[] is horribly slow, compared to
    say loading the pre-encoded binary ints from disk, created during
    indexing. Ie, I think, if we are going to overhaul FieldCache
    API, we should somehow make LUCENE-1231 feasible.

    * There's no pluggability to customize where the int[] comes from
    for a given field -- most you can do is provide your own IntParser
    that the uninverter uses. EG the fact that the patch had to
    "move" FieldCacheRange/TermsFilter down, is strange -- these
    filters (and in general any future "cache consumers") should live
    in oal.search, but simply pull from a different int[] source,
    somehow.

    * Error checking of single-value-per-field (for StringIndex) is
    dangerous, today -- it's intermittent, and, it's an unchecked
    exception. We should probably just remove the exception, or,
    maybe make it checked. Actually I'll go open a new issue for
    this. We should simply fix this.

    * Single-value-per-field limitation (though, that's a nice to have,
    future improvement)

    * Even accepting the single-value-per-field limitaiton, we should
    allow multiple values per field during uninversion, w/
    customizable logic about which value is kept as the single one
    (there is an issue open for this I think). This really should be
    some sort of added extensibility to whatever class drives
    uninversion...

    * The terror of accidentally asking for the array at the top-level
    of Multi/DirReader. I think this shouldn't even be allowed, at
    least not easily, ie Dir/MultiReader.getIndexCache should throw
    UOE. If we really wanted to, we could provide sugar methods in
    maybe ReaderUtil to "glom N int[]'s into a new int[]". But it
    should be named something scary :) Then we wouldn't need any
    insanity checking.

    * No control over caching policy (cannot evict things)

    * If we make field cache flexible enough, we could maybe fold norms
    & deleted docs into it (would be a separate future issue to
    actually do so...).

    Some other questions about the patch:

    * Consumers of the cache API (the sort comparator,
    FieldCacheTerms/RangeFilter, and any other future users of "the
    field cache") shouldn't have to move down into fields sub-package?

    * It's a little strange that the term vectors & fields reader also
    got pulled into the cache?

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Erick Erickson at Dec 10, 2009 at 5:40 pm
    Mike:

    Which of these do you think this patch *should* address before committing?
    Just the last two?
    As many as Christian has energy for <G>?
    On Thu, Dec 10, 2009 at 12:24 PM, Michael McCandless (JIRA) wrote:


    [
    https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788798#action_12788798]

    Michael McCandless commented on LUCENE-2133:
    --------------------------------------------

    This patch is a good step forward -- it associates the cache directly
    with IndexReader, where it belongs; it cleanly decouples cache from
    reader (vs the hack we have today with IndexReader.getFieldCacheKey),
    so that eg cloned readers can share the same cache; it also preserves
    back compat, which is quite a stunning accomplishment :)

    But... there are many more things I don't like about FieldCache, that
    I'm not sure (?) the patch addresses:

    * Uninversion to derive eg an int[] is horribly slow, compared to
    say loading the pre-encoded binary ints from disk, created during
    indexing. Ie, I think, if we are going to overhaul FieldCache
    API, we should somehow make LUCENE-1231 feasible.

    * There's no pluggability to customize where the int[] comes from
    for a given field -- most you can do is provide your own IntParser
    that the uninverter uses. EG the fact that the patch had to
    "move" FieldCacheRange/TermsFilter down, is strange -- these
    filters (and in general any future "cache consumers") should live
    in oal.search, but simply pull from a different int[] source,
    somehow.

    * Error checking of single-value-per-field (for StringIndex) is
    dangerous, today -- it's intermittent, and, it's an unchecked
    exception. We should probably just remove the exception, or,
    maybe make it checked. Actually I'll go open a new issue for
    this. We should simply fix this.

    * Single-value-per-field limitation (though, that's a nice to have,
    future improvement)

    * Even accepting the single-value-per-field limitaiton, we should
    allow multiple values per field during uninversion, w/
    customizable logic about which value is kept as the single one
    (there is an issue open for this I think). This really should be
    some sort of added extensibility to whatever class drives
    uninversion...

    * The terror of accidentally asking for the array at the top-level
    of Multi/DirReader. I think this shouldn't even be allowed, at
    least not easily, ie Dir/MultiReader.getIndexCache should throw
    UOE. If we really wanted to, we could provide sugar methods in
    maybe ReaderUtil to "glom N int[]'s into a new int[]". But it
    should be named something scary :) Then we wouldn't need any
    insanity checking.

    * No control over caching policy (cannot evict things)

    * If we make field cache flexible enough, we could maybe fold norms
    & deleted docs into it (would be a separate future issue to
    actually do so...).

    Some other questions about the patch:

    * Consumers of the cache API (the sort comparator,
    FieldCacheTerms/RangeFilter, and any other future users of "the
    field cache") shouldn't have to move down into fields sub-package?

    * It's a little strange that the term vectors & fields reader also
    got pulled into the cache?

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch,
    LUCENE-2133.patch, LUCENE-2133.patch

    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the
    FieldCache. The FieldCache is a singleton which is supposed to cache certain
    information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many
    clones (of SegmentReader) or decorators (FilterIndexReader) which all access
    the very same data.
    2. the cache information remains valid for the lifetime of an
    IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may
    contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the
    limitations imposed by the singleton construct there was no implementation
    other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several
    static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831,
    LUCENE-1579 and LUCENE-1749, but the overall situation remains the same:
    There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific,
    extensible cache instances ("IndexCache"). IndexCaches provide common
    caching functionality for all IndexReaders and may be extended (for example,
    SegmentReader would have a SegmentReaderIndexCache and store different data
    than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the
    IndexCache. IndexFieldCache is an interface just like FieldCache and may
    support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated
    IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are
    expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator,
    SortField)
    I have provided an patch which takes care of all these issues. It passes
    all JUnit tests.
    The patch is quite large, admittedly, but the change required several
    modifications and some more to preserve backwards-compatibility.
    Backwards-compatibility is preserved by moving some of the updated
    functionality in the package org.apache.lucene.search.fields (field
    comparators and parsers, SortField) while adding wrapper instances and
    keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are
    moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the
    close() method to all registered instances by calling an onClose() method
    with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered
    by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being
    passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of
    FieldComparatorSource. This removes the "switch" statements and the
    possibility to throw IllegalArgumentExceptions because of unsupported type
    values.
    The following classes have been deprecated and replaced by new classes in
    org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary.
    The Lucene community has to decide which classes/methods can immediately be
    removed, which ones later, which not at all. Whenever new classes depend on
    the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class
    IndexFieldCacheSanityChecker.java which is just there for testing purposes,
    to show that no sanity checks are necessary any longer. This class may be
    removed at any time.
    - I expect that the patch does not impact performance. On the contrary,
    as the patch removes a few unnecessary checks we might even see a slight
    speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible
    and to focus on the class/method structure only. We certainly may improve
    the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For
    all supported atomic types (byte, double, float, int, long, short) three
    classes each are required: *Cache, *Comparator and *Parser. I think that
    further simplification might be possible (maybe using Java generics?), but I
    guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Michael McCandless at Dec 10, 2009 at 6:11 pm

    On Thu, Dec 10, 2009 at 12:40 PM, Erick Erickson wrote:

    Which of these do you think this patch *should* address before committing?
    Just the last two?
    Well... I'd really like pluggability to a different value source, with
    "FieldUninverter" just being the one we have today, and adding
    control/extensibilty over how FieldUninverter deals w/ multi-valued
    fields. And I'd like to see that LUCENE-1231 could be built within
    that framework (though we don't have to build/finish it today),
    as just another value source.

    StringIndex's error checking is simple to fix separately -- I've
    opened LUCENE-2142 for that.

    I think allowing multiple values per field, and being able to cut over
    norms/deleted docs to field cache, are clearly future items (but we
    should keep them in mind on such a big change to FieldCache,
    ie, at least don't "preclude" them or make them harder to achieve).

    The remaining items I think are relatively simple to fix.

    Basically, I don't see compelling enough gains as the patch currently
    stands -- it's largely "moving" the FieldCache API from one place to
    another (I think?). This generates alot of noise, people must migrate
    to the new API, but then we still have FieldCache's deeper limitations
    to live with.

    Fixing these limitations will require improvements to the external API
    -- that's what I'd like to get fixed, which would be a good justification for
    the API movement noise.

    I think especially with the cutover to per-segment searching, the
    acceptance (?) that an atomic array is OK, we ought to be able
    to bang out a basic interface for some plugin to provide native
    arrays.
    As many as Christian has energy for <G>?
    I don't think this should be all Christian -- we should all help out
    here! This is a big (and long overdue) change.

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 10, 2009 at 7:06 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788846#action_12788846 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    My impression is that this is not much different than LUCENE-831, and we are stuck on the same issues (831 started down the path of working with these goals, but its severely out of date now).

    bq. It's a little strange that the term vectors & fields reader also got pulled into the cache?

    Couldn't figure this out either ... if anything, you'd think norms might goto the cache, but the vector and fields reader ... ?
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 10, 2009 at 7:42 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788863#action_12788863 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    bq. This patch is a good step forward - it associates the cache directly
    with IndexReader, where it belongs; it cleanly decouples cache from
    reader (vs the hack we have today with IndexReader.getFieldCacheKey),
    so that eg cloned readers can share the same cache; it also preserves
    back compat, which is quite a stunning accomplishment

    Yes, backwards compatibility was actually the driving factor for this patch.
    I actually have not addressed major changes in functionality. This would definitely require rework that breaks backwards compatibility.
    I just wanted to get rid of the static FieldCache, which caused me a lot of OOME headaches.

    As I wrote above, I see this patch as a good starting point for further development. Imagine I had submitted a real, complete rework of the FieldCache like in LUCENE-831 -- it would be stuck as an open issue forever just like its predecessor.

    There is nothing wrong with the current patch (that's why I suggest comitting it to trunk soon-ish) -- it does not break anything (it could even go into 3.1 I guess).
    Starting from a common codebase we can then later on extend it and address all the points you mentioned in Michael's most recent post:

    bq. But... there are many more things I don't like about FieldCache, that I'm not sure the patch addresses:
    bq. (snip)

    None of these (very important) issues have been addressed by the patch. Intentionally (as described).

    bq. Some other questions about the patch:
    bq.
    bq. * Consumers of the cache API (the sort comparator,
    bq. FieldCacheTerms/RangeFilter, and any other future users of "the
    bq. field cache") shouldn't have to move down into fields sub-package?

    As you wish. I don't have problems with keep it in o.a.l.search. I was just a little scared about the plethora of classes in that package. Since we do not need to utilize package-level protection, it was actually possible to "encapsulate" that field-related functionality in another namespace. I just personally prefer to use many nested packages, I do not have problems of moving classes back to its parent package.

    bq. * It's a little strange that the term vectors & fields reader also
    bq. got pulled into the cache?

    They are not in the FieldCache. Caching fields is only one functionality provided by IndexCache. I took the opportunity to demonstrate caching something other than fields. You now save a few bytes per SegmentReader instance because of that change. Maybe it would make sense to move SegmentReader.CoreReaders to SegmentReaderIndexCache completely. However, if I had included that as well into this issue, it would again be too large to be discussed in time for the next release.

    If you have strong objections against moving these SegmentReader-specific things to the SegmentReaderIndexCache *now*, I can remove that part from the patch. However, I should probably then file another issue. Unfortunately, this will then depend on the outcome of this LUCENE-2133.

    To summarize, I suggest:

    1. Complete the additions to src/test (i.e. do not remove/modify the existing test cases but rather add new ones, as discussed previously)
    2. Commit the patch to svn, target release for Lucene 3.1.
    3. File another issue and discuss functional improvements for the IndexFieldCache part. Use Michael's wishlist as a starting point. Agree on breaking backwards compatibility, target for Lucene 4.0, 3.5 or whatever fancy version number. Incorporate things from LUCENE-831, LUCENE-1231 and also LUCENE-2135.
    4. File a new issue for the improvements in SegmentReader (move things that are shared between the original reader and its clones to a common SegmentReaderIndexCache, such as CoreReader and ThreadLocals), keep backwards compatibility and target for Lucene 3.1.

    What do you think?

    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Mark Miller (JIRA) at Dec 10, 2009 at 7:54 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788874#action_12788874 ]

    Mark Miller commented on LUCENE-2133:
    -------------------------------------

    I don't know that back compat is really a concern if we are just leaving the old API intact as part of that, with its own caching mechanism?

    Just deprecate the old API, and make a new one. This is a big pain, because you have to be sure you don't straddle the two apis on upgrading, but thats the boat we will be in anyway.

    Which means a new impl should provide enough benefits to make that large pain worth enduring. 831 was not committed for the same reason - it didn't bring enough to table to be worth it after we got to a per segment cache in another way. Since I don't see that this provides anything over 831, I don't see how its not in the same boat.

    I'm not sure we should target a specific release with this - we don't even know when 3.1 is going to happen. 2.9 took a year. Its anybodies guess - we should prob just do what makes sense and commit it when its ready.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org
  • Christian Kohlschütter (JIRA) at Dec 10, 2009 at 8:26 pm
    [ https://issues.apache.org/jira/browse/LUCENE-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788896#action_12788896 ]

    Christian Kohlschütter commented on LUCENE-2133:
    ------------------------------------------------

    bq. I don't know that back compat is really a concern if we are just leaving the old API intact as part of that, with its own caching mechanism?

    I don't think that this is an option. FieldCache is fundamentally flawed and already creates a lot of trouble that has only been somehow fixed recently (FieldCacheKey, Insanity checks).

    FieldCache, the static "registry" of caches sort of violates fundamental OOP concepts -- I mean, almost *all* methods require IndexReader as the first parameter. Since we allow IndexReader composition and decoration this apparently was not the right way to go because it exactly caused the problems that have later been addressed by the aformentioned FieldCacheKey and insanity checks.

    bq. Just deprecate the old API, and make a new one. This is a big pain, because you have to be sure you don't straddle the two apis on upgrading, but thats the boat we will be in anyway.

    That is always an option, but as you see Uwe's initial reaction I think it is not so easy to convince everybody. Which is fine, because we now have a solution that is somehow intermediate between the two extremes (keeping as it is vs. complete rework) and still is completely bw-compatible.
    With a "real" complete overhaul of FieldCache, I imagine a lot of additional work to be then done for other projects such as SOLR, Nutch etc., which I would rather see not to happen abruptly.

    The IndexCache abstraction allows us to separate the two issues 1: "make caches non-static" and 2: "make the fieldcache snappier" very easily. We start with issue 1 here in LUCENE-2133, and then develop a new FancyFieldCache in LUCENE-xyz (which can be used in parallel to the then-old IndexFieldCache).

    In parallel we can integrate Michael's, Earwin's and Yonik's suggestions from LUCENE-2135 without starting the discussion from the beginning.

    I am not suggesting to take the current solution as a "temporary workaround", but as a base for future work. Anything else would make no sense indeed.

    bq. Since I don't see that this provides anything over 831, I don't see how its not in the same boat.

    LUCENE-831 still requires a static FieldCache, the root of all evil :) It addresses orthogonal problems (ValueSource, Tries etc.).
    Finally, to cite yourself from LUCENE-831: "It probably makes sense to start from one of Hoss's original patches or even from scratch."

    bq. I'm not sure we should target a specific release with this - we don't even know when 3.1 is going to happen. 2.9 took a year. Its anybodies guess - we should prob just do what makes sense and commit it when its ready.

    The more complex the patches are, the longer it will take to integrate them into a new version.
    The more such patches you have, the longer it will take to get to a new release.

    Let's make it simple, submit what we have and build upon that.
    [PATCH] IndexCache: Refactoring of FieldCache, FieldComparator, SortField
    -------------------------------------------------------------------------

    Key: LUCENE-2133
    URL: https://issues.apache.org/jira/browse/LUCENE-2133
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Search
    Affects Versions: 2.9.1, 3.0
    Reporter: Christian Kohlschütter
    Attachments: LUCENE-2133-complete.patch, LUCENE-2133.patch, LUCENE-2133.patch, LUCENE-2133.patch


    Hi all,
    up to the current version Lucene contains a conceptual flaw, that is the FieldCache. The FieldCache is a singleton which is supposed to cache certain information for every IndexReader that is currently open
    The FieldCache is flawed because it is incorrect to assume that:
    1. one IndexReader instance equals one index. In fact, there can be many clones (of SegmentReader) or decorators (FilterIndexReader) which all access the very same data.
    2. the cache information remains valid for the lifetime of an IndexReader. In fact, some IndexReaders may be reopen()'ed and thus they may contain completely different information.
    3. all IndexReaders need the same type of cache. In fact, because of the limitations imposed by the singleton construct there was no implementation other than FieldCacheImpl.
    Furthermore, FieldCacheImpl and FieldComparator are bloated by several static inner-classes that could be moved to package level.
    There have been a few attempts to improve FieldCache, namely LUCENE-831, LUCENE-1579 and LUCENE-1749, but the overall situation remains the same: There is a central registry for assigning Caches to IndexReader instances.
    I now propose the following:
    1. Obsolete FieldCache and FieldCacheKey and provide index-specific, extensible cache instances ("IndexCache"). IndexCaches provide common caching functionality for all IndexReaders and may be extended (for example, SegmentReader would have a SegmentReaderIndexCache and store different data than a regular IndexCache)
    2. Add the index-specific field cache (IndexFieldCache) to the IndexCache. IndexFieldCache is an interface just like FieldCache and may support different implementations.
    3. The IndexCache instances may be flushed/closed by the associated IndexReaders whenever necessary.
    4. Obsolete FieldCacheSanityChecker because no more "insanities" are expected (or at least, they do not impact the overall performance)
    5. Refactor FieldCacheImpl and the related classes (FieldComparator, SortField)
    I have provided an patch which takes care of all these issues. It passes all JUnit tests.
    The patch is quite large, admittedly, but the change required several modifications and some more to preserve backwards-compatibility. Backwards-compatibility is preserved by moving some of the updated functionality in the package org.apache.lucene.search.fields (field comparators and parsers, SortField) while adding wrapper instances and keeping old code in org.apache.lucene.search.
    In detail and besides the above mentioned improvements, the following is provided:
    1. An IndexCache specific for SegmentReaders. The two ThreadLocals are moved from SegmentReader to SegmentReaderIndexCache.
    2. A housekeeping improvement to CloseableThreadLocal. Now delegates the close() method to all registered instances by calling an onClose() method with the threads' instances.
    3. Analyzer.close now may throw an IOException (this already is covered by java.io.Closeable).
    4. A change to Collector: allow IndexCache instead of IndexReader being passed to setNextReader()
    5. SortField's numeric types have been replaced by direct assignments of FieldComparatorSource. This removes the "switch" statements and the possibility to throw IllegalArgumentExceptions because of unsupported type values.
    The following classes have been deprecated and replaced by new classes in org.apache.lucene.search.fields:
    - FieldCacheRangeFilter (=> IndexFieldCacheRangeFilter)
    - FieldCacheTermsFilter (=> IndexFieldCacheTermsFilter)
    - FieldCache (=> IndexFieldCache)
    - FieldCacheImpl (=> IndexFieldCacheImpl)
    - all classes in FieldCacheImpl (=> several package-level classes)
    - all subclasses of FieldComparator (=> several package-level classes)
    Final notes:
    - The patch would be simpler if no backwards compatibility was necessary. The Lucene community has to decide which classes/methods can immediately be removed, which ones later, which not at all. Whenever new classes depend on the old ones, an appropriate notice exists in the javadocs.
    - The patch introduces a new, deprecated class IndexFieldCacheSanityChecker.java which is just there for testing purposes, to show that no sanity checks are necessary any longer. This class may be removed at any time.
    - I expect that the patch does not impact performance. On the contrary, as the patch removes a few unnecessary checks we might even see a slight speedup. No benchmarking has been done so far, though.
    - I have tried to preserve the existing functionality wherever possible and to focus on the class/method structure only. We certainly may improve the caches' behavior, but this out of scope for this patch.
    - The refactoring finally makes the high duplication of code visible: For all supported atomic types (byte, double, float, int, long, short) three classes each are required: *Cache, *Comparator and *Parser. I think that further simplification might be possible (maybe using Java generics?), but I guess the current patch is large enough for now.
    Cheers,
    Christian
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-dev-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupdev @
categorieslucene
postedDec 8, '09 at 7:42a
activeDec 11, '09 at 5:15p
posts60
users3
websitelucene.apache.org

People

Translate

site design / logo © 2021 Grokbase