I have an index that works fine on Lucene 2.3.2 but fails to open in 2.4.1; it always fails with a "read past EOF" error. The index does contain some field names with German umlaut characters in them.

Any ideas?

Many Thanks

Mike

CheckIndex v2.3.2


NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene', so assertions are enabled

Opening index @ C:/index/german

Segments file=segments_9 numSegments=1 version=FORMAT_SHARED_DOC_STORE [Lucene 2.3]
1 of 1: name=_3 docCount=235535
compound=true
numFiles=1
size (MB)=301.684
no deletions
test: open reader.........OK
test: fields, norms.......OK [70 fields]
test: terms, freq, prox...OK [1475862 terms; 25448796 terms/docs pairs; 28642994 tokens]
test: stored fields.......OK [13560464 total field count; avg 57.573 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.

CheckIndex v2.4.1


NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled

Opening index @ C:/index/german

Segments file=segments_9 numSegments=1 version=FORMAT_SHARED_DOC_STORE [Lucene 2.3]
1 of 1: name=_3 docCount=235535
compound=true
hasProx=true
numFiles=1
size (MB)=301.684
no deletions
test: open reader.........FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.refill(Unknown Source)
at org.apache.lucene.store.BufferedIndexInput.readBytes(Unknown Source)
at org.apache.lucene.store.BufferedIndexInput.readBytes(Unknown Source)
at org.apache.lucene.store.IndexInput.readString(Unknown Source)
at org.apache.lucene.index.FieldInfos.read(Unknown Source)
at org.apache.lucene.index.FieldInfos.<init>(Unknown Source)
at org.apache.lucene.index.SegmentReader.initialize(Unknown Source)
at org.apache.lucene.index.SegmentReader.get(Unknown Source)
at org.apache.lucene.index.SegmentReader.get(Unknown Source)
at org.apache.lucene.index.CheckIndex.checkIndex(Unknown Source)
at org.apache.lucene.index.CheckIndex.main(Unknown Source)

WARNING: 1 broken segments (containing 235535 documents) detected
WARNING: would write new segments file, and 235535 documents would be lost, if -fix were specified


  • Mike Streeton at Apr 28, 2009 at 12:45 pm
    An update: I have managed to get it not to fail by stepping through in a debugger and setting org.apache.lucene.store.IndexInput.preUTF8Strings to true. The value is always false when it fails.

    Mike

    -----Original Message-----
    From: Mike Streeton
    Sent: 28 April 2009 12:53
    To: java-user@lucene.apache.org
    Subject: Read past EOF

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
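Why flipping that flag helps can be sketched in a few lines of Java. This is a simplified illustration, not Lucene's code: `StringModeSketch`, `writeOldStyle` and `read` are hypothetical names, the length prefix is a single byte rather than a VInt, and it assumes the pre-2.4 format stores the string length as a count of Java chars while 2.4 expects a count of UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class StringModeSketch {

    // Writes a name the assumed "old" way: one length byte holding the
    // number of Java chars, followed by the encoded bytes. For the
    // characters used here, modified UTF-8 and standard UTF-8 are identical.
    public static byte[] writeOldStyle(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(name.length());
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Reads one string. The flag plays the role of preUTF8Strings:
    // true  = the length prefix counts chars (old format),
    // false = the length prefix counts bytes (new format).
    public static String read(byte[] data, boolean lengthIsCharCount) {
        int length = data[0];
        int pos = 1;
        if (!lengthIsCharCount) {
            // New-style read: take exactly `length` bytes and decode them.
            return new String(data, pos, length, StandardCharsets.UTF_8);
        }
        // Old-style read: decode one char at a time until `length` chars are read.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int b = data[pos++] & 0xFF;
            if ((b & 0x80) == 0) {
                sb.append((char) b);                                   // 1-byte char
            } else if ((b & 0xE0) == 0xC0) {
                sb.append((char) (((b & 0x1F) << 6)
                        | (data[pos++] & 0x3F)));                      // 2-byte char
            } else {
                sb.append((char) (((b & 0x0F) << 12)
                        | ((data[pos++] & 0x3F) << 6)
                        | (data[pos++] & 0x3F)));                      // 3-byte char
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "gr\u00f6\u00dfe"; // a field name with an umlaut and a sharp s
        byte[] data = writeOldStyle(name);
        System.out.println("flag=true  reads the name correctly: " + read(data, true).equals(name));
        System.out.println("flag=false reads the name correctly: " + read(data, false).equals(name));
    }
}
```

With the flag off, the reader consumes too few bytes for a name like this one, so every later read starts at the wrong offset. That is one plausible route to a bogus length and a "read past EOF" further along the file.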
  • Michael McCandless at Apr 28, 2009 at 2:15 pm
    Ugh, indeed FieldInfos fails to properly read 2.3.x indices if the
    field name contains non-ASCII characters. I'll open an issue, make a
    test case and work out a fix. Hmm.

    Thanks for raising this!

    Mike
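The reason only non-ASCII field names are affected can be checked with a small, self-contained Java snippet. It is only an illustration under the assumption that the two index formats disagree on what the string length prefix counts (Java chars versus UTF-8 bytes); `FieldNameLengths` and the sample field names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class FieldNameLengths {

    // Number of Java chars (UTF-16 code units) in the name.
    public static int charLength(String name) {
        return name.length();
    }

    // Number of bytes the name occupies once encoded as UTF-8.
    public static int utf8ByteLength(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // One plain ASCII name and one containing an umlaut and a sharp s.
        String[] names = { "title", "gr\u00f6\u00dfe" };
        for (String name : names) {
            System.out.println(name
                    + ": chars=" + charLength(name)
                    + ", utf8Bytes=" + utf8ByteLength(name));
        }
    }
}
```

For a pure ASCII name the two counts are equal, so either interpretation of the prefix reads the same bytes. They differ as soon as a character needs more than one byte, which fits an index whose ASCII-only field names read fine while the umlaut names break.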

  • Michael McCandless at Apr 29, 2009 at 5:27 pm
    I've opened https://issues.apache.org/jira/browse/LUCENE-1623 for this.

    Mike

