FAQ
Hi, dear experts:

When IndexReader.termsPositions is used to access specific terms, the call to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown, as bellows.

java.lang.IllegalArgumentException: Negative position
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)

In my further study, I found that if docid execeed 1044278, the exception occurs everytime, for the small ones, the exception never occur. BTW, the total number of documents is about 1344278, and are never deleted.

Thanks.

2011-04-01



Yuan Wu [GMail]

Search Discussions

  • Michael McCandless at Apr 1, 2011 at 9:58 am
    Hmm this could be from a corrupted index.

    What version of Lucene? What OS/filesystem?

    Can you run CheckIndex and post the output?

    Mike

    http://blog.mikemccandless.com

    2011/3/31 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, dear experts:

    When IndexReader.termsPositions is used to access specific terms, the call to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown, as bellows.

    java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
    at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
    at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
    at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
    at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)

    In my further study, I found that if docid execeed 1044278, the exception occurs everytime, for the small ones,  the exception never occur. BTW, the total number of documents is about 1344278, and are never deleted.

    Thanks.

    2011-04-01



    Yuan Wu [GMail]
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • 袁武 [GMail] at Apr 1, 2011 at 10:26 am
    Hi, Dear Mike:

    belows list the report of checkIndex. OS is Fedora Linux.

    [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/ -fix

    NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ /GeoGrid/data/Index/URL/Generic/
    Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
    1 of 1: name=_bym0 docCount=1344278
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=15,979.593
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117.fc10.x86_64, os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [11 fields]
    test: field norms.........OK [2 fields]
    test: terms, freq, prox...
    OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
    test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
    test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector fields per doc]
    No problems were detected with this index.



    2011-04-01



    袁武 [GMail]



    发件人: Michael McCandless
    发送时间: 2011-04-01 17:58:08
    收件人: java-user
    抄送: 袁武 [GMail]
    主题: Re: A likely bug of TermsPosition.nextPosition

    Hmm this could be from a corrupted index.
    What version of Lucene? What OS/filesystem?
    Can you run CheckIndex and post the output?
    Mike
    http://blog.mikemccandless.com
    2011/3/31 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, dear experts:

    When IndexReader.termsPositions is used to access specific terms, the call to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown, as bellows.

    java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
    at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
    at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
    at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
    at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)

    In my further study, I found that if docid execeed 1044278, the exception occurs everytime, for the small ones, the exception never occur. BTW, the total number of documents is about 1344278, and are never deleted.

    Thanks.

    2011-04-01



    Yuan Wu [GMail]
  • Michael McCandless at Apr 1, 2011 at 4:14 pm
    Hmm so it's not index corruption. Curious.

    Which Lucene version are you using? Looks like it's 2.9, but not
    2.9.4? Can you try 2.9.4 and see if you still hit the problem?

    Can you post a small test case showing the problem, on your index?

    Mike

    http://blog.mikemccandless.com

    2011/4/1 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, Dear Mike:

    belows list the report of checkIndex. OS is Fedora Linux.

    [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/ -fix

    NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ /GeoGrid/data/Index/URL/Generic/
    Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
    1 of 1: name=_bym0 docCount=1344278
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=15,979.593
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117.fc10.x86_64, os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [11 fields]
    test: field norms.........OK [2 fields]
    test: terms, freq, prox...
    OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
    test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
    test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector fields per doc]
    No problems were detected with this index.



    2011-04-01
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-01 17:58:08
    收件人: java-user
    抄送: 袁武 [GMail]
    主题: Re: A likely bug of TermsPosition.nextPosition
    Hmm this could be from a corrupted index.
    What version of Lucene? What OS/filesystem?
    Can you run CheckIndex and post the output?
    Mike
    http://blog.mikemccandless.com
    2011/3/31 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, dear experts:

    When IndexReader.termsPositions is used to access specific terms, the call to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown, as bellows.

    java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
    at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
    at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
    at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
    at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)

    In my further study, I found that if docid execeed 1044278, the exception occurs everytime, for the small ones, the exception never occur. BTW, the total number of documents is about 1344278, and are never deleted.

    Thanks.

    2011-04-01



    Yuan Wu [GMail]
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • 袁武 [GMail] at Apr 2, 2011 at 8:07 am
    Dear Mike:

    The index was constructed using Lucene 2.9. After the problem occured, I switched to recently release Lucene 3.1, leaving the index untouched.

    Thanks to your help

    2011-04-02



    袁武 [GMail]



    发件人: Michael McCandless
    发送时间: 2011-04-02 00:13:35
    收件人: 袁武 [GMail]
    抄送: java-user
    主题: Re: Re: A likely bug of TermsPosition.nextPosition

    Hmm so it's not index corruption. Curious.
    Which Lucene version are you using? Looks like it's 2.9, but not
    2.9.4? Can you try 2.9.4 and see if you still hit the problem?
    Can you post a small test case showing the problem, on your index?
    Mike
    http://blog.mikemccandless.com
    2011/4/1 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, Dear Mike:

    belows list the report of checkIndex. OS is Fedora Linux.

    [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/ -fix

    NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ /GeoGrid/data/Index/URL/Generic/
    Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
    1 of 1: name=_bym0 docCount=1344278
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=15,979.593
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117.fc10.x86_64, os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [11 fields]
    test: field norms.........OK [2 fields]
    test: terms, freq, prox...
    OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
    test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
    test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector fields per doc]
    No problems were detected with this index.



    2011-04-01
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-01 17:58:08
    收件人: java-user
    抄送: 袁武 [GMail]
    主题: Re: A likely bug of TermsPosition.nextPosition
    Hmm this could be from a corrupted index.
    What version of Lucene? What OS/filesystem?
    Can you run CheckIndex and post the output?
    Mike
    http://blog.mikemccandless.com
    2011/3/31 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, dear experts:

    When IndexReader.termsPositions is used to access specific terms, the call to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown, as bellows.

    java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
    at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
    at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
    at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
    at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)

    In my further study, I found that if docid execeed 1044278, the exception occurs everytime, for the small ones, the exception never occur. BTW, the total number of documents is about 1344278, and are never deleted.

    Thanks.

    2011-04-01



    Yuan Wu [GMail]
  • Michael McCandless at Apr 2, 2011 at 1:11 pm
    So your test case still hits the exception in 3.1?

    If you fully rebuild you index in 3.1, does the exception still occur?

    Is there any way I could get access to this index?

    Do other terms besides "\1" have the problem?

    I just committed a change to the 3.x branch
    (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x) --
    are you able to checkout this branch, build it, and run its CheckIndex
    on your index? I'm also attaching the patch... and I could also send
    you the patched Lucene core JAR if you want.

    I'd like to see if that change to CheckIndex detects the problem with
    your index.

    Mike

    http://blog.mikemccandless.com

    2011/4/2 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Dear Mike:

    The index was constructed using Lucene 2.9. After the problem occured, I
    switched to recently release Lucene 3.1, leaving the index untouched.

    Thanks to your help

    2011-04-02
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-02 00:13:35
    收件人: 袁武 [GMail]
    抄送: java-user
    主题: Re: Re: A likely bug of TermsPosition.nextPosition
    Hmm so it's not index corruption. Curious.
    Which Lucene version are you using? Looks like it's 2.9, but not
    2.9.4? Can you try 2.9.4 and see if you still hit the problem?
    Can you post a small test case showing the problem, on your index?
    Mike
    http://blog.mikemccandless.com
    2011/4/1 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, Dear Mike:

    belows list the report of checkIndex. OS is Fedora Linux.

    [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/ -fix

    NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ /GeoGrid/data/Index/URL/Generic/
    Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
    1 of 1: name=_bym0 docCount=1344278
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=15,979.593
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117.fc10.x86_64, os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [11 fields]
    test: field norms.........OK [2 fields]
    test: terms, freq, prox...
    OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
    test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
    test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector fields per doc]
    No problems were detected with this index.



    2011-04-01
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-01 17:58:08
    收件人: java-user
    抄送: 袁武 [GMail]
    主题: Re: A likely bug of TermsPosition.nextPosition
    Hmm this could be from a corrupted index.
    What version of Lucene? What OS/filesystem?
    Can you run CheckIndex and post the output?
    Mike
    http://blog.mikemccandless.com
    2011/3/31 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, dear experts:

    When IndexReader.termsPositions is used to access specific terms, the call to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown, as bellows.

    java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
    at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
    at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
    at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
    at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)

    In my further study, I found that if docid execeed 1044278, the exception occurs everytime, for the small ones, the exception never occur. BTW, the total number of documents is about 1344278, and are never deleted.

    Thanks.

    2011-04-01



    Yuan Wu [GMail]
  • 袁武 [GMail] at Apr 6, 2011 at 2:00 am
    Dear Mike:

    Thanks to your help, and apologise for delayed reply.

    Yes, the execption still ocuur in 3.1 and the index is now being rebuilt for 3.1.

    In the index, only the term '\1' has payload, If the search switch to other terms, the exception doesn't raise.

    If you can send me a new recent copy of CheckIndex in jar, I will run it and send you the report as soon as possible.

    Great thanks.

    2011-04-06



    袁武 [GMail]



    发件人: Michael McCandless
    发送时间: 2011-04-02 21:11:05
    收件人: 袁武 [GMail]
    抄送: java-user
    主题: Re: Re: Re: A likely bug of TermsPosition.nextPosition

    So your test case still hits the exception in 3.1?
    If you fully rebuild you index in 3.1, does the exception still occur?
    Is there any way I could get access to this index?
    Do other terms besides "\1" have the problem?
    I just committed a change to the 3.x branch
    (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x) --
    are you able to checkout this branch, build it, and run its CheckIndex
    on your index? I'm also attaching the patch... and I could also send
    you the patched Lucene core JAR if you want.
    I'd like to see if that change to CheckIndex detects the problem with
    your index.
    Mike
    http://blog.mikemccandless.com
    2011/4/2 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Dear Mike:

    The index was constructed using Lucene 2.9. After the problem occured, I
    switched to recently release Lucene 3.1, leaving the index untouched.

    Thanks to your help

    2011-04-02
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-02 00:13:35
    收件人: 袁武 [GMail]
    抄送: java-user
    主题: Re: Re: A likely bug of TermsPosition.nextPosition
    Hmm so it's not index corruption. Curious.
    Which Lucene version are you using? Looks like it's 2.9, but not
    2.9.4? Can you try 2.9.4 and see if you still hit the problem?
    Can you post a small test case showing the problem, on your index?
    Mike
    http://blog.mikemccandless.com
    2011/4/1 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, Dear Mike:

    belows list the report of checkIndex. OS is Fedora Linux.

    [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/ -fix

    NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ /GeoGrid/data/Index/URL/Generic/
    Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
    1 of 1: name=_bym0 docCount=1344278
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=15,979.593
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117.fc10.x86_64, os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [11 fields]
    test: field norms.........OK [2 fields]
    test: terms, freq, prox...
    OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
    test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
    test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector fields per doc]
    No problems were detected with this index.



    2011-04-01
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-01 17:58:08
    收件人: java-user
    抄送: 袁武 [GMail]
    主题: Re: A likely bug of TermsPosition.nextPosition
    Hmm this could be from a corrupted index.
    What version of Lucene? What OS/filesystem?
    Can you run CheckIndex and post the output?
    Mike
    http://blog.mikemccandless.com
    2011/3/31 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, dear experts:

    When IndexReader.termsPositions is used to access specific terms, the call to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown, as bellows.

    java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
    at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
    at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
    at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
    at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)

    In my further study, I found that if docid execeed 1044278, the exception occurs everytime, for the small ones, the exception never occur. BTW, the total number of documents is about 1344278, and are never deleted.

    Thanks.

    2011-04-01



    Yuan Wu [GMail]
  • 袁武 [GMail] at Apr 6, 2011 at 9:32 am
    Dear Mike:

    I have run the CheckIndex of branch_3x, and the result report is listed below:

    [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/ -fix
    NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ /data/Index/URL/Generic/
    Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
    1 of 1: name=_bym0 docCount=1344278
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=15,979.593
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117. fc10.x86_64, os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge , os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [11 fields]
    test: field norms.........OK [2 fields]
    test: terms, freq, prox...OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
    test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
    test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector fields per doc]
    No problems were detected with this index.

    2011-04-06



    袁武 [GMail]



    发件人: Michael McCandless
    发送时间: 2011-04-02 21:11:05
    收件人: 袁武 [GMail]
    抄送: java-user
    主题: Re: Re: Re: A likely bug of TermsPosition.nextPosition

    So your test case still hits the exception in 3.1?
    If you fully rebuild you index in 3.1, does the exception still occur?
    Is there any way I could get access to this index?
    Do other terms besides "\1" have the problem?
    I just committed a change to the 3.x branch
    (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x) --
    are you able to checkout this branch, build it, and run its CheckIndex
    on your index? I'm also attaching the patch... and I could also send
    you the patched Lucene core JAR if you want.
    I'd like to see if that change to CheckIndex detects the problem with
    your index.
    Mike
    http://blog.mikemccandless.com
    2011/4/2 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Dear Mike:

    The index was constructed using Lucene 2.9. After the problem occured, I
    switched to recently release Lucene 3.1, leaving the index untouched.

    Thanks to your help

    2011-04-02
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-02 00:13:35
    收件人: 袁武 [GMail]
    抄送: java-user
    主题: Re: Re: A likely bug of TermsPosition.nextPosition
    Hmm so it's not index corruption. Curious.
    Which Lucene version are you using? Looks like it's 2.9, but not
    2.9.4? Can you try 2.9.4 and see if you still hit the problem?
    Can you post a small test case showing the problem, on your index?
    Mike
    http://blog.mikemccandless.com
    2011/4/1 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, Dear Mike:

    belows list the report of checkIndex. OS is Fedora Linux.

    [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/ -fix

    NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ /GeoGrid/data/Index/URL/Generic/
    Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
    1 of 1: name=_bym0 docCount=1344278
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=15,979.593
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117.fc10.x86_64, os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [11 fields]
    test: field norms.........OK [2 fields]
    test: terms, freq, prox...
    OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
    test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
    test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector fields per doc]
    No problems were detected with this index.



    2011-04-01
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-01 17:58:08
    收件人: java-user
    抄送: 袁武 [GMail]
    主题: Re: A likely bug of TermsPosition.nextPosition
    Hmm this could be from a corrupted index.
    What version of Lucene? What OS/filesystem?
    Can you run CheckIndex and post the output?
    Mike
    http://blog.mikemccandless.com
    2011/3/31 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, dear experts:

    When IndexReader.termsPositions is used to access specific terms, the call to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown, as bellows.

    java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
    at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
    at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
    at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
    at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)

    In my further study, I found that if docid execeed 1044278, the exception occurs everytime, for the small ones, the exception never occur. BTW, the total number of documents is about 1344278, and are never deleted.

    Thanks.

    2011-04-01



    Yuan Wu [GMail]
  • Michael McCandless at Apr 7, 2011 at 2:48 pm
    I committed another change to 3.x's CheckIndex (to also invoke
    .nextPosition); can you run that and see if it can detect this?

    Is there any way I can get this index?

    One question: how large are your payloads, typically?

    Does the exception still occur on the index fully rebuilt with 3.1?

    Mike

    http://blog.mikemccandless.com

    2011/4/6 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Dear Mike:

    I have run the CheckIndex of branch_3x, and the result report is listed
    below:

    [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/ -fix
    NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ /data/Index/URL/Generic/
    Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
    1 of 1: name=_bym0 docCount=1344278
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=15,979.593
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117. fc10.x86_64, os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge , os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [11 fields]
    test: field norms.........OK [2 fields]
    test: terms, freq, prox...OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
    test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
    test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector fields per doc]
    No problems were detected with this index.

    2011-04-06
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-02 21:11:05
    收件人: 袁武 [GMail]
    抄送: java-user
    主题: Re: Re: Re: A likely bug of TermsPosition.nextPosition
    So your test case still hits the exception in 3.1?
    If you fully rebuild you index in 3.1, does the exception still occur?
    Is there any way I could get access to this index?
    Do other terms besides "\1" have the problem?
    I just committed a change to the 3.x branch
    (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x) --
    are you able to checkout this branch, build it, and run its CheckIndex
    on your index? I'm also attaching the patch... and I could also send
    you the patched Lucene core JAR if you want.
    I'd like to see if that change to CheckIndex detects the problem with
    your index.
    Mike
    http://blog.mikemccandless.com
    2011/4/2 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Dear Mike:

    The index was constructed using Lucene 2.9. After the problem occured, I
    switched to recently release Lucene 3.1, leaving the index untouched.

    Thanks to your help

    2011-04-02
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-02 00:13:35
    收件人: 袁武 [GMail]
    抄送: java-user
    主题: Re: Re: A likely bug of TermsPosition.nextPosition
    Hmm so it's not index corruption. Curious.
    Which Lucene version are you using? Looks like it's 2.9, but not
    2.9.4? Can you try 2.9.4 and see if you still hit the problem?
    Can you post a small test case showing the problem, on your index?
    Mike
    http://blog.mikemccandless.com
    2011/4/1 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, Dear Mike:

    belows list the report of checkIndex. OS is Fedora Linux.

    [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/ -fix

    NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ /GeoGrid/data/Index/URL/Generic/
    Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
    1 of 1: name=_bym0 docCount=1344278
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=15,979.593
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117.fc10.x86_64, os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [11 fields]
    test: field norms.........OK [2 fields]
    test: terms, freq, prox...
    OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
    test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
    test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector fields per doc]
    No problems were detected with this index.



    2011-04-01
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-01 17:58:08
    收件人: java-user
    抄送: 袁武 [GMail]
    主题: Re: A likely bug of TermsPosition.nextPosition
    Hmm this could be from a corrupted index.
    What version of Lucene? What OS/filesystem?
    Can you run CheckIndex and post the output?
    Mike
    http://blog.mikemccandless.com
    2011/3/31 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, dear experts:

    When IndexReader.termsPositions is used to access specific terms, the call to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown, as bellows.

    java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
    at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
    at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
    at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
    at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)

    In my further study, I found that if docid execeed 1044278, the exception occurs everytime, for the small ones, the exception never occur. BTW, the total number of documents is about 1344278, and are never deleted.

    Thanks.

    2011-04-01



    Yuan Wu [GMail]
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • 袁武 [GMail] at Apr 2, 2011 at 8:00 am
    Hi, Dear exports:

    The below case runs ok:

    public void test() throws IOException {
    IndexSearcher is = null;
    TermPositions tp = null;

    try {
    is = GenericStorage.getInstance().getSearcher();
    tp = is.getIndexReader().termPositions(new Term(IndexConfig.TEXT_FEILD, "\1"));
    while (tp.next()) {
    try {
    for (int i = 0; i < tp.freq(); i++) {
    tp.nextPosition();
    }
    } catch (IOException e) {
    System.out.println(tp.doc());
    e.printStackTrace();
    }
    }
    } catch (IOException e) {
    e.printStackTrace();
    } finally {
    if (is != null)
    is.close();
    if (tp != null)
    tp.close();
    }
    }

    while the below one throws exception:

    public void test() throws IOException {
    IndexSearcher is = null;
    TermPositions tp = null;

    int doc = 1044278;//or other bigger number.
    try {
    is = GenericStorage.getInstance().getSearcher();
    tp = is.getIndexReader().termPositions(new Term(IndexConfig.TEXT_FEILD, "\1"));
    if(tp.skipTo(doc)) {
    try {
    for (int i = 0; i < tp.freq(); i++) {
    tp.nextPosition(); //Throw exceptions
    }
    } catch (IOException e) {
    System.out.println(tp.doc());
    e.printStackTrace();
    }
    }
    } catch (IOException e) {
    e.printStackTrace();
    } finally {
    if (is != null)
    is.close();
    if (tp != null)
    tp.close();
    }
    }


    thanks.
    2011-04-02



    袁武 [GMail]



    发件人: Michael McCandless
    发送时间: 2011-04-02 00:13:35
    收件人: 袁武 [GMail]
    抄送: java-user
    主题: Re: Re: A likely bug of TermsPosition.nextPosition

    Hmm so it's not index corruption. Curious.
    Which Lucene version are you using? Looks like it's 2.9, but not
    2.9.4? Can you try 2.9.4 and see if you still hit the problem?
    Can you post a small test case showing the problem, on your index?
    Mike
    http://blog.mikemccandless.com
    2011/4/1 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, Dear Mike:

    belows list the report of checkIndex. OS is Fedora Linux.

    [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/ -fix

    NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ /GeoGrid/data/Index/URL/Generic/
    Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
    1 of 1: name=_bym0 docCount=1344278
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=15,979.593
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117.fc10.x86_64, os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [11 fields]
    test: field norms.........OK [2 fields]
    test: terms, freq, prox...
    OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
    test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
    test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector fields per doc]
    No problems were detected with this index.



    2011-04-01
    ________________________________
    袁武 [GMail]
    ________________________________
    发件人: Michael McCandless
    发送时间: 2011-04-01 17:58:08
    收件人: java-user
    抄送: 袁武 [GMail]
    主题: Re: A likely bug of TermsPosition.nextPosition
    Hmm this could be from a corrupted index.
    What version of Lucene? What OS/filesystem?
    Can you run CheckIndex and post the output?
    Mike
    http://blog.mikemccandless.com
    2011/3/31 袁武 [GMail] <yuanwu.mail@gmail.com>:
    Hi, dear experts:

    When IndexReader.termsPositions is used to access specific terms, the call to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown, as bellows.

    java.lang.IllegalArgumentException: Negative position
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
    at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
    at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
    at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
    at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)

    In my further study, I found that if docid execeed 1044278, the exception occurs everytime, for the small ones, the exception never occur. BTW, the total number of documents is about 1344278, and are never deleted.

    Thanks.

    2011-04-01



    Yuan Wu [GMail]

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedApr 1, '11 at 3:34a
activeApr 7, '11 at 2:48p
posts10
users2
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase