read past EOF
Hi,

Does anyone have any idea what could cause a Lucene index to get corrupted?

I am running Lucene 1.9 on a Unix machine and updating my index very frequently. After a few updates it says "read past EOF".

I know this exception generally occurs when an index is corrupted, but I don't know why it got corrupted.

It may be a problem in my code, but I am not able to figure out the exact problem.

Please help.

Thanks.
Bhavin

  • Michael McCandless at Aug 30, 2006 at 10:38 am

    Bhavin Pandya wrote:

    I am running Lucene 1.9 on a Unix machine and updating my index very
    frequently. After a few updates it says "read past EOF".

    I know this exception generally occurs when an index is corrupted, but
    I don't know why it got corrupted.

    It may be a problem in my code, but I am not able to figure out the
    exact problem.

    Is it your IndexWriter that's raising the exception? Can you post the
    full exception here? Also could you provide more detail about how your
    application works, what filesystem you're using under Unix, etc?

    Mike

  • Bhavin Pandya at Aug 30, 2006 at 12:45 pm
    Hi Mike,

    Here is the full stack trace of the error I got at search time:

    java.io.IOException: read past EOF
        at org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:451)
        at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:45)
        at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:219)
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:64)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:33)
        at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:46)
        at org.apache.lucene.index.SegmentTermEnum.<init>(SegmentTermEnum.java:47)
        at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:48)
        at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:147)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:129)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:110)
        at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:154)
        at org.apache.lucene.store.Lock$With.run(Lock.java:109)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:143)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:127)
        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:42)


    I am getting the same kind of exception when I try to optimize the
    index, but in that case it is thrown from IndexWriter.

    My guess is that one of my indexes got corrupted, so whenever I try to
    search the index, optimize it, or merge multiple indexes, it throws the
    same exception but from a different class, sometimes from IndexReader
    and sometimes from IndexWriter, depending on how it is called.

    I am storing my index on the local file system only, on a Unix machine.

    Here, in short, is what I am actually doing:

    I have an index which is updated every 15 minutes, using Lucene 1.9 and
    JDK 1.4 on a Unix machine. Once the index is built, I move the index
    folder to a "searchable-index" folder, so the searchable index is
    completely separate from the index being built. It is not a
    multi-threaded application, so I am sure only one thread at a time is
    accessing the index.
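
    For reference, a minimal sketch of this kind of periodic rebuild against
    the Lucene 1.9 API might look like the following. The field name, the
    analyzer, and the directory path are placeholders, not the application's
    actual code.

        import java.io.IOException;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.index.IndexWriter;

        public class RebuildIndex {
            // Builds a fresh index in the "newindex" directory; a separate
            // step then swaps it into place for searching.
            public static void rebuild(String newIndexPath) throws IOException {
                // create=true wipes newIndexPath and starts a new index
                IndexWriter writer =
                    new IndexWriter(newIndexPath, new StandardAnalyzer(), true);
                try {
                    Document doc = new Document();
                    doc.add(new Field("contents", "example text",
                                      Field.Store.YES, Field.Index.TOKENIZED));
                    writer.addDocument(doc);
                    writer.optimize();  // merge segments before the directory swap
                } finally {
                    writer.close();
                }
            }
        }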

    Thanks.
    Bhavin

  • Michael McCandless at Aug 31, 2006 at 2:32 am

    Bhavin Pandya wrote:
    My guess is that one of my indexes got corrupted, so whenever I try to
    search the index, optimize it, or merge multiple indexes, it throws the
    same exception but from a different class, sometimes from IndexReader
    and sometimes from IndexWriter, depending on how it is called.

    It indeed looks like you have an X.cfs file that is truncated. It would
    be good to get to the root cause that led to this...

    Were there any other errors leading up to this? For example, when you
    move your index after it's built, is this actually a copy (and maybe
    the disk filled up when copying)? Or a previous [initial] exception
    when building the index?

    Are you really sure only one writer is building the index at once?

    Also is this easily reproduced?
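
    One quick way to look for the truncated .cfs file mentioned above is
    simply to list the index directory and compare file sizes. This is a
    generic JDK-style sketch, not something from the original thread; the
    directory path comes in as a command-line argument.

        import java.io.File;

        public class ListIndexFiles {
            public static void main(String[] args) {
                // args[0] is the index directory, e.g. "index"
                File[] files = new File(args[0]).listFiles();
                if (files == null) {
                    System.out.println("Not a directory: " + args[0]);
                    return;
                }
                for (int i = 0; i < files.length; i++) {
                    // a .cfs of length 0, or much smaller than its siblings,
                    // is a likely sign of truncation
                    System.out.println(files[i].getName() + "  "
                            + files[i].length() + " bytes");
                }
            }
        }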

    Mike

  • Bhavin Pandya at Aug 31, 2006 at 5:25 am

    Mike wrote:
    Were there any other errors leading up to this? For example, when you
    move your index after it's built, is this actually a copy (and maybe
    the disk filled up when copying)? Or a previous [initial] exception
    when building the index?

    Are you really sure only one writer is building the index at once?

    Also is this easily reproduced?

    Hi Mike,

    Yes, I am sure only one writer at a time is accessing the index.

    No, I am not getting any other exception.

    And there is no disk space problem either.

    Right now I have a backup copy of the indexes, so whenever an index gets
    corrupted I replace it with the backup and start the indexer again from
    that point.

    Here is the script I use to move the index after it is built:

    - rm -rf backupindex/*
    - mv index backupindex;
    - mv newindex index;
    - mkdir newindex
    - cp -dpR index/* newindex/
    - touch index.done
    - echo "done";

    where "newindex" is the index which I am using for indexing...."index" which
    i am using for search purpose....and "backupindex" contains previous index.

    Is there any way through which I can check if index is corrupt or
    not....right now because of this exception (read past EOF ) i made few
    changes in my code to check for corrupt index. But i am checking for corrupt
    index through optimizing...If in optimization of index i m getting
    IOException I am considering that index got corrupted or there is permission
    issue..

    Thanks.
    Bhavin



  • Michael McCandless at Sep 1, 2006 at 11:38 am

    Yes, I am sure only one writer at a time is accessing the index.

    No, I am not getting any other exception.

    And there is no disk space problem either.

    Right now I have a backup copy of the indexes, so whenever an index gets
    corrupted I replace it with the backup and start the indexer again from
    that point.

    Here is the script I use to move the index after it is built:

    - rm -rf backupindex/*
    - mv index backupindex;
    - mv newindex index;
    - mkdir newindex
    - cp -dpR index/* newindex/
    - touch index.done
    - echo "done";

    where "newindex" is the index I use for indexing, "index" is the one I
    use for searching, and "backupindex" contains the previous index.

    It sounds like you're working with the index correctly, so I don't have
    any other ideas on why you're getting CFS files that are truncated. I
    would worry about the "cp" step filling up the disk, but if you're
    nowhere near filling up the disk, that's not the root cause here.

    Does this happen intermittently? Or did it happen once and is now gone?
    Or is it easy to reproduce?
    Is there any way I can check whether an index is corrupt or not? Right
    now, because of this exception (read past EOF), I made a few changes in
    my code to check for a corrupt index. I am checking by optimizing: if I
    get an IOException while optimizing the index, I consider the index
    corrupted (or assume there is a permission issue).

    That's a great question. I don't know of existing tools for doing this
    (anyone else?). Running optimize is likely a good test, so long as
    there's more than 1 segment before optimize (so that it actually does
    something).
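
    As a rough illustration of that optimize-based check, something like the
    following would surface a "read past EOF" as a caught IOException. This
    is a generic sketch against the Lucene 1.9 API, not code from the
    thread; the index path and analyzer are placeholders.

        import java.io.IOException;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.index.IndexWriter;

        public class IndexSanityCheck {
            // Returns true if the index opened and optimized cleanly. As
            // noted above, optimize() only does real work when there is
            // more than one segment to merge.
            public static boolean looksHealthy(String indexPath) {
                try {
                    // create=false opens the existing index instead of wiping it
                    IndexWriter writer =
                        new IndexWriter(indexPath, new StandardAnalyzer(), false);
                    writer.optimize();
                    writer.close();
                    return true;
                } catch (IOException e) {
                    // truncated files typically show up here as "read past EOF"
                    return false;
                }
            }
        }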

    Mike

  • Bhavin Pandya at Sep 8, 2006 at 10:47 am
    Hi Mike,

    It sounds like you're working with the index correctly, so I don't have
    any other ideas on why you're getting CFS files that are truncated. I
    would worry about the "cp" step filling up the disk, but if you're
    nowhere near filling up the disk, that's not the root cause here.

    I have found the cause of this problem. You were right: at a particular
    point in time my hard disk got full, and that is when the index got
    corrupted. After that, a batch process freed enough disk space, so I did
    not see a continuous stream of "no space left" exceptions. But when I
    went through all the logs I tracked it down successfully.

    Thanks for your help.

    - Bhavin Pandya
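
    Since the root cause turned out to be a full disk during indexing, one
    possible guard is to refuse to start a rebuild when free space looks too
    low. This is only a sketch: File.getUsableSpace() requires Java 6 or
    later, not the JDK 1.4 used in this thread, and the required-space
    estimate is an assumption.

        import java.io.File;
        import java.io.IOException;

        public class DiskSpaceGuard {
            // requiredBytes is a rough estimate, e.g. twice the current index size.
            public static void checkFreeSpace(File indexDir, long requiredBytes)
                    throws IOException {
                long free = indexDir.getUsableSpace();  // Java 6+ only
                if (free < requiredBytes) {
                    throw new IOException("Only " + free + " bytes free under "
                            + indexDir.getPath() + "; need at least " + requiredBytes);
                }
            }
        }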

  • Michael McCandless at Sep 8, 2006 at 11:59 am

    Bhavin Pandya wrote:

    It sounds like you're working with the index correctly, so I don't have
    any other ideas on why you're getting CFS files that are truncated. I
    would worry about the "cp" step filling up the disk, but if you're
    nowhere near filling up the disk, that's not the root cause here.

    I have found the cause of this problem. You were right: at a particular
    point in time my hard disk got full, and that is when the index got
    corrupted. After that, a batch process freed enough disk space, so I did
    not see a continuous stream of "no space left" exceptions. But when I
    went through all the logs I tracked it down successfully.

    Phew! Glad to hear you got down to the root cause, and that the root
    cause was in fact "outside" of Lucene :)

    Mike

