We have a problem with our fileserver where our indexes are hosted
remotely, using Lucene 2.9.3.

This can mean that a segments file is written which is full of ASCII
zeros. Using the od -ah command, we get:

0000000 nul nul nul nul nul nul nul....etc

If opened in Luke, the index opens successfully but has zero documents.

Why does this open correctly in Luke, and is there a procedure in the
Lucene code that can verify a segments file, e.g. check whether it
refers to any segments?

Thanks

Greg


Please consider the environment before printing this email.

This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately.

Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory. The contents of this email may relate to dealings with other companies under the control of Detica Limited, details of which can be found at http://www.detica.com/statutory-information.

Detica Limited is registered in England under No: 1337451.
Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.


  • Shai Erera at Jun 28, 2011 at 9:36 am
    You can try the CheckIndex tool. You feed it a directory and call .check()
    and it reports the results.

    Shai
  • Tarr, Gregory at Jun 28, 2011 at 9:57 am
    Yes I have done that, and you just get "No problems were detected with
    this index"

    Surely there is a major problem with this index?

    Also the check() procedure takes a long time - is there any way you can
    just do a health check on the segments file?

    Thanks

    Greg

  • Uwe Schindler at Jun 28, 2011 at 11:32 am
    So where is the problem at all? Why should a segments file not contain lots
    of zeroes? If the index is not corrupt all is fine.

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

  • Tarr, Gregory at Jun 28, 2011 at 11:55 am
    The segments file containing lots of zeros means that the index has no
    segments.

    We could run the following to check this:

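    // indexDir is assumed to be an org.apache.lucene.store.Directory
    SegmentInfos sis = new SegmentInfos();
    sis.read(indexDir); // reads the current segments_N file
    int numSegments = sis.size();
    if (numSegments < 1) {
        // index has no segments
    }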

    Greg

  • Mark Harwood at Jun 28, 2011 at 12:09 pm
    According to the spec there should at least be an Int32 of -9 to declare the
    Format - http://lucene.apache.org/java/2_9_3/fileformats.html#Segments File
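The header check described in the post above can be sketched with the JDK alone, which makes it much cheaper than a full CheckIndex run. This is a minimal sketch, not Lucene's own validation: the class and method names are hypothetical, it assumes the Format value is stored as a big-endian Int32 at the very start of the file, and it treats any negative value as plausible rather than requiring exactly -9.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene.
final class SegmentsHeaderCheck {
    /** Reads the first four bytes of the file as a big-endian Int32. */
    static int readFormat(Path segmentsFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(segmentsFile))) {
            return in.readInt(); // throws EOFException if the file has fewer than 4 bytes
        }
    }

    /** Assumption: a usable header starts with a negative format marker such as -9. */
    static boolean hasPlausibleFormat(Path segmentsFile) throws IOException {
        try {
            return readFormat(segmentsFile) < 0;
        } catch (EOFException e) {
            return false; // too short to hold a format marker
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SegmentsHeaderCheck <path to segments_N>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " plausible format header: " + hasPlausibleFormat(file));
    }
}
```

A file full of zeros reads back as format 0 and is rejected by this check, which is the case the thread is concerned with.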



  • Tarr, Gregory at Jun 28, 2011 at 12:18 pm
    We don't have a -9 in the file. It isn't a valid Lucene segments file,
    as it only contains zeros.

    We're wondering why this opens in Luke, and why CheckIndex reports
    that the index is OK.
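The "only contains zeros" condition from the post above can be detected directly, without opening the index. The following is a minimal sketch using only the JDK; the class name and the choice to treat an empty file as "not all zeros" are assumptions made for illustration, not anything Lucene provides.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene.
final class ZeroFileCheck {
    /** Returns true if the file is non-empty and every byte in it is 0x00. */
    static boolean isAllZeros(Path file) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            boolean sawByte = false;
            int b;
            while ((b = in.read()) != -1) {
                if (b != 0) {
                    return false; // first non-zero byte settles it
                }
                sawByte = true;
            }
            return sawByte; // an empty file is reported as false
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ZeroFileCheck <path to segments_N>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " is all zeros: " + isAllZeros(file));
    }
}
```

Because the scan stops at the first non-zero byte, a healthy segments file is ruled out almost immediately; only a zero-filled file is read to the end.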

  • Michael McCandless at Jun 28, 2011 at 12:26 pm
    Is there only one segments_N file in the index (the one with all 0s)?
    Or is there a segments_(N-1) too?

    Mike McCandless

    http://blog.mikemccandless.com
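To answer the question above, it helps to list every segments_N file in the index directory together with its generation. The sketch below uses only the JDK and a hypothetical class name; it assumes the N suffix is the generation written in base 36 (Character.MAX_RADIX), so that segments_a follows segments_9, and it ignores segments.gen and any file whose suffix does not parse.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
final class SegmentsGenerations {
    static final String PREFIX = "segments_";

    /** Maps generation to file for every segments_N file, in ascending generation order. */
    static TreeMap<Long, Path> list(Path indexDir) throws IOException {
        TreeMap<Long, Path> byGeneration = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, PREFIX + "*")) {
            for (Path file : files) {
                String suffix = file.getFileName().toString().substring(PREFIX.length());
                try {
                    // Assumption: the suffix is the generation encoded in base 36.
                    byGeneration.put(Long.parseLong(suffix, Character.MAX_RADIX), file);
                } catch (NumberFormatException e) {
                    // Suffix is not a generation number; skip this file.
                }
            }
        }
        return byGeneration;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<Long, Path> e : list(dir).entrySet()) {
            System.out.println("generation " + e.getKey() + ": " + e.getValue().getFileName());
        }
    }
}
```

If more than one generation is present, the entry just before the last one is the segments_(N-1) candidate to inspect when the newest file looks corrupt.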
  • Tarr, Gregory at Jun 28, 2011 at 12:29 pm
    There was a segments_(N-1), which was a valid segments file and opened correctly in Luke.

    The trouble came because we had to manually rename these files in order to prevent the index from being wiped.

    Thanks

    Greg

    -----Original Message-----
    From: Michael McCandless
    Sent: 28 June 2011 13:26
    To: java-user@lucene.apache.org
    Subject: Re: Corrupt segments file full of zeros

    Is there only one segments_N file in the index (the one with all 0s)?
    Or is there a segments_(N-1) too?

    Mike McCandless

    http://blog.mikemccandless.com
    On Tue, Jun 28, 2011 at 8:17 AM, Tarr, Gregory wrote:
    We don't have a -9 in the file. It isn't a valid lucene segments file,
    as it only contains zeros.

    We're wondering why this opens in Luke, and why the CheckIndex reports
    that the index is OK.

    -----Original Message-----
    From: mark harwood
    Sent: 28 June 2011 13:09
    To: java-user@lucene.apache.org
    Subject: Re: Corrupt segments file full of zeros

    According to the spec there should at least be an Int32 of  -9 to
    declare the Format -
    http://lucene.apache.org/java/2_9_3/fileformats.html#Segments File



  • Michael McCandless at Jun 28, 2011 at 12:36 pm
    OK, this is why Lucene (and Luke) consider the index fine: if
    Lucene has problems opening segments_N (all 0s is definitely not a
    valid segments_N file), it falls back to the last commit
    (segments_(N-1)) and opens that instead.

    I.e., IndexReader.open and new IndexWriter(...) open the last successful commit.

    Mike McCandless

    http://blog.mikemccandless.com
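
    As a rough illustration of the fallback described above (not Lucene's
    actual code, which also consults segments.gen and retries), the idea is to
    try the newest segments_N first and step back to older generations until
    one can be read. The names SegmentsFallback and openNewest are hypothetical,
    and the predicate is only a stand-in for "reading this segments_N file did
    not throw".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongPredicate;

// Sketch only: hypothetical names, not Lucene's SegmentInfos code.
class SegmentsFallback {

    /**
     * Tries generations from newest to oldest and returns the first one that
     * the supplied check accepts, or -1 if none does. The predicate stands in
     * for "reading segments_N did not throw".
     */
    static long openNewest(List<Long> generations, LongPredicate parses) {
        List<Long> sorted = new ArrayList<>(generations);
        sorted.sort(Collections.reverseOrder()); // newest generation first
        for (long gen : sorted) {
            if (parses.test(gen)) {
                return gen; // first readable commit wins
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Generation 3 is unreadable, so the reader falls back to generation 2.
        long opened = openNewest(List.of(2L, 3L), gen -> gen != 3L);
        System.out.println("opened generation " + opened);
    }
}
```

    Note that the fallback only happens when the newest file is rejected. A
    file that is read without error, even wrongly, is accepted as the current
    commit.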
  • Tarr, Gregory at Jun 28, 2011 at 12:54 pm
    Michael

    We are not using commit points unfortunately.

    This was a scheduled update to our index, and on observation the index directory had two segments_N files:

    segments_4vb (modified 24 June 2011 02:05:38 size 7.61KB)
    segments_4vc (modified 24 June 2011 02:20:42 size 5.91KB)

    We were not sure which one of these was the real one, so we deleted 4vb and got the following from SegmentInfos:

    Directory listing genA=6312
    Fallback check: 6311; 6311
    Segments.gen check: genB=6311
    Index has 0 docs

    We then deleted 4vc and got the following:

    Directory listing genA=6311
    Fallback check: 6311; 6311
    Segments.gen check: genB=6311
    Index has 40022898 docs

    Opening 4vc in an octal editor yields only NUL bytes (0000000 nul nul nul nul nul nul nul....etc). Windows may be responsible for this, as our indexes are accessed through a fileserver and we know that a delayed write occurred.

    My question is: why does an index with 4vc open?

    Thanks

    Greg
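
    The generation numbers in the SegmentInfos output line up with the file
    names: the suffix of segments_N is the generation written in base 36, so
    segments_4vb is generation 6311 and segments_4vc is generation 6312. A
    minimal sketch of that conversion follows; SegmentsGeneration and its
    methods are hypothetical helper names, not part of the Lucene API.

```java
// Sketch only: hypothetical helper for mapping segments_N names to generations.
class SegmentsGeneration {

    static final String PREFIX = "segments_";

    /** Parses the base-36 generation out of a name such as "segments_4vb". */
    static long parse(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a segments_N file: " + fileName);
        }
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }

    /** Builds the file name for a generation, e.g. 6312 gives "segments_4vc". */
    static String fileName(long generation) {
        return PREFIX + Long.toString(generation, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        System.out.println("segments_4vb -> " + parse("segments_4vb"));
        System.out.println("segments_4vc -> " + parse("segments_4vc"));
        System.out.println("6312 -> " + fileName(6312L));
    }
}
```

    This explains why deleting 4vb still leaves genA=6312 in the directory
    listing while segments.gen reports genB=6311.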

  • Michael McCandless at Jun 28, 2011 at 1:05 pm

    On Tue, Jun 28, 2011 at 8:53 AM, Tarr, Gregory wrote:
    > Michael
    >
    > We are not using commit points unfortunately.

    That's fine -- even if you don't keep multiple commit points in your
    index, when a commit() op fails you can end up with two
    segments_N files. The older one is "good" (the last successful commit)
    and the newer one is broken.

    > This was a scheduled update to our index, and on observation the index directory had two segments_N files:
    >
    > segments_4vb (modified 24 June 2011 02:05:38 size 7.61KB)
    > segments_4vc (modified 24 June 2011 02:20:42 size 5.91KB)

    OK, so you have 2 segments_N files because something went wrong during
    the commit of the 2nd one.

    > We were not sure which one of these was the real one, so we deleted 4vb and got the following from SegmentInfos:

    It will always be the "older" one that was the last successful commit,
    unless you keep multiple commit points in the index.

    > Directory listing genA=6312
    > Fallback check: 6311; 6311
    > Segments.gen check: genB=6311
    > Index has 0 docs

    Hmmm -- what code are you running here, to print the number of docs?
    new IndexWriter(), with create=true? I would have expected IndexReader.open to
    throw an exception here.

    > We then deleted 4vc and got the following:
    >
    > Directory listing genA=6311
    > Fallback check: 6311; 6311
    > Segments.gen check: genB=6311
    > Index has 40022898 docs
    >
    > Opening 4vc in an octal editor yields only ASCII zeros (0000000 nul nul nul nul nul nul nul....etc). It may be that Windows is responsible for this, as our indexes are accessed through a fileserver and we know that a delayed write occurred.
    >
    > My question is: why does an index with 4vc open?

    I'm not sure, unless you are opening with IndexWriter and create=true.

    Mike

  • Mark harwood at Jun 28, 2011 at 1:30 pm
    Hi Mike.

    > Hmmm -- what code are you running here, to print the number of docs?

    SegmentInfos.setInfoStream(System.out);
    FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
    IndexReader r = IndexReader.open(dir, true);
    System.out.println("index has "+r.maxDoc()+" docs");

    From my own tests outside of Greg's environment, I've found Lucene to be doing
    all the right things: IndexReader falls back gracefully to the previous
    commit. For example, here is the output from when I deliberately killed an update after
    prepareCommit(), leaving segments_2 and segments_3, and then vandalised segments_3
    with all zero bytes:

    SIS [main]: directory listing genA=3
    SIS [main]: fallback check: 2; 2
    SIS [main]: segments.gen check: genB=2
    SIS [main]: primary Exception on 'segments_3': java.io.IOException: read past EOF'; will retry: retry=false; gen = 3
    SIS [main]: fallback to prior segment file 'segments_2'
    SIS [main]: success on fallback segments_2

    Lucene does the right thing going back to _2. I can't yet see why in Greg's
    environment (NFS based) it fails to see _4vc as corrupt in the same way the
    above test correctly sees _3 as corrupt.

    Cheers
    Mark


  • Michael McCandless at Jun 28, 2011 at 2:00 pm

    Hmm. Mark, if you vandalise segments_3 with 0s, and then remove
    segments_2, what happens when you try to open the IndexReader? (I
    would expect an exception.)

    Greg, can you post the full stdout you see from SIS (SegmentInfos) after enabling its
    infoStream in the case that returns an IndexReader with 0 docs (i.e. when you
    delete segments_4vb)?

    Also: if you don't delete any of the segments_N files, and run the same
    code, how many docs do you get?

    Mike

  • Mark harwood at Jun 28, 2011 at 2:09 pm
    I've got Greg's bad segments file and it does look to be all zeros. If I drop
    it into an existing index directory with the name segments_(N+1), it reproduces the
    error, i.e. IndexReader opens the index as if it contains zero docs.
    Preparing a Jira issue as we speak.
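
    A check for this particular kind of damage does not need Lucene at all. The
    sketch below simply tests whether a file consists entirely of zero bytes,
    which is what od showed for Greg's file. ZeroFillCheck and isAllZeros are
    hypothetical names, and reading the whole file into memory assumes the
    segments file is small (the ones in this thread are a few KB).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: hypothetical helper, independent of Lucene.
class ZeroFillCheck {

    /** Returns true if the data is non-empty and every byte is 0x00. */
    static boolean isAllZeros(byte[] data) {
        for (byte b : data) {
            if (b != 0) {
                return false;
            }
        }
        return data.length > 0;
    }

    /** Convenience overload for a file on disk; assumes the file is small. */
    static boolean isAllZeros(Path file) throws IOException {
        return isAllZeros(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // Usage: java ZeroFillCheck <path to segments_N>
            System.out.println(isAllZeros(Paths.get(args[0])));
        } else {
            // No argument given: demonstrate on an in-memory zero-filled buffer.
            System.out.println(isAllZeros(new byte[32]));
        }
    }
}
```

    An empty file is reported as not zero-filled here, since it is a different
    failure from the delayed-write case discussed in this thread.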


  • Michael McCandless at Jun 28, 2011 at 4:24 pm
    Here's the issue:

    https://issues.apache.org/jira/browse/LUCENE-3255

    It's because we read the first 0 int to be an ancient segments file
    format, and the next 0 int to mean there are no segments. Yuck!

    This format pre-dates Lucene 1.9, so the fix for 3.x is to stop
    supporting this ancient format... but I don't see any easy way to fix
    this pre-3.x where we must (by our back compat rules) support such an
    ancient index.
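
    The misreading described above can be reproduced outside Lucene. What follows is a minimal sketch, not Lucene's actual SegmentInfos code: the class and method names are hypothetical, and it assumes that newer formats begin with a negative marker int while a non-negative first int is taken as the old marker-less format.

```java
import java.nio.ByteBuffer;

/** Simplified illustration; this is not Lucene's actual SegmentInfos code. */
final class LegacyHeaderSketch {

    /**
     * Returns the segment count a reader would report for these bytes.
     * Assumption: a negative first int marks a newer format, while a
     * non-negative first int is taken as the old format with no marker.
     */
    static int segmentCountAsRead(byte[] fileBytes) {
        ByteBuffer in = ByteBuffer.wrap(fileBytes); // big-endian by default
        int first = in.getInt();
        if (first < 0) {
            throw new IllegalArgumentException(
                "newer format marker " + first + " is outside this sketch");
        }
        // Old format assumed: the next int is the number of segments.
        return in.getInt();
    }

    public static void main(String[] args) {
        byte[] allZeros = new byte[20]; // same length as the corrupt file in this thread
        System.out.println("segments seen: " + segmentCountAsRead(allZeros));
    }
}
```

    Under these assumptions an all-zero file parses without any exception and yields zero segments, which is why no fallback to the prior segments file is triggered.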

    Mike McCandless

    http://blog.mikemccandless.com
    On Tue, Jun 28, 2011 at 10:09 AM, mark harwood wrote:
    I've got Greg's bad segments file and it does look to be all zeros. If I drop
    it into an existing index directory with the name segments_N+1, it reproduces the
    error, i.e. IndexReader opens the index as if it contains zero docs.
    Preparing a Jira as we speak.


  • Trejkaz at Jun 29, 2011 at 2:45 am

    Isn't it possible to do something based on the existence of further
    zeroes after the first 8 bytes? I would expect the original format to
    have no additional data after that, but I don't know whether a
    corrupt file could be exactly 8 bytes long...

    TX

  • Michael McCandless at Jun 29, 2011 at 10:55 am

    Yes, you're right, it is! That would work, as long as the all 0s file
    isn't exactly 8 bytes long (this time yours was 20). But then we are
    still vulnerable if the corruption just happens to produce an 8 byte
    all 0s file...
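
    The length heuristic can be sketched as follows. This is only an illustration with hypothetical names, not a Lucene API. It assumes, as suggested in this thread, that a genuine empty file in the ancient format carries nothing after its first two ints.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper; not part of Lucene. */
final class TrailingBytesSketch {

    private static final int TWO_INTS = 8; // two 4-byte ints

    /**
     * True if the bytes parse as "ancient format, zero segments" yet
     * contain data beyond the first 8 bytes.
     */
    static boolean hasUnexpectedTrailingData(byte[] fileBytes) {
        if (fileBytes.length <= TWO_INTS) {
            // Shorter files cannot be parsed as two ints at all, and a file
            // of exactly 8 bytes is the blind spot of this heuristic.
            return false;
        }
        ByteBuffer in = ByteBuffer.wrap(fileBytes); // big-endian by default
        int first = in.getInt();
        int second = in.getInt();
        return first == 0 && second == 0;
    }

    public static void main(String[] args) {
        System.out.println(hasUnexpectedTrailingData(new byte[20])); // the 20-byte case
        System.out.println(hasUnexpectedTrailingData(new byte[8]));  // the undetectable case
    }
}
```

    The 20-byte all-zero file is flagged, while an 8-byte all-zero file still passes, which is the remaining vulnerability.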

    Simon also had a good idea, which is to check the version of the prior
    segments file, and refuse to accept this ancient version of the newer
    segments if the prior one is "modern".
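
    Simon's idea could look roughly like the sketch below. The names are hypothetical and this is not how Lucene implements it. It again assumes that a "modern" segments file starts with a negative format int and an ancient one does not.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper; not part of Lucene. */
final class PriorFormatSketch {

    /** Assumption: modern formats start with a negative marker int. */
    static boolean isModern(byte[] fileBytes) {
        if (fileBytes.length < 4) {
            throw new IllegalArgumentException("too short to hold a format int");
        }
        return ByteBuffer.wrap(fileBytes).getInt() < 0;
    }

    /**
     * Refuses a newer segments file that looks ancient when the prior
     * segments file is modern; accepts every other combination.
     */
    static boolean acceptNewer(byte[] prior, byte[] newer) {
        boolean downgrade = isModern(prior) && !isModern(newer);
        return !downgrade;
    }

    public static void main(String[] args) {
        // -1 stands in for any negative marker; the value is illustrative.
        byte[] prior = ByteBuffer.allocate(20).putInt(-1).array();
        byte[] zeroed = new byte[20];
        System.out.println("accept zeroed file: " + acceptNewer(prior, zeroed));
    }
}
```

    Unlike the length heuristic, this check would also reject an all-zero file that happens to be exactly 8 bytes long, provided a prior modern segments file is still present.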

    Mike


Discussion Overview
group: java-user
categories: lucene
posted: Jun 28, '11 at 8:47a
active: Jun 29, '11 at 10:55a
posts: 18
users: 6
website: lucene.apache.org
