Hello,

I'm using Lucene 2.3.2 and had no problems until now.

But now I have a corrupt index. When searching, a java.lang.OutOfMemoryError is thrown. I wrote the following test program:

private static void search(String index, String query) throws CorruptIndexException, IOException, ParseException
{
    IndexReader reader = IndexReader.open(index);
    //reader.setTermInfosIndexDivisor(10);

    // Collect the names of all indexed fields so the query is parsed against each of them.
    Collection col = reader.getFieldNames(IndexReader.FieldOption.INDEXED);
    Iterator it = col.iterator();
    String[] fields = new String[col.size()];
    int i = 0;
    while(it.hasNext())
    {
        fields[i] = (String)it.next();
        System.out.println("field["+i+"]: "+fields[i]);
        i++;
    }

    Analyzer analyzer = new StandardAnalyzer();
    MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer);
    parser.setAllowLeadingWildcard(true);
    Query quer = parser.parse(query);
    System.out.println("Query: "+quer.toString());

    // Rewriting expands prefix/wildcard terms against the term dictionary; this is where the failure occurs.
    quer = quer.rewrite(reader);
    System.out.println("rewritten Query: "+quer.toString());
    reader.close();
}

If reader.setTermInfosIndexDivisor() is commented out, the stacktrace looks like this:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:155)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:277)
at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:643)
at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:42)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
at diplom.lucene.Index.search(Index.java:59)
at diplom.lucene.Index.main(Index.java:28)

The index is only 59 MB, so getting an OutOfMemoryError is surprising. With reader.setTermInfosIndexDivisor() set to 10, the stacktrace looks like this:

java.lang.IndexOutOfBoundsException: Index: 103, Size: 54
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:249)
at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:68)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:159)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:277)
at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:643)
at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:42)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
at diplom.lucene.Index.search(Index.java:59)
at diplom.lucene.Index.main(Index.java:28)
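
An out-of-memory error on a small file is a common symptom of a reader trusting a size field that has been damaged: the array it allocates is sized by the header, not by the file. Whether that is what happens here is not established in this thread. The sketch below only illustrates the general idea with a made-up layout (an 8-byte entry count followed by the entries); the class name, method name and layout are assumptions and are not Lucene's code or file format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class CountHeaderCheck {
    // Hypothetical layout: an 8-byte big-endian entry count, then the entries.
    // Assumption: every entry occupies at least one byte, so a count larger
    // than the number of remaining bytes cannot be genuine.
    public static int readEntryCount(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        long count = in.readLong();
        long remaining = file.length - 8L;
        if (count < 0 || count > remaining || count > Integer.MAX_VALUE) {
            throw new IOException("implausible entry count " + count
                    + " for " + remaining + " remaining bytes");
        }
        return (int) count;
    }

    public static void main(String[] args) throws IOException {
        // Header says 2 entries, and 2 bytes follow: accepted.
        byte[] good = {0, 0, 0, 0, 0, 0, 0, 2, 'a', 'b'};
        System.out.println(readEntryCount(good));

        // Damaged header claims about two billion entries in a 10-byte file.
        // Allocating an array of that size is what would exhaust the heap.
        byte[] bad = {0, 0, 0, 0, 0x7f, 0, 0, 0, 'a', 'b'};
        try {
            readEntryCount(bad);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A check like this turns a confusing heap error into an explicit corruption report, which is closer to what CheckIndex prints further down.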


CheckIndex prints the following:

Segments file=segments_1zrx5 numSegments=1 version=FORMAT_SHARED_DOC_STORE [Lucene 2.3]
1 of 1: name=_5pa8 docCount=117378
compound=true
numFiles=1
size (MB)=57,573
no deletions
test: open reader.........OK
test: fields, norms.......OK [54 fields]
test: terms, freq, prox...FAILED
WARNING: would remove reference to this segment (-fix was not specified); full exception:
java.lang.IndexOutOfBoundsException: Index: 110, Size: 54
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:249)
at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:68)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:182)
at diplom.lucene.Index.check(Index.java:67)
at diplom.lucene.Index.main(Index.java:28)

WARNING: 1 broken segments detected
WARNING: 117378 documents would be lost if -fix were specified

NOTE: would write new segments file [-fix was not specified]

Index correct: false



Recreating the index didn't solve the problem, and I have no idea how to fix it, so any help is greatly appreciated.

Thanks in advance.
Rene

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Michael McCandless at Mar 23, 2009 at 12:43 pm
    Something appears to be wrong with your _X.tii file (inside the
    compound file).

    Can you post the code that recreates this broken index?

    Since it appears to be repeatable, could you regenerate your index
    with compound file off, confirm the problem still happens, and then
    post the _X.tii file? I can try to look at it.

    Mike

  • René Zöpnek at Mar 24, 2009 at 8:03 am
    Thanks for your answer, Mike.

    Unfortunately I have no direct access to the server with the corrupt index, so changing how the index is created is not possible.

    I've uploaded the index to http://drop.io/hlu53sl (9 MB).



    Here is the code for creating the index:

    public static void createIndex()
    {
        log.info("create index");
        long start = System.currentTimeMillis();
        IndexWriter index = null;
        InitialContext ic = null;
        Connection connect = null;
        PreparedStatement query = null;
        ResultSet result = null;
        try{
            //create index
            index = new IndexWriter("/var/content/index", getAnalyzer(), true);

            //get content data
            ic = new InitialContext();
            javax.sql.DataSource source = (javax.sql.DataSource) ic.lookup("java:/ContentDS");
            connect = source.getConnection();
            query = connect.prepareStatement("SELECT DISTINCT C.* FROM TAB_CONTENT C,TAB_PROJCONTENT PC WHERE C.CONTENT_ID = PC.CONTENT_ID AND NOT C.STORAGE = 'CONTAINER'");
            result = query.executeQuery();

            while(result.next())
            {
                // map file info
                TabContentData data = TabContentMapper.getMapped(result);
                // index metadata
                try{
                    indexMetadata(data, index);
                }catch(Exception e)
                {
                    log.error("Failed to index "+data.getFileId()+" with id "+data.getContentId(),e);
                }
            }
            log.info("indexing done");
        }catch(Exception e)
        {
            log.error("create index failed",e);
        }
        finally
        {
            //clean up
            try{ result.close(); }catch(Exception e){};
            try{ query.close(); }catch(Exception e){};
            try{ connect.close(); }catch(Exception e){};
            try{ ic.close(); }catch(Exception e){};
            try{ index.optimize(); }catch(Exception e){};
            try{ index.close(); }catch(Exception e){};
        }
    }

    The indexMetadata(data, index) method just maps the column names and column values of one content record into a Lucene document, which is then added to the index.


    If you have any further questions, don't hesitate to ask and thank you for your help.

    Greetz!
    René



  • Michael McCandless at Mar 24, 2009 at 11:47 am
    Instead of ignoring the exceptions in your finally clause, can you log
    them? It could be something interesting is happening in there...

    I'll have a look at the index.

    Mike
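
The suggestion to log the exceptions in the finally clause could look roughly like the sketch below. It uses only java.util.logging so that it runs on its own; the class LoggedCleanup, the Step interface and runLogged are hypothetical names, and the lambdas need Java 8 or later, which is newer than what the posted code was written for. In the posted createIndex(), each `try{ ... }catch(Exception e){};` line would become one runLogged call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggedCleanup {
    private static final Logger LOG = Logger.getLogger(LoggedCleanup.class.getName());

    /** One cleanup action that may throw, for example () -> index.close(). */
    public interface Step {
        void run() throws Exception;
    }

    // Runs a single cleanup step. A failure is logged with its stack trace
    // and returned (null on success), so later steps still get their turn.
    public static Exception runLogged(String name, Step step) {
        try {
            step.run();
            return null;
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "cleanup step failed: " + name, e);
            return e;
        }
    }

    public static void main(String[] args) {
        List<Exception> failures = new ArrayList<>();
        try {
            // the indexing work would go here
        } finally {
            // Stand-ins for index.optimize() and index.close(); the first
            // one fails on purpose to show that it is reported, not hidden.
            Exception e1 = runLogged("optimize", () -> {
                throw new IllegalStateException("simulated failure");
            });
            Exception e2 = runLogged("close", () -> { });
            if (e1 != null) failures.add(e1);
            if (e2 != null) failures.add(e2);
        }
        System.out.println("failed cleanup steps: " + failures.size());
    }
}
```

Each step stays in its own try/catch, as in the original, so one failing close() does not prevent the others from running; the only change is that a failure in optimize() or close() now leaves a trace in the log.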
  • Michael McCandless at Mar 24, 2009 at 11:55 am
    When I run checkIndex on your index, I see a new exception:

    org.apache.lucene.index.CorruptIndexException: Incompatible format
    version: 119865344 expected 1 or lower
    at org.apache.lucene.index.FieldsReader.<init>(SegmentReader.java:519)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:468)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:382)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:395)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:689)

    Is this what you see?

    Mike
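
When a format check reports a value like 119865344, it can help to look at the raw bytes that produced it. The sketch below prints the reported number as four bytes, assuming it was read as a 32-bit integer with the high-order byte first, and applies a range check of the kind the message describes ("expected 1 or lower"). The class and method names are hypothetical and this is not Lucene's code; what these particular bytes mean for this index is not established in the thread.

```java
import java.nio.ByteBuffer;

public class VersionHeaderDump {
    // Formats a 32-bit value as its four bytes, high-order byte first
    // (ByteBuffer writes big-endian by default).
    public static String asBigEndianBytes(int value) {
        byte[] b = ByteBuffer.allocate(4).putInt(value).array();
        return String.format("%02x %02x %02x %02x", b[0], b[1], b[2], b[3]);
    }

    // A version is accepted only if it lies between 0 and the highest
    // version the reader understands.
    public static boolean isSupported(int version, int maxSupported) {
        return version >= 0 && version <= maxSupported;
    }

    public static void main(String[] args) {
        int reported = 119865344; // value from the exception message
        System.out.println(asBigEndianBytes(reported)); // prints "07 25 00 00"
        System.out.println(isSupported(reported, 1));   // prints "false"
        System.out.println(isSupported(1, 1));          // prints "true"
    }
}
```

A value this far outside the small range of real version numbers suggests the reader is interpreting bytes that were never a version header, either because the file is damaged or because it is being read in a different format than it was written in.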

    Index correct: false



    Recreating the index didn't solve the problem, and I have no idea how to fix it, so any help is greatly appreciated.

    Thanks in advance.
    Rene

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

  • René Zöpnek at Mar 24, 2009 at 9:00 pm
    Thank you for your help, Michael. I've solved the problem by recreating the index.
    The OutOfMemoryError killed the thread that was responsible for index maintenance,
    so the index recreation had failed without an error message. After actually recreating
    the index, the problem is solved.
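
A note on why the failure could be silent: java.lang.OutOfMemoryError is an Error, not an Exception, so catch (Exception e) blocks like the ones in createIndex() never see it, while the finally block still runs. The self-contained sketch below simulates this by throwing an OutOfMemoryError by hand (no memory is actually exhausted). The class name SilentErrorDemo and the recorded event strings are made up for illustration, and the uncaught-exception handler is only one possible way to surface such a failure, not something the code in this thread does.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SilentErrorDemo {
    // Records what happened, in order, so the behaviour can be inspected.
    static final List<String> events = new CopyOnWriteArrayList<>();

    // Mimics the shape of createIndex(): catch (Exception) plus a finally block.
    static void maintenanceJob() {
        try {
            events.add("indexing started");
            // Simulated failure; a real OutOfMemoryError would come from the JVM.
            throw new OutOfMemoryError("simulated");
        } catch (Exception e) {
            // Never reached: OutOfMemoryError extends Error, not Exception.
            events.add("caught exception");
        } finally {
            // Still runs, like the optimize()/close() calls in createIndex().
            events.add("finally ran");
        }
    }

    // Runs the job on its own thread and returns the recorded events.
    static List<String> run() throws InterruptedException {
        events.clear();
        Thread worker = new Thread(SilentErrorDemo::maintenanceJob, "index-maintenance");
        // The handler is invoked on the dying thread before it terminates.
        worker.setUncaughtExceptionHandler((t, err) ->
                events.add("uncaught in " + t.getName() + ": " + err.getClass().getSimpleName()));
        worker.start();
        worker.join();
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String event : run()) {
            System.out.println(event);
        }
    }
}
```

The "caught exception" event is never recorded; the only trace of the failure is whatever the uncaught-exception handler (or the default handler writing to stderr) reports.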

    Sorry for the inconvenience.

    Greetz
    René

    Michael McCandless wrote:
    When I run checkIndex on your index, I see a new exception:

    org.apache.lucene.index.CorruptIndexException: Incompatible format
    version: 119865344 expected 1 or lower
    at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:116)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:519)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:468)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:382)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:395)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:689)

    Is this what you see?

    Mike

    "René Zöpnek" wrote:
    Thanks for your answer, Mike.

    Unfortunately I have no direct access to the server with the corrupt index. So changing the creation process of the index is not possible.

    I've uploaded the index to http://drop.io/hlu53sl (9 MB).



    Here is the code for creating the index:

    public static void createIndex()
    {
        log.info("create index");
        long start = System.currentTimeMillis();
        IndexWriter index = null;
        InitialContext ic = null;
        Connection connect = null;
        PreparedStatement query = null;
        ResultSet result = null;
        try {
            // create index (true = create a new index, replacing any existing one)
            index = new IndexWriter("/var/content/index", getAnalyzer(), true);

            // get content data
            ic = new InitialContext();
            javax.sql.DataSource source = (javax.sql.DataSource) ic.lookup("java:/ContentDS");
            connect = source.getConnection();
            query = connect.prepareStatement("SELECT DISTINCT C.* FROM TAB_CONTENT C,TAB_PROJCONTENT PC WHERE C.CONTENT_ID = PC.CONTENT_ID AND NOT C.STORAGE = 'CONTAINER'");
            result = query.executeQuery();

            while (result.next())
            {
                // map file info
                TabContentData data = TabContentMapper.getMapped(result);
                // index metadata; a failure for one row is logged and the loop continues
                try {
                    indexMetadata(data, index);
                } catch (Exception e) {
                    log.error("Failed to index "+data.getFileId()+" with id "+data.getContentId(), e);
                }
            }
            log.info("indexing done");
        } catch (Exception e) {
            // note: this does not catch Errors such as OutOfMemoryError
            log.error("create index failed", e);
        } finally {
            // clean up; runs whether or not indexing succeeded
            try { result.close(); } catch (Exception e) {}
            try { query.close(); } catch (Exception e) {}
            try { connect.close(); } catch (Exception e) {}
            try { ic.close(); } catch (Exception e) {}
            try { index.optimize(); } catch (Exception e) {}
            try { index.close(); } catch (Exception e) {}
        }
    }

    The indexMetadata(data, index) method just maps the column names and column contents of one content row into a Lucene document, which is then added to the index.


    If you have any further questions, don't hesitate to ask and thank you for your help.

    Greetz!
    René



    Michael McCandless wrote:
    Something appears to be wrong with your _X.tii file (inside the compound file).

    Can you post the code that recreates this broken index?

    Since it appears to be repeatable, could you regenerate your index with compound file off, confirm the problem still happens, and then post the _X.tii file? I can try to look at it.

    Mike
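
The two stack traces in the original post are consistent with a term index (.tii) whose header holds garbage: a reader that trusts an on-disk entry count allocates arrays of that size before reading any entry, which can exhaust the heap even for a small file, while a smaller allocation succeeds and the damage then surfaces later as nonsense values. That reading is an assumption and is not confirmed anywhere in this thread. The sketch below is not Lucene code; the file layout (a 4-byte count followed by 4-byte entries) and the names CorruptHeaderDemo and readEntries are invented for illustration. It shows a sanity check on the count before allocating. The garbage bytes reuse the value 119865344 from the CorruptIndexException message above purely as an example of a meaningless 32-bit number.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class CorruptHeaderDemo {
    // Hypothetical layout: 4-byte entry count, then that many 4-byte entries.
    static int[] readEntries(byte[] file) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int count = in.readInt();
            // Each entry needs 4 bytes, so the count can never exceed this bound.
            int maxPossible = (file.length - 4) / 4;
            if (count < 0 || count > maxPossible) {
                throw new IOException("implausible entry count " + count
                        + " for a file of " + file.length + " bytes");
            }
            int[] entries = new int[count]; // safe: bounded by the file size
            for (int i = 0; i < count; i++) {
                entries[i] = in.readInt();
            }
            return entries;
        }
    }

    public static void main(String[] args) throws IOException {
        // Well-formed file: count = 2, entries 7 and 9.
        byte[] good = {0, 0, 0, 2, 0, 0, 0, 7, 0, 0, 0, 9};
        System.out.println(readEntries(good).length);

        // Same file size, but the count bytes are garbage (0x07250000).
        byte[] bad = {0x07, 0x25, 0, 0, 0, 0, 0, 7, 0, 0, 0, 9};
        try {
            readEntries(bad);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without the bounds check, the second call would try to allocate an array of roughly 120 million ints for a 12-byte file, which is the kind of mismatch between file size and memory use described in the original post.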


Discussion Overview
group: java-user
categories: lucene
posted: Mar 23, '09 at 10:43a
active: Mar 24, '09 at 9:00p
posts: 6
users: 2
website: lucene.apache.org
