I am seeing the AssertionError below while indexing using
contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker. The
assertion error is from TermsHashPerField.comparePostings(RawPostingList p1,
RawPostingList p2). A Payload representing a UID is added to the document.
Only 1-2 out of 1 million documents indexed generate this error.

java.lang.AssertionError
problem adding
doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
Embassy in Washington''' is the [[embassy]] of [[Croatia]] in [[Washington,
D.C.]] It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
(Washington, DC)|Massachusetts Avenue]], [[Washington DC
(northwest)|Northwest]] near [[Dupont Circle]]. Previously the building had
been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
they left for larger quarters and sold the structure to Croatia in 1993.
The purchase and renovation of the building was largely paid for by the
[[Croatian-American]] community. In front of the embassy is a large
sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
==External link== *[http://www.croatiaemb.org/ Official site]
[[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations of
Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of Croatia
in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf>
indexed<id:667162>> ex: java.lang.AssertionError
at org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
at org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
at org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
at org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
at org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
at org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)

  • Michael McCandless at Mar 24, 2009 at 7:26 pm
    Hmmmm.

    Jason is this easily/compactly repeated? EG, try to index the N docs
    before that one.

    If you remove the SinglePayloadTokenStream field, does the exception
    still happen?

    Mike

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Michael McCandless at Mar 24, 2009 at 7:37 pm
    I was just able to index all of wikipedia, using StandardAnalyzer,
    with assertions enabled, without hitting that exception. Which
    analyzer are you using (besides your payload field)?

    Mike

  • Jason Rutherglen at Mar 24, 2009 at 9:09 pm
    Using StandardAnalyzer. It's probably the payload field?

    This is the code that creates the payload field:

    private static class SinglePayloadTokenStream extends TokenStream {
        private Token token = new Token(UID_TERM.text(), 0, 0);
        private byte[] buffer = new byte[4];
        private boolean returnToken = false;

        // Store the UID as 4 bytes, least significant byte first.
        void setUID(int uid) {
            buffer[0] = (byte) (uid);
            buffer[1] = (byte) (uid >> 8);
            buffer[2] = (byte) (uid >> 16);
            buffer[3] = (byte) (uid >> 24);
            token.setPayload(new Payload(buffer));
            returnToken = true;
        }

        // Emit the single token once, then signal end of stream.
        public Token next() throws IOException {
            if (returnToken) {
                returnToken = false;
                return token;
            } else {
                return null;
            }
        }
    }

    public static void fillDocumentID(Document doc, int id) {
        SinglePayloadTokenStream singlePayloadTokenStream = new SinglePayloadTokenStream();
        singlePayloadTokenStream.setUID(id);

        // Reuse the payload field if the document already has one.
        Field f = doc.getField(UID_TERM.field());
        if (f == null) {
            f = new Field(UID_TERM.field(), singlePayloadTokenStream);
            doc.add(f);
        } else {
            f.setValue(singlePayloadTokenStream);
        }

        // Same for the plain (non-payload) document ID field.
        f = null;
        f = doc.getField(Indexable.DOCUMENT_ID_FIELD);
        if (f == null) {
            f = new Field(Indexable.DOCUMENT_ID_FIELD, String.valueOf(id), Store.NO, Index.NOT_ANALYZED);
            doc.add(f);
        } else {
            f.setValue(String.valueOf(id));
        }
    }
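
For readers following the byte layout in setUID, here is a minimal, Lucene-free sketch of the same 4-byte little-endian packing, together with the matching decode step a reader of the payload would presumably need. The class and method names (UidPayloadCodec, encode, decode) are made up for illustration and are not part of the posted code or of Lucene.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene or of the code posted above.
class UidPayloadCodec {
    // Pack an int UID into 4 bytes, least significant byte first,
    // using the same shifts as setUID in the post.
    public static byte[] encode(int uid) {
        return new byte[] {
            (byte) uid,
            (byte) (uid >> 8),
            (byte) (uid >> 16),
            (byte) (uid >> 24)
        };
    }

    // Rebuild the int. Each byte is masked with 0xFF because Java bytes
    // are signed and would otherwise sign-extend when widened to int.
    public static int decode(byte[] b) {
        return (b[0] & 0xFF)
             | ((b[1] & 0xFF) << 8)
             | ((b[2] & 0xFF) << 16)
             | ((b[3] & 0xFF) << 24);
    }

    public static void main(String[] args) {
        int uid = 667162; // the id value shown in the failing document dump
        byte[] payload = encode(uid);
        System.out.println(Arrays.toString(payload) + " -> " + decode(payload));
    }
}
```

The round trip only checks the encoding itself; it says nothing about the cause of the AssertionError.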
  • Jason Rutherglen at Mar 24, 2009 at 9:11 pm
    Mike,

    It only happens when at least 1 million documents are indexed in a
    multithreaded fashion. Maybe I should post the code? I will try indexing
    without the payload field; I assume it won't fail because I indexed
    Wikipedia before with no issues.

    Thanks!

    Jason
  • Michael McCandless at Mar 24, 2009 at 9:44 pm
    It looks like you are reusing a Field (the f.setValue(...) calls); are
    you sure you're not changing a Document/Field while another thread is
    adding it to the index?

    If you can post the full code, then I can try to run it on my
    wikipedia dump locally.

    Mike
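
The concern about reused fields can be illustrated without Lucene. The sketch below assumes a hypothetical MutableField class standing in for any reusable, mutable field object. It shows one common way to keep the reuse while ruling out cross-thread mutation: give each indexing thread its own instance through a ThreadLocal. It is not a diagnosis of the error in this thread, only a picture of the pattern being asked about.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: MutableField is a hypothetical stand-in, not a Lucene class.
class PerThreadReuse {
    static final class MutableField {
        private String value;
        void setValue(String v) { value = v; }
        String getValue() { return value; }
    }

    // One reusable instance per thread, so no other thread can change it
    // between setValue() and the point where the document is consumed.
    private static final ThreadLocal<MutableField> FIELD =
        ThreadLocal.withInitial(MutableField::new);

    // Starts the workers, waits for them, and returns how many times a
    // thread read back a value it did not write (expected: 0).
    public static int runWorkers(int threads, int docsPerThread) {
        AtomicInteger mismatches = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                MutableField f = FIELD.get();
                for (int d = 0; d < docsPerThread; d++) {
                    String expected = id + ":" + d;
                    f.setValue(expected);
                    // A real indexer would hand the document to the writer here.
                    if (!expected.equals(f.getValue())) {
                        mismatches.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + runWorkers(4, 10_000));
    }
}
```

If a single MutableField were shared by all workers instead, the read-back check could fail intermittently, which is the kind of rare, load-dependent failure the question is probing for.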

  • Jason Rutherglen at Mar 26, 2009 at 12:41 am
    Each document is being created in a single thread, and the fields of the
    document are not being updated elsewhere. I haven't posted the full code
    yet as it needs to be cleaned up. Thanks Mike!
  • Jason Rutherglen at Mar 26, 2009 at 2:36 am
    LuceneError, when executed, should reproduce the failure. The
    contrib/benchmark libraries are required. MultiThreadDocAdd is a
    multithreaded indexing utility class.
    On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen wrote:

    Each document is being created in a single thread, and the fields of the
    document are not being updated elsewhere. I haven't posted the full code
    yet as it needs to cleaned up. Thanks Mike!


    On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless <
    lucene@mikemccandless.com> wrote:
    It looks like you are reusing a Field (the f.setValue(...) calls); are
    you sure you're not changing a Document/Field while another thread is
    adding it to the index?

    If you can post the full code, then I can try to run it on my
    wikipedia dump locally.

    Mike

    Jason Rutherglen wrote:
    Mike,

    It only happens when at least 1 million documents are indexed in a
    multithreaded fashion. Maybe I should post the code? I will try indexing
    without the payload field, I assume it won't fail because I indexed
    wikipedia before with no issues.

    Thanks!

    Jason

    On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <
    lucene@mikemccandless.com> wrote:
    Hmmmm.

    Jason is this easily/compactly repeated? EG, try to index the N docs
    before that one.

    If you remove the SinglePayloadTokenStream field, does the exception
    still happen?

    Mike

    Jason Rutherglen wrote:
    While indexing using
    contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker. The
    asserion error is from
    TermsHashPerField.comparePostings(RawPostingList
    p1,
    RawPostingList p2). A Payload is added to the document representing
    a
    UID.
    Only 1-2 out of 1 million documents indexed generates this error.

    java.lang.AssertionError
    problem adding
    doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
    Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
    Embassy in Washington''' is the [[embassy]] of [[Croatia]] in [[Washington,
    D.C.]] It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
    (Washington, DC)|Massachusetts Avenue]], [[Washington DC
    (northwest)|Northwest]] near [[Dupont Circle]]. Previously the building had
    been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
    they left for larger quarters and sold the structure to Croatia in 1993.
    The purchase and renovation of the building was largely paid for by the
    [[Croatian-American]] community. In front of the embassy is a large
    sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
    ==External link== *[http://www.croatiaemb.org/ Official site]
    [[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations of
    Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of Croatia
    in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
    07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
    indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf>
    indexed<id:667162>> ex: java.lang.AssertionError
    at org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
    at org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
    at org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
    at org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
    at org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
    at org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Jason Rutherglen at Mar 26, 2009 at 6:09 pm
    I used the NoMergePolicy to build the index because I noticed that indexing
    is faster: the system simply creates large multi-megabyte segments in the
    RAM buffer, flushes them out, and doesn't worry about merging, which causes
    massive disk thrashing. I am pondering some benchmarks to find the optimal
    merge policy for realtime search; I'm not sure it's always necessary to
    merge according to the Log system.

    For example, a merge policy that caps the size of each segment at 250
    megabytes and does no merging could be interesting for realtime search,
    where many deletes are coming in and the segments with enough deletes need
    to be merged away within 1-2 hours. This means optimizing may not be the
    best choice, as it requires large merges later. Also, an interleaving
    system that does not perform merges while a flush is occurring could be
    useful for minimizing disk thrashing.
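
    A minimal sketch of the selection rule described above, written in plain
    Java rather than against Lucene's MergePolicy API. The class, the Segment
    type, the method names, and the 20% delete threshold in the usage example
    are all hypothetical and only illustrate the idea: segments are never
    combined, a segment is picked for rewriting once enough of its documents
    are deleted, and nothing is picked while a flush is running. The 250 MB
    size cap is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not a Lucene MergePolicy. */
final class DeleteDrivenMergeSketch {

    /** Stand-in for per-segment statistics (assumed shape, not a Lucene class). */
    static final class Segment {
        final String name;
        final int maxDoc;       // total documents written to the segment
        final int deletedDocs;  // documents marked as deleted

        Segment(String name, int maxDoc, int deletedDocs) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.deletedDocs = deletedDocs;
        }

        /** Fraction of the segment's documents that are deleted. */
        double deleteRatio() {
            return maxDoc == 0 ? 0.0 : (double) deletedDocs / maxDoc;
        }
    }

    /**
     * Returns the segments whose delete ratio is at least minDeleteRatio.
     * Returns an empty list while a flush is in progress, so that merge I/O
     * does not compete with flush I/O.
     */
    static List<Segment> selectForRewrite(List<Segment> segments,
                                          double minDeleteRatio,
                                          boolean flushInProgress) {
        List<Segment> picked = new ArrayList<>();
        if (flushInProgress) {
            return picked;
        }
        for (Segment s : segments) {
            if (s.deleteRatio() >= minDeleteRatio) {
                picked.add(s);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("_0", 100_000, 5_000));
        segments.add(new Segment("_1", 100_000, 40_000));
        // Assumed threshold: rewrite a segment once 20% of its docs are deleted.
        for (Segment s : selectForRewrite(segments, 0.20, false)) {
            System.out.println("rewrite " + s.name);
        }
    }
}
```

    Each picked segment would be rewritten on its own to drop its deleted
    documents, which keeps every merge bounded by the size of one segment.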
  • Michael McCandless at Mar 26, 2009 at 6:18 pm
    Another option is to limit the maximum number of merge threads that
    ConcurrentMergeScheduler (CMS) will run at once. It currently defaults to 3.

    Mike
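
    To illustrate the effect of capping merge threads, here is a small
    stand-alone Java sketch. It does not use Lucene: the class and method
    names are hypothetical, and the sleep is only a stand-in for merge I/O.
    It runs a number of simulated merges on a fixed-size thread pool and
    reports the highest number that ran at the same time, which never exceeds
    the cap. In Lucene releases of that era the corresponding setting should
    be ConcurrentMergeScheduler.setMaxThreadCount(int); check the javadocs of
    the version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; this is not Lucene's merge scheduler. */
final class BoundedMergeThreadsSketch {

    /**
     * Runs mergeCount simulated merges on at most maxThreads threads and
     * returns the peak number of merges observed running concurrently.
     */
    static int runMerges(int mergeCount, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < mergeCount; i++) {
                pending.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // stand-in for merge disk I/O
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            // Wait for every simulated merge to finish.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent merges: " + runMerges(8, 1));
    }
}
```

    With a cap of 1 the merges run strictly one after another, which trades
    merge throughput for less competing disk activity during indexing.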

  • Michael McCandless at Mar 29, 2009 at 2:05 pm
    I'm unable to reproduce this.

    Jason, have you tried on other computers (to rule out, e.g., bad RAM/IO)?

    Mike

    On Wed, Mar 25, 2009 at 6:39 PM, Jason Rutherglen wrote:
    LuceneError when executed should reproduce the failure. The
    contrib/benchmark libraries are required. MultiThreadDocAdd is a
    multithreaded indexing utility class.


Discussion Overview
group: java-user
categories: lucene
posted: Mar 24, '09 at 6:02p
active: Mar 29, '09 at 2:05p
posts: 11
users: 2
website: lucene.apache.org
