Hello,

I'm getting a strange error when I make a Lucene (2.2.0) query:

java.lang.RuntimeException: there are more terms than documents in field "objectId", but it's impossible to sort on tokenized fields
    at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:377)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:350)
    at org.apache.lucene.search.FieldCacheImpl$11.createValue(FieldCacheImpl.java:461)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    at org.apache.lucene.search.FieldCacheImpl.getAuto(FieldCacheImpl.java:424)
    at org.apache.lucene.search.FieldSortedHitQueue.comparatorAuto(FieldSortedHitQueue.java:338)
    at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:172)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:155)
    at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
    at org.apache.lucene.search.TopFieldDocCollector.<init>(TopFieldDocCollector.java:41)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:122)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:74)
    at org.apache.lucene.search.Hits.(Searcher.java:55)

The strange thing is that I've read the javadoc for the Sort object
where it says:

The fields used to determine sort order must be carefully chosen.
Documents must contain a single term in such a field, and the value of
the term should indicate the document's relative position in a given
sort order. The field must be indexed, but should not be tokenized, and
does not need to be stored (unless you happen to want it back with the
rest of your document data). In other words:

document.add (new Field ("byNumber", Integer.toString(x),
Field.Store.NO, Field.Index.UN_TOKENIZED));

Therefore when I create my "objectId" field in my document I use the
call:

doc.add(new Field("objectId", s.getObjectId(), Field.Store.NO,
    Field.Index.UN_TOKENIZED));

Note: s.getObjectId() returns a String.

After the index is created and I print out a typical document (using the
Document.toString() method) I get this:

Document<stored/uncompressed,indexed<id:1146513>
  stored/uncompressed,indexed<_hibernate_class:com.mycompany.metadb.orm.Series>
  indexed<RestrictionLevel:1>
  indexed,tokenized<keywords:com.mycompany.metadbsync.index.SeriesTokenStream@134ab4e>
  indexed,tokenized<characteristics:com.mycompany.metadbsync.index.CharacteristicTokenStream@daa825>
  indexed<objectId:DF.SES.AA.derek.Public_01>
  indexed<Name:Public 01>
  indexed<UserID:derek>
  indexed<Data Class:Defined Formulas>
  indexed<Location:AA>
  indexed<Client:SES>
  indexed<DIM1:DF>
  indexed<DIM2:SES>
  indexed<DIM3:AA>
  indexed<DIM4:derek>
  indexed<DIM5:Public_01>
  indexed<Type:Formula>>

So it looks like it got created correctly.

For what it's worth, the query call looks like this:

Hits hits = seriesIndexSearcher.search(query, new Sort("objectId"));

The actual query is a Boolean query with lots of TermQuery clauses and
sub-clauses. The term queries are against several of the other fields
shown above, including some of the tokenized fields.

Any help appreciated.

regards,

Bill Chesky

PS. Just as an aside, what does it mean for a field to be stored or not
stored? Looking at the output above, the 'id' field is stored and the
'objectId' field is not, yet both of them get displayed by the
Document.toString() method. So even the objectId field got "stored", at
least in the sense that I understand the term (otherwise how did it get
displayed?), so I'm obviously missing something about what "stored" means
in the Lucene context.

  • Bill Chesky at Apr 23, 2009 at 7:41 pm
    Sorry for that terrible formatting. Let me try again.
    ==========================================================
    Hello,

    I'm getting a strange error when I make a Lucene (2.2.0) query:

    java.lang.RuntimeException: there are more terms than documents in field
    "objectId", but it's impossible to sort on tokenized fields

    The strange thing is that I've read the javadoc for the Sort object
    where it says:

    ====
    The fields used to determine sort order must be carefully chosen.
    Documents must contain a single term in such a field, and the value of
    the term should indicate the document's relative position in a given
    sort order. The field must be indexed, but should not be tokenized, and
    does not need to be stored (unless you happen to want it back with the
    rest of your document data). In other words:

    document.add (new Field ("byNumber", Integer.toString(x),
    Field.Store.NO, Field.Index.UN_TOKENIZED));
    ====

    Therefore when I create my "objectId" field in my document I use the
    call:

    doc.add(new Field("objectId", s.getObjectId(), Field.Store.NO,
    Field.Index.UN_TOKENIZED));

    Note: s.getObjectId() returns a String.

    After the index is created and I print out a typical document (using the
    Document.toString() method) I get this:

    Document<stored/uncompressed,indexed<id:1146513>
      stored/uncompressed,indexed<_hibernate_class:com.mycompany.metadb.orm.Series>
      indexed<RestrictionLevel:1>
      indexed,tokenized<keywords:com.mycompany.metadbsync.index.SeriesTokenStream@134ab4e>
      indexed,tokenized<characteristics:com.mycompany.metadbsync.index.CharacteristicTokenStream@daa825>
      indexed<objectId:DF.SES.AA.derek.Public_01>
      indexed<Name:Public 01>
      indexed<UserID:derek>
      indexed<Data Class:Defined Formulas>
      indexed<Location:AA>
      indexed<Client:SES>
      indexed<DIM1:DF>
      indexed<DIM2:SES>
      indexed<DIM3:AA>
      indexed<DIM4:derek>
      indexed<DIM5:Public_01>
      indexed<Type:Formula>>

    So it looks like it got created correctly.

    For what it's worth the query call looks like this:

    Hits hits = seriesIndexSearcher.search(query, new Sort("objectId"));

    The actual query is a Boolean query with lots of TermQuery clauses and
    sub clauses. The term queries are against various of the other fields
    shown above, including some of the tokenized fields.

    Any help appreciated.

    regards,

    Bill Chesky

    PS. Just as an aside, what does it mean for a field to be stored or not
    stored? Looking at the output above, the 'id' field is stored and the
    'objectId' field is not, yet both of them get displayed by the
    Document.toString() method. So even the objectId field got "stored", at
    least in the sense that I understand the term (otherwise how did it get
    displayed?), so I'm obviously missing something about what "stored" means
    in the Lucene context.





    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Doron Cohen at Apr 23, 2009 at 8:01 pm

    On Thu, Apr 23, 2009 at 10:39 PM, wrote:

    > I'm getting a strange error when I make a Lucene (2.2.0) query:
    >
    > java.lang.RuntimeException: there are more terms than documents in field
    > "objectId", but it's impossible to sort on tokenized fields

    Is it possible that, for at least one document, multiple "objectId" fields
    were created? This would also create this problem.

    > PS. Just as an aside, what does it mean for a field to be stored or not
    > stored. Looking at the output above, the 'id' field is stored and the
    > 'objectId' is not. Yet both of them get displayed by the
    > Document.toString() method. So even the objectId field got "stored" at
    > least in the sense that I understand the term (otherwise how did it get
    > displayed) so I'm obviously missing something about what "stored" means
    > in the Lucene context.

    The printed document object is the same document object that was created
    for indexing. But when a document is read from the index (via the
    IndexReader API) it will only contain the stored fields. For instance,
    assume that at search time you would like to get the URL of a result
    document and display it to the user. For this you can add the URL to a
    stored field at indexing time.

    Doron
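
The stored/indexed distinction described above can be sketched with a small toy model in plain Java. This is not Lucene's API: `ToyIndex`, `addField`, `search` and `retrieve` are hypothetical names made up for illustration, the `stored` flag only stands in for Field.Store.YES/NO, and every field is assumed to be indexed untokenized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: this is NOT Lucene's API. All names here are hypothetical.
class ToyIndex {
    // Inverted index: "field:value" -> ids of the documents containing that term.
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // Stored fields: one map per document, holding only values flagged as stored.
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    /** Starts a new document and returns its id. */
    int newDocument() {
        storedFields.add(new TreeMap<>());
        return storedFields.size() - 1;
    }

    /** Indexes an untokenized field; 'stored' plays the role of Field.Store.YES/NO. */
    void addField(int docId, String name, String value, boolean stored) {
        postings.computeIfAbsent(name + ":" + value, k -> new ArrayList<>()).add(docId);
        if (stored) {
            storedFields.get(docId).put(name, value);
        }
    }

    /** Searching consults only the inverted index. */
    List<Integer> search(String name, String value) {
        return postings.getOrDefault(name + ":" + value, new ArrayList<>());
    }

    /** Retrieval returns only what was stored. */
    Map<String, String> retrieve(int docId) {
        return storedFields.get(docId);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int doc = index.newDocument();
        index.addField(doc, "id", "1146513", true);                           // stored and indexed
        index.addField(doc, "objectId", "DF.SES.AA.derek.Public_01", false);  // indexed only

        // The indexed-only field is still searchable...
        System.out.println(index.search("objectId", "DF.SES.AA.derek.Public_01")); // [0]
        // ...but it does not come back when the document is read from the index.
        System.out.println(index.retrieve(doc)); // {id=1146513}
    }
}
```

Printing the Document object you built in memory shows every field because nothing has been dropped yet; only the round trip through the index does that.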
  • Bill Chesky at Apr 23, 2009 at 8:32 pm
    Doron, thanks for the reply.

    > Is it possible that, for at least one document, multiple "objectId" fields
    > were created?
    > This would also create this problem.

    I read that online as well. I don't think so. We do have an update
    process that updates the index. During the update process we have the
    call:

    // Create a new doc object. objectId will always be the same as before,
    // but other fields may change.
    Document doc = getDocument(s);
    // Replace the old doc for this objectId with the new one.
    indexWriter.updateDocument(new Term("objectId", s.getObjectId()), doc);

    However, the getDocument() method is the same method that we use to
    create a brand new document when building the index from scratch. And
    I'm sure we only create the "objectId" field once in that method.

    Unless maybe I'm misunderstanding something about the
    IndexWriter.updateDocument() method. I thought it would delete all
    documents that matched the Term passed and add a new one.

    Unless maybe there is an issue with my Term argument passed to
    updateDocument and it's really not matching the way I think it is, so
    we end up with two different documents with the same value in the
    "objectId" field. Could this situation cause the exception?

    > The printed document object is the same document object that was created
    > for indexing. But when a document is read from the index (via the
    > IndexReader API) it will only contain the stored fields. For instance,
    > assume that at search time you would like to get the URL of a result
    > document and display it to the user. For this you can at indexing time
    > add the URL to a stored field.
    >
    > Doron

    Ah I think I understand. I was printing those docs out as I stored
    them. That makes total sense now. Thanks for the help.

    regards,

    Bill
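
IndexWriter.updateDocument() is documented as deleting the documents that contain the given term and then adding the new one, so the mismatch scenario raised above is possible whenever the Term matches none of the indexed terms of the old copy. A rough sketch in plain Java under that assumption: `ToyWriter` is a hypothetical stand-in, not Lucene code, and the lowercase tokens in the second case are invented to represent whatever an analyzer might have produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: mimics the documented delete-then-add behaviour, not Lucene's code.
class ToyWriter {
    // Each document is reduced to the terms indexed for its "objectId" field.
    final List<List<String>> docs = new ArrayList<>();

    void addDocument(List<String> objectIdTerms) {
        docs.add(objectIdTerms);
    }

    /** Deletes every document containing exactly 'term', then adds the replacement. */
    void updateDocument(String term, List<String> replacementTerms) {
        docs.removeIf(docTerms -> docTerms.contains(term));
        docs.add(replacementTerms);
    }

    public static void main(String[] args) {
        String id = "DF.SES.AA.derek.Public_01";

        ToyWriter clean = new ToyWriter();
        clean.addDocument(Arrays.asList(id));          // old copy indexed untokenized
        clean.updateDocument(id, Arrays.asList(id));
        System.out.println(clean.docs.size());         // 1: the old copy was replaced

        ToyWriter mixed = new ToyWriter();
        // Hypothetical analyzer output for the same value, indexed tokenized.
        mixed.addDocument(Arrays.asList("df", "ses", "aa", "derek", "public", "01"));
        mixed.updateDocument(id, Arrays.asList(id));
        System.out.println(mixed.docs.size());         // 2: no term matched, nothing was deleted
    }
}
```

In the second case the field ends up with more terms than a single untokenized value per document would give, which is the kind of index state the exception complains about.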


  • Bill Chesky at Apr 23, 2009 at 9:02 pm
    I figured it out. We are using Hibernate Search and in my ORM class I
    am doing the following:

    @Field(index=Index.TOKENIZED,store=Store.YES)
    protected String objectId;

    So when I persisted a new object to our database I was inadvertently
    creating a document in the Lucene index with a tokenized and stored
    "objectId" field. This is a leftover from when we were letting
    Hibernate Search build our index for us. We're now building the index
    ourselves, so I think if I just remove this it should work OK.

    Thanks for the help, Doron.

    Bill
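
For anyone hitting the same exception: the message suggests that sorting expects at most one term per document in the sort field, so a few tokenized copies of a value can be enough to tip the count. A minimal sketch of that counting argument in plain Java, not Lucene code: `SortFieldCheck` is a hypothetical helper, `tokenized` is a crude stand-in for a real analyzer, and the Public_02 and Public_03 values are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: illustrates the counting idea behind the error, not Lucene's code.
class SortFieldCheck {
    /** Untokenized indexing: the whole value becomes a single term. */
    static List<String> untokenized(String value) {
        return Arrays.asList(value);
    }

    /** Hypothetical analyzer stand-in: lowercase, then split on non-alphanumerics. */
    static List<String> tokenized(String value) {
        return Arrays.asList(value.toLowerCase().split("[^a-z0-9]+"));
    }

    /** Counts the distinct terms that all documents contribute to one field. */
    static int distinctTerms(List<List<String>> termsPerDocument) {
        Set<String> terms = new TreeSet<>();
        for (List<String> docTerms : termsPerDocument) {
            terms.addAll(docTerms);
        }
        return terms.size();
    }

    /** With one term per document, distinct terms can never outnumber documents. */
    static boolean sortable(List<List<String>> termsPerDocument) {
        return distinctTerms(termsPerDocument) <= termsPerDocument.size();
    }

    public static void main(String[] args) {
        List<List<String>> clean = Arrays.asList(
                untokenized("DF.SES.AA.derek.Public_01"),
                untokenized("DF.SES.AA.derek.Public_02"));
        System.out.println(sortable(clean)); // true: 2 terms, 2 documents

        // One extra document whose objectId was indexed tokenized.
        List<List<String>> mixed = Arrays.asList(
                untokenized("DF.SES.AA.derek.Public_01"),
                untokenized("DF.SES.AA.derek.Public_02"),
                tokenized("DF.SES.AA.derek.Public_03"));
        System.out.println(sortable(mixed)); // false: 8 terms, 3 documents
    }
}
```

The count is only a symptom detector. A tokenized field with few distinct terms may pass this kind of check and still sort unpredictably, so the safer fix is the one above: make sure nothing indexes the sort field as tokenized.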


  • Doron Cohen at Apr 24, 2009 at 10:23 am

    On Thu, Apr 23, 2009 at 11:52 PM, wrote:

    > I figured it out. We are using Hibernate Search and in my ORM class I
    > am doing the following:
    >
    > @Field(index=Index.TOKENIZED,store=Store.YES)
    > protected String objectId;
    >
    > So when I persisted a new object to our database I was inadvertently
    > creating a document in the Lucene index with the tokenized and stored
    > field "objectId". This is a leftover from when we were letting
    > Hibernate Search build our index for us. We're now building the index
    > ourselves so I think if I just remove this, it should work ok.

    Great!

    Doron

Discussion Overview
group: java-user
categories: lucene
posted: Apr 23, '09 at 7:26p
active: Apr 24, '09 at 10:23a
posts: 6
users: 2
website: lucene.apache.org
