Hello,

I'm using Lucene 2.9, and when reading the Javadoc for the Sort class I noticed
it says "The field must be indexed, but should not be tokenized".

But I tried sorting on a tokenized field, and it works too. What is the
difference between tokenized and untokenized fields in terms of sorting?
Why do both the Javadoc and "Lucene in Action" say that the sort field
should not be tokenized?

Thanks,
-Fujian


--
View this message in context: http://lucene.472066.n3.nabble.com/sort-field-should-not-be-tokenized-tp882569p882569.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Ian Lea at Jun 9, 2010 at 8:58 pm
    Sorting on tokenized fields can work, but may not necessarily do what
    you expect, depending on your requirements and how the field is
    tokenized.

    --
    Ian.
  • Erick Erickson at Jun 9, 2010 at 10:57 pm
    Consider analyzing the input "the fox is in his den"
    on whitespace, without removing stopwords.
    You'd have the terms:
    the
    fox
    is
    in
    his
    den

    What does it mean to sort on this field? Which term
    should be used?
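
    To make the question concrete, here is a minimal sketch in plain Java
    (no Lucene involved). The class name and the two "pick one term"
    policies are hypothetical, made up for illustration; this is not a
    description of what Lucene actually does internally. It only shows that
    once a value has been split into several terms, different choices of
    representative term give different orders, while the whole untokenized
    value gives one unambiguous order.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class TokenizedSortSketch {

        // Whitespace "analysis": no stopword removal, no lowercasing.
        static List<String> tokenize(String value) {
            return Arrays.asList(value.trim().split("\\s+"));
        }

        // Hypothetical policy 1: sort by the first term of the field.
        static String firstToken(String value) {
            return tokenize(value).get(0);
        }

        // Hypothetical policy 2: sort by the lexicographically largest term.
        static String largestToken(String value) {
            return Collections.max(tokenize(value));
        }

        // Returns a sorted copy; the input list is left untouched.
        static List<String> sortBy(List<String> docs, Comparator<String> order) {
            List<String> copy = new ArrayList<>(docs);
            copy.sort(order);
            return copy;
        }

        public static void main(String[] args) {
            List<String> docs = Arrays.asList(
                    "the fox is in his den", "a zebra ran", "quick brown dog");

            // Keys: "the", "a", "quick"
            System.out.println(sortBy(docs,
                    Comparator.comparing(TokenizedSortSketch::firstToken)));

            // Keys: "the", "zebra", "quick", which yields a different order
            System.out.println(sortBy(docs,
                    Comparator.comparing(TokenizedSortSketch::largestToken)));

            // Untokenized: the whole value is the single sort key
            System.out.println(sortBy(docs, Comparator.<String>naturalOrder()));
        }
    }
    ```

    With these three sample values, the first-term and whole-value orders
    happen to agree, and the largest-term order does not.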

    What if you remove stopwords? What about casing?
    Or any of a myriad of other possible things you'd do
    with an analyzer.
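
    A rough sketch of how those choices change the result, again in plain
    Java rather than Lucene. The `analyze` helper, its two flags, the tiny
    stopword set, and the idea of rejoining the surviving terms into one
    sort key are all assumptions made for illustration, not Lucene APIs.
    The same four values come out in three different orders depending on
    the analysis applied before sorting.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Locale;
    import java.util.Set;

    public class AnalyzerEffectSketch {

        // Hypothetical, deliberately tiny stopword list.
        static final Set<String> STOPWORDS =
                new HashSet<>(Arrays.asList("the", "a", "an"));

        // Hypothetical analysis chain: split on whitespace, optionally
        // lowercase, optionally drop stopwords, then rejoin into one key.
        static String analyze(String value, boolean lowercase, boolean removeStopwords) {
            List<String> kept = new ArrayList<>();
            for (String token : value.trim().split("\\s+")) {
                String t = lowercase ? token.toLowerCase(Locale.ROOT) : token;
                if (removeStopwords && STOPWORDS.contains(t.toLowerCase(Locale.ROOT))) {
                    continue; // stopword: contributes nothing to the sort key
                }
                kept.add(t);
            }
            return String.join(" ", kept);
        }

        // Returns a copy of docs sorted by their analyzed key.
        static List<String> sortByKey(List<String> docs, boolean lowercase, boolean removeStopwords) {
            List<String> copy = new ArrayList<>(docs);
            copy.sort(Comparator.comparing(
                    (String d) -> analyze(d, lowercase, removeStopwords)));
            return copy;
        }

        public static void main(String[] args) {
            List<String> docs = Arrays.asList(
                    "The fox", "apple pie", "Zebra crossing", "goat");

            // Raw keys: uppercase letters sort before lowercase ones.
            System.out.println(sortByKey(docs, false, false));
            // Lowercased keys: "the fox" now sorts under "t".
            System.out.println(sortByKey(docs, true, false));
            // Lowercased, stopwords removed: "fox" now sorts under "f".
            System.out.println(sortByKey(docs, true, true));
        }
    }
    ```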

    So sorting on a tokenized field *can* work, but the
    results will be "interesting". If you happen to have
    a field that only ever tokenizes to a single term, you'll
    probably get the expected results, but it'll be pretty
    fragile.

    HTH
    Erick
  • Fujian at Jun 10, 2010 at 1:08 am
    Thanks, Erick, for the detailed explanation. Now I understand what Ian means.

    -Fujian

Discussion Overview
group: java-user
category: lucene
posted: Jun 9, '10 at 3:36p
active: Jun 10, '10 at 1:08a
posts: 4
users: 3
website: lucene.apache.org
