Hi!

I want to store numbers (id) in my index:

long id = 1069421083284L;
doc.add(Field.UnStored("in", String.valueOf(id)));

But searching for "id:1069421083284" doesn't return any hits.

Well, did I misunderstand something? UnStored means the number is stored but
not indexed (analyzed), doesn't it? Anyway, Field.Text doesn't work either.

TIA
Timo

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

  • Morus Walter at Mar 5, 2004 at 11:27 am

    lucene@nitwit.de writes:
    > Hi!
    >
    > I want to store numbers (id) in my index:
    >
    > long id = 1069421083284L;
    > doc.add(Field.UnStored("in", String.valueOf(id)));
    >
    > But searching for "id:1069421083284" doesn't return any hits.

    If your field is named 'in' you shouldn't search in 'id'. Right?

    > Well, did I misunderstand something? UnStored means the number is stored but
    > not indexed (analyzed), doesn't it? Anyway, Field.Text doesn't work either.

    Well, indexing and analyzing are different things.
    UnStored means the number is not stored (as the name says) but indexed.
    And IIRC it's analyzed before indexing. That shouldn't make a difference for
    a single number.

    What I'd use in this case is an unstored keyword (given that you really don't
    want to have the id returned from Lucene, which is the consequence of
    not storing).
    I'm not sure if there's a method to create such a field, but you can do it
    by setting the flags directly.

    HTH
    Morus

  • Lucene at Mar 5, 2004 at 1:54 pm

    On Friday 05 March 2004 12:27, Morus Walter wrote:
    > > doc.add(Field.UnStored("in", String.valueOf(id)));
    > >
    > > But searching for "id:1069421083284" doesn't return any hits.
    >
    > If your field is named 'in' you shouldn't search in 'id'. Right?
    >
    > Well, indexing and analyzing are different things.
    > UnStored means, the number is not stored (as the name says) but indexed.
    > And IIRC it's analyzed before indexing. Shouldn't make a difference for
    > a single number.
    >
    > What I'd use in this case is an unstored keyword (given that you really
    > don't want to have the id returned from lucene, which is the consequence of
    > not storing).

    Sorry, typo :-)

    I do have several docs in the index and each doc has an id. I just want
    to find a particular doc by its id.

    doc.add(Field.UnIndexed("id", String.valueOf(id)));

    doesn't work either. And as I mentioned, not even Field.Text works.

  • Otis Gospodnetic at Mar 5, 2004 at 2:43 pm
    Either store it as a Keyword Field, which does not get Analyzed, or use
    that per-field Analyzer wrapper class.
    Your problem is most likely that you are using something like
    StandardAnalyzer that, I believe, throws out numbers from its input
    before indexing (i.e. your numbers are not getting indexed in the first
    place). Try with Field.Keyword.

    Otis

  • Claude Devarenne at Mar 5, 2004 at 4:27 pm
    Hi,

    I thought it is the StopAnalyzer that weeds out numbers. StandardAnalyzer
    keeps them in, I believe: I ran into the same issue using the JSP demo
    code, replaced StopAnalyzer with StandardAnalyzer, and numbers
    were searchable. This assumes you index with a StandardAnalyzer, though.

    Claude
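
A rough illustration of the behaviour being discussed, in plain Java with no Lucene dependency. StopAnalyzer is built on a letter-based tokenizer, so a value made only of digits yields no terms, whereas a keyword field is indexed as one untouched term. The class and method names below (`LetterOnlyTokens`, `tokenize`, `keyword`) are hypothetical and only imitate that behaviour; the real analyzers also remove stop words and differ in other details.

```java
import java.util.ArrayList;
import java.util.List;

final class LetterOnlyTokens {
    // Imitates a letter-based tokenizer: emit maximal runs of letters,
    // lowercased, and drop everything else (digits, punctuation, spaces).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Imitates a keyword field: the whole value becomes exactly one term.
    static List<String> keyword(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    public static void main(String[] args) {
        String id = String.valueOf(1069421083284L);
        System.out.println(tokenize(id));             // prints [] : nothing to search for
        System.out.println(tokenize("doc 42 rev B")); // prints [doc, rev, b]
        System.out.println(keyword(id));              // prints [1069421083284]
    }
}
```

If the id never becomes a term at indexing time, no query can match it, which is consistent with Field.Keyword fixing the lookup.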
  • Lucene at Mar 5, 2004 at 4:47 pm

    On Friday 05 March 2004 15:42, Otis Gospodnetic wrote:
    > Try with Field.Keyword.

    Ok, works.

    Another problem: Range searches don't work.

    "id:(1 TO 1069421083284)"

    does return only 1 hit - 1069421083284.

  • Erik Hatcher at Mar 5, 2004 at 5:02 pm
    Terms in Lucene are text. If you want to deal with number ranges, you
    need to pad them.

    "000000000001" for example. Be sure all numbers have the same width
    and are zero-padded.

    Lucene uses lexicographical ordering, so you must be sure things collate
    in this way.

    Erik
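
To make the collation point concrete, here is a minimal sketch in plain Java (no Lucene involved). It compares strings the way a term range does, lexicographically, first with raw numbers and then with zero-padded ones. The class name, the helper names, and the 13-digit width are assumptions chosen for the id in this thread; negative numbers are not handled.

```java
import java.util.Locale;

final class LexicographicOrderDemo {
    // Zero-pad a non-negative number to a fixed width of 13 digits
    // (an assumed width, just wide enough for the id in this thread).
    static String pad(long n) {
        return String.format(Locale.ROOT, "%013d", n);
    }

    // True if 'term' lies in the inclusive range [lower, upper] under
    // plain string (lexicographic) comparison.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "5" sorts after "1069421083284", so it is outside the range.
        System.out.println(inRange("5", "1", "1069421083284")); // prints false
        // Padded: all terms have the same width, so string order equals numeric order.
        System.out.println(inRange(pad(5), pad(1), pad(1069421083284L))); // prints true
    }
}
```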
  • Stephane James Vaucher at Mar 5, 2004 at 5:19 pm
    Weird idea, how about transforming your long into a Date and using a
    DateFilter to use a ranged query?

    sv
  • Lucene at Mar 5, 2004 at 5:34 pm

    On Friday 05 March 2004 18:01, Erik Hatcher wrote:
    > "000000000001" for example. Be sure all numbers have the same width
    > and zero padded.

    And what about a range like 100 TO 1000?

  • Stephane James Vaucher at Mar 5, 2004 at 5:39 pm

    On Fri, 5 Mar 2004 lucene@nitwit.de wrote:
    > On Friday 05 March 2004 18:01, Erik Hatcher wrote:
    > > "000000000001" for example. Be sure all numbers have the same width
    > > and zero padded.
    >
    > And what about a range like 100 TO 1000?

    You mean 0100 TO 1000 or 000000000000100 TO 000000000001000 ;)

    sv


  • Erik Hatcher at Mar 5, 2004 at 9:16 pm
    Another quite cool option is to subclass QueryParser and override
    getRangeQuery. Do the padding there. This will allow users to type in
    normal-looking numbers, and the padding happens automatically. You'll
    need to be sure that the way numbers are padded during indexing matches
    what getRangeQuery does (oh, say through a common function :).

    In fact, this is a great example for LIA (Lucene in Action). I'll add it!
    And I'll post the code back here in a day or so after I write it.

    Erik

  • Erik Hatcher at Mar 6, 2004 at 12:18 am

    On Mar 5, 2004, at 4:16 PM, Erik Hatcher wrote:
    > Another quite cool option is to subclass QueryParser, and override
    > getRangeQuery. Do the padding there. This will allow users to type
    > in normal looking numbers, and the padding happens automatically.
    > You'll need to be sure that numbers padded during indexing matches
    > what getRangeQuery does (oh, say through a common function :).

    Ok, here is a solution to storing integers and being able to use
    QueryParser cleanly. First a utility to pad the numbers:

    import java.text.DecimalFormat;

    public class NumberUtils {
      // Fixed-width, zero-padded format; make this as wide as you need.
      private static final DecimalFormat formatter =
          new DecimalFormat("00000");

      public static String pad(int n) {
        return formatter.format(n);
      }
    }

    Index the relevant fields using the pad function:

    doc.add(Field.Keyword("id", NumberUtils.pad(i)));

    Create a custom QueryParser subclass:

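    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.RangeQuery;

    public class CustomQueryParser extends QueryParser {
      public CustomQueryParser(String field, Analyzer analyzer) {
        super(field, analyzer);
      }

      // Pad both ends of a range on the "id" field so they match the
      // padded terms written at indexing time.
      protected Query getRangeQuery(String field, Analyzer analyzer,
                                    String part1, String part2,
                                    boolean inclusive)
          throws ParseException {
        if ("id".equals(field)) {
          try {
            int num1 = Integer.parseInt(part1);
            int num2 = Integer.parseInt(part2);
            return new RangeQuery(new Term(field, NumberUtils.pad(num1)),
                                  new Term(field, NumberUtils.pad(num2)),
                                  inclusive);
          } catch (NumberFormatException e) {
            throw new ParseException(e.getMessage());
          }
        }

        // All other fields keep the default range handling.
        return super.getRangeQuery(field, analyzer, part1, part2,
                                   inclusive);
      }
    }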

    Only the "id" field is treated specially, but your logic may vary.

    Then use the custom QueryParser:

    CustomQueryParser parser =
        new CustomQueryParser("field", analyzer);

    Query query = parser.parse("id:[37 TO 346]");

    assertEquals("padded", "id:[00037 TO 00346]",
                 query.toString("field"));

    Thanks for the idea for a good example for the upcoming Lucene in
    Action book... it's been added!

    Erik


  • Lucene at Mar 7, 2004 at 11:28 am

    On Fri, 5 Mar 2004 19:18:04 -0500, Erik Hatcher wrote:
    > Thanks for the idea for a good example for the upcoming Lucene in Action
    > book... it's been added!

    Thanks for mentioning me in the book ;)

    What about boolean fields? It's certainly not a good idea to use "true" or
    "false" strings...

    BTW, isn't it slow to treat everything as strings?

    Timo

  • Erik Hatcher at Mar 7, 2004 at 2:21 pm

    On Mar 7, 2004, at 6:27 AM, lucene@nitwit.de wrote:
    > On Fri, 5 Mar 2004 19:18:04 -0500, Erik Hatcher wrote:
    > > Thanks for the idea for a good example for the upcoming Lucene in
    > > Action book... it's been added!
    >
    > Thanks for mentioning me in the book ;)

    Well, I actually already had a comment in the book about why you'd
    override getRangeQuery, and it said this:

    * handle number ranges by padding to match how numbers were indexed

    You did give me the incentive to flesh this out into an example.

    I also created a variant of this to parse range queries like this
    field:[1/1/04 TO 12/31/04] into YYYYMMDD syntax so it becomes
    field:[20040101 TO 20041231]. This is very handy when dealing with
    dates in a typically more sensible YYYYMMDD format and allowing users
    to deal with them naturally also.
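
The date variant was not posted to the list, so the following is only a sketch of the rewriting step it describes, written against `java.time` rather than anything from Lucene or the book. The names `DateRangeRewrite`, `toIndexed`, and `rewriteRange` are hypothetical, and the input is assumed to be US-style month/day/two-digit-year with years read as 2000 to 2099.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class DateRangeRewrite {
    // What users type, e.g. 1/1/04 (assumed month/day/two-digit year).
    private static final DateTimeFormatter INPUT =
        DateTimeFormatter.ofPattern("M/d/yy", Locale.ROOT);
    // What is assumed to be in the index: fixed-width YYYYMMDD, which
    // sorts lexicographically in date order.
    private static final DateTimeFormatter INDEXED =
        DateTimeFormatter.ofPattern("yyyyMMdd", Locale.ROOT);

    // Convert one user-entered date to its indexed form.
    static String toIndexed(String userDate) {
        return LocalDate.parse(userDate, INPUT).format(INDEXED);
    }

    // Build the rewritten inclusive range expression for display or testing.
    static String rewriteRange(String field, String from, String to) {
        return field + ":[" + toIndexed(from) + " TO " + toIndexed(to) + "]";
    }

    public static void main(String[] args) {
        // prints field:[20040101 TO 20041231]
        System.out.println(rewriteRange("field", "1/1/04", "12/31/04"));
    }
}
```

In a QueryParser subclass, the same conversion would be applied to both range endpoints before the range query is built, mirroring the number-padding example.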

    > What about boolean fields? It's certainly not a good idea to use
    > "true" or "false" strings...

    What about them? It all depends on how you want users to be able to
    query based on that flag. Do you want them to say field:true?
    field:on? field:yes? How you translate things in QueryParser is up to
    you - and this may of course have some impact on how you index. You
    could use "0" and "1" instead, and do the translation in a QueryParser
    subclass if you like.
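
One way to picture that translation is a small normalizing function shared by indexing and query parsing. This is a sketch under assumptions: the accepted spellings and the names `BooleanTerms` and `normalize` are made up for illustration, and the indexed terms are taken to be "0" and "1" as suggested above.

```java
import java.util.Locale;

final class BooleanTerms {
    // Map the spellings a user might type onto the single term that is
    // assumed to be stored in the index ("1" for true, "0" for false).
    static String normalize(String userValue) {
        switch (userValue.trim().toLowerCase(Locale.ROOT)) {
            case "true":
            case "on":
            case "yes":
            case "1":
                return "1";
            case "false":
            case "off":
            case "no":
            case "0":
                return "0";
            default:
                throw new IllegalArgumentException(
                    "Not a boolean value: " + userValue);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("Yes")); // prints 1
        System.out.println(normalize("off")); // prints 0
    }
}
```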

    > BTW, isn't it slow to treat everything as strings?

    Ummm, yeah.... Lucene is real slow! :)

    You tell us.... is it slow with your data and environment? If so, give
    us some more details on the scenario.

    Erik


  • Doug Cutting at Mar 8, 2004 at 6:02 pm

    Erik Hatcher wrote:
    > private static final DecimalFormat formatter =
    >     new DecimalFormat("00000"); // make this as wide as you need

    For ints, ten digits is probably safest. Since Lucene uses prefix
    compression on the term dictionary, you don't pay a penalty at search
    time for long shared prefixes.

    Doug
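
The ten-digit figure follows from Integer.MAX_VALUE (2147483647) having ten digits; by the same reasoning a long needs nineteen. A minimal sketch of deriving the widths instead of hard-coding them is shown below. The class and method names are hypothetical, negative values are rejected rather than handled, and `String.repeat` assumes Java 11 or later.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class PadWidths {
    // Widths derived from the largest values each type can hold.
    static final int INT_WIDTH = String.valueOf(Integer.MAX_VALUE).length();
    static final int LONG_WIDTH = String.valueOf(Long.MAX_VALUE).length();

    // Zero-pad a non-negative number to the given width. A new formatter is
    // created per call because DecimalFormat is not thread-safe.
    private static String pad(long n, int width) {
        if (n < 0) {
            throw new IllegalArgumentException(
                "negative values are not handled in this sketch");
        }
        DecimalFormat format = new DecimalFormat(
            "0".repeat(width), DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(n);
    }

    static String padInt(int n) {
        return pad(n, INT_WIDTH);
    }

    static String padLong(long n) {
        return pad(n, LONG_WIDTH);
    }

    public static void main(String[] args) {
        System.out.println(padInt(37));               // prints 0000000037
        System.out.println(padLong(1069421083284L));  // prints 0000001069421083284
    }
}
```

The long variant matters for the id that started this thread, which is too large for an int.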

  • Timothy Stone at Mar 9, 2004 at 7:39 pm

    Craig Walls wrote an excellent article in JDJ at the end of 2002
    regarding Lucene (not shown in any of the resources BTW). He documents
    using Lucene alongside a database and provides two classes (and
    others unrelated) that extend the functionality of the StopAnalyzer to
    include numbers and/or alphanumerics.

    Check out the article at:
    http://www.sys-con.com/story/print.cfm?storyid=37296

    HTH,
    Tim

  • Michael Giles at Mar 9, 2004 at 7:42 pm
    Tim,

    Looks like you can only access it with a subscription. :( Sounds good,
    though.

    -Mike
  • Timothy Stone at Mar 9, 2004 at 7:51 pm

    Michael Giles wrote:
    > Tim,
    >
    > Looks like you can only access it with a subscription. :( Sounds good,
    > though.
    >
    > -Mike

    Really? I don't have a subscription. Got to it via the archives actually
    now that I think about it:

    Try Volume 7, Issue 12.

    Sorry about that bad URL. But Sys-Con must set a cookie (yep) following
    the sub splash. Try the link again. I just deleted my cookie, got a
    sub-splash and then tried the archive again and it worked.

    Odd, but it works. Get it before sys-con is on to us. :)

    Tim
  • Lucene at Mar 10, 2004 at 3:23 pm

    On Tuesday 09 March 2004 20:51, Timothy Stone wrote:
    > Michael Giles wrote:
    > > Tim,
    > >
    > > Looks like you can only access it with a subscription. :( Sounds good,
    > > though.
    >
    > Really? I don't have a subscription. Got to it via the archives actually
    > now that I think about it:
    >
    > Try Volume 7, Issue 12.

    I also need a subscription for:
    http://www.sys-con.com/story/search.cfm?pub=1&ss=lucene

  • Olga Dadasheva at Mar 10, 2004 at 4:20 pm
    Try this link and scroll to top:
    http://www.sys-con.com/story/?storyid=37296&DE=1#RES

    Thank you, Tim - excellent article.



    -----Original Message-----
    From: lucene@nitwit.de
    Sent: Wednesday, March 10, 2004 10:23 AM
    To: Lucene Users List
    Subject: Re: Storing numbers

    On Tuesday 09 March 2004 20:51, Timothy Stone wrote:
    Michael Giles wrote:
    Tim,

    Looks like you can only access it with a subscription. :( Sounds good,
    though.
    Really? I don't have a subscription. Got to it via the archives actually
    now that I think about it:

    Try Volume 7, Issue 12.
    I also need an subscription for:
    http://www.sys-con.com/story/search.cfm?pub=1&ss=lucene


Discussion Overview
group: java-user @
categories: lucene
posted: Mar 5, '04 at 11:18a
active: Mar 10, '04 at 4:20p
posts: 20
users: 10
website: lucene.apache.org
