FAQ
Hi all,
I want to match part of a digit string.
Say the indexed token is "123456789";
I want a query for 56789 to match this token.
The "Query Parser Syntax" page says a wildcard cannot
be the first character, so "*56789" is not allowed.
How can I do that?
Thanks.

--

Best Regards,
ZHAO, Wenbo

=======================

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • AHMET ARSLAN at Nov 9, 2009 at 6:13 am

    Use this method of org.apache.lucene.queryParser.QueryParser:

    void setAllowLeadingWildcard(boolean allowLeadingWildcard)
    Set to true to allow leading wildcard characters.
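To make concrete what a query such as "*56789" has to do once leading wildcards are allowed, here is a minimal plain-Java sketch. It does not use Lucene: the class name `LeadingWildcardScan`, the helper `toRegex`, and the sample terms are all made up for illustration. It only shows that, with no fixed prefix to seek to, every term must be tested against the pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not Lucene code.
class LeadingWildcardScan {

    // Translate a wildcard pattern ('*' = any run of chars, '?' = one char) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // A leading wildcard gives no prefix to jump to, so every term is checked.
    static List<String> scan(List<String> terms, String wildcard) {
        Pattern p = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (p.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("123456789", "555000111", "000056789");
        System.out.println(scan(terms, "*56789")); // prints [123456789, 000056789]
    }
}
```

In real Lucene code the equivalent step is simply calling setAllowLeadingWildcard(true) on the parser before parsing the query string.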

  • Wenbo Zhao at Nov 9, 2009 at 6:21 am
    Thanks a lot. I'm such a fool.

    BTW, where can I find better documentation than the javadoc?
    How do you find your way into the Lucene docs? Reading the javadoc
    is driving me a little crazy; all the concepts are split into fragments.


  • AHMET ARSLAN at Nov 9, 2009 at 6:57 am

    Lucene in Action [1] Second Edition is excellent.

    [1] http://www.manning.com/hatcher3/

    You can get the early access edition right now.




  • Wenbo Zhao at Nov 9, 2009 at 7:20 am
    Thanks.

  • Uwe Schindler at Nov 9, 2009 at 6:58 am
    If you *only* want to run wildcard queries on that field with a * in front, I
    would suggest reversing the string so that the query has the * at the end.
    Wildcards at the beginning are very slow, because every term in the field
    has to be enumerated. If the wildcard is at the end, because you reversed all
    strings during indexing, the query gets thousands of times faster.
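The reversal idea can be sketched in plain Java without Lucene. This is only an illustration under assumptions: the class `ReversedTermIndex` and its methods are hypothetical, and a `TreeSet` stands in for the sorted term dictionary. Terms are stored reversed, so a "*56789" query becomes a prefix lookup for "98765" that can seek straight to the first candidate and stop as soon as the prefix no longer matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration, not Lucene code.
class ReversedTermIndex {

    // Terms are stored reversed, so sorted order groups terms by their endings.
    private final TreeSet<String> reversed = new TreeSet<>();

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void add(String term) {
        reversed.add(reverse(term));
    }

    // Answers "*suffix" as a prefix lookup on the reversed terms.
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        // Seek to the first reversed term >= prefix; matches are contiguous from there.
        for (String r : reversed.tailSet(prefix, true)) {
            if (!r.startsWith(prefix)) {
                break; // past the last term sharing the prefix
            }
            hits.add(reverse(r));
        }
        return hits;
    }

    public static void main(String[] args) {
        ReversedTermIndex index = new ReversedTermIndex();
        index.add("123456789");
        index.add("555000111");
        index.add("000056789");
        System.out.println(index.endingWith("56789")); // prints [000056789, 123456789]
    }
}
```

With Lucene itself, the same effect would come from reversing the field text at index time and reversing the query digits before appending the trailing *.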

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
  • Wenbo Zhao at Nov 9, 2009 at 7:13 am
    Thanks. But I'm indexing number strings like phone numbers and card
    numbers, and all kinds of searches are possible.
    Fortunately my application doesn't need fast responses;
    a search that takes several seconds is acceptable :-)


  • Wenbo Zhao at Nov 9, 2009 at 6:14 am
    Hi all,
    I think I found an approach; it may not be the best, but it works.
    My code is below; it works like a query for "*19810919*".
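    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.*;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.*;

    // directory, field and analyzer are set up elsewhere
    IndexSearcher isearcher = new IndexSearcher(directory, true); // read-only
    IndexReader ir = isearcher.getIndexReader();

    // Collect every term of the searched field that contains the digits
    List<String> result = new ArrayList<String>();
    TermEnum te = ir.terms();
    while (te.next()) {
        Term t = te.term();
        // skip terms of other fields (e.g. "id"); they cannot match in this field
        if (t.field().equals(field) && t.text().indexOf("19810919") >= 0) {
            result.add(t.text());
        }
    }
    te.close();

    // Join the matching terms into one query string and search
    QueryParser parser = new QueryParser(field, analyzer);
    StringBuilder sb = new StringBuilder();
    for (String s : result) sb.append(s).append(' ');
    Query query = parser.parse(sb.toString());
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    System.out.println(query + "=" + hits.length);
    for (int i = 0; i < hits.length; i++) { // iterate through the results
        Document hitDoc = isearcher.doc(hits[i].doc);
        System.out.println("ID=" + hitDoc.get("id"));
    }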
    My index is about 30 MB and contains 13k+ docs and 1978k+ terms, all digit strings.
    The loop that enumerates the terms took 2.33 seconds.
    But my target is an index 3000 times this size, and I'm afraid enumerating
    the terms will take too long.

    Does anybody have a better idea?




Discussion Overview
group: java-user
categories: lucene
posted: Nov 9, '09 at 4:44a
active: Nov 9, '09 at 7:20a
posts: 8
users: 4
website: lucene.apache.org
