Hi all,

I am in a fix regarding Lucene search. I know a little about Lucene and
have successfully created an index and run many queries against it.
My main worry is that when I search for, say, "000", I get no results,
while a search for "00000341" gives me a hit. Even a search for 341
returns nothing.

I have also checked with Luke, and Luke shows no results either.

Do I have to use a different analyzer? Currently I am using
KeywordAnalyzer.

Thanks
Pranav

  • Pranav goyal at Jun 22, 2011 at 8:17 am
    I know I can always use the wildcards * and ?.

    But that is not what I am asking about here. I just want to get everything
    that has 341 in it. How can I do that without using the * or ? wildcards?

    --
    I'm very responsible, when ever something goes wrong they always say I'm
    responsible --
  • Ian Lea at Jun 22, 2011 at 8:41 am
    What does Luke show as being indexed for that field? Other useful
    tips at http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F

    If that field is numeric you could use a NumericField, which gets rid of
    problems with leading zeros.

    If by "I just want to get everything which has 341 in it" you mean you
    want to match aaa341bbb and 0000341 and 341, see the related thread on
    this list from yesterday, or look at
    org.apache.lucene.search.regex.RegexQuery.



    --
    Ian.
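
    To illustrate why "000" and "341" find nothing while "00000341" does,
    here is a minimal sketch in plain Java. It is a simplified model, not
    Lucene code: it assumes that each field value is indexed as one whole
    term (which is what KeywordAnalyzer or NOT_ANALYZED gives you), that a
    plain term query is an exact comparison against whole terms, and that a
    pattern query tests each term against a regular expression. The class
    and method names are made up for the example.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.regex.Pattern;

    /** Simplified model of single-term (keyword) indexing; not Lucene code. */
    class KeywordTermDemo {

        /** Exact lookup: succeeds only if the query equals a whole indexed term. */
        static List<String> exactMatch(List<String> terms, String query) {
            List<String> hits = new ArrayList<String>();
            for (String term : terms) {
                if (term.equals(query)) {
                    hits.add(term);
                }
            }
            return hits;
        }

        /** Pattern lookup: tests every indexed term against a regular expression. */
        static List<String> regexMatch(List<String> terms, String regex) {
            Pattern pattern = Pattern.compile(regex);
            List<String> hits = new ArrayList<String>();
            for (String term : terms) {
                if (pattern.matcher(term).matches()) {
                    hits.add(term);
                }
            }
            return hits;
        }

        public static void main(String[] args) {
            // Hypothetical indexed values, one term per document.
            List<String> terms = Arrays.asList("00000341", "aaa341bbb", "341", "00000777");

            System.out.println(exactMatch(terms, "00000341")); // [00000341]
            System.out.println(exactMatch(terms, "000"));      // []
            System.out.println(regexMatch(terms, ".*341.*"));  // [00000341, aaa341bbb, 341]
        }
    }
    ```

    In this model "000" is only a fragment of a term, so the exact lookup
    cannot find it; matching "anything containing 341" needs a pattern-style
    query that examines the terms themselves.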


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Pranav goyal at Jun 23, 2011 at 6:39 am
    I tried it and it worked, although there is one peculiarity.

    When I search for Item_1 I get 110 hits, but when I search for *Item_1* I
    get 0 hits. What am I doing wrong?

    A search for *341* gives the correct results, i.e.
    00000341-000-000-DR
    but the *Item_1* search above does not work.


    Thanks
    Pranav
  • Ian Lea at Jun 23, 2011 at 8:51 am
    What exactly is "it"? Show us what you are indexing, how, and how you
    are building the query and we may be able to help.

    Whenever I see a report of incorrect results on a Mixed Case field I
    always suspect that the term is being lowercased on indexing and not
    at searching, or vice versa.

    --
    Ian.


  • Pranav goyal at Jun 23, 2011 at 9:09 am
    Here is the code I am using (the indexing and searching code are in
    different files).

    Indexing part:

        d = new Document();
        File indexDir = new File("index-dir");
        KeywordAnalyzer analyzer = new KeywordAnalyzer();

        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31, analyzer);
        try {
            writer = new IndexWriter(FSDirectory.open(indexDir), conf);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        String q1 = contract.getDocId();
        String q2 = contract.getDocName();
        String q3 = contract.getCustomer(ctx).getMemberName();

        // Remove any previously indexed copy of this contract.
        Term term = new Term("DocId", contract.getDocId());
        writer.deleteDocuments(term);

        // NOT_ANALYZED: each value is indexed as a single, unmodified term.
        d.add(new Field("DocId", q1, Field.Store.YES, Field.Index.NOT_ANALYZED));
        d.add(new Field("All", q2, Field.Store.NO, Field.Index.NOT_ANALYZED));
        d.add(new Field("Cust", q3, Field.Store.NO, Field.Index.NOT_ANALYZED));

        try {
            writer.addDocument(d);
            writer.close();
            endTime = System.currentTimeMillis();
            // System.out.println("Time taken to index the contract with DocID "
            //         + q1 + " is -> " + (endTime - startTime));
        } catch (IOException e1) {
            e1.printStackTrace();
        }


    Searching code:

        File indexDir = new File("index-dir");
        KeywordAnalyzer analyzer = new KeywordAnalyzer();
        IndexSearcher searcher = new IndexSearcher(FSDirectory.open(indexDir));

        // Search all four fields; leading wildcards such as *341* are allowed.
        String[] fields = new String[] { "DocId", "Item", "Cust", "All" };
        MultiFieldQueryParser parser =
                new MultiFieldQueryParser(Version.LUCENE_31, fields, analyzer);
        parser.setAllowLeadingWildcard(true);

        String queryString = field.getValue().toString();

        Query query1 = parser.parse(queryString);
        TopDocs results = searcher.search(query1, 1000);

        System.out.println("total hits: " + results.totalHits);
        ScoreDoc[] hits = results.scoreDocs;
        ArrayList<String> docIds = new ArrayList<String>();
        for (ScoreDoc hit : hits) {
            // Load the stored fields of the hit and collect its DocId.
            Document doc = searcher.doc(hit.doc);
            System.out.println(doc.get("DocId"));
            docIds.add(doc.get("DocId"));
        }
        // Application-specific code that you do not need to understand.
        IMnCriterion criterion =
                contractQuery.createInCriterion(contractQuery.ATTR_P_DOC_ID, docIds);
        contractQuery.setCriterion(criterion);
        searcher.close();
        }
  • Ian Lea at Jun 23, 2011 at 9:32 am
    Looks OK to me. You are searching on Item without adding any docs
    with that field, and you could use writer.updateDocument() rather than
    delete and add, but those are just quibbles and don't explain your
    searching problem.

    Having done most of the hard work, why don't you adapt the code you
    posted into a simple standalone program or test case that demonstrates
    the problem? Make it as simple as possible, with no external
    dependencies, clearly showing what you are indexing and what you are
    searching on, with one search that works and one that doesn't.

    One warning: using MultiFieldQueryParser with leading wildcards is
    pretty much guaranteed to be slow on a large index.


    --
    Ian.
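
    A rough way to see why leading wildcards are expensive is the toy model
    below, in plain Java. It is not Lucene's implementation: it only assumes
    that terms are kept in sorted order, so a prefix search can jump to the
    first candidate and stop early, while a leading wildcard has no starting
    point and has to check every term. The class name, method names, and
    dictionary contents are invented for the illustration.

    ```java
    import java.util.TreeSet;

    /** Toy model of a sorted term dictionary; an illustration, not Lucene internals. */
    class WildcardCostDemo {

        /** Prefix search (prefix*): seek to the prefix, stop when terms stop matching. */
        static int termsVisitedForPrefix(TreeSet<String> dict, String prefix) {
            int visited = 0;
            for (String term : dict.tailSet(prefix)) {
                visited++;
                if (!term.startsWith(prefix)) {
                    break; // sorted order: no later term can match either
                }
            }
            return visited;
        }

        /** Leading wildcard (*suffix): nothing to seek to, so every term is checked. */
        static int termsVisitedForSuffix(TreeSet<String> dict, String suffix) {
            int visited = 0;
            int matched = 0;
            for (String term : dict) {
                visited++;
                if (term.endsWith(suffix)) {
                    matched++; // counted only to show the check is performed
                }
            }
            return visited;
        }

        public static void main(String[] args) {
            // Hypothetical dictionary: 1000 zero-padded ids, 00000000 .. 00000999.
            TreeSet<String> dict = new TreeSet<String>();
            for (int i = 0; i < 1000; i++) {
                dict.add(String.format("%08d", i));
            }
            System.out.println(termsVisitedForPrefix(dict, "0000034")); // 11
            System.out.println(termsVisitedForSuffix(dict, "341"));     // 1000
        }
    }
    ```

    With a multi-field parser the same scan would presumably be repeated
    for each field searched, which is why the cost grows quickly on a large
    index.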


  • Digy digy at Jun 23, 2011 at 9:46 am
    Maybe, you need
    queryParser.setLowercaseExpandedTerms(false)

    DIGY
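
    The effect behind this suggestion can be sketched in plain Java. This
    is a simulation, not the Lucene query parser: it assumes that wildcard
    terms are lowercased before matching when the lowercase-expanded-terms
    option is on (which, as far as I know, is the default in Lucene 3.x),
    while the indexed terms keep their original case because the fields are
    NOT_ANALYZED. The class and method names are made up for the example.

    ```java
    import java.util.Locale;
    import java.util.regex.Pattern;

    /** Simulates lowercasing of wildcard terms; not the Lucene query parser. */
    class LowercaseExpandedTermsDemo {

        /** Translate a wildcard (* = any run of characters, ? = one character) to a regex. */
        static Pattern toPattern(String wildcard) {
            StringBuilder sb = new StringBuilder();
            for (char c : wildcard.toCharArray()) {
                if (c == '*') {
                    sb.append(".*");
                } else if (c == '?') {
                    sb.append('.');
                } else {
                    sb.append(Pattern.quote(String.valueOf(c)));
                }
            }
            return Pattern.compile(sb.toString());
        }

        /** Match an indexed term against a wildcard, optionally lowercasing the wildcard first. */
        static boolean matches(String indexedTerm, String wildcard, boolean lowercaseExpandedTerms) {
            String effective = lowercaseExpandedTerms
                    ? wildcard.toLowerCase(Locale.ROOT)
                    : wildcard;
            return toPattern(effective).matcher(indexedTerm).matches();
        }

        public static void main(String[] args) {
            System.out.println(matches("Item_1", "*Item_1*", true));           // false
            System.out.println(matches("Item_1", "*Item_1*", false));          // true
            System.out.println(matches("00000341-000-000-DR", "*341*", true)); // true
        }
    }
    ```

    Under these assumptions *341* is unaffected because digits have no
    case, whereas *Item_1* turns into *item_1* and no longer matches the
    mixed-case term in the index.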
