Hi all,

I'm not able to see what's wrong in the following sample code.
I'm indexing a document with 5 fields, using five different indexing strategies.
I'm fine with the results for 4 of them, but field B is causing me some
trouble in understanding what's going on.

The value of field B is X (uppercase).
The analyzer is a SimpleAnalyzer, which I use on the QueryParser as well.
But when I search for X (uppercase) on field B, the X is converted to lowercase.
Now, I know that SimpleAnalyzer converts to lowercase, but I was
expecting it not to do so on field B, which is NOT_ANALYZED.

How should I fix my code?

Thank you in advance!
-John



--- code ---


package test;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocCollector;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.queryParser.QueryParser;



public class Test
{
    public static void main(String[] args)
    {
        try
        {
            RAMDirectory idx = new RAMDirectory();
            SimpleAnalyzer analyzer = new SimpleAnalyzer();

            IndexWriter writer = new IndexWriter(idx, analyzer, true,
                    IndexWriter.MaxFieldLength.LIMITED);

            Document doc = new Document();
            doc.add(new Field("A", "X",
                    Field.Store.YES, Field.Index.NO));
            doc.add(new Field("B", "X",
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("C", "X",
                    Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("D", "x",
                    Field.Store.NO, Field.Index.NOT_ANALYZED));
            doc.add(new Field("E", "X",
                    Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(doc);
            writer.close();

            IndexSearcher searcher = new IndexSearcher(idx);
            String field = "B";
            QueryParser parser = new QueryParser(field, analyzer);
            Query query = parser.parse("X");
            System.out.println("Query: " + query.toString());

            TopDocCollector collector = new TopDocCollector(1);
            searcher.search(query, collector);
            int numHits = collector.getTotalHits();
            System.out.println(numHits + " total matching documents");

            if (numHits > 0)
            {
                ScoreDoc[] hits = collector.topDocs().scoreDocs;
                doc = searcher.doc(hits[0].doc);
                System.out.println("A: " + doc.get("A"));
                System.out.println("B: " + doc.get("B"));
                System.out.println("C: " + doc.get("C"));
                System.out.println("D: " + doc.get("D"));
                System.out.println("E: " + doc.get("E"));
            }
        }
        catch (Exception e)
        {
            System.out.println(" caught a " + e.getClass() + "\n with message: "
                    + e.getMessage());
        }
    }
}

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Ian Lea at Mar 5, 2009 at 3:30 pm
    Hi


    I think that the SimpleAnalyzer you are passing to the query parser
    will be downcasing the X. You can fix it using an analyzer that
    doesn't convert to lower case, creating the query directly in code, or
    by using PerFieldAnalyzerWrapper, and no doubt other ways too.

    If you want a direct suggestion: use PerFieldAnalyzerWrapper,
    specifying a different analyzer for field B.


    --
    Ian.
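
To make the suggestion above concrete, here is a minimal sketch in plain Java of why a per-field analyzer helps. It is a toy model, not Lucene's API: the class `PerFieldModel`, its methods, and the `LOWERCASE` and `KEEP` constants are hypothetical stand-ins for a lowercasing analyzer such as SimpleAnalyzer and for an analyzer that leaves the text unchanged. It assumes field B holds the exact term "X" because it was indexed as NOT_ANALYZED.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model of per-field query analysis; hypothetical, not Lucene's API. */
public class PerFieldModel {
    // Stand-in for a lowercasing analyzer such as SimpleAnalyzer.
    public static final UnaryOperator<String> LOWERCASE = s -> s.toLowerCase(Locale.ROOT);
    // Stand-in for an analyzer that keeps the text as one unchanged term.
    public static final UnaryOperator<String> KEEP = s -> s;

    private final UnaryOperator<String> defaultAnalyzer;
    private final Map<String, UnaryOperator<String>> perField = new HashMap<>();

    public PerFieldModel(UnaryOperator<String> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    /** Registers a field-specific analyzer that overrides the default. */
    public void addAnalyzer(String field, UnaryOperator<String> analyzer) {
        perField.put(field, analyzer);
    }

    /** Returns the term that a query on the given field would look up. */
    public String queryTerm(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        // Field B was indexed NOT_ANALYZED, so the index holds the exact term "X".
        String indexedB = "X";

        // One lowercasing analyzer for every field: the query term becomes "x".
        PerFieldModel simpleOnly = new PerFieldModel(LOWERCASE);
        System.out.println(simpleOnly.queryTerm("B", "X").equals(indexedB)); // false

        // Same default, but field B keeps its text unchanged.
        PerFieldModel wrapped = new PerFieldModel(LOWERCASE);
        wrapped.addAnalyzer("B", KEEP);
        System.out.println(wrapped.queryTerm("B", "X").equals(indexedB)); // true
        System.out.println(wrapped.queryTerm("C", "X")); // x
    }
}
```

The point of the model is that NOT_ANALYZED only affects what is written to the index; the query text still passes through whatever analyzer the parser was given, so the two sides have to agree field by field.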

  • John Marks at Mar 6, 2009 at 6:35 am
    Thank you Ian,

    > If you want a direct suggestion: use PerFieldAnalyzerWrapper,
    > specifying a different analyzer for field B.

    This makes a lot of sense.

    -John

  • John Marks at Mar 6, 2009 at 9:33 am
    Another problem.

    Using the PerFieldAnalyzerWrapper solves the case where I have a
    simple query, such as the following:
    Query query = parser.parse("X");
    or
    Query query = parser.parse("X OR Y");
    but if I use a more complex query like the following:
    Query query = parser.parse("[A TO Z]");
    then, again, the parser transforms the query to lowercase, as shown in
    the code below.

    Output is:
    Query: B:[a TO z]
    0 total matching documents
    while I would have expected to get
    Query: B:[A TO Z]
    ...

    Does this mean that even the KeywordAnalyzer converts A and Z to lowercase
    in the range query?

    Should I report this as a bug?

    -John



    --- code ---
    package test;

    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.analysis.KeywordAnalyzer;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocCollector;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.queryParser.QueryParser;



    public class Test
    {
        public static void main(String[] args)
        {
            try
            {
                RAMDirectory idx = new RAMDirectory();

                PerFieldAnalyzerWrapper aWrapper =
                        new PerFieldAnalyzerWrapper(new SimpleAnalyzer());
                aWrapper.addAnalyzer("B", new KeywordAnalyzer());

                IndexWriter writer = new IndexWriter(idx, aWrapper, true,
                        IndexWriter.MaxFieldLength.LIMITED);

                Document doc = new Document();
                doc.add(new Field("A", "X",
                        Field.Store.YES, Field.Index.NO));
                doc.add(new Field("B", "X",
                        Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("C", "X",
                        Field.Store.YES, Field.Index.ANALYZED));
                doc.add(new Field("D", "X",
                        Field.Store.NO, Field.Index.NOT_ANALYZED));
                doc.add(new Field("E", "X",
                        Field.Store.NO, Field.Index.ANALYZED));
                writer.addDocument(doc);
                writer.close();

                IndexSearcher searcher = new IndexSearcher(idx);
                String field = "B";
                QueryParser parser = new QueryParser(field, aWrapper);
                Query query = parser.parse("[A TO Z]");
                System.out.println("Query: " + query.toString());

                TopDocCollector collector = new TopDocCollector(1);
                searcher.search(query, collector);
                int numHits = collector.getTotalHits();
                System.out.println(numHits + " total matching documents");

                if (numHits > 0)
                {
                    ScoreDoc[] hits = collector.topDocs().scoreDocs;
                    doc = searcher.doc(hits[0].doc);
                    System.out.println("A: " + doc.get("A"));
                    System.out.println("B: " + doc.get("B"));
                    System.out.println("C: " + doc.get("C"));
                    System.out.println("D: " + doc.get("D"));
                    System.out.println("E: " + doc.get("E"));
                }
            }
            catch (Exception e)
            {
                System.out.println(" caught a " + e.getClass() + "\n with message: "
                        + e.getMessage());
            }
        }
    }

  • Ian Lea at Mar 6, 2009 at 11:52 am
    I don't know how QueryParser works behind the scenes but it looks like
    this is at least known behaviour. From the QueryParser javadocs:

    setLowercaseExpandedTerms

    public void setLowercaseExpandedTerms(boolean lowercaseExpandedTerms)

    Whether terms of wildcard, prefix, fuzzy and range queries are to
    be automatically lower-cased or not. Default is true.


    So you will need to call parser.setLowercaseExpandedTerms(false) in
    this case. Might be a problem if you are parsing a complex query with
    multiple range or other expanded queries, some of which you want
    preserved, some not. If things are that complex you'll be better off
    creating your queries via RangeQuery etc. It isn't hard and you can
    still use QueryParser where appropriate - add the resultant queries to
    a BooleanQuery or whatever.


    --
    Ian.
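
The effect of that flag on the range query can be illustrated with a small plain-Java sketch. This is a toy model, not Lucene's API: the class `RangeModel` and its `rangeMatches` method are hypothetical, only the flag name is borrowed from the javadoc quoted above, and the model assumes terms are compared in plain `String.compareTo` order.

```java
import java.util.Locale;

/** Toy model of a parser that lowercases range endpoints; hypothetical, not Lucene's API. */
public class RangeModel {
    // Mirrors the documented default of setLowercaseExpandedTerms: true.
    private boolean lowercaseExpandedTerms = true;

    public void setLowercaseExpandedTerms(boolean value) {
        this.lowercaseExpandedTerms = value;
    }

    /**
     * Returns true if indexedTerm lies in the inclusive range [lower, upper],
     * after the endpoints have been lowercased when the flag is set.
     */
    public boolean rangeMatches(String lower, String upper, String indexedTerm) {
        String lo = lowercaseExpandedTerms ? lower.toLowerCase(Locale.ROOT) : lower;
        String hi = lowercaseExpandedTerms ? upper.toLowerCase(Locale.ROOT) : upper;
        return lo.compareTo(indexedTerm) <= 0 && indexedTerm.compareTo(hi) <= 0;
    }

    public static void main(String[] args) {
        RangeModel parser = new RangeModel();

        // Default: [A TO Z] becomes [a TO z], and "X" sorts before "a".
        System.out.println(parser.rangeMatches("A", "Z", "X")); // false

        // Flag off: the endpoints keep their case and "X" falls inside the range.
        parser.setLowercaseExpandedTerms(false);
        System.out.println(parser.rangeMatches("A", "Z", "X")); // true
    }
}
```

This matches the output reported in the question: the printed query is B:[a TO z], and the uppercase term stored in field B falls outside that range.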


Discussion Overview
group: java-user
category: lucene
posted: Mar 5, 2009 at 3:18 pm
active: Mar 6, 2009 at 11:52 am
posts: 5
users: 2 (John Marks: 3 posts, Ian Lea: 2 posts)
website: lucene.apache.org
