Hi,

I'm just starting to work with Lucene, and I guess that I learn best by
working with code, so I've started with the demos in the Lucene
distribution.

I got the IndexFiles.java and IndexHTML.java working, and also the
luceneweb.war is deployed to Tomcat.

I used IndexFiles.java to index some text files, and then used both the
SearchFiles.java and the luceneweb web app to do some testing.

One thing I noticed with the luceneweb web app is that when I searched,
the search results showed a "Summary" of "null", so I added:

doc.add(new Field("summary", "FooFoo", Field.Store.YES,
Field.Index.NOT_ANALYZED));

to the IndexFiles.java, and ran it again.

I had expected that I'd then be able to do a search for something like
"summary:foofoo", but when I did that, I got no results.

I also tried SearchFiles.java, and again got no results.

I checked with Luke, and it shows that the "summary" field is in the
index, so I'm wondering why I am not able to search on other fields such
as "summary", "path", etc.?

Can anyone explain what else I need to do, esp. in the luceneweb web
app, to be able to search these other fields?

Thanks!

Jim


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Matthew Hall at Jul 28, 2009 at 1:09 pm
    Restart Tomcat.

    When the indexes are read in at initialization time, they are a snapshot
    of what the indexes contained at that moment.

    Unless the demo specifically either closes its IndexReader and creates a
    new one, or calls IndexReader.reopen periodically (which I don't
    remember it doing), you will not see updates in the web app until you
    restart.
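    The snapshot behaviour can be illustrated with a toy model in plain Java. ToyIndex and ToyReader below are hypothetical stand-ins, not Lucene classes: the reader copies the document list when it is opened, so documents added afterwards stay invisible until it is reopened. In Lucene itself (the 2.4-era API that Field.Index.NOT_ANALYZED suggests), IndexReader.reopen() plays this role: it returns the same instance if the index has not changed and a new reader otherwise, in which case the old reader should be closed and a new IndexSearcher built on the new one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model (hypothetical, not Lucene's API) of a point-in-time index reader. */
class ReopenDemo {

    /** The "index on disk": documents appended by an indexer such as IndexFiles. */
    static final class ToyIndex {
        private final List<String> docs = new ArrayList<>();
        private long version = 0;

        void addDocument(String doc) {
            docs.add(doc);
            version++;
        }
    }

    /** A reader that snapshots the index at open time, like a web app opening it once at startup. */
    static final class ToyReader {
        private final ToyIndex index;
        private final List<String> snapshot;
        private final long openedVersion;

        ToyReader(ToyIndex index) {
            this.index = index;
            this.snapshot = Collections.unmodifiableList(new ArrayList<>(index.docs));
            this.openedVersion = index.version;
        }

        int numDocs() {
            return snapshot.size();
        }

        /** Returns this reader if nothing changed, otherwise a fresh snapshot. */
        ToyReader reopen() {
            return index.version == openedVersion ? this : new ToyReader(index);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("doc1");
        ToyReader reader = new ToyReader(index); // opened at "startup"

        index.addDocument("doc2"); // the indexer runs again later
        System.out.println("before reopen: " + reader.numDocs()); // 1 (stale snapshot)

        ToyReader newer = reader.reopen();
        if (newer != reader) {
            reader = newer; // in real Lucene, close the old reader here as well
        }
        System.out.println("after reopen: " + reader.numDocs()); // 2
    }
}
```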

    Matt

    --
    Matthew Hall
    Software Engineer
    Mouse Genome Informatics
    mhall@informatics.jax.org
    (207) 288-6012



  • Matthew Hall at Jul 28, 2009 at 1:11 pm
    Oh, also check which Analyzer the demo webapp/indexer is using.
    It's entirely possible that the chosen analyzer isn't lowercasing
    input, which could also cause you issues.

    I'd be willing to bet your issue lies in one of these two problems I've
    mentioned ^^

    Matt

  • Ian Lea at Jul 28, 2009 at 1:13 pm
    Hi


    Field.Index.NOT_ANALYZED means the value will be indexed as is, i.e. as
    the single term "FooFoo" in your example, so a search for "foofoo" won't
    match. A search for "FooFoo" would, assuming that your search terms are
    not being lowercased.



    --
    Ian.


  • Ohaya at Jul 28, 2009 at 1:21 pm
    Ian and Matthew,

    I've tried "foofoo", "summary:foofoo", "FooFoo", and "summary:FooFoo". No results returned for any of those :(.

    Also, Matthew, I bounced Tomcat after running IndexFiles, so I don't think that's the problem either :(...

    I looked at the SearchFiles.java code, and it looks like it's literally using whatever query string I'm entering (ditto for luceneweb). Is there something about the query itself that needs to change to support searching on fields other than the "contents" field (recall, I'm pretty sure that all those other fields are in the index, via Luke)?

    Jim



  • Ian Lea at Jul 28, 2009 at 2:30 pm
    Jim


    Glancing at SearchFiles.java I can see

    Analyzer analyzer = new StandardAnalyzer();
    ...
    QueryParser parser = new QueryParser(field, analyzer);
    ...
    Query query = parser.parse(line);

    so any query term you enter will be run through StandardAnalyzer, which
    will, amongst other things, convert it to lowercase, so it will not match
    the indexed value FooFoo. If you're just playing, it would probably be
    easiest to tell Lucene to analyze the summary field, e.g.

    doc.add(new Field("summary", "FooFoo", Field.Store.YES, Field.Index.ANALYZED));

    That will cause FooFoo to be indexed as foofoo and thus should be
    matched on search.
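    Ian's explanation can be reproduced with a toy model in plain Java. This is a sketch under stated assumptions, not Lucene's code: analyze() is a hypothetical stand-in for StandardAnalyzer that only lowercases and splits on non-alphanumeric characters (the real analyzer does more, such as stop-word removal), and the "index" is just a map from field name to that document's set of terms. As with new QueryParser(field, analyzer), a term without a field: prefix goes to the default field, assumed here to be "contents".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, not Lucene's API) of index-time vs query-time analysis. */
class AnalysisMismatchDemo {

    /** Crude analyzer: lowercase, then split on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Builds one document as field name -> indexed terms. */
    static Map<String, Set<String>> indexDoc(String contents, String summary, boolean analyzeSummary) {
        Map<String, Set<String>> doc = new HashMap<>();
        doc.put("contents", new HashSet<>(analyze(contents)));
        // NOT_ANALYZED: the whole value becomes one exact term.
        // ANALYZED: the value goes through the same analyzer as the query.
        Set<String> summaryTerms = analyzeSummary
                ? new HashSet<>(analyze(summary))
                : new HashSet<>(Collections.singleton(summary));
        doc.put("summary", summaryTerms);
        return doc;
    }

    /** Splits "field:term" (bare terms use defaultField), analyzes the term, then looks it up. */
    static boolean matches(Map<String, Set<String>> doc, String query, String defaultField) {
        int colon = query.indexOf(':');
        String field = colon >= 0 ? query.substring(0, colon) : defaultField;
        String rawTerm = colon >= 0 ? query.substring(colon + 1) : query;
        List<String> terms = analyze(rawTerm); // query-time analysis, like QueryParser
        Set<String> indexed = doc.getOrDefault(field, Collections.<String>emptySet());
        return !terms.isEmpty() && indexed.containsAll(terms);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> notAnalyzed = indexDoc("test1 some text", "FooFoo", false);
        Map<String, Set<String>> analyzed = indexDoc("test1 some text", "FooFoo", true);

        // The query term becomes "foofoo"; the NOT_ANALYZED term is "FooFoo".
        System.out.println(matches(notAnalyzed, "summary:FooFoo", "contents")); // false
        System.out.println(matches(notAnalyzed, "summary:foofoo", "contents")); // false
        // With ANALYZED, both sides produce "foofoo".
        System.out.println(matches(analyzed, "summary:FooFoo", "contents"));    // true
        // A bare term is searched in the default field only.
        System.out.println(matches(analyzed, "foofoo", "contents"));            // false
    }
}
```

    The key design point the toy shows is symmetry: whatever analysis is applied at query time must also have been applied at index time, or the terms will never line up.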


    --
    Ian.

  • Matthew Hall at Jul 28, 2009 at 2:32 pm
    Yeah, Ian has hit the nail on the head here.

    Can't believe I missed it in the initial writeup.

    Matt

  • Ohaya at Jul 28, 2009 at 3:10 pm
    Hi Matthew and Ian,

    Thanks, I'll try that, but, in the meantime, I've been doing some reading (Lucene in Action), and on pg. 159, section 5.3, it discusses "Querying on multiple fields".

    I was just about to try what's described in that section, i.e., using MultiFieldQueryParser.parse(), or, as another note on pg. 161 mentions, doing something like:

    doc.add(Field.Unstored("contents", contents + " " + summary));

    So, I guess I'm a little confused (happens a lot :)!): In the situation I'm talking about (starting with the Lucene demo and demo webapp, and trying to be able to index and search more than just the "contents" field), do I not need to use MultiFieldQueryParser.parse() or do what they call "create a synthetic content"?

    Thanks,
    Jim


  • Matthew Hall at Jul 28, 2009 at 4:19 pm
    You can choose to do either.

    Having items in multiple fields allows you to apply field-specific
    boosts, thus making matches in certain fields more important than others.

    But if that's not something you care about, the second technique is
    useful in that it vastly simplifies your index structure (and thus
    your query structure).
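    To make the multi-field option concrete, here is a plain-Java sketch of what searching one term across several fields with per-field boosts amounts to. It only builds a string in Lucene's query syntax (field:term, an optional ^boost suffix, and clauses OR-ed under QueryParser's default operator). The expand() helper and the boost values are hypothetical, and the exact query that MultiFieldQueryParser builds or prints may be formatted differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: expand one term into a boosted clause per field, in Lucene query syntax. */
class MultiFieldExpansion {

    static String expand(String term, Map<String, Float> fieldBoosts) {
        StringJoiner clauses = new StringJoiner(" ");
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            String clause = e.getKey() + ":" + term;
            if (e.getValue() != 1.0f) {
                clause += "^" + e.getValue(); // weight matches in this field more heavily
            }
            clauses.add(clause);
        }
        return clauses.toString(); // space-separated clauses are OR-ed by default
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>(); // keeps field order predictable
        boosts.put("summary", 2.0f);
        boosts.put("contents", 1.0f);
        System.out.println(expand("foofoo", boosts)); // summary:foofoo^2.0 contents:foofoo
    }
}
```

    With a single combined field there is nothing to expand: the bare term goes straight to that one field, which is the simplification described above.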

    So, it depends on what you want to be able to do in the end. Do you
    envision searching the summary and the contents at the same time, but
    weighting hits in the summary more heavily?
    If so, use multiple fields. If not, keep this first iteration in Lucene
    simple, and compress everything down. Also note that the + " " + in the
    example cited is important: that space ensures that your contents and
    summary text will be tokenized properly (just in case they are single
    words, let's say).
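    The point about the separating space can be checked with a toy tokenizer in plain Java. tokenize() is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, roughly what an analyzer does before indexing; it is not Lucene's tokenizer. Without the space, the last word of the contents and the first word of the summary fuse into one term that neither query word will match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy tokenizer (hypothetical) showing why the separator matters in a synthetic field. */
class SyntheticFieldDemo {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String contents = "test1";
        String summary = "FooFoo";
        System.out.println(tokenize(contents + summary));       // [test1foofoo]
        System.out.println(tokenize(contents + " " + summary)); // [test1, foofoo]
    }
}
```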

    Matt



  • Ohaya at Jul 28, 2009 at 5:03 pm
    Matthew,

    I'll keep your comments in mind, but I'm still confused about something.

    I currently haven't changed much in the demo, other than adding that doc.add for "summary".

    With JUST that doc.add, having done my reading, I kind of expected NOT to be able to search on "summary" at all, but it seems like SOMETIMES I am still getting results when I search on something in "summary".

    Does that mean that Lucene will automatically do multi-field searching?

    Maybe I've been up too long, but it seems like, for example, when I search on "summary:foofoo" I am not getting any results, but if I search on:

    summary:foofoo AND contents:test1

    I get results in the search response.

    Since I haven't yet added the MultiFieldQueryParser, shouldn't it ONLY be searching on the "contents" field (because "summary:foofoo" should have been false, and because I am using an AND)?
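    For reference, the expected behaviour of AND can be sketched with a toy posting-list model in plain Java (hypothetical, not Lucene's internals): each field:term maps to the set of matching document IDs, and AND keeps only documents present in both sets. Under this model, if summary:foofoo matches nothing, summary:foofoo AND contents:test1 should also return nothing, so a non-empty result implies that summary:foofoo did match those documents in the index being searched.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy posting lists (hypothetical, not Lucene internals): "field:term" -> matching doc IDs. */
class BooleanAndDemo {

    /** AND keeps only documents that match both clauses (set intersection). */
    static Set<Integer> and(Set<Integer> left, Set<Integer> right) {
        Set<Integer> result = new TreeSet<>(left); // copy so the inputs are not modified
        result.retainAll(right);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        postings.put("contents:test1", new TreeSet<>(Arrays.asList(0, 2)));
        // No entry for "summary:foofoo": no document contains that exact term.

        Set<Integer> summaryHits = postings.getOrDefault("summary:foofoo", Collections.<Integer>emptySet());
        Set<Integer> contentsHits = postings.getOrDefault("contents:test1", Collections.<Integer>emptySet());

        System.out.println(summaryHits);                    // []
        System.out.println(contentsHits);                   // [0, 2]
        System.out.println(and(summaryHits, contentsHits)); // []
    }
}
```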

    Like I said, maybe I've been staring at this too long, and need to do some more structured testing :)...

    Sorry.

    Later,
    Jim




    ---- Matthew Hall wrote:
    You can choose to do either.

    Having items in multiple fields allows you to apply field-specific
    boosts, thus making matches in certain fields more important than others.

    But if that's not something that you care about, the second technique is
    useful in that it vastly simplifies your index structure (and thus
    your query structure).

    So, it depends on what you want to be able to do in the end. Do you
    envision doing something like being able to search by the summary and
    the contents at the same time, but weighting hits in the summary more
    heavily?
    If so, use multiple fields. If not, keep this first iteration in Lucene
    simple, and compress everything down. Also please note that the + " " +
    in the example cited is important. That space ensures that the end of
    contents and the start of summary get tokenized as separate words
    (in case they are single words, let's say).

    Matt
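The two options Matt describes can be compared with a small plain-Java sketch. The tokenizer is a simplified stand-in for an analyzer that splits on non-alphanumerics and lowercases. The "score" is just a sum of hypothetical per-field boosts, not Lucene's actual tf-idf scoring. The first half shows why the `" "` separator matters for a combined field. The second half shows how separate fields let a summary hit count for more. For reference, the book's `Field.Unstored(name, value)` is the Lucene 1.x API. In the 2.4+ API the demo uses, the equivalent is `new Field(name, value, Field.Store.NO, Field.Index.ANALYZED)`.

```java
import java.util.*;

/** Toy comparison of a combined field vs. separate boosted fields; not Lucene code. */
public class FieldLayoutDemo {
    /** Simplified analyzer: split on anything that is not a letter or digit, then lowercase. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /** Separate fields: each field containing the term adds its (hypothetical) boost, default 1.0. */
    static double boostedScore(Map<String, String> doc, Map<String, Double> boosts, String term) {
        double score = 0.0;
        for (Map.Entry<String, String> field : doc.entrySet()) {
            if (tokenize(field.getValue()).contains(term)) {
                score += boosts.getOrDefault(field.getKey(), 1.0);
            }
        }
        return score;
    }

    public static void main(String[] args) {
        String contents = "some test1 text";
        String summary = "FooFoo";

        // Combined field: without the separator, "text" and "FooFoo" fuse into one token.
        System.out.println(tokenize(contents + summary));        // [some, test1, textfoofoo]
        System.out.println(tokenize(contents + " " + summary));  // [some, test1, text, foofoo]

        // Separate fields: a summary hit is weighted twice as heavily as a contents hit.
        Map<String, String> doc = Map.of("contents", contents, "summary", summary);
        Map<String, Double> boosts = Map.of("summary", 2.0);
        System.out.println(boostedScore(doc, boosts, "foofoo")); // 2.0
        System.out.println(boostedScore(doc, boosts, "test1"));  // 1.0
    }
}
```

The combined field keeps one field to query but loses the ability to weight summary hits differently, which is the trade-off described above.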



    ohaya@cox.net wrote:
    Hi Matthew and Ian,

    Thanks, I'll try that, but, in the meantime, I've been doing some reading (Lucene in Action), and on pg. 159, section 5.3, it discusses "Querying on multiple fields".

    I was just about to try what's described in that section, i.e., using MultiFieldQueryParser.parse(), or, as another note on pg. 161 mentions, doing something like:

    doc.add(Field.Unstored("contents", contents + " " + summary));

    So, I guess I'm a little confused (happens a lot :)!): In the situation I'm talking about (starting with the Lucene demo and demo webapp, and trying to be able to index and search more than just the "contents" field), do I not need to use the MultiFieldQueryParser.parse() or do what they call "create a synthetic content"?

    Thanks,
    Jim


    ---- Matthew Hall wrote:
    Yeah, Ian has hit the nail on the head here.

    Can't believe I missed it in the initial writeup.

    Matt

    Ian Lea wrote:
    Jim


    Glancing at SearchFiles.java I can see

    Analyzer analyzer = new StandardAnalyzer();
    ...
    QueryParser parser = new QueryParser(field, analyzer);
    ...
    Query query = parser.parse(line);

    so any query term you enter will be run through StandardAnalyzer, which
    will, amongst other things, convert it to lowercase and will not match
    the indexed value of FooFoo. If you're just playing, it would
    probably be easiest to tell Lucene to analyze the summary field, e.g.

    doc.add(new Field("summary", "FooFoo", Field.Store.YES, Field.Index.ANALYZED));

    That will cause FooFoo to be indexed as foofoo and thus should be
    matched on search.


    --
    Ian.
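Ian's point can be reduced to a string comparison. The sketch below is plain Java with hypothetical helper names, not Lucene code. A NOT_ANALYZED value is indexed verbatim, an ANALYZED one is lowercased, and the query text is always lowercased, as StandardAnalyzer does inside QueryParser. It also explains why typing `summary:FooFoo` failed too, because the parser lowercases it before lookup. Bypassing the parser with `new TermQuery(new Term("summary", "FooFoo"))` would match the NOT_ANALYZED value, since a TermQuery is not analyzed.

```java
import java.util.*;

/** Toy model of an index-time vs. query-time analysis mismatch; not Lucene code. */
public class AnalysisMismatchDemo {
    /** Index-time term: verbatim when not analyzed, lowercased when analyzed. */
    static String indexTerm(String value, boolean analyzed) {
        return analyzed ? value.toLowerCase(Locale.ROOT) : value;
    }

    /** Query-time term: a StandardAnalyzer-like step lowercases whatever was typed. */
    static String queryTerm(String typed) {
        return typed.toLowerCase(Locale.ROOT);
    }

    static boolean matches(String storedValue, boolean analyzed, String typed) {
        return indexTerm(storedValue, analyzed).equals(queryTerm(typed));
    }

    public static void main(String[] args) {
        for (String typed : List.of("foofoo", "FooFoo")) {
            System.out.printf("%s: NOT_ANALYZED=%b, ANALYZED=%b%n",
                typed, matches("FooFoo", false, typed), matches("FooFoo", true, typed));
        }
        // foofoo: NOT_ANALYZED=false, ANALYZED=true
        // FooFoo: NOT_ANALYZED=false, ANALYZED=true
    }
}
```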


    On Tue, Jul 28, 2009 at 2:22 PM, wrote:

    Ian and Matthew,

    I've tried "foofoo", "summary:foofoo", "FooFoo", and "summary:FooFoo". No results returned for any of those :(.

    Also, Matthew, I bounced Tomcat after running IndexFiles, so I don't think that's the problem either :(...

    I looked at the SearchFiles.java code, and it looks like it's literally using whatever query string I'm entering (ditto for luceneweb). Is there something with the query itself that needs to be modified to support searching on the fields other than the "contents" field (recall, I'm pretty sure that all those other fields are in the index, via Luke)?

    Jim



    ---- Ian Lea wrote:

    Hi


    Field.Index.NOT_ANALYZED means it will be indexed as-is, i.e. "FooFoo"
    in your example, and if you search for "foofoo" it won't match. A
    search for "FooFoo" would match, assuming that your search terms are not
    being lowercased.



    --
    Ian.


    On Tue, Jul 28, 2009 at 1:56 PM, Ohaya wrote:

    Hi,

    I'm just starting to work with Lucene, and I guess that I learn best by
    working with code, so I've started with the demos in the Lucene
    distribution.

    I got the IndexFiles.java and IndexHTML.java working, and also the
    luceneweb.war is deployed to Tomcat.

    I used IndexFiles.java to index some text files, and then used both the
    SearchFiles.java and the luceneweb web app to do some testing.

    One of the things that I noticed with the luceneweb web app is that when I
    searched, the search results returned "Summary" of "null", so I added:

    doc.add(new Field("summary", "FooFoo", Field.Store.YES,
    Field.Index.NOT_ANALYZED));

    to the IndexFiles.java, and ran it again.

    I had expected that I'd then be able to do a search for something like
    "summary:foofoo", but when I did that, I got no results.

    I also tried SearchFiles.java, and again got no results.

    I tried using Luke, and that is showing that the "summary" field is in the
    indexes, so I'm wondering why I am not able to search on other fields such
    as "summary", "path", etc.?

    Can anyone explain what else I need to do, esp. in the luceneweb web app, to
    be able to search these other fields?

    Thanks!

    Jim


    --
    Matthew Hall
    Software Engineer
    Mouse Genome Informatics
    mhall@informatics.jax.org
    (207) 288-6012
  • Matthew Hall at Jul 28, 2009 at 5:10 pm
    Oh... no.

    If you explicitly include fieldname:blah in your clause, you don't
    need a MultiFieldQueryParser.

    The purpose of the MFQP is to automatically turn a query like "blah"
    into one clause per configured field, "field1:blah OR field2:blah OR
    field3:blah". (When the query has several terms, the parser's default
    operator, OR or AND, decides how those per-term groups are combined.)

    When you set up the MFQP, you specify which fields you want this
    behavior to apply to, and can even give each field its own specific analyzer.

    So if your index has multiple fields, each of which was created
    with a different analyzer, you could search them effortlessly in your
    webapp using the MFQP.

    (For example, if you have an exact_contents field where punctuation and
    capitalization matter, and a contents field where they do not.)

    Hope that clears things up for you.

    Matt
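Matt's description of the MFQP can be mimicked in a few lines. This is a toy model with hypothetical analyzers and field names, and it handles only a single bare term or a single field:term. A bare term is expanded into one clause per configured field, each run through that field's analyzer, and the clauses are OR'ed. To the best of my knowledge, that matches how MultiFieldQueryParser's javadoc describes its default output. An explicit field:term is left alone. In real Lucene, per-field analysis is usually wired up by passing a `PerFieldAnalyzerWrapper` as the parser's analyzer.

```java
import java.util.*;
import java.util.function.UnaryOperator;

/** Toy model of multi-field expansion with per-field analyzers; not Lucene code. */
public class MultiFieldDemo {
    // Hypothetical field -> analyzer map; insertion order fixes the clause order.
    static final Map<String, UnaryOperator<String>> ANALYZERS = new LinkedHashMap<>();
    static {
        ANALYZERS.put("exact_contents", s -> s);  // case and punctuation preserved
        ANALYZERS.put("contents",                 // lowercased, punctuation stripped
            s -> s.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}", ""));
    }

    static String expand(String userQuery) {
        int colon = userQuery.indexOf(':');
        if (colon > 0) {
            // Explicit field: analyze with that field's analyzer, no expansion.
            String field = userQuery.substring(0, colon);
            String term = userQuery.substring(colon + 1);
            UnaryOperator<String> analyzer =
                ANALYZERS.getOrDefault(field, UnaryOperator.identity());
            return field + ":" + analyzer.apply(term);
        }
        // Bare term: one clause per configured field, OR'ed together.
        StringJoiner clauses = new StringJoiner(" OR ", "(", ")");
        for (Map.Entry<String, UnaryOperator<String>> e : ANALYZERS.entrySet()) {
            clauses.add(e.getKey() + ":" + e.getValue().apply(userQuery));
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("Foo-Bar"));          // (exact_contents:Foo-Bar OR contents:foobar)
        System.out.println(expand("contents:Foo-Bar")); // contents:foobar
    }
}
```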
  • Ohaya at Jul 28, 2009 at 6:13 pm
    Matthew,

    Ok, thanks for the clarifications.

    When I have some quiet time, I'll try to re-do the tests I did earlier and post back if I have any questions.

    Thanks again,
    Jim

Discussion Overview
group: java-user @
categories: lucene
posted: Jul 28, '09 at 12:55p
active: Jul 28, '09 at 6:13p
posts: 12
users: 3
website: lucene.apache.org

3 users in discussion

Ohaya: 5 posts, Matthew Hall: 5 posts, Ian Lea: 2 posts

site design / logo © 2022 Grokbase