FAQ
I have a question about empty fields. I want to run a query that will
search against a few particular fields for the query term but then
also check whether two other fields have any value at all. I.e., I
want to search for a set of records but don't want to return a record if
that record has blank first and last name fields. Any help would be
greatly appreciated.

Les

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Erick Erickson at May 10, 2007 at 1:03 pm
    You could create a Lucene Filter that had a bit for each document that
    had a first or last name and use that at query time to restrict your
    results appropriately. You could create this at startup time or at
    query time. See CachingWrapperFilter for a way to cache it.
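
    The "one bit per document" idea can be sketched in plain Java. This is only a model of the technique, not Lucene code: the class NameFilterSketch, its method names, and the use of a Map per document are assumptions made for illustration, and the field names firstname and lastname are taken from the question.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Plain-Java model of a per-document filter; it does not use the Lucene API.
final class NameFilterSketch {

    // One bit per document: set when the document has a non-blank first or last name.
    static BitSet buildHasNameBits(List<Map<String, String>> docs) {
        BitSet bits = new BitSet(docs.size());
        for (int docId = 0; docId < docs.size(); docId++) {
            Map<String, String> doc = docs.get(docId);
            if (hasText(doc.get("firstname")) || hasText(doc.get("lastname"))) {
                bits.set(docId);
            }
        }
        return bits;
    }

    // Keep only the hits whose bit is set, preserving the original hit order.
    static List<Integer> restrict(List<Integer> hitDocIds, BitSet allowed) {
        List<Integer> kept = new ArrayList<>();
        for (int docId : hitDocIds) {
            if (allowed.get(docId)) {
                kept.add(docId);
            }
        }
        return kept;
    }

    // A field counts as present only if it is non-null and not just whitespace.
    private static boolean hasText(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```

    Building the bits once and reusing them for every query is what makes this cheap at search time.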


    Another approach would be to add a dummy field to each document,
    something like HASFIRSTORLASTNAME. At index time, when
    you index a document, if it has a first or last name, put "yes" in the
    field. Otherwise, put "no".

    Then, at search time, add a +HASFIRSTORLASTNAME:yes clause to the
    query.

    You could add as many states to this field as you want.
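
    A minimal sketch of the dummy-field approach, in plain Java with no Lucene dependency. The class HasNameFlagSketch and its helper names are hypothetical; only the field name HASFIRSTORLASTNAME and the "yes"/"no" values come from the suggestion above. How the value gets added to the document and how the query string is parsed are left out.

```java
// Plain-Java sketch; the helper names are illustrative, not a Lucene API.
final class HasNameFlagSketch {

    static final String FLAG_FIELD = "HASFIRSTORLASTNAME";

    // Value to index in the flag field for one document.
    static String flagValue(String firstName, String lastName) {
        return (isBlank(firstName) && isBlank(lastName)) ? "no" : "yes";
    }

    // Wrap the user's query and require the flag clause alongside it.
    static String withNameRequirement(String userQuery) {
        return "+(" + userQuery + ") +" + FLAG_FIELD + ":yes";
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

    The flag field should be indexed untokenized so that "yes" and "no" stay single terms.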


    Erick

  • Les Fletcher at May 10, 2007 at 7:18 pm
    I like the idea of the filter since I am making heavy use of filters for
    this particular query, but how would one go about constructing it
    efficiently at query time? All I can see is hacking around not being
    able to use the * as the first character.

    Les

  • Les Fletcher at May 10, 2007 at 8:30 pm
    Would a good solution be to insert a secret string into blank fields
    that represents blank? That way you could search for:

    firstname:(-Xd8fgrSjg) lastname:(-Xd8fgrSjg) some query string

    Les

  • Markharw00d at May 10, 2007 at 9:07 pm
    Here's a way to do it using the XML query parser in contrib....
    1) Create this query.xsl file (note use of cached double negative filter)

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/Document">
        <FilteredQuery>
          <Query>
            <UserQuery><xsl:value-of select="query"/></UserQuery>
          </Query>
          <Filter>
            <CachedFilter>
              <BooleanFilter>
                <Clause occurs="mustNot">
                  <RangeFilter fieldName="surname" lowerTerm="a" upperTerm="z"/>
                </Clause>
                <Clause occurs="mustNot">
                  <RangeFilter fieldName="forename" lowerTerm="a" upperTerm="z"/>
                </Clause>
              </BooleanFilter>
            </CachedFilter>
          </Filter>
        </FilteredQuery>
      </xsl:template>
    </xsl:stylesheet>

    2) Query as follows:
    // Imports the snippet relies on (Lucene core plus the contrib XML query parser)
    import java.util.Properties;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.xmlparser.CorePlusExtensionsParser;
    import org.apache.lucene.xmlparser.QueryTemplateManager;

    // Setup test data: three documents with a name field, one without
    Analyzer analyzer = new WhitespaceAnalyzer();
    RAMDirectory rd = new RAMDirectory();
    IndexWriter w = new IndexWriter(rd, analyzer, true);
    Document d = new Document();
    d.add(new Field("contents", "foo 1- must not match", Field.Store.YES, Field.Index.TOKENIZED));
    d.add(new Field("surname", "smith", Field.Store.YES, Field.Index.TOKENIZED));
    w.addDocument(d);

    d = new Document();
    d.add(new Field("contents", "foo 2- must not match", Field.Store.YES, Field.Index.TOKENIZED));
    d.add(new Field("forename", "fred", Field.Store.YES, Field.Index.TOKENIZED));
    w.addDocument(d);

    d = new Document();
    d.add(new Field("contents", "foo 3- must not match", Field.Store.YES, Field.Index.TOKENIZED));
    d.add(new Field("forename", "fred", Field.Store.YES, Field.Index.TOKENIZED));
    d.add(new Field("surname", "smith", Field.Store.YES, Field.Index.TOKENIZED));
    w.addDocument(d);

    d = new Document();
    d.add(new Field("contents", "foo 4- must match", Field.Store.YES, Field.Index.TOKENIZED));
    w.addDocument(d);
    w.close();

    IndexSearcher searcher = new IndexSearcher(rd);

    // One-off setup - store these
    QueryTemplateManager qtm = new QueryTemplateManager(
            TestXml.class.getResourceAsStream("query.xsl"));
    CorePlusExtensionsParser cp = new CorePlusExtensionsParser(analyzer,
            new QueryParser("contents", analyzer));

    // Get the user form input
    String queryString = "foo";
    Properties userInput = new Properties();
    userInput.setProperty("query", queryString);

    // Transform the user input into a Lucene XML query, then pass to parser
    Query q = cp.getQuery(qtm.getQueryAsDOM(userInput).getDocumentElement());

    // Print the contents field of every hit
    Hits h = searcher.search(q);
    for (int i = 0; i < h.length(); i++)
    {
        d = h.doc(i);
        System.out.println(d.get("contents"));
    }
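
    The effect of a filter made only of mustNot clauses can be modelled with plain bit sets. This is a sketch of the set arithmetic under the assumption that such a filter starts from "every document" and removes each excluded set; it is not the contrib BooleanFilter source, and the class MustNotFilterSketch is a hypothetical name.

```java
import java.util.BitSet;

// Plain-Java model of combining "must not" filters; no Lucene classes involved.
final class MustNotFilterSketch {

    // Start from "every document", then clear the bits of each excluded set.
    static BitSet noneOf(int maxDoc, BitSet... excluded) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        for (BitSet bits : excluded) {
            result.andNot(bits);
        }
        return result;
    }
}
```

    With the four test documents above numbered 0 to 3, the surname set is {0, 2} and the forename set is {1, 2}, so only document 3 (the one labelled "must match") survives.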

    Cheers
    Mark




  • Erick Erickson at May 10, 2007 at 9:05 pm
    I was going to suggest something about TermEnum/TermDocs, but
    upon reflection that doesn't work so well because you have to
    enumerate all the terms over all the docs for a field. Ouch.

    But one could combine the two approaches. Don't index
    any "special" values in your firstname or lastname fields.
    I suspect this will hurt you down the road...

    Instead, ONLY for those documents that DO have either a first
    or last name, index an orthogonal field, HASNAMES, with
    a single value of "yes" or something. Now you can construct
    your filter efficiently, because you only have to enumerate the
    docs for a single HASNAMES term/value.
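
    Why a single term is cheap can be seen in a toy inverted index. The sketch below is plain Java, not the TermEnum/TermDocs API: the Map of "field:term" to document ids stands in for the index's postings, and the class name SingleTermFilterSketch is made up for the example.

```java
import java.util.BitSet;
import java.util.Map;

// Toy inverted index lookup; stands in for walking one term's postings.
final class SingleTermFilterSketch {

    // postings maps "field:term" to the ids of the documents containing that term.
    static BitSet bitsForTerm(Map<String, int[]> postings, String field, String term, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        int[] docIds = postings.get(field + ":" + term);
        if (docIds != null) {
            for (int docId : docIds) {
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

    Only the postings of HASNAMES:yes are read, so the cost depends on how many documents have names, not on how many distinct names exist.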

    Depending upon how big your index is, you might be able to
    get away with the TermEnum/TermDocs approach by constructing
    your filter at start-up time and caching it away somewhere...

    You *might* also be able to do something at startup time like
    for (each document in the index) {
        get the firstname and lastname. If both null, set your filter bit
    }

    If you use the lazy loading, you may be able to do this without
    loading all of every document....

    I'm curious to know what you settle on and how it works....

    Does this make sense in your application?

    Erick
  • Les Fletcher at May 10, 2007 at 9:43 pm
    Unfortunately, at the moment we don't make good use of Lucene caching, so
    setting up the filter on startup doesn't really work for us right
    now. Maybe a general flag field instead of a hasname field
    would work better and be more general. You could just fill this field
    with any flags that need to be set for the particular document. Each
    flag has a different unique tag that gets concatenated into the field;
    then you can make use of your filters with:

    flagfield:(+hasfirstname +haslastname)

    and the like.
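
    A rough sketch of the flag-field idea in plain Java. The tags hasfirstname and haslastname come from the query above; the class FlagFieldSketch, its methods, and the choice of whitespace as the separator are assumptions, and the matching method only mimics what a query requiring every listed flag would accept.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Plain-Java sketch of a multi-valued flag field; no Lucene classes involved.
final class FlagFieldSketch {

    // Build the whitespace-separated flag field value for one document.
    static String flagsFor(String firstName, String lastName) {
        Set<String> flags = new LinkedHashSet<>();
        if (!isBlank(firstName)) flags.add("hasfirstname");
        if (!isBlank(lastName)) flags.add("haslastname");
        return String.join(" ", flags);
    }

    // True when every required flag appears as a token in the field value.
    static boolean hasAll(String flagFieldValue, String... required) {
        Set<String> present = new HashSet<>(Arrays.asList(flagFieldValue.trim().split("\\s+")));
        return present.containsAll(Arrays.asList(required));
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

    New flags can be added later without changing the field layout, as long as each tag stays a distinct token.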

    How does this sound?

    Les

  • Erick Erickson at May 11, 2007 at 12:02 am
    I've thought about a flag field, and I see no reason why that wouldn't
    work quite well. It all depends, I suppose, upon how ugly it would
    eventually get....

    But about caching, what does making a filter have to do with Lucene
    caching<G>? Sure, there exist Lucene filter caching classes, but
    there's no reason you could not implement, say, a singleton class
    whose first invocation filled in the underlying filter, and then just
    use that in other parts of your program.
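
    One way such a singleton might look, as a plain-Java sketch. The class CachedFilterHolder and the Supplier-based builder are assumptions for illustration; how the bits are actually computed from the index is left to the caller.

```java
import java.util.BitSet;
import java.util.function.Supplier;

// Lazily built, process-wide holder for the filter bits (illustrative only).
final class CachedFilterHolder {

    private static BitSet cached;

    private CachedFilterHolder() {
    }

    // Builds the bit set on the first call only; later calls reuse it.
    // Callers should treat the returned bit set as read-only.
    static synchronized BitSet get(Supplier<BitSet> builder) {
        if (cached == null) {
            cached = builder.get();
        }
        return cached;
    }

    // Drop the cached bits, for example after the index has been reopened.
    static synchronized void invalidate() {
        cached = null;
    }
}
```

    The cached bits go stale when the index changes, so something has to call invalidate() whenever a new reader is opened.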

    Anyway, it sounds like you're well on the way to a solution.

    Good luck!
    Erick

Discussion Overview
group: java-user
categories: lucene
posted: May 10, '07 at 10:35a
active: May 11, '07 at 12:02a
posts: 8
users: 3
website: lucene.apache.org
