Hi,

I have documents that contain volunteer information:

Doc1 :
volunteer krish
volunteer john
volunteer Raj
...

Doc2 :
volunteer krish
volunteer Raj
volunteer Ganesh

Doc3 :
volunteer krish
volunteer Raj

I need to find the documents that have ONLY krish and Raj as volunteers.
In the snapshot above, only Doc3 qualifies, not the first two. With a
boolean query I get all three documents back.

Could you suggest some pointers on how to achieve this?

-- Regards
Ba3
--
View this message in context: http://www.nabble.com/Exclusion-search-tp24600949p24600949.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Shai Erera at Jul 22, 2009 at 7:20 am
    Maybe add to each doc a field numVolunteers and then constrain the query to
    vol:krish and vol:raj and numvol:2 (something like that)?
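A minimal sketch of the count-field idea, written as plain Java that models the matching rule rather than calling Lucene. The class and method names are hypothetical, the field names `vol` and `numvol` follow the post, and the sketch assumes that volunteer names are indexed in lower case and that the count holds the number of distinct volunteers in the document.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java model of the count-field idea. No Lucene classes are used;
// the field names "vol" and "numvol" come from the post and are assumptions about the schema.
class CountFieldSketch {

    // Builds a query string with one required clause per wanted volunteer
    // plus a required clause on the count field.
    static String buildQuery(Set<String> wanted) {
        StringBuilder sb = new StringBuilder();
        for (String name : new TreeSet<>(wanted)) { // sorted so the output is stable
            sb.append("+vol:").append(name).append(' ');
        }
        return sb.append("+numvol:").append(wanted.size()).toString();
    }

    // What the query means: the document holds every wanted volunteer,
    // and its stored count leaves no room for anyone else.
    static boolean matches(Set<String> docVolunteers, int numVol, Set<String> wanted) {
        return docVolunteers.containsAll(wanted) && numVol == wanted.size();
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("krish", "raj"));
        Set<String> doc2 = new HashSet<>(Arrays.asList("krish", "raj", "ganesh"));
        Set<String> doc3 = new HashSet<>(Arrays.asList("krish", "raj"));
        System.out.println(buildQuery(wanted));
        System.out.println("Doc2 matches: " + matches(doc2, doc2.size(), wanted));
        System.out.println("Doc3 matches: " + matches(doc3, doc3.size(), wanted));
    }
}
```

The count only helps if it is written at index time, so this route implies adding the field to every document.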
  • Ba3 at Jul 22, 2009 at 8:46 am
    Thanks for the reply. But the number of documents runs into a few
    thousand, so editing them is not an option.
    Is there any other way to solve this scenario?

    -- Rgds
    Ba3
  • Shai Erera at Jul 22, 2009 at 10:12 am
    Perhaps I misunderstood something, but how do you update a document?

    I mean, if a document contains vol:a, vol:b and vol:c and then you want to
    add vol:d to it, don't you remove the document and add it back?

    If that's what you do, then you can also update the numvols field, right?

    Or do you mean you already have an index with all those documents indexed,
    and you want to search it now? If you cannot rebuild it, may I suggest the
    following: create another index and add documents to it in the same order
    as they were added to the current index. To each document add a 'numvols'
    field. Then use a ParallelReader to search over the two indices in parallel
    with the query I gave before. The two indices should look like this:

    Index 1                        Index 2
    ---------------------------    ---------------
    Doc: vol1, vol2, vol3          Doc: numvols:3
    Doc: vol1, vol4, vol6, vol7    Doc: numvols:4
    Doc: vol5                      Doc: numvols:1
    Doc: vol3, vol8                Doc: numvols:2

    It should work if your index doesn't have deletes. If it does, consider
    optimizing it or calling expungeDeletes.

    If your scenario is different, then perhaps try to explain it more
    accurately.

    Shai
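The two-index layout can be modelled in plain Java to show why the insertion order has to match. This is only a sketch of the alignment idea: it does not call Lucene's ParallelReader, and the class and method names are hypothetical. Document positions in the two lists stand in for document ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of two position-aligned indices; it does not call Lucene.
class ParallelIndexSketch {

    // Builds the second "index": one numvols value per document,
    // written in the same order as the documents of the first index.
    static List<Integer> buildCountIndex(List<Set<String>> mainIndex) {
        List<Integer> counts = new ArrayList<>();
        for (Set<String> doc : mainIndex) {
            counts.add(doc.size());
        }
        return counts;
    }

    // Evaluates "all wanted volunteers AND numvols == wanted.size()" by reading
    // both indices at the same position, which is why the order must match.
    static List<Integer> search(List<Set<String>> mainIndex, List<Integer> countIndex, Set<String> wanted) {
        if (mainIndex.size() != countIndex.size()) {
            throw new IllegalStateException("indices are not aligned");
        }
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < mainIndex.size(); docId++) {
            if (mainIndex.get(docId).containsAll(wanted) && countIndex.get(docId) == wanted.size()) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Set<String>> index1 = new ArrayList<>();
        index1.add(new HashSet<>(Arrays.asList("vol1", "vol2", "vol3")));
        index1.add(new HashSet<>(Arrays.asList("vol1", "vol4", "vol6", "vol7")));
        index1.add(new HashSet<>(Arrays.asList("vol5")));
        index1.add(new HashSet<>(Arrays.asList("vol3", "vol8")));
        List<Integer> index2 = buildCountIndex(index1);
        Set<String> wanted = new HashSet<>(Arrays.asList("vol3", "vol8"));
        System.out.println("numvols per doc: " + index2);
        System.out.println("matching positions: " + search(index1, index2, wanted));
    }
}
```

If a document were missing from one list, every later position would pair the wrong count with the wrong document, which is the reason for the warning about deletes.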
  • Ba3 at Jul 22, 2009 at 10:48 am
    Yes, the documents were already indexed and the documents do not get updated.

    Maintaining an alternate index is a nice solution. Will try it out.
    Thanks for the pointer.

    If there is a solution which can use the same index it would be great!

    --Rgds
    Ba3


  • Steven A Rowe at Jul 22, 2009 at 4:15 pm
    If the number of volunteers is small enough, you could exclude all others in your query, e.g.:

    All volunteers: a, b, c, d, e, f

    Query to include documents containing only volunteers a and b:

    +vol:a +vol:b -vol:c -vol:d -vol:e -vol:f

    Steve
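The exclusion query above can be generated from the full volunteer list. The following is a small sketch with hypothetical class and method names; it only assembles the query string and assumes the names contain no characters that need escaping in the query syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Builds "+vol:x" clauses for wanted volunteers and "-vol:y" clauses for all others.
// The field name "vol" follows the thread; the helper itself is hypothetical.
class ExclusionQuerySketch {

    static String build(List<String> allVolunteers, Set<String> wanted) {
        List<String> clauses = new ArrayList<>();
        for (String name : new TreeSet<>(wanted)) { // required clauses, sorted for stable output
            clauses.add("+vol:" + name);
        }
        for (String name : allVolunteers) {         // prohibited clauses, in list order
            if (!wanted.contains(name)) {
                clauses.add("-vol:" + name);
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("a", "b", "c", "d", "e", "f");
        Set<String> wanted = new HashSet<>(Arrays.asList("a", "b"));
        System.out.println(build(all, wanted));
    }
}
```

The query grows by one clause per volunteer in the whole collection, so this approach is practical only while that list stays short.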
  • Phil Whelan at Jul 22, 2009 at 4:37 pm
    If there are only a few thousand documents, and the number of
    results is quite small, is this a case where post-search filtering
    can be done?

    I have not done anything like this myself with Lucene, so is this a
    bad idea? If not, what would be the best way to do it?
    org.apache.lucene.search.Filter?

    Thanks,
    Phil
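One way to picture post-search filtering is as a pass over the hits in application code, rather than as an org.apache.lucene.search.Filter. The sketch below is plain Java with hypothetical names. It assumes the volunteer names of each hit can be read back (for example because the field is stored) and compares them case-insensitively, since the thread writes both "Raj" and "raj".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Post-search filtering modelled in plain Java; no Lucene classes are used.
class PostFilterSketch {

    // hits maps a document id to the volunteer names read back for that hit.
    // A hit is kept only if its volunteers are exactly the wanted ones.
    static List<Integer> keepExact(Map<Integer, List<String>> hits, Set<String> wanted) {
        Set<String> wantedLower = lower(wanted);
        List<Integer> kept = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> hit : hits.entrySet()) {
            if (lower(hit.getValue()).equals(wantedLower)) {
                kept.add(hit.getKey());
            }
        }
        return kept;
    }

    // Lower-cases names and drops duplicates so the comparison is set equality.
    private static Set<String> lower(Collection<String> names) {
        Set<String> out = new HashSet<>();
        for (String name : names) {
            out.add(name.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> hits = new LinkedHashMap<>();
        hits.put(1, Arrays.asList("krish", "john", "Raj"));
        hits.put(2, Arrays.asList("krish", "Raj", "Ganesh"));
        hits.put(3, Arrays.asList("krish", "Raj"));
        Set<String> wanted = new HashSet<>(Arrays.asList("krish", "raj"));
        System.out.println("kept: " + keepExact(hits, wanted));
    }
}
```

The cost is one read per hit, which fits the case described here: a small corpus and a short result list.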

  • Erick Erickson at Jul 22, 2009 at 8:05 pm
    Can you re-index the documents? It's much simpler to just count the
    number of volunteers *as you add fields to the
    doc to index it* and then add the count field after you're
    done parsing the document. Your corpus is small, so this
    shouldn't take very long.

    Or I completely misunderstand.

    Erick
  • Ba3 at Jul 24, 2009 at 6:21 am
    Thanks to all for the replies.
    I thought of a mechanism that achieves the result without reindexing or
    updating the documents.

    search1 = boolean query of (vol krish + vol Raj)
    search2 = boolean query(vol - (vol krish and vol Raj))

    Removing the results of search2 from search1 gave the desired result:
    the documents having only krish and Raj as volunteers.

    -- Regards
    Ba3
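The post does not show how search2 was expressed as a query, so the following plain-Java sketch rests on an assumption: search2 returns every document that has at least one volunteer other than krish and Raj. Under that reading, the final step is a set difference. All names are hypothetical and no Lucene classes are used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Models the two searches as sets of document ids and subtracts one from the other.
class TwoSearchSketch {

    // search1: documents that contain every wanted volunteer.
    static Set<Integer> search1(Map<Integer, Set<String>> docs, Set<String> wanted) {
        Set<Integer> ids = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> doc : docs.entrySet()) {
            if (doc.getValue().containsAll(wanted)) {
                ids.add(doc.getKey());
            }
        }
        return ids;
    }

    // search2 (assumed meaning): documents with at least one volunteer outside the wanted set.
    static Set<Integer> search2(Map<Integer, Set<String>> docs, Set<String> wanted) {
        Set<Integer> ids = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> doc : docs.entrySet()) {
            if (!wanted.containsAll(doc.getValue())) {
                ids.add(doc.getKey());
            }
        }
        return ids;
    }

    // Removes the results of search2 from search1.
    static Set<Integer> onlyWanted(Map<Integer, Set<String>> docs, Set<String> wanted) {
        Set<Integer> result = new TreeSet<>(search1(docs, wanted));
        result.removeAll(search2(docs, wanted));
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> docs = new LinkedHashMap<>();
        docs.put(1, new HashSet<>(Arrays.asList("krish", "john", "raj")));
        docs.put(2, new HashSet<>(Arrays.asList("krish", "raj", "ganesh")));
        docs.put(3, new HashSet<>(Arrays.asList("krish", "raj")));
        Set<String> wanted = new HashSet<>(Arrays.asList("krish", "raj"));
        System.out.println("search1: " + search1(docs, wanted));
        System.out.println("search2: " + search2(docs, wanted));
        System.out.println("result:  " + onlyWanted(docs, wanted));
    }
}
```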




Discussion Overview
group: java-user
categories: lucene
posted: Jul 22, '09 at 6:49a
active: Jul 24, '09 at 6:21a
posts: 9
users: 5
website: lucene.apache.org
