FAQ
Hello there! I'm trying to query for a specific document in an efficient way.
My index is structured so that it has an id field which is a unique
key for the whole index. When updating/removing a document I was
searching for my id using a Searcher and a TermQuery. But from reading the list
it seems that this carries a bit of overhead, and that using
reader.termDocs(term) would be faster.

Here's a piece of code:

private void deleteFromIndex(String id){
    Term term = new Term("id",id);
    IndexReader reader = readerManager.getIndexReader();
    TermDocs termDocs = null;
    try {
        termDocs = reader.termDocs(term);
        while(termDocs.next()){
            int index = termDocs.doc();
            if(reader.document(index).get("id").equals(id)){
                reader.deleteDocument(index);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }finally{
        if(termDocs != null){
            try {
                termDocs.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

The problem is that the reader is not returning any docs for the term. When I
switch to a query it works. My documents have all been indexed using
BrazilianAnalyzer; I don't know if that could be the reason.

Regards

--
"In a world without fences and walls, who needs Gates and Windows?"


  • Erick Erickson at Jun 20, 2008 at 5:24 pm
    A couple of questions:
    1> I assume by "not returning any docs" you mean that you
    never get into your while loop. Is that true?
    2> I'm a little suspicious of the field labeled "id" and whether
    it's at all possible that this is getting confused with the
    internal Lucene doc ID. This is a wild shot in the dark
    influenced by the fact that I'm working with Groovy where
    property access is equivalent to get.....
    3> What is the form of your unique key? If it's all numeric I
    don't see a problem, but if it has any non-numeric
    characters in it, then your analyzer will probably
    lower-case the term whereas your Term constructor
    won't. That would also explain why searching
    would work.

    If none of this is remotely helpful, could you post a few
    examples of keys that work and those that don't?

    Best
    Erick
  • Karl Wettin at Jun 21, 2008 at 5:09 pm

    On 20 Jun 2008 at 18:12, Vinicius Carvalho wrote:

    > Hello there! I trying to query for a specific document on a
    > efficient way.

    Hi Vinicius,

    > termDocs = reader.termDocs(term);
    > while(termDocs.next()){
    > int index = termDocs.doc();
    > if(reader.document(index).get("id").equals(id)){
    > reader.deleteDocument(index);
    > }
    > }

    Iterating documents and string comparing stored values is not very
    efficient. Use a query instead, something like this:

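    BooleanQuery query = new BooleanQuery();
    query.add(new TermQuery(term), BooleanClause.Occur.MUST);
    query.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.MUST);
    searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            try {
                reader.deleteDocument(doc);
            } catch (IOException e) {
                // collect() cannot throw checked exceptions
                throw new RuntimeException(e);
            }
        }
    });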


    karl

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Vinicius Carvalho at Jun 23, 2008 at 1:56 pm
    I'm sorry, the problem was with the way the id was being indexed. It was
    marked as tokenized, so when I searched for its untokenized form I was not
    getting the doc. Now everything works fine :)

    Regards
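
The tokenized-versus-untokenized mismatch described above can be sketched with a toy inverted index in plain Java. This is only an illustration under stated assumptions: `ToyIndex`, its `analyze` method (lower-casing and splitting on non-alphanumerics) and `termLookup` are hypothetical names, not Lucene's API, and the real BrazilianAnalyzer does more than this stand-in. The point is that an exact term lookup only matches what was actually written to the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy inverted index; a stand-in for illustration, not Lucene's API. */
final class ToyIndex {
    // "field:token" -> ids of the documents containing that token
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Hypothetical analyzer: lower-cases and splits on non-alphanumerics. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Indexes one field value, either analyzed or as a single verbatim token. */
    int add(String field, String value, boolean tokenized) {
        int docId = nextDocId++;
        List<String> tokens = tokenized ? analyze(value) : Collections.singletonList(value);
        for (String token : tokens) {
            postings.computeIfAbsent(field + ":" + token, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    /** Exact term lookup: the text is not analyzed, like a hand-built term. */
    List<Integer> termLookup(String field, String text) {
        return postings.getOrDefault(field + ":" + text, Collections.<Integer>emptyList());
    }

    public static void main(String[] args) {
        ToyIndex tokenized = new ToyIndex();
        tokenized.add("id", "Doc-42", true);   // indexed as the tokens "doc" and "42"
        System.out.println(tokenized.termLookup("id", "Doc-42"));   // prints []
        System.out.println(tokenized.termLookup("id", "doc"));      // prints [0]

        ToyIndex untokenized = new ToyIndex();
        untokenized.add("id", "Doc-42", false); // indexed as the single token "Doc-42"
        System.out.println(untokenized.termLookup("id", "Doc-42")); // prints [0]
    }
}
```

With the id indexed verbatim, the hand-built term and the indexed term are the same string, which is why a unique key is usually better kept out of the analyzer.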
  • Chris Hostetter at Jun 24, 2008 at 11:57 pm
    :
    : > termDocs = reader.termDocs(term);
    : > while(termDocs.next()){
    : > int index = termDocs.doc();
    : > if(reader.document(index).get("id").equals(id)){
    : > reader.deleteDocument(index);
    : > }
    : > }
    :
    : Iterating documents and string comparing stored values is not very efficient.
    : Use a query instead, something like this:

    More specifically: there is no reason at all to look at the stored value --
    just ensure that the *indexed* value is the "unique id" (which TermDocs
    will already ensure for you).

    : BooleanQuery query = new BooleanQuery();
    : query.add(new TermQuery(term), BooleanClause.Occur.MUST);
    : query.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.MUST);

    Note: in the original code, "term" was new Term("id", id), so the second
    clause is unneeded ... the goal was to iterate the doc(s) matching a unique(?) id.



    -Hoss
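
The point about relying on the indexed value alone can be sketched with a toy model in plain Java. The names here (`ToyDeleter`, `deleteByTerm`, `isDeleted`) are hypothetical and this is not Lucene's API; it assumes the id is indexed verbatim as a single term. Every doc id in a term's postings already matches that term, so the loop can mark documents deleted without loading any stored field.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of delete-by-term; illustrative only, not Lucene's API. */
final class ToyDeleter {
    // indexed term -> ids of the documents containing it
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // doc ids that have been marked deleted
    private final BitSet deleted = new BitSet();
    private int nextDocId = 0;

    /** Indexes the id verbatim as a single term and returns the new doc id. */
    int add(String id) {
        int docId = nextDocId++;
        postings.computeIfAbsent(id, k -> new ArrayList<>()).add(docId);
        return docId;
    }

    /** Marks every doc in the term's postings as deleted; no stored field is read. */
    int deleteByTerm(String id) {
        int count = 0;
        for (int docId : postings.getOrDefault(id, Collections.<Integer>emptyList())) {
            if (!deleted.get(docId)) {
                deleted.set(docId);
                count++;
            }
        }
        return count;
    }

    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    public static void main(String[] args) {
        ToyDeleter index = new ToyDeleter();
        int a = index.add("A-1");
        int b = index.add("B-2");
        System.out.println(index.deleteByTerm("A-1")); // prints 1
        System.out.println(index.isDeleted(a));        // prints true
        System.out.println(index.isDeleted(b));        // prints false
    }
}
```

In Lucene itself, if memory of the 2.x API serves, `IndexReader.deleteDocuments(Term)` performs this kind of delete-by-term in a single call and returns the number of documents deleted, which would make the hand-written loop unnecessary.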



Discussion Overview
group: java-user
categories: lucene
posted: Jun 20, '08 at 4:13p
active: Jun 24, '08 at 11:57p
posts: 5
users: 4
website: lucene.apache.org
