FAQ
I have an environment where we have indexed a DB of about 6 million entries
with Lucene, and each row has 25 columns. 20 columns hold integer codes
used as filters (indexed/unstored), and the other 5 hold (very) large
texts (also indexed/unstored). Currently the search I'm doing looks like this:

Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    // Retrieves the whole Document for this hit, although only one field is used
    Document doc = hits.doc(i);
    String s = doc.get("fieldWanted");
    // does everything with the result, etc
}

We are trying to reduce memory usage, however. Is it possible to return
a Document object with just the Fields I really need? In the example,
each Document has 25 fields, and I need just one. Would this
theoretically make any difference?




--

Marcelo Frantz Schneider
SIC - TCO - Tecnologia em Engenharia do Conhecimento
DÍGITRO TECNOLOGIA
E-mail: marcelo.schneider@digitro.com.br
Site: www.digitro.com


--
This message has been scanned by Dígitro's antivirus system and is
believed to be free of danger.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Daan de Wit at Jun 11, 2008 at 11:44 am
    This is possible: you need to provide a FieldSelector to IndexReader#document(docId, selector). This won't work with Hits, though, because Hits does not expose the document number, so you need to roll your own solution using TopDocs or HitCollector. For more information, see the discussion in this issue: https://issues.apache.org/jira/browse/LUCENE-1290

    Kind regards,
    Daan de Wit

  • Daan de Wit at Jun 11, 2008 at 12:05 pm
    But I doubt this will solve your memory issue, because non-stored fields are not read when retrieving the document anyway.

  • Marcelo Schneider at Jun 11, 2008 at 1:08 pm

    Daan de Wit wrote:
    But I doubt this will solve your memory issue because nonstored fields are not read when retrieving the document.
    Thanks for the fast reply, Daan! Just to clarify: if all the
    code fields (filters) were stored, would it then make a difference?

  • Daan de Wit at Jun 11, 2008 at 1:22 pm
    Yep. With a FieldSelector you can restrict which fields are loaded. You can also specify how each field is loaded: normally, lazily, or load the field and then stop loading the document (i.e. skip the remaining fields).
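
The loading modes described in this reply can be pictured with a small self-contained model. This is only a sketch of the idea, not Lucene code: `LoadMode`, `Selector` and `StoredFieldsModel` are hypothetical stand-ins invented for illustration, and the map of stored fields stands in for what would be read from the index. Non-stored fields never appear in that map at all, which is why selecting fields only helps once the fields in question are stored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the decisions a selector can make per field.
enum LoadMode { LOAD, LAZY_LOAD, NO_LOAD, LOAD_AND_BREAK }

// Hypothetical selector interface: one decision per stored field name.
interface Selector {
    LoadMode accept(String fieldName);
}

final class StoredFieldsModel {
    // Counts how many field values were actually materialised.
    static int valuesRead = 0;

    // Simulates the (potentially expensive) read of one stored value.
    static String readValue(String raw) {
        valuesRead++;
        return raw;
    }

    // Walks the stored fields of one document in order and keeps only what
    // the selector asks for. Lazy fields are wrapped in a Supplier so the
    // value is read only if and when get() is called.
    static Map<String, Supplier<String>> load(Map<String, String> stored, Selector selector) {
        Map<String, Supplier<String>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            LoadMode mode = selector.accept(e.getKey());
            if (mode == LoadMode.NO_LOAD) {
                continue; // skipped: the value is never read
            }
            if (mode == LoadMode.LAZY_LOAD) {
                String raw = e.getValue();
                doc.put(e.getKey(), () -> readValue(raw));
            } else {
                String value = readValue(e.getValue());
                doc.put(e.getKey(), () -> value);
            }
            if (mode == LoadMode.LOAD_AND_BREAK) {
                break; // stop here: remaining fields are not even visited
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        // Only stored fields are present; unstored fields would not be here.
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("code1", "17");
        stored.put("fieldWanted", "some stored text");
        stored.put("code2", "42");

        // Load exactly one field, then stop reading the document.
        Selector onlyWanted = name ->
            name.equals("fieldWanted") ? LoadMode.LOAD_AND_BREAK : LoadMode.NO_LOAD;

        Map<String, Supplier<String>> doc = load(stored, onlyWanted);
        System.out.println(doc.keySet() + " values read: " + valuesRead);
    }
}
```

In this model the saving comes entirely from stored values that are skipped or deferred, so the benefit grows with the number and size of stored fields per document.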

  • Grant Ingersoll at Jun 11, 2008 at 12:23 pm
    For the record, Hits.id(int i) returns the document number. Note,
    though, that Hits is now deprecated, as pointed out by the link to
    LUCENE-1290, so going the TopDocs route is probably better anyway.

    -Grant
    --------------------------
    Grant Ingersoll
    http://www.lucidimagination.com

    Lucene Helpful Hints:
    http://wiki.apache.org/lucene-java/BasicsOfPerformance
    http://wiki.apache.org/lucene-java/LuceneFAQ
  • Alex at Jun 11, 2008 at 2:42 pm
    If you have many terms across the fields, you might want to call
    IndexReader's setTermInfosIndexDivisor() method. It reduces the
    in-memory term infos used to look up idf, at the cost of a (slightly)
    slower search.
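
The memory/speed trade-off mentioned here can be illustrated with a toy model. It is a sketch under stated assumptions, not Lucene's implementation: it assumes a sorted term dictionary of which only every k-th entry is held in memory, so a larger divisor shrinks the in-memory sample but lengthens the sequential scan needed to reach a term. `SampledTermIndex`, the interval of 16 and the generated term names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of a sampled term index; all names here are hypothetical.
final class SampledTermIndex {
    private final List<String> sortedTerms;                  // stands in for the full term dictionary
    private final List<String> inMemory = new ArrayList<>(); // every step-th term only
    private final int step;
    int lastScanLength; // dictionary entries scanned by the most recent lookup

    SampledTermIndex(List<String> sortedTerms, int interval, int divisor) {
        this.sortedTerms = sortedTerms;
        this.step = interval * divisor;
        for (int i = 0; i < sortedTerms.size(); i += step) {
            inMemory.add(sortedTerms.get(i));
        }
    }

    int inMemorySize() {
        return inMemory.size();
    }

    // Returns the position of the term in the dictionary, or -1 if absent.
    int lookup(String term) {
        lastScanLength = 0;
        int slot = Collections.binarySearch(inMemory, term);
        if (slot < 0) {
            slot = -slot - 2; // greatest sampled term that sorts before the target
        }
        if (slot < 0) {
            return -1; // target sorts before the first dictionary entry
        }
        int start = slot * step;
        int end = Math.min(start + step, sortedTerms.size());
        // Sequential scan from the sampled entry, as a stand-in for a disk scan.
        for (int i = start; i < end; i++) {
            lastScanLength++;
            int cmp = sortedTerms.get(i).compareTo(term);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break; // passed the place where the term would be
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            terms.add(String.format("term%04d", i)); // zero-padded, so already sorted
        }
        SampledTermIndex dense = new SampledTermIndex(terms, 16, 1);
        SampledTermIndex sparse = new SampledTermIndex(terms, 16, 4);

        dense.lookup("term0063");
        sparse.lookup("term0063");
        System.out.println("in memory: " + dense.inMemorySize() + " vs " + sparse.inMemorySize());
        System.out.println("scanned:   " + dense.lastScanLength + " vs " + sparse.lastScanLength);
    }
}
```

With a divisor of 4 the model keeps roughly a quarter of the sampled terms in memory, and the worst-case scan per lookup becomes four times longer, which matches the "less memory, slightly slower search" description.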



Discussion Overview
group: java-user @
categories: lucene
posted: Jun 11, '08 at 11:30a
active: Jun 11, '08 at 2:42p
posts: 7
users: 4
website: lucene.apache.org
