Hi,

I indexed a large number of large documents, but I did not index the
documents themselves.
Now I am interested in getting the vector (i.e., the indexed terms and their
frequencies) of that indexed but unstored field.
doc.getField(fieldname) returns null.
How can I get the data? It must be there, since it's part of the index, or
am I wrong?

I would be grateful for a quick answer (I need to submit data for a
conference this weekend).
Thanks,
Nir.
--
View this message in context: http://www.nabble.com/Get-the-TokenStream-of-an-indexed-but-unstored-field-tf4211430.html#a11980001
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


  • Testn at Aug 3, 2007 at 11:37 am
    You can use IndexReader.getTermFreqVectors(int n) to get all terms and their
    frequencies. Make sure that, when you create the index, you choose to store
    term vectors by specifying a Field.TermVector option.
    Check out http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf



    tierecke wrote:
    Hi,

    I indexed a large number of large documents, but I did not store the
    documents themselves, just indexed them.
    Now I am interested in getting the vector (i.e., the indexed terms and their
    frequencies) of that indexed but unstored field.
    doc.getField(fieldname) returns null.
    How can I get the data? It must be there, since it's part of the index,
    or am I wrong?

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
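Testn's suggestion, sketched. In Lucene 2.x, `IndexReader.getTermFreqVector(docId, fieldName)` returns the `TermFreqVector` for one field (and `getTermFreqVectors(docId)` returns an array covering every field that has vectors); its `getTerms()` and `getTermFrequencies()` methods return parallel arrays. So that the example runs without Lucene on the classpath, the class below is a hypothetical stand-in with the same two accessors, built from raw text by a naive tokenizer that is an assumption of this sketch, not a Lucene `Analyzer`. Inside Lucene nothing is rebuilt: the vector is read back from the index, which only works if the field was indexed with a `Field.TermVector` option.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for Lucene's TermFreqVector (NOT Lucene code): the distinct
 * terms of one field and their counts, exposed as two parallel arrays.
 */
final class ToyTermFreqVector {
    private final String[] terms; // distinct terms, kept sorted in this sketch
    private final int[] freqs;    // freqs[i] is the count of terms[i]

    ToyTermFreqVector(String text) {
        // Assumption: lowercase and split on non-letters/digits instead of a real Analyzer.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        int i = 0;
        for (int f : counts.values()) { // TreeMap iterates in key order, matching terms[]
            freqs[i++] = f;
        }
    }

    String[] getTerms() { return terms.clone(); }

    int[] getTermFrequencies() { return freqs.clone(); }

    /** Count of one term, or 0 if absent (binary search relies on terms[] being sorted). */
    int freqOf(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? freqs[idx] : 0;
    }
}
```

For example, `new ToyTermFreqVector("The quick fox; the lazy dog. The end!")` yields the terms `dog, end, fox, lazy, quick, the` with the frequencies `1, 1, 1, 1, 1, 3`.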
  • Shailendra Mudgal at Nov 6, 2007 at 8:51 am
    Hi,
    If we did not set this flag while indexing, is there any other way to
    get this information, i.e., the TermFreqVector for a document?
  • Karl Wettin at Nov 6, 2007 at 11:36 am

    On 6 Nov 2007, at 09:51, Shailendra Mudgal wrote:

    Hi,
    If we did not set this flag while indexing, is there any other way to
    get this information, i.e., the TermFreqVector for a document?
    See TermVectorAccessor in JIRA.

    http://issues.apache.org/jira/browse/LUCENE-1016

    The highlighter also has some ad hoc code for extracting the data from
    the inverted index using TermEnum and TermDocs. It can, however, take
    quite some time.

    --
    karl


    On 8/3/07, testn wrote:


    you can use IndexReader.getTermFreqVectors(int n) to get all terms
    and
    their
    frequencies. Make sure when you create an index, you choose option to
    store
    it by specifying Field.TermVector option.
    Check out http://www.cnlp.org/presentations/slides/
    AdvancedLuceneEU.pdf



    tierecke wrote:
    Hi,

    I indexed a large number of large documents, but I did not store the
    document themselves, just indexed them.
    Now I am interested in getting the vector (i.e.: the terms
    indexed and the
    frequency) of that indexed but unstored field.
    doc.getField (fieldname) returns null.
    How can I get the data? It must be there, since it's a part of the index,
    or am I wrong?

    Would be grateful for a quick result (need to submit data for a
    conference
    this weekend).
    thanks,
    Nir.
    --
    View this message in context:
    http://www.nabble.com/Get-the-terms-and-frequency-vector-of-an-
    indexed-but-unstored-field-tf4211430.html#a11981677
    Sent from the Lucene - Java Users mailing list archive at Nabble.com.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
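Karl's point about rebuilding vectors from the inverted index can be modelled in a few lines. Without stored vectors, the index only maps each term to the documents containing it, so reconstructing one document's terms means visiting every term and checking its postings for that document. In Lucene 2.x that walk would use the `TermEnum` from `IndexReader.terms()` and the `TermDocs` from `IndexReader.termDocs()` (positioned with `seek(...)` and `skipTo(docId)`), also filtering on each term's field. The class below is a hypothetical in-memory model of that loop for a single field, not the LUCENE-1016 `TermVectorAccessor` code; its cost grows with the total number of terms in the index, which is why the approach can take quite some time on a large index.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory inverted index for ONE field (NOT Lucene code):
 * term -> (docId -> frequency), the only form the data takes when no
 * term vectors were stored.
 */
final class ToyInvertedIndex {
    private final SortedMap<String, SortedMap<Integer, Integer>> postings = new TreeMap<>();

    /** Index already-analyzed tokens for a document. */
    void add(int docId, String... tokens) {
        for (String t : tokens) {
            postings.computeIfAbsent(t, k -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    /**
     * Rebuild one document's term -> frequency map by visiting EVERY term,
     * analogous to iterating a TermEnum and probing each term's TermDocs.
     */
    SortedMap<String, Integer> reconstruct(int docId) {
        SortedMap<String, Integer> vector = new TreeMap<>();
        for (Map.Entry<String, SortedMap<Integer, Integer>> e : postings.entrySet()) {
            Integer freq = e.getValue().get(docId); // "does this term occur in docId?"
            if (freq != null) {
                vector.put(e.getKey(), freq);
            }
        }
        return vector;
    }

    /** Number of distinct terms, i.e. how many entries reconstruct() has to visit. */
    int termCount() {
        return postings.size();
    }
}
```

The rebuilt vector contains the indexed (analyzed) terms, not the original text, so this route cannot give back an unstored field's original content.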
  • Tierecke at Aug 3, 2007 at 1:35 pm
    Thanks a lot, that works 100%!
    Fortunately, I did use the flag telling Lucene to store the term
    frequency vector. Otherwise, I'd have to reindex 77GB right now... :-)
  • Erick Erickson at Aug 3, 2007 at 2:15 pm
    <<<I indexed a large number of large documents, but I did not index the
    documents themselves.>>>

    This is really confusing, since it's self-contradictory. Could you
    post the lines where you call document.add() for the fields in
    question?

    Best
    Erick
  • Tierecke at Aug 3, 2007 at 2:30 pm
    I fixed my question later: I meant that I did not STORE the documents themselves.
    Anyway, the issue is already solved, thanks to testn.
    But now I have new (and, for me, hard) questions.
    Thanks a lot!
  • Kai Hu at Aug 6, 2007 at 1:40 am
    So you used the flag telling Lucene to store the term vector? Does that mean you used Field.Store.YES to store the large data? Can it reduce the data's size?

    -----Original Message-----
    From: tierecke
    Sent: Friday, August 3, 2007 21:35
    To: java-user@lucene.apache.org
    Subject: Re: Get the terms and frequency vector of an indexed but unstored field
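Kai Hu's question mixes two separate options. In Lucene 2.x, `Field.Store` decides whether the original text is kept (and so returned by `doc.getField`), while `Field.TermVector` decides whether a per-document list of terms and frequencies is kept; the `Field` constructor that takes a `Field.Store`, a `Field.Index` and a `Field.TermVector` lets you combine `Field.Store.NO` with `Field.TermVector.YES`, which matches the setup described in this thread. The toy class below (hypothetical, not Lucene's `Field`) models that independence. It also shows that a term vector records only which terms occur and how often, so the original text, including word order, cannot be rebuilt from it. The sketch makes no claim about on-disk sizes; whether vectors end up smaller than the stored text depends on the data.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical toy field (NOT Lucene's Field class): storing the original text
 * and storing a term vector are independent choices.
 */
final class ToyField {
    final String storedValue;          // kept only if store == true (cf. Field.Store.YES)
    final Map<String, Integer> vector; // kept only if termVector == true (cf. Field.TermVector.YES)

    ToyField(String text, boolean store, boolean termVector) {
        this.storedValue = store ? text : null;
        this.vector = termVector ? Collections.unmodifiableMap(countTerms(text)) : null;
    }

    /** Naive whitespace tokenizer standing in for an Analyzer (an assumption of this sketch). */
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!tok.isEmpty()) {
                counts.merge(tok, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Here `new ToyField("to be or not to be", false, true)` has no stored value, much like `doc.getField(...)` returning null in the original question, yet its vector is `{be=2, not=1, or=1, to=2}`; and "be to be" and "to be be" produce identical vectors.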
  • Dmitry at Aug 6, 2007 at 2:23 am
    What is the advantage of using a term
    frequency vector?

    thanks,
    DT
    www.ejinz.com Search News

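Dmitry's question goes unanswered in the thread. One illustration (chosen here, not taken from the thread) is that a stored term vector lets you work with a document's analyzed terms and counts directly, without re-reading or re-analyzing its text; the toy below, for instance, compares two documents by the cosine of their frequency vectors. The `Map<String, Integer>` input is a hypothetical representation; with Lucene 2.x you would first copy a `TermFreqVector`'s `getTerms()` and `getTermFrequencies()` arrays into such a map.

```java
import java.util.Map;

/** Hypothetical helper (NOT Lucene code): cosine similarity of two term -> frequency maps. */
final class TermVectorSimilarity {
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other; // only shared terms contribute
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0 || nb == 0) ? 0.0 : dot / (na * nb); // empty vector: define as 0
    }

    /** Euclidean length of a frequency vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }
}
```

Identical vectors score 1.0 and vectors with no terms in common score 0.0.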

Discussion Overview
group: java-user
categories: lucene
posted: Aug 3, '07 at 9:18 am
active: Nov 6, '07 at 11:36 am
posts: 9
users: 7
website: lucene.apache.org
