Hello,

I am trying to use Lucene to run some experiments on the distribution of words
in documents. Using the TermPositionVector, one can compute statistics
about word positions (mean, standard deviation, etc.).

Is it possible to store such statistical values directly in the Lucene
index?

For example, I want to save the statistical values of each Term during
indexing and retrieve them later at query time.

Term => <docNum, freq, <X1, X2, ..., Xn>>

where <X1, X2, ..., Xn> are statistical values about the Term's positions
in docNum.

Should I modify the IndexWriter.java and IndexReader.java classes?

Thank you

pgaleas
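
The position statistics mentioned in the question could be computed along the following lines. This is only a minimal sketch: the class name `PositionStats` is hypothetical, the `positions` array is assumed to be the kind of `int[]` that `TermPositionVector.getTermPositions(int)` returns for one term in one document, and the population (not sample) standard deviation is an arbitrary choice.

```java
// Hypothetical helper, not part of Lucene.
final class PositionStats {
    private PositionStats() {}

    /**
     * Returns {mean, population standard deviation} of the given term
     * positions. For an empty or null array both values are NaN.
     */
    static double[] compute(int[] positions) {
        if (positions == null || positions.length == 0) {
            return new double[] {Double.NaN, Double.NaN};
        }
        double sum = 0.0;
        for (int p : positions) {
            sum += p;
        }
        double mean = sum / positions.length;

        double squaredDiffs = 0.0;
        for (int p : positions) {
            double d = p - mean;
            squaredDiffs += d * d;
        }
        double stdDev = Math.sqrt(squaredDiffs / positions.length);
        return new double[] {mean, stdDev};
    }
}
```

The returned array corresponds to the <X1, X2, ..., Xn> values in the notation above, here with n = 2.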


  • Patricio Galeas at Jun 8, 2006 at 7:54 pm
  • Grant Ingersoll at Jun 9, 2006 at 11:51 am
    Hi Patricio,

    As of now, I don't think this is possible. However, we are slowly but
    surely working on similar problems. Please feel free to add your two
    cents to http://wiki.apache.org/jakarta-lucene/FlexibleIndexing, as we
    are considering several new ideas for making indexing more
    flexible.

    For now, I guess you will need to store them elsewhere. Or, if you have
    thoughts on what to do, you could come up with an implementation and submit
    a patch.

    -Grant

    --

    Grant Ingersoll
    Sr. Software Engineer
    Center for Natural Language Processing
    Syracuse University
    School of Information Studies
    335 Hinds Hall
    Syracuse, NY 13244

    http://www.cnlp.org
    Voice: 315-443-5484
    Fax: 315-443-6886
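
Grant's suggestion to store the values elsewhere could be sketched as a small side store kept next to the index. Everything here is an assumption for illustration: `TermStatsStore` and its methods are hypothetical names, the store is in-memory only, and it is keyed by Lucene's document number, which can change when segments with deleted documents are merged, so a stable application-level ID stored in a field may be a safer key in practice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical side store, not part of Lucene: maps (term, docNum) to the
// statistical values <X1, ..., Xn> computed at indexing time.
final class TermStatsStore {
    private final Map<String, double[]> stats = new HashMap<>();

    // docNum is numeric, so the first ':' always separates it from the term.
    private static String key(String term, int docNum) {
        return docNum + ":" + term;
    }

    /** Saves the values for a term in a document (call during indexing). */
    void put(String term, int docNum, double... values) {
        stats.put(key(term, docNum), values.clone());
    }

    /** Returns a copy of the saved values, or null if none were stored. */
    double[] get(String term, int docNum) {
        double[] values = stats.get(key(term, docNum));
        return values == null ? null : values.clone();
    }
}
```

At query time, the hit's document number and the query term would be passed to `get` to retrieve the values saved during indexing.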

Discussion Overview
group: java-user
categories: lucene
posted: Jun 8, 2006 at 7:47 pm
active: Jun 9, 2006 at 11:51 am
posts: 3
users: 3
website: lucene.apache.org