Hi,

I am currently working on a project about private information retrieval and
I need to have an inverted index file in txt format as follows:

Term t    freq t    Inverted list for t
-------------------------------------------------------------------------
and       1         <6, 0.159>
big       2         <2, 0.148> <3, 0.088>
dark      1         <6, 0.079>
...

Here each <number1, number2> pair means: number1 is the ID of a document in
which term t occurs, and number2 is the rank of term t in that document.

I have created an index from 5492 txt files; however, the index is composed
of several different files, and most of the data is not in text format.

Could somebody guide me on how to achieve this?

Thank you

Sahin.


  • Uwe Schindler at Sep 21, 2010 at 4:30 pm
    Hi,

    Retrieve a TermEnum and iterate over it. That gives you all terms, and for
    each one you can retrieve the docFreq, which is the second column in your
    table. Finally, for each term, position a TermDocs enum on that term to get
    all document IDs. See the documentation of IndexReader/TermEnum/TermDocs
    for details.

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
    -----Original Message-----
    From: Sahin Buyrukbilen
    Sent: Tuesday, September 21, 2010 9:12 AM
    To: java-user@lucene.apache.org
    Subject: How to export lucene index to a simple text file?

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Sahin Buyrukbilen at Sep 21, 2010 at 4:33 pm
    Thank you Uwe, I will read the docs and try to do it. However, do you have
    any example code? I need it because I am not very familiar with Java.

    Thank you.

    Sahin
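
The loop Uwe describes can be sketched in plain Java without Lucene on the classpath. This is only an illustration under stated assumptions, not code from the thread: the class `InvertedIndexExport` and its methods are hypothetical, the index is replaced by a nested sorted map, and the "rank" values are taken as given numbers. In Lucene 3.x the outer loop would correspond to the TermEnum returned by `IndexReader.terms()`, the frequency column to `TermEnum.docFreq()`, and the inner loop to the TermDocs returned by `IndexReader.termDocs(Term)`. Note that TermDocs exposes the within-document term frequency via `freq()`, not a rank, so any weight would have to be computed separately.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndexExport {

    // Hypothetical stand-in for the index: term -> (doc ID -> weight).
    // TreeMaps keep terms and doc IDs sorted, as a term dictionary would.
    static void add(Map<String, Map<Integer, Double>> index, String term, int docId, double weight) {
        index.computeIfAbsent(term, t -> new TreeMap<>()).put(docId, weight);
    }

    // One output line: term, document frequency, then one <docId, weight> pair per document.
    static String formatLine(String term, Map<Integer, Double> postings) {
        StringBuilder sb = new StringBuilder(term);
        sb.append(' ').append(postings.size()); // docFreq = number of documents containing the term
        for (Map.Entry<Integer, Double> p : postings.entrySet()) {
            sb.append(String.format(Locale.ROOT, " <%d, %.3f>", p.getKey(), p.getValue()));
        }
        return sb.toString();
    }

    // Outer loop over all terms (the role TermEnum plays); formatLine holds the inner
    // loop over the documents of each term (the role TermDocs plays).
    static String export(Map<String, Map<Integer, Double>> index) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Map<Integer, Double>> e : index.entrySet()) {
            out.append(formatLine(e.getKey(), e.getValue())).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample postings copied from the table in the original question.
        Map<String, Map<Integer, Double>> index = new TreeMap<>();
        add(index, "and", 6, 0.159);
        add(index, "big", 2, 0.148);
        add(index, "big", 3, 0.088);
        add(index, "dark", 6, 0.079);
        System.out.print(export(index));
    }
}
```

Running `main` prints one line per term in the requested layout; redirecting standard output to a file would give the txt export.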
  • Lance Norskog at Sep 22, 2010 at 1:27 am
    The Lucene CheckIndex program opens an index and walks all of the data
    structures. It is a good start for you.

  • Michael McCandless at Sep 22, 2010 at 9:31 am
    Saving the index in text format would also be a fun codec (in 4.0) to create :)

    I.e., the codec would be read/write. The performance wouldn't be great,
    but it'd be neat for debugging, teaching, and transparency purposes...

    Mike
  • Adriano Crestani at Sep 22, 2010 at 9:41 am
    Saving the index in text format would also be a fun codec (in 4.0) to create :)
    A codec like that would be welcome :)


Discussion Overview
group: java-user
categories: lucene
posted: Sep 21, '10 at 4:12p
active: Sep 22, '10 at 9:41a
posts: 6
users: 5
website: lucene.apache.org
