FAQ
Dear list,

some questions about the names of the index files.
With an older Lucene/Solr 4.x version from trunk my index looks like:
_2t1.fdt
_2t1.fdx
_2t1.fnm
_2t1.frq
_2t1.nrm
_2t1.prx
_2t1.tii
_2t1.tis
segments_2
segments.gen

With a most recent version from trunk it looks like:
_3a9.fdt
_3a9.fdx
_3a9.fnm
_3a9_0.frq
_3a9.nrm
_3a9_0.prx
_3a9_0.tii
_3a9_0.tis
segments_4
segments.gen

Why is there an "_0" at some files?
Is it from Lucene or from Solr or a fault in my system?

I also didn't find any information at
http://lucene.apache.org/java/3_0_3/fileformats.html

Both indexes are optimized, any idea?

Regards, Bernd

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Simon Willnauer at Jan 3, 2011 at 12:52 pm
    Hey Bernd,

    On Mon, Jan 3, 2011 at 1:35 PM, Bernd Fehling
    wrote:
    Dear list,

    some questions about the names of the index files.
    With an older Lucene/Solr 4.x version from trunk my index looks like:
    _2t1.fdt
    _2t1.fdx
    _2t1.fnm
    _2t1.frq
    _2t1.nrm
    _2t1.prx
    _2t1.tii
    _2t1.tis
    segments_2
    segments.gen

    With a most recent version from trunk it looks like:
    _3a9.fdt
    _3a9.fdx
    _3a9.fnm
    _3a9_0.frq
    _3a9.nrm
    _3a9_0.prx
    _3a9_0.tii
    _3a9_0.tis
    segments_4
    segments.gen

    Why is there an "_0" at some files?
    Is it from Lucene or from Solr or a fault in my system?
    lucene 4.0 as you might know has the ability to plug in a Codec which
    has full control over how postings are stored, which format is used
    and what files are written. Each Field within a segment can have its
    own codec ie. field "foo" can have "Standard" and Field "bar" uses
    "Pulsing" for instance. In such a case, since Pulsing is just a
    wrapper around Standard - Codec, both codecs try to write the same set
    of files per segment. For that reason we introduced a codec ID valid
    per segment. Its is really an ordinal build from the set of codecs
    used per segment. this ordinal is used to build the filenames, in your
    case you only have one codec (I suppose its Standard - Codec) with the
    ord "0". This ordinal is used for all files that codec writes. The
    files without that ordinal (nrm, fdt, fdx and fnrm) are written by the
    IndexWriter directly but the functionality behind it might be exposed
    via codec sooner or later.

    So, afterall this is a lucene functionality and your system is just
    fine doing the right thing!
    I also didn't find any information at
    http://lucene.apache.org/java/3_0_3/fileformats.html
    Its not a 3.3 feature - codecs are introduced in 4.0 aka. trunk.

    simon
    Both indexes are optimized, any idea?

    Regards, Bernd

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Bernd Fehling at Jan 3, 2011 at 1:22 pm
    Hi Simon,

    thanks a lot for your good explanation.

    Best wishes,
    Bernd


    Am 03.01.2011 13:51, schrieb Simon Willnauer:
    Hey Bernd,

    On Mon, Jan 3, 2011 at 1:35 PM, Bernd Fehling
    wrote:
    Dear list,

    some questions about the names of the index files.
    With an older Lucene/Solr 4.x version from trunk my index looks like:
    _2t1.fdt
    _2t1.fdx
    _2t1.fnm
    _2t1.frq
    _2t1.nrm
    _2t1.prx
    _2t1.tii
    _2t1.tis
    segments_2
    segments.gen

    With a most recent version from trunk it looks like:
    _3a9.fdt
    _3a9.fdx
    _3a9.fnm
    _3a9_0.frq
    _3a9.nrm
    _3a9_0.prx
    _3a9_0.tii
    _3a9_0.tis
    segments_4
    segments.gen

    Why is there an "_0" at some files?
    Is it from Lucene or from Solr or a fault in my system?
    lucene 4.0 as you might know has the ability to plug in a Codec which
    has full control over how postings are stored, which format is used
    and what files are written. Each Field within a segment can have its
    own codec ie. field "foo" can have "Standard" and Field "bar" uses
    "Pulsing" for instance. In such a case, since Pulsing is just a
    wrapper around Standard - Codec, both codecs try to write the same set
    of files per segment. For that reason we introduced a codec ID valid
    per segment. Its is really an ordinal build from the set of codecs
    used per segment. this ordinal is used to build the filenames, in your
    case you only have one codec (I suppose its Standard - Codec) with the
    ord "0". This ordinal is used for all files that codec writes. The
    files without that ordinal (nrm, fdt, fdx and fnrm) are written by the
    IndexWriter directly but the functionality behind it might be exposed
    via codec sooner or later.

    So, afterall this is a lucene functionality and your system is just
    fine doing the right thing!
    I also didn't find any information at
    http://lucene.apache.org/java/3_0_3/fileformats.html
    Its not a 3.3 feature - codecs are introduced in 4.0 aka. trunk.

    simon
    Both indexes are optimized, any idea?

    Regards, Bernd
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJan 3, '11 at 12:35p
activeJan 3, '11 at 1:22p
posts3
users2
websitelucene.apache.org

2 users in discussion

Bernd Fehling: 2 posts Simon Willnauer: 1 post

People

Translate

site design / logo © 2022 Grokbase