Hi Lucene Team,

Is there any way to split Lucene index segments into smaller segments of less than 1 GB each? If so, could you please let me know?
Below are the segment sizes for one index. Its total size is 9.7 GB, and three of its files, (a) _12r7.prx, (b) _kft.prx and (c) _ls6.prx, are each larger than 1 GB.
I want to split them into smaller pieces to reduce their size.

[[email protected] ~]# ls -lh /index/TP_0000000000000000499/
total 9.7G
-rw-r--r-- 1 appuser appuser 80M Jul 27 13:53 _12r7.fdt
-rw-r--r-- 1 appuser appuser 1.4M Jul 27 13:53 _12r7.fdx
-rw-r--r-- 1 appuser appuser 397 Jul 27 13:53 _12r7.fnm
-rw-r--r-- 1 appuser appuser 649M Jul 27 13:58 _12r7.frq
-rw-r--r-- 1 appuser appuser 3.9M Jul 27 13:58 _12r7.nrm
-rw-r--r-- 1 appuser appuser 2.2G Jul 27 13:58 _12r7.prx
-rw-r--r-- 1 appuser appuser 33 Jul 27 13:58 _12r7.stats
-rw-r--r-- 1 appuser appuser 334K Jul 27 13:58 _12r7.tii
-rw-r--r-- 1 appuser appuser 28M Jul 27 13:58 _12r7.tis
-rw-r--r-- 1 appuser appuser 24K Jul 27 14:44 _12ts.fdt
-rw-r--r-- 1 appuser appuser 400 Jul 27 14:44 _12ts.fdx
-rw-r--r-- 1 appuser appuser 361 Jul 27 14:44 _12ts.fnm
-rw-r--r-- 1 appuser appuser 90K Jul 27 14:44 _12ts.frq
-rw-r--r-- 1 appuser appuser 1.1K Jul 27 14:44 _12ts.nrm
-rw-r--r-- 1 appuser appuser 218K Jul 27 14:44 _12ts.prx
-rw-r--r-- 1 appuser appuser 25 Jul 27 14:44 _12ts.stats
-rw-r--r-- 1 appuser appuser 8.7K Jul 27 14:44 _12ts.tii
-rw-r--r-- 1 appuser appuser 656K Jul 27 14:44 _12ts.tis
-rw-r--r-- 1 appuser appuser 309K Jul 27 14:44 _12tt.fdt
-rw-r--r-- 1 appuser appuser 5.1K Jul 27 14:44 _12tt.fdx
-rw-r--r-- 1 appuser appuser 361 Jul 27 14:44 _12tt.fnm
-rw-r--r-- 1 appuser appuser 1.9M Jul 27 14:44 _12tt.frq
-rw-r--r-- 1 appuser appuser 14K Jul 27 14:44 _12tt.nrm
-rw-r--r-- 1 appuser appuser 3.7M Jul 27 14:44 _12tt.prx
-rw-r--r-- 1 appuser appuser 29 Jul 27 14:44 _12tt.stats
-rw-r--r-- 1 appuser appuser 38K Jul 27 14:44 _12tt.tii
-rw-r--r-- 1 appuser appuser 2.6M Jul 27 14:44 _12tt.tis
-rw-r--r-- 1 appuser appuser 62M Jul 15 19:51 _kft.fdt
-rw-r--r-- 1 appuser appuser 1.3M Jul 15 19:51 _kft.fdx
-rw-r--r-- 1 appuser appuser 397 Jul 15 19:51 _kft.fnm
-rw-r--r-- 1 appuser appuser 626M Jul 15 20:40 _kft.frq
-rw-r--r-- 1 appuser appuser 3.5M Jul 15 20:40 _kft.nrm
-rw-r--r-- 1 appuser appuser 2.6G Jul 15 20:40 _kft.prx
-rw-r--r-- 1 appuser appuser 31 Jul 15 20:40 _kft.stats
-rw-r--r-- 1 appuser appuser 20K Jul 19 23:01 _kft_sv.del
-rw-r--r-- 1 appuser appuser 295K Jul 15 20:40 _kft.tii
-rw-r--r-- 1 appuser appuser 25M Jul 15 20:40 _kft.tis
-rw-r--r-- 1 appuser appuser 6.6K Jul 19 18:32 _ls6_aj.del
-rw-r--r-- 1 appuser appuser 17M Jul 17 18:21 _ls6.fdt
-rw-r--r-- 1 appuser appuser 418K Jul 17 18:21 _ls6.fdx
-rw-r--r-- 1 appuser appuser 397 Jul 17 18:21 _ls6.fnm
-rw-r--r-- 1 appuser appuser 556M Jul 17 19:13 _ls6.frq
-rw-r--r-- 1 appuser appuser 1.2M Jul 17 19:13 _ls6.nrm
-rw-r--r-- 1 appuser appuser 2.9G Jul 17 19:13 _ls6.prx
-rw-r--r-- 1 appuser appuser 31 Jul 17 19:13 _ls6.stats
-rw-r--r-- 1 appuser appuser 155K Jul 17 19:13 _ls6.tii
-rw-r--r-- 1 appuser appuser 14M Jul 17 19:13 _ls6.tis
-rw-r--r-- 1 appuser appuser 20 Jul 27 14:44 segments.gen
-rw-r--r-- 1 appuser appuser 158 Jul 27 14:44 segments_pg5
[[email protected] ~]#

[[email protected] ~]# ls -lh /index/TP_0000000000000000499/ | grep G
total 9.7G
-rw-r--r-- 1 appuser appuser 2.2G Jul 27 13:58 _12r7.prx
-rw-r--r-- 1 appuser appuser 2.6G Jul 15 20:40 _kft.prx
-rw-r--r-- 1 appuser appuser 2.9G Jul 17 19:13 _ls6.prx
[[email protected] ~]#

Regards
Ravi


  • Uwe Schindler at Jul 27, 2011 at 10:28 am
    Hi,

    See MultiPassIndexSplitter or PKIndexSplitter in Lucene contrib/misc.

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: [email protected]
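
To give a feel for what a document-based splitter does, here is a minimal sketch in plain Java. As far as I know, MultiPassIndexSplitter assigns documents to the output parts either round-robin or in sequential document-number ranges, and writes each part as a separate index rather than as smaller segments of the same index. The class SplitPlan and its methods are hypothetical names made up for this illustration; the code does not call Lucene and only models the assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java that models how documents could be assigned
// to output parts. It does not call Lucene or read an index.
class SplitPlan {

    // Round-robin: document number d goes to part d % numParts.
    static int roundRobinPart(int docId, int numParts) {
        return docId % numParts;
    }

    // Sequential: contiguous ranges of maxDoc / numParts documents each;
    // the last part also takes the remainder left by integer division.
    static int sequentialPart(int docId, int maxDoc, int numParts) {
        int partLen = maxDoc / numParts;
        if (partLen == 0) {
            throw new IllegalArgumentException("need at least one document per part");
        }
        return Math.min(docId / partLen, numParts - 1);
    }

    // Lists the document numbers (0 .. maxDoc-1) that end up in each part.
    static List<List<Integer>> plan(int maxDoc, int numParts, boolean sequential) {
        List<List<Integer>> parts = new ArrayList<List<Integer>>();
        for (int i = 0; i < numParts; i++) {
            parts.add(new ArrayList<Integer>());
        }
        for (int d = 0; d < maxDoc; d++) {
            int p = sequential ? sequentialPart(d, maxDoc, numParts)
                               : roundRobinPart(d, numParts);
            parts.get(p).add(d);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println("round-robin: " + plan(10, 3, false));
        System.out.println("sequential:  " + plan(10, 3, true));
    }
}
```

With 10 documents and 3 parts, the round-robin plan puts documents 0, 3, 6 and 9 in the first part, while the sequential plan puts 0 to 2 in the first part and 6 to 9 in the last.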

    -----Original Message-----
    From: Gudi, Ravi Sankar
    Sent: Wednesday, July 27, 2011 11:26 AM
    To: [email protected]
    Subject: Is There a Way To Split The Lucene Index Segments To Smaller Size Less Than 1 GB

  • Anshum at Jul 27, 2011 at 10:43 am
    Hi Ravi,
    You could reindex the data with a sharding mechanism, i.e. open 3
    IndexWriters, read documents from the source, and add each one to the
    appropriate IndexWriter to create a sharded index.
    In case you want to avoid re-indexing for whatever reason, you can create
    3 copies of the index and, in each copy, delete documents based on some
    criterion so that only the documents for that shard remain, e.g.
    The index contains docs with ids 1,2,3,4,5,6,7,8,9.
    You could create 3 copies and fire a delete on the first one to retain only
    docs 1, 2 and 3.
    Do the same for the other 2 shards, leaving you with 3 indexes.
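
The copy-and-delete plan above can be sketched as follows, assuming ids run from 1 to N and each shard takes a contiguous range. ShardPlan, shardFor and idsToDelete are hypothetical names used only for this illustration; issuing the actual deletes through Lucene is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: works out which ids each index copy must delete.
// The actual delete would be issued through Lucene; that part is not shown.
class ShardPlan {

    // 0-based shard for a 1-based id, using contiguous id ranges.
    static int shardFor(int id, int totalDocs, int numShards) {
        int perShard = (totalDocs + numShards - 1) / numShards; // ceiling division
        return (id - 1) / perShard;
    }

    // Ids to delete from the copy that becomes the given shard.
    static List<Integer> idsToDelete(int shard, int totalDocs, int numShards) {
        List<Integer> doomed = new ArrayList<Integer>();
        for (int id = 1; id <= totalDocs; id++) {
            if (shardFor(id, totalDocs, numShards) != shard) {
                doomed.add(id);
            }
        }
        return doomed;
    }

    public static void main(String[] args) {
        // The example from the post: ids 1..9 split across 3 copies.
        for (int shard = 0; shard < 3; shard++) {
            System.out.println("copy " + (shard + 1) + " deletes "
                    + idsToDelete(shard, 9, 3));
        }
    }
}
```

For the re-indexing route, the same shardFor function could decide which of the three writers receives each document.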

    On the other hand, why do you want to split a 9 GB index? Is there a
    reason, such as a performance issue? It would be good if you could share
    the reason, as the underlying problem could be completely different.

    --
    Anshum Gupta
    http://ai-cafe.blogspot.com


  • Mihai Caraman at Jul 27, 2011 at 12:54 pm
    > smaller segments of size less than 1 GB, can you please let me know?
    As I recall, the optimize mechanism can be told how many segments to create.
    So you can check your index size and work out how many segments to ask for
    before optimizing.
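
The arithmetic behind that suggestion can be sketched as below: divide the index size by the target segment size and round up. SegmentCount and segmentsNeeded are hypothetical names for this illustration, and 9933 MB is just 9.7 GB expressed in megabytes. In Lucene 3.x the resulting number would be passed to IndexWriter.optimize(int maxNumSegments). Two caveats, as far as I know: that call merges down to at most the given number of segments and does not break up a segment that is already too large, and an average below the target does not guarantee that every segment is below it.

```java
// Illustration only: estimates how many segments keep the average segment
// size at or below a target. It does not call Lucene.
class SegmentCount {

    // Smallest count such that indexSizeMb / count <= targetMb.
    static long segmentsNeeded(long indexSizeMb, long targetMb) {
        if (indexSizeMb <= 0 || targetMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (indexSizeMb + targetMb - 1) / targetMb; // ceiling division
    }

    public static void main(String[] args) {
        // 9.7 GB is roughly 9933 MB; the target is 1 GB = 1024 MB.
        System.out.println(segmentsNeeded(9933, 1024));
    }
}
```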

Discussion Overview
group: java-user
category: lucene
posted: Jul 27, 2011 at 10:07 am
active: Jul 27, 2011 at 12:54 pm
posts: 4
users: 4
website: lucene.apache.org
