How to avoid huge index files
Hello,

I'm using Lucene 2.4. I'm developing a web application that uses Lucene (via
Compass) to do the searches.
I intend to deploy the application on Google App Engine
(http://code.google.com/appengine/), which limits files to be smaller than
10MB. I've read about the various policies supported by Lucene to limit the
file sizes, but no matter which policy and which parameters I used, the
index files still grew to a lot more than 10MB. Looking at the code, I've
managed to limit the cfs files (by predicting the file size in
CompoundFileWriter before closing the file) - I guess that will degrade
performance, but it's OK for now. But now the FDT files are becoming huge
(about 60MB) and I can't identify a way to limit those files.

Is there some built-in and correct way to limit these files' lengths? If
not, can someone please direct me on how I should tweak the source code to
achieve that?

Thanks for any help.
  • Dvora at Sep 10, 2009 at 6:33 am
    Hello again,

    Can someone please comment on that, whether what I'm looking for is
    possible or not?


  • Michael McCandless at Sep 10, 2009 at 9:07 am
    First, you need to limit the size of segments initially created by
    IndexWriter due to newly added documents. Probably the simplest way
    is to call IndexWriter.commit() frequently enough. You might want to
    use IndexWriter.ramSizeInBytes() to gauge how much RAM is currently
    consumed by IndexWriter's buffer to determine when to commit. But it
    won't be an exact science, i.e., the segment size will be different from
    the RAM buffer size. So, experiment with it...

    Second, you need to prevent merging from creating a segment that's too
    large. For this I would use the setMaxMergeMB method of the
    LogByteSizeMergePolicy (which is IndexWriter's default merge policy).
    But note that this max size applies to the *input* segments, so you'd
    roughly want that to be 1.0 MB (your 10.0 MB divided by the merge
    factor = 10), but probably make it smaller to be sure things stay
    small enough.

    Note that with this approach, if your index is large enough, you'll
    wind up with many segments and search performance will suffer when
    compared to an index that doesn't have this max 10.0 MB file size
    restriction.

    Mike
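
    For concreteness, here is a minimal sketch of both steps against the
    Lucene 2.4 API. The 1.0 MB merge cap follows the mergeFactor arithmetic
    above; the 5 MB RAM threshold and the sample documents are assumptions
    for illustration, not tested recommendations.

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.LogByteSizeMergePolicy;
        import org.apache.lucene.store.Directory;
        import org.apache.lucene.store.FSDirectory;

        public class SmallSegmentIndexer {
            public static void main(String[] args) throws Exception {
                Directory dir = FSDirectory.getDirectory("index");
                IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                        IndexWriter.MaxFieldLength.UNLIMITED);

                // Cap merged segments: maxMergeMB applies to the *input*
                // segments, so with the default mergeFactor of 10 a 1.0 MB
                // cap keeps merged segments at roughly 10 MB or less.
                LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
                mp.setMaxMergeMB(1.0);
                writer.setMergePolicy(mp);

                // Cap freshly flushed segments: commit whenever the RAM
                // buffer passes a threshold. The flushed segment size will
                // differ from the RAM size, so tune this experimentally.
                final long ramThreshold = 5L * 1024 * 1024; // assumed value
                for (int i = 0; i < 100000; i++) {
                    Document doc = new Document();
                    doc.add(new Field("body", "text for document " + i,
                            Field.Store.YES, Field.Index.ANALYZED));
                    writer.addDocument(doc);
                    if (writer.ramSizeInBytes() > ramThreshold) {
                        writer.commit();
                    }
                }
                writer.close();
            }
        }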
  • Dvora at Sep 10, 2009 at 9:27 am
    Hi,

    Thanks a lot for that; I will perform the experiments and publish the
    results. I'm aware of the risk of performance degradation, but for the
    pilot I'm trying to run I think it's acceptable.

    Thanks again!
  • Michael McCandless at Sep 10, 2009 at 10:43 am
    You're welcome!

    Another, bottom-up option would be to make a custom Directory impl
    that simply splits up files above a certain size. That'd be more
    generic and more reliable...

    Mike
  • Dvora at Sep 10, 2009 at 11:23 am
    Hi again,

    Can you add some details and guidelines on how to implement that?
    Different file types have different structures; is such splitting doable
    without knowing Lucene internals?


  • Uwe Schindler at Sep 10, 2009 at 11:29 am
    The idea is just to put a layer on top of the abstract file system
    functions supplied by Directory. Whenever somebody wants to create a file
    and write data to it, the methods create more than one file and switch to
    another file after, e.g., 10 megabytes. For example, look into
    MMapDirectory, which uses mmap to map files into address space. Because
    MappedByteBuffer only supports 32-bit offsets, different mappings are
    created for the same file (the file is split up into parts of 2
    gigabytes). You could use similar code here and just switch to another
    file if somebody seeks or writes above the 10 MiB limit. Just
    "virtualize" the files.

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
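
    To make the "virtualize" idea concrete, here is a small sketch of the
    arithmetic such a layer boils down to. The class and the chunk naming
    scheme are made up for illustration; nothing here is Lucene API. Each
    logical file is backed by numbered chunk files of a fixed maximum size,
    and every logical offset resolves to a chunk plus an offset within it.

        // Hypothetical sketch of the offset arithmetic behind a
        // "virtualized" file layer; not Lucene API.
        public class ChunkMath {
            static final long CHUNK_SIZE = 10L * 1024 * 1024; // 10 MB per physical file

            // Physical file backing the given logical offset, e.g. "_0.fdt.2".
            static String chunkName(String logicalName, long logicalOffset) {
                return logicalName + "." + (logicalOffset / CHUNK_SIZE);
            }

            // Offset within that physical file.
            static long chunkOffset(long logicalOffset) {
                return logicalOffset % CHUNK_SIZE;
            }

            // Logical length of a file stored as numChunks files, all full
            // except possibly the last one.
            static long logicalLength(int numChunks, long lastChunkLength) {
                return (numChunks - 1) * CHUNK_SIZE + lastChunkLength;
            }

            public static void main(String[] args) {
                long offset = 25L * 1024 * 1024;  // 25 MB into the logical file
                System.out.println(chunkName("_0.fdt", offset)); // _0.fdt.2
                System.out.println(chunkOffset(offset));         // 5242880 (5 MB)
            }
        }

    MultiMMapIndexInput applies the same kind of arithmetic, only across mmap
    buffers of a single physical file rather than across separate files.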
  • Dvora at Sep 10, 2009 at 12:26 pm
    Me again :-)

    I'm looking at the code of FSDirectory and MMapDirectory, and find it
    somewhat difficult to understand how I should subclass FSDirectory and
    adjust it to my needs. If I understand correctly, MMapDirectory overrides
    the openInput() method and returns a MultiMMapIndexInput if the file size
    exceeds the threshold. What I don't understand is how the new impl should
    keep track of the generated files (or shouldn't it?..), so that when
    searching, Lucene will know which file to search in - I'm confused :-)

    Can I bother you to supply some kind of pseudo code illustrating what the
    implementation should look like?

    Thanks again for your huge help!
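
    For illustration, a rough sketch of the shape such an implementation
    could take (the class and the two abstract helper methods are
    hypothetical; the overridden methods are the real Lucene 2.4 Directory
    API). The answer to the question above is that Lucene never needs to
    know which physical file to search: it always asks the Directory for the
    logical name, and the Directory resolves the chunk files internally.

        import java.io.IOException;
        import org.apache.lucene.store.Directory;
        import org.apache.lucene.store.IndexInput;
        import org.apache.lucene.store.IndexOutput;

        // Hypothetical sketch: wraps another Directory and stores each logical
        // file as numbered chunks ("_0.fdt.0", "_0.fdt.1", ...). Lucene only
        // ever asks for "_0.fdt"; the chunking stays inside this class.
        public abstract class SplittingDirectory extends Directory {
            protected final Directory delegate; // e.g. an FSDirectory
            protected static final long CHUNK_SIZE = 10L * 1024 * 1024;

            protected SplittingDirectory(Directory delegate) {
                this.delegate = delegate;
            }

            // The real work lives in these two: an IndexOutput that rolls over
            // to the next chunk file once CHUNK_SIZE bytes are written, and an
            // IndexInput whose seek()/readBytes() map a logical offset to
            // (chunk index, offset within chunk), the same arithmetic
            // MultiMMapIndexInput uses for its 2 GB mmap buffers.
            protected abstract IndexOutput newChunkedOutput(String name) throws IOException;
            protected abstract IndexInput newChunkedInput(String name) throws IOException;

            public IndexOutput createOutput(String name) throws IOException {
                return newChunkedOutput(name);
            }

            public IndexInput openInput(String name) throws IOException {
                return newChunkedInput(name);
            }

            public boolean fileExists(String name) throws IOException {
                return delegate.fileExists(name + ".0");
            }

            public long fileLength(String name) throws IOException {
                long total = 0; // a logical file's length is the sum of its chunks
                for (int i = 0; delegate.fileExists(name + "." + i); i++) {
                    total += delegate.fileLength(name + "." + i);
                }
                return total;
            }

            // list(), deleteFile(), fileModified(), touchFile(), close(), ...
            // would likewise translate each logical name to its chunk group.
        }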


  • Ted Stockwell at Sep 10, 2009 at 2:56 pm
    Another alternative is storing the indexes in the Google Datastore; I
    think Compass already supports that (though I have not used it).

    Also, I have successfully run Lucene on GAE using GaeVFS
    (http://code.google.com/p/gaevfs/) to store the index in the Datastore.
    (I developed a Lucene Directory implementation on top of GaeVFS that's
    available at http://sf.net/contrail.)


  • Dvora at Sep 10, 2009 at 7:19 pm
    Is it possible to upload an already existing index to GAE? My index is
    data I've been collecting for a long time, and I'd prefer not to give it
    up.



  • Ted Stockwell at Sep 10, 2009 at 8:01 pm
    Not at the moment.
    Actually, I'm already working on a remote copy utility for GaeVFS that
    will upload large files and folders, but the first cut is about a week
    away.


