I am using rubyzip and am trying to put a huge CSV file with 1.4
million rows into a zip file.
Using JRuby, I get an out-of-heap-space error.

I believe the error happens in the block below:

Zip::ZipOutputStream.open(zip_path) do |zos|
  zos.put_next_entry(File.basename(csv_path))
  zos.print IO.read(csv_path)
end

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.


  • Luis Lavena at May 9, 2012 at 5:27 pm

    On Wednesday, May 9, 2012 1:52:27 PM UTC-3, Jedrin wrote:
    I am using rubyzip and am trying to put a huge csv file with 1.4
    million rows into the zip file.
    Using jruby I get a out of heap error.

    I believe the error happens in the block below:

    Zip::ZipOutputStream.open(zip_path) do |zos|
      zos.put_next_entry(File.basename(csv_path))
      zos.print IO.read(csv_path)
    end

    You're reading the entire file contents into memory and then saving.

    See if there is a way to stream the file in chunks (16 kilobytes, for
    example) into the zip stream.

    --
    Luis Lavena

  • Jedrin at May 9, 2012 at 6:07 pm
    The error happens on the line:

    zos.print IO.read(csv_path)

    I see that
    p zos.class
    shows:
    Zip::ZipOutputStream

    and that the print method is inherited from:
    http://rubyzip.sourceforge.net/classes/IOExtras/AbstractOutputStream.html
    where print is shown to be this according to doc:

    # File lib/zip/ioextras.rb, line 130
    def print(*params)
      self << params.to_s << $\.to_s
    end



    I am not sure offhand how to stream the data, but I gather that the
    problem comes from reading the whole file into memory.
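
A small sketch of what the quoted print definition would do with one large string, assuming that definition is what actually runs. SketchStream is a hypothetical stand-in, not rubyzip's class. On Ruby 1.9 and later, Array#to_s behaves like inspect, so params.to_s not only builds another in-memory copy but also wraps the data in brackets and quotes (on Ruby 1.8 it joined the elements instead).

```ruby
# Hypothetical stand-in for an output stream; NOT rubyzip's class.
class SketchStream
  attr_reader :buffer

  def initialize
    @buffer = String.new
  end

  # Append data to the in-memory buffer.
  def <<(data)
    @buffer << data.to_s
    self
  end

  # Same body as the print definition quoted from the rubyzip docs.
  def print(*params)
    self << params.to_s << $\.to_s
  end
end

stream = SketchStream.new
stream.print "a,b\n"
# On Ruby 1.9+ this shows the inspected array, not the raw CSV line.
puts stream.buffer
```

Either way the whole payload exists as one string before it is written, so appending smaller pieces with << looks like the safer route.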

  • Greg Akins at May 9, 2012 at 6:42 pm

    On Wed, May 9, 2012 at 2:07 PM, Jedrin wrote:


    I am not sure offhand how to stream the data, but gathered that the
    problem was from reading the
    file into memory

    The default heap size for the JVM is pretty small. I believe you can pass
    args to the JVM when you start JRuby.

    If you do something like -Xmx1024m (not sure that syntax is exactly
    correct, but it's close) you might get enough. Of course, that depends on
    the size of the file.

    --
    Greg Akins
    http://twitter.com/akinsgre

  • Jedrin at May 9, 2012 at 7:31 pm

    Well, the csv file has something like 1.4 million rows and maybe 20
    columns or something like that. When I get a chance, maybe I'll look
    into that if that seems like the thing to try ..


  • Robert Walker at May 9, 2012 at 8:06 pm

    Jedrin wrote in post #1060204:
    Well, the csv file has something like 1.4 million rows and maybe 20
    columns or something like that. When I get a chance, maybe I'll look
    into that if that seems like the thing to try ..

    "When I get a chance, maybe..."???

    Greg gave you the answer. A default JVM instance's heap space is limited
    (64 megabytes on older JVMs; the exact default depends on the JVM version
    and the launcher). If the file you're loading, plus the memory consumed by
    your application, goes over that limit, the JVM will report "out
    of memory" and begin exhibiting unpredictable behavior.

    It makes no difference how much physical RAM your machine might contain.
    The JVM will NOT use more heap space than the maximum defined by the
    -Xmx argument (or the default maximum when -Xmx is not specified).

    --
    Posted via http://www.ruby-forum.com/.
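
One way to check which heap limit is actually in effect is to ask the JVM from inside the script. This is a minimal sketch, assuming the code runs under JRuby; max_heap_mb is a hypothetical helper name, and on any other Ruby it simply returns nil.

```ruby
# Report the JVM's maximum heap in megabytes, or nil when not running on JRuby.
def max_heap_mb
  return nil unless RUBY_PLATFORM == 'java'

  require 'java'
  # Runtime#maxMemory returns the heap ceiling in bytes.
  java.lang.Runtime.getRuntime.maxMemory / (1024 * 1024)
end

mb = max_heap_mb
puts(mb ? "JVM max heap: #{mb} MB" : 'Not running on JRuby; no JVM heap to report')
```

Printing this at startup should show whether a heap flag passed on the command line was picked up.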

  • Jedrin at May 9, 2012 at 8:42 pm

    So I launched my Sinatra app like this; from my Google searches, the
    -J argument looks like what I want.

    jruby -J-Xmx1024m -S recordset.rb

    When I tried to download the CSV file (which the server puts into the
    zip file and then crashes),
    I got the same heap space error, but it seemed to run longer
    before it crashed. If I try to increase that number much higher than
    1024m, I get:

    Error occurred during initialization of VM
    Could not reserve enough space for object heap
    JVM creation failed




  • Greg Akins at May 9, 2012 at 8:52 pm

    On Wed, May 9, 2012 at 4:42 PM, Jedrin wrote:
    When I tried to download the csv file (which the server puts into the
    zip file and then crashes),
    I got the same heap space error, but it seemed like it did run longer
    before it crashed. If I try to increase that number much higher than
    1024m, I get:

    The heap contains all the objects created for the application. In this
    case, it looks like your file is still too big for it.

    Error occurred during initialization of VM
    Could not reserve enough space for object heap
    JVM creation failed

    This means that you tried to reserve more heap than the JVM could obtain
    on this machine.

    Are you doing this for a single load, or will it be an application that
    will commonly receive large files?

    If it's the latter, I'd probably try to redesign the code you're using to
    load the files. Sounds like this is part of a third party gem? If that's
    the case, maybe they have some mechanism for handling larger files?

    --
    Greg Akins
    http://twitter.com/akinsgre

  • Jedrin at May 9, 2012 at 9:21 pm

    What I do is create a CSV file from the database. I had some memory
    problems there, but using ActiveRecord's find_in_batches() seemed to
    solve that.

    The CSV file has 1.4 million rows. It gets created successfully. I
    then use the rubyzip gem to create a zip file that contains just that CSV
    file. I used examples I found from Google searches on how to
    create the zip file, which are shown earlier in the thread. I looked
    at the class info on the web for rubyzip and didn't see an obvious way
    to stream data into the zip file. Tomorrow I can look at perhaps some
    other way to create a zip file, using a different gem or some such.



  • Luis Lavena at May 9, 2012 at 11:59 pm

    As I mentioned in my previous reply and similar to the problem you had when
    creating the file: you're trying to load the whole thing.

    There are two options for this:

    A) You stream the contents of your CSV file, reading by chunks into a
    ZipStream

    or

    B) You compress the file from outside Ruby (shelling out to zip or gzip, for example)

    --
    Luis Lavena
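
A variation on option B that stays inside Ruby, sketched under the assumption that a .gz file is acceptable in place of a .zip. It uses Zlib::GzipWriter from the standard library rather than rubyzip, and reads the source in fixed-size chunks so only one chunk is held in memory at a time. gzip_file and the 16 KB default are illustrative choices, not something from the thread.

```ruby
require 'zlib'

# Compress src_path into gz_path, reading chunk_size bytes at a time.
def gzip_file(src_path, gz_path, chunk_size = 16 * 1024)
  Zlib::GzipWriter.open(gz_path) do |gz|
    File.open(src_path, 'rb') do |src|
      # IO#read(n) returns nil at end of file, which ends the loop.
      while (chunk = src.read(chunk_size))
        gz.write(chunk)
      end
    end
  end
  gz_path
end
```

Called as gzip_file(csv_path, csv_path + '.gz'), this writes the compressed file next to the original.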


  • Jedrin at May 11, 2012 at 2:11 pm

    As I mentioned in my previous reply and similar to the problem you had when
    creating the file: you're trying to load the whole thing.

    There are two options for this:

    A) You stream the contents of your CSV file, reading by chunks into a
    ZipStream

    That's exactly what I would like to do. I wasn't sure offhand whether the
    zip method will read it that way or how to pass it. I was hoping for
    an idea on how to do that.

    The code where it all happens is here and the second line is where it
    crashes:

    zos.put_next_entry(File.basename(fpath))
    zos.print IO.read(fpath)

    zos is an instance of Zip::ZipOutputStream.
    The print method is inherited from IOExtras::AbstractOutputStream

    According to the docs, print() is like this
    def print(*params)
      self << params.to_s << $\.to_s
    end

    Since it does params.to_s, I'm guessing that is going to put it all
    into memory.
    The other methods may have similar problems.

    However, the putc method looked interesting.

    There is a putc() defined like this according to the docs:

    def putc(anObject)
      self << case anObject
              when Fixnum then anObject.chr
              when String then anObject
              else raise TypeError, "putc: Only Fixnum and String supported"
              end
      anObject
    end


    So I tried that. My code and its output follow. The file I was trying
    to zip was another zip file. It appeared to be a bit bigger than it
    should have been, and when I tried to open it, I got an error saying
    it was corrupted.

    This isn't quite the same as the CSV problem, since I am putting a zip
    file into a zip file here.


    def zput(zos,fpath)
      p fpath
      zos.put_next_entry(File.basename(fpath))
      f = File.new(fpath)
      chunk_sz = 10000000
      while !f.eof?
        data = f.read(chunk_sz)
        zos.putc data
        puts 'read ' + data.size.to_s + ' bytes'
      end
    end


    "web.war"
    read 10000000 bytes
    read 10000000 bytes
    read 8573823 bytes
    "data.war"
    read 10000000 bytes
    read 8655347 bytes
    "big.zip"
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 10000000 bytes
    read 3431079 bytes

  • Jedrin at May 11, 2012 at 2:26 pm
    I changed the putc in the code above to a write, followed by
    zos.print "" at the very end. It appears that print() appends $\ to the
    file. The byte size of the zip file inside the zip was short by two
    bytes, and I still get corrupted zip file errors on it.

  • Jedrin at May 11, 2012 at 8:48 pm
    It's late Friday and I am done for the day, but I just tried something
    else. It may be that I need to open the file in binary mode and I
    didn't. Initial tests seem to indicate that may be the case. Thanks
    for everyone's help.
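
A sketch of the earlier zput attempt with the source file opened in binary mode, which is the change this last post suspects was missing. It assumes zos responds to put_next_entry and <<, as the Zip::ZipOutputStream calls and the quoted print/putc definitions suggest. The 1 MB default chunk size and the returned byte count are illustrative choices, not from the thread.

```ruby
# Stream one file into an already-open zip output stream in fixed-size chunks.
# zos is assumed to respond to put_next_entry and <<.
def zput(zos, fpath, chunk_size = 1024 * 1024)
  zos.put_next_entry(File.basename(fpath))
  total = 0
  # 'rb' turns off newline translation, so bytes are copied unchanged.
  File.open(fpath, 'rb') do |f|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = f.read(chunk_size))
      zos << chunk
      total += chunk.bytesize
    end
  end
  total
end
```

Inside the Zip::ZipOutputStream.open(zip_path) do |zos| ... end block from the first post, this would be called as zput(zos, csv_path). The block form of File.open also closes the file, which the earlier version never did.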

