Hi,

today I ran into a problem when I had to iterate over a big result
set. ActiveRecord produces a huge array of model instances, which
consumed 1 GB of memory on my machine. A comparable Perl DBI script
used only a few KB to iterate over the same result set.

Then it was suggested that I use batches (#find_each). But batching
basically does the same thing: it splits the query into several
queries, each returning an array of 1000 model instances, which still
consumes more memory than necessary, though far less. The problem: on
my legacy database it was much slower (25 minutes, while the version
without batching took 90 seconds and the Perl script only 40 seconds).

So I searched for a method that yields one model instance at a time
from the result set of the query, but I could not find anything.

In the end I tried to find a solution myself and came up with this
simple one:

https://github.com/tvw/ar-each_model
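
The core idea (hypothetical names, with a simulated cursor standing in
for the database driver; this is a sketch of the pattern, not the gem's
actual API) can be illustrated like this:

```ruby
# Sketch: wrap each raw row hash in a model instance and yield it
# immediately, so only one instance is alive at a time instead of a
# 400,000-element array. All names here are illustrative.

Event = Struct.new(:id, :message)

# Simulated driver-level cursor: hands back one raw row hash per
# iteration, never building a full result array.
def fetch_rows
  3.times { |i| yield({ "id" => i + 1, "message" => "event #{i + 1}" }) }
end

# Yield one model instance at a time, built from each raw row.
def each_model
  fetch_rows { |row| yield Event.new(row["id"], row["message"]) }
end

each_model { |event| puts "#{event.id}: #{event.message}" }
# => 1: event 1
# => 2: event 2
# => 3: event 3
```

With a real adapter, `fetch_rows` would be the driver's row-by-row
fetch; the memory profile stays flat because nothing accumulates.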

I want to put it up for discussion here, since I wonder why something
similar does not already exist in ActiveRecord.

Regards
Thomas

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.

  • Robert Walker at Jan 11, 2012 at 8:13 pm

    tvw wrote in post #1040336:
    > today I went into a problem, when I had to iterate over a big result
    > set. ActiveRecord produces a huge array of model instances, which
    > consumed 1GB of memory on my machine. A comparable perl DBI script
    > only used some KB for iterating over the result set.

    It is well known that ActiveRecord objects are pretty heavyweight. My
    guess is that your Perl script is instead returning something very
    lightweight: little more than key-value pairs for each database row.

    I'm also assuming that whatever you're doing with this large result
    set probably doesn't require "smart" heavyweight model objects.

    In the past I used a framework that had a similar issue. It, however,
    had a built-in mechanism for dealing with it, called "raw row
    fetching." Rather than returning true model objects, with all the
    intelligence built into them, you could opt to fetch raw rows
    represented as an array of dictionaries (hashes). A raw row could
    then be transformed into a full-fledged model object on demand.
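
That pattern might look like the following sketch (the `User` model and
`to_model` helper are illustrative, not any particular framework's API):

```ruby
# Sketch of "raw row fetching": rows arrive as plain hashes, and a
# full-fledged model object is only built from a raw row on demand.

User = Struct.new(:id, :name) do
  def greeting
    "Hello, #{name}!"
  end
end

# Lightweight raw rows, as a driver might return them.
raw_rows = [
  { "id" => 1, "name" => "Alice" },
  { "id" => 2, "name" => "Bob" },
]

# Promote a raw row to a model object only when it is needed.
def to_model(row)
  User.new(row["id"], row["name"])
end

puts to_model(raw_rows.first).greeting  # => Hello, Alice!
```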

    I don't know if ActiveRecord provides anything similar to this
    out-of-the-box, but I'm sure someone must have developed something
    like this for Rails.


  • Tvw at Jan 13, 2012 at 9:22 am
    Hi Robert,
    On 11 Jan., 21:13, Robert Walker wrote:
    > tvw wrote in post #1040336:
    >> today I went into a problem, when I had to iterate over a big result
    >> set. ActiveRecord produces a huge array of model instances, which
    >> consumed 1GB of memory on my machine. A comparable perl DBI script
    >> only used some KB for iterating over the result set.
    > It is well known that ActiveRecord objects are pretty heavy weight. My
    > guess is that your Perl script is instead returning you something very
    > light weight. Something that is little more than key value pairs for
    > each database row.
    ActiveRecord objects are by far not that heavyweight, but it is true
    that the Perl script returns just a hash for each row. If I got one
    ActiveRecord object at a time for each row, I would not have tried to
    find a better solution. But ActiveRecord builds an entire enumeration
    containing all records of the query as ActiveRecord objects before it
    returns the first object in the loop. If the Perl script built an
    array of hashes, the lightweight counterpart, it would also consume a
    massive and very noticeable amount of memory, though not as much as
    ActiveRecord. But in the Perl script there never exists more than one
    hash at a time: the last row retrieved from the database.
    > I'm also assuming that whatever you're doing with this large result set
    > probably doesn't require "smart" heavy weight model objects.
    Yes, that is true, but dealing with the objects is more comfortable,
    since they are smart, and with the solution I found they cost me only
    a few bytes and a little more time.
    > In the past I used a framework that had a similar issue. It, however,
    > had a built-in mechanism for dealing with the issue. It had something
    > called "raw row fetching." Rather than returning true model objects,
    > with all the intelligence built into them, you could opt to fetch raw
    > rows that were represented by an array of dictionaries (hashes). A raw
    > row could then be transformed into a full fledged model object on
    > demand.
    > I don't know if ActiveRecord provides anything similar to this
    > out-of-the-box. But, I'm sure someone must have developed something like
    > this for Rails.
    The ActiveRecord database adapters provide such a mechanism, and that
    is what my solution uses: it retrieves the rows from the database as
    hashes, builds an ActiveRecord object from each, and yields it. This
    costs only a few bytes of overhead over raw hashes and a little more
    time.

    The funny thing was that this solution worked as expected against my
    SQLite3 demo database, while against my legacy SQL Server database it
    consumed a lot of memory too. The reason was that the underlying
    TinyTDS database driver, which the SQL Server adapter for
    ActiveRecord uses, caches each record it retrieves by default, and
    this caching must be turned off.

    So the funny thing is, when you use the SQL Server adapter, you not
    only have the memory consumption that results from AR building an
    enumeration of objects; on top of that, you have a cache of all
    database rows in the driver too, since the adapter uses the default
    option, as far as I can see.

    Regards
    Thomas

  • Matt Jones at Jan 12, 2012 at 10:30 pm

    On Jan 10, 7:18 pm, tvw wrote:
    > Hi,
    >
    > today I went into a problem, when I had to iterate over a big result
    > set. ActiveRecord produces a huge array of model instances, which
    > consumed 1GB of memory on my machine. A comparable perl DBI script
    > only used some KB for iterating over the result set.
    How many records was this? 1 GB is pretty crazy.
    > Then I was suggested to use batch (#find_each). But
    > basically batch does the same thing by splitting the query into
    > several queries, which only return 1000 model instances in an array,
    > which still consumes more memory than necessary, but not that much.
    > The problem: it was much slower (it took 25 minutes, while the version
    > without batch took 90 seconds and the perl script took only 40
    > seconds) on my legacy database.
    This sounds like your legacy DB is somehow not indexing the ID column
    - after all, instantiating all those objects only takes 90 seconds.

    --Matt Jones

  • Tvw at Jan 13, 2012 at 9:47 am

    On 12 Jan., 23:29, Matt Jones wrote:
    > On Jan 10, 7:18 pm, tvw wrote:
    >> Hi,
    >> today I went into a problem, when I had to iterate over a big result
    >> set. ActiveRecord produces a huge array of model instances, which
    >> consumed 1GB of memory on my machine. A comparable perl DBI script
    >> only used some KB for iterating over the result set.
    > How many records was this? 1gb is pretty crazy.
    400,000 records.
    >> Then I was suggested to use batch (#find_each). But
    >> basically batch does the same thing by splitting the query into
    >> several queries, which only return 1000 model instances in an array,
    >> which still consumes more memory than necessary, but not that much.
    >> The problem: it was much slower (it took 25 minutes, while the version
    >> without batch took 90 seconds and the perl script took only 40
    >> seconds) on my legacy database.
    > This sounds like your legacy DB is somehow not indexing the ID column
    > - after all, instantiating all those objects only takes 90 seconds.
    Matt, the ID column is indexed, but probably not all the fields the
    query uses, since the table is used for logging events and must be
    fast for writing rather than reading. The 400,000 records I retrieve
    are 400,000 among millions, lying close to each other but not next to
    each other. And the database is under heavy production load while I
    read this data.

    So some of the 40 seconds the Perl script runs is spent on the query
    itself. If you now do 400 queries in batches rather than one query to
    retrieve all records, and the query alone costs you 1 second, you
    have already spent 400 seconds, almost 7 minutes, without retrieving
    or processing a single row.
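
    The estimate works out as follows (the 1-second per-query cost is the
    assumption from the text above):

```ruby
records = 400_000
batch_size = 1_000
per_query_overhead = 1.0  # seconds per query, assumed

batches = records / batch_size           # 400 queries instead of 1
overhead = batches * per_query_overhead  # seconds spent before any row is processed

puts "#{batches} batches, ~#{(overhead / 60).round} minutes of query overhead"
# => 400 batches, ~7 minutes of query overhead
```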

    Regards
    Thomas

