Hi!

I'm trying to speed up some queries in my app, running Rails 3.2 and need
some help. I'm running the app at Heroku on postgresql. I'm new to
postgresql and need some help to optimize a query so it effectively uses
indices.

This is the query I'm currently working on:
http://snipt.net/Linuus/rails-query?key=ee73173643e8d21a5a487d8a329c7a26

In the link above you can also see the full EXPLAIN (query plan) as well as
my current indices. It doesn't seem to use any of my indices at the
moment...

So, any help is very appreciated.

Regards
Linus

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/kWghe778QIAJ.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.

Search Discussions

  • Jim at Jan 25, 2012 at 5:10 pm
    Would probably help if you posted the generated SQL as well as the
    output of EXPLAIN ANALYZE, which shows not just what the planner
    thinks the costs are, but what the costs actually were.

    You could also try running ANALYZE on the tables in question, or
    ANALYZE VERBOSE and checking if the estimates are pretty close to
    reality, e.g.

    ANALYZE VERBOSE categories;

    Jim Crate

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
  • Linus Pettersson at Jan 25, 2012 at 7:20 pm
    Thank you. I have added the generated SQL to the snipt in my first post.

    Regarding the EXPLAIN ANALYZE, can I somehow run that on Heroku? I'm using
    their shared database so I don't have access to any psql console...

    I did run it on my local machine though. The thing is that it differs in
    database version (Local is PostgreSQL 9.1.2 while Heroku still uses 8.3 or
    something) and also in size.

    Anyway, the EXPLAIN ANALYZE from my local machine can be found here:
    http://snipt.net/Linuus/postgresql?key=e4728f444b24839e3f80adf3829bcba9

    Regards
    Linus

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/UqLl5MkpgPcJ.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
  • Peter Vandenabeele at Jan 25, 2012 at 7:30 pm

    On Wed, Jan 25, 2012 at 8:20 PM, Linus Pettersson wrote:

    Thank you. I have added the generated SQL to the snipt in my first post.

    Regarding the EXPLAIN ANALYZE, can I somehow run that on Heroku? I'm using
    their shared database so I don't have access to any psql console...

    I did run it on my local machine though. The thing is that it differs in
    database version (Local is PostgreSQL 9.1.2 while Heroku still uses 8.3 or
    something) and also in size.
    On Debian/Ubuntu you typically have a choice of an 8.4 or a 9.1 ...

    Try

    $ apt-cache search postgresql-8.4

    and then you could install that alongside 9.1.

    It will run on a different port
    (one on 5432 and one on 5433 IIRC, check $ netstat -atn).

    That said, I have no information that the optimization would be
    different for such a "simple" query.

    Anyway, the EXPLAIN ANALYZE from my local machine can be found here:
    http://snipt.net/Linuus/postgresql?key=e4728f444b24839e3f80adf3829bcba9
    <speculative>
    Just out of curiosity ... what problem are you trying to solve ? Did you
    actually see any slow queries ? I don't see obvious problems with the
    query ... Not sure if this is the area that deserves your first attention
    wrt performance optimization ... there is this whole story about
    "premature optimization" (Google it to find many rants about it ...).
    </speculative>

    HTH,

    Peter

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
  • Linus Pettersson at Jan 25, 2012 at 7:51 pm
    Well, maybe it's not necessary... It is the slowest of my queries as far as
    I can see anyway.

    I'm using Rails 3.2 and I have enabled automatic EXPLAIN on slow queries
    (>0.5s). This query is currently being "explained" in my console when I run
    it on my localhost. "SQL (556.5ms) ..."

    556.5ms seems a bit slow to me, isn't it?

    Then a bunch of other stuff is happening and I get this:
    "Completed 200 OK in 2737ms (Views: 813.5ms | ActiveRecord: 703.2ms)"

    So, 550ms of the total 703ms is the above query.

    Maybe I'm just picky? :)

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/kDHHedgd5AcJ.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
  • Peter Vandenabeele at Jan 25, 2012 at 9:56 pm

    On Wed, Jan 25, 2012 at 8:51 PM, Linus Pettersson wrote:

    Well, maybe it's not necessary... It is the slowest of my queries as far
    as I can see anyway.

    I'm using Rails 3.2 and I have enabled automatic EXPLAIN on slow queries
    (>0.5s). This query is currently being "explained" in my console when I run
    it on my localhost. "SQL (556.5ms) ..."

    556.5ms seems a bit slow to me, isn't it?
    Indeed.

    TL;DR
    * On my system, a similar query seems a lot faster
    * you may have a bad "sort" on categories.name, subcategories.name without
    index ?


    I had a similar database structure here, filled it with 10,000 records (you
    seem to have around 2,200 ?),
    postgresql 9.1 on Ubuntu and ran a similar query:

    cl1 = ContactLine.includes(:contact) ; nil
    cl2 = cl1.includes(:contact => :person) ; nil
    cl3 = cl2.where("people.first_name = 'M' OR people.first_name =
    'longer_word'") ; nil # 8 times 'M' and 8 times 'longer_word'
    cl4 = cl3.order("contacts.email") ; nil

    167:0> puts cl4.explain
    SQL (16.9ms) SELECT "contact_lines"."id" AS ...
    FROM "contact_lines"
    LEFT OUTER JOIN "contacts" ON "contacts"."id" = "contact_lines"."contact_id"
    LEFT OUTER JOIN "people" ON "people"."id" = "contacts"."person_id"
    WHERE (people.first_name = 'M' OR people.first_name = 'longer_word')
    ORDER BY contacts.email
    EXPLAIN (1.4ms) EXPLAIN SELECT "contact_lines"."id" ... <same query
    description>
    EXPLAIN for: SELECT "contact_lines"."id" ...
    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------
    Sort (cost=691.05..691.06 rows=1 width=2699)
    Sort Key: contacts.email
    -> Hash Join (cost=455.53..691.04 rows=1 width=2699)
    Hash Cond: (contact_lines.contact_id = contacts.id)
    -> Seq Scan on contact_lines (cost=0.00..198.00 rows=10000
    width=50)
    -> Hash (cost=455.52..455.52 rows=1 width=2649)
    -> Hash Join (cost=229.01..455.52 rows=1 width=2649)
    Hash Cond: (contacts.person_id = people.id)
    -> Seq Scan on contacts (cost=0.00..189.00
    rows=10000 width=41)
    -> Hash (cost=229.00..229.00 rows=1 width=2608)
    -> Seq Scan on people (cost=0.00..229.00
    rows=1 width=2608)
    Filter: (((first_name)::text = 'M'::text)
    OR ((first_name)::text = 'longer_word'::text))
    (12 rows)
    => nil

    The log on a development server (plain `rails s`) is:

    Started GET "/contact_lines" for 127.0.0.1 at 2012-01-25 22:26:41 +0100
    Processing by ContactLinesController#index as HTML
    SQL (8.6ms) SELECT "contact_lines"."id" AS t0_r0...
    ... ORDER BY contacts.email
    Rendered contact_lines/index.html.haml within layouts/application (99.4ms)
    Completed 200 OK in 105ms (Views: 89.5ms | ActiveRecord: 14.7ms)


    About indexes ... The above is without explicitly set indexes.

    Assuming that the first_name::text filtering would be the most expensive, I
    added
    this migration:

    class AddIndexFirstName < ActiveRecord::Migration
    def change
    add_index :people, :first_name
    end
    end

    This worked:

    # \d people
    Table "public.people"
    Column | Type |
    Modifiers
    --------------+-----------------------------+-----------------------------------------------------
    id | integer | not null default
    nextval('people_id_seq'::regclass)
    full_name | character varying(255) |
    first_name | character varying(255) |
    ...
    Indexes:
    "people_pkey" PRIMARY KEY, btree (id)
    "index_people_on_first_name" btree (first_name)

    The EXPLAIN changed:

    SQL (12.7ms) SELECT "contact_lines"."id" AS t0_r0, ...

    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------
    Sort (cost=474.59..474.59 rows=1 width=2699)
    Sort Key: contacts.email
    -> Hash Join (cost=239.07..474.58 rows=1 width=2699)
    Hash Cond: (contact_lines.contact_id = contacts.id)
    -> Seq Scan on contact_lines (cost=0.00..198.00 rows=10000
    width=50)
    -> Hash (cost=239.05..239.05 rows=1 width=2649)
    -> Hash Join (cost=12.54..239.05 rows=1 width=2649)
    Hash Cond: (contacts.person_id = people.id)
    -> Seq Scan on contacts (cost=0.00..189.00
    rows=10000 width=41)
    -> Hash (cost=12.53..12.53 rows=1 width=2608)
    -> Bitmap Heap Scan on people
    (cost=8.52..12.53 rows=1 width=2608)
    Recheck Cond: (((first_name)::text =
    'M'::text) OR ((first_name)::text = 'longer_word'::text))
    -> BitmapOr (cost=8.52..8.52 rows=1
    width=0)
    -> Bitmap Index Scan on
    index_people_on_first_name (cost=0.00..4.26 rows=1 width=0)
    Index Cond:
    ((first_name)::text = 'M'::text)
    -> Bitmap Index Scan on
    index_people_on_first_name (cost=0.00..4.26 rows=1 width=0)
    Index Cond:
    ((first_name)::text = 'longer_word'::text)
    (17 rows)

    But the performance was the same. A log of a hit to the server.

    Processing by ContactLinesController#index as HTML
    SQL (11.7ms) SELECT "contact_lines"."id" AS t0_r0, ...
    ...
    WHERE (people.first_name = 'M' OR people.first_name = 'longer_word')
    Rendered contact_lines/index.html.haml within layouts/application
    (104.1ms)
    Completed 200 OK in 112ms (Views: 90.8ms | ActiveRecord: 19.8ms)



    Maybe it is related to something in your query plan that I do not see here
    and is suspicious:

    from http://snipt.net/Linuus/postgresql?key=e4728f444b24839e3f80adf3829bcba9

    Sort (cost=3267.32..3272.25 rows=1972 width=2104) (actual
    time=39.308..39.468 rows=1880 loops=1)
    Sort Key: categories.name, subcategories.name
    Sort Method: quicksort Memory: 313kB ####### ??


    from
    http://snipt.net/Linuus/rails-query?key=ee73173643e8d21a5a487d8a329c7a26

    you seem to not have an index on those 2 `name` columns ?


    What happens to performance when you remove try sort ?

    Category.eager_load(:subcategories)
    .joins("INNER JOIN products AS p ON resellercategories.id =
    p.resellercategory_id")
    ## WITHOUT SORT .order("categories.name ASC, subcategories.name ASC")
    .where("p.gender = 'unisex' OR p.gender = 'male'")

    If that makes a significant difference, what happens when you add indices
    on those 2
    columns (and possibly remove some indices on other columns) ?


    Then a bunch of other stuff is happening and I get this:
    "Completed 200 OK in 2737ms (Views: 813.5ms | ActiveRecord: 703.2ms)"

    So, 550ms of the total 703ms is the above query.

    Maybe I'm just picky? :)
    Those indeed seem "large" numbers for a new/fresh application...

    Peter

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
  • Linus Pettersson at Jan 26, 2012 at 5:15 pm
    Hi

    I tested to remove the .order(...) part and indeed, the query time goes
    down to ~100ms. However, it doesn't help to add indices, at least not as I
    did :)

    add_index :categories, :name
    add_index :subcategories, :name

    Did some more testing and if I keep the .order... but don't join the
    products table I get a query that runs at about ~55ms. So the bottleneck
    seems to be the products table.
    The query that I'm running looks like this:

    Category.eager_load(:subcategories)
    .joins("INNER JOIN products AS p ON resellercategories.id =
    p.resellercategory_id")
    .order("categories.name ASC, subcategories.name ASC")

    (Skipping the gender here...)

    What I have is Categories and Subcategories. They are related to each other
    through a Resellercategories table. Products are related to
    Resellercategories.
    So, the reason that I want to join the products as well is because I only
    want to show categories and subcategories that actually have some products
    (there are some empty categories/subcategories still).

    So the above query is what we came up with in another thread here in the
    group.

    - Maybe there is a better way to check if a category/subcategory has
    products without joining the entire products table?


    BTW, in my dev database there is about 8700 rows in products table, 2200
    rows in resellercategories, 5 rows in categories and 45 rows in
    subcategories.

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/1AqHbcGn9n8J.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
  • Linus Pettersson at Jan 26, 2012 at 5:32 pm
    Also, another reason for the query that I forgot to mention is that if a
    user filters the products for female products only for instance, it should
    only show categories and subcategories that contains products for that
    gender.

    The gender is specified in the products table...

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/KXfbWuombtcJ.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
  • Peter Vandenabeele at Jan 26, 2012 at 5:37 pm

    On Thu, Jan 26, 2012 at 6:15 PM, Linus Pettersson wrote:

    Hi

    I tested to remove the .order(...) part and indeed, the query time goes
    down to ~100ms. However, it doesn't help to add indices, at least not as I
    did :)

    add_index :categories, :name
    add_index :subcategories, :name

    Did some more testing and if I keep the .order... but don't join the
    products table I get a query that runs at about ~55ms. So the bottleneck
    seems to be the products table.
    The query that I'm running looks like this:

    Category.eager_load(:subcategories)
    .joins("INNER JOIN products AS p ON resellercategories.id =
    p.resellercategory_id")
    .order("categories.name ASC, subcategories.name ASC")

    (Skipping the gender here...)

    What I have is Categories and Subcategories. They are related to each
    other through a Resellercategories table. Products are related to
    Resellercategories.
    So, the reason that I want to join the products as well is because I only
    want to show categories and subcategories that actually have some products
    (there are some empty categories/subcategories still).

    So the above query is what we came up with in another thread here in the
    group.

    - Maybe there is a better way to check if a category/subcategory has
    products without joining the entire products table?

    It is possible to add a :counter_cache , but then you need to make sure you
    use the proper methods for each product that you add or remove from the
    association.

    Alternative to the default counter cache (from Rails), you could build your
    own
    logic as in:

    * has_male_products
    * ...

    changing you query to ...

    Category.eager_load(:subcategories).
    where(:has_male_products => true).
    order(...)

    Then you would need to set the cache on the appropriate categories in
    an after_save on the product you are
    creating/updating/deactivating/(deleting ?).

    Both ideas would probably be faster for querying, but certainly more
    complex for making sure that cache is always correct.

    HTH,

    Peter

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
  • Peter Vandenabeele at Jan 26, 2012 at 7:08 pm

    On Thu, Jan 26, 2012 at 6:37 PM, Peter Vandenabeele wrote:

    On Thu, Jan 26, 2012 at 6:15 PM, Linus Pettersson <
    linus.pettersson@gmail.com> wrote:
    Hi

    I tested to remove the .order(...) part and indeed, the query time goes
    down to ~100ms. However, it doesn't help to add indices, at least not as I
    did :)

    add_index :categories, :name
    add_index :subcategories, :name

    Did some more testing and if I keep the .order... but don't join the
    products table I get a query that runs at about ~55ms. So the bottleneck
    seems to be the products table.
    The query that I'm running looks like this:

    Category.eager_load(:subcategories)
    .joins("INNER JOIN products AS p ON resellercategories.id =
    p.resellercategory_id")
    .order("categories.name ASC, subcategories.name ASC")

    (Skipping the gender here...)

    What I have is Categories and Subcategories. They are related to each
    other through a Resellercategories table. Products are related to
    Resellercategories.
    So, the reason that I want to join the products as well is because I only
    want to show categories and subcategories that actually have some products
    (there are some empty categories/subcategories still).

    So the above query is what we came up with in another thread here in the
    group.

    - Maybe there is a better way to check if a category/subcategory has
    products without joining the entire products table?

    It is possible to add a :counter_cache , but then you need to make sure you
    use the proper methods for each product that you add or remove from the
    association.

    Alternative to the default counter cache (from Rails), you could build
    your own
    logic as in:

    * has_male_products
    * ...

    changing you query to ...

    Category.eager_load(:subcategories).
    where(:has_male_products => true).
    order(...)

    Then you would need to set the cache on the appropriate categories in
    an after_save on the product you are
    creating/updating/deactivating/(deleting ?).

    Both ideas would probably be faster for querying, but certainly more
    complex for making sure that cache is always correct.
    Sorry to reply to my own post.


    TL;DR Is there pagination? Then a smaller set may return much faster.


    I was thinking over my reply and may have forgotten a fundamental
    aspect ... If you say 2200 categories, 8000 products.

    How many entries does you query return ?
    (replace .add with .count at the end).

    How many do you need ?

    What happens when you add .limit(20) to your query ?

    By which "primary object" do you want to sort and paginate ?
    (I will assume 'Product' in the discussion below).

    With the includes that are currently implemented, you may
    have to redo the whole query into 2 qeuries ...

    1) for fetching the "primary objects"
    (e.g. exactly 20 Products, no additional "has_many" data, because
    that would increase the number of returned rows for 1 product and
    make proper pagination in the database impossible; including
    "belongs_to" here is no problem)

    2) a second query for fetching eventual "has_many" data
    on those 20 "primary products" (is that "Reification" ?)

    If the performance problem could be solved by taking the
    pagination into account, that would be a _much_ better
    solution that building cache columns in this early phase
    of your project.

    HTH,

    Peter

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
  • Linus Pettersson at Jan 26, 2012 at 7:51 pm
    This query is only to get the appropriate Categories & Subcategories. There
    are 5 Categories and 45 Subcategories.

    My products are imported from webshops which are using different categories
    for the same things ("tshirt", "t-shirts", "t-shirt", "short sleeved
    shirts" may all be the same). To cope with this issue I have the
    "Resellercategories" that I relate to a specific Category and Subcategory.

    Category/Subcategory -> Resellercategory -> Product

    (Resellercategories are never seen in the frontend)

    If I filter the products for "male" I only want to display (non-empty)
    Categories/Subcategories that have male products.


    To answer your question, there is no pagination on the Categories and
    Subcategories that I'm fetching here with this query. The query only
    returns the appropriate Categories/Subcategories. So if filtered by "male"
    it returns 3 Categories and 12 Subcategories (In my dev environment).


    I am using pagination for the actual products though, but that is a
    separate query.

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-talk/-/ZGBq4G24MIUJ.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
  • Peter Vandenabeele at Jan 26, 2012 at 8:07 pm

    On Thu, Jan 26, 2012 at 8:51 PM, Linus Pettersson wrote:

    This query is only to get the appropriate Categories & Subcategories.
    There are 5 Categories and 45 Subcategories.

    My products are imported from webshops which are using different
    categories for the same things ("tshirt", "t-shirts", "t-shirt", "short
    sleeved shirts" may all be the same). To cope with this issue I have the
    "Resellercategories" that I relate to a specific Category and Subcategory.

    Category/Subcategory -> Resellercategory -> Product

    (Resellercategories are never seen in the frontend)

    If I filter the products for "male" I only want to display (non-empty)
    Categories/Subcategories that have male products.


    To answer your question, there is no pagination on the Categories and
    Subcategories that I'm fetching here with this query. The query only
    returns the appropriate Categories/Subcategories. So if filtered by "male"
    it returns 3 Categories and 12 Subcategories (In my dev environment).
    OK, I understand (small amount of categories, much more products).

    If you find no other way, then the caching the "has_male_products" etc
    in the Category/Subcategory may be the best remaining way.

    I was also thinking, maybe it is acceptable that the set of Categories
    with "male_products" etc. is only update every hour or so. That would
    avoid the complexity of real-time update of that cache column. On the
    other hand, an after_save on products is not _that_ difficult.

    Just as a test, does your performance improve significantly if you add
    a "has_male_products" column on categories and filter on that?

    HTH,

    Peter

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
    For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouprubyonrails-talk @
categoriesrubyonrails
postedJan 25, '12 at 4:14p
activeJan 26, '12 at 8:07p
posts12
users3
websiterubyonrails.org
irc#RubyOnRails

People

Translate

site design / logo © 2021 Grokbase