I was working on this a bit w/ grim_radical/nlewis on Friday afternoon, and
wanted to loop back; figured at least starting the conversation here again
was a good idea.

We have started having problems with one query slowing our PuppetDB way,
way down, and seemingly blocking other queries/node runs.

We are an HPC shop with about 2K nodes. Of those, about 1500 do the
following:

@@sshkey { "${hostname}":
  host_aliases => ["$fqdn", "$ipaddress"],
  type         => "rsa",
  key          => $sshrsakey,
  ensure       => present,
}

Sshkey <<| type == "rsa" |>> {ensure => present}

That's it. Nothing really crazy/special in there.

This ends up as:

LOG: duration: 5690.773 ms execute <unnamed>: select results.* from
(SELECT certname_catalogs.certname, catalog_resources.resource,
catalog_resources.type, catalog_resources.title,catalog_resources.tags,
catalog_resources.exported, catalog_resources.sourcefile,
catalog_resources.sourceline, rp.name, rp.value FROM catalog_resources JOIN
certname_catalogs USING(catalog) LEFT OUTER JOIN resource_params rp
USING(resource) INNER JOIN certnames ON certname_catalogs.certname =
certnames.name WHERE (catalog_resources.type = $1) AND
(catalog_resources.exported = $2) AND (certnames.deactivated IS NULL) AND
(NOT ((certname_catalogs.certname = $3))) AND (catalog_resources.resource
IN (SELECT rp.resource FROM resource_params rp WHERE rp.name = $4 AND
rp.value = $5))) results LIMIT 50001
DETAIL: parameters: $1 = 'Sshkey', $2 = 't', $3 = 'hero4209', $4 = 'type',
$5 = '"rsa"'

Would adding an index on this be an option? (I'm not a huge Postgres guru;
maybe I'm using the wrong terms.)

As soon as we commented out the collection, like:

#Sshkey <<| type == "rsa" |>> {ensure => present}

Things all go back to normal, and nodes run nice and quickly. With it in
place, node runs would hang and start timing out. Our 2K nodes are on a
2-hour run interval.

Any help/thoughts? I'm in irc as sjoeboo as well.


--
Matthew Nicholson

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.


  • Jcbollinger at Nov 15, 2012 at 2:53 pm

    On Wednesday, November 14, 2012 11:37:23 AM UTC-6, Matt wrote:
    [original message quoted in full; snipped]

    First, make sure you are using thin_storeconfigs. After making the switch
    (if it is a switch), it may take some time for all nodes' changes to
    propagate to the DB, but the result should be a lot fewer rows in your
    DB. That could speed you up far more than any indexing.
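
    For reference, enabling it is a one-line change in the master's
    puppet.conf; a sketch only, and note Deepak's caveat later in the
    thread that this setting is a no-op when PuppetDB is the backend:

    ```ini
    # puppet.conf on the master -- sketch only; ignored under PuppetDB,
    # which always stores the entire catalog.
    [master]
      storeconfigs      = true
      thin_storeconfigs = true
    ```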

    Also, I presume that you are already using the PostgreSQL back-end instead
    of the built-in one, but if not then you should switch now. The built-in
    back end is simply not up to the task of handling so many nodes efficiently.

    The query itself looks like it could use some optimization, but that's out
    of your hands unless you want to hack on Puppet itself.

    I don't know which columns may be indexed already, and I didn't find any
    documentation of the schema at PL or in puppetlabs' GitHub (what's up with
    that, PL?). It might indeed be the case that adding indexes on one or more
    key columns would help you out, but you really ought to tackle this in a
    systematic manner.

    Specifically, use a query analyzer (I presume Postgres has one) to identify
    the expensive parts of that query, and consider adding indices that will
    improve those parts (e.g. indices on columns of long tables that serve as
    join columns or WHERE criteria). Lather, rinse, repeat until it's good
    enough or you can't do any better.
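
    Postgres's analyzer is EXPLAIN ANALYZE; a sketch of applying it to the
    logged query, with the parameter values from the DETAIL line
    substituted in:

    ```sql
    -- Run against the puppetdb database; ANALYZE actually executes the
    -- query, so try it when the system is relatively quiet.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT results.* FROM
      (SELECT certname_catalogs.certname, catalog_resources.resource,
              catalog_resources.type, catalog_resources.title,
              catalog_resources.tags, catalog_resources.exported,
              catalog_resources.sourcefile, catalog_resources.sourceline,
              rp.name, rp.value
         FROM catalog_resources
         JOIN certname_catalogs USING (catalog)
         LEFT OUTER JOIN resource_params rp USING (resource)
         INNER JOIN certnames ON certname_catalogs.certname = certnames.name
        WHERE catalog_resources.type = 'Sshkey'
          AND catalog_resources.exported = 't'
          AND certnames.deactivated IS NULL
          AND NOT (certname_catalogs.certname = 'hero4209')
          AND catalog_resources.resource IN
              (SELECT rp.resource FROM resource_params rp
                WHERE rp.name = 'type' AND rp.value = '"rsa"')
      ) results LIMIT 50001;
    ```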

    Be aware also that time saved in the query will be partially offset by time
    consumed in maintaining each additional index you create. If you choose
    strategically then you could conceivably see a dramatic overall gain, but
    don't go overboard.


    John

  • Deepak Giridharagopal at Nov 15, 2012 at 8:26 pm
    We (grim_radical, nlew, and cprice) have continued working with Matt
    (sjoeboo) on the #puppet IRC channel over the last few days. Apologies for
    not updating this thread accordingly!
    On Thu, Nov 15, 2012 at 7:53 AM, jcbollinger wrote:


    On Wednesday, November 14, 2012 11:37:23 AM UTC-6, Matt wrote:

    [original message quoted in full; snipped]

    First, make sure you are using thin_storeconfigs. After making the switch
    (if it is a switch), it may take some time for the all nodes' changes to
    propagate to the DB, but the difference should be a lot fewer rows in your
    DB. That could speed you up far more than any indexing.
    Just an FYI...thin_storeconfigs doesn't have an effect when using PuppetDB,
    as we always store the entire catalog every time. I'm confident we can come
    up with a perf fix without resorting to storing significantly less
    information, but that largely depends on how clever we are. :)

    I don't know which columns may be indexed already, and I didn't find any
    documentation of the schema at PL or in puppetlabs' GitHub (what's up with
    that, PL?). It might indeed be the case that adding indexes on one or more
    key columns would help you out, but you really ought to tackle this in a
    systematic manner.
    There are two pieces to this: there is the puppetdb query that's formulated
    on the puppet side, inside the puppetdb "resource" terminus. Then there's
    the translation of that query to low-level SQL, which happens inside of the
    PuppetDB daemon.

    The schema is defined here:
    https://github.com/puppetlabs/puppetdb/blob/master/src/com/puppetlabs/puppetdb/scf/migrate.clj

    It is currently represented as a "base" schema, with migrations on top that
    modify it. We should probably have a complementary version of this code
    that has the entire schema in totality in one shot, instead of building it
    up incrementally. Or at a minimum, a dev document that outlines the schema.
    I'd very much welcome some community help on that!

    Specifically, use a query analyzer (I presume Postgres has one) to
    identify the expensive parts of that query, and consider adding indices
    that will improve those parts (e.g. indices on columns of long tables that
    serve as join columns or WHERE criteria). Lather, rinse, repeat until it's
    good enough or you can't do any better.

    Be aware also that time saved in the query will be partially offset by
    time consumed in maintaining each additional index you create. If you
    choose strategically then you could conceivably see a dramatic overall
    gain, but don't go overboard.
    This is great advice, and it's the way we've been approaching debugging the
    problem on the IRC channel. We've learned a few things:

    * The plan for the query when the problem was originally spotted looked
    like so: http://explain.depesz.com/s/gssC

    * We added an index on the "exported" field in catalog_resources. We
    re-analyzed the query when the system was mostly idle, and got the
    following plan: http://explain.depesz.com/s/I7e

    * That plan looks *significantly* better after that change, but once the
    load on the database machine ramped back up again, the plan looked like so:
    http://explain.depesz.com/s/en6

    * The current catalog duplication rate Matt is seeing is somewhat low, <
    50%. That could be causing bloat in the table, resulting in suboptimal
    performance. We're now experimenting with changing the PuppetDB GC interval
    to clean out old rows more regularly, and seeing whether that correlates
    with improved performance.

    * We've also changed the index we added to a partial index, so that index
    scans are cheaper.
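
    For anyone following along, the GC interval lives in PuppetDB's database
    config; the file path below is an assumption for a typical install, and
    the value is in minutes:

    ```ini
    # /etc/puppetdb/conf.d/database.ini -- path varies by install.
    [database]
    # minutes between garbage-collection runs
    gc-interval = 60
    ```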
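
    The partial index would look roughly like this (the index name and
    indexed column are guesses; the thread doesn't show the exact DDL).
    Restricting the index to exported rows keeps it small, since the vast
    majority of catalog resources are not exported:

    ```sql
    -- Sketch only: index name and column choice are assumptions.
    CREATE INDEX idx_catalog_resources_exported
        ON catalog_resources (type)
     WHERE exported = true;
    ```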

    Stay tuned, and if anyone is interested in helping out with the debugging
    effort, we're talking about this in #puppet on Freenode!

    deepak


  • Jakov Sosic at Nov 22, 2012 at 12:43 am

    On 11/15/2012 09:26 PM, Deepak Giridharagopal wrote:
    We (grim_radical, nlew, and cprice) have continued working with Matt
    (sjoeboo) on the #puppet IRC channel over the last few days. Apologies
    for not updating this thread accordingly!
    jcbollinger asked about documentation of the database schema; you
    seem to have missed that part for some reason?

  • Deepak Giridharagopal at Nov 22, 2012 at 1:27 am

    On Wed, Nov 21, 2012 at 5:43 PM, Jakov Sosic wrote:
    On 11/15/2012 09:26 PM, Deepak Giridharagopal wrote:
    We (grim_radical, nlew, and cprice) have continued working with Matt
    (sjoeboo) on the #puppet IRC channel over the last few days. Apologies
    for not updating this thread accordingly!
    jcbollinger asked about documentation of the schema of database, you
    seem to missed that part for some reason?
    From my earlier reply:

    "The schema is defined here:
    https://github.com/puppetlabs/puppetdb/blob/master/src/com/puppetlabs/puppetdb/scf/migrate.clj

    It is currently represented as a "base" schema, with migrations on top that
    modify it. We should probably have a complementary version of this code
    that has the entire schema in totality in one shot, instead of building it
    up incrementally. Or at a minimum, a dev document that outlines the schema.
    I'd very much welcome some community help on that!"

    deepak

  • Jakov Sosic at Nov 22, 2012 at 3:29 am

    On 11/22/2012 02:20 AM, Deepak Giridharagopal wrote:

    From my earlier reply:
    Ah, sorry, my mistake. I didn't see that, because the first part of your
    answer was top-posted so I didn't even bother to scroll further down...

    Thank you nevertheless :)


    --
    Jakov Sosic
    www.srce.unizg.hr


Discussion Overview
group: puppet-users
categories: puppet
posted: Nov 14, '12 at 5:37p
active: Nov 22, '12 at 3:29a
posts: 6
users: 4
website: puppetlabs.com
