On Friday, May 10, 2013 3:17:00 PM UTC-5, dsdtas wrote:
Earlier this week, I applied RHEL patches to a couple of dev servers with
puppet 0.25.5 and now I can no longer run puppetd commands without
constantly getting the message:
[root@dev2 ~]# puppetd --test --verbose --noop
notice: Run of Puppet configuration client already in progress; skipping
Killing the process and then clearing out the lock file every time is not
really an option.
How about stopping the agent before performing OS upgrades (as opposed to
just applying updates released for the current OS version)?
Also, I am finding that puppetd --enable is not having any effect on my
problem.
Just to be sure: you're running that with privilege, right?
I am guessing that some puppet dependency got updated by the update from
RHEL 5.5 to 5.6. Any suggestions on how to troubleshoot this?
If by "troubleshoot this" you mean get Puppet working correctly again, then
there is probably no alternative to forcing Puppet stopped and then
removing the lock file. That might involve manually killing a stalled
puppetd. If you prefer, that could take the form of restarting the whole
server, after which it would be safe to remove the lock file without
shutting down the agent. However, you really should not remove the lock
file, either manually or via "puppetd --enable", while the puppetd process
that created it is still alive.
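To illustrate the "only remove the lock once nothing holds it" rule, here is a minimal shell sketch. It assumes the lock file contains the PID of the agent that created it, which I have not verified for 0.25.5. The helper name remove_stale_lock is hypothetical, and the path in the usage line is only the default I would expect, so confirm the real location on your system (for example with "puppetd --configprint puppetdlockfile", if your version supports it).

```shell
# Hypothetical helper: remove a Puppet lock file only when no live
# process holds it. Assumes the file contains the owning PID; an empty
# or non-numeric file is treated as unowned.
# Run as root so that kill -0 can actually see the agent process.
remove_stale_lock() {
    lockfile="$1"
    if [ ! -e "$lockfile" ]; then
        echo "no lock file at $lockfile"
        return 0
    fi
    pid=$(cat "$lockfile" 2>/dev/null)
    # Discard anything that is not a plain decimal PID.
    case "$pid" in
        ''|*[!0-9]*) pid="" ;;
    esac
    # kill -0 sends no signal; it only tests whether the PID exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "refusing: lock held by live process $pid" >&2
        return 1
    fi
    rm -f "$lockfile" && echo "removed stale lock $lockfile"
}

# Usage (path is an assumed default, not taken from this thread):
#   remove_stale_lock /var/lib/puppet/state/puppetdlock
```

If the function refuses, stop or kill that process first and then run it again.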
If by "troubleshoot this" you mean determine what went wrong and why, then
you need to gather information, including:
- The actual state of the system. Is the agent in fact running? Is the
lock file in fact present?
- The logs of the most recent Puppet activity not related to your failed
/ skipped runs and diagnostic efforts. What did puppet last do -- or what
was it in the process of doing -- when it entered the state it is now in?
- The updates that were actually applied to get from your (possibly
updated) RHEL 5.5 to the current (possibly updated) 5.6.
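The three items above could be captured in one pass with something like the following. This is only a sketch under assumptions: the helper name collect_diagnostics is hypothetical, the default lock file path and the use of /var/log/messages for agent logging are guesses for a stock RHEL 5 install, and the line limits are arbitrary. Adjust all of them to match your systems.

```shell
# Hypothetical helper: snapshot the three kinds of evidence listed above
# into a directory. Paths and log locations are assumptions for RHEL 5.
collect_diagnostics() {
    outdir="$1"
    lockfile="${2:-/var/lib/puppet/state/puppetdlock}"
    mkdir -p "$outdir" || return 1

    # 1. Actual state: is an agent running, and is the lock file there?
    #    The [p] keeps grep from matching its own command line.
    ps -ef | grep '[p]uppetd' > "$outdir/agent_processes.txt" 2>&1 || true
    ls -l "$lockfile" > "$outdir/lockfile.txt" 2>&1 || true

    # 2. Recent agent log lines (assumes syslog goes to /var/log/messages).
    grep puppetd /var/log/messages 2>/dev/null | tail -n 200 \
        > "$outdir/puppet_log_tail.txt" || true

    # 3. Most recently installed packages, newest first.
    rpm -qa --last 2>&1 | head -n 100 > "$outdir/recent_rpms.txt" || true
}

# Usage (output directory name is arbitrary):
#   collect_diagnostics /root/puppet-diag
```

Run it before you kill anything or remove the lock file, so that the snapshot reflects the stalled state rather than the cleaned-up one.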
Then you need to conduct an analysis, on which I cannot advise you in any
detail. I think it more likely that the update clobbered application of
some resource, causing the agent to stall in a manner tied to the resource
type and its chosen provider, than that the update clobbered the core
puppet engine. But I can't be sure.
Ultimately, even if you are able to form a good hypothesis about what
happened, and even if you are able to test that hypothesis to prove its
plausibility, I don't know any way to be *certain* that what you come up
with is the correct explanation for what actually happened. You'll have to
use your best judgement.
John