Hello Daniel,
I was thinking about doing the exact same thing. I also configured a proxy minion to check devices like switches and routers that can't run the minion themselves; I only finished a draft a few hours ago.
Were you running the checks one minion at a time, or were you doing role-based scheduling, much like Sensu (salt -G 'role:web' cmd.run check --return some_returner)? I would expect role-based scheduling to perform a bit better. I was thinking of using Redis as the returner, which should be able to handle 60 minions, or am I mistaken?
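For what it's worth, here is a minimal sketch of what I had in mind, assuming the stock Redis returner and the built-in minion scheduler (the job name "disk_check" and the one-minute interval are just placeholders for illustration):

```yaml
# Minion (or pillar) config excerpt: connection details for the redis returner
redis.host: monitoring-redis.example.com   # hypothetical host
redis.port: 6379
redis.db: '0'

# Run a check periodically and ship the result straight to redis,
# bypassing the need for a salt CLI call from the master each time
schedule:
  disk_check:
    function: disk.usage
    minutes: 1
    returner: redis
```

The idea would be that the scheduler pushes results to Redis directly, so the master only has to distribute the schedule once instead of fanning out a command every interval.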
I never thought the job cache could be an issue; that would be an interesting discussion.
From: salt-users@googlegroups.com On Behalf Of Daniel Jagszent
Sent: Monday, November 3, 2014 5:54 PM
To: salt-users@googlegroups.com
Subject: Re: [salt-users] [RFC] A monitoring solution based on the powers of Salt
Hello Arnold,
I tried using Salt's 0MQ for executing checks on the minions. I wrote a simple daemon on the master that executed (custom) Salt modules on the minions in periodic intervals and pushed the results every 60 seconds or so to an Icinga instance (also via Salt). Even with some optimizations (combine checks as much as possible, spread out the checks to decrease the master load, limit the number of parallel checks) I ran into scaling problems with only approx. 60 minions.
I had to decrease the keep_jobs option to 1 hour. The default of 24 hours would result in millions of files in the job cache directory - Salt's job garbage collection could not handle this. I also had to increase worker_threads to approximately the number of minions, otherwise timeouts when executing Salt modules were too common. (Even after increasing worker_threads, timeouts occurred every now and then.) That's why we abandoned Salt's 0MQ communication for monitoring. We now use (Salt-managed, of course) NSCA-ng server/clients to collect the monitoring checks.
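In case it helps anyone hitting the same wall, the two tweaks described above boil down to a master config along these lines (the value 60 simply mirrors the roughly-one-thread-per-minion rule of thumb from my setup, not a general recommendation):

```yaml
# /etc/salt/master (excerpt)
keep_jobs: 1        # hours to keep job cache entries; the default is 24
worker_threads: 60  # roughly one worker per minion; the default is 5
```

Even so, this only delays the scaling problem rather than solving it, which is why we moved the check results off Salt's transport entirely.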
Maybe RAET will solve these problems. Maybe they were unique to my setup in the first place. Anyway, I'm looking forward to seeing how Elija evolves.
PS: Here you can find the (quite undocumented) daemon I used:
https://gist.github.com/d--j/317b28a5fb14ac89227f
Arnold Bechtoldt
1. November 2014 12:02
Hey Salt users,
I'm going to work on a monitoring solution that uses Salt to execute
checks via Salt's execution modules and return the data to a data store
like Elasticsearch.
The idea is to make it very modular and the processing logic easy to
customize for several use cases.
Since Salt and Elasticsearch do most of the work, it should be easy to
build the missing parts: check/job processing and notification/event
triggering.
I'm looking for some comments on this. Some suitable use cases would
be nice, especially ones that aren't easy to implement with today's
monitoring giants, Nagios and friends.
Is there any interest in this project? Feel free to add your feature
requests on GitHub (https://github.com/matchBIT/elija-monitoring).
Thanks!
Arnold
--
You received this message because you are subscribed to the Google Groups "Salt-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to salt-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.