This morning one of our salt-masters was under heavy load. We terminated
several of the processes involved: mainly bash scripts started by reactors,
plus some local jobs that fetch data from other hosts through salt and build
local pillar dicts.
That did not help: the salt-master would not start, and still does not. Just
to be sure, we rebooted the whole machine, but the behaviour is the same. The
salt-master hangs during startup.
We followed the troubleshooting chapter of the docs
(https://salt.readthedocs.org/en/latest/topics/troubleshooting/master.html?highlight=troubleshooting).
Neither salt-master -l debug nor salt-master -l trace plus
killall -SIGUSR1 salt-master showed anything useful.
Nothing has been changed on the salt-master in the last few days.

Any help or hints are appreciated.

# salt-master --versions


            Salt: 2014.1.10
          Python: 2.7.6 (default, Feb 26 2014, 00:34:35)
          Jinja2: 2.7.2
        M2Crypto: 0.21.1
  msgpack-python: 0.4.2
    msgpack-pure: Not Installed
        pycrypto: 2.6.1
          PyYAML: 3.10
           PyZMQ: 14.0.1
             ZMQ: 3.2.4




# salt-master -l trace

[DEBUG ] Reading configuration from /etc/salt/master
[DEBUG ] Including configuration from '/etc/salt/master.d/nodegroups.conf'
[DEBUG ] Reading configuration from /etc/salt/master.d/nodegroups.conf
[DEBUG ] Including configuration from '/etc/salt/master.d/reactor.conf'
[DEBUG ] Reading configuration from /etc/salt/master.d/reactor.conf
[DEBUG ] Including configuration from '/etc/salt/master.d/sqlite.conf'
[DEBUG ] Reading configuration from /etc/salt/master.d/sqlite.conf
[TRACE ] loading log_handlers in
['/var/cache/salt/master/extmods/log_handlers',
'/usr/lib/python2.7/dist-packages/salt/log/handlers']
[TRACE ] Skipping /var/cache/salt/master/extmods/log_handlers, it is not
a directory
[TRACE ] None of the required configuration sections,
'logstash_udp_handler' and 'logstash_zmq_handler', were found the in the
configuration. Not loading the Logstash logging handlers module.
[DEBUG ] Configuration file path: /etc/salt/master
[TRACE ] Trying pysss.getgrouplist for 'root'
[TRACE ] Trying generic group list for 'root'
[TRACE ] Generic group list for user 'root': ['root']


Interrupting it by hand gives this output:

KeyboardInterrupt:
Traceback (most recent call last):
   File "/usr/bin/salt-master", line 10, in <module>
     salt_master()
   File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 25, in
salt_master
     master.start()
   File "/usr/lib/python2.7/dist-packages/salt/__init__.py", line 120, in
start
     self.prepare()
   File "/usr/lib/python2.7/dist-packages/salt/__init__.py", line 83, in
prepare
     pki_dir=self.config['pki_dir'],
   File "/usr/lib/python2.7/dist-packages/salt/utils/verify.py", line 230,
in verify_env
     for root, dirs, files in os.walk(dir_):
   File "/usr/lib/python2.7/os.py", line 294, in walk
     for x in walk(new_path, topdown, onerror, followlinks):
   File "/usr/lib/python2.7/os.py", line 294, in walk
     for x in walk(new_path, topdown, onerror, followlinks):
   File "/usr/lib/python2.7/os.py", line 294, in walk
     for x in walk(new_path, topdown, onerror, followlinks):
   File "/usr/lib/python2.7/os.py", line 294, in walk
     for x in walk(new_path, topdown, onerror, followlinks):
   File "/usr/lib/python2.7/os.py", line 276, in walk
     names = listdir(top)
KeyboardInterrupt
Traceback (most recent call last):
   File "/usr/bin/salt-master", line 10, in <module>
     salt_master()
   File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 25, in
salt_master
     master.start()
   File "/usr/lib/python2.7/dist-packages/salt/__init__.py", line 120, in
start
     self.prepare()
   File "/usr/lib/python2.7/dist-packages/salt/__init__.py", line 83, in
prepare
     pki_dir=self.config['pki_dir'],
   File "/usr/lib/python2.7/dist-packages/salt/utils/verify.py", line 230,
in verify_env
     for root, dirs, files in os.walk(dir_):
   File "/usr/lib/python2.7/os.py", line 294, in walk
     for x in walk(new_path, topdown, onerror, followlinks):
   File "/usr/lib/python2.7/os.py", line 294, in walk
     for x in walk(new_path, topdown, onerror, followlinks):
   File "/usr/lib/python2.7/os.py", line 294, in walk
     for x in walk(new_path, topdown, onerror, followlinks):
   File "/usr/lib/python2.7/os.py", line 294, in walk
     for x in walk(new_path, topdown, onerror, followlinks):
   File "/usr/lib/python2.7/os.py", line 276, in walk
     names = listdir(top)
KeyboardInterrupt

--
You received this message because you are subscribed to the Google Groups "Salt-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to salt-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Lorenzo Marschall at Oct 14, 2014 at 9:21 am
    We were able to find the cause and fix it.

    See the traceback from the aborted start above:
    File "/usr/lib/python2.7/dist-packages/salt/utils/verify.py", line 230, in verify_env

    Reading that code helped us find the source of the problem.

    The file tree under
    /var/cache/salt/master/jobs
    was huge, at or near the limits of the filesystem. We noticed it when we
    ran du -s -h on it.
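
For anyone who wants the same check from Python: a minimal sketch that walks a directory the way the startup check in the traceback does (os.walk) and reports how many entries and bytes it found. The function name tree_usage is made up for this example and is not part of Salt. Note that it sums apparent file sizes, while du reports allocated disk blocks, so the two numbers will differ somewhat.

```python
import os


def tree_usage(top):
    """Return (entry_count, total_bytes) for everything below `top`."""
    entries = 0
    total_bytes = 0
    for root, dirs, files in os.walk(top):
        entries += len(dirs) + len(files)
        for name in files:
            try:
                # lstat: measure symlinks themselves instead of following them
                total_bytes += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # the file vanished between listing and stat; skip it
                pass
    return entries, total_bytes


if __name__ == "__main__":
    # Path taken from this thread; adjust it if your cache lives elsewhere.
    count, size = tree_usage("/var/cache/salt/master/jobs")
    print("%d entries, %.1f MiB" % (count, size / (1024.0 * 1024.0)))
```

A very large entry count is the interesting number here, since the walk has to list every directory before the master continues.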

    After we cleaned it up, everything was fine and the salt-master started again.
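
The thread does not say how the cleanup was done. One possible approach, sketched here under assumptions: remove files older than a given number of hours and then drop the directories that end up empty. The name prune_old_files and the age-based rule are hypothetical and not Salt's own cleanup logic; the sketch defaults to a dry run that only lists what it would delete.

```python
import os
import time


def prune_old_files(top, max_age_hours, dry_run=True):
    """Delete files below `top` older than `max_age_hours`; return their paths."""
    cutoff = time.time() - max_age_hours * 3600
    stale = []
    # Walk bottom-up so emptied directories can be removed on the way out.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.lstat(path).st_mtime >= cutoff:
                    continue  # recent enough, keep it
                if not dry_run:
                    os.remove(path)
            except OSError:
                continue  # file vanished or is not removable; leave it alone
            stale.append(path)
        if not dry_run and root != top:
            try:
                os.rmdir(root)  # only succeeds if the directory is now empty
            except OSError:
                pass
    return stale
```

Calling prune_old_files("/var/cache/salt/master/jobs", 24) would list candidates without touching anything; passing dry_run=False actually deletes. It seems safest to run something like this only while the master is stopped.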

    Just a hint for the developers:

    Our /etc/salt/master config uses the default values for this part:
    #job_cache: True
    #keep_jobs: 24

    It might be helpful to log a warning every 60 seconds saying that the
    startup check is still walking through the "jobs" directory, or whichever
    directory it is.
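
To illustrate the suggestion, a rough sketch of a directory walk that logs a warning at a fixed interval while it is still running. This is not Salt's code: walk_with_progress and its interval and clock parameters are invented for the example. One limitation is that the warning can only be emitted between directories, so a single listdir call that blocks for minutes would still be silent.

```python
import logging
import os
import time

log = logging.getLogger(__name__)


def walk_with_progress(top, interval=60.0, clock=time.time):
    """Yield os.walk() tuples, logging a warning every `interval` seconds."""
    started = last_report = clock()
    visited = 0
    for root, dirs, files in os.walk(top):
        visited += 1
        now = clock()
        if now - last_report >= interval:
            log.warning(
                "Still scanning %s: %d directories visited after %.0f s "
                "(currently in %s)",
                top, visited, now - started, root,
            )
            last_report = now
        yield root, dirs, files
```

A caller would use it as a drop-in replacement for os.walk, for example: for root, dirs, files in walk_with_progress(dir_): ...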

    Regards,
      Lorenzo

  • Volker Schwicking at Oct 14, 2014 at 12:43 pm
    Hi Lorenzo,
    On 14/10/14 11:21, Lorenzo Marschall wrote:
    > The filetree in
    > /var/cache/salt/master/jobs
    > was huge or at the limits of the filesystem. We realized it when we
    > issued a du -s -h command.
    >
    > After cleaning up everything was fine and starting again.
    >
    > Just a hint for the developers:
    >
    > Our /etc/salt/master config has standard values concerning this part:
    > #job_cache: True
    > #keep_jobs: 24

    I am answering just you, so that people on the list are not annoyed by me
    mentioning it again and again.

    Maybe this daemon might suit you :-)

    https://github.com/felskrone/salt-eventsd

    - felskrone


Discussion Overview
group: salt-users
posted: Oct 13, '14 at 10:53a
active: Oct 14, '14 at 12:43p
posts: 3
users: 2