
Process supervision

If you have ever had to administer servers, you have certainly run into this problem:
How can I make sure all my applications are running, and restarted when they crash?

I know of three schools of thought for doing this:

My applications never crash

Let's call this one the silly man's approach: you are sure your application will never crash, so a basic init script is enough for you. If something does fail, you will probably not know until a client calls your support.

In this category we have the init systems used on many Linux/BSD hosts. They vary in execution but the idea stays the same: you have a script; called with start as its argument it starts the application, and called with stop it stops it.
This approach works well if you are sure the applications will behave no matter what and cannot crash (or if they have their own supervision mechanism).
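
To make the limitation concrete, here is a minimal sketch of what such a start/stop script boils down to, written in Python rather than shell. The function names and the pid file are assumptions for illustration, and it assumes a POSIX host. Notice that nothing in it looks at the process between start and stop.

```python
import os
import signal
import subprocess

def start(argv, pidfile):
    """Launch the application and remember its PID in a pid file."""
    # start_new_session detaches the child from our terminal session (POSIX).
    child = subprocess.Popen(argv, start_new_session=True)
    with open(pidfile, "w") as f:
        f.write(str(child.pid))
    return child.pid

def stop(pidfile):
    """Send SIGTERM to the PID recorded by start() and remove the pid file."""
    with open(pidfile) as f:
        pid = int(f.read())
    os.kill(pid, signal.SIGTERM)
    os.remove(pidfile)
    return pid

def main(args, argv, pidfile):
    """Dispatch on the first argument, the way an init script does."""
    if args and args[0] == "start":
        return start(argv, pidfile)
    if args and args[0] == "stop":
        return stop(pidfile)
    raise SystemExit("usage: start|stop")
```

If the application dies five minutes after start, the pid file simply goes stale and nobody is told.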

Polling (by PIDs)

The second way of doing it is to have a dedicated application that checks on the processes at regular intervals, using their PIDs. For me the problem with this approach is the lag between the moment a process crashes and the moment the external supervisor notices that the application is no longer running.

A few seconds may seem negligible, but they are not, even on a low-traffic system, especially with persistent connections.
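
The lag comes straight out of how polling works. Here is a minimal sketch, assuming a POSIX host; the function names and the 5 second interval are mine, not taken from any of the tools listed below. A crash that happens right after a check goes unnoticed for a full interval.

```python
import os
import time

def pid_is_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def watch(pid, interval=5.0, max_checks=None):
    """Poll until the PID disappears and return how many checks it took.

    Returns None if max_checks is reached while the process is still alive.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if not pid_is_alive(pid):
            return checks
        time.sleep(interval)  # a crash during this sleep is not seen yet
    return None
```

A PID check also cannot tell whether the PID has been reused by an unrelated process, which is one more reason I am not fond of this approach.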

In this category we have (these are the main ones I know of; I am sure there are many more):

  • monit
  • god (Ruby application)
  • bluepill (Ruby too)

The first one is a C application and works more or less, but I dropped it the day I first noticed it had not restarted a process as I had asked it to.

The other two have an incredibly horrible configuration syntax for describing what you want to monitor, so I never really tried either of them. In any case, I am not really fond of the way those three do things.

Subprocesses

Daemontools

The last way of supervising processes is to spawn them as children of the supervisor. I have used daemontools until now and it works really well. If one of your applications crashes, daemontools is notified right away and can restart the process without delay.
What I have always missed with daemontools is the ability to control it (with something other than the command line) and to be notified of what actually happens, along with some monitoring of resource usage (for example, if you want to restart a process that has been taking 100% CPU for too long).
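
The reason there is no delay is that a parent can simply block until its child exits. Here is a minimal sketch of the idea in Python; it is not how daemontools is implemented, and the function name and the runs parameter are mine. The loop is bounded so the example terminates, where a real supervisor would loop forever.

```python
import subprocess

def supervise(argv, runs=3):
    """Run argv as a child process, restarting it as soon as it exits.

    Returns the list of exit codes, one per run.
    """
    exit_codes = []
    for _ in range(runs):
        child = subprocess.Popen(argv)
        # wait() returns the moment the child exits: no polling, no lag.
        exit_codes.append(child.wait())
    return exit_codes
```

For example, supervise(["myapp"], runs=3) would run a hypothetical myapp command three times in a row and collect its exit codes.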

Daemontools has another particularity: it assumes your daemon will not fork into the background and will write its logs to stdout, so they can be piped to a dedicated process that writes and rotates them on disk.

I really like treating a daemon this way, for several reasons:

  • since you do not need to go into the background, you can run your application in production the same way as in development, and you don't need to spend time figuring out how to daemonize.

  • no need for any specialized logger class/object: just write to stdout and you are good to go. The default log-handling process that comes with daemontools can add a timestamp in front of each line you write to stdout, so you can even remove that from your code.

  • no need to write a pid file anywhere. It is not that the task is hard, but it is one less thing to worry about.

  • the daemontools logger process can guarantee that the space taken by the log files will never exceed what you allow (X files of Y bytes). Compare that with syslog (newsyslog, to be precise), which only checks at regular intervals and cannot prevent an application from flooding your disk (at least not the syslog installed by default on most distributions).
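
To illustrate the "X files of Y bytes" idea, here is a minimal sketch of a logger process that reads lines from a stream, timestamps them, and caps disk usage. It uses RotatingFileHandler from Python's standard library as a stand-in; it is not the daemontools logger, and the function names and the timestamp format are my own choices.

```python
import logging
import logging.handlers

def make_capped_logger(path, max_bytes, backups):
    """Build a logger that keeps at most backups + 1 files of max_bytes each."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    # The logger adds the timestamp, so the application does not have to.
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger = logging.getLogger(path)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep lines out of the root logger
    return logger

def pump(stream, logger):
    """Copy every line of the stream (e.g. sys.stdin) into the capped log."""
    for line in stream:
        logger.info(line.rstrip("\n"))
```

Run as the right-hand side of a pipe, pump(sys.stdin, make_capped_logger("current", 100000, 4)) would keep at most five files of about 100 kB, dropping the oldest lines first. The size is checked before every write, not on a schedule, which is the property that matters here. A single line longer than max_bytes would still go over the limit.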

Supervisor

In this category I only had daemontools for a long time but I recently stumbled on another interesting alternative: supervisor.

It does most, if not all, of the things described above and adds some nice ones:

  • an XML-RPC interface allowing full control over supervisor from another process.
  • you can register notification processes which will be notified of any state change in your applications (start, stop, restart).
  • you can use the same registration mechanism to register specialized processes that monitor the resources used by each application and act on them through the XML-RPC interface.
  • processes have more than one state: if a process crashes on start it is put aside and supervisor tries to restart it later, instead of burning the CPU like daemontools does.
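
As a taste of the XML-RPC interface, here is a minimal sketch of restarting one process from another program. The getProcessInfo, stopProcess and startProcess methods belong to supervisor's XML-RPC API; the restart helper, the URL (which depends on how the HTTP server is configured in your supervisord.conf) and the process name myapp are assumptions for illustration.

```python
from xmlrpc.client import ServerProxy

# States in which supervisor considers a process already started.
RUNNING_STATES = ("STARTING", "RUNNING", "BACKOFF")

def restart(proxy, name):
    """Stop the named process if it is running, start it, return its state."""
    info = proxy.supervisor.getProcessInfo(name)
    if info["statename"] in RUNNING_STATES:
        proxy.supervisor.stopProcess(name)
    proxy.supervisor.startProcess(name)
    return proxy.supervisor.getProcessInfo(name)["statename"]

# Usage, assuming supervisord listens on localhost:9001 (hypothetical setup):
#   proxy = ServerProxy("http://localhost:9001/RPC2")
#   print(restart(proxy, "myapp"))
```

A resource monitor like the one described above could call something like restart() when an application has been using too much CPU for too long.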

I am still only testing supervisord at the moment, but I have high hopes for it.

If you know of other supervision applications with interesting options to offer, I would gladly hear about them.