Cronline (Cron visualization)
Aug 10 2011, 9:06AM
I recently hacked up a project I'm calling 'Cronline' and I'm hoping to get some feedback on further development.
We run a large number of crontab scripts on different systems, under different user accounts. We see the mail that results from most jobs, and get reports when errors occur. We have a general sense of how long each script takes.
That said, it's a fairly opaque system. The goal of cronline is to pull all that activity into one timeline, to see clearly how long jobs are taking and which jobs end up running simultaneously.
There has also been a suggestion to exercise some control over cron scripts as a whole (e.g. a maintenance mode that would prevent all scripts from running). That's certainly possible with the system as I've laid it out, but I'm not sure it should be part of the same project.
I have a working implementation as follows:
- A simple bash script that stores the start datetime, then runs $* (all the arguments given to the script, as a command), then logs [command] '<start datetime>' '<end datetime>'. It currently logs by username (e.g. /var/local/cronline/bob.log)
- A modification to any given crontab, in which a job like mycmd arg1 arg2 is replaced with /usr/local/bin/cronline mycmd arg1 arg2
- A simple python script that runs through the cronline logs and spits out a JSON file suitable for use with Simile Timeline. The script currently takes just two arguments: the log file, and a color used to distinguish that user's crontab jobs.
- A simple directory structure with the Simile Timeline project, and a directory for the JSON output of the python script. It's served up by Apache (see screenshot below).
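To make the wrapper idea concrete, here is a minimal sketch of it as a shell function. The function name, the CRONLINE_LOG_DIR override, and the exact log format are my assumptions, not necessarily how the real script is written; I've also used "$@" rather than $* when running the command, since "$@" preserves quoted arguments intact:

```shell
# Sketch of a cronline-style wrapper (names and log format are assumptions).
cronline_wrap() {
    log_dir="${CRONLINE_LOG_DIR:-/var/local/cronline}"
    log_file="$log_dir/$(id -un).log"      # one log per username, e.g. bob.log

    start=$(date '+%Y-%m-%d %H:%M:%S')
    "$@"                                   # run the wrapped command as given
    status=$?
    end=$(date '+%Y-%m-%d %H:%M:%S')

    # Log: [command] '<start datetime>' '<end datetime>'
    printf "%s '%s' '%s'\n" "$*" "$start" "$end" >> "$log_file"
    return "$status"
}
```

Returning the wrapped command's exit status means cron's own failure mail still works unchanged, and the crontab edit stays mechanical: 0 3 * * * mycmd arg1 arg2 becomes 0 3 * * * /usr/local/bin/cronline mycmd arg1 arg2.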
Desired Features and Fixes
I should probably fix:
- Why the intermediary python processing? The bash script could just produce the JSON directly. That's true, but it would make the system harder to implement in a distributed setup (see the next bullet)
- Use standard system logging: as above, if the bash script logged via syslog, rsyslog could aggregate the logs from every host onto one system, which would then run the python script to produce the finished JSON.
- Capture exit code for each job, and color-code that into the final timeline
- Capture additional stats (GNU time, for example, can capture CPU% and average memory use)
- Timespans under 10 min should just be a single event (no end date, so that it renders as a dot)
- Logrotate-type action on the JSON output, and possibly some way to go back to an archive. The accumulated JSON data will get too large pretty quickly.
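The syslog route could look something like this, assuming the wrapper tags its lines with logger. The cronline tag, the loghost name, and the file paths here are all made up for illustration:

```
# In the wrapper, instead of appending to a local file:
#   logger -t cronline -p cron.info "mycmd arg1 arg2 '<start>' '<end>'"

# /etc/rsyslog.d/cronline.conf on each client: forward matching lines
:programname, isequal, "cronline"    @loghost:514

# /etc/rsyslog.d/cronline.conf on the aggregating host: collect into one file
:programname, isequal, "cronline"    /var/log/cronline/all.log
```

With that in place, the python script only ever has to read one file on one host, which is what makes the intermediary processing step worth keeping.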
Would this be useful to you?
Any suggestions on how to structure this?
Any other suggestions?