Boy, do I hate admin. Unfortunately, as UNIX gurus, it's our lot in life to get stuck with it, since quite often no one else knows how to tame the beast.
The number one, most important thing in admin is documentation. If it's not written down somewhere, you will forget something important. If, like me, you keep your notes on-line, make sure that they are stored on at least two different machines. There is nothing more frustrating than trying to fix the machine with all your notes on it.
After documentation comes source control. All files that you modify should be kept in some form of source control. That way, when you make a stupid mistake, you can always back it out. It should also give you a log message for every change you make.
IMHO, the best source control (for less than $1k/seat) is CVS. It's free, it runs on almost any platform, it's got client/server capabilities, and it uses RCS for versioning, so you can always look at and repair the repository by hand. It's also extensible, so you can make it do most anything.
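A minimal sketch of that habit as a small sh function: it refuses to commit without a log message, then hands the file to cvs commit. The function name save_change and the CVS override variable are made up for this example; only cvs commit -m is real CVS usage, and the file is assumed to live in an already checked-out working directory.

```shell
# Commit an edited file with a mandatory log message.
#   $1 = file to commit   $2 = log message
# $CVS lets you substitute another binary; it defaults to plain cvs.
save_change() {
    file=$1
    msg=$2
    if [ -z "$msg" ]; then
        echo "refusing to commit $file without a log message" >&2
        return 1
    fi
    ${CVS:-cvs} commit -m "$msg" "$file"
}
```

Something like save_change hosts "add the new gateway" then leaves both the change and the reason in the repository, and cvs log hosts will show them later.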
At about the same time, you will need an automated method of maintaining and changing the configuration on many machines. Two free tools for this are Cfengine and PIKT.
One problem with admin is how often you actually have to be in front of the machine to fix it. To minimize this, I like to set up ssh wherever I go, and tunnel it through whatever firewalls I find. The other advantage is that you can then tunnel back useful protocols. You should probably set up a reasonably paranoid ssh configuration.
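As an illustration of tunnelling a protocol back through the ssh connection, here is a sketch using ssh's -L port forwarding. The host names, port numbers, and the open_tunnel wrapper (with its SSH override variable) are placeholders invented for the example; -N and -L are standard OpenSSH client options.

```shell
# Forward a local port, through a gateway you can reach with ssh,
# to a service on the far side of the firewall.
#   $1 = local port          (e.g. 8080)
#   $2 = inside host:port    (e.g. inside.example.com:80)
#   $3 = user@gateway        (e.g. admin@gateway.example.com)
open_tunnel() {
    # -N: run no remote command, just hold the connection open
    # -L: listen on local port $1 and relay it to $2 as seen from the gateway
    ${SSH:-ssh} -N -L "$1:$2" "$3"
}
```

With open_tunnel 8080 inside.example.com:80 admin@gateway.example.com running, pointing a browser at http://localhost:8080/ should reach the inside web server.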
You then run into the problem of trying to keep a consistent working environment on all these boxes.
My favourite shell, right now, is bash. It's sh, with enough stolen from csh to make me happy, and a nice command line editor. One annoying thing is that, unlike csh, there is no startup file read by bash on a non-interactive invocation. Unfortunately, that is what it thinks is happening when I start it using ssh host xterm. Why not use xterm -ls to make it think it's a login shell? Well, some broken xterms don't recognize -ls. The workaround is to set the environment variable ENV (BASH_ENV in bash 2.0 and later) to $HOME/.bashrc, by adding a line to $HOME/.ssh/environment. Another minor nit is that .bash_profile, not .bashrc, is read if you are a login shell, but it's easy to source .bashrc from it.
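A sketch of the two pieces involved. The path /home/tony is only a placeholder for your real home directory: sshd reads the environment file as plain NAME=value lines without running it through a shell, so the path has to be spelled out rather than written as $HOME. This also assumes the sshd on the far end is configured to read that file at all.

```shell
# ~/.bash_profile: read by login shells only.
# Source .bashrc so login and non-login shells end up with the same setup.
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi

# ~/.ssh/environment would contain a single line such as the following
# (shown as a comment because it is not shell code; use the variable
# name your version of bash actually looks at):
#   ENV=/home/tony/.bashrc
```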
You'd think that, since a working compiler is so vital to everything, you could rely on the vendor's compiler to build your tools. Yeah, right. First thing, always get gcc, binutils, and gdb running.
Well, the problems are not with gcc itself, but with trying to compile these tools on some platforms, especially HP-UX. You need to get GNU sed before you can compile gcc, 'cause HP's is broken. Thankfully, some kind soul provided a binary. Next problem: on HP-UX 10.20, the stage2 compiler can't compile itself. It seems to work OK on other stuff like binutils, bison, and make (the latter two being required to build the former).
Big Brother is a set of shell scripts and a few simple programs that do basic network and system monitoring. The results are sent back to a central host, which builds a set of HTML pages containing the current status. It uses the refresh HTML option to keep the display current.
It has grown quite sophisticated in the latest revisions, and allows you to add external scripts that it runs for you at pre-determined times. It can be set up to e-mail or page you when things go bad.
It can also generate WML output for WAP-enabled phones or devices like the RIM BlackBerry.
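For the curious, the HTML refresh option is just a meta tag in the page head. A minimal sketch of generating such a page from sh follows; this is not Big Brother's actual code, and the status_page function and its two arguments are invented for the example.

```shell
# Write a minimal status page that asks the browser to reload itself.
#   $1 = refresh interval in seconds   $2 = status text
status_page() {
    cat <<EOF
<html>
<head>
<meta http-equiv="refresh" content="$1">
<title>Status</title>
</head>
<body>
<p>$2</p>
</body>
</html>
EOF
}
```

Calling status_page 60 "all green" > status.html would give a page that reloads itself once a minute.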
Well, sort of! What MRTG really is is a program that can grab any variable(s) from an SNMP node, and graph 'em. It will do daily, weekly, monthly, and yearly charts, with auto-refresh.
It's wonderful to show this to the people complaining about the "slow network" and tell them they're utilizing about 1% of available bandwidth!
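The arithmetic behind a figure like that is simple enough to check by hand. A sketch, assuming two readings of an interface octet counter (such as SNMP's ifInOctets) taken a known number of seconds apart, and a known link speed; the function name and the sample numbers are made up.

```shell
# Percent utilization from two octet-counter readings.
#   $1 = earlier counter value   $2 = later counter value
#   $3 = seconds between them    $4 = link speed in bits per second
# Integer arithmetic only, so the result is truncated to a whole percent.
# Counter wrap-around is ignored, and the shell is assumed to do
# 64-bit arithmetic (as bash does).
utilization() {
    echo $(( ($2 - $1) * 8 * 100 / ($3 * $4) ))
}

# 3,750,000 octets in 300 seconds on a 10 Mbit/s link
utilization 1000000 4750000 300 10000000    # prints 1
```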
It uses Perl to collect the stats from the various SNMP daemons, and a small C program, which uses the GD package, to generate the graphs, so it is fairly lightweight on the monitoring host. Of course, you are using Perl for everything, so the text is always in memory!
It can also call an external program to generate the data. All it requires is for the program to return 4 lines: an "Input" count, an "Output" count, the uptime, and a keyword. The throughput chart is generated from the NOCOL throughput monitor by a 10-line Perl script.
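A sketch of such an external program in sh. The mrtg_report function and the sample values are invented for illustration; the only thing taken from the text above is the four-line output format.

```shell
# Print the four lines an external data source is expected to return:
# "input" count, "output" count, uptime, and a keyword.
mrtg_report() {
    printf '%s\n%s\n%s\n%s\n' "$1" "$2" "$3" "$4"
}

# Example with made-up values; a real script would compute these.
mrtg_report 1200 3400 "3 days, 4:05" "gateway"
```

In a real script the third and fourth arguments could come from commands such as uptime and hostname, and the two counts from whatever you want graphed.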
NOCOL is like Big Brother, using monitoring programs and a central daemon to collect the status reports. The main difference is that there are many more, and more sophisticated, monitor programs. The downside is that, depending on your architecture, you may have problems getting them all to compile and/or work.
It is easy to develop new monitoring agents for it; both a C and a Perl library are included to interact with the monitor daemon. The following agents are included
The latest beta version also includes a web-based status screen, along with the original curses-based one.
Tony Lill, Tony.Lill@AJLC.Waterloo.ON.CA
President, A. J. Lill Consultants (519) 241 2461
539 Grand Valley Dr., Cambridge, Ont. fax/data (519) 650 3571
"Welcome to All Things UNIX, where if it's not UNIX, it's CRAP!"