Adding an application to observium | A. J. Lill Consultants

Submitted by ajlill on Thu, 09/01/2016 - 19:38

Now that I have my CEPH cluster up and running, I obviously want to monitor it. I'm currently using both Nagios and Observium. I've tried a few times to add graphing to Nagios, to no avail, so I'm going to try adding it to Observium instead.

Observium is supposedly designed to be modular and extensible. It has a unix-agent script that can be used to pull arbitrary data to be processed and displayed. So I started with the official documentation on adding application monitoring. If I were being generous, I'd call it sparse; if not, I'd use another word starting with s. At least it pointed me to the directories containing the files for other apps. Unfortunately, whoever wrote these files seems to have been allergic to comments. Plus, there is a fair bit of hidden chicanery, as well as multiple API changes, which make the current code base pretty near useless as a source of examples. These notes are as accurate as I can make them from my reverse engineering of CE 10/16.

First, create an extension module for the unix agent. The script needs to output <<<identifier>>> on a line of its own at the start. This identifier should be of the form app-name-instance, with the -instance part being optional. It will be split at the dashes, so make sure that these are the only dashes in the identifier. If you look at the built-in unix agent scripts, you will notice that some (in fact, all) of them do not have the app- part. That's because that level is added by custom kludges in includes/polling/unix-agent.inc.php.
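The header rule above can be sketched as a small shell helper. This is only an illustration: agent_header is a made-up function name, not part of the Observium agent, and the dash check simply encodes the "only dashes are the separators" rule described above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Observium): print the section header
# that the unix-agent poller splits at the dashes.
# Usage: agent_header NAME [INSTANCE]
agent_header() {
    local name="$1" instance="${2:-}"
    local part

    # Neither part may contain a dash, or the split would go wrong.
    for part in "$name" "$instance"; do
        case "$part" in
            *-*) echo "agent_header: '$part' must not contain a dash" >&2
                 return 1 ;;
        esac
    done

    if [ -z "$name" ]; then
        echo "agent_header: application name is required" >&2
        return 1
    fi

    # The instance is optional; leave it out when it is empty.
    if [ -n "$instance" ]; then
        printf '<<<app-%s-%s>>>\n' "$name" "$instance"
    else
        printf '<<<app-%s>>>\n' "$name"
    fi
}

# Example: header for an application without instances.
agent_header ceph
```

Calling agent_header with a name such as my-app fails with an error instead of emitting a header that would be split into the wrong number of levels.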

For CEPH, I just used a shell script to call a few of the more useful CEPH commands, and told ceph to output the data as JSON. Then I just had to echo a few bits of JSON wrapper around the results. I used ceph df, ceph osd pool stats and ceph pg stat. In the polling code I just called a PHP function to parse the JSON and then extracted the values I needed. Basically, between the unix agent script and the poller, you need to assemble the key:value pairs you want to store in the RRD file. What processing you do where depends on whether you prefer coding in shell or PHP.
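For illustration, an agent script along these lines might look like the sketch below. It is not the original script: the wrapper key names (df, pool_stats, pg_stat), the CEPH_BIN override, and the function names are all made up for the example, and I'm assuming the placement-group summary command is spelled ceph pg stat, as it is in the Ceph CLI I know.

```shell
#!/bin/bash
# Sketch of a unix-agent extension script for Ceph (not the original).
# CEPH_BIN is a hypothetical override; it defaults to the ceph CLI on PATH.
CEPH_BIN="${CEPH_BIN:-ceph}"

# Run one ceph subcommand and ask for JSON output.
ceph_json() {
    "$CEPH_BIN" "$@" --format json 2>/dev/null
}

# Emit the agent section: the header line, then one JSON object wrapping
# the output of the three commands under made-up key names.
emit_ceph_section() {
    local df pool_stats pg_stat
    df=$(ceph_json df) || return 1
    pool_stats=$(ceph_json osd pool stats) || return 1
    pg_stat=$(ceph_json pg stat) || return 1

    echo '<<<app-ceph>>>'
    printf '{"df":%s,"pool_stats":%s,"pg_stat":%s}\n' \
        "$df" "$pool_stats" "$pg_stat"
}

# Only report when the ceph CLI is actually installed on this host.
if command -v "$CEPH_BIN" >/dev/null 2>&1; then
    emit_ceph_section
fi
```

Wrapping everything in a single JSON object keeps the shell side trivial and leaves all the value extraction to one json_decode call in the poller. Doing more of the work in shell instead would mean parsing JSON there, which is the less pleasant option.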

Second, add a polling script in includes/polling/applications/<app>.inc.php. This script will be called with a multi-dimensional array called $agent_data. The levels will be named after the dash-separated parts of the identifier, so in this case, I was using $agent_data['app']['ceph']. The first thing you need to do is call discover_app(). This takes the device (host) id, the application name, and an optional instance, and returns the index of a row in the application database, creating the row if necessary. This index is used in a number of places.

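    if (!empty($agent_data['app']['ceph']))
    {
      // Look up (or create) the application row for this device.
      $app_id = discover_app($device, 'ceph');

      // Path of the RRD file that will hold the cluster-wide values.
      $rrd_filename = get_rrd_path($device, "app-ceph-global.rrd");

      // Decode the agent's JSON payload into an associative array.
      $ceph = json_decode($agent_data['app']['ceph'], true);

      // ... extract values from $ceph and update the RRD here ...
    }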

You now need to get this data into an RRD file. When I first wrote this I used rrdtool_create and rrdtool_update, since that's what the poller I based it on used. That's a pretty low-level interface and, apparently, deprecated. The recommended API is rrdtool_update_ng, which deals with both creating and updating the file. This function takes three required arguments (the $device, the rrd type, and a data array) plus an optional instance and rrd option. The $device variable is made available by the poller. The rrd type is a string if it's a built-in application, and an associative array otherwise. Don't bother grepping for an example; it's buried in includes/definitions/definitions.dat, which is a compressed memory blob. Here's an example:

    $rrd_type = array (
      "file"=>"filename.rrd",
      "ds"=>array (
        "variable_name"=> array (
          "type"=>"GAUGE",
          "min"=>0,
          "max"=>38400,
        )
      )
    );

The ds array is indexed by the rrd variable name. If you are using instances, rrdtool will replace the string %index% in the filename with the instance. Finally, the data array is an associative array where each key is a variable name and each value is the data for that variable. You will also want to call update_application($app_id, $data), which stores the data in the database for some reason.

Third, add a file in html/pages/device/apps/<app>.inc.php to define what is displayed. You apparently set $app_graphs['default'] to an associative array mapping graph name to title. Each name should be of the form app_string (for example, ceph_usage), and each one refers to a <name>.inc.php file in html/includes/graphs/application that defines how the graph will be displayed.

    $app_graphs['default'] = array('ceph_usage'   => '% Usage',
                                   'ceph_sizes'   => 'Bytes Used',
                                   'ceph_io_bits' => 'Client Traffic',
                                   'ceph_io_ops'  => 'Client IO Ops');

Fourth, create the <name>.inc.php files in html/includes/graphs/application for each graph you defined above. Find an existing graph that does what you want and modify the code for your use; that will mostly just be setting the RRD file and modifying the data bits that will be extracted from the RRD. There is no documentation for any of this stuff, and I don't feel like creating it.

Finally, add a row like

$config['app']['ceph']['top'] = array('usage', 'sizes', 'io_bits', 'io_ops');

in ./includes/definitions/apps.inc.php to define the top four graphs to show on the summary page available from the top-level apps menu.

With that, I finally got some data graphed. This is only some global data. I would like to be able to graph data for each pool, but that's for another day.