This page is outdated as of Feb 26 2015, as state sensors have been rewritten. I keep an updated document on:
Observium straight out of the SVN repository (if you bought the subscription) doesn’t come with any alert checkers, which is unfortunate, as you need to figure out how the alerting system works by trial and error. The goal of this blog post is to give some examples of generic alert checkers and provide some more explanation of Metrics & Attributes and some of the values that go with them. This document is of course not complete and can always be improved. Please give me feedback to improve it.
Observium has a very powerful way of using entity types & check conditions to do alerting. But you do need to know how this is implemented.
There is some documentation on the Observium site itself, which is useful to read:
Creating an alert checker
Let’s go through the steps that are involved in actually creating an alert checker in Observium.
First of all, when you create an alert, you’ll need to pick the ‘entity’ type you are building the alert for. An entity type is nothing more than a “thing” for which you would like to see alerts.
These are the ones that are available as of 12/12/2014:
- BGP Peer
- Netscaler vServer
- Netscaler Service
They pretty much speak for themselves: if you want alerts on things that go on with ports, pick Port; if you want something that has to do with a sensor, pick Sensor. Device is a very generic one and only gives you status information: whether the device is up or down, its uptime, and the response time for ping/SNMP. The entity type Device has nothing to do with the ports or sensors on the device itself; for alerting on those, pick Port or Sensor.
Alert Checker details
Once you have picked the entity type, there are a couple more things that need to be filled in, but these are simple: pick a name for the alert, and pick a message you want to be included when an alert is sent out.
Use Alert Delay to set the number of poller runs that the condition of your alert checker should persist before it actually starts alerting. This is useful when, for example, you’re creating a check for processor usage but don’t want to be alerted on every CPU spike. If you set a delay of, say, 2, it’ll take 2 poller runs before it actually alerts (provided the condition you are checking for hasn’t cleared in the meantime, of course).
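To make the delay a bit more concrete, here is a rough Python sketch of how I picture it. This is not Observium's code: the `DelayedAlert` class and its fields are made up for illustration, and whether the alert fires on the 2nd or the 3rd matching run for a delay of 2 is an assumption on my part (the sketch fires on the 2nd).

```python
from dataclasses import dataclass


@dataclass
class DelayedAlert:
    """Hypothetical helper: alert only once a condition has persisted long enough."""
    delay: int            # value of the Alert Delay field, in poller runs
    matched_runs: int = 0  # consecutive poller runs in which the condition matched

    def poll(self, condition_matched: bool) -> bool:
        """Record one poller run and return True when an alert should be active."""
        if condition_matched:
            self.matched_runs += 1
        else:
            self.matched_runs = 0  # condition cleared, so start counting again
        # Assumption: with delay 0 or 1 the first matching run already alerts.
        return condition_matched and self.matched_runs >= max(self.delay, 1)


# Processor usage per poller run, checked against "above 80" with a delay of 2.
alert = DelayedAlert(delay=2)
for usage in [95, 60, 90, 92, 91]:
    print(usage, alert.poll(usage > 80))
```

The single spike of 95 never alerts because the next run is back at 60; only the sustained load at the end does.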
The Send Recovery button is self-explanatory, and Severity is currently not in use.
Then we come to the Checker Conditions. This is where you actually implement the check for a specific entity.
It’s important to know what Metrics & Attributes are; see the overview below for a complete list of Metrics & Attributes.
When filling in the fields for Checker Conditions, you use the Metrics mentioned in this page.
These need to be single-line entries. You can put in as many as you want, but usually you have one line to check a single condition, or two, for example to check an upper and a lower limit. Use the boolean to choose whether ANY or ALL of these conditions must match.
A single line consists of three values:
- the actual metric
- a “test” (le, ge, lt, gt, ne, match and notmatch)
- a value
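If it helps, this is roughly how I think about a condition line and the ANY/ALL switch, written as a small Python sketch. It is not how Observium implements it. The function names are hypothetical, treating the spelled-out tests from the example table further down (`equals`, `notequals`, `greater`, `less`) as aliases is an assumption, and so is the case-insensitive `*`/`?` wildcard behaviour of `match`.

```python
import fnmatch
import operator

# Numeric tests: the short forms from the list above, plus spelled-out aliases (assumed).
NUMERIC_TESTS = {
    "lt": operator.lt, "less": operator.lt,
    "le": operator.le,
    "gt": operator.gt, "greater": operator.gt,
    "ge": operator.ge,
}


def glob_match(actual, pattern):
    """Case-insensitive wildcard match; the wildcard semantics are an assumption."""
    return fnmatch.fnmatchcase(str(actual).lower(), pattern.lower())


def eval_condition(line, metrics):
    """Evaluate one 'metric test value' line against a dict of metric values."""
    metric, test, value = line.split(None, 2)
    actual = metrics[metric]
    if test in NUMERIC_TESTS:
        return NUMERIC_TESTS[test](float(actual), float(value))
    if test == "equals":
        return str(actual) == value
    if test in ("ne", "notequals"):
        return str(actual) != value
    if test == "match":
        return glob_match(actual, value)
    if test == "notmatch":
        return not glob_match(actual, value)
    raise ValueError(f"unknown test: {test}")


def check(lines, metrics, boolean="ANY"):
    """Combine several condition lines with the ANY/ALL switch."""
    results = [eval_condition(line, metrics) for line in lines]
    return any(results) if boolean == "ANY" else all(results)


port = {"ifInOctets_perc": 91, "ifOutOctets_perc": 12}
lines = ["ifInOctets_perc ge 85", "ifOutOctets_perc ge 85"]
print(check(lines, port, "ANY"))  # True: inbound traffic is over the limit
print(check(lines, port, "ALL"))  # False: outbound traffic is not
```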
In the association input fields you create the first association rule, in other words, which subset of the entity type you selected needs alerting based on the conditions specified in the previous pane. When initially creating an alert checker, you can add only 1 association rule. Once the checker has been added, you can add more association rules to it later on.
These association rules are made from a “device association” and an “entity association”. In the first input field you do your device matching, based on the attributes for devices. In the second input field you do your entity matching, using the attributes for the entity type you want to associate it with (these can of course be different from the metric you’re checking for).
This works in much the same way as the Checker Conditions. It uses the same line format (metric, test, value), with some exceptions:
- instead of using metrics, you’ll be using attributes
- you can’t use a device attribute twice in the same association rule, so for example multiple “hostname match bla” statements within the same association rule won’t work
- for a single device association line, you can have multiple entity association lines
That last exception allows for more specific filtering. For example, you might want to match all sensors whose class (sensor_class) is “state”, but when that nets you too many results, you can add a match on the description (sensor_descr). Or you might want to match all ports of type (ifType) ethernetCsmacd, but only the ones with a specific description (ifAlias).
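Here is a rough Python sketch of how an association rule narrows things down. Everything in it is illustrative: the function names, the example hostname and the `*uplink*` pattern are made up, only two tests are implemented, and I am assuming that all lines within a rule must match while any one rule is enough to associate an entity.

```python
import fnmatch


def attr_test(actual, test, value):
    """Apply one association test to an attribute (small subset, for illustration)."""
    text = str(actual)
    if test == "equals":
        return text == value
    if test == "match":
        # Case-insensitive wildcard match; the exact semantics are an assumption.
        return fnmatch.fnmatchcase(text.lower(), value.lower())
    raise ValueError(f"unsupported test: {test}")


def lines_match(lines, attributes):
    """True when every 'attribute test value' line matches; a bare '*' matches anything."""
    for line in lines:
        if line.strip() == "*":
            continue
        attribute, test, value = line.split(None, 2)
        if not attr_test(attributes.get(attribute, ""), test, value):
            return False
    return True


def is_associated(rules, device, entity):
    """An entity is covered when at least one rule matches both the device and the entity."""
    return any(
        lines_match(rule["device"], device) and lines_match(rule["entity"], entity)
        for rule in rules
    )


# One device association line, two entity association lines.
rules = [{
    "device": ["os equals junos"],
    "entity": ["ifType equals ethernetCsmacd", "ifAlias match *uplink*"],
}]
device = {"hostname": "edge1.example.net", "os": "junos"}
uplink = {"ifType": "ethernetCsmacd", "ifAlias": "Uplink to core"}
access = {"ifType": "ethernetCsmacd", "ifAlias": "Desk 12"}

print(is_associated(rules, device, uplink))  # True
print(is_associated(rules, device, access))  # False: ifAlias does not match
```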
If you scrolled down here to just copy/paste some alert-checkers, perfectly fine, but don’t complain if they don’t work, PLEASE read how these work above.
The following is a set of very useful alert checkers. Where a row lists two check conditions separated by a semicolon, enter each one on its own line:
|Alert||Entity type||Check Conditions||Check Conditions boolean||Device match||Entity match|
|Device down||Device||device_status equals 0||ANY||*||*|
|Processor usage is above 80%||Processor||processor_usage greater 80||ALL||*||processor_descr match processor|
|Memory usage is above 70%||Memory||mempool_perc greater 70||ALL||*||*|
|State sensor is in ALERT state!||Sensor||sensor_event equals alert||ANY||*||sensor_class equals state|
|Fanspeed is above or under threshold||Sensor||sensor_value greater @sensor_limit; sensor_value less @sensor_limit_low||ANY||*||sensor_class equals fanspeed|
|Temperature is higher than 50 degrees||Sensor||sensor_value gt 50||ANY||*||sensor_class equals temperature|
|Traffic exceeds 85%||Port||ifInOctets_perc ge 85; ifOutOctets_perc ge 85||ANY||*||ifType equals ethernetCsmacd|
|BGP Session down||BGP Peer||bgpPeerState notequals established||ANY||*||bgpPeerRemoteAs equals 41552|
|Storage exceeds 85% of disk capacity||Storage||storage_perc ge 85||ANY||*||storage_type equals hrStorageFixedDisk|
|Port has encountered errors or discards||Port||ifInErrors_rate gt 1; ifOutErrors_rate gt 1||ANY||*||ifType equals ethernetCsmacd|
|Port is enabled, but operationally down||Port||ifAdminStatus equals up; ifOperStatus notequals up||ALL||*||ifType equals ethernetCsmacd|
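One thing in the table that deserves a note is the `@` in the fanspeed row. I read `@sensor_limit` as "the sensor_limit field of the same sensor" rather than a literal value. That reading is an assumption, and the Python sketch below (hypothetical function names, made-up sensor values) only illustrates that idea, not Observium's actual code.

```python
def resolve(value, entity):
    """Return the entity field named after '@', or the literal value, as a float."""
    if value.startswith("@"):
        return float(entity[value[1:]])
    return float(value)


def fanspeed_alert(sensor):
    """ANY of the two fanspeed conditions from the table row."""
    conditions = [
        ("sensor_value", "greater", "@sensor_limit"),
        ("sensor_value", "less", "@sensor_limit_low"),
    ]
    results = []
    for metric, test, value in conditions:
        actual = float(sensor[metric])
        limit = resolve(value, sensor)  # per-sensor limit instead of a fixed number
        results.append(actual > limit if test == "greater" else actual < limit)
    return any(results)


fan = {"sensor_value": 900, "sensor_limit": 6000, "sensor_limit_low": 1200}
print(fanspeed_alert(fan))  # True: 900 is below the low limit of 1200
```

The nice part of this style is that one checker covers every fan, each with its own limits.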
Per-entity overview of Attributes, Metrics and their values (if any)
|device_status||0 = down, 1 = up|
|device_status_type||reason for down: ‘snmp’ or ‘ping’|
|device_ping||response in ms|
|device_snmp||response in ms|
|hostname||Self explanatory, this is the hostname for the device|
|os||cisco, asa, junos, linux, printer, generic, etc. For an up-to-date list see /opt/observium/includes/definitions/os.inc.php|
|sysName||Derived through SNMP|
|sysDescr||Derived through SNMP|
|sysContact||Derived through SNMP|
|hardware||Derived through SNMP|
|serial||Derived through SNMP|
|ifInOctets_rate & ifOutOctets_rate||number|
|ifInOctets_perc & ifOutOctets_perc||0-100 percentage|
|ifInUcastPkts_rate & ifOutUcastPkts_rate||number|
|ifInErrors_rate & ifOutErrors_rate||number|
|rx_ave_pktsize & tx_ave_pktsize|
|ifSpeed||interface speed derived through SNMP in mbit|
|ifSpeed||interface speed in a mbit number|
|ifAlias||the interface description|
|ifDescr||Location of the interface, (blade, slot, etc)|
|ifType||name of interface as described by IANA, see https://www.iana.org/assignments/ianaiftype-mib/ianaiftype-mib|
|ifPhyAddress||MAC address of the interface|
|sensor_event||up, warning, alert, down|
|sensor_class||voltage, current, power, frequency, humidity, fanspeed, temperature, dbm, state|
|poller_type||possible types: snmp, agent, ipmi|
|vsvr_name||this matches vsvr_fullname, except when that is longer than 32 chars, in which case it becomes a random string|
|svc_name||this matches vsvr_fullname, except when that is longer than 32 chars, in which case it becomes a random string|
Observium is an amazing quasi-opensource solution used to monitor up/down and performance of your networks. It allows you to monitor things such as interface usage, CPU, memory, disk, temperature, BGP, SLA etc.
To upgrade your existing Observium installation, follow the steps below.
Connect to your Observium server using either ssh or your hypervisor’s ‘console’ feature. I recommend ssh as it will be easier to copy/paste.
First, you will need to move to the parent directory of your Observium installation (/opt in this guide).
cd /opt
Now you will rename the observium directory to observium-old. (You can choose any name you wish.)
mv observium observium-old
Next you will need to download the latest version. The great thing about Observium is that the link below always points to the ‘latest’ Observium version. There is no need to figure out the actual version number and add it to the link.
wget -O observium-community-latest.tar.gz http://www.observium.org/observium-community-latest.tar.gz
NOTE: If you do not have ‘wget’ installed on your server, you can easily install it by entering
yum install wget
Next you will unpack the file you downloaded using wget in the previous step.
tar zxvf observium-community-latest.tar.gz
This will untar the file you downloaded into a directory named ‘observium’ in the /opt parent directory.
Now that the file has been extracted, you will need to restore the RRD, log and config.php files from your original installation.
mv /opt/observium-old/rrd observium/
mv /opt/observium-old/*log* observium/
PHP Config file
mv /opt/observium-old/config.php observium/
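Now that your files have been restored, you will need to update the database schema. In the Observium upgrade instructions this is done by running discovery.php with the -u option:
/opt/observium/discovery.php -u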
Observium recommends that if you have not updated in the last 12 months, you should force a rediscovery of all devices. To do this, from the command line enter:
/opt/observium/discovery.php -h all
Once this is complete, you can delete the temporary backup (observium-old) you created earlier.
rm -rf observium-old