Saturday 5 October 2019

Monitoring With MRTG

After much more of a battle than I was expecting, I finally have MRTG running on a Pi and generating some useful graphs about my home network and its boxes.  If you're not familiar, MRTG is the Multi Router Traffic Grapher (although it almost stood for Memcard Reformatted Totally Gone).  It gets its information via SNMP and then draws pretty graphs with the data.

This was my first hurdle: I was drawing on experience gained 19-20 years ago, when I did exactly what I'm trying to accomplish here but for a tier 1 ISP, and instead of a small home network I was monitoring their core backbone transit/peering network(s).  So, despite the fact I was trying to remember stuff from before the last ice age, I assumed this would be easier than finding a pub in Ireland.  I was wrong.

My Cisco Catalyst 3750 was super easy to set up and get graphing.  First, enable and configure SNMP on the switch, and by configure I mean just give it a couple of SNMP community names.  That's basically it - SNMP configured and ready to go!


MRTG is just as easy.  Point the cfgmaker script at the switch with the relevant SNMP community details, and boom!  It spits out an MRTG configuration file with the switch's details and an entry for each physical port on the switch, each of which contains instructions for MRTG on how to make a pretty graph.
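
For anyone wanting to script that step across several devices, here is a minimal Python sketch that just builds the cfgmaker command line.  The --global and --output options and the community@host form are standard cfgmaker usage, but the switch IP, WorkDir and output path below are made up for illustration and are not my real settings.

```python
import shlex


def cfgmaker_command(community, host, workdir, output):
    """Build the argument list for a basic cfgmaker run."""
    return [
        "cfgmaker",
        "--global", f"WorkDir: {workdir}",  # where MRTG should write its graphs
        "--output", output,                 # config file that cfgmaker generates
        f"{community}@{host}",              # community@router target
    ]


# Hypothetical values: adjust the community, address and paths to suit.
cmd = cfgmaker_command("public", "192.168.0.2", "/var/www/mrtg", "/etc/mrtg/switch.cfg")
print(shlex.join(cmd))
```

The list can be handed straight to subprocess.run(cmd, check=True) on a box that has MRTG installed; printing it first makes it easy to eyeball before anything gets overwritten.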

All well and good so far.  And to be fair, when I did all this back in 4500BC the ISP had all Cisco kit and I was only monitoring switches and routers, not individual hosts.  I also never really learned SNMP - I played around with a MIB tree but never mastered it, and once the bosses had their pretty graphs I was re-tasked with something else.  I've skilfully avoided any heavy SNMP ever since.

I say that my switch went perfectly, but that's not strictly true: I still haven't added the switch's system stats to MRTG (CPU usage %, CPU temp, free mem, fan speeds etc.), just the interface statistics.  I do at least now know why I can't currently get these stats, so that's something!

With the Pis, there's no easy way to just add what you want graphed.  I managed to get mem usage, CPU usage and network interface statistics directly from SNMP, but of course this means setting up and configuring an SNMP daemon (snmpd) on each Raspberry Pi - something that unfortunately is a tad less trivial than on the Cisco.

First, I got into all sorts of muddles with communities and SNMP versions and community permissions and users and access lists and crap and crap and crap!  Eventually I did enough RTFM'ing* to untie my knotted-up permissions and communities and get a working SNMP server, so I just copied its snmpd.conf to all the Pis (obviously changing things like the IP) and hey presto!  I can now "walk" the SNMP MIBs of a Pi, find out which OIDs deal with CPU, mem etc. and construct an MRTG template to turn those numbers into pictures!
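
To give a feel for what a walk returns, here is a rough Python sketch that parses the numeric output of something like "snmpwalk -v2c -c public -On 192.168.0.1 .1.3.6.1.2.1.25.2.3.1" (the hrStorage table).  The sample lines and their values are invented for illustration, and the parser only copes with simple "OID = TYPE: value" lines.

```python
def parse_snmpwalk(text):
    """Map numeric OID -> (type, value) from `snmpwalk -On` style output."""
    results = {}
    for line in text.splitlines():
        if " = " not in line:
            continue  # skip blank lines and anything that isn't "OID = ..."
        oid, _, rest = line.partition(" = ")
        kind, sep, value = rest.partition(": ")
        if not sep:
            continue  # no "TYPE: value" part, so nothing useful to store
        results[oid.strip()] = (kind, value.strip())
    return results


# Invented sample output: the numbers are made up, not taken from a real Pi.
SAMPLE = """\
.1.3.6.1.2.1.25.2.3.1.3.1 = STRING: Physical memory
.1.3.6.1.2.1.25.2.3.1.3.3 = STRING: Virtual memory
.1.3.6.1.2.1.25.2.3.1.6.1 = INTEGER: 412345
.1.3.6.1.2.1.25.2.3.1.6.3 = INTEGER: 498765
"""

walk = parse_snmpwalk(SAMPLE)
BASE = ".1.3.6.1.2.1.25.2.3.1"
for index in ("1", "3"):
    descr = walk[f"{BASE}.3.{index}"][1]      # column 3: hrStorageDescr
    used = int(walk[f"{BASE}.6.{index}"][1])  # column 6: hrStorageUsed
    print(f"{descr}: {used} allocation units used")
```

Matching the description column against the "used" column like this is how you work out which index to point MRTG at.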

An MRTG config block looks like this:

#---------------------------------------------------------------
#    Raspberry Pi - Memory
#---------------------------------------------------------------

Target[skynet-router-mem]: .1.3.6.1.2.1.25.2.3.1.6.1&.1.3.6.1.2.1.25.2.3.1.6.3:public@192.168.0.1
MaxBytes[skynet-router-mem]: 100524288
Options[skynet-router-mem]: integer, gauge, nopercent, growright, unknaszero, noo
YLegend[skynet-router-mem]: Mem - 1K pages
Factor[skynet-router-mem]: 1024
ShortLegend[skynet-router-mem]: B
LegendI[skynet-router-mem]: Physical  
LegendO[skynet-router-mem]: Virtual   
Legend1[skynet-router-mem]: Physical
Legend2[skynet-router-mem]: Virtual Memory
Title[skynet-router-mem]: Skynet Management Memory Usage
PageTop[skynet-router-mem]: <H1>skynet-router - Memory Usage</H1>

#---------------------------------------------------------------

In this example, the first OID (.1.3.6.1.2.1.25.2.3.1.6.1) is the amount of used physical memory, which MRTG plots in the slot it normally uses for the incoming bandwidth rate (hence LegendI[skynet-router-mem]).  The second OID (.1.3.6.1.2.1.25.2.3.1.6.3) is the amount of used virtual memory, which MRTG plots as the outgoing rate by default, although the noo option in the Options line tells it not to draw that second value.  I actually copied this config and I think some of the scale numbers are wrong, but it displays the correct values on the graph, so that's the main thing!
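
To make the layout of that Target line a bit more obvious, here is a small Python sketch that pulls it apart.  It only handles the basic "oid&oid:community@host" form used above (MRTG also accepts extra colon-separated fields after the host, which this ignores), and parse_target is just a name I've made up for the example.

```python
def parse_target(target):
    """Split 'oid_in&oid_out:community@host' into its four parts."""
    oids, _, rest = target.partition(":")      # OIDs come before the first colon
    oid_in, _, oid_out = oids.partition("&")   # '&' separates the two values
    community, _, host = rest.partition("@")   # community string, then the host
    return {"in": oid_in, "out": oid_out, "community": community, "host": host}


parts = parse_target(
    ".1.3.6.1.2.1.25.2.3.1.6.1&.1.3.6.1.2.1.25.2.3.1.6.3:public@192.168.0.1"
)
for name, value in parts.items():
    print(f"{name:>9}: {value}")
```

The "in" OID is what ends up under LegendI and the "out" OID under LegendO, which is why a memory graph ends up borrowing the traffic graph's vocabulary.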


----


MORE TO FOLLOW

====


----
*RTFM - Read The Fucking Manual

1 comment:

  1. TODO:
    Fix uptime
    Fix cpu on mgmt & osmc
    Add pi voltages
    Add disk stats
    Add sys info for 3750
