This article was originally published on March 3, 2015, by Charlie von Metzradt, co-founder of Hosted Graphite, for the Hosted Graphite blog. Since then, Hosted Graphite has become MetricFire but our goal has stayed the same: Monitoring should be accessible. For more information and for updates on new features, book a time with our team!
One of the really useful things about Graphite (maybe the main one, if you had to pick a single standout that has led to its wide adoption) is that you can just fire a new metric at the collector, and Graphite will happily accept it and give you useful graphs. Add some code to your app, or configure a plugin for collectd or Diamond, restart your app, and your new metrics quickly appear like magic!
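For a sense of how little is involved, here is a minimal sketch of sending one data point over Carbon's plaintext protocol, which takes one `name value timestamp` line per data point. The host name, the metric name, and both helper functions are illustrative assumptions. Port 2003 is the usual default for the plaintext listener on a stock Carbon install, and yours may differ.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """Build one plaintext-protocol line: '<name> <value> <timestamp>\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{name} {value} {int(timestamp)}\n"


def send_metric(name, value, host="graphite.example.com", port=2003):
    """Open a TCP connection, send a single data point, and close it.

    The host is a placeholder; point it at your own collector.
    """
    line = format_metric(name, value)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Example call (hypothetical host and metric name), left commented out
# so that the sketch does not touch the network when it is imported:
# send_metric("myapp.signups.count", 1)
```

Nothing has to be declared ahead of time in this sketch: the first line sent for a name that the collector has not seen before is what creates the metric.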
There are two possible issues with this:
- Lack of basic controls - if someone commits a chunk of unreviewed code that puts a rapidly changing or random element in a metric name, you’re going to end up with a whole lot of junk in your system. Add a username to your metric name in a system with a few million unique users. Whoops!
- Metric name collisions - if you have more than one server sending any given metric name at the same time, it’s like the movie Highlander (with fewer Freddie Mercury/lightning effects): THERE CAN BE ONLY ONE!
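To make the first issue concrete, here is a small sketch of how a per-user element in a metric name multiplies the number of distinct metrics. The metric names, the user IDs, and the `metric_names` helper are made up for illustration.

```python
def metric_names(user_ids, per_user):
    """Return the set of distinct metric names produced by a batch of logins."""
    names = set()
    for user_id in user_ids:
        if per_user:
            # Anti-pattern: every user gets a brand-new metric name.
            names.add(f"myapp.logins.{user_id}.count")
        else:
            # One shared metric, no matter who logged in.
            names.add("myapp.logins.count")
    return names


user_ids = [f"user{n}" for n in range(100_000)]
print(len(metric_names(user_ids, per_user=True)))   # 100000
print(len(metric_names(user_ids, per_user=False)))  # 1
```

Every one of those names is a separate metric that the backend has to create and store, which is where the junk comes from.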
As Jason Dixon summed it up in a recent post on the Graphite-dev mailing list:
It’s assumed that you avoid namespace collisions in each backend cluster. Otherwise, whichever backend returns the query first, “wins”.
Let’s say someone accidentally uses the same metric name in a few different places. Picture trying to get useful information from two completely separate sets of data that have been interpolated side by side rather than collected together and processed.
Well, that sucks. When building out the backend for Hosted Graphite, we spent a lot of time trying to figure out the best and worst parts of Graphite so we could focus on the good and eliminate or mitigate the bad. In the usual love-hate relationship that people have with Graphite, the fast metric creation is great, and collisions are just a sort of annoying behavior.
In our setup, we have control over where we collect from, and have removed any issues with metric collisions. We’re greedy! If you send the same metric name from multiple servers, we collect all the data. By default, we display the average, but we also collect all the data points and give you a true sum, minimum, and maximum, as well as a few more exotic views, like a random sampling of data points that can be used for percentiles.
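As a rough illustration of the idea (not Hosted Graphite's actual implementation), the sketch below groups data points that share a metric name and timestamp, then derives several views from the full set instead of keeping a single value. The `aggregate` function, the input shape, and the sample values are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def aggregate(points):
    """Combine (metric, timestamp, value) points from any number of senders."""
    grouped = defaultdict(list)
    for metric, timestamp, value in points:
        # Keep every value; nothing is overwritten when names collide.
        grouped[(metric, timestamp)].append(value)
    return {
        key: {
            "avg": mean(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for key, values in grouped.items()
    }


points = [
    ("myapp.requests", 1425340800, 120),  # sent by server A
    ("myapp.requests", 1425340800, 80),   # sent by server B
    ("myapp.requests", 1425340800, 100),  # sent by server C
]
result = aggregate(points)[("myapp.requests", 1425340800)]
print(result["avg"], result["sum"], result["min"], result["max"])
```

With a "first one wins" approach, only one of those three values would survive. Here the sum gives a count across all three servers without any pre-aggregation on the sending side.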
Not suffering from metric namespace collisions is particularly useful if you don’t want to pre-aggregate your data somewhere yourself, or you’re looking to count something quickly across servers. No weird interpolations, just data that does what it’s supposed to.