Table of Contents
- Introduction: Graphite monitoring
- Key Takeaways
- Prerequisites
- System Update
- Graphite Stack Installation
- Install and Configure Database
- Graphite Web Configuration
- Graphite Schema
- Static Content
- Carbon Configuration
- Nginx
- StatsD
- Supervisord
- Exploring StatsD and Graphite Interaction
- Dashboard Configuration
- Conclusion
Introduction: Graphite monitoring
In this post, we will walk through installing and configuring Graphite on an Ubuntu machine.
What is Graphite Monitoring?
In short, Graphite collects, stores, and visualizes time-series data in real time. It gives operations teams instrumentation with visibility into the behavior of the system at varying levels of granularity, which supports error detection, resolution, and continuous improvement. Graphite is composed of the following components:
- Carbon: receives metrics over the network and writes to disk using a storage backend.
- Whisper: file-based time-series database.
- Graphite-web: a Django web app that renders graphs and dashboards.
Sign up for the MetricFire free trial to set up Graphite and build your Grafana dashboard. You can also book a demo and talk to the MetricFire team on how you can best set up your monitoring stack.
Key Takeaways
- Graphite is a tool that stores, collects, and visualizes time-series data in real-time. It offers granular visibility into system behavior, aiding in error detection, resolution, and continuous improvement.
- You can customize the Graphite web app's user interface, including graph dimensions and themes, to suit your preferences.
Prerequisites
Ubuntu 20.04 with at least 2GB of RAM.
System Update
sudo apt update
sudo apt upgrade -y
Graphite Stack Installation
First, install the build dependencies required by the various Graphite components from the command line:
sudo apt -y install python3-dev python3-pip libcairo2-dev libffi-dev build-essential
Set PYTHONPATH to add the Graphite library and web app directories to Python's default module search path.
export PYTHONPATH="/opt/graphite/lib/:/opt/graphite/webapp/"
Install the data storage engine.
sudo -H pip3 install --no-binary=:all: https://github.com/graphite-project/whisper/tarball/master
Install the Carbon data-caching daemon.
sudo -H pip3 install --no-binary=:all: https://github.com/graphite-project/carbon/tarball/master
Install the web-based visualization frontend.
sudo -H pip3 install --no-binary=:all: https://github.com/graphite-project/graphite-web/tarball/master
Install and Configure Database
By default, Graphite uses SQLite to store Django data such as dashboards, user preferences, and saved graphs; metric data is not stored there. Here, however, we will demonstrate PostgreSQL integration. Install the software Graphite needs to communicate with PostgreSQL:
sudo apt-get install postgresql libpq-dev python3-psycopg2
Next, create a database and a user with a password. Replace $SECURE_PASS with a strong, randomly generated password.
sudo -u postgres psql
CREATE USER metric WITH PASSWORD '$SECURE_PASS';
CREATE DATABASE fire WITH OWNER metric;
\q
Graphite Web Configuration
Graphite-web loads its runtime configuration from a local_settings.py file, which the web app's settings.py module imports. Copy the example template before adding the desired configuration:
cd /opt/graphite/webapp/graphite
sudo cp local_settings.py.example local_settings.py
sudo nano /opt/graphite/webapp/graphite/local_settings.py
Uncomment and edit the SECRET_KEY, TIME_ZONE, DEBUG, USE_REMOTE_USER_AUTHENTICATION, and DATABASES settings as outlined below.
SECRET_KEY = '$SECURE_PASS'
Set this to a long, random, unique string to use as the secret key for this installation, and do not reuse the database password. The key salts the hashes used in auth tokens, the CSRF middleware, cookie storage, and so on. If you run several instances behind a load balancer, set it identically on all of them. You can generate a value with uuidgen.
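If uuidgen is not at hand, Python's standard-library secrets module can produce a suitable value. The following is a minimal sketch; the function name generate_secret_key and the 50-byte length are arbitrary choices for illustration, not something Graphite requires:

```python
import secrets


def generate_secret_key(nbytes: int = 50) -> str:
    """Return a random URL-safe string suitable for Django's SECRET_KEY.

    token_urlsafe() base64-encodes nbytes of randomness from the operating
    system's secure random source, so 50 bytes yield 67 characters.
    """
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Paste the printed value into local_settings.py as SECRET_KEY.
    print(generate_secret_key())
```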
TIME_ZONE = 'Europe/Amsterdam'
Set your local timezone (Django's default is America/Chicago). If your graphs appear to be offset by a couple of hours, then this probably needs to be explicitly set to your local time zone.
DEBUG = True
We set DEBUG to True here because, with debug mode off, current versions of Django will not serve static files (JavaScript, images, and so on) from the development server we use in this demonstration. A production installation should leave DEBUG disabled.
USE_REMOTE_USER_AUTHENTICATION = True
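This enables Django's REMOTE_USER authentication, in which the front-end web server authenticates users and passes the user name on to Django. See:
https://docs.djangoproject.com/en/dev/howto/auth-remote-user/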
DATABASES = {
    'default': {
        'NAME': 'fire',
        'ENGINE': 'django.db.backends.postgresql',
        'USER': 'metric',
        'PASSWORD': '$SECURE_PASS',
        'HOST': '127.0.0.1',
        'PORT': ''
    }
}
Above is an example using PostgreSQL; the default database engine is SQLite ('django.db.backends.sqlite3').
Graphite is compatible with PostgreSQL, MySQL, SQLite, and Oracle.
Graphite Schema
Set up the initial Graphite schema with the following command:
sudo -H PYTHONPATH=/opt/graphite/webapp django-admin migrate --settings=graphite.settings --run-syncdb
The database is still empty, so we need a user with full access to the administration system. Running django-admin with the createsuperuser argument prompts for a username, e-mail address, and password, and creates an admin user that can manage other users from the web front end.
sudo -H PYTHONPATH=/opt/graphite/webapp django-admin createsuperuser --settings=graphite.settings
Static Content
/opt/graphite/static is the default location for Graphite-web's static content. Populate the directory manually with the following command:
sudo -H PYTHONPATH=/opt/graphite/webapp django-admin collectstatic --noinput --settings=graphite.settings
Carbon Configuration
Next, Carbon needs two configuration files: carbon.conf, which controls its cache and aggregation daemons, and storage-schemas.conf, which defines the storage format (retention and resolution) of metrics. Copy the example configuration files as templates:
sudo cp /opt/graphite/conf/carbon.conf.example /opt/graphite/conf/carbon.conf
sudo cp /opt/graphite/conf/storage-schemas.conf.example /opt/graphite/conf/storage-schemas.conf
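Add the following section to storage-schemas.conf to define retention and downsampling, as recommended by StatsD. Carbon uses the first section whose pattern matches a metric name, so place it above the catch-all default section (pattern = .*).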
sudo nano /opt/graphite/conf/storage-schemas.conf
[stats]
pattern = ^stats.*
retentions = 10s:6h,1m:6d,10m:1800d
In other words, for all metrics starting with 'stats' (i.e. all metrics sent by StatsD), capture:
- Six hours of 10-second data (what we consider "near-real-time")
- Six days of 1-minute data
- Five years of 10-minute data
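To see what this schema means on disk, the sketch below converts a retentions string into points per archive and estimates the size of one Whisper file. It assumes Whisper's fixed record sizes (a 16-byte header, 12 bytes per archive descriptor, and 12 bytes per data point) and accepts only single-letter units; the helper names are hypothetical:

```python
from typing import List, Tuple

# Seconds per single-letter unit in a Whisper retention definition.
# (Whisper also accepts longer spellings such as "min"; this sketch does not.)
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

# Whisper's fixed record sizes, in bytes.
HEADER_BYTES = 16        # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_BYTES = 12  # per archive: offset, seconds per point, point count
POINT_BYTES = 12         # per point: 4-byte timestamp + 8-byte float value


def to_seconds(spec: str) -> int:
    """Convert '10s', '6h' or '1800d' to seconds; a bare number is already seconds."""
    if spec.isdigit():
        return int(spec)
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def parse_retentions(retentions: str) -> List[Tuple[int, int]]:
    """Return (seconds_per_point, points) for each archive, highest resolution first."""
    archives = []
    for archive in retentions.split(","):
        precision, duration = archive.strip().split(":")
        step = to_seconds(precision)
        # A bare number after the colon is a point count, not a duration.
        points = int(duration) if duration.isdigit() else to_seconds(duration) // step
        archives.append((step, points))
    return archives


def whisper_file_size(retentions: str) -> int:
    """Size in bytes of one Whisper file created with these retentions."""
    archives = parse_retentions(retentions)
    total_points = sum(points for _, points in archives)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(archives) + POINT_BYTES * total_points


if __name__ == "__main__":
    spec = "10s:6h,1m:6d,10m:1800d"
    for step, points in parse_retentions(spec):
        print(f"{points} points at {step}s per point")
    print(f"{whisper_file_size(spec)} bytes per metric")
```

For the StatsD schema above this gives 2,160, 8,640, and 259,200 points, or 3,240,052 bytes (about 3.1 MiB) per metric file, which Whisper normally allocates in full when the file is created.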
StatsD's recommendations also specify aggregation rules, so that each kind of metric is combined with an appropriate function when it is downsampled instead of being distorted or discarded.
Create (or edit) /opt/graphite/conf/storage-aggregation.conf so that it contains the following:
[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min
[max]
pattern = \.upper(_\d+)?$
xFilesFactor = 0.1
aggregationMethod = max
[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum
[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum
[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average
For metrics ending in .lower or .upper, only the minimum or maximum value, respectively, is retained when downsampling. See the StatsD documentation for more details.
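The order of these sections matters: Carbon applies the first pattern that matches a metric name, which is why the catch-all [default_average] comes last. The sketch below imitates that lookup and the downsampling step, in which a lower-resolution point is only written when the fraction of known (non-empty) points reaches xFilesFactor. It is an illustration of the rules above, not Carbon's or Whisper's actual code, and the function names are hypothetical:

```python
import re
from typing import List, Optional, Tuple

# Sections of storage-aggregation.conf, in file order:
# (section name, pattern, xFilesFactor, aggregationMethod)
RULES = [
    ("min", r"\.lower$", 0.1, "min"),
    ("max", r"\.upper(_\d+)?$", 0.1, "max"),
    ("sum", r"\.sum$", 0.0, "sum"),
    ("count", r"\.count$", 0.0, "sum"),
    ("count_legacy", r"^stats_counts.*", 0.0, "sum"),
    ("default_average", r".*", 0.3, "average"),
]

AGGREGATE = {
    "min": min,
    "max": max,
    "sum": sum,
    "average": lambda values: sum(values) / len(values),
}


def match_rule(metric: str) -> Tuple[str, float, str]:
    """Return (section, xFilesFactor, method) of the first pattern that matches."""
    for section, pattern, xff, method in RULES:
        if re.search(pattern, metric):
            return section, xff, method
    raise LookupError(f"no rule matches {metric}")  # unreachable with the .* fallback


def downsample(metric: str, points: List[Optional[float]]) -> Optional[float]:
    """Combine several higher-resolution points (None = no data) into one point.

    Returns None (the point stays empty) when the fraction of known points is
    below the rule's xFilesFactor.
    """
    _, xff, method = match_rule(metric)
    known = [p for p in points if p is not None]
    if not known or len(known) / len(points) < xff:
        return None
    return AGGREGATE[method](known)
```

For example, a gauge with only one known value out of four (25%) falls below the 0.3 xFilesFactor of [default_average] and is left empty, while a timer's .upper series keeps the maximum of whatever points exist.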
At this point, we can do a quick test to ensure the setup is correct by running the web interface under the Django development server:
cd /opt/graphite
sudo PYTHONPATH=`pwd`/whisper ./bin/run-graphite-devel-server.py --libs=`pwd`/webapp/ /opt/graphite/
By default, the server listens on port 8080, so point your web browser to http://127.0.0.1:8080.
The Graphite interface should appear. If it does not, the debug output should provide enough information; if it still is not clear, tail the latest process log:
tail -f /opt/graphite/storage/log/webapp/*.log
Nginx
We will now expose the web application through Nginx, which proxies requests to Gunicorn; Gunicorn in turn listens locally on port 8080 and serves the Django web app.
sudo apt install gunicorn nginx
sudo ln -s /usr/bin/gunicorn /opt/graphite/bin/gunicorn
Create the Nginx log files and set the correct permissions.
sudo touch /var/log/nginx/graphite.access.log
sudo touch /var/log/nginx/graphite.error.log
sudo chmod 640 /var/log/nginx/graphite.*
sudo chown www-data:www-data /var/log/nginx/graphite.*
Create a configuration file called /etc/nginx/sites-available/graphite and add the following content. Change the HOSTNAME to match your server name.
upstream graphite {
    server 127.0.0.1:8080 fail_timeout=0;
}
server {
    listen 80 default_server;
    server_name HOSTNAME;
    root /opt/graphite/webapp;
    access_log /var/log/nginx/graphite.access.log;
    error_log /var/log/nginx/graphite.error.log;
    location = /favicon.ico {
        return 204;
    }
    # serve static content from the "content" directory
    location /static {
        alias /opt/graphite/webapp/content;
        expires max;
    }
    location / {
        try_files $uri @graphite;
    }
    location @graphite {
        proxy_pass_header Server;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_connect_timeout 10;
        proxy_read_timeout 10;
        proxy_pass http://graphite;
    }
}
Enable the server block by creating a symbolic link to it in the sites-enabled directory, which Nginx reads at startup, and remove the default site.
sudo ln -s /etc/nginx/sites-available/graphite /etc/nginx/sites-enabled
sudo rm -f /etc/nginx/sites-enabled/default
Then validate the Nginx configuration.
sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
Finally, restart the Nginx service.
sudo systemctl restart nginx
StatsD
Applications use a collector client, typically StatsD or collectd, to feed metrics upstream to a Graphite server. StatsD is an event counter and aggregation service: it listens on a UDP port for incoming metric data and periodically sends the aggregated values upstream to a backend such as Graphite.
Today, StatsD refers to the original protocol written at Etsy and to the myriad of services that now implement this protocol.
StatsD requires Node.js; install it with the following commands.
curl -L -s https://deb.nodesource.com/setup_10.x | sudo bash
sudo apt install -y nodejs git
sudo ln -s /usr/bin/node /usr/local/bin/node
Clone StatsD from the Etsy repository.
sudo git clone https://github.com/etsy/statsd.git /opt/statsd
Add the following configuration for Graphite integration.
sudo nano /opt/statsd/localConfig.js
{
graphitePort: 2003,
graphiteHost: "127.0.0.1",
port: 8125,
backends: [ "./backends/graphite" ]
}
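With this configuration, StatsD's Graphite backend forwards its aggregates to Carbon on graphitePort 2003, which accepts Carbon's plaintext protocol: one line per data point containing the metric path, the value, and a Unix timestamp, separated by spaces. Writing such a line yourself is a handy way to test Carbon independently of StatsD. A minimal Python sketch; the helper names and the metric path test.manual.metric are placeholders:

```python
import socket
import time
from typing import Iterable, Optional

CARBON_HOST = "127.0.0.1"  # graphiteHost in localConfig.js
CARBON_PORT = 2003         # graphitePort: Carbon's plaintext line receiver


def carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point as '<path> <value> <timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_to_carbon(lines: Iterable[str], host: str = CARBON_HOST, port: int = CARBON_PORT) -> None:
    """Open a TCP connection to Carbon and write the formatted lines."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    send_to_carbon([carbon_line("test.manual.metric", 42)])
```

Because test.manual.metric does not start with stats, it is stored under the default schema rather than the [stats] section defined earlier.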
Supervisord
We will use Supervisor to manage the Carbon, StatsD, and Gunicorn processes. Each process needs its own configuration file, as outlined below.
sudo apt install -y supervisor
StatsD.
sudo nano /etc/supervisor/conf.d/statsd.conf
[program:statsd]
command=/usr/local/bin/node /opt/statsd/stats.js /opt/statsd/localConfig.js
process_name=%(program_name)s
autostart=true
autorestart=true
stopsignal=QUIT
Gunicorn.
sudo nano /etc/supervisor/conf.d/gunicorn.conf
[program:gunicorn]
command = /opt/graphite/bin/gunicorn -b 127.0.0.1:8080 -w 2 --pythonpath /opt/graphite/webapp/ graphite.wsgi:application
directory = /opt/graphite/webapp/
autostart=true
autorestart=true
redirect_stderr = true
Carbon.
sudo nano /etc/supervisor/conf.d/carbon.conf
[program:carbon]
command = /opt/graphite/bin/carbon-cache.py --debug start
autostart=true
autorestart=true
redirect_stderr = true
Restart Supervisor so that it loads the new configuration, and enable it to start at boot.
sudo systemctl restart supervisor
sudo systemctl enable supervisor
Check whether the processes are running successfully:
sudo supervisorctl status
carbon RUNNING pid 1320, uptime 1:41:29
gunicorn RUNNING pid 1321, uptime 1:41:29
statsd RUNNING pid 1322, uptime 1:41:29
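For a scripted check, for example from cron or a monitoring probe, the status output can be parsed. This is a hypothetical sketch that assumes the three program names configured above and the "name STATE details" column layout shown in the sample output:

```python
import subprocess
from typing import Dict, List, Sequence

EXPECTED = ("carbon", "gunicorn", "statsd")


def parse_status(output: str) -> Dict[str, str]:
    """Map program name to state, e.g. {'carbon': 'RUNNING'}."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


def not_running(output: str, expected: Sequence[str] = EXPECTED) -> List[str]:
    """Return the expected programs that are missing or not RUNNING."""
    states = parse_status(output)
    return [name for name in expected if states.get(name) != "RUNNING"]


if __name__ == "__main__":
    # Needs the same privileges as `sudo supervisorctl status`.
    result = subprocess.run(["supervisorctl", "status"], capture_output=True, text=True)
    failed = not_running(result.stdout)
    print("all running" if not failed else "not running: " + ", ".join(failed))
```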
If there is an error, you can debug it with the following commands.
systemctl status supervisor
tail -f /var/log/supervisor/supervisord.log
Exploring StatsD and Graphite Interaction
Now that everything is running, we can send data to StatsD and examine the results in the Graphite web app. StatsD accepts metrics in the following format:
echo "metric_name:metric_value|type_specification" | nc -u -w0 127.0.0.1 8125
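The same datagram can be sent from application code. Below is a minimal Python sketch using only the standard-library socket module; the helper names are hypothetical. Like nc -u, it does not wait for a reply, because StatsD over UDP sends none. Note that the sample_rate argument only labels the packet: a real client would also send just that fraction of events.

```python
import socket

STATSD_ADDR = ("127.0.0.1", 8125)  # port from localConfig.js


def statsd_packet(name: str, value, metric_type: str, sample_rate: float = 1.0) -> bytes:
    """Build 'name:value|type', adding '|@rate' when a sample rate is given."""
    packet = f"{name}:{value}|{metric_type}"
    if sample_rate != 1.0:
        packet += f"|@{sample_rate}"
    return packet.encode("utf-8")


def send_metric(name: str, value, metric_type: str, sample_rate: float = 1.0,
                addr=STATSD_ADDR) -> None:
    """Fire-and-forget UDP send, equivalent to `echo ... | nc -u -w0`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_packet(name, value, metric_type, sample_rate), addr)


if __name__ == "__main__":
    send_metric("demo.gauge", 100, "g")
```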
The metric name and value are self-explanatory. The commonly used metric types, whose applications are described below, are:
- Gauges
- Timers
- Counters
- Sets
Gauges hold a value that stays constant until you change it. They are best used for instrumentation, for example the current load of the system. StatsD does not average gauges within a flush interval; it reports the most recent value, which does not change unless you set a new one.
echo "demo.gauge:100|g" | nc -u -w0 127.0.0.1 8125
The new stat is accessible under stats > gauges > demo with the tree hierarchy on the left-hand side.
Wait 10 seconds (the flush interval) and send another data point.
echo "demo.gauge:125|g" | nc -u -w0 127.0.0.1 8125
Notice how it maintains its value until the next one is set.
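The sketch below models that behavior for a single gauge across flush intervals: the last received value is reported at every flush until a new one arrives, and a value prefixed with + or - (StatsD's relative-gauge syntax, e.g. demo.gauge:+5|g) adjusts the current value instead of replacing it. It is a toy model that assumes StatsD's default of continuing to send gauges that received no new data; it is not StatsD's actual code, and the function names are hypothetical.

```python
from typing import List, Optional


def apply_gauge(current: Optional[float], raw: str) -> float:
    """Apply one received gauge value.

    A leading '+' or '-' adjusts an existing gauge; anything else replaces it.
    With no existing value, a signed number simply becomes the new value.
    """
    if raw[:1] in ("+", "-") and current is not None:
        return current + float(raw)
    return float(raw)


def flush_series(intervals: List[List[str]]) -> List[Optional[float]]:
    """Value reported at each flush, given the raw values received per interval.

    None means nothing has been received yet, so nothing would be sent.
    """
    current: Optional[float] = None
    reported = []
    for received in intervals:
        for raw in received:
            current = apply_gauge(current, raw)
        # If nothing new arrived, the previous value is reported again.
        reported.append(current)
    return reported


if __name__ == "__main__":
    # Mirrors the demo: 100, two quiet flushes, then 125.
    print(flush_series([["100"], [], [], ["125"]]))
```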
Timers measure how long something takes, which is crucial for measuring application performance, database calls, render times, and so on.
echo "demo.timer:250|ms" | nc -u -w0 127.0.0.1 8125
echo "demo.timer:258|ms" | nc -u -w0 127.0.0.1 8125
echo "demo.timer:175|ms" | nc -u -w0 127.0.0.1 8125
StatsD will provide us with percentiles, average (mean), standard deviation, sum, and lower and upper bounds for the flush interval; vital information for modeling and understanding how a system behaves in the wild.
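The arithmetic behind these figures can be sketched in a few lines. This simplified Python version computes a subset of StatsD's timer fields for one flush interval (it omits several that StatsD also reports, such as the median). It assumes StatsD's default 90th-percentile threshold, a population standard deviation, and half-up rounding like JavaScript's Math.round when deciding how many samples fall under the threshold; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def js_round(x: float) -> int:
    """Round half up, like JavaScript's Math.round (Python's round() rounds half to even)."""
    return math.floor(x + 0.5)


def timer_stats(values: List[float], percentiles: Sequence[int] = (90,)) -> Dict[str, float]:
    """Summarise one flush interval of timer samples."""
    ordered = sorted(values)
    count = len(ordered)
    total = sum(ordered)
    mean = total / count
    # Population standard deviation over the interval.
    std = math.sqrt(sum((v - mean) ** 2 for v in ordered) / count)
    stats = {
        "count": count,
        "sum": total,
        "mean": mean,
        "lower": ordered[0],
        "upper": ordered[-1],
        "std": std,
    }
    for pct in percentiles:
        # Keep the fastest pct% of samples; report their maximum and mean.
        keep = js_round(pct / 100 * count) if count > 1 else count
        if keep == 0:
            continue
        subset = ordered[:keep]
        stats[f"upper_{pct}"] = subset[-1]
        stats[f"mean_{pct}"] = sum(subset) / keep
    return stats


if __name__ == "__main__":
    print(timer_stats([250, 258, 175]))
```

For the three demo.timer samples, this gives lower 175, upper 258, sum 683, a mean of about 227.7 ms, and a standard deviation of about 37.4 ms; with only three samples, the 90th percentile still includes all of them.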
Counters are the most basic and default type. They count how often an event occurs, for example failed login attempts. Below is the counter format, with an optional sample rate, followed by an example that counts the number of calls to an endpoint.
<metric name>:<value>|c[|@<rate>]
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
When viewing the graphs, stats.demo.count shows the average number of events per second over each flush interval, while stats_counts.demo.count shows the number of occurrences within the flush interval.
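Both values come from the same count. The sketch below assumes StatsD's defaults (a 10-second flush interval and the legacy namespace, which yields the stats.<name> and stats_counts.<name> paths) and shows how a sample rate scales a packet up: a packet sent at @0.1 stands in for ten events. The function names are hypothetical.

```python
from typing import Dict, List

FLUSH_INTERVAL_S = 10.0  # StatsD's default flushInterval (10000 ms)


def parse_counter(packet: str) -> float:
    """Return how many events a 'name:value|c' or 'name:value|c|@rate' packet represents."""
    _, rest = packet.split(":", 1)
    fields = rest.split("|")
    if fields[1] != "c":
        raise ValueError(f"not a counter: {packet}")
    sample_rate = 1.0
    if len(fields) > 2 and fields[2].startswith("@"):
        sample_rate = float(fields[2][1:])
    # A packet sent at sample rate r stands in for value / r events.
    return float(fields[0]) / sample_rate


def flush_counter(name: str, packets: List[str],
                  interval_s: float = FLUSH_INTERVAL_S) -> Dict[str, float]:
    """Metric paths and values emitted for one counter at one flush."""
    count = sum(parse_counter(p) for p in packets)
    return {
        f"stats.{name}": count / interval_s,  # average events per second
        f"stats_counts.{name}": count,        # raw number of events in the interval
    }


if __name__ == "__main__":
    print(flush_counter("demo.count", ["demo.count:1|c"] * 5))
```

Feeding it the five demo.count packets above gives 0.5 for stats.demo.count and 5 for stats_counts.demo.count.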
Sets count the number of unique values received between flushes: each distinct value is counted once, however many times it is sent. For example, you can count the number of users accessing your system, because a UID that accesses it multiple times is only counted once. Cross-referencing the graph with the commands below, we can see only two recorded values.
echo "demo.set:100|s" | nc -u -w0 127.0.0.1 8125
echo "demo.set:100|s" | nc -u -w0 127.0.0.1 8125
echo "demo.set:100|s" | nc -u -w0 127.0.0.1 8125
echo "demo.set:8|s" | nc -u -w0 127.0.0.1 8125
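The count StatsD reports for a set is simply the number of distinct values received in the interval. A minimal sketch (the function name is hypothetical) that reproduces the result for the four commands above:

```python
from typing import List


def set_cardinality(packets: List[str]) -> int:
    """Count distinct values among 'name:value|s' packets received in one interval."""
    seen = set()
    for packet in packets:
        _, rest = packet.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type != "s":
            raise ValueError(f"not a set: {packet}")
        seen.add(value)
    return len(seen)


if __name__ == "__main__":
    demo = ["demo.set:100|s"] * 3 + ["demo.set:8|s"]
    print(set_cardinality(demo))
```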
Dashboard Configuration
It is possible to customize the Graphite web app UI to our own preferences. First, create the configuration files by copying the default template files.
cd /opt/graphite/conf
sudo cp dashboard.conf.example dashboard.conf
sudo cp graphTemplates.conf.example graphTemplates.conf
We can modify the dashboards to have larger tile sizes to prevent eye strain when reading the data.
sudo nano /opt/graphite/conf/dashboard.conf
[ui]
default_graph_width = 450
default_graph_height = 450
automatic_variants = true
refresh_interval = 60
autocomplete_delay = 375
merge_hover_delay = 750
We can also modify the theme and aesthetics. For example, the following set of attributes gives us a solarized dark-style theme.
sudo nano /opt/graphite/conf/graphTemplates.conf
[solarized-dark]
background = #002b36
foreground = #839496
majorLine = #fdf6e3
minorLine = #eee8d5
lineColors = 268bd2aa,859900aa,dc322faa,d33682aa,db4b16aa,b58900aa,2aa198aa,6c71c4aa
fontName = Sans
fontSize = 10
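A template can also be selected per graph through the render API's template parameter. The small sketch below builds such a URL; HOSTNAME is the same placeholder used in the Nginx configuration, the 450 by 450 size simply mirrors the dashboard defaults above, and the function name is hypothetical:

```python
from urllib.parse import urlencode

GRAPHITE_URL = "http://HOSTNAME"  # replace with your server name, as in the Nginx config


def render_url(target: str, template: str = "solarized-dark", width: int = 450,
               height: int = 450, time_from: str = "-1h") -> str:
    """Build a Graphite render API URL that uses a graphTemplates.conf theme."""
    query = urlencode({
        "target": target,
        "from": time_from,
        "width": width,
        "height": height,
        "template": template,
    })
    return f"{GRAPHITE_URL}/render?{query}"


if __name__ == "__main__":
    print(render_url("stats.gauges.demo.gauge"))
```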
Conclusion
As you can see, setting up Graphite can become an installation maze. Getting the best out of Graphite requires mastery, which takes time in the trenches learning the ins and outs of the system.
MetricFire can provide this expertise for your team and deliver a fully hosted Graphite solution tailored to the needs and nuances of your system. Your team will not have to worry about scalability, releases, plugins, maintenance, tuning or backups. Everything will work out of the box tailored to your needs with 24/7, 365 continuous automated monitoring from around the world.
We took the best parts of open-source Graphite and supercharged them. We also added everything that is missing in vanilla Graphite: a built-in agent, team accounts, granular dashboard permissions, and integrations to other technologies and services like AWS, Heroku, logging tools, and more.
MetricFire’s Hosted Graphite will help you visualize your data without any setup hassles. Go ahead and sign up for your free trial to get started, or contact us for a quick and easy demo and learn from one of our MetricFire engineers!