Budget Guide to Server Monitoring

May 15, 2020 ∙ 13 min read

MetricFire Blogger

Introduction
Server Monitoring for the Enterprise
In-house Server Monitoring – Pros and Cons
Alternatives to In-House Server Monitoring
Conclusion

Introduction

In the modern IT environment, it is critical to proactively monitor your servers and related infrastructure. Today, there is a wide array of monitoring solutions, each with its pros and cons: some are platform-specific, some are better suited to on-premise servers, and others work best on cloud platforms. Some are easier to deploy, some integrate with a wider range of data sources, and some feature slicker UIs that are easier to understand. Most importantly, costs vary significantly among the solutions.

In this article, we will go through and compare the most popular server-monitoring solutions available today. Is there a solution that is platform-agnostic, cost-effective, and simple to use? The answer is ‘yes’ – MetricFire offers all these and more. You should sign up for a free demo and the free trial here.

Server Monitoring for the Enterprise

Let us start by specifying and clarifying what ‘server monitoring’ is and what is covered. At a very basic level, we can define server monitoring as a set of processes and actions geared towards reviewing and analyzing a server for availability, operations, performance, security, and other operations-related processes.

In this article, we are mostly concerned with the performance of the actual hardware of the server (both physical and virtual hardware). We are excluding application performance, except to the extent that application performance is directly related to and affected by the performance of the underlying server resources.

To illustrate how server monitoring overlaps with application performance monitoring (APM), consider an application that relies on a reserved amount of server memory (such as an in-memory database). The application’s performance clearly suffers if the server’s memory is regularly full and the reserved memory cannot be allocated. In this case, we can use the application’s reserved memory requirement as a base figure when setting the server’s memory limits or alert thresholds.
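As a rough illustration of the reserved-memory example, the sketch below derives an alert limit from the application's reservation. The function names, the 512 MB headroom default, and the sample figures are assumptions made for this example, not values from any particular monitoring tool.

```python
def max_other_usage_mb(total_mb: int, reserved_mb: int, headroom_mb: int = 512) -> int:
    """Memory (in MB) that everything except the application may use.

    total_mb:    physical or virtual memory on the server
    reserved_mb: memory the application must be able to allocate
    headroom_mb: safety margin (hypothetical default of 512 MB)
    """
    if reserved_mb + headroom_mb > total_mb:
        raise ValueError("reservation plus headroom exceeds total memory")
    return total_mb - reserved_mb - headroom_mb


def should_alert(other_usage_mb: int, total_mb: int, reserved_mb: int,
                 headroom_mb: int = 512) -> bool:
    """True when non-application memory use threatens the reservation."""
    return other_usage_mb > max_other_usage_mb(total_mb, reserved_mb, headroom_mb)


# Example: a 16 GB server hosting an in-memory database that reserves 8 GB.
limit = max_other_usage_mb(total_mb=16384, reserved_mb=8192)
print(f"Alert when other processes use more than {limit} MB")
```

The point of the design is that the server-level threshold is derived from the application requirement rather than chosen arbitrarily.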

Managing an increasing number of servers with limited human resources is the challenge to overcome when monitoring your servers and applications. It is simply not smart or feasible to increase the number of IT personnel dedicated to monitoring as your servers increase. Clearly what is needed is a scalable solution to server monitoring – with bonus points to those solutions that allow your IT monitoring people to quickly identify the most crucial pain-points in the servers being monitored.

This can be achieved either as a fully in-house solution, or by outsourcing to specialized 3rd-party tools and services. We will look at these two different methods in the subsequent sections in this article, and analyze how best to scale your server monitoring on a budget.

Throughout this analysis, we'll look at how each method handles monitoring the metrics listed below. This is not an exhaustive list, but it covers the most common and most useful metrics that a majority of IT departments will want to keep an eye on:

Requests per second (Average Load)
Average response time
Server error rates
Peak response time
Total server uptime
Average CPU utilization
Thread counts
Memory utilization
Disk I/O rates
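To make the request-level metrics in this list concrete, here is a minimal sketch that computes requests per second, average and peak response time, and server error rate from a window of request samples. The `Request` record and its field names are hypothetical, and treating only 5xx status codes as server errors is an assumption of this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    timestamp: float    # seconds since the start of the window
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP status code


def summarize(requests: list[Request]) -> dict[str, float]:
    """Compute a few request-level metrics over a window of samples."""
    if not requests:
        return {"requests_per_second": 0.0, "avg_response_ms": 0.0,
                "peak_response_ms": 0.0, "error_rate": 0.0}
    timestamps = [r.timestamp for r in requests]
    # Use at least a one-second window so a single-instant burst
    # does not cause a division by zero.
    window = max(max(timestamps) - min(timestamps), 1.0)
    durations = [r.duration_ms for r in requests]
    # Assumption: only 5xx responses count as server errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "requests_per_second": len(requests) / window,
        "avg_response_ms": mean(durations),
        "peak_response_ms": max(durations),
        "error_rate": errors / len(requests),
    }


sample = [
    Request(0.0, 120.0, 200),
    Request(1.0, 80.0, 200),
    Request(2.0, 400.0, 500),
    Request(4.0, 200.0, 200),
]
summary = summarize(sample)
print(summary)
```

Host-level metrics such as CPU utilization, thread counts, memory utilization, and disk I/O rates come from the operating system rather than from request logs, so they are left out of this sketch.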

In-house Server Monitoring – Pros and Cons

As mentioned previously, one approach to server monitoring is to handle it all in-house. This includes the following:

Shortlist and select a particular solution to use for your server monitoring. This could be the on-premise version of one of the popular server-monitoring solutions such as Prometheus, Grafana, Nagios, etc. Or if you use virtually hosted or cloud servers, you can utilize the cloud provider’s server-monitoring solution, for instance, AWS’s CloudWatch. Note that because of the often complex nature of these solutions, you may still need to engage an outside consultant or vendor to help, especially at the original setup and configuration stage.
You will need a set of IT engineers who are fully trained and knowledgeable on the details of server monitoring, because they will need to set up the building blocks, add new services, and change existing ones. You can either train your internal IT staff on server monitoring or recruit new staff with the requisite experience, and perhaps certification, in your preferred solution.
Related to the above, you will need redundancy for the IT engineer who handles the monitoring, so that if they leave or are suddenly unavailable you have a backup to take their place. That is why we recommend “a set of IT engineers”.
You will need to define and refine the exact metrics to be monitored, and on which specific servers. This, of course, calls for thorough knowledge not only of your servers but also of the applications and services running on them. For example, disk I/O and server uptime could be more important for a database server than for a clustered web server, CPU thread count may be more relevant to a server hosting a middleware app in JavaScript than to a server hosting the front-end app, and so on.
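The last point above, choosing metrics per server role, can be captured as a simple lookup. This is only a sketch: the role names, metric names, and host names are hypothetical, and the groupings merely echo the examples given in the text.

```python
# Hypothetical mapping from server role to the metrics that matter most for it.
ROLE_METRICS: dict[str, list[str]] = {
    "database": ["disk_io_rate", "uptime", "memory_utilization"],
    "web": ["requests_per_second", "avg_response_time", "error_rate"],
    "middleware": ["thread_count", "cpu_utilization"],
}


def monitoring_plan(server_roles: dict[str, str]) -> dict[str, list[str]]:
    """Return the metrics to collect for each host, based on its role."""
    return {host: ROLE_METRICS[role] for host, role in server_roles.items()}


# Hypothetical inventory of hosts and their roles.
plan = monitoring_plan({"db-01": "database", "web-01": "web", "mw-01": "middleware"})
for host, metrics in plan.items():
    print(host, "->", ", ".join(metrics))
```

Keeping the mapping in one place makes it easier to review and refine as your applications change.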

As you may have noticed by reading between the lines in the above points, in-house server monitoring can get really expensive in terms of IT human resources, as well as paying for the actual software you are using. Plus, you may incur both one-off and recurring related costs - a new server to host the server-monitoring solution, training for your IT engineers, extra consulting services, etc.

The knock-on effect of this significant investment is that once set up, you are most likely tied to your in-house solution for at least the next few years. If it turns out that the solution you chose was not the best, you are stuck with that suboptimal choice.

Clearly, in-house server monitoring is not an ideal setup. It is perhaps best left to very large IT departments that can absorb the costs and headaches of an in-house solution, or to organizations that absolutely must keep monitoring in-house, usually for security reasons, such as defense contractors or high-security biotech firms.

Alternatives to In-House Server Monitoring

As an alternative to in-house server monitoring, let’s take an in-depth look at some cloud monitoring solutions:

AWS CloudWatch

The first of these is AWS's monitoring solution, CloudWatch. For users hosting servers on the ubiquitous AWS platform, CloudWatch is an obvious solution. However, CloudWatch has 3 main limitations even for users whose server infrastructure is wholly hosted on AWS:

First, CloudWatch can get quite expensive, especially if you have a lot of servers and/or several metrics per server. As explained on the CloudWatch pricing page, you would pay about $21 per month just to monitor 10 server instances with 7 standard metrics each, which is the absolute bare minimum. And this does not include any API requests, custom metrics, or archiving your logs.
Secondly, AWS, like many other proprietary cloud platforms, deliberately makes it difficult to leave its ecosystem once you are using its services. Assume you are using CloudWatch and want to migrate your server instances and monitoring capabilities to a new cloud provider. This is currently difficult to do without either manual transfer or custom, 3rd-party migration solutions. But why is this? After all, a Linux server instance and its metrics are exactly the same on AWS as on, say, Microsoft’s Azure or Google’s Cloud Platform. This potential vendor lock-in is a significant headache for IT departments as they consider transitioning to cloud providers. To combat this, MetricFire has developed an integration to help AWS users get more out of their AWS data.
Lastly, CloudWatch works out of the box only if your servers are on AWS. If they are not, you need to go through a series of installation and configuration steps to run its agent on your on-premise servers.
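The $21 figure in the first point can be checked with simple arithmetic. The sketch below assumes a flat price of $0.30 per metric per month, which is the rate implied by the article's own numbers ($21 / 70 metrics). Actual CloudWatch pricing is tiered and changes over time, so treat the constant as an assumption.

```python
from decimal import Decimal

# Assumed flat rate, derived from the article's example: $21 / (10 * 7) metrics.
PRICE_PER_METRIC = Decimal("0.30")


def monthly_metric_cost(instances: int, metrics_per_instance: int,
                        price_per_metric: Decimal = PRICE_PER_METRIC) -> Decimal:
    """Monthly cost of metrics alone (no API requests, logs, or alarms)."""
    return instances * metrics_per_instance * price_per_metric


baseline = monthly_metric_cost(10, 7)  # the article's bare-minimum example
larger = monthly_metric_cost(50, 7)    # same metrics on five times the servers
print(f"10 instances: ${baseline}, 50 instances: ${larger}")
```

Because the cost is itemized per metric, it grows linearly with both the number of servers and the number of metrics tracked on each one.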

As we shall see shortly, MetricFire’s monitoring solution is designed exactly to overcome these limitations.

MetricFire

MetricFire is a hosted service that combines Prometheus, Graphite, and Grafana. It offers a complete infrastructure and application monitoring platform which helps customers collect, store, and visualize time-series data from any source. MetricFire’s monitoring platform is fully cloud-hosted, and the monitoring agents can be deployed on both on-premise and cloud servers.

MetricFire’s support engineers are always available to help out on alerting design, analytics, and overall monitoring. And it contains a full-featured web UI that allows you to send metrics and visualize your data directly on the platform. You can extend the product functionality using plugins such as GitHub, PagerDuty, Slack, Heroku, CircleCI, and more.

Typical use cases are for monitoring servers, applications, IT networks, or any other infrastructure. MetricFire’s most important USP by far is its cost - it offers a far more affordable alternative to enterprise monitoring solutions. As explained on the pricing page, MetricFire’s monitoring solutions are about half the cost of Datadog’s, and are much more affordable than CloudWatch because of the use of bundled services and features, as opposed to CloudWatch’s itemized pricing that gets very expensive very fast.

Compared to the monitoring platforms above, MetricFire has additional and unique features, such as:

Properly tiered customer plans that are based on real-world setups. Unlike the rigid one or two customer tiers offered by the likes of New Relic or Datadog, MetricFire offers no fewer than 7 clearly demarcated plans, ranging from Basic for individuals ($85 per month), to Large for growing teams ($1599 per month), to Premium Annual ($3849 per month) and Enterprise plans.
24/7 customer support, compared to some providers, like New Relic, that only offer 8-5 support for standard plans, with 24/7 support available only to premium or higher-tier customers.
An always-open-source philosophy, which means your data is always yours and there's no danger of vendor lock-in.
Choice of data locations – your data can be hosted on the most convenient and closest data centers.
Community dashboards: community-sourced and supported dashboards, such as those in MetricFire-hosted Grafana, are invariably richer and better than those developed by a small team in a corporation, like Datadog’s proprietary ones. Plus, MetricFire’s customer-support team is always on hand to help customize your dashboards.

Grafana Labs

Grafana is an online open-source tool for running analytics and monitoring. Grafana integrates with several data sources and can create excellent dashboards. It is especially useful for comparing and analyzing trends and metrics over longer time periods.

However, Grafana is a complex beast that can be overwhelming for beginners to master and utilize. This is where MetricFire's customer support pulls ahead. Check out the MetricFire solution, available with all MetricFire packages.

Grafana Labs is a private company that helps users deploy and use Grafana for their server monitoring needs. It offers two solutions depending on your needs. Grafana Cloud is targeted at smaller-scale users. It includes a dedicated Grafana instance and is compatible with both Prometheus and Graphite. Grafana Cloud pricing starts at $49/month for the standard version (with a 30-day free trial), with customized pricing for the Pro version.

The other solution is Grafana Enterprise, designed for larger organizations that want to utilize even more of the Grafana stack: not just Grafana itself, but also the Prometheus and Graphite backends. You can read more about these solutions here.

Datadog

Datadog is a cloud-based infrastructure and application monitoring tool. It is used mostly in environments that need to monitor a wide range of tools and services over the cloud, from network to system to server monitoring. Datadog covers it all with its 200+ integrations for tools and services, making it easier to monitor every component of the tech stack. It also includes a useful recorder to create your own tests that cannot easily be defined using APIs or single metrics. As with Grafana, the product’s complexity means a steep learning curve, and it takes some time to get used to.

Datadog originally began as a simpler cloud infrastructure monitoring service with dashboards, alerting, and visualizations of metrics. As cloud adoption increased, Datadog grew rapidly and expanded its product offering to cover service providers including Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, Red Hat OpenShift, and OpenStack. Datadog quite recently launched its application monitoring service as well. It is possible to integrate with apps such as PagerDuty or Slack to receive notifications.

Datadog is free for up to 5 hosts (but with only a 1-day data retention period) and offers a 14-day free trial. After that, customers are billed at $15/host per month, and network performance at $5 per host per month. Log management is billed at $1.27 per month for every million log events and security monitoring at $0.20 per month for every GB of analyzed logs. You can check the updated pricing of the Datadog infrastructure on the website, but it is clear that Datadog pricing can get expensive for a wide range of monitoring metrics.
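To see how these itemized rates add up, here is a minimal estimator using only the figures quoted above. The function and parameter names are made up for this example. It ignores the free tier and any volume discounts, and the rates will go out of date, so it is a rough sketch rather than a quote.

```python
from decimal import Decimal

# Rates as quoted in this article (per month); check current pricing before relying on them.
INFRA_PER_HOST = Decimal("15.00")
NETWORK_PER_HOST = Decimal("5.00")
LOGS_PER_MILLION_EVENTS = Decimal("1.27")
SECURITY_PER_GB = Decimal("0.20")


def datadog_monthly_estimate(hosts: int, network: bool = False,
                             log_events_millions: int = 0,
                             analyzed_gb: int = 0) -> Decimal:
    """Rough monthly bill from the itemized rates above."""
    total = hosts * INFRA_PER_HOST
    if network:
        total += hosts * NETWORK_PER_HOST
    total += log_events_millions * LOGS_PER_MILLION_EVENTS
    total += analyzed_gb * SECURITY_PER_GB
    return total


# Hypothetical workload: 20 hosts with network monitoring,
# 100 million log events, and 50 GB of analyzed logs per month.
estimate = datadog_monthly_estimate(20, network=True,
                                    log_events_millions=100, analyzed_gb=50)
print(f"Estimated monthly cost: ${estimate}")
```

In this hypothetical workload, the per-host charges make up $400 of the total, with log and security charges adding the rest.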

New Relic

Yet another alternative to Grafana and Datadog, New Relic is especially good at monitoring real-time events, making it useful for IT departments and organizations that host real-time applications like web servers and gaming services. It also provides preconfigured dashboards for a wide array of cloud platforms and their integrations, including the big 3: Amazon Web Services, Microsoft Azure, and Google Cloud Platform.

You can also build custom integrations using New Relic’s integrations SDK. However, its integrations are somewhat clumsily documented and so not easy for everyone to set up; they also require at least an intermediate level of technical understanding. New Relic’s general documentation and front-end UI are also not as polished as those of its main competitors.

Pricing includes a 30-day free trial, and after that kicks off at $14.40 per month for the Pro version which comes with 13 months of data retention and up to 2275 integration events. Beyond that, New Relic only states on their pricing page that they offer “flexible pricing options for customers in highly dynamic environments”.

Conclusion

It is clear that MetricFire is a greatly affordable and complete monitoring solution, unlike many of its alternatives that fail in one or more key areas of customer concern. MetricFire integrates with the big providers, such as AWS and Azure, as well as many other data sources. Use MetricFire to monitor your infrastructure as well as data coming from systems across your stack. Check out the MetricFire demo and also sign up for a free trial here.