
Guide to Adding K8 Inventory Stats to Your Telegraf Daemonset


Introduction

Having insights into your Kubernetes environment is crucial for ensuring optimal resource allocation and preventing potential performance bottlenecks. It also enables proactive monitoring of application health and security, helping to quickly identify and resolve issues before they impact users. In addition to the Telegraf Kubernetes plugin, the kube_inventory plugin provides valuable metadata about Kubernetes resources, such as pods, nodes, and services, giving deeper visibility into the state of your cluster. By integrating these stats with your other K8 performance metrics, you can better correlate infrastructure changes with application behavior, enabling more comprehensive monitoring and troubleshooting.

Prerequisites

As outlined in our related article on configuring a Telegraf Daemonset for your K8 clusters, you will need to set your cluster context using the kubectl command-line tool:

  • kubectl config get-contexts
  • kubectl config use-context <context-name>

Creating the Telegraf Daemonset File Structure

You can refer to our detailed article on configuring a Telegraf Daemonset, but all you really need to do is clone the public metricfire/telegraf-daemonset repository:
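
For example (assuming the repository is hosted on GitHub under the metricfire organization):

  • git clone https://github.com/metricfire/telegraf-daemonset.git
  • cd telegraf-daemonset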

In addition to the pre-configured inputs.kubernetes plugin defined in the config.yaml file, we will show you how to add a few things to your manifests to also collect inventory stats.

Project Directory:

telegraf-daemonset/
├── kustomization.yaml
└── resources/
    ├── config.yaml
    ├── daemonset.yaml
    ├── namespace.yaml
    ├── role.yaml
    ├── role-binding.yaml
    └── service_account.yaml
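
For reference, a minimal kustomization.yaml tying these resources together could look like the sketch below. The file shipped in the repository may organize things differently (for example, using a configMapGenerator for the Telegraf config), so treat this as an illustration rather than a drop-in replacement:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: monitoring
resources:
  - resources/namespace.yaml
  - resources/service_account.yaml
  - resources/role.yaml
  - resources/role-binding.yaml
  - resources/config.yaml
  - resources/daemonset.yaml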

First, you will add the following block to your resources/config.yaml file:

[[inputs.kube_inventory]]
  namespace = ""

Next, you will add the following section to your role.yaml file, as outlined in the official GitHub kube_inventory Telegraf plugin docs:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx:cluster:viewer
  labels:
    rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes", "nodes"]
    verbs: ["get", "list"]

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx:telegraf
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-to-view: "true"
rules: [] # Rules are automatically filled in by the controller manager.

Finally, you will add the following section to your role-binding.yaml file, making sure the subjects block references the service account and namespace actually used by your Daemonset:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: influx:telegraf:viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: influx:telegraf
subjects:
  - kind: ServiceAccount
    name: telegraf-sa        # match the name defined in resources/service_account.yaml
    namespace: monitoring    # match the namespace your Daemonset runs in
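
Once these manifests are deployed (next section), you can sanity-check the permissions with kubectl's built-in authorization check. Adjust the service account name and namespace if yours differ:

  • kubectl auth can-i list nodes --as=system:serviceaccount:monitoring:telegraf-sa
  • kubectl auth can-i list persistentvolumes --as=system:serviceaccount:monitoring:telegraf-sa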

Deploying the Telegraf Daemonset

At this point, your project directory should contain the kustomization.yaml file and the resources directory (with the six YAML files listed above).

Since you should already be using the correct cluster context, you can test (dry run) and then deploy the kustomization.yaml file from within your root project directory:

  • kubectl apply -k . --dry-run=client
  • kubectl apply -k .

Expected output:

namespace/monitoring configured
serviceaccount/telegraf-sa configured
clusterrole.rbac.authorization.k8s.io/influx:cluster:viewer configured
clusterrole.rbac.authorization.k8s.io/influx:telegraf configured
clusterrole.rbac.authorization.k8s.io/telegraf-cluster-role configured
clusterrolebinding.rbac.authorization.k8s.io/influx:telegraf:viewer configured
clusterrolebinding.rbac.authorization.k8s.io/telegraf-sa-binding configured
configmap/telegraf-config configured
daemonset.apps/telegraf-inventory created



You can now get a list of running daemonsets in your cluster and see the new one (named telegraf-inventory in the example output above) in the monitoring namespace:

  • kubectl get daemonsets --all-namespaces

Telegraf will now be collecting and forwarding metrics from both the inputs.kubernetes and inputs.kube_inventory plugins.
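
To confirm the new plugin is running cleanly, you can tail the logs of the daemonset's pods and look for kube_inventory errors. The daemonset name below comes from the example output above; adjust it if yours differs:

  • kubectl logs daemonset/telegraf-inventory -n monitoring --tail=50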



Since this article details the kube_inventory plugin, you can look for the following metric patterns in your Hosted Graphite (HG) account; these come specifically from the kube_inventory Telegraf plugin:

["persistentvolumes", "nodes", "daemonsets", "deployments", "endpoints", "persistentvolumeclaims", "services", "statefulsets", "resourcequotas"]



These metrics will be in the Graphite format and can be used in HG to create custom dashboards and alerts. See the official GitHub repository for additional details and configuration options for the inputs.kube_inventory plugin.

Use Your Graphite Metrics to Create Dashboards and Alerts

Navigate to your Hosted Graphite trial account => Metrics Search. When using the metricfire/telegraf-daemonset repository, your metrics will be prefixed with telegraf-k8; in this example, however, the metrics are simply prefixed with sandbox.<node-id>.

Example of node status_condition metrics returned:

sandbox.<node-id>.DiskPressure.<node-id>.False.<k8-version>.kubernetes_node.status_condition
sandbox.<node-id>.MemoryPressure.<node-id>.False.<k8-version>.kubernetes_node.status_condition
sandbox.<node-id>.NetworkUnavailable.<node-id>.False.<k8-version>.kubernetes_node.status_condition
sandbox.<node-id>.PIDPressure.<node-id>.False.<k8-version>.kubernetes_node.status_condition
sandbox.<node-id>.Ready.<node-id>.True.<k8-version>.kubernetes_node.status_condition

Then you can navigate to Dashboards to create a new Grafana visualization using the Graphite metrics from your list:

Guide to Adding K8 Inventory Stats to Your Telegraf Daemonset - 1

Since the plugin returns multiple stats from each K8 node, you can use Graphite functions like groupByNode() to combine all metrics at a certain index in the series, and average them.
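
For example, a query along the following lines (a sketch based on the sandbox.<node-id> path layout shown above; adjust the wildcards and index to match your own prefix) would produce one averaged series per condition name across all nodes:

groupByNode(sandbox.*.*.*.*.*.kubernetes_node.status_condition, 2, 'average')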



Here are some additional queries using the returned persistentvolume metrics:

Guide to Adding K8 Inventory Stats to Your Telegraf Daemonset - 2



You can also use these Graphite metrics to create custom alerts that will notify you via Slack, PagerDuty, MSTeams, OpsGenie, etc.



In the Hosted Graphite UI, just navigate to Graphite Alerts => Add Alert, then name the alert and add an alerting metric:

Guide to Adding K8 Inventory Stats to Your Telegraf Daemonset - 3



Then set the alert criteria. In this example, if the node status changes from 1 => 0, I want to receive a notification:

Guide to Adding K8 Inventory Stats to Your Telegraf Daemonset - 4

Conclusion

Monitoring your Kubernetes clusters is key to keeping your applications healthy, stable, and performing well. It helps you catch issues like resource shortages or failed deployments early, so you can fix them before they cause bigger problems. Storing time-series data from your metrics allows you to spot long-term trends, like changes in resource usage or recurring issues, which is super helpful for planning and optimizing your infrastructure. Having that historical data lets you make smarter decisions, track improvements, and anticipate future needs.

Pairing time-series data with tools like dashboards and alerts gives you real-time visibility into key metrics and helps you visualize trends in an easy-to-digest way. Alerts ensure you’re immediately notified of critical issues, while dashboards allow for deeper analysis, making it easier to understand the context behind performance changes and take action quickly.

Sign up here for a free trial of our Hosted Graphite and Grafana services. If you have any questions about our products or about how MetricFire can help your company, book a demo and talk to us directly.
