Introduction
Prometheus has become a popular tool for monitoring Python applications, even though it was originally designed for single-process, multi-threaded applications rather than multi-process ones.
Prometheus was developed at SoundCloud and was inspired by Google's Borgmon. In its original environment, Borgmon relied on straightforward service discovery: Borg could easily find all of the jobs running on a cluster.
Prometheus inherits these assumptions: it treats each target as a single multi-threaded process, and its client libraries assume that metrics come from various libraries and subsystems, in multiple threads of execution, running in a shared address space.
To get started, sign up for the MetricFire free trial, where you can start using our Prometheus alternative on our platform and try out what you learn from this article.
Key Takeaways
- Prometheus is gaining popularity as a monitoring tool for Python applications, even though it was originally designed for single-process multi-threaded applications, not multi-process applications.
- Prometheus was developed at Soundcloud and was inspired by Google's Borgmon. It inherits some assumptions from Borgmon, including straightforward service discovery methods.
- Prometheus assumes that a target represents a single multi-threaded process and that metrics come from various libraries and subsystems running in multiple threads within a shared address space.
- When Prometheus scrapes a multi-process application, it may receive different values from different workers for the same metric, leading to inconsistency.
- Despite the challenges of monitoring multi-process applications with Prometheus, the solutions below are effective workarounds that let you use Prometheus as a monitoring tool for a wide range of applications, from IT resources to Application Performance Monitoring (APM).
Problems with integrating Prometheus into Python WSGI applications
The breakdown starts when we run a Python app under a WSGI application server. With WSGI, requests are distributed across many workers rather than handled by a single process, and each worker runs in its own process. The result is a multi-process application.
When this kind of application exports to Prometheus, each scrape request is answered by one of many workers, and each worker responds with only the value it knows. Prometheus might scrape a counter metric and get back 100, then scrape again immediately and get back 200. Because each worker exports its own value, the counter measures an arbitrary slice of the traffic rather than the whole job.
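A minimal sketch of the problem, with illustrative worker names and counter values: each worker process keeps its own in-memory counter, and a scrape is answered by whichever worker happens to accept the connection.

```python
import random

# Hypothetical per-worker counters for the same metric: each WSGI worker
# process keeps its own in-memory count of the requests it has served.
workers = {"worker_1": 100, "worker_2": 200, "worker_3": 50}

def scrape():
    """A scrape is answered by whichever worker accepts the connection."""
    worker = random.choice(list(workers))
    return workers[worker]

# Back-to-back scrapes can disagree, even though nothing changed:
samples = {scrape() for _ in range(100)}
print(sorted(samples))          # individual worker values, not the job total
print(sum(workers.values()))    # the job-wide total Prometheus never sees: 350
```

Any single scrape returns one worker's slice; the job-wide total (350 here) is never exposed.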
To handle these issues, we have four solutions listed below.
Sum all of the worker nodes
If you give each worker's metrics a unique label, you can aggregate across them and effectively query the whole job. For example, if you give each worker a label such as worker_name, you can write a query like:
sum by (instance, http_status) (sum without (worker_name) (rate(request_count[5m])))
This aggregates all of the workers for a job at once. The downside is an explosion in the number of time series you have to store.
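A minimal sketch of what the inner `sum without (worker_name)` does, using illustrative sample values: summing away the worker_name label recovers one job-wide count per remaining label set.

```python
# Per-worker samples of the same counter, each carrying a worker_name label.
# Values are illustrative.
samples = [
    {"instance": "app-1", "http_status": "200", "worker_name": "w1", "value": 100},
    {"instance": "app-1", "http_status": "200", "worker_name": "w2", "value": 200},
    {"instance": "app-1", "http_status": "500", "worker_name": "w1", "value": 7},
]

def sum_without(samples, drop_label):
    """Aggregate samples while ignoring one label, like PromQL's `sum without`."""
    totals = {}
    for s in samples:
        key = tuple(sorted((k, v) for k, v in s.items()
                           if k not in (drop_label, "value")))
        totals[key] = totals.get(key, 0) + s["value"]
    return totals

totals = sum_without(samples, "worker_name")
# One job-wide count per (instance, http_status), worker_name summed away.
```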
Multi-process mode
This method is our favorite here at MetricFire; we use it ourselves to monitor our own application with Prometheus. It relies on the official Prometheus Python client, whose multi-process mode handles apps running under the Gunicorn application server.
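A configuration sketch of this mode, assuming the official prometheus_client package; the hook placement and view name are illustrative. In multi-process mode, each worker writes its metrics to files in a shared directory, and a collector aggregates them into one consistent view at scrape time.

```python
# gunicorn.conf.py -- before starting Gunicorn, point the environment
# variable PROMETHEUS_MULTIPROC_DIR at a writable directory shared by
# all worker processes.
from prometheus_client import multiprocess

def child_exit(server, worker):
    # Gunicorn server hook: clean up a dead worker's metric files so
    # its stale series stop being reported.
    multiprocess.mark_process_dead(worker.pid)


# In the application, the /metrics endpoint aggregates every worker's
# files into a single job-wide view (metrics_view is an illustrative name):
from prometheus_client import CollectorRegistry, generate_latest

def metrics_view():
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)  # reads all workers' files
    return generate_latest(registry)
```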
The Django Prometheus Client
This method treats each worker as a completely separate target. The Django Prometheus client sets things up so that each worker listens for Prometheus's scrape requests on its own port.
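A settings sketch, assuming the django-prometheus package; the port range is illustrative. Each worker binds a free port from the configured range and serves its own metrics endpoint, so Prometheus can scrape every worker as a distinct target.

```python
# settings.py -- sketch for per-worker exports with django-prometheus.
INSTALLED_APPS = [
    # ... your existing apps ...
    "django_prometheus",
]

MIDDLEWARE = [
    "django_prometheus.middleware.PrometheusBeforeMiddleware",
    # ... your existing middleware ...
    "django_prometheus.middleware.PrometheusAfterMiddleware",
]

# Each worker binds the first free port in this (illustrative) range and
# answers Prometheus scrapes there, making every worker its own target.
PROMETHEUS_METRICS_EXPORT_PORT_RANGE = range(8001, 8050)
```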
StatsD exporter
This method drops the assumption that Prometheus must scrape the application directly. Instead, the app pushes metrics to a locally running StatsD instance, and Prometheus is set up to scrape that instance rather than the application. Because all workers push to the same StatsD instance, this gives you more control over what each counter counts.
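A minimal sketch of the push side, using only the standard library; the address and metric name are illustrative (9125 is a port commonly used by StatsD-to-Prometheus bridges). Each worker fire-and-forgets counter increments over UDP, and the StatsD side accumulates the job-wide total for Prometheus to scrape.

```python
import socket

# Illustrative address of a locally running StatsD instance.
STATSD_ADDR = ("127.0.0.1", 9125)

def statsd_line(name, value, metric_type):
    """Build a StatsD wire-format line, e.g. 'request_count:1|c' (c = counter)."""
    return f"{name}:{value}|{metric_type}"

def incr(name, value=1):
    # UDP is fire-and-forget: safe to call even if nothing is listening,
    # so instrumented code never blocks on the metrics pipeline.
    line = statsd_line(name, value, "c")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode(), STATSD_ADDR)
    return line

incr("request_count")  # every worker pushes to the same StatsD instance
```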
Conclusion
Although Prometheus cannot natively monitor multi-process applications, these four solutions are solid workarounds. They let you use Prometheus as the main monitoring tool across an organization, for both IT resources and APM.
For more information about how Prometheus can be used to monitor Python apps, check out our articles on Python Based Exporters, and our series on Developing and Deploying a Python API with Kubernetes.
To try out our Prometheus alternative, check out our free trial. You can use Hosted Graphite directly in our platform, and monitor metrics without any setup. Also, talk to us directly by booking a demo - we’re always happy to talk with you about your company’s monitoring needs.