Oct 9, 2024
Prometheus Query Language (PromQL) Tutorial
The Prometheus Query Language (PromQL) is used to query data inside the Prometheus time-series database. This tutorial shows you how to write PromQL queries against various metric types (counters, gauges, histograms) and how to use PromQL to build a Grafana dashboard.
Table of Contents
Setting Up the Environment
Metric Types
2.1 Counter
2.2 Gauge
2.3 Histogram
Accessing Prometheus
The Prometheus Time-Series Database
Labels
PromQL
6.1 Range Vector
6.2 Instant Vector
6.3 Label Matching
6.4 Aggregation
6.5 Grouping and Aggregation
6.6 Aggregation over Time
Setting Up Grafana
Setting up the Data Source
Import a Dashboard
Analyzing each PromQL Query
Conclusion
Setting Up the Environment
Before we dive into PromQL, let's set up our Prometheus environment.
Clone the repository:
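Assuming a companion repository like the one from the linked Docker Compose article (the URL below is a placeholder; substitute the actual repository):

```
git clone https://github.com/<your-username>/prometheus-grafana-tutorial.git
cd prometheus-grafana-tutorial
```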
Start the environment:
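Assuming the repository ships a Docker Compose file:

```
docker compose up -d
```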
This command will start Prometheus, Grafana, and a sample application that exposes metrics. For more details on how this setup works, refer to our previous article: Prometheus Docker Compose Setup with Grafana.
Metric Types
Before we start querying with PromQL, it's crucial to understand the different types of metrics our application exposes.
First, make a GET request to `localhost:8000` to confirm the app is running. Then, navigate to `localhost:8000/metrics` to view all the metrics.
Counter
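In the Prometheus exposition format, a counter looks something like this (the help text and value are illustrative):

```
# HELP http_request_total Total number of HTTP requests
# TYPE http_request_total counter
http_request_total{method="GET",path="/",status="200"} 1.0
```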
`# HELP` provides a description of the metric and `# TYPE` indicates the metric type. `http_request_total` is a counter metric because its value only ever increments.
The metric name is followed by labels (in curly braces) that specify attributes like `method`, `path`, and `status`, each with a corresponding value. This structure allows the metric to be split across multiple time series, each representing a unique combination of label values.
For instance, if you make a request to `http://localhost:8000/`, the counter for that specific path will increase, while other paths remain unaffected. Refreshing the metrics page (`/metrics`) will increment its respective counter. For example, after these actions, you might see:
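The counts below are illustrative; yours will depend on how many requests you made:

```
http_request_total{method="GET",path="/",status="200"} 2.0
http_request_total{method="GET",path="/metrics",status="200"} 3.0
```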
This granularity enables precise tracking of requests across different endpoints and HTTP methods.
Gauge
Gauge metrics represent values that can both increase and decrease. They're typically used for measuring current states that fluctuate, such as CPU usage, memory consumption, or concurrent requests. In this example, `process_cpu_usage` shows the current CPU usage percentage. If you refresh the page, you might see this value change:
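An illustrative gauge in the exposition format:

```
# HELP process_cpu_usage Current CPU usage
# TYPE process_cpu_usage gauge
process_cpu_usage 0.35
```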
Histogram
Histogram metrics measure and group values into predefined ranges (buckets). For example, request durations count how many requests fall into each duration range (e.g., 0-5ms, 5-10ms, 10-25ms). This grouping is advantageous because it helps us quickly see patterns (e.g., most requests take 5-10ms) and spot outliers (e.g., a few requests taking >100ms).
In this example, each line represents a bucket with an upper bound (le = less than or equal to):
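A sketch of a histogram in the exposition format, assuming the metric is named `http_request_duration_seconds` (bucket boundaries and values are illustrative):

```
http_request_duration_seconds_bucket{le="0.005"} 1.0
http_request_duration_seconds_bucket{le="0.01"} 2.0
http_request_duration_seconds_bucket{le="0.025"} 2.0
http_request_duration_seconds_bucket{le="+Inf"} 2.0
http_request_duration_seconds_count 2.0
http_request_duration_seconds_sum 0.012
```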
Here, 1 request took ≤0.005 seconds, and 2 requests took ≤0.01 seconds. This means the second request took between 0.005 and 0.01 seconds.
The value for each bucket is cumulative, so all subsequent buckets also show 2.
The bottom of the histogram aggregates the data into two summary series: `_count`, the total number of requests, and `_sum`, the total time taken by all requests.
Accessing Prometheus
Now that we've examined the metrics exposed by our application, let's confirm that Prometheus is successfully scraping these metrics.
In your web browser, navigate to http://localhost:9090 to access the Prometheus UI.
In the query bar at the top of the page, enter the following PromQL query:
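For a first check, a simple metric-name query works (using the counter from our sample app):

```
http_request_total
```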
Click the "Execute" button or press Enter.
You should see results similar to this:
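The output below is illustrative; the `job` and `instance` label values depend on your Prometheus scrape configuration, and your counts will differ:

```
http_request_total{instance="app:8000", job="app", method="GET", path="/", status="200"}    5
http_request_total{instance="app:8000", job="app", method="GET", path="/metrics", status="200"}    3
```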
If you see these results, it means Prometheus is correctly scraping and storing the metrics from our application, and we're ready to start exploring more complex PromQL queries.
The Prometheus Time-Series Database
Prometheus scrapes data at regular intervals and stores it in a time-series database. A time series is a sequence of data points, where each data point is a timestamp associated with a value.
Let's assume Prometheus is configured to scrape the Python application every 20 seconds, and over the course of 200 seconds (10 scrapes), 5 requests were made to the root path.
The data stored in Prometheus might look something like this:
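A sketch of the stored samples (the Unix timestamps are illustrative, spaced 20 seconds apart):

```
1717029000  1.0
1717029020  1.0
1717029040  2.0
1717029060  2.0
1717029080  3.0
1717029100  3.0
1717029120  4.0
1717029140  4.0
1717029160  4.0
1717029180  5.0
```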
This time series applies to the metric `http_request_total` with labels:
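For example (label values are illustrative):

```
{method="GET", path="/", status="200"}
```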
The first number in each pair is a Unix timestamp (seconds since January 1, 1970), and the second number is the value of the counter at that time. Note how:
The value starts at 1.0 and never decreases (characteristic of a counter)
It doesn't change every scrape, reflecting periods where no new requests were made
By the end, it reaches 5.0, representing the initial request plus the 4 additional requests made during this period
Labels
Labels split metrics into multiple time series, allowing us to track different aspects of the same metric. For example:
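Two time series of the same metric, distinguished only by their labels (values are illustrative):

```
http_request_total{method="GET", path="/", status="200"}       5
http_request_total{method="GET", path="/items", status="200"}  2
```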
Here, the `path` label differentiates between requests made to `"/"` and `"/items"`.
PromQL
Let's begin by examining how we can visualize time series data using PromQL. We'll use the `http_request_total` metric for our examples.
Range Vector
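Appending a duration in square brackets selects a range of samples for each series, for example:

```
http_request_total[5m]
```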
This query returns a range vector, showing data points for the last 5 minutes. You'll see output similar to:
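Illustrative output, shown as value @ Unix-timestamp pairs (your timestamps and values will differ):

```
http_request_total{method="GET", path="/", status="200"}
1 @1717029000
2 @1717029040
3 @1717029080
4 @1717029120
5 @1717029180
```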
This output reflects how Prometheus stores data points at regular intervals, allowing us to see how the metric changes over time.
Instant Vector
To get the most recent value for each time series, we can simply omit the time range:
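That is, we query the bare metric name:

```
http_request_total
```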
This query returns the latest value that Prometheus has scraped for each time series of the `http_request_total` metric.
Label Matching
We can use label matching to focus on specific metrics. Let's look at requests to the root path:
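The matcher goes inside curly braces after the metric name:

```
http_request_total{path="/"}
```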
By adding the `path="/"` label matcher, we've refined our query to track only the requests made to the root path.
Aggregation
Aggregation functions take an instant vector (which contains the most recent value for every matching time series) as input and collapse its elements into a single output element. Here are some key functions:
sum(v instant-vector): Calculates the sum of all values in the vector
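For example, summing the request counter:

```
sum(http_request_total)
```

This might return a single element such as `{} 8` (the value is illustrative).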
This output shows the total number of HTTP requests across all paths and methods.
avg(v instant-vector): Calculates the average of all values in the vector
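For example, averaging the CPU gauge:

```
avg(process_cpu_usage)
```

This might return something like `{} 0.65` (the value is illustrative).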
This result indicates an average CPU usage of 65% across all instances.
max(v instant-vector): Returns the maximum value from all values in the vector
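For example, using a hypothetical response-time gauge (the metric name here is an assumption for illustration):

```
max(http_response_time_seconds)
```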
This shows the highest response time across all endpoints.
min(v instant-vector): Returns the minimum value from all values in the vector
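For example, using the process memory gauge:

```
min(process_resident_memory_bytes)
```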
This displays the lowest memory usage among all monitored processes.
Grouping and Aggregation
PromQL allows you to group the elements of an instant vector by label before aggregating, producing one output element per group. Here are some examples:
sum by (label)(v instant-vector): groups values based on a label and aggregates each group into a sum. For example:
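Grouping the request counter by HTTP method:

```
sum by (method)(http_request_total)
```

This might return elements such as `{method="GET"} 7` and `{method="POST"} 2` (values are illustrative).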
This groups HTTP requests based on their method and sums the total requests for each group.
avg by (label)(v instant-vector): groups values based on a label and aggregates each group into an average. For example:
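Assuming a hypothetical per-pod CPU metric (the metric name is an assumption for illustration):

```
avg by (pod)(container_cpu_usage)
```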
This shows the average CPU usage for each pod.
Aggregation over Time
Aggregation over time functions take a range vector as input and produce an instant vector. Here are some key functions:
sum_over_time(v range-vector): Calculates the sum of all values in the specified range vector for each time series
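For example, over the request counter:

```
sum_over_time(http_request_total[1h])
```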
This query calculates the total number of requests over the last hour for every time series
The output is an instant vector with multiple elements, one for each unique combination of the metric's labels.
avg_over_time(v range-vector): Calculates the average of all values in the specified range vector for each time series
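For example, over the CPU gauge:

```
avg_over_time(process_cpu_usage[1h])
```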
This query calculates the average CPU usage over the last hour for every time series
The output is an instant vector with a single element because this metric has only one time series.
rate(v range-vector): Calculates the per-second average rate of increase over the specified range vector.
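For example, over the request counter:

```
rate(http_request_total[1m])
```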
This query calculates the average per-second rate of HTTP requests over the last minute for every time series.
The output displays an instant vector with two elements, showing the per-second rates based on the last minute of data:
0.2 requests per second for the root path (`"/"`)
0.1 requests per second for the `/metrics` path
increase(v range-vector): Calculates the increase in the value of the time series in the specified range vector.
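For example, over the request counter:

```
increase(http_request_total[1h])
```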
This query shows the total increase in HTTP requests over the last hour for every time series:
The output displays an instant vector with two elements, showing the total increase over the last hour:
720 additional requests for the root path (`"/"`)
360 additional requests for the `/metrics` path
Setting Up Grafana
Access Grafana at `http://localhost:3000`. The default login is admin/admin.
Setting up the Data Source
In order for Grafana to query the Prometheus data, we need to set up Prometheus as a data source:
Click on Settings (Gear Icon)
Go to Configuration > Data Sources.
Click "Add data source" and select Prometheus.
Set the URL to `http://prometheus:9090`. We use `prometheus:9090` instead of `localhost:9090` because Grafana and Prometheus are on the same Docker network, and `prometheus` resolves to the Prometheus container's IP.
Click "Save & Test" to ensure the connection is working.
Import a Dashboard
Go to Dashboards > Import and paste the JSON from `grafana-dashboard.json`.
Each panel in the dashboard uses a PromQL query to visualize metrics from your FastAPI application.
Analyzing each PromQL Query
Now, let's break down the PromQL queries used in each panel of our Grafana dashboard:
1. Request Rate Panel
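Based on the panel description, this is likely the rate expression we saw earlier:

```
rate(http_request_total[1m])
```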
This query calculates the per-second average rate of HTTP requests over the last minute for every time series.
2. Average Response Time Panel
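A common form of this query, assuming the histogram metric is named `http_request_duration_seconds`, is:

```
rate(http_request_duration_seconds_sum[1m])
/
rate(http_request_duration_seconds_count[1m])
```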
This query calculates the average response time by dividing the rate of increase in the sum of request durations by the rate of increase in the count of requests.
3. Memory Usage Panel
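The panel query is simply the gauge itself:

```
process_resident_memory_bytes
```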
This query directly uses the process_resident_memory_bytes gauge metric to display current memory usage.
4. CPU Usage Panel
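Likewise, this panel queries the gauge directly:

```
process_cpu_usage
```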
This query uses the process_cpu_usage gauge metric to show current CPU usage.
Each of these queries utilizes concepts we've discussed earlier in this tutorial, demonstrating how PromQL can be used to create insightful visualizations of your application's performance.
Conclusion
This tutorial covered PromQL basics, from simple queries to complex aggregations and time-based operations. We've explored how to use PromQL to extract insights from Prometheus metrics and create Grafana visualizations. With this knowledge, you can now effectively monitor and analyze your applications using Prometheus and Grafana. Remember, mastering PromQL comes with practice. Experiment with different queries to gain valuable insights into your system's performance.