Summaries for Timing

A gauge can only hold the last value it was set to, so how can we time events and measure latency?

The answer is to use a summary. It tracks both the total time taken by events and how many events there were:

package io.robustperception.java_examples;

import io.prometheus.client.Summary;
import io.prometheus.client.hotspot.DefaultExports;
import io.prometheus.client.exporter.HTTPServer;
import java.util.Random;

public class JavaExample {
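  // Summary tracking how long myFunction takes, in seconds.
  static final Summary functionLatency = Summary.build()
      .name("my_function_latency_seconds")
      .help("Latency of my function").register();

  static void myFunction() throws Exception {
    // Start timing.
    Summary.Timer requestTimer = functionLatency.startTimer();
    try {
      // Simulate work taking between 0 and 999 milliseconds.
      Thread.sleep(new Random().nextInt(1000));
    } finally {
      // Stop timing and record the duration, even if an exception was thrown.
      requestTimer.observeDuration();
    }
  }

  public static void main(String[] args) throws Exception {
    // Export JVM metrics such as garbage collection and memory usage.
    DefaultExports.initialize();
    // Serve the metrics over HTTP on port 8000 for Prometheus to scrape.
    HTTPServer server = new HTTPServer(8000);
    while (true) {
      myFunction();
      Thread.sleep(1000);
    }
  }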
}

Here startTimer() is called when you want to start timing, and observeDuration() when you want to stop. The try...finally block ensures the duration is recorded even if the timed code throws an exception.
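To see why the finally block matters, here is a minimal, dependency-free sketch of the bookkeeping involved. ToySummary, its Timer and failingFunction are hypothetical names chosen for illustration, not part of the Prometheus client; the real Summary also handles labels and optional quantiles. The function always throws, yet its duration is still added to the sum and the count.

```java
// Hypothetical toy summary; NOT the io.prometheus.client.Summary API.
// Like a summary without quantiles, it keeps only a running sum and a count.
class ToySummary {
    private double sumSeconds; // total of all observed durations
    private long count;        // number of observations

    synchronized void observe(double seconds) {
        sumSeconds += seconds;
        count++;
    }

    synchronized double getSum() { return sumSeconds; }
    synchronized long getCount() { return count; }

    Timer startTimer() { return new Timer(this); }

    // Mirrors the startTimer()/observeDuration() pattern used in the example.
    static final class Timer {
        private final ToySummary summary;
        private final long startNanos;

        Timer(ToySummary summary) {
            this.summary = summary;
            this.startNanos = System.nanoTime(); // monotonic clock
        }

        double observeDuration() {
            double seconds = (System.nanoTime() - startNanos) / 1e9;
            summary.observe(seconds);
            return seconds;
        }
    }
}

public class ToySummaryDemo {
    static final ToySummary latency = new ToySummary();

    // Always throws; the finally block still records how long it ran.
    static void failingFunction() {
        ToySummary.Timer timer = latency.startTimer();
        try {
            throw new IllegalStateException("simulated failure");
        } finally {
            timer.observeDuration();
        }
    }

    public static void main(String[] args) {
        try {
            failingFunction();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        // Prints "count: 1": the failed call was still counted.
        System.out.println("count: " + latency.getCount());
    }
}
```

Without the finally, the exception would skip observeDuration() and failed calls would silently vanish from the latency data.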

The metrics output will include my_function_latency_seconds_sum and my_function_latency_seconds_count. From these, the average latency in seconds over the last minute can be calculated in PromQL with the expression rate(my_function_latency_seconds_sum[1m]) / rate(my_function_latency_seconds_count[1m]).
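Why is this ratio the average latency? rate() divides each counter's increase over the window by the same duration, so the duration cancels and what remains is the increase in the sum divided by the increase in the count. The sketch below does that arithmetic on two scrapes; Scrape, averageLatency and the sample values are hypothetical, and unlike rate() it ignores counter resets and extrapolation.

```java
// Hypothetical snapshot of the two series at one scrape time.
final class Scrape {
    final double sumSeconds; // value of my_function_latency_seconds_sum
    final double count;      // value of my_function_latency_seconds_count

    Scrape(double sumSeconds, double count) {
        this.sumSeconds = sumSeconds;
        this.count = count;
    }
}

public class AverageLatency {
    // Approximates rate(sum[w]) / rate(count[w]) from two scrapes w apart.
    // Both rates divide by w, so w cancels out of the ratio.
    static double averageLatency(Scrape earlier, Scrape later) {
        double deltaCount = later.count - earlier.count;
        if (deltaCount <= 0) {
            // No events in the window (PromQL would give 0/0 = NaN), or a
            // counter reset, which this simplified sketch does not handle.
            return Double.NaN;
        }
        return (later.sumSeconds - earlier.sumSeconds) / deltaCount;
    }

    public static void main(String[] args) {
        // Illustrative values only: 40 events taking 20 seconds in total.
        Scrape t0 = new Scrape(10.0, 20);
        Scrape t1 = new Scrape(30.0, 60);
        System.out.println(averageLatency(t0, t1)); // prints 0.5
    }
}
```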