Queue Worker Observability

This guide walks you through using the Telemetry SDK to monitor worker queue lengths and processing latencies, including how to calculate P90, P95, and P99 latencies. By the end, you'll have a setup that logs and analyzes this data so you can spot bottlenecks and maintain optimal performance.

Prerequisites

  • A valid API key for Telemetry

  • Basic understanding of JavaScript and Node.js

  • A system with worker queues to monitor

Step 1: Install Telemetry SDK

First, install the Telemetry SDK in your project if you haven't already:

npm install telemetry-sh

Step 2: Initialize Telemetry

After installing the SDK, import and initialize Telemetry in your project. Replace YOUR_API_KEY with your actual Telemetry API key.

import telemetry from "telemetry-sh";

telemetry.init("YOUR_API_KEY");
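
If you'd rather not hard-code the key, a common pattern is to read it from an environment variable. Here's a minimal sketch; the variable name TELEMETRY_API_KEY is just an example, not something the SDK requires:

import telemetry from "telemetry-sh";

// TELEMETRY_API_KEY is an example name; set it however you manage secrets
// (a .env file, your process manager, your deployment platform, etc.).
const apiKey = process.env.TELEMETRY_API_KEY;

if (!apiKey) {
  throw new Error("TELEMETRY_API_KEY is not set");
}

telemetry.init(apiKey);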

Step 3: Log Queue Lengths and Latencies

To monitor your worker queues and latencies, you need to log relevant data points such as queue length and processing time. Here’s an example function to log this data:

const monitorQueue = (queueName, queueLength, latency) => {
  telemetry.log("worker_queue_metrics", {
    queue_name: queueName,
    queue_length: queueLength,
    latency: latency, // in milliseconds
    timestamp: new Date().toISOString()
  });
};

// Example usage
monitorQueue("email_queue", 15, 200); // Replace with actual metrics

Step 4: Automate Queue Monitoring

In practice, you'll want your system to log queue metrics automatically, either at regular intervals or after each job is processed (as in the per-job sketch above). Here's an example that polls a queue every minute:

const getQueueMetrics = () => {
  // Replace with actual logic to get queue length and latency
  const queueName = "email_queue";
  const queueLength = getQueueLength(queueName); // Replace with your function to get queue length
  const latency = getQueueLatency(queueName); // Replace with your function to get latency

  monitorQueue(queueName, queueLength, latency);
};

// Set an interval to monitor the queue every minute
setInterval(getQueueMetrics, 60 * 1000);
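
Because this timer runs for the life of the process, it's worth making the callback defensive so a transient failure in metric collection doesn't crash your worker. A small variation on the example above:

// Wrap the collection in try/catch so a transient failure (e.g. a queue
// lookup error) is logged rather than thrown from the timer callback.
const safeGetQueueMetrics = () => {
  try {
    getQueueMetrics();
  } catch (err) {
    console.error("Failed to collect queue metrics:", err);
  }
};

const timer = setInterval(safeGetQueueMetrics, 60 * 1000);

// In Node.js, unref() stops this timer from keeping the process alive on its own.
timer.unref();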

Step 5: Query and Analyze Queue Metrics with P90, P95, and P99 Latencies

Once you've logged sufficient data, you can query it using Telemetry's query API to analyze your worker queues and latencies. The following query calculates the P90, P95, and P99 latencies, as well as the average latency and maximum queue length for each queue:

const results = await telemetry.query(`
  WITH percentiles AS (
    SELECT
      queue_name,
      latency,
      PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY latency) OVER (PARTITION BY queue_name) AS p90_latency,
      PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency) OVER (PARTITION BY queue_name) AS p95_latency,
      PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency) OVER (PARTITION BY queue_name) AS p99_latency
    FROM
      worker_queue_metrics
  )
  SELECT
    queue_name,
    AVG(latency) AS avg_latency,
    MAX(queue_length) AS max_queue_length,
    MAX(p90_latency) AS p90_latency,
    MAX(p95_latency) AS p95_latency,
    MAX(p99_latency) AS p99_latency
  FROM
    percentiles
  GROUP BY
    queue_name
`);

console.log(results);
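
You can also act on the results programmatically, for example to flag queues whose tail latency crosses a budget. The sketch below assumes the query returns an array of row objects with the column names above, and uses a hypothetical 500 ms P99 budget; adjust the shape and threshold to your setup:

const P99_BUDGET_MS = 500; // example threshold, not a Telemetry default

for (const row of results) {
  if (row.p99_latency > P99_BUDGET_MS) {
    console.warn(
      `Queue ${row.queue_name} is over budget: p99 ${row.p99_latency} ms ` +
      `(avg ${row.avg_latency} ms, max queue length ${row.max_queue_length})`
    );
  }
}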

Step 6: Explore Data with Telemetry's UI

Telemetry's UI lets you visualize and explore your queue metrics interactively. Visit the Telemetry Dashboard and log in with your credentials to build dashboards, charts, and more from your worker queue data, including the P90, P95, and P99 latencies.

Conclusion

By following these steps, you can effectively monitor the performance of your worker queues and track latencies using the Telemetry SDK. With the inclusion of P90, P95, and P99 latencies, you gain deeper insights into the tail latencies in your system, helping you optimize performance and maintain high levels of service.
