Queue Worker Observability
In this guide, we'll walk you through using the Telemetry SDK to monitor worker queue lengths and processing latencies, including how to calculate P90, P95, and P99 latencies. By the end, you'll have a setup that logs and analyzes these metrics to help you maintain optimal performance.
Prerequisites
A valid API key for Telemetry
Basic understanding of JavaScript and Node.js
A system with worker queues to monitor
Step 1: Install Telemetry SDK
First, you need to install the Telemetry SDK in your project. If you haven't done so already, run the following command:
npm install telemetry-sh
Step 2: Initialize Telemetry
After installing the SDK, import and initialize Telemetry in your project. Replace YOUR_API_KEY with your actual Telemetry API key.
import telemetry from "telemetry-sh";
telemetry.init("YOUR_API_KEY");
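Hard-coding the key is fine for a quick test, but in most deployments you'll want to read it from the environment instead. A minimal sketch, assuming the key is exposed as a TELEMETRY_API_KEY environment variable (the variable name is our choice, not something the SDK requires):
import telemetry from "telemetry-sh";

// TELEMETRY_API_KEY is an assumed environment variable name; use whatever your deployment provides.
telemetry.init(process.env.TELEMETRY_API_KEY);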
Step 3: Log Queue Lengths and Latencies
To monitor your worker queues and latencies, you need to log relevant data points such as queue length and processing time. Here’s an example function to log this data:
const monitorQueue = (queueName, queueLength, latency) => {
telemetry.log("worker_queue_metrics", {
queue_name: queueName,
queue_length: queueLength,
latency: latency, // in milliseconds
timestamp: new Date().toISOString()
});
};
// Example usage
monitorQueue("email_queue", 15, 200); // Replace with actual metrics
Step 4: Automate Queue Monitoring
You should set up your system to automatically log queue metrics at regular intervals or after each job is processed. Here’s an example where you monitor a queue every minute:
const getQueueMetrics = () => {
// Replace with actual logic to get queue length and latency
const queueName = "email_queue";
const queueLength = getQueueLength(queueName); // Replace with your function to get queue length
const latency = getQueueLatency(queueName); // Replace with your function to get latency
monitorQueue(queueName, queueLength, latency);
};
// Set an interval to monitor the queue every minute
setInterval(getQueueMetrics, 60 * 1000);
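If your queue library exposes job lifecycle events, you can also log a data point as each job completes instead of (or in addition to) polling on a timer. The sketch below stands in a plain Node EventEmitter for the real worker object and uses a hypothetical jobCompleted event; adapt the wiring to whatever events your library actually emits.
import { EventEmitter } from "node:events";

const worker = new EventEmitter(); // placeholder for your real worker instance

// Log a metric each time a job finishes.
worker.on("jobCompleted", ({ queueName, queueLength, durationMs }) => {
  monitorQueue(queueName, queueLength, durationMs);
});

// Somewhere in your job handler, emit the event after each job, e.g.:
// worker.emit("jobCompleted", { queueName: "email_queue", queueLength: 12, durationMs: 180 });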
Step 5: Query and Analyze Queue Metrics with P90, P95, and P99 Latencies
Once you've logged sufficient data, you can query it using Telemetry's query API to analyze your worker queues and latencies. The following query calculates the P90, P95, and P99 latencies, as well as the average latency and maximum queue length for each queue:
const results = await telemetry.query(`
  WITH percentiles AS (
    SELECT
      queue_name,
      queue_length,
      latency,
      PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY latency) OVER (PARTITION BY queue_name) AS p90_latency,
      PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency) OVER (PARTITION BY queue_name) AS p95_latency,
      PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency) OVER (PARTITION BY queue_name) AS p99_latency
    FROM
      worker_queue_metrics
  )
  SELECT
    queue_name,
    AVG(latency) AS avg_latency,
    MAX(queue_length) AS max_queue_length,
    MAX(p90_latency) AS p90_latency,
    MAX(p95_latency) AS p95_latency,
    MAX(p99_latency) AS p99_latency
  FROM
    percentiles
  GROUP BY
    queue_name
`);
console.log(results);
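Assuming the query API returns rows as an array of objects keyed by the column aliases above, you can iterate over the results and act on them, for example by flagging any queue whose tail latency crosses a budget. The 500 ms threshold below is illustrative, not a Telemetry default.
const P99_BUDGET_MS = 500; // illustrative threshold; tune for your workload

for (const row of results) {
  if (row.p99_latency > P99_BUDGET_MS) {
    console.warn(`Queue ${row.queue_name} exceeded its latency budget: p99=${row.p99_latency}ms`);
  }
}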
Step 6: Explore Data with Telemetry's UI
Telemetry's UI allows you to visualize and explore your queue metrics interactively. Visit the Telemetry Dashboard and log in with your credentials to create dashboards, charts, and more from your worker queue data, including the P90, P95, and P99 latencies.
Conclusion
By following these steps, you can effectively monitor the performance of your worker queues and track latencies using the Telemetry SDK. With the inclusion of P90, P95, and P99 latencies, you gain deeper insights into the tail latencies in your system, helping you optimize performance and maintain high levels of service.