...

If you have anything you would like to have monitored out of the box, please log a DAM ticket, and we’ll take a look.

Adding email alerting to Grafana

Open the Grafana UI, go to Alerting, and then Notification Channels.

...

Press the big “Add Channel” button. Give the channel a name. The specific name is just for identification; it doesn’t matter feature-wise. Then type the email addresses to send alerts to, separated by a semicolon (;). Then press “Test”. You (and anyone whose email you put in the field) should receive a test mail from Grafana shortly after. It is critically important that this works, so make sure you get the email before moving on.
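If you would rather script this step than click through the UI, Grafana’s legacy alerting can also provision notification channels from a YAML file (typically placed in the provisioning/notifiers/ folder of the Grafana install). The sketch below is only an illustration; the name, uid and addresses are placeholders you would replace with your own:

Code Block
notifiers:
  - name: Queue alerts
    type: email
    uid: queue-alerts-email
    org_id: 1
    is_default: false
    settings:
      addresses: "first.person@example.com;second.person@example.com"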

Adding a Dashboard for sending alerts

To actually send alerts in Grafana you need to create a dashboard with data that can be alerted on.

To do this, select the big “+” button on the left to create a new dashboard. Add a new panel. This should take you into an editor with a lot of options.

In the “Metrics” row, you can enter a query to execute. Enter the following:

Code Block
rabbitmq_queue_messages_ready{queue!~"nsb.delay-level-.+",queue!~".*(error|instance-1)$"}

This will create a separate line for each queue in RabbitMQ, so we can monitor whether they explode in size. In the legend field, enter {{queue}} so each line gets the name of the queue it belongs to. If you have done everything correctly, it should look roughly like this:

...

In this case all the queues are empty, so they are not displayed. The error queues are not displayed either, as they are filtered out by the query above. Feel free to mess around with the visualization settings on the right; they only affect the rendering and don’t change how the alert works.
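If you do want to keep an eye on the error queues as well, a separate panel with a query along these lines should work; this is just a sketch that assumes the same metric and queue-naming convention as above:

Code Block
rabbitmq_queue_messages_ready{queue=~".*error$"}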

Next, go into the “Alert” tab and press “Create Alert”.

Here you have quite a lot of options. Start by giving the alert a name.

By default the alert looks at the average size of each individual queue over the last 5 minutes. Enter a value in the “Is above” field. What exactly the value should be depends on your requirements, but a good number is 10,000, since at that point something is usually about to go wrong. Tweak the number if you feel it’s too low or too high.
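With these defaults, the condition in the classic alert editor should read roughly as follows; “A” refers to the query you entered in the “Metrics” row, and 10000 is just the example threshold:

Code Block
WHEN avg () OF query (A, 5m, now) IS ABOVE 10000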

Next, scroll down until you get to a section called “Notifications”. Here you can select a channel to send the alert to. Select the channel you created earlier. You can optionally enter a message if you want to (it might be a good idea to enter the customer name or something similar, so people can quickly see where things are going wrong). Grafana automatically includes data from the graph when it sends a mail, so you can see which queue went above the threshold and what size it is at. In the end your settings should look something like this:

...

Now go to the upper right of the screen and press “apply”. This should take you back to the dashboard. Go to the top of the screen and press the “save dashboard” button. This will prompt you to give the dashboard a name.

FAQ

Making the monitoring dashboard the default dashboard

  • Star the dashboard (add to favorites)

  • Click “Configuration” → “Preferences” and select your dashboard

  • Click “Save”
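If you administer several Grafana instances, the same preference can also be set through Grafana’s HTTP API. A rough sketch, assuming an admin API key and that 123 is the numeric id of your dashboard (depending on your Grafana version the field name may differ):

Code Block
curl -X PUT http://localhost:3000/api/org/preferences \
  -H "Authorization: Bearer <admin-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"homeDashboardId": 123}'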

...

My Prometheus service has entered a paused state, and will not resume

If your service has reached this state, it is very likely that another Prometheus instance is running on the server (often in a command prompt window), which prevents the Prometheus service from starting up. Simply close this other instance and resume the service.
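To track down the stray instance, the following commands from a command prompt can help; this assumes Prometheus is listening on its default port 9090 and that the binary is named prometheus.exe:

Code Block
rem Show which process (PID) is bound to Prometheus' default port
netstat -ano | findstr :9090

rem List any running prometheus.exe processes and their PIDs
tasklist /FI "IMAGENAME eq prometheus.exe"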