7. RabbitMQ

rabbitmq/rabbitmq-prometheus

1. Rabbitmq node down

Fewer than 3 nodes running in RabbitMQ cluster

  - alert: RabbitmqNodeDown
    expr: sum(rabbitmq_build_info) < 3
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Rabbitmq node down
      description: "Fewer than 3 nodes running in RabbitMQ cluster\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
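The rules on this page are fragments; Prometheus loads them from a rule file, where they sit under a named group. Below is a minimal sketch, assuming a hypothetical file name `rabbitmq.rules.yml` and a hypothetical group name `rabbitmq`. It also adds a hypothetical companion rule, `RabbitmqNoNodesReporting`, which is not part of the original list: `sum()` over zero series returns no result, so the threshold rule cannot fire once every node has stopped reporting.

```yaml
# rabbitmq.rules.yml (hypothetical file name)
groups:
  - name: rabbitmq            # hypothetical group name
    rules:
      # Rule from this section: fires when fewer than 3 nodes report build info.
      - alert: RabbitmqNodeDown
        expr: sum(rabbitmq_build_info) < 3
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Rabbitmq node down
          description: "Fewer than 3 nodes running in RabbitMQ cluster\n  VALUE = {{ $value }}"
      # Hypothetical companion rule: covers the case where no node reports at all,
      # which the sum() comparison above cannot detect.
      - alert: RabbitmqNoNodesReporting
        expr: absent(rabbitmq_build_info)
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: No RabbitMQ node is reporting rabbitmq_build_info
```

The file can be validated with `promtool check rules rabbitmq.rules.yml` before it is listed under `rule_files` in the Prometheus configuration.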

2. Rabbitmq node not distributed

Distribution link state is not 'up'

  - alert: RabbitmqNodeNotDistributed
    expr: erlang_vm_dist_node_state < 3
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Rabbitmq node not distributed (instance {{ $labels.instance }})
      description: "Distribution link state is not 'up'\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

3. Rabbitmq instances different versions

Running different versions of RabbitMQ in the same cluster can lead to failure.

  - alert: RabbitmqInstancesDifferentVersions
    expr: count(count(rabbitmq_build_info) by (rabbitmq_version)) > 1
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: Rabbitmq instances different versions
      description: "Running different versions of RabbitMQ in the same cluster can lead to failure.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4. Rabbitmq memory high

A node uses more than 90% of allocated RAM

  - alert: RabbitmqMemoryHigh
    expr: rabbitmq_process_resident_memory_bytes / rabbitmq_resident_memory_limit_bytes * 100 > 90
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Rabbitmq memory high (instance {{ $labels.instance }})
      description: "A node uses more than 90% of allocated RAM\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

5. Rabbitmq file descriptors usage

A node uses more than 90% of file descriptors

  - alert: RabbitmqFileDescriptorsUsage
    expr: rabbitmq_process_open_fds / rabbitmq_process_max_fds * 100 > 90
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Rabbitmq file descriptors usage (instance {{ $labels.instance }})
      description: "A node uses more than 90% of file descriptors\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

6. Rabbitmq too many unack messages

Too many unacknowledged messages

  - alert: RabbitmqTooManyUnackMessages
    expr: sum(rabbitmq_queue_messages_unacked) BY (queue) > 1000
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Rabbitmq too many unack messages (queue {{ $labels.queue }})
      description: "Too many unacknowledged messages\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
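Aggregating by `queue` alone merges queues that share a name across virtual hosts, and only the labels listed in the `by` clause survive the aggregation. Here is a hedged variant that also keeps the virtual host, assuming the plugin is configured to return per-object metrics that carry `queue` and `vhost` labels:

```yaml
  # Variant sketch: group by vhost as well as queue, and report both in the summary.
  - alert: RabbitmqTooManyUnackMessages
    expr: sum by (vhost, queue) (rabbitmq_queue_messages_unacked) > 1000
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Rabbitmq too many unack messages (queue {{ $labels.queue }}, vhost {{ $labels.vhost }})"
      description: "Too many unacknowledged messages\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

If only aggregated metrics are exposed, the `queue` and `vhost` labels are not available and the expression yields a single cluster-wide total per scrape target instead.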

7. Rabbitmq too many connections

The total number of connections on a node is too high

  - alert: RabbitmqTooManyConnections
    expr: rabbitmq_connections > 1000
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Rabbitmq too many connections (instance {{ $labels.instance }})
      description: "The total number of connections on a node is too high\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

8. Rabbitmq no queue consumer

A queue has fewer than 1 consumer

  - alert: RabbitmqNoQueueConsumer
    expr: rabbitmq_queue_consumers < 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Rabbitmq no queue consumer (instance {{ $labels.instance }})
      description: "A queue has fewer than 1 consumer\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

9. Rabbitmq unroutable messages

A channel published messages that could not be routed to any queue

  - alert: RabbitmqUnroutableMessages
    expr: increase(rabbitmq_channel_messages_unroutable_returned_total[1m]) > 0 or increase(rabbitmq_channel_messages_unroutable_dropped_total[1m]) > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Rabbitmq unroutable messages (instance {{ $labels.instance }})
      description: "A channel published messages that could not be routed to any queue\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

kbudde/rabbitmq-exporter

10.1. Rabbitmq node down

RabbitMQ node down

- alert: RabbitmqDown
  expr: rabbitmq_up == 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: "Rabbitmq down (instance {{ $labels.instance }})"
    description: "RabbitMQ node down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

10.2. Rabbitmq Cluster down

Fewer than 3 nodes running in RabbitMQ cluster

- alert: ClusterDown
  expr: rabbitmq_running < 3
  for: 5m
  labels:
    severity: error
  annotations:
    summary: "Cluster down (instance {{ $labels.instance }})"
    description: "Fewer than 3 nodes running in RabbitMQ cluster\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

10.3. Rabbitmq cluster partition

Cluster partition

- alert: ClusterPartition
  expr: rabbitmq_partitions > 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: "Cluster partition (instance {{ $labels.instance }})"
    description: "Cluster partition\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

10.4. Rabbitmq out of memory

Memory available for RabbitMQ is low (< 10%)

- alert: OutOfMemory
  expr: rabbitmq_node_mem_used / rabbitmq_node_mem_limit * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Out of memory (instance {{ $labels.instance }})"
    description: "Memory available for RabbitMQ is low (< 10%)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

10.5. Too many connections

RabbitMQ instance has too many connections (> 1000)

- alert: TooManyConnections
  expr: rabbitmq_connectionsTotal > 1000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Too many connections (instance {{ $labels.instance }})"
    description: "RabbitMQ instance has too many connections (> 1000)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

10.6. Dead letter queue filling up

Dead letter queue is filling up (> 10 msgs)

- alert: DeadLetterQueueFillingUp
  expr: rabbitmq_queue_messages{queue="my-dead-letter-queue"} > 10
  for: 5m
  labels:
    severity: error
  annotations:
    summary: "Dead letter queue filling up (instance {{ $labels.instance }})"
    description: "Dead letter queue is filling up (> 10 msgs)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
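The rule above matches a single, exactly named queue. Where several dead letter queues follow a shared naming scheme, a regex matcher can cover them with one rule. Below is a sketch, assuming a hypothetical convention in which every dead letter queue name ends in `-dead-letter-queue`:

```yaml
# Variant sketch: =~ matches the whole label value against the regex,
# so this selects every queue whose name ends in "-dead-letter-queue".
- alert: DeadLetterQueueFillingUp
  expr: rabbitmq_queue_messages{queue=~".*-dead-letter-queue"} > 10
  for: 5m
  labels:
    severity: error
  annotations:
    summary: "Dead letter queue filling up (queue {{ $labels.queue }})"
    description: "Dead letter queue is filling up (> 10 msgs)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
```

Each matching queue produces its own alert instance, so the summary reports the `queue` label to tell them apart.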

10.7. Too many messages in queue

Queue is filling up (> 1000 msgs)

- alert: TooManyMessagesInQueue
  expr: rabbitmq_queue_messages_ready{queue="my-queue"} > 1000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Too many messages in queue (instance {{ $labels.instance }})"
    description: "Queue is filling up (> 1000 msgs)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

10.8. Rabbitmq Slow queue consuming

Queue messages are consumed slowly (> 60s)

- alert: SlowQueueConsuming
  expr: time() - rabbitmq_queue_head_message_timestamp{queue="my-queue"} > 60
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Slow queue consuming (instance {{ $labels.instance }})"
    description: "Queue messages are consumed slowly (> 60s)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

10.9. No consumer

Queue has no consumer

- alert: NoConsumer
  expr: rabbitmq_queue_consumers == 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: "No consumer (instance {{ $labels.instance }})"
    description: "Queue has no consumer\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

10.10. Too many consumers

Queue should have only 1 consumer

- alert: TooManyConsumers
  expr: rabbitmq_queue_consumers > 1
  for: 5m
  labels:
    severity: error
  annotations:
    summary: "Too many consumers (instance {{ $labels.instance }})"
    description: "Queue should have only 1 consumer\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
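As written, the two consumer-count rules apply to every queue the exporter reports, so a queue that legitimately has several consumers would trigger `TooManyConsumers`. One possible way to limit the rule to queues that are meant to be single-consumer is a label matcher, sketched here with the `my-queue` placeholder used elsewhere on this page:

```yaml
# Variant sketch: only the named queue is checked for extra consumers.
- alert: TooManyConsumers
  expr: rabbitmq_queue_consumers{queue="my-queue"} > 1
  for: 5m
  labels:
    severity: error
  annotations:
    summary: "Too many consumers (queue {{ $labels.queue }})"
    description: "Queue should have only 1 consumer\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
```

The same matcher can be added to `NoConsumer` when some queues are expected to sit idle without consumers.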

10.11. Inactive exchange

Exchange receives fewer than 5 msgs per second

- alert: UnactiveExchange
  expr: rate(rabbitmq_exchange_messages_published_in_total{exchange="my-exchange"}[1m]) < 5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Inactive exchange (instance {{ $labels.instance }})"
    description: "Exchange receives fewer than 5 msgs per second\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"