4 Cortex

1. Cortex ruler configuration reload failure

Cortex ruler configuration reload failure (instance {{ $labels.instance }})

  - alert: CortexRulerConfigurationReloadFailure
    expr: cortex_ruler_config_last_reload_successful != 1
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Cortex ruler configuration reload failure (instance {{ $labels.instance }})
      description: "Cortex ruler configuration reload failure (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

2. Cortex not connected to Alertmanager

Cortex not connected to Alertmanager (instance {{ $labels.instance }})

  - alert: CortexNotConnectedToAlertmanager
    expr: cortex_prometheus_notifications_alertmanagers_discovered < 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Cortex not connected to Alertmanager (instance {{ $labels.instance }})
      description: "Cortex not connected to Alertmanager (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

3. Cortex notifications are being dropped

Cortex notifications are being dropped due to errors (instance {{ $labels.instance }})

  - alert: CortexNotificationAreBeingDropped
    expr: rate(cortex_prometheus_notifications_dropped_total[5m]) > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Cortex notifications are being dropped (instance {{ $labels.instance }})
      description: "Cortex notifications are being dropped due to errors (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4. Cortex notification error

Cortex is failing when sending alert notifications (instance {{ $labels.instance }})

  - alert: CortexNotificationError
    expr: rate(cortex_prometheus_notifications_errors_total[5m]) > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Cortex notification error (instance {{ $labels.instance }})
      description: "Cortex is failing when sending alert notifications (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

5. Cortex ingester unhealthy

Cortex has an unhealthy ingester

  - alert: CortexIngesterUnhealthy
    expr: cortex_ring_members{state="Unhealthy", name="ingester"} > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Cortex ingester unhealthy (instance {{ $labels.instance }})
      description: "Cortex has an unhealthy ingester\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

6. Cortex frontend queries stuck

There are queued up queries in query-frontend.

  - alert: CortexFrontendQueriesStuck
    expr: sum by (job) (cortex_query_frontend_queue_length) > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: Cortex frontend queries stuck (job {{ $labels.job }})
      description: "There are queued up queries in query-frontend.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
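
Each snippet above is a single list item that belongs under the `rules:` key of a rule group. As a minimal sketch of how such an item could be placed in a complete rule file, the example below wraps the first rule in a group. The group name `cortex` and the file name `cortex-alerts.yml` are assumptions chosen for illustration, not names taken from this page.

```yaml
# cortex-alerts.yml (hypothetical file name)
groups:
  - name: cortex            # hypothetical group name
    rules:
      # Rule 1 from this section, indented to sit under `rules:`.
      - alert: CortexRulerConfigurationReloadFailure
        expr: cortex_ruler_config_last_reload_successful != 1
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Cortex ruler configuration reload failure (instance {{ $labels.instance }})
          description: "Cortex ruler configuration reload failure (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      # The remaining rules in this section would follow here as further list items.
```

A file of this shape can be validated with `promtool check rules cortex-alerts.yml` before it is loaded.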