1 Basic resource monitoring

https://awesome-prometheus-alerts.grep.to/alertmanager

1 Prometheus self-monitoring

  • Prometheus job missing
  • Prometheus target missing
  • Prometheus all targets missing
  • Prometheus configuration reload failure
  • Prometheus too many restarts
  • Prometheus AlertManager job missing
  • Prometheus AlertManager configuration reload failure
  • Prometheus AlertManager config not synced
  • Prometheus AlertManager E2E dead man switch
  • Prometheus not connected to alertmanager
  • Prometheus rule evaluation failures
  • Prometheus template text expansion failures
  • Prometheus rule evaluation slow
  • Prometheus notifications backlog
  • Prometheus AlertManager notification failing
  • Prometheus target empty
  • Prometheus target scraping slow
  • Prometheus large scrape
  • Prometheus target scrape duplicate
  • Prometheus TSDB checkpoint creation failures
  • Prometheus TSDB checkpoint deletion failures
  • Prometheus TSDB compactions failed
  • Prometheus TSDB head truncations failed
  • Prometheus TSDB reload failure
  • Prometheus TSDB WAL corruptions
  • Prometheus TSDB WAL truncations failed
  • Prometheus exporter down

2 Host node-exporter

  1. Out of memory
  2. Host memory under memory pressure
  3. Unusual network throughput in
  4. Unusual network throughput out
  5. Unusual disk read rate
  6. Unusual disk write rate
  7. Out of disk space
  8. Host disk will fill in 24 hours
  9. Out of inodes
  10. Host inodes will fill in 24 hours
  11. Unusual disk read latency
  12. Unusual disk write latency
  13. High CPU load
  14. Host CPU steal noisy neighbor
  15. Host Context switching
  16. Swap is filling up
  17. Host Systemd service crashed
  18. Host physical component too hot
  19. Host node overtemperature alarm
  20. Host RAID array got inactive
  21. Host RAID disk failure
  22. Host kernel version deviations
  23. Host OOM kill detected
  24. Host EDAC Correctable Errors detected
  25. Host EDAC Uncorrectable Errors detected
  26. Host Network Receive Errors
  27. Host Network Transmit Errors
  28. Host Network Interface Saturated
  29. Host Network Bond Degraded
  30. Host conntrack limit
  31. Host clock skew
  32. Host clock not synchronising
  33. Host requires reboot

3 Docker containers : cAdvisor

  1. Container killed
  2. Container CPU usage
  3. Container Memory usage
  4. Container Volume usage
  5. Container Volume I/O usage
  6. Container absent
  7. Container high throttle rate

4 Blackbox : prometheus

  1. Blackbox probe failed
  2. Blackbox slow probe
  3. Blackbox probe HTTP failure
  4. SSL certificate will expire soon
  5. SSL certificate expired
  6. Blackbox probe slow HTTP
  7. Blackbox probe slow ping

5 Windows Server

  1. Windows Server collector Error
  2. Windows Server service Status
  3. Windows Server CPU Usage
  4. Windows Server memory Usage
  5. Windows Server disk Space Usage

6 VMware

  1. Virtual Machine Memory Warning
  2. Virtual Machine Memory Critical
  3. High Number of Snapshots
  4. Outdated Snapshots

7 Netdata

  1. Netdata high cpu usage
  2. Host CPU steal noisy neighbor
  3. Netdata high memory usage
  4. Netdata low disk space
  5. Netdata predicted disk full
  6. Netdata MD mismatch cnt unsynchronized blocks
  7. Netdata disk reallocated sectors
  8. Netdata disk current pending sector
  9. Netdata reported uncorrectable disk sectors