1 Basic resource monitoring
https://awesome-prometheus-alerts.grep.to/alertmanager
1. Prometheus self-monitoring
- Prometheus job missing
- Prometheus target missing
- Prometheus all targets missing
- Prometheus configuration reload failure
- Prometheus too many restarts
- Prometheus AlertManager job missing
- Prometheus AlertManager configuration reload failure
- Prometheus AlertManager config not synced
- Prometheus AlertManager E2E dead man switch
- Prometheus not connected to alertmanager
- Prometheus rule evaluation failures
- Prometheus template text expansion failures
- Prometheus rule evaluation slow
- Prometheus notifications backlog
- Prometheus AlertManager notification failing
- Prometheus target empty
- Prometheus target scraping slow
- Prometheus large scrape
- Prometheus target scrape duplicate
- Prometheus TSDB checkpoint creation failures
- Prometheus TSDB checkpoint deletion failures
- Prometheus TSDB compactions failed
- Prometheus TSDB head truncations failed
- Prometheus TSDB reload failure
- Prometheus TSDB WAL corruptions
- Prometheus TSDB WAL truncations failed
- Prometheus exporter down
2. Host node-exporter
- Out of memory
- Host memory under memory pressure
- Unusual network throughput in
- Unusual network throughput out
- Unusual disk read rate
- Unusual disk write rate
- Out of disk space
- Host disk will fill in 24 hours
- Out of inodes
- Host inodes will fill in 24 hours
- Unusual disk read latency
- Unusual disk write latency
- High CPU load
- Host CPU steal noisy neighbor
- Host Context switching
- Swap is filling up
- Host Systemd service crashed
- Host physical component too hot
- Host node overtemperature alarm
- Host RAID array got inactive
- Host RAID disk failure
- Host kernel version deviations
- Host OOM kill detected
- Host EDAC Correctable Errors detected
- Host EDAC Uncorrectable Errors detected
- Host Network Receive Errors
- Host Network Transmit Errors
- Host Network Interface Saturated
- Host Network Bond Degraded
- Host conntrack limit
- Host clock skew
- Host clock not synchronising
- Host requires reboot
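
The "will fill in 24 hours" entries above imply a forecast rather than a fixed threshold. The outline does not show the rule behind them, so the following is only a minimal sketch of one plausible approach: fit a least-squares line through recent free-space samples and check whether it crosses zero within the horizon, similar in spirit to PromQL's predict_linear function. The function names, the sample layout (timestamp in seconds, free bytes), and the 24-hour default are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, free bytes); assumed layout


def predict_linear(samples: Sequence[Sample], horizon_s: float) -> float:
    """Extrapolate samples with a least-squares line.

    Returns the value predicted horizon_s seconds after the last sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    # Slope and intercept of the ordinary least-squares fit.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + horizon_s)


def will_fill_within(samples: Sequence[Sample], horizon_s: float = 24 * 3600) -> bool:
    """True if the fitted trend predicts no free space left within horizon_s."""
    return predict_linear(samples, horizon_s) < 0


if __name__ == "__main__":
    # Hypothetical data: free space shrinking by 1 GB per hour, starting at 10 GB.
    shrinking = [(0.0, 10e9), (3600.0, 9e9), (7200.0, 8e9)]
    print(will_fill_within(shrinking))
```

A trend-based check like this can fire earlier than a static "less than 10% free" threshold on fast-growing volumes, but it is also more sensitive to short bursts of writes, which is why such rules are usually paired with a minimum duration before alerting.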
3. Docker containers : cAdvisor
- Container killed
- Container CPU usage
- Container Memory usage
- Container Volume usage
- Container Volume I/O usage
- Container absent
- Container high throttle rate
4. Blackbox : prometheus
- Blackbox probe failed
- Blackbox slow probe
- Blackbox probe HTTP failure
- SSL certificate will expire soon
- SSL certificate expired
- Blackbox probe slow HTTP
- Blackbox probe slow ping
5. Windows Server
- Windows Server collector Error
- Windows Server service Status
- Windows Server CPU Usage
- Windows Server memory Usage
- Windows Server disk Space Usage
6. VMware
- Virtual Machine Memory Warning
- Virtual Machine Memory Critical
- High Number of Snapshots
- Outdated Snapshots
7. Netdata
- Netdata high cpu usage
- Host CPU steal noisy neighbor
- Netdata high memory usage
- Netdata low disk space
- Netdata predicted disk full
- Netdata MD mismatch cnt unsynchronized blocks
- Netdata disk reallocated sectors
- Netdata disk current pending sector
- Netdata reported uncorrectable disk sectors