Section 1: Installing Prometheus on CentOS 7 (deployment + monitoring + dashboards + alerting)
1 Preparation
1-1 Deployment architecture diagram

1-2 Prepare the machine
Master machine: 192.168.33.10
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
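2 Installation
For each component, download the matching release archive from the official site, extract it, register it as a systemd service, and start it.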
2-1 Node_exporter
$ wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
$ sudo tar zxf node_exporter-0.18.1.linux-amd64.tar.gz -C /usr/local
$ sudo vim /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter-0.18.1.linux-amd64/node_exporter
[Install]
WantedBy=multi-user.target
$ sudo systemctl start node_exporter
$ systemctl status node_exporter
$ sudo systemctl enable node_exporter
$ systemctl status node_exporter
● node_exporter.service - node_exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2019-06-23 07:56:53 UTC; 6min ago
 Main PID: 7492 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─7492 /usr/local/node_exporter-0.18.1.linux-amd64/node_exporter
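Verify: node_exporter listens on port 9100 by default, so its metrics page is http://192.168.33.10:9100/metrics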
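Besides opening the page in a browser, the check can be scripted. This is a minimal sketch that assumes bash and curl are available. has_metric is a hypothetical helper name, and node_load1 is simply one metric that node_exporter's default load-average collector exposes on Linux.

```shell
#!/usr/bin/env bash
# has_metric NAME
# Reads Prometheus text-format metrics on stdin and succeeds (exit 0) if at
# least one sample line for metric NAME exists, with or without labels.
# Comment lines (# HELP / # TYPE) never match because they start with '#'.
has_metric() {
  # A sample line starts with the metric name followed by '{' (labels) or a space,
  # so node_load1 does not accidentally match node_load15.
  grep -Eq "^$1[{ ]"
}
```

Usage:
$ curl -s http://192.168.33.10:9100/metrics | has_metric node_load1 && echo "node_exporter OK"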

2-2 AlertManager
Install commands
$ wget https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz
$ sudo tar zxf alertmanager-0.17.0.linux-amd64.tar.gz  -C /usr/local
$ sudo vim /etc/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network-online.target
[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager-0.17.0.linux-amd64/alertmanager --config.file=/usr/local/alertmanager-0.17.0.linux-amd64/alertmanager.yml
[Install]
WantedBy=multi-user.target
$ sudo systemctl start alertmanager
$ systemctl status alertmanager
$ sudo systemctl enable alertmanager
$ netstat -anlpt | grep 9093
tcp        0      0 192.168.33.10:39518     192.168.33.10:9093      ESTABLISHED -
tcp6       0      0 :::9093                 :::*                    LISTEN      -
$ systemctl status alertmanager
● alertmanager.service - Alertmanager
   Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2019-06-23 07:50:39 UTC; 22min ago
 Main PID: 7404 (alertmanager)
   CGroup: /system.slice/alertmanager.service
           └─7404 /usr/local/alertmanager-0.17.0.linux-amd64/alertmanager --config.file=/usr/local/alertmanager-0.17.0.linux-amd64/alertmanager.yml
Verify
Open http://192.168.33.10:9093 in a browser.
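Right after `systemctl start`, the port may not answer yet. The browser check can be replaced by a small polling loop. This is a sketch that assumes bash and curl. wait_healthy is a hypothetical helper, and the retry count and delay defaults are arbitrary choices.

```shell
#!/usr/bin/env bash
# wait_healthy URL [TRIES] [DELAY]
# Polls URL until curl gets a non-error HTTP response (2xx/3xx), trying up to
# TRIES times (default 10) and sleeping DELAY seconds between attempts (default 3).
# Returns 0 on success, 1 if every attempt failed.
wait_healthy() {
  local url=$1 tries=${2:-10} delay=${3:-3} i
  for ((i = 1; i <= tries; i++)); do
    # -f: treat HTTP status >= 400 as failure; -s: no progress output;
    # --max-time: cap each attempt so a hung port cannot block the loop.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    # Sleep only if another attempt follows.
    if ((i < tries)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

Usage:
$ wait_healthy http://192.168.33.10:9093/ && echo "Alertmanager is up"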

3 Prometheus
$ wget https://github.com/prometheus/prometheus/releases/download/v2.10.0/prometheus-2.10.0.linux-amd64.tar.gz
$ sudo tar zxf prometheus-2.10.0.linux-amd64.tar.gz -C /usr/local
$ sudo vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target
[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus-2.10.0.linux-amd64/prometheus --config.file=/usr/local/prometheus-2.10.0.linux-amd64/prometheus.yml --storage.tsdb.path=/var/lib/prometheus --web.external-url=http://0.0.0.0:9090
[Install]
WantedBy=multi-user.target
$ sudo systemctl start prometheus
$ systemctl status prometheus
$ sudo systemctl enable prometheus
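All three downloads in this guide follow the same GitHub release URL pattern, so the version can live in one variable. This is a minimal sketch that assumes bash. prom_release is a hypothetical helper, and the pattern is inferred from the download URLs used above rather than from an official specification.

```shell
#!/usr/bin/env bash
# prom_release NAME VERSION [ARCH]
# Prints the release tarball URL for a component published under
# github.com/prometheus, following the pattern of the downloads in this guide:
#   .../NAME/releases/download/vVERSION/NAME-VERSION.ARCH.tar.gz
prom_release() {
  local name=$1 version=$2 arch=${3:-linux-amd64}
  printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.%s.tar.gz\n' \
    "$name" "$version" "$name" "$version" "$arch"
}
```

Usage:
$ VERSION=2.10.0
$ wget "$(prom_release prometheus "$VERSION")"
$ sudo tar zxf "prometheus-$VERSION.linux-amd64.tar.gz" -C /usr/local

Deriving the URL, the archive name, and the extracted directory from one VERSION value keeps the wget, tar, and ExecStart paths in sync when upgrading.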

4 Grafana
4-1 Install
$ wget https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/el7/grafana-5.4.2-1.x86_64.rpm
$ sudo yum install initscripts fontconfig
$ sudo yum install -y urw-fonts
- Without it, rpm reports urw-fonts as a missing dependency
- initscripts and fontconfig are required as well
$ sudo rpm -ivh grafana-5.4.2-1.x86_64.rpm
$ sudo systemctl start grafana-server
$ systemctl status grafana-server
$ sudo systemctl enable grafana-server
4-2 Verify: open http://192.168.33.10:3000 (Grafana's default port; the initial login is admin / admin)

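The admin account defaults (admin_user, admin_password) are in the [security] section of grafana.ini. After opening the file, type /security in vim to jump there:
$ sudo vim /etc/grafana/grafana.ini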
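5 Configuration
5-1 AlertManager
- Configuration file: /usr/local/alertmanager-0.17.0.linux-amd64/alertmanager.yml (the path passed with --config.file in the service unit)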
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp-mail.outlook.com:587'
  smtp_from: 'xichao2015@outlook.com'
  smtp_auth_username: '###'
  smtp_auth_password: '###'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'toemail'
receivers:
- name: 'toemail'
  email_configs:
  - to: 'jacob.xi@sap.com'
    send_resolved: true
- name: 'web.hook'
  webhook_configs:
  - url: 'http://192.168.33.10:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
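Alertmanager only reads alertmanager.yml at startup or on reload. After editing the file, it can be validated with the amtool binary that ships in the same release tarball, and then the service can be restarted. This is a sketch that assumes bash. check_am_config is a hypothetical helper, and its default directory is the install path used in this guide.

```shell
#!/usr/bin/env bash
# check_am_config [DIR]
# Validates DIR/alertmanager.yml with DIR/amtool (both come from the release
# tarball). DIR defaults to the install path used in this guide.
check_am_config() {
  local dir=${1:-/usr/local/alertmanager-0.17.0.linux-amd64}
  "$dir/amtool" check-config "$dir/alertmanager.yml"
}
```

Usage:
$ check_am_config && sudo systemctl restart alertmanager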
5-2 Prometheus
$ sudo vim /usr/local/prometheus-2.10.0.linux-amd64/prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 192.168.33.10:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/host_rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    scrape_interval: 5s
    static_configs:
    - targets: ['192.168.33.10:9090']
  - job_name: 'my target'
    static_configs:
    - targets: ['192.168.33.10:9100']
$ cd /usr/local/prometheus-2.10.0.linux-amd64
$ sudo mkdir {configs,rules}
$ sudo vim rules/host_rules.yml
groups:
- name: 'Linux Instances'
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 5s
    labels:
      severity: page
    # Prometheus templates apply here in the annotation and label fields of the alert.
    # annotations:
    #   description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 seconds.'
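Before restarting, the configuration can be checked with promtool, which ships in the Prometheus tarball. `promtool check config` also checks the files listed under rule_files, and relative paths such as rules/host_rules.yml resolve against the config file's directory. This is a sketch that assumes bash. check_prom_config is a hypothetical helper.

```shell
#!/usr/bin/env bash
# check_prom_config [DIR]
# Runs DIR/promtool against DIR/prometheus.yml; the rule files it references
# are checked as part of the same run. DIR defaults to this guide's install path.
check_prom_config() {
  local dir=${1:-/usr/local/prometheus-2.10.0.linux-amd64}
  "$dir/promtool" check config "$dir/prometheus.yml"
}
```

Usage:
$ check_prom_config && sudo systemctl restart prometheus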
$ sudo systemctl restart prometheus
6 Verification
6-1 View targets
Prometheus => Status => Rules

Prometheus => Alerts
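The same information is available from Prometheus' HTTP API, which is handy on a headless VM. This is a sketch that assumes bash and the compact, single-line JSON that /api/v1/targets returns. It uses grep rather than a JSON parser, so it is a quick check and not a robust client. target_health is a hypothetical helper.

```shell
#!/usr/bin/env bash
# target_health
# Reads the JSON from /api/v1/targets on stdin and prints one
# "<health> <count>" line per health value (up, down, unknown), sorted by name.
target_health() {
  grep -o '"health":"[a-z]*"' \
    | cut -d'"' -f4 \
    | sort | uniq -c \
    | awk '{print $2, $1}'   # uniq -c prints "count value"; swap to "value count"
}
```

Usage:
$ curl -s http://192.168.33.10:9090/api/v1/targets | target_health

With both scrape jobs from prometheus.yml healthy, this should report two targets as up.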

6-2 View monitoring data
Add a Prometheus data source (URL: http://192.168.33.10:9090)
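The data source can also be created through Grafana's HTTP API instead of the UI. This is a sketch under these assumptions: Grafana runs on its default port 3000, it still uses the initial admin:admin login, and the data source is named Prometheus. That name is hypothetical, and any name can be chosen when importing the dashboard.

```shell
#!/usr/bin/env bash
# datasource_json PROM_URL
# Prints the JSON body describing a Prometheus data source for Grafana's API.
datasource_json() {
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}' "$1"
}

# add_prometheus_datasource GRAFANA_URL PROM_URL [USER:PASSWORD]
# POSTs that body to GRAFANA_URL/api/datasources with basic auth
# (defaults to admin:admin, Grafana's initial login).
add_prometheus_datasource() {
  curl -fsS -u "${3:-admin:admin}" -H 'Content-Type: application/json' \
    -X POST -d "$(datasource_json "$2")" "$1/api/datasources"
}
```

Usage:
$ add_prometheus_datasource http://192.168.33.10:3000 http://192.168.33.10:9090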

Import the dashboard by its ID (9276):
https://grafana.com/dashboards/9276 

7 Alerting
7-1 Simulate a node_exporter outage
$ sudo systemctl stop node_exporter
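The wait before an email arrives can be estimated roughly from the intervals configured in section 5. The node_exporter job uses the global scrape_interval of 15s, and the other relevant settings are evaluation_interval 15s, for: 5s, and group_wait 10s. The upper bound below is approximate and ignores SMTP delivery and the small delay in sending alerts from Prometheus to Alertmanager:

```latex
t_{\text{email}} \lesssim
\underbrace{15\,\mathrm{s}}_{\text{next scrape fails, up = 0}}
+ \underbrace{15\,\mathrm{s}}_{\text{evaluation: pending}}
+ \underbrace{15\,\mathrm{s}}_{\text{evaluation: firing}}
+ \underbrace{10\,\mathrm{s}}_{\text{group\_wait}}
= 55\,\mathrm{s}
```

Because for: 5s is shorter than the 15s evaluation interval, the alert should fire on the second evaluation that sees up == 0. The first email should therefore arrive within about a minute.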


Check the inbox of the receiver configured in alertmanager.yml.
7-2 Reference article
https://www.jianshu.com/p/e59cfd15612e
8 Errors
Problem: Grafana shows "No data points".
Fix: synchronize the system date/time. The approach used here was to reboot the VM:
$ sudo reboot
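A common cause of "No data points" is that the VM clock and the browser's clock disagree. Grafana then asks for a time range in which Prometheus has no samples. Before rebooting, the skew can be measured. This is a sketch that assumes bash and SSH access to the VM. clock_skew is a hypothetical helper.

```shell
#!/usr/bin/env bash
# clock_skew EPOCH_A EPOCH_B
# Prints |EPOCH_A - EPOCH_B| in seconds, e.g. the VM's `date +%s`
# compared with the workstation's `date +%s`.
clock_skew() {
  local d=$(( $1 - $2 ))
  echo "${d#-}"   # strip a leading minus sign to get the absolute value
}
```

Usage:
$ clock_skew "$(ssh 192.168.33.10 date +%s)" "$(date +%s)"

If the skew is more than a few seconds, chrony can correct the clock instead of a reboot. This assumes the chrony package is installed (`sudo yum install -y chrony` otherwise) and that the VM can reach NTP servers:
$ sudo systemctl start chronyd
$ sudo systemctl enable chronyd
$ sudo chronyc makestep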