Monitoring The Bottom Turtle (3/N)
So... making Prometheus monitor the etcd cluster (and underlying hardware) turns out to be... pretty trivial. Prometheus provides a daemon called "node_exporter" which does the heavy lifting from a machine perspective, while etcd exposes Prometheus metrics out of the box. In order to hook it all together one must:
- Create an init script/systemd file for node_exporter.
- Update bootstrap.sh to install node_exporter.
- Update prometheus.yml to scrape node_exporter and etcd.
First up, node_exporter.service:
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
Environment="GOMAXPROCS=1"
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/node_exporter
SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target
And the additions to bootstrap.sh:
wget -nv https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
tar zxf node_exporter-0.17.0.linux-amd64.tar.gz
cp node_exporter-0.17.0.linux-amd64/node_exporter /usr/local/bin/
mv /vagrant/node_exporter.service /etc/systemd/system
service node_exporter start
rm -rf node_exporter-0.17.0.linux-amd64*
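If you'd rather not wait for Prometheus to tell you whether the exporter came up, you can ask node_exporter directly. Here's a minimal sketch, assuming curl is available on the VM and node_exporter is listening on its default port; the helper name metric_value is made up for illustration and isn't part of bootstrap.sh.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one unlabeled metric from
# Prometheus text-format output read on stdin.
metric_value() {
    # $1 is the metric name. "# HELP" and "# TYPE" lines never match,
    # because their first field is "#".
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example usage on a node (assumes curl and the default port 9100):
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

An empty result means either the exporter isn't answering or the metric name is wrong, so it's worth eyeballing the raw /metrics output if that happens.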
And, lastly, the revised prometheus.yml:
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['turtle1:9090', 'turtle2:9090', 'turtle3:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['turtle1:9100', 'turtle2:9100', 'turtle3:9100']

  - job_name: 'etcd'
    static_configs:
      - targets: ['turtle1:2379', 'turtle2:2379', 'turtle3:2379']

Two additional jobs have been added, 'node_exporter' and 'etcd'. The former scrapes the node_exporter process (which lives on port 9100 by default) and the latter scrapes etcd on its default client port, 2379.
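YAML indentation is easy to fumble, so it can be worth a quick look at what the file actually defines before rebuilding the cluster. The sketch below is one rough way to do that with standard tools; list_jobs is a hypothetical helper, and it only understands the simple "- job_name: 'name'" layout used in this file.

```shell
#!/bin/sh
# Hypothetical helper: list the job names defined in a Prometheus config.
# $1 is the path to the config file.
list_jobs() {
    # Keep only the "- job_name: ..." lines, drop everything up to the
    # value, then strip single and double quotes.
    grep -E '^[[:space:]]*- job_name:' "$1" \
        | sed 's/.*job_name:[[:space:]]*//' \
        | tr -d "'\""
}

# Example usage:
#   list_jobs prometheus.yml
# For a real syntax check, the promtool binary shipped in the Prometheus
# tarball can validate the file:
#   promtool check config prometheus.yml
```

For the config above this should list prometheus, node_exporter, and etcd, one per line.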
Once all that has been put in place and the Vagrant cluster has been rebuilt, you should see 9 targets across three jobs (prometheus, etcd, and node_exporter) when you navigate to http://localhost:9090/ and open Status > Targets.
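The same check can be done from a terminal through Prometheus' HTTP API, which is handy if you rebuild the cluster a lot. This is a rough sketch rather than anything from the original setup: it assumes curl is installed and that port 9090 is forwarded to localhost, and count_up_targets is a hypothetical helper that simply counts "health":"up" entries in the JSON response.

```shell
#!/bin/sh
# Hypothetical helper: count the targets reported as healthy in the JSON
# returned by the /api/v1/targets endpoint (read from stdin).
count_up_targets() {
    # grep -o prints each match on its own line, so wc -l counts matches
    # even though the response is a single line of JSON. tr removes the
    # padding some wc implementations add.
    grep -o '"health"[[:space:]]*:[[:space:]]*"up"' | wc -l | tr -d ' '
}

# Example usage (assumes Prometheus is reachable on localhost:9090):
#   curl -s http://localhost:9090/api/v1/targets | count_up_targets
```

If everything came up cleanly the count should match the number of targets in prometheus.yml.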
Ok, great, you've got Prometheus gathering all sorts of numbers. Now what?
Prometheus does two main things with this information: alerting and visualization. I trust that Prometheus alerting works; paging someone when things go south is a well-understood problem. The only complication is that there are multiple servers all monitoring the same thing, so in production you'd get multiple emails. Not sure how you'd fix that, given that Prometheus servers are independent by design.
As for visualization, I'll let the Prometheus docs speak for themselves:
Console templates are the most powerful way to create templates that can be easily managed in source control. There is a learning curve though, so users new to this style of monitoring should try out Grafana first.
Yeah, let's do that. I've been wanting to mess with Grafana for a while anyway.