Tuesday, April 09, 2019

Implementing TLS for an Etcd Cluster (3/N)

Alrighty then... there's been a tremendous amount of yak shaving to get to this point. Which, when you think about it, isn't entirely surprising, since we basically stood up our own mini-CA. Anyway, we're now ready to make the cluster talk TLS.

We've already installed a cert on the first node; let's rinse and repeat for the second node. Assuming your Vault instance is already unsealed, grab the CA certificate and trust it:

[root@turtle2 ~]# curl https://turtle1.localdomain:8200/v1/pki/ca/pem > /etc/pki/ca-trust/source/anchors/vault_root.pem
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1171  100  1171    0     0  13353      0 --:--:-- --:--:-- --:--:-- 13459
[root@turtle2 ~]# update-ca-trust
Auth to Vault:
[root@turtle2 ~]# export VAULT_ADDR=https://turtle1.localdomain:8200
[root@turtle2 ~]# vault login
Token (will be hidden):
Success! You are now authenticated. The token information displayed below
...
and then generate a cert and move it into position:
[root@turtle2 ~]# vault write -format json pki/issue/etcd-cluster common_name=turtle2.localdomain > turtle2.localdomain.json
[root@turtle2 ~]# jq -r .data.certificate turtle2.localdomain.json > /etc/pki/tls/certs/turtle2.localdomain.crt
[root@turtle2 ~]# jq -r .data.private_key turtle2.localdomain.json > /etc/pki/tls/private/turtle2.localdomain.key
The process for doing the third node is pretty much identical.
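Since the same three commands get repeated on every node, they can be wrapped in a small function. This is only a sketch: issue_node_cert, CERT_DIR and KEY_DIR are names I made up for illustration, and it assumes VAULT_ADDR is exported and `vault login` has already succeeded, as above.

```shell
# Sketch: issue a cert from Vault for one node and put the pieces in place.
# $1 = the node's FQDN, used as the certificate common name.
# CERT_DIR / KEY_DIR are hypothetical overrides; the defaults match the
# paths used in the manual steps.
issue_node_cert() {
    local fqdn="$1"
    local cert_dir="${CERT_DIR:-/etc/pki/tls/certs}"
    local key_dir="${KEY_DIR:-/etc/pki/tls/private}"
    local json rc=0

    json="$(mktemp)" || return 1

    # Same Vault role and jq filters as the manual steps.
    vault write -format json pki/issue/etcd-cluster "common_name=${fqdn}" > "$json" \
        && jq -r .data.certificate "$json" > "${cert_dir}/${fqdn}.crt" \
        && jq -r .data.private_key "$json" > "${key_dir}/${fqdn}.key" \
        || rc=1

    # The JSON response contains the private key, so remove the temp copy.
    rm -f "$json"
    return "$rc"
}
```

On the third node that would be `issue_node_cert turtle3.localdomain`. Unlike the manual steps, this version doesn't keep the JSON response around; drop the `rm` if you want it for reference.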

That checks off all of the required materials: we've got a CA cert, and each of the etcd nodes has its own cert as well. We should now be able to update the config and have things automagically work. Except...

Apparently there's a minor issue where an etcd cluster that was initially started in HTTP mode doesn't deal gracefully with the conversion to HTTPS. Since we have no data to save, it's easiest just to wipe the cluster and start over:

[root@turtle1 ~]# rm -rf /var/lib/etcd/default.etcd
Repeat for other nodes as needed.
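If you'd rather not log in to each box, the wipe can be scripted. A rough sketch, assuming root SSH access to each node and a systemd unit named etcd (neither is stated above); reset_etcd_nodes is a made-up helper name. It stops etcd first so nothing is writing to the data directory while it's being removed.

```shell
# Sketch: stop etcd and delete its data directory on each node given.
# Assumes root SSH access and a systemd unit called "etcd" (assumptions).
# This destroys all cluster data, which is fine here only because there is
# nothing worth saving.
reset_etcd_nodes() {
    local node
    for node in "$@"; do
        ssh "root@${node}" 'systemctl stop etcd && rm -rf /var/lib/etcd/default.etcd' \
            || return 1
    done
}
```

Usage would be something like `reset_etcd_nodes turtle1.localdomain turtle2.localdomain turtle3.localdomain`.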

Here's a new config, based on the guidance from the etcd security docs:

ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://10.0.0.2:2380"
ETCD_LISTEN_CLIENT_URLS="https://10.0.0.2:2379,https://127.0.0.1:2379"
ETCD_NAME="turtle1"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://turtle1.localdomain:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://turtle1.localdomain:2379"
ETCD_INITIAL_CLUSTER="turtle1=https://turtle1.localdomain:2380,turtle2=https://turtle2.localdomain:2380,turtle3=https://turtle3.localdomain:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_CERT_FILE="/etc/pki/tls/certs/turtle1.localdomain.crt"
ETCD_KEY_FILE="/etc/pki/tls/private/turtle1.localdomain.key"
ETCD_PEER_CERT_FILE="/etc/pki/tls/certs/turtle1.localdomain.crt"
ETCD_PEER_KEY_FILE="/etc/pki/tls/private/turtle1.localdomain.key"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_TRUSTED_CA_FILE="/etc/pki/tls/certs/ca-bundle.crt"
This config will cause etcd to speak TLS both between cluster members and between client/server. The above is mostly self-explanatory, but there are a couple of subtleties to keep in mind:
  • ETCD_LISTEN_PEER_URLS and ETCD_LISTEN_CLIENT_URLS tell etcd which interfaces to bind to; they expect IPs rather than names.
  • Other config items which specify addresses should use FQDNs which match the common names used in the certificates that have been generated.
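Only a handful of values differ between nodes: the name, the bind IP, and the cert/key paths. A sketch of a generator for the per-node file, under the assumption that every node follows the turtleN.localdomain naming used here; render_etcd_conf is a made-up name, and the IP you pass in for the other nodes is whatever they actually have (only turtle1's 10.0.0.2 is given above).

```shell
# Sketch: print the TLS-enabled etcd config for one node to stdout.
# $1 = short node name (e.g. turtle2), $2 = that node's bind IP.
render_etcd_conf() {
    local name="$1" ip="$2"
    local fqdn="${name}.localdomain"
    cat <<EOF
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="https://${ip}:2379,https://127.0.0.1:2379"
ETCD_NAME="${name}"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${fqdn}:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://${fqdn}:2379"
ETCD_INITIAL_CLUSTER="turtle1=https://turtle1.localdomain:2380,turtle2=https://turtle2.localdomain:2380,turtle3=https://turtle3.localdomain:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_CERT_FILE="/etc/pki/tls/certs/${fqdn}.crt"
ETCD_KEY_FILE="/etc/pki/tls/private/${fqdn}.key"
ETCD_PEER_CERT_FILE="/etc/pki/tls/certs/${fqdn}.crt"
ETCD_PEER_KEY_FILE="/etc/pki/tls/private/${fqdn}.key"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_TRUSTED_CA_FILE="/etc/pki/tls/certs/ca-bundle.crt"
EOF
}
```

Run it as `render_etcd_conf turtle2 <turtle2's IP>` and redirect the output to wherever your etcd config lives. Note that the listen URLs get the IP while everything advertised gets the FQDN, per the two bullets above.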

At this point you can restart the cluster and check its health:

[vagrant@turtle1 ~]$ etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured; error #0: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
; error #1: dial tcp 127.0.0.1:4001: connect: connection refused

WTF, etcdctl? Why you trying to talk to port 4001? Well, if you read the command line help you find the following:
--endpoints value                a comma-delimited list of machine addresses in the cluster (default: "http://127.0.0.1:2379,http://127.0.0.1:4001")
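Ok then... random, but note that both defaults are plain http, so the attempt on 2379 (error #0) was speaking HTTP to a listener that now expects TLS. Simple enough to fix: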
[vagrant@turtle1 ~]$ etcdctl --endpoints 'https://turtle1.localdomain:2379' cluster-health
member 239a42cfc49a9543 is healthy: got healthy result from https://turtle2.localdomain:2379
member 6f3d69975b47e204 is healthy: got healthy result from https://turtle3.localdomain:2379
member fb6279d768c990c6 is healthy: got healthy result from https://turtle1.localdomain:2379
cluster is healthy
Sweet. They all talking TLS?
[vagrant@turtle1 ~]$ etcdctl --endpoints 'https://turtle1.localdomain:2379' member list
239a42cfc49a9543: name=turtle2 peerURLs=https://turtle2.localdomain:2380 clientURLs=https://turtle2.localdomain:2379 isLeader=false
6f3d69975b47e204: name=turtle3 peerURLs=https://turtle3.localdomain:2380 clientURLs=https://turtle3.localdomain:2379 isLeader=false
fb6279d768c990c6: name=turtle1 peerURLs=https://turtle1.localdomain:2380 clientURLs=https://turtle1.localdomain:2379 isLeader=true
Mission accomplished!
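One loose end from the first failed cluster-health call: the bytes in error #0 aren't random. They're the start of a TLS record, which is what a plain-HTTP client gets back when it talks to a listener that now expects TLS. Here's a small sketch that decodes the five-byte record header; describe_tls_record is a made-up helper, not part of etcd or OpenSSL.

```shell
# Sketch: decode a TLS record header given as five hex bytes
# (content type, version major, version minor, length high, length low).
describe_tls_record() {
    local type="$1" major="$2" minor="$3" len_hi="$4" len_lo="$5" kind
    case "$type" in
        14) kind="change_cipher_spec" ;;
        15) kind="alert" ;;
        16) kind="handshake" ;;
        17) kind="application_data" ;;
        *)  kind="unknown" ;;
    esac
    printf '%s record, version %d.%d, payload length %d\n' \
        "$kind" "$((0x$major))" "$((0x$minor))" "$(( (0x$len_hi << 8) | 0x$len_lo ))"
}

# The first five bytes of "\x15\x03\x01\x00\x02\x02" from error #0:
describe_tls_record 15 03 01 00 02
# prints: alert record, version 3.1, payload length 2
```

Content type 0x15 is an alert, and version 3.1 is how TLS 1.0 identifies itself on the wire. The trailing \x02 in the error message is the first payload byte, the alert level, where 2 means fatal. In other words, the server hung up because it was handed HTTP instead of a handshake.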

Now, in a separate-but-related issue, after I did the TLS conversion above I was still seeing messages like the following in the log:

Apr  5 00:42:43 localhost etcd: rejected connection from "10.0.0.4:46290" (error "tls: first record does not look like a TLS handshake", ServerName "")
Apr  5 00:42:45 localhost etcd: rejected connection from "10.0.0.3:41394" (error "tls: first record does not look like a TLS handshake", ServerName "")
Apr  5 00:42:48 localhost etcd: rejected connection from "127.0.0.1:42742" (error "tls: first record does not look like a TLS handshake", ServerName "")
At first I thought some legacy something in the etcd cluster was still trying to talk HTTP, but then I remembered that the Prometheus servers I'd set up were still doing their thing. It turned out to be easy enough to update the scrape config to use HTTPS:
  - job_name: 'etcd'
    scheme: https
    static_configs:
      - targets: ['turtle1.localdomain:2379', 'turtle2.localdomain:2379', 'turtle3.localdomain:2379']
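Before reloading Prometheus, it can be worth confirming that each node actually serves /metrics over HTTPS. A quick sketch with curl, assuming the box you run it from trusts the Vault CA (as set up with update-ca-trust earlier); check_etcd_metrics is a made-up helper name.

```shell
# Sketch: confirm each etcd node answers on /metrics over HTTPS.
# Arguments are node FQDNs; returns non-zero if any node fails.
check_etcd_metrics() {
    local node rc=0
    for node in "$@"; do
        # -f makes curl fail on HTTP errors; the body itself is discarded.
        if curl -sSf -o /dev/null "https://${node}:2379/metrics"; then
            echo "${node}: ok"
        else
            echo "${node}: FAILED"
            rc=1
        fi
    done
    return "$rc"
}
```

Call it with the same targets as the scrape config, e.g. `check_etcd_metrics turtle1.localdomain turtle2.localdomain turtle3.localdomain`. The Prometheus hosts need to trust the CA too, otherwise the scrapes will fail certificate verification even though this check passes elsewhere.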


Blog Information Profile for gg00