Shiny Ideas: Distributing Data, Consistently (2/N)

In my previous post I jabbered a little bit about the utility of having a strongly-consistent data store that's highly-available and easy to work with from an operational standpoint. This, in turn, led us to the wonderful, magical land of consensus algorithms. After reviewing currently-available implementations we landed on etcd as the most viable candidate for further experimentation. What follows is not at all production-worthy, because it ignores the boring crap like tuning and security.

etcd is a KV store which addresses our needs as follows:

Strong consistency: Yes, via the Raft consensus algorithm.
HA: Supports shared-nothing clusters.
Operationally tractable: Requires no special hardware. Readily supports 5-node clusters, where quorum is 3, which leaves room for 1 broken node and 1 down for maintenance.
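
The quorum arithmetic in that last point is easy to check. Here's a minimal sketch, assuming a plain majority quorum (which is what Raft uses); the function names are mine, not anything etcd ships:

```shell
#!/usr/bin/env bash
# Majority quorum for a cluster of n voting members: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can be lost while a majority still remains: n - quorum(n).
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
  echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A 4-node cluster tolerates no more failures than a 3-node one, which is why odd sizes are the usual choice.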

It also has a bunch of nice features under the hood which we'll get to at a later date.

So, let's Vagrant ourselves up a 3-node cluster, just for the purposes of experimentation. Here's the bootstrap script:

#!/usr/bin/env bash
IP=$1
NAME=$2

yum update -y

# Set up toy name resolution
cat <<EOM >> /etc/hosts
10.0.0.2 turtle1
10.0.0.3 turtle2
10.0.0.4 turtle3
EOM

# Install etcd
yum install -y etcd

# Configure etcd for a three node cluster.
sed -i "
s/https/http/g
/ETCD_LISTEN_PEER_URLS/ s/localhost/$IP/
/ETCD_LISTEN_PEER_URLS/ s/^#//
/ETCD_LISTEN_CLIENT_URLS/ s|=\"|=\"http://$IP:2379,|
/ETCD_NAME/ s/default/$NAME/
/ETCD_INITIAL_ADVERTISE_PEER_URLS/ s/localhost/$IP/
/ETCD_INITIAL_ADVERTISE_PEER_URLS/ s/^#//
/ETCD_ADVERTISE_CLIENT_URLS/ s/localhost/$IP/
/ETCD_INITIAL_CLUSTER=/ s|=\".*$|=\"turtle1=http://10.0.0.2:2380,turtle2=http://10.0.0.3:2380,turtle3=http://10.0.0.4:2380\"|
/ETCD_INITIAL_CLUSTER/ s/^#//
" /etc/etcd/etcd.conf

Nothing fancy here. This statically configures the initial cluster membership, which is totally fine for the purposes of messing around. etcd can also discover peers via DNS SRV records, which is probably what you'd do in a production scenario. Note also the use of HTTP rather than HTTPS for the sake of convenience; don't do this in prod. This is what the etcd configuration looks like after bootstrapping:

[vagrant@turtle1 ~]$ grep -v '^#' /etc/etcd/etcd.conf
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.0.2:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.0.2:2379,http://localhost:2379"
ETCD_NAME="turtle1"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.0.2:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.0.2:2379"
ETCD_INITIAL_CLUSTER="turtle1=http://10.0.0.2:2380,turtle2=http://10.0.0.3:2380,turtle3=http://10.0.0.4:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
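
If you want to eyeball the membership list without squinting at one long line, something like the following can split it up. This is just a sketch: list_members is a hypothetical helper of mine, and the cluster string is hard-coded here rather than read from the config file.

```shell
#!/usr/bin/env bash
# Print one "name url" pair per line from an ETCD_INITIAL_CLUSTER-style value.
list_members() {
  local cluster=$1 entry
  local -a entries
  # Entries are comma-separated "name=url" pairs.
  IFS=',' read -r -a entries <<< "$cluster"
  for entry in "${entries[@]}"; do
    # Name is everything before the first "=", URL is everything after it.
    echo "${entry%%=*} ${entry#*=}"
  done
}

cluster="turtle1=http://10.0.0.2:2380,turtle2=http://10.0.0.3:2380,turtle3=http://10.0.0.4:2380"
list_members "$cluster"
```

On a real node you could presumably source /etc/etcd/etcd.conf and pass "$ETCD_INITIAL_CLUSTER" instead, since the file shown above is plain KEY="value" lines; I haven't verified that for every packaged version.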

Here's the Vagrantfile:

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.box = "centos/7"

  config.vm.define "turtle1" do |turtle1|
    turtle1.vm.hostname = "turtle1"
    turtle1.vm.network "private_network", ip: "10.0.0.2",
      virtualbox__intnet: true
    turtle1.vm.provision :shell, path: "bootstrap.sh", args: "10.0.0.2 turtle1"
  end

  config.vm.define "turtle2" do |turtle2|
    turtle2.vm.hostname = "turtle2"
    turtle2.vm.network "private_network", ip: "10.0.0.3",
      virtualbox__intnet: true
    turtle2.vm.provision :shell, path: "bootstrap.sh", args: "10.0.0.3 turtle2"
  end

  config.vm.define "turtle3" do |turtle3|
    turtle3.vm.hostname = "turtle3"
    turtle3.vm.network "private_network", ip: "10.0.0.4",
      virtualbox__intnet: true
    turtle3.vm.provision :shell, path: "bootstrap.sh", args: "10.0.0.4 turtle3"
  end
end

Again, nothing fancy. Set up three VMs on a (VirtualBox internal) private network and bootstrap them per the script above. Note that there are no instructions to start etcd. etcd is happiest when all nodes in the cluster are up and running before any daemons get started, which means that all three instances should be started after turtle3 is completely provisioned. I couldn't figure out how to do this automatically via Vagrant, so we're left with the following manual invocation:

for host in turtle1 turtle2 turtle3; do
  # Send stdout to the log first, then point stderr at it, so both are captured.
  vagrant ssh "${host}" -- 'sudo service etcd start' > "${host}.log" 2>&1 &
done
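
A note on log redirection in loops like this one: bash applies redirections left to right, so where 2>&1 sits relative to the file redirect decides whether stderr actually lands in the log. A small self-contained demonstration (emit is a throwaway function of mine, nothing to do with Vagrant or etcd):

```shell
#!/usr/bin/env bash
# Emit one line on stdout and one on stderr.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

log=$(mktemp)

# stdout is moved to the file first, then stderr follows it: both lines are captured.
emit > "$log" 2>&1
both=$(cat "$log")

# stderr is pointed at the current stdout first, then only stdout moves to the file,
# so the stderr line is printed wherever stdout was going and never reaches the log.
emit 2>&1 > "$log"
only_stdout=$(cat "$log")

rm -f "$log"

echo "file-then-2>&1 captured: $(echo "$both" | wc -l) line(s)"
echo "2>&1-then-file captured: $(echo "$only_stdout" | wc -l) line(s)"
```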

If all goes well you should now have a functional etcd cluster:

[vagrant@turtle1 ~]$ etcdctl cluster-health
member aeb6950c050a83f3 is healthy: got healthy result from http://10.0.0.4:2379
member ee274cacce804b21 is healthy: got healthy result from http://10.0.0.2:2379
member fe9c64eaf3991d47 is healthy: got healthy result from http://10.0.0.3:2379
cluster is healthy
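
The cluster can take a few seconds to elect a leader, so the first health check may fail if you're quick on the draw. Here's a rough sketch of a generic retry loop; retry_until is a hypothetical helper, not part of etcd or Vagrant, and it relies only on the exit status of whatever command you hand it.

```shell
#!/usr/bin/env bash
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS runs have failed.
retry_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final failed attempt.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Trivial demonstration with a command that always succeeds.
retry_until 3 1 true && echo "succeeded"
```

On one of the turtles you'd call it as: retry_until 30 2 etcdctl cluster-health. That assumes etcdctl exits non-zero while the cluster is unhealthy, which I believe is the case for the v2 etcdctl used here but is worth confirming on your version.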

W00t. In our next episode we'll look at actually doing some stuff.

Thursday, March 28, 2019
