Messing Around With Ceph (1/N)
I decided I needed a DFS (distributed filesystem), so I did a brief survey of the current FOSS offerings. Ceph came out on top due to its feature set and robust developer base, so I figured I'd mess around with it for a bit and see if I could get it set up and doing something useful.
The good news about Ceph is that it has a tremendous amount of documentation! The bad news about Ceph is... that it has a tremendous amount of documentation. It also consists of a bunch of different services, which can be set up in a variety of configurations, so figuring out the best way to lay everything out takes a little bit of digging. Here's what I've come up with so far:
- Monitor nodes: These things maintain cluster state. In an HA setup they use Paxos, which means you need at least 3 for tinkering (5 for an n+2 setup in production).
- Manager nodes: Provide a bunch of management functionality. The manager node docs say "It is not mandatory to place mgr daemons on the same nodes as mons, but it is almost always sensible". So three of those too.
- Metadata Servers (aka "MDS"): These servers manage all the metadata for the Ceph Filesystem (the metadata itself ultimately lives in RADOS; the MDS daemons serve it to clients). There's a lot of flexibility in how these are laid out. Towards the end of the architecture doc it says "Combinations of standby and active etc are possible, for example running 3 active ceph-mds instances for scaling, and one standby instance for high availability". So let's install three of these things, make two of them active and one of them standby.
- Object Storage Daemons (aka "OSD"): These hold the data. In a normal installation these do most of the heavy lifting, and you'll have way more of them than any of the other daemons. Since we've already consumed 3 VMs, let's go ahead and install an OSD on each.
- Admin node: The preflight checklist strongly suggests a dedicated admin node.
- Client: Finally, let's set up a node to act as an FS client.
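To summarize where this is headed: node1 through node3 will each run a monitor, a manager, an MDS, and an OSD, with a separate admin box driving the deployment and a separate client mounting the filesystem. As a rough sketch (not part of any build script, and using the private addresses assigned in the Vagrantfile below), the [global] section that ceph-deploy will eventually generate for this layout should look something like this; the values here are illustrative placeholders, not output from a real deployment:

```bash
# Sketch only: a plausible ceph.conf [global] section for a three-monitor
# cluster on node1-node3. ceph-deploy generates the real file later.
cat <<EOM > /tmp/ceph.conf.sketch
[global]
mon_initial_members = node1, node2, node3
mon_host = 10.0.0.3,10.0.0.4,10.0.0.5
public_network = 10.0.0.0/24
EOM
```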
So here's a Vagrantfile:
```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "centos/7"
  config.vm.provision :shell, path: "dns.sh"
  config.vm.synced_folder ".", "/vagrant", type: "rsync", rsync__exclude: "vdisks/"

  config.vm.define "cache" do |cache|
    cache.vm.hostname = "cache"
    cache.vm.network "private_network", ip: "10.0.0.254", virtualbox__intnet: true
    cache.vm.provision :shell, path: "dns.sh"
    cache.vm.provision :shell, path: "cache.sh"
  end

  ips = Hash[
    "node1" => "10.0.0.3",
    "node2" => "10.0.0.4",
    "node3" => "10.0.0.5"
  ]

  vdisk_dir = "./vdisks"
  unless Dir.exist?(vdisk_dir)
    Dir.mkdir(vdisk_dir)
  end

  (1..3).each do |i|
    config.vm.define "node#{i}" do |node|
      hostname = "node#{i}"
      node.vm.hostname = hostname
      node.vm.network "private_network", ip: ips[hostname], virtualbox__intnet: true
      node.vm.provider "virtualbox" do |vb|
        vdisk_file = "#{vdisk_dir}/#{hostname}-ceph.vdi"
        unless File.exist?(vdisk_file)
          vb.customize [
            'createhd',
            '--filename', vdisk_file,
            '--variant', 'Fixed',
            '--size', 1024
          ]
        end
        vb.customize [
          'storageattach', :id,
          '--storagectl', 'IDE',
          '--port', 1,
          '--device', 0,
          '--type', 'hdd',
          '--medium', vdisk_file
        ]
      end
      node.vm.provision :shell, path: "bootstrap.sh"
      node.vm.provision :shell, path: "ntp.sh"
    end
  end

  config.vm.define "admin" do |admin|
    admin.vm.hostname = "admin"
    admin.vm.network "private_network", ip: "10.0.0.2", virtualbox__intnet: true
    admin.vm.provision :shell, path: "bootstrap.sh"
  end

  config.vm.define "client" do |client|
    client.vm.hostname = "client"
    client.vm.network "private_network", ip: "10.0.0.6", virtualbox__intnet: true
    client.vm.provision :shell, path: "bootstrap.sh"
  end
end
```

Various parts of the above blatantly stolen from the Vagrant tips page and EverythingShouldBeVirtual.
There are a handful of marginally interesting things going on in the Vagrantfile:
- Note that there's a VM called cache; it doesn't have anything to do with Ceph. As I was building (and rebuilding) the other nodes it quickly became apparent that downloading RPMs consumes the vast majority of the setup time for each machine. So cache is just going to run a caching Squid proxy, which will speed up the build time for the remaining nodes considerably.
- There's some jiggery-pokery which adds an additional disk to each of the Ceph nodes; Ceph really likes to have a raw disk device on which to store things.
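If you want to confirm that the extra disk actually showed up, a quick spot check after `vagrant up` looks something like this (the device name is an assumption; on these CentOS 7 boxes it typically lands at /dev/sdb):

```bash
# Run from the host after `vagrant up`; the 1 GB disk attached by the
# Vagrantfile should appear as an empty, unpartitioned device.
vagrant ssh node1 -c 'lsblk'
```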
Here's dns.sh:
```bash
#!/usr/bin/env bash

cat <<EOM > /etc/hosts
127.0.0.1   localhost localhost.localdomain
10.0.0.2    admin admin.localdomain
10.0.0.3    node1 node1.localdomain
10.0.0.4    node2 node2.localdomain
10.0.0.5    node3 node3.localdomain
10.0.0.6    client client.localdomain
10.0.0.254  cache cache.localdomain
EOM
```

Nothing fancy here; it just makes sure all the names resolve appropriately on every VM.
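A quick spot check from inside any of the VMs (nothing the later scripts depend on):

```bash
# Each name should resolve to the private_network address from the Vagrantfile.
getent hosts admin node1 node2 node3 client cache
```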
And cache.sh:
```bash
#!/usr/bin/env bash

yum install -y squid
sed -i '/cache_dir/ s/^#//' /etc/squid/squid.conf
service squid start

sed -i '/\[main\]/a proxy=http://cache.localdomain:3128' /etc/yum.conf
sed -i 's/enabled=1/enabled=0/' /etc/yum/pluginconf.d/fastestmirror.conf
yum update -y
```

This installs Squid, enables its on-disk cache (by uncommenting the cache_dir line in squid.conf), and sets up yum to direct requests through the proxy.
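One way to verify the proxy is actually being used while the other nodes build (assuming Squid's default log location on CentOS 7):

```bash
# Watch Squid's access log on the cache VM while another node runs yum;
# repeated RPM fetches should start showing up as cache hits.
vagrant ssh cache -c 'sudo tail -f /var/log/squid/access.log'
```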
ntp.sh:
```bash
#!/usr/bin/env bash

yum install -y ntp ntpdate ntp-doc
grep -o 'node[1-9].localdomain' /etc/hosts | grep -v `hostname` | sed -e 's/^/peer /' >> /etc/ntp.conf
ntpdate 0.centos.pool.ntp.org
service ntpd start
```

This does the NTP setup on the three main nodes: each node adds the other two as peers in /etc/ntp.conf, syncs its clock once with ntpdate, and then starts ntpd. The Ceph documentation strongly suggests that, in a multi-monitor setup, the hosts running the monitor daemons be set up as NTP peers of each other.
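Once the three nodes have been up for a few minutes, the peering can be checked from any of them (again, just a spot check, not part of the provisioning):

```bash
# The other two nodes should show up in ntpd's peer billboard alongside the
# pool servers from the stock ntp.conf.
vagrant ssh node1 -c 'ntpq -p'
```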
Here's bootstrap.sh:
```bash
#!/usr/bin/env bash

sed -i '/\[main\]/a proxy=http://cache.localdomain:3128' /etc/yum.conf
sed -i 's/enabled=1/enabled=0/' /etc/yum/pluginconf.d/fastestmirror.conf

cat << EOM > /etc/yum.repos.d/ceph-deploy.repo
[ceph-noarch]
name=Ceph noarch packages
baseurl=https://download.ceph.com/rpm-luminous/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
EOM

yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum update -y
yum install -y yum-plugin-priorities

useradd -d /home/ceph-deploy -m ceph-deploy
echo "ceph-deploy ALL = (root) NOPASSWD:ALL" > /etc/sudoers.d/ceph-deploy
chmod 0440 /etc/sudoers.d/ceph-deploy

mkdir -p -m 700 /home/ceph-deploy/.ssh
cat <<EOM > /home/ceph-deploy/.ssh/config
Host admin
  Hostname admin
  User ceph-deploy
Host node1
  Hostname node1
  User ceph-deploy
Host node2
  Hostname node2
  User ceph-deploy
Host node3
  Hostname node3
  User ceph-deploy
Host client
  Hostname client
  User ceph-deploy
EOM
mv /vagrant/id_rsa* /home/ceph-deploy/.ssh/
cp /home/ceph-deploy/.ssh/id_rsa.pub /home/ceph-deploy/.ssh/authorized_keys
chown -R ceph-deploy:ceph-deploy /home/ceph-deploy/.ssh
chmod 0600 /home/ceph-deploy/.ssh/*
```

This is based largely off of the preflight checklist mentioned above. Note the tweaks to the files under /etc/yum, which configure the machine to use cache for downloading RPMs and disable the fastestmirror plugin (which isn't needed 'cause we're using a local caching proxy). Note also the distribution of a shared SSH key; the ceph-deploy utility needs passwordless SSH access to all of the nodes. I couldn't figure out an elegant way to generate a shared keypair during vagrant up, so instead I did
```bash
ssh-keygen -C 'ceph deploy user' -f id_rsa
```

to create the keypair and put the resulting files into the Vagrant directory, where they're available during execution of bootstrap.sh.
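As a smoke test (my own addition, not something the checklist calls for), logging into the admin VM as ceph-deploy and looping over the other hosts should print "root" for each one without any password prompts, which is exactly what ceph-deploy needs:

```bash
# Run as the ceph-deploy user on the admin node.
for h in node1 node2 node3 client; do
    echo -n "$h: "
    ssh -o StrictHostKeyChecking=no "$h" sudo whoami
done
```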
This completes the preliminaries; all of the nodes are ready for installation and configuration of Ceph proper. We'll pick up there next time.