Shiny Ideas: Distributing Data, Consistently (1/N)

A long time ago, in a galaxy far, far away, I worked for a company that had this really cool system for distributing data in a strongly consistent manner. Having such a system is useful for all sorts of things, like distributed locks, and leader election, and dynamic service configuration. Lacking such a system where I currently work, I decided to see if I could cobble something similar together.

So, what sort of properties should such a system have? It needs to be:

Strongly-consistent
Highly-available
Operationally realistic

"Strongly consistent" tends to point towards ACID databases (that's what the 'C' is for, after all). However, most readily available ACID DBs aren't so hot when it comes to the second and third points. Many of them are active/passive, which is sorta-HA-but-we-can-do-better. The ones that are active-active typically need to share fancy-ass hardware (like a SAN) to do their thing. Shared hardware is expensive, and simply isn't available in, oh... say... a cloud deployment. The holy grail is an active-active, shared-nothing system.

Also, what did I mean by "operationally realistic"? The system has to be serviceable in the real world. Specifically, it should support an N+2 deployment. Why N+2? Well:

N nodes are doing their job.
1 node just failed.
1 node is undergoing routine maintenance.

Taking all three criteria into account (strongly-consistent, HA, and operationally realistic) really limits your options, and points to distributed consensus algorithms. Our candidates are:

Paxos
Raft
Chandra-Toueg
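To make the N+2 requirement concrete for these candidates: Raft and classic Paxos only commit with a majority of nodes, and Chandra-Toueg likewise assumes fewer than half the nodes crash. The sketch below assumes a plain majority quorum and just does the arithmetic; the helper names are hypothetical, made up for illustration. With one node failed and one in maintenance, two nodes are out at once, so the smallest cluster that still has a majority is five (quorum of three, plus two spares).

```python
def majority_quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority (hypothetical helper)."""
    return cluster_size // 2 + 1


def tolerable_outages(cluster_size: int) -> int:
    """Nodes that can be down (failed or in maintenance) while a majority remains."""
    return cluster_size - majority_quorum(cluster_size)


def smallest_cluster_for(outages: int) -> int:
    """Smallest cluster that keeps a majority with `outages` nodes unavailable."""
    return 2 * outages + 1


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} nodes: quorum {majority_quorum(n)}, survives {tolerable_outages(n)} down")

    # N+2: one node just failed plus one node in maintenance = two nodes out at once.
    print("smallest cluster for N+2:", smallest_cluster_for(2))
```

One thing the table makes obvious: even-sized clusters buy nothing. Four nodes survive only one outage, same as three, because the majority grows along with the cluster.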

A list of general-purpose data storage things that use any of the above for strong consistency:

Various bits of Google infrastructure (Chubby, Spanner): Not generally available for public consumption.
Doozerd: Tailored to the use case, but the last commit was in 2013 and I, not being a Go expert, wasn't able to get it to build.
Etcd: Looks similar to Doozerd, but is in active development.
Neo4j: Does something called "causal clustering", which I'm not convinced is what I'm looking for. Also, since it's a graph DB, we're into square-peg-round-hole territory.
Clustrix: Commercial, not readily available.
Riak KV: Readily available, and has good docs. However, there's a big-ass warning that the strong consistency code is experimental.

Etcd looks like the best place to start; I'll pick up there next time.

Thursday, March 28, 2019
