Finding Candidate Open Source Distributed File Systems
In my ever-present quest to keep myself entertained at work, I've decided that I should mess around with distributed file systems for a bit. The "comparison" page on Wikipedia lists a lot of systems, but it's hard to tell which of them are truly viable options for production use and which are research projects. This post records a little bit of research in the hope that it'll be useful to posterity.
Eliminating all systems marked "proprietary" leaves us with:
- Alluxio
- BeeGFS
- Ceph
- GlusterFS
- MooseFS
- Quantcast File System
- LizardFS
- Lustre
- OpenAFS
- OpenIO
- SeaweedFS
- Tahoe-LAFS
- HDFS
- XtreemFS
- Ori
If I'm going to be running something in production I'd prefer that it be actively maintained. Projects which don't appear to be under active development anymore:
- Quantcast File System: Last release was in 2015.
- XtreemFS: Last release was in 2015.
How about robustness of development? Which of these are someone's thesis project and which of them have a large development community? Presented in descending order of number of contributors:
- Alluxio: 943 contributors
- Ceph: 737 contributors
- GlusterFS: 191 contributors
- SeaweedFS: 59 contributors
- LizardFS: 35 contributors, a small-ish number.
- OpenIO: 23 contributors
- MooseFS: 9 contributors
- BeeGFS: Their source code repository doesn't appear to have any data on number of contributors.
- Lustre: Not at all obvious from their repository.
- OpenAFS: Not obvious from their repository.
- Tahoe-LAFS: Not obvious from their repository.
- HDFS: Not obvious from their repository.
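As a rough sketch, the ranking above can be reproduced with a few lines of Python. The numbers are simply the contributor counts quoted in this post, and `None` is my own stand-in for "not obvious from the repository"; the function name `rank_by_contributors` is made up for illustration.

```python
# Contributor counts as quoted in this post. None is an assumed marker for
# projects whose repositories did not make the number obvious.
contributors = {
    "Alluxio": 943,
    "Ceph": 737,
    "GlusterFS": 191,
    "SeaweedFS": 59,
    "LizardFS": 35,
    "OpenIO": 23,
    "MooseFS": 9,
    "BeeGFS": None,
    "Lustre": None,
    "OpenAFS": None,
    "Tahoe-LAFS": None,
    "HDFS": None,
}


def rank_by_contributors(counts):
    """Return project names by descending count, unknown counts last."""
    known = [(name, n) for name, n in counts.items() if n is not None]
    unknown = [name for name, n in counts.items() if n is None]
    known.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in known] + unknown


ranking = rank_by_contributors(contributors)
for name in ranking:
    n = contributors[name]
    print(f"{name}: {n if n is not None else 'unknown'}")
```

Keeping the unknowns at the end rather than dropping them matters here: an unknown count doesn't mean a small community, only that the number wasn't easy to find.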
What flavors of storage do they support? I'm mostly interested in a good GFS, with object store a nice-to-have and block devices a distant third.
- Alluxio: "Alluxio sits between computation and storage in the big-data analytics stack. It provides a data abstraction layer for computation frameworks, enabling applications to connect to numerous storage systems through a common interface". Not really what I want.
- Ceph: File system via CephFS, object store, and block storage.
- Gluster: File system via GlusterFS (and maybe object store via Swift?)
- Moose: File system via MooseFS.
- LizardFS: File system.
- Lustre: File system.
- OpenAFS: File system.
- OpenIO: Object store (plus non-free/proprietary FUSE connector)
- SeaweedFS: Object store with optional FS support.
- Tahoe-LAFS: Cloud storage-ish model. Doesn't look like it fits the bill.
- HDFS: Specifically designed for streaming access for large-scale computation; not a general-purpose DFS.
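The same filtering can be sketched in Python. Everything here is an approximation of the list above: the tag names are my own shorthand, the weights are hypothetical numbers that only encode "file system first, object store nice-to-have, block devices a distant third", and Gluster's "maybe" object store is left out because it is unconfirmed.

```python
# Storage flavors as summarized in this post. Tag names are assumed shorthand.
flavors = {
    "Alluxio": {"abstraction-layer"},
    "Ceph": {"file", "object", "block"},
    "GlusterFS": {"file"},            # object store via Swift is only a "maybe"
    "MooseFS": {"file"},
    "LizardFS": {"file"},
    "Lustre": {"file"},
    "OpenAFS": {"file"},
    "OpenIO": {"object"},             # the FUSE connector is non-free
    "SeaweedFS": {"object", "file"},  # file system support is optional
    "Tahoe-LAFS": {"cloud-storage"},
    "HDFS": {"streaming-file"},       # not a general-purpose DFS
}

# Hypothetical weights: only their relative order reflects the stated priorities.
WEIGHTS = {"file": 4, "object": 2, "block": 1}


def score(tags):
    """Sum the weights of the flavors a system offers; unknown tags count 0."""
    return sum(WEIGHTS.get(tag, 0) for tag in tags)


def shortlist(systems):
    """Keep systems that offer a file system, best score first."""
    candidates = [name for name, tags in systems.items() if "file" in tags]
    return sorted(candidates, key=lambda name: score(systems[name]), reverse=True)


result = shortlist(flavors)
print(result)
```

With these assumed weights Ceph comes out on top because it is the only entry that covers all three flavors; changing the weights changes the order of the rest, so treat the output as a restatement of the list rather than a measurement.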
At this point Ceph looks like the front-runner in terms of features and robustness. Next question: Does it run on CentOS 7? According to the Ceph OS Recommendations it does, as long as you don't use btrfs. So Ceph seems like a good place to start dabbling for the time being.