Rook Ceph Cluster: Research Documentation
Introduction
This document provides comprehensive research into deploying Ceph storage clusters on Kubernetes using the Rook operator. Ceph is a unified, distributed storage system that provides object, block, and file storage with excellent performance, reliability, and scalability.
What is Ceph?
Ceph is a software-defined storage platform that implements object storage on a single distributed cluster and provides interfaces for object-, block-, and file-level storage. Originally developed at UC Santa Cruz, it is now maintained as an open-source project with Red Hat among its principal contributors.
Ceph Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│ Ceph Storage Cluster │
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ RADOS Layer │ │
│ │ (Reliable Autonomic Distributed Object Store) │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ OSD │ │ OSD │ │ OSD │ │ OSD │ ... │ │
│ │ │ (node1) │ │ (node1) │ │ (node2) │ │ (node3) │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ MON │ │ MON │ │ MON │ Cluster State/Maps │ │
│ │ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ │ │
│ │ │ MGR │ │ MGR │ Management & Dashboard │ │
│ │ └─────────┘ └─────────┘ │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ librbd │ │ libcephfs │ │ librados │ │
│ │ (Block) │ │ (File) │ │ (Object) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ RBD Driver │ │ MDS │ │ RGW │ │
│ │ (CSI/KRBD) │ │ (Metadata) │ │ (S3 Gateway) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Core Components
- MON (Monitors): Maintain cluster membership and state maps. Run in odd numbers (3, 5) for quorum.
- OSD (Object Storage Daemons): Store actual data, handle replication, recovery, and rebalancing. One per disk.
- MGR (Managers): Provide monitoring, orchestration, and external interfaces (dashboard, Prometheus).
- MDS (Metadata Servers): Required only for CephFS. Store filesystem metadata.
- RGW (RADOS Gateway): S3/Swift compatible object storage interface.
Deployment Methods Comparison
1. Manual Ceph Deployment
Traditional method using cephadm (or the older, now-retired ceph-deploy):
Pros:
- Direct control over every configuration
- Suitable for non-Kubernetes environments
Cons:
- Complex manual setup
- No Kubernetes integration
- Difficult to automate
- Separate management plane
2. Rook Ceph Operator (Recommended)
Kubernetes-native deployment using Custom Resource Definitions:
Pros:
- Native Kubernetes integration
- Declarative configuration (see the CephCluster sketch below)
- Self-healing and self-scaling
- CSI driver integration
- Active CNCF graduated project
Cons:
- Kubernetes-only
- Learning curve for Rook CRDs
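The declarative model is easiest to see in a CephCluster custom resource. The sketch below uses field names from the upstream Rook CephCluster CRD; the image tag, host path, and daemon counts are illustrative values, not recommendations.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.4   # example tag; pin the release you have validated
  dataDirHostPath: /var/lib/rook       # host path for MON/OSD metadata
  mon:
    count: 3                           # odd count for quorum
    allowMultiplePerNode: false
  mgr:
    count: 2                           # active + standby manager
  dashboard:
    enabled: true
  storage:
    useAllNodes: true                  # or list nodes explicitly
    useAllDevices: true                # or narrow with a deviceFilter
```

Applying this single manifest is enough for the operator to bootstrap MONs, MGRs, and OSDs; day-two changes are made by editing the same resource.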
3. Helm Chart (Ceph-CSI Only)
Deploy CSI drivers only, connect to external Ceph:
Pros:
- Lightweight
- Use existing Ceph clusters
Cons:
- Doesn't deploy Ceph itself
- Requires external Ceph management
4. OpenEBS with cStor (Alternative)
Another Kubernetes-native storage option:
Pros:
- Simpler architecture
- Lower resource requirements
Cons:
- Different storage model
- Less mature than Ceph
Why Rook for Kubernetes?
Rook transforms Ceph into a cloud-native storage solution by:
- Automated Lifecycle Management: Deployment, upgrades, scaling, and recovery
- Kubernetes-Native: Uses CRDs, operators, and standard Kubernetes patterns
- CSI Integration: Native PersistentVolume support
- Self-Healing: Automatic recovery from component failures
- Dynamic Provisioning: On-demand storage allocation
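As a concrete example of dynamic provisioning, a workload only needs a PersistentVolumeClaim that references a Rook-managed StorageClass; the class name below is a placeholder (the block storage example later in this document defines one).

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce                   # RBD-backed volumes are single-node
  storageClassName: rook-ceph-block   # placeholder StorageClass name
  resources:
    requests:
      storage: 10Gi
```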
Project Planton's Approach
80/20 Scoping
This component exposes the 20% of configuration that covers 80% of use cases:
Included (Essential):
- Cluster-level configuration (MON, MGR, OSD counts)
- Storage selection (all nodes/devices or specific)
- Block pool configuration with StorageClass
- CephFS configuration with StorageClass
- Object store configuration with StorageClass
- Dashboard and toolbox enablement
Excluded (Advanced):
- Fine-grained placement rules
- Custom CRUSH maps
- Advanced OSD configuration (bluestore options)
- Ceph configuration overrides
- Multi-cluster replication
- Stretched clusters
Component Relationship
┌──────────────────────────────────┐
│ KubernetesRookCephOperator       │ ← Install first
│ (Deploys Rook Operator)          │
└──────────────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│ KubernetesRookCephCluster        │ ← This component
│ (Deploys Ceph Cluster)           │
└──────────────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│ Storage Resources                │
│ - CephBlockPool + StorageClass   │
│ - CephFilesystem + StorageClass  │
│ - CephObjectStore + StorageClass │
└──────────────────────────────────┘
Storage Types
Block Storage (RBD)
RBD (RADOS Block Device) provides block-level storage:
- Access Mode: ReadWriteOnce (single node at a time)
- Use Cases: Databases, VMs, stateful applications
- Features: Snapshots, cloning, encryption
Performance Characteristics:
- High IOPS for random access
- Low latency
- Suitable for transactional workloads
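A typical way to expose RBD is a CephBlockPool paired with a StorageClass, modeled on the stock Rook example manifests; the names replicapool and rook-ceph-block are placeholders, and the CSI secret references assume the defaults created by the Rook operator.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host        # spread the three replicas across nodes
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  # secret references use the default names from the Rook example manifests
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
```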
File Storage (CephFS)
CephFS provides POSIX-compliant shared filesystem:
- Access Mode: ReadWriteMany (multiple pods)
- Use Cases: Shared content, ML datasets, log aggregation
- Features: Snapshots, quotas, multi-tenancy
Performance Characteristics:
- Good throughput for sequential access
- Suitable for file-based workloads
- Requires MDS daemons
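A CephFilesystem plus a CephFS StorageClass is the usual pairing; the sketch follows the Rook example manifests, with myfs and the pool names as placeholders (exact field support varies slightly by Rook version).

```yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - name: data0
      replicated:
        size: 3
  metadataServer:
    activeCount: 1          # MDS daemons serving metadata
    activeStandby: true     # keep a hot standby MDS
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: myfs
  pool: myfs-data0          # data pool name derives from the filesystem and pool names above
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
```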
Object Storage (RGW)
RADOS Gateway provides S3/Swift-compatible object storage:
- Access Mode: HTTP/HTTPS (S3 API)
- Use Cases: Backups, logs, media, cloud-native apps
- Features: Versioning, lifecycle policies, multitenancy
Performance Characteristics:
- High throughput for large objects
- REST API access
- Suitable for unstructured data
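A CephObjectStore deploys the RGW, and a bucket StorageClass lets applications request S3 buckets through ObjectBucketClaims; my-store and the replica counts below are placeholders, following the pattern of the Rook examples.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPool:
    replicated:
      size: 3
  gateway:
    port: 80                # plain HTTP; add a secure port and TLS certificate for HTTPS
    instances: 1
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-bucket
provisioner: rook-ceph.ceph.rook.io/bucket   # prefix is the Rook operator namespace
parameters:
  objectStoreName: my-store
  objectStoreNamespace: rook-ceph
```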
Production Best Practices
Hardware Requirements
Minimum Production Setup:
- 3 nodes minimum (for replication factor 3)
- Dedicated storage nodes recommended
- SSD/NVMe for OSDs
- 10GbE networking
Recommended Specifications:
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores/OSD | 4 cores/OSD |
| RAM | 4GB/OSD | 8GB/OSD |
| Network | 1GbE | 10GbE |
| Disk | SSD | NVMe |
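This sizing can be enforced through the resources block of the CephCluster spec. The excerpt below is a sketch: the OSD values mirror the table, while the MON/MGR requests are illustrative assumptions; field names follow the Rook CRD.

```yaml
spec:
  resources:
    mon:
      requests: { cpu: "1", memory: "2Gi" }
    mgr:
      requests: { cpu: "1", memory: "1Gi" }
    osd:
      requests: { cpu: "2", memory: "4Gi" }   # per-OSD minimum from the table above
      limits:   { memory: "8Gi" }             # recommended per-OSD ceiling
```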
Network Configuration
- Dedicated Network: Separate storage traffic from application traffic
- Jumbo Frames: Enable 9000 MTU for storage network
- Encryption: Use msgr2 with encryption for security
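On the CephCluster side, these choices map to the network section. The excerpt below uses host networking to reach a dedicated storage NIC and enables msgr2 on-the-wire encryption; field names follow the Rook CRD, so verify them against your Rook version.

```yaml
spec:
  network:
    provider: host            # daemons use host networking on the dedicated storage network
    connections:
      encryption:
        enabled: true         # msgr2 secure mode between Ceph daemons
      compression:
        enabled: false
```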
Failure Domain Design
Configure failure domains based on physical topology:
- Host: Replicas on different nodes (default)
- Rack: Replicas in different racks
- Zone: Replicas in different availability zones
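The failure domain is set per pool. For example, a replicated pool that places each replica in a different availability zone might look like the sketch below; zone-replicated is a placeholder name, and zone placement assumes nodes carry the standard topology.kubernetes.io/zone label.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: zone-replicated
  namespace: rook-ceph
spec:
  failureDomain: zone   # CRUSH places each replica in a distinct zone
  replicated:
    size: 3
```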
Monitoring
Enable Prometheus integration for:
- Cluster health
- OSD performance
- Pool usage
- PG status
- RGW metrics
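Prometheus scraping is switched on in the CephCluster spec; the excerpt below assumes the Prometheus Operator's ServiceMonitor CRD is already installed in the cluster.

```yaml
spec:
  monitoring:
    enabled: true   # Rook creates a ServiceMonitor; ceph-mgr exposes the metrics endpoint
```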
Common Pitfalls
- Even Monitor Count: An even number of MONs adds no extra fault tolerance; always run an odd count (3, 5, 7)
- Co-located Daemons: Ceph daemons competing with application workloads for resources cause instability; prefer dedicated storage nodes
- Insufficient Resources: MON/MGR need adequate memory
- Network Latency: Storage network should have low latency
- Mixed SSD/HDD: Don't mix without device classes
- Root Disk as OSD: Never use the OS disk for OSDs; select devices explicitly, as in the sketch below
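Several of these pitfalls are avoided by selecting storage devices explicitly instead of consuming everything the operator can see; the node and device names below are placeholders for your hardware.

```yaml
spec:
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter: "^nvme."        # regex matching only dedicated NVMe devices
    nodes:
      - name: storage-node-1      # placeholder node name
        devices:
          - name: nvme0n1
          - name: nvme1n1
```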
Upgrade Considerations
- Test in Staging: Always test upgrades in non-production
- One Component at a Time: Upgrade operator, then cluster
- Health Checks: Ensure cluster is HEALTH_OK before upgrading
- Backup Configuration: Export CephCluster specs
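Ceph upgrades are driven by editing the pinned image on the CephCluster after the operator itself has been upgraded; the tag below is an example, and the field names follow the Rook CRD (verify for your version).

```yaml
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.4   # example tag; bump to trigger a rolling Ceph upgrade
  skipUpgradeChecks: false             # leave false so Rook verifies health between daemon restarts
```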
Comparison with Alternatives
| Feature | Rook Ceph | OpenEBS | Longhorn |
|---|---|---|---|
| Block Storage | ✅ | ✅ | ✅ |
| File Storage | ✅ | ❌ | ❌ |
| Object Storage | ✅ | ❌ | ❌ |
| CNCF Status | Graduated | Sandbox | Incubating |
| Maturity | Very High | Medium | Medium |
| Resource Usage | Higher | Lower | Lower |
| Scalability | Excellent | Good | Good |
Conclusion
Rook Ceph on Kubernetes provides enterprise-grade distributed storage with:
- Unified block, file, and object storage
- Self-healing and self-scaling capabilities
- Native Kubernetes integration
- Proven production reliability
The KubernetesRookCephCluster component simplifies deployment by exposing essential configuration options while maintaining production-ready defaults.