to manage Pods. You don't have to worry about the underlying data, because that's either handled by persistent volumes, or you're OK with just throwing each pod's data away when you destroy and recreate it. However, that behavior is not acceptable when you're trying to run a database, or any kind of workload that requires state to be persisted between runs. This results in significant additional complexity in the StatefulSet controller. The main challenge in writing and maturing Kubernetes controllers has been handling edge cases. StatefulSets are similar in this regard, but it's even more critical for StatefulSets to handle failure cases correctly, so that we don't lose data.
We've encountered some interesting use cases for StatefulSets, and cases where users would like to change boundaries that have been set in the core implementation. For example, we've had pull requests submitted to change the way StatefulSets handle pods during an update. In the original implementation, the StatefulSet controller updates pods one at a time, and if something breaks during the rollout, the entire rollout is paused and the StatefulSet requires manual intervention to make sure that data is not corrupted or lost. Some users would like the StatefulSet controller to ignore issues where a pod is stuck in a pending state, or cannot run, and simply restart those pods. However, the thing to remember with StatefulSets is that protecting the underlying data is the top priority. We could end up making the suggested change in order to allow faster, parallel updates for development environments where data protection is less of a concern, but it would require opting in with a feature flag.
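To make that concrete, here is a minimal sketch of a StatefulSet using the default one-at-a-time rolling update; the names and image are placeholders rather than anything from a real deployment:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: cassandra                  # hypothetical name, for illustration only
    spec:
      serviceName: cassandra
      replicas: 3
      selector:
        matchLabels:
          app: cassandra
      template:
        metadata:
          labels:
            app: cassandra
        spec:
          containers:
            - name: cassandra
              image: cassandra:4.0     # placeholder image
      updateStrategy:
        type: RollingUpdate            # the default; OnDelete is the only alternative
        rollingUpdate:
          partition: 0                 # pods with ordinal >= partition are updated,
                                       # one at a time, from the highest ordinal down

If an updated pod never becomes Ready, the controller stops there and waits rather than continuing and putting the data on the remaining replicas at risk.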
Another frequently requested feature is the ability to auto-delete the PersistentVolumeClaims (PVCs) of a StatefulSet when the StatefulSet is deleted. The original behavior is to
preserve the PVCs, again as a data protection mechanism, but there is a Kubernetes
Enhancement Proposal (KEP) for auto-deletion that is under consideration for the
Kubernetes 1.23 release.
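The exact API is still being settled in the KEP, but the proposal is roughly a retention policy on the StatefulSet spec along the lines of the fragment below; field names could still change before it ships:

    spec:
      persistentVolumeClaimRetentionPolicy:
        whenDeleted: Delete    # remove the PVCs when the StatefulSet is deleted
        whenScaled: Retain     # keep PVCs for pods removed by scaling down

The default would remain Retain in both cases, preserving today's data-protecting behavior unless you explicitly opt in.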
Even though there are some significant differences in the way StatefulSets manage pods versus other controllers, we are working to make the behaviors as consistent as possible across the different controllers. One example is the addition of a minReadySeconds setting in the StatefulSet spec, which allows you to say, I'd like this pod to stay up for a little bit of extra time before it's considered available and the rollout moves on. This is helpful for some stateful workloads that need a bit more time to initialize themselves, for example to warm up caches, and it brings StatefulSets in line with other controllers.
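As a minimal sketch, it is a single field on the StatefulSet spec (the value here is arbitrary):

    spec:
      minReadySeconds: 30    # a new pod must remain Ready for 30 seconds before it
                             # is counted as available and the rollout continues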
Another example is the work in progress to unify status reporting across all of the application controllers. Currently, if you're building any kind of higher-level orchestration or management tooling, you need different logic to handle the status of StatefulSets, Deployments, DaemonSets, and so on, because each of them was written by a different author. Each author had a different idea of what should be in the status, and of how the resource should express whether it's available, in the middle of a rolling update, unavailable, or whatever else is happening with it. DaemonSets are especially different in how they report status.